Maximum Likelihood Parameter Estimation

RUBRIC:

If you provide only the plots and code, you will receive 50% of the points. To earn the remaining 50%, you need clear documentation.
We need to see a clear explanation of every stage of developing your solutions, which means explaining the code so that the write-up can be understood by fellow students.
Type your tutorial explanations inline in your Colab notebook. Equations can be typed in markdown using LaTeX notation. If you prefer plain Python, you can include the markdown as a separate file, but you must ensure that all plots are inline in that markdown document and are rendered correctly by GitHub.
Submit through your school's learning management system (Canvas, Brightspace, etc.) after first sharing the Colab notebook or GitHub repo with the TA so that they can open it without any access restrictions. Any notebook that can't be accessed will receive a grade of 0.

Intro:

What is the exponential distribution?

The exponential distribution is a probability distribution that describes the time between events in a Poisson process, so the Poisson and exponential distributions are closely related. For example, suppose a Poisson distribution models the number of births in a given time period. The time between consecutive births can then be modeled with an exponential distribution (Young & Young, 1998).

Poisson vs Exponential?

Suppose a YouTube channel is interested in its number of views per hour. Viewers arrive according to a Poisson process with a rate of 120 per hour, meaning that on average 120 viewers arrive each hour. Equivalently, the expected inter-arrival time is 0.5 minutes, because one viewer is expected every half minute (30 seconds).

The exponential distribution models the gaps between these arrivals, so informally: Poisson with rate 120 per hour corresponds to Exponential with mean 0.5 minutes.

The Poisson count is measured in viewers per hour, while the exponential variable is measured in minutes.
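The correspondence between the two views can be checked numerically. The following is a minimal sketch, not part of the assignment: it draws exponential gaps with mean 0.5 minutes, accumulates them into arrival times, and counts arrivals per hour. The seed, the sample size, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

rate_per_hour = 120                     # Poisson rate: viewers per hour
mean_gap_minutes = 60 / rate_per_hour   # exponential mean: 0.5 minutes

# Draw inter-arrival gaps (in minutes) and accumulate them into arrival times.
gaps = rng.exponential(scale=mean_gap_minutes, size=200_000)
arrival_times = np.cumsum(gaps)

# Count arrivals in each complete hour of simulated time,
# dropping the final, partially observed hour.
n_hours = int(arrival_times[-1] // 60)
hour_index = (arrival_times // 60).astype(int)
counts = np.bincount(hour_index)[:n_hours]

print("average gap (minutes):", gaps.mean())
print("average arrivals per hour:", counts.mean())
print("variance of arrivals per hour:", counts.var())
```

If the relationship holds, the average gap should come out near 0.5 minutes, and the hourly counts should have mean and variance both near 120, as expected for a Poisson count.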

Videos that may help:

1. https://www.youtube.com/watch?v=p3T-_LMrvBc&ab_channel=StatQuestwithJoshStarmer
2. https://www.youtube.com/watch?v=2kg1O0j1J9c&ab_channel=zedstatistics

Part 1 (30 points):

A) Let X be the time (in minutes) between consecutive customers arriving at an ice cream man's cart. X is known to follow an exponential distribution, with an average of four minutes between customers.

Plot the probability density function of this exponential distribution, i.e. for a mean time between customers of 4 minutes. (10 points)

You may use Python libraries to evaluate the exponential distribution and to plot it: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.expon.html
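As a starting point, here is a minimal sketch of how the density values could be computed before plotting. It assumes the rate parameterization $f(x; \lambda) = \lambda e^{-\lambda x}$ with $\lambda = 1/4$ per minute; the helper name expon_pdf and the grid range are hypothetical choices. In scipy.stats.expon the same density is obtained with scale = 1/λ, i.e. expon.pdf(x, scale=4).

```python
import numpy as np

mean_minutes = 4.0          # average gap between customers
lam = 1.0 / mean_minutes    # rate: 0.25 customers per minute

def expon_pdf(x, lam):
    """Density f(x; lam) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)

# Evaluate the density on a grid of waiting times from 0 to 20 minutes.
x_grid = np.linspace(0, 20, 201)
pdf_values = expon_pdf(x_grid, lam)

# (x_grid, pdf_values) are the pairs to pass to a plotting call,
# for example matplotlib's plt.plot(x_grid, pdf_values).
```

The curve starts at λ = 0.25 for x = 0 and decays monotonically, which is the shape the plot should show.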

B) Now assume that on a very hot day the ice cream man serves 200 customers, with a new customer arriving on average every 4 minutes. Generate n = 200 samples from the exponential distribution with mean 4 minutes (rate $\lambda = 1/4$ per minute; in scipy.stats.expon this corresponds to scale = 4). Plot the samples to show what they look like. Does the plot resemble the density from part A? (20 points)
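One way to generate and summarize the samples is sketched below. It assumes that the 4 refers to the mean gap in minutes (so the NumPy and SciPy scale parameter is 4, and the rate is 1/4), and it uses a normalized histogram so the result can be compared against a density. The seed and the number of bins are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed, chosen only for reproducibility

n_samples = 200
mean_minutes = 4.0                # assumed mean gap; rate = 1 / mean_minutes

# Draw 200 waiting times (in minutes) from the exponential distribution.
samples = rng.exponential(scale=mean_minutes, size=n_samples)

# Normalized histogram: bar heights integrate to 1, like a density.
density, edges = np.histogram(samples, bins=20, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

print("sample mean (minutes):", samples.mean())
```

Plotting density against centers (for example as a bar chart) gives an empirical counterpart to the density curve. With only 200 samples, the histogram should follow the decaying shape only roughly.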

Part 2 (30 points):

Assume that you are given the customer data generated in Part 1. Implement a gradient descent algorithm from scratch that estimates the parameter $\lambda$ of the exponential distribution according to the maximum likelihood criterion.

Answer the following:

Plot the negative log likelihood of the exponential distribution as a function of $\lambda$. (10 points)
What is the maximum likelihood estimate of $\lambda$ for the generated data? (10 points)
Plot the estimated $\lambda$ against the iteration number to show convergence towards the true $\lambda$. (10 points)

Read this article to obtain the likelihood and negative log likelihood function of the exponential distribution: https://www.statlect.com/fundamentals-of-statistics/exponential-distribution-maximum-likelihood
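For orientation, the sketch below shows one possible shape of such a solution, not the required one. It assumes the rate parameterization, for which the average negative log likelihood of samples $x_1, \dots, x_n$ is $-\log\lambda + \lambda \bar{x}$ with derivative $-1/\lambda + \bar{x}$. The starting guess, learning rate, iteration count, and function names are all hypothetical, and the data is regenerated here only to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(42)                  # seed chosen only for reproducibility
samples = rng.exponential(scale=4.0, size=200)   # stand-in for the Part 1 data

def mean_nll(lam, x):
    """Average negative log likelihood: -log(lam) + lam * mean(x)."""
    return -np.log(lam) + lam * np.mean(x)

def mean_nll_grad(lam, x):
    """Derivative of mean_nll with respect to lam."""
    return -1.0 / lam + np.mean(x)

# NLL evaluated on a grid of candidate rates (the curve to plot).
lam_grid = np.linspace(0.05, 1.0, 200)
nll_grid = mean_nll(lam_grid, samples)

# Plain gradient descent on lam.
lam = 1.0                 # hypothetical starting guess
learning_rate = 0.01      # hypothetical step size
n_iters = 2000
lam_history = np.empty(n_iters + 1)
lam_history[0] = lam
for t in range(n_iters):
    lam -= learning_rate * mean_nll_grad(lam, samples)
    lam_history[t + 1] = lam

# Analytic MLE for comparison: the reciprocal of the sample mean.
lam_closed_form = 1.0 / samples.mean()
print("gradient descent estimate:", lam_history[-1])
print("closed-form estimate:", lam_closed_form)
```

Plotting nll_grid against lam_grid gives the likelihood curve, and plotting lam_history against the iteration index shows the convergence. Averaging the NLL rather than summing it is a design choice that keeps the step size independent of the number of samples.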

Part 3 (40 points):

Suppose we have a training set of $m$ independently distributed samples

$\{ (x_1,y_1), (x_2,y_2), (x_3,y_3), \dots, (x_m,y_m)\}$

that is generated from a distribution $p_{data}(x,y)$

Assuming a Gaussian model

$p_{model}(y_i \mid x_i; \mathbf w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mathbf w^T x_i)^2}{2\sigma^2}\right)$

Write the expression of the Negative Log Likelihood function $NLL$. (10 points)

Write the parameters $\mathbf w$ and the $\sigma^2$ that minimize the NLL (10 points)

Write a Python script that uses SGD to converge to $\mathbf w_{ML}$ and $\sigma_{ML}^2$ for the following dataset (20 points)

HINT: You may need to estimate the conditional mean first and then the variance of the Gaussian $p_{model}$

import numpy as np

x = np.array([8, 16, 22, 33, 50, 51])
y = np.array([5, 20, 14, 32, 42, 58])
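
A possible structure for the SGD script is sketched below, following the hint: first fit the conditional mean, then estimate the variance from the residuals. Several choices here are assumptions rather than requirements: the conditional mean is taken to be linear with a bias term, x is standardized so a single learning rate suits both weights, the step size decays over time, and the seed and epoch count are arbitrary. The variance is computed in closed form from the fitted residuals rather than by SGD.

```python
import numpy as np

x = np.array([8, 16, 22, 33, 50, 51], dtype=float)
y = np.array([5, 20, 14, 32, 42, 58], dtype=float)

# Standardize x (an assumption of this sketch) and add a bias column.
x_mean, x_std = x.mean(), x.std()
xs = (x - x_mean) / x_std
X = np.column_stack([np.ones_like(xs), xs])   # rows are [1, standardized x_i]

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
w = np.zeros(2)                  # weights in standardized coordinates
n_epochs = 5000                  # hypothetical epoch count
step = 0
for epoch in range(n_epochs):
    # Visit the samples one at a time in a fresh random order each epoch.
    for i in rng.permutation(len(y)):
        step += 1
        lr = 0.1 / (1.0 + 0.01 * step)   # hypothetical decaying step size
        error = X[i] @ w - y[i]          # prediction error on sample i
        w -= lr * error * X[i]           # gradient of 0.5 * error**2

# Map the weights back to the original x scale: y ~ intercept + slope * x.
slope = w[1] / x_std
intercept = w[0] - w[1] * x_mean / x_std

# Variance estimate: mean squared residual under the fitted mean.
residuals = y - (intercept + slope * x)
sigma2 = np.mean(residuals ** 2)

print("slope:", slope, "intercept:", intercept, "sigma^2:", sigma2)
```

Minimizing the squared error per sample is equivalent to minimizing the Gaussian NLL with respect to $\mathbf w$, because $\sigma^2$ only rescales that term. The result can be sanity-checked against an ordinary least-squares fit, for example np.polyfit(x, y, 1).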