Probability Assignment
To get full credit in this assignment you need to use only the numpy or numba libraries and include adequate explanation of the code in either markdown cells or code comments. Where equations are needed, type them in LaTeX math notation.
Question 1 (30 points)
We monitor server faults in a data center over a period of \(T\) units of time and represent each occurring fault as a point on the time line. What is the probability that \(k\) faults take place between \(t_1\) and \(t_2\) if \(n\) total points were recorded? (10 points)
We collected a very large set of faults \(n \rightarrow \infty\) over a long measurement interval \(T \rightarrow \infty\) and we observed that on average faults occur with a rate of \(\lambda = \frac{n}{T}\). This allows us to model the probability of \(k_a\) points in an interval \(t_a\) as Poisson.
Suppose that we measure \(k_a\) and \(k_b\) faults in two consecutive intervals of durations \(t_a=(t_1, t_2)\) and \(t_b=(t_2, t_3)\) respectively, where \(t_a+t_b < T\). Write the expression of the joint probability \(p(k_a \in t_a, k_b \in t_b)\). (10 points)
Suppose now that we need to schedule personnel to replace these servers and we want to use the fault data to estimate the conditional probability \(p(k_a \in t_a | k_c \in t_c)\), where \(t_c=t_a + t_b\). Write the expression of this conditional probability. (10 points)
Write your answer here, using LaTeX math notation for the math. If you prefer, write the math with pencil and scan the writeup into a PNG image that you can insert here using
![](your-png-file.png)
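As a numerical sanity check for a derived expression (not a substitute for the derivation), the sketch below assumes that the \(n\) faults fall independently and uniformly on \([0, T]\), counts how many land in \((t_1, t_2)\), and compares the empirical frequency with a binomial and a Poisson probability. The values of `T`, `n`, `t1`, `t2` and `trials` are arbitrary assumptions chosen for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

T, n = 100.0, 200        # observation window and total number of faults (assumed values)
t1, t2 = 10.0, 15.0      # interval of interest (assumed values)
trials = 10_000          # number of simulated experiments (assumed value)

# Each row is one experiment: n fault times placed uniformly at random on [0, T].
faults = rng.uniform(0.0, T, size=(trials, n))
counts = np.sum((faults > t1) & (faults < t2), axis=1)

p = (t2 - t1) / T        # probability that a single fault lands in (t1, t2)
lam = n / T              # average fault rate

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, mu):
    """Probability of exactly k events when the expected count is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

for k in (5, 10, 15):
    empirical = np.mean(counts == k)
    print(k, empirical, binomial_pmf(k, n, p), poisson_pmf(k, lam * (t2 - t1)))
```

Increasing `n` and `T` together while keeping `lam` fixed should bring the binomial and Poisson columns closer to each other.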
Question 2 (10 points)
Generate \(N\) random variables \(\{x_1, \dots, x_N\}\) uniformly distributed over the interval \([0,1]\).
Compute their mean and, after repeating this computation \(m\) times, plot the histogram of the means as \(N\) takes values in \(\{1, 5, 10, 20\}\).
Justify the resulting histogram by reading about the Central Limit Theorem.
# Type the Python code here
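One possible starting point, sketched with numpy only: draw an `m` by `N` array of uniform samples, average along each row, and bin the `m` means with `np.histogram`. The number of repetitions `m` and the bin edges are assumed values; the bin densities in `hist` can be passed to whatever plotting tool the course allows. Since a uniform variable on \([0,1]\) has variance \(1/12\), the standard deviation of the mean should shrink like \(\sqrt{1/(12N)}\), which the printout compares against.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000                          # number of repetitions (assumed value)
bins = np.linspace(0.0, 1.0, 21)    # 20 equal-width bins on [0, 1] (assumed choice)

results = {}
for N in (1, 5, 10, 20):
    # m independent sample means, each taken over N uniform draws
    means = rng.uniform(0.0, 1.0, size=(m, N)).mean(axis=1)
    hist, _ = np.histogram(means, bins=bins, density=True)
    results[N] = (means.mean(), means.std(), hist)
    print(f"N={N:2d}  mean={means.mean():.3f}  std={means.std():.3f}  "
          f"predicted std={np.sqrt(1.0 / (12 * N)):.3f}")
```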
Question 3 (10 points)
The exercise refers to Example 6.6 of the Math for ML book.
Simulate and plot the bivariate normal distribution with the shown parameters.
You need to use the Cholesky factorization for the simulation.
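A minimal sketch of Cholesky-based sampling follows. The mean vector and covariance matrix below are assumed placeholder values and should be replaced with the parameters shown in the book example. The idea is that if \(\mathbf z\) has independent standard normal entries and \(\Sigma = L L^T\), then \(\boldsymbol\mu + L \mathbf z\) has mean \(\boldsymbol\mu\) and covariance \(\Sigma\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters; replace with the ones shown in the book example.
mu = np.array([0.0, 2.0])
Sigma = np.array([[0.3, -1.0],
                  [-1.0, 5.0]])

L = np.linalg.cholesky(Sigma)         # lower-triangular factor, Sigma = L @ L.T
z = rng.standard_normal((5000, 2))    # independent standard normal draws, one row per sample
x = mu + z @ L.T                      # each row is one sample from N(mu, Sigma)

print(x.mean(axis=0))                 # should be close to mu
print(np.cov(x, rowvar=False))        # should be close to Sigma
```

The rows of `x` can then be shown as a scatter plot or a 2D histogram.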
Question 4 (25 points)
The exercise refers to Example 6.6 of the Math for ML book.
Simulate for \(m=10, 100, 1000\) samples and plot the conditional distribution as given by the analytical expressions of the conditional mean and covariance matrix in Python. (5 points)
Use maximum likelihood estimation (MLE) with Stochastic Gradient Descent (SGD) to estimate the parameters of the resulting distribution. (15 points)
Plot the estimates as a function of \(m\) - include the analytical mean and variance in the plots for comparison. (5 points)
You may use these derivatives for implementing the SGD-based estimator.
# Type the Python code here
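The sketch below illustrates one way the pieces could fit together. The joint parameters and the conditioning value `x2_obs` are assumed placeholders, to be replaced with those of the book example. The conditional mean and variance use the standard Gaussian conditioning formulas. The SGD estimator is a hypothetical helper that parametrizes the variance through its logarithm to keep it positive; its gradients are derived directly from the Gaussian negative log-likelihood and are not necessarily the derivatives the assignment refers to. The learning rate and step count are untuned assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed joint parameters and conditioning value; replace with those of the book example.
mu = np.array([0.0, 2.0])
Sigma = np.array([[0.3, -1.0],
                  [-1.0, 5.0]])
x2_obs = -1.0

# Analytical conditional p(x1 | x2 = x2_obs) of a bivariate Gaussian.
cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2_obs - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

def sgd_gaussian_mle(x, lr=0.002, steps=20_000):
    """Fit mean and variance by SGD on the per-sample negative log-likelihood."""
    mean, log_var = 0.0, 0.0                       # arbitrary starting point
    for i in rng.integers(0, len(x), size=steps):  # one randomly chosen sample per step
        err = x[i] - mean
        inv_var = np.exp(-log_var)
        mean -= lr * (-err * inv_var)                     # d NLL / d mean
        log_var -= lr * 0.5 * (1.0 - err**2 * inv_var)    # d NLL / d log_var
    return mean, np.exp(log_var)

estimates = {}
for m in (10, 100, 1000):
    # Draw m samples from the analytical conditional distribution.
    x = cond_mean + np.sqrt(cond_var) * rng.standard_normal(m)
    estimates[m] = sgd_gaussian_mle(x)
    print(m, estimates[m], (x.mean(), x.var()))
```

The closed-form MLE (sample mean and biased sample variance) is printed next to each SGD estimate, so both can be plotted against `m` together with `cond_mean` and `cond_var`.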
Question 5 (25 points)
Your smartphone has a microphone array, i.e. a number of sensors that are spatially separated around the circumference of the device. The array is used for interference suppression: it steers a beam toward the direction of your voice and suppresses background noise. To do so it needs to measure the spatial correlation matrix.
The data (sound in this case) are assumed to arrive sequentially, one sample at a time (the so-called online learning setting). Introduce the index \(i\) to represent the \(i\)-th arriving data sample \(\mathbf x_i\).
Write the expression of the sample correlation matrix (5 points)
Write the expression of the sample correlation matrix that can be estimated recursively (15 points).
Simulate \(m=50\) samples assuming a correlation matrix of your choice. Plot the elements (row, column) of the estimated correlation matrix as they are estimated recursively over time, assuming \(n = 2, 4\) and \(8\) sensors (mics). Comment on whether, and if so how, the variance of the estimated elements is affected by the ratio \(n/m\). (5 points)
# Type the python code here
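A possible sketch of the simulation, under stated assumptions: the samples are zero-mean Gaussian, the true correlation matrix has entries \(\rho^{|j-k|}\) with \(\rho = 0.5\) (an arbitrary choice), and the recursive estimate is the running average \(R_i = R_{i-1} + \frac{1}{i}(\mathbf x_i \mathbf x_i^T - R_{i-1})\), which reproduces the batch average \(\frac{1}{m}\sum_i \mathbf x_i \mathbf x_i^T\) after \(m\) samples. The function name `recursive_correlation` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, rho = 50, 0.5    # number of samples and assumed correlation decay

def recursive_correlation(n):
    """Simulate m samples from n sensors and track the running correlation estimate."""
    idx = np.arange(n)
    R_true = rho ** np.abs(idx[:, None] - idx[None, :])   # assumed true correlation matrix
    L = np.linalg.cholesky(R_true)
    X = rng.standard_normal((m, n)) @ L.T                 # rows are zero-mean samples x_i
    R = np.zeros((n, n))
    history = np.empty((m, n, n))
    for i, x in enumerate(X, start=1):
        R += (np.outer(x, x) - R) / i                     # running-average update
        history[i - 1] = R                                # estimate after i samples
    return X, R_true, history

for n in (2, 4, 8):
    X, R_true, history = recursive_correlation(n)
    batch = X.T @ X / m                                   # batch estimate for comparison
    print(n, np.allclose(history[-1], batch), np.abs(history[-1] - R_true).max())
```

Each `history[:, r, c]` is the trajectory of one matrix element over time and can be plotted against the sample index.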