Probability Assignment
To get full credit in this assignment, you must use only the numpy or jax libraries and include an adequate explanation of the code in either markdown cells or code comments. Where you need to type equations, write them in LaTeX math notation.
PS: Please note that we run the questions through ChatGPT, and you will be referred to the Dean if we find that a robot answered your questions.
Question 1a (10 points)
In a private subreddit, people are posting their opinions on the CEO of the company you work for. Let's assume that the employees who post log in to that subreddit at random, and that each post indicates whether or not the employee approves of the job the CEO is doing. Let \(x_i\) be the binary random variable where \(x_i=1\) indicates approval. You can assume that each \(x_i\) follows a Bernoulli distribution with parameter \(p=1/2\).
Your job is to sample \(n=50\) posts and estimate the CEO's approval rate by considering the statistics of \(y=x_1+x_2+\dots+x_n\). What is the probability that exactly 25 employees approve of the CEO?
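The following is a minimal sketch for checking a Q1a answer. It assumes the 50 posts are independent, so \(y\) is binomial with \(n=50\) and \(p=1/2\), which gives \(P(y=25)=\binom{50}{25}2^{-50}\). The block evaluates this expression with the standard-library math.comb and cross-checks it by Monte Carlo with numpy. The trial count and seed are arbitrary choices.

```python
import math
import numpy as np

n, p, k = 50, 0.5, 25  # number of posts, approval probability, target count

# Exact binomial pmf: P(y = k) = C(n, k) * p^k * (1 - p)^(n - k)
p_exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Monte Carlo check: simulate many batches of n independent Bernoulli(p) posts
rng = np.random.default_rng(0)
num_trials = 200_000
x = rng.random((num_trials, n)) < p   # True marks an approving post
y = x.sum(axis=1)                     # y = x_1 + ... + x_n for each batch
p_mc = np.mean(y == k)                # fraction of batches with exactly k approvals

print(f"exact       P(y = {k}) = {p_exact:.4f}")
print(f"Monte Carlo P(y = {k}) = {p_mc:.4f}")
```

The two printed values should agree to within Monte Carlo noise, which is roughly a few thousandths for this number of trials.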
Question 1b (10 points)
Following your findings in Q1a, read about the Central Limit Theorem and recognize that the standardized sum
\[ z = \frac{y - np}{\sqrt{np(1-p)}} \]
is approximately normally distributed with mean 0 and variance 1 when \(n\) is large.
Can you find the probability that 25 employees approve the CEO using the Gaussian approximation?
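This sketch of the Gaussian approximation assumes a continuity correction, which the question does not specify. The integer 25 is treated as the interval \([24.5, 25.5]\). The sum \(y\) has mean \(np\) and standard deviation \(\sqrt{np(1-p)}\). The standard normal CDF \(\Phi\) is written with the standard-library math.erf, because numpy has no erf function of its own.

```python
import math

n, p, k = 50, 0.5, 25
mu = n * p                          # mean of y
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of y

def std_normal_cdf(z):
    """Phi(z) for the standard normal, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Continuity correction (an assumption): P(y = k) ~ P(k - 0.5 < Y < k + 0.5), Y Gaussian
z_lo = (k - 0.5 - mu) / sigma
z_hi = (k + 0.5 - mu) / sigma
p_gauss = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)

print(f"Gaussian approximation of P(y = {k}): {p_gauss:.4f}")
```

Dropping the continuity correction would require approximating the point probability with the density, \(\phi(0)/\sigma\), instead.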
Type the answer here using LaTeX syntax. Alternatively, handwrite the answer, upload the picture to the same folder, and display it in a new markdown cell with the markdown syntax ![title](image_name.png)
Question 2 (20 points)
A sequential experiment involves repeatedly drawing a ball from one of two urns, noting the number on the ball, and returning the ball to the urn. Urn 0 contains one ball with the number 0 and two balls with the number 1. Urn 1 contains five balls with the number 0 and one ball with the number 1.
The urn for the first draw is selected by flipping a fair coin: urn 0 is used if the outcome is H, and urn 1 is used if the outcome is T. The urn used in each subsequent draw corresponds to the number on the ball drawn in the previous draw.
What is the probability that the numbers on the first four drawn balls form the sequence 0011?
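This sketch checks a Q2 result. The draws form a Markov chain in which the next urn is the number just drawn. The probability of a sequence is therefore the probability of the first draw, averaged over the coin flip, multiplied by the transition probabilities P[previous][current]. The block computes this exactly with the standard-library fractions.Fraction and cross-checks it with a numpy simulation. The name sequence_prob is a hypothetical helper.

```python
from fractions import Fraction
import numpy as np

# P[u][b] = probability of drawing a ball numbered b from urn u
P = [[Fraction(1, 3), Fraction(2, 3)],   # urn 0: one ball "0", two balls "1"
     [Fraction(5, 6), Fraction(1, 6)]]   # urn 1: five balls "0", one ball "1"
coin = [Fraction(1, 2), Fraction(1, 2)]  # fair coin: H -> urn 0, T -> urn 1

def sequence_prob(seq):
    """Exact probability of observing the ball numbers in seq (hypothetical helper)."""
    # First draw: average over the coin flip that selects the urn
    prob = sum(coin[u] * P[u][seq[0]] for u in (0, 1))
    # Later draws: the urn is the number on the previous ball
    for prev, cur in zip(seq[:-1], seq[1:]):
        prob *= P[prev][cur]
    return prob

exact = sequence_prob([0, 0, 1, 1])

# Monte Carlo cross-check with numpy
rng = np.random.default_rng(0)
p_one = np.array([2 / 3, 1 / 6])           # P(ball numbered 1 | urn u)
num_trials = 400_000
urn = rng.integers(0, 2, size=num_trials)  # coin flip: 0 (H) or 1 (T)
draws = np.empty((num_trials, 4), dtype=int)
for t in range(4):
    ball = (rng.random(num_trials) < p_one[urn]).astype(int)
    draws[:, t] = ball
    urn = ball                              # next urn = number on this ball
mc = np.mean(np.all(draws == np.array([0, 0, 1, 1]), axis=1))

print("exact:", exact, "=", float(exact))
print("Monte Carlo:", mc)
```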
Type the answer here using LaTeX syntax. Alternatively, handwrite the answer, upload the picture to the same folder, and display it in a new markdown cell with the markdown syntax ![title](image_name.png)
Question 3 (20 points)
Referring to Example 6.6 of the Math for ML book, simulate and plot the bivariate normal distribution with the parameters shown there, using the Cholesky factorization to generate the samples.
# Type the Python code here and ensure you save the notebook with the results of the code execution.
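The Cholesky approach works as follows. Factor \(\Sigma = LL^\top\) with np.linalg.cholesky and draw \(z \sim \mathcal{N}(0, I)\). Then map each draw to \(x = \mu + Lz\). The result has mean \(\mu\) and covariance \(LL^\top = \Sigma\). The mean and covariance below are placeholders, not the values from Example 6.6. Substitute the book's parameters.

```python
import numpy as np

# Placeholder parameters (assumption): replace with the mean and covariance of Example 6.6
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

# Cholesky factor: Sigma = L @ L.T with L lower triangular
L = np.linalg.cholesky(Sigma)

rng = np.random.default_rng(0)
num_samples = 100_000
z = rng.standard_normal((num_samples, 2))  # i.i.d. N(0, I) draws, one per row
samples = mu + z @ L.T                     # each row is x = mu + L z ~ N(mu, Sigma)

print("sample mean:", samples.mean(axis=0))
print("sample covariance:\n", np.cov(samples, rowvar=False))

# Empirical density on a 60 x 60 grid, usable for an image or contour plot
density, x_edges, y_edges = np.histogram2d(samples[:, 0], samples[:, 1], bins=60, density=True)
```

For the figure itself, matplotlib is a common choice, for example plt.hist2d(samples[:, 0], samples[:, 1], bins=60). The sample covariance printed above should be close to Sigma.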
Question 4 (20 points)
Go through the provided links on the Poisson and exponential distributions, as the Math for ML textbook on your course site does not cover these important distributions in enough depth.
Watch this video, https://www.youtube.com/watch?v=Asto3RS46ks, in which the author explains how to simulate a Poisson distribution from scratch.
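The video's exact method is not reproduced here. The sketch below uses inverse-transform sampling as one from-scratch alternative. It builds the Poisson CDF on a truncated support and maps uniform draws through that CDF with np.searchsorted. The computation works in log space, with \(\log k!\) as a cumulative sum of logs, because \(e^{-\lambda}\) underflows to zero in float64 once \(\lambda\) exceeds roughly 745. The truncation point \(\lambda + 10\sqrt{\lambda} + 10\) is an assumption chosen to leave negligible tail mass. The name sample_poisson is a hypothetical helper.

```python
import numpy as np

def sample_poisson(lam, size, rng):
    """Inverse-transform sampling of Poisson(lam), lam > 0, using only numpy (hypothetical helper)."""
    # Truncate the support far into the right tail: mean + 10 std + 10
    k_max = int(lam + 10 * np.sqrt(lam) + 10)
    k = np.arange(k_max + 1)
    # log k! as a cumulative sum of logs, with log 0! = 0
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, k_max + 1)))))
    # Work in log space so exp(-lam) cannot underflow for large lam
    log_pmf = -lam + k * np.log(lam) - log_fact
    cdf = np.cumsum(np.exp(log_pmf))
    u = rng.random(size)
    # Smallest k with cdf[k] >= u; the clip guards against the truncated tail
    return np.minimum(np.searchsorted(cdf, u), k_max)

rng = np.random.default_rng(0)
draws = sample_poisson(4.0, 10, rng)
print(draws)
```

For a Poisson variable, the sample mean and sample variance of many draws should both be close to lam. This gives a quick way to check any from-scratch sampler.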
Using the Kaggle API, download this dataset and plot a histogram of the number of cyclists who cross the Brooklyn Bridge per day.
Simulate the number of cyclists who cross the Brooklyn Bridge per day using the Poisson distribution. Ensure that the distribution of the simulated counts is similar to that of the observed counts.
# Type the Python code here and ensure you save the notebook with the results of the code execution.
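This sketch covers one way to fit and compare the counts. It assumes the daily counts have already been loaded into a 1-D array. The column name in the Kaggle CSV is not given here, so check it before loading. The maximum-likelihood estimate of the Poisson rate is the sample mean. The block simulates the same number of days with numpy's Generator.poisson; a from-scratch sampler could replace it. It then histograms both arrays on shared bins. The name fit_and_simulate_poisson is a hypothetical helper.

```python
import numpy as np

def fit_and_simulate_poisson(observed_counts, rng, bins=30):
    """Fit a Poisson rate by maximum likelihood and simulate counts to compare (hypothetical helper)."""
    observed = np.asarray(observed_counts, dtype=float)
    lam_hat = observed.mean()                          # MLE of the Poisson rate
    simulated = rng.poisson(lam_hat, size=observed.size)
    # Shared bin edges so the two histograms can be compared bin by bin
    edges = np.histogram_bin_edges(np.concatenate([observed, simulated]), bins=bins)
    obs_hist, _ = np.histogram(observed, bins=edges)
    sim_hist, _ = np.histogram(simulated, bins=edges)
    # A Poisson model implies variance == mean; a ratio well above 1 signals overdispersion
    dispersion = observed.var() / lam_hat
    return lam_hat, simulated, edges, obs_hist, sim_hist, dispersion
```

Plotting plt.hist(observed, bins=edges, alpha=0.5) and plt.hist(simulated, bins=edges, alpha=0.5) on the same axes shows the comparison. If the dispersion ratio is much larger than 1, a single Poisson with the fitted rate will look too narrow next to the observed histogram.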
Question 5 (20 points)
You are asked to stress test a cloud API endpoint and are told that the API exposes a database server that can be abstracted as an M/M/1 queue. Go through this introductory page to understand the queuing domain and the M/M/1 notation. Also go through the elements of the M/M/1 queue here. Make sure you click on the links and learn about the random process called the Poisson process.
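The question points to the Poisson process, so here is a short sketch of how one is commonly simulated. The gaps between arrivals of a rate-\(\lambda\) Poisson process are independent exponential variables with mean \(1/\lambda\). Cumulative sums of exponential draws therefore give the arrival times, and the number of arrivals in \([0, T]\) is Poisson with mean \(\lambda T\). The helper name and the initial batch size are illustrative choices.

```python
import numpy as np

def poisson_process_arrivals(lam, t_end, rng):
    """Arrival times of a rate-lam Poisson process on [0, t_end] (hypothetical helper)."""
    # Start with comfortably more gaps than the expected lam * t_end arrivals
    n0 = int(lam * t_end + 10 * np.sqrt(lam * t_end) + 10)
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=n0))
    # In the rare case the gaps fall short of t_end, keep extending the sequence
    while arrivals[-1] < t_end:
        more = arrivals[-1] + np.cumsum(rng.exponential(1.0 / lam, size=n0))
        arrivals = np.concatenate([arrivals, more])
    return arrivals[arrivals <= t_end]

rng = np.random.default_rng(0)
arrivals = poisson_process_arrivals(3.0, 10.0, rng)
print(len(arrivals), "arrivals in [0, 10]")
```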
Your task is to simulate the behavior of the queue and plot the number of requests waiting in the queue as a function of time. You are given three arrival rates for the API requests, \(\lambda \in \{1, 3, 4\}\). The service time of each request is an exponential random variable with rate \(\mu=4\).
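The standard steady-state M/M/1 results are useful reference values when reading the plots. They are valid only when the utilization \(\rho = \lambda/\mu\) is below 1. They give the mean number of requests waiting in the queue, \(L_q\), which excludes the request in service. With \(\mu = 4\):

```latex
\rho = \frac{\lambda}{\mu}, \qquad L_q = \frac{\rho^2}{1-\rho} \quad (\rho < 1)

\begin{aligned}
\lambda = 1:&\quad \rho = \tfrac{1}{4}, \quad L_q = \frac{1/16}{3/4} = \tfrac{1}{12} \approx 0.083 \\
\lambda = 3:&\quad \rho = \tfrac{3}{4}, \quad L_q = \frac{9/16}{1/4} = \tfrac{9}{4} = 2.25 \\
\lambda = 4:&\quad \rho = 1, \quad \text{no steady state}
\end{aligned}
```

For \(\lambda = 4\), \(\rho = 1\). The queue has no stationary distribution, so its simulated length should not settle around a fixed level.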
# Type the Python code here and ensure you save the notebook with the results of the code execution.
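This is a minimal event-driven sketch of the queue, assuming it starts empty at \(t = 0\). Interarrival and service times are both exponential. From a state with \(n\) requests, the time to the next event is therefore exponential with rate \(\lambda + \mu\), or \(\lambda\) when \(n = 0\). That event is an arrival with probability \(\lambda/(\lambda + \mu)\). The helpers simulate_mm1 and time_average are hypothetical names, and t_end = 2000 is an arbitrary horizon.

```python
import numpy as np

def simulate_mm1(lam, mu, t_end, rng):
    """Event-driven M/M/1 simulation, assuming an empty queue at t = 0 (hypothetical helper).

    Returns event times, the number in the system after each event, and the
    number waiting (in the system minus the one being served).
    """
    t, n = 0.0, 0
    times, sizes = [0.0], [0]
    while True:
        # From state n, arrivals occur at rate lam and departures at rate mu (only if n > 0)
        rate = lam + (mu if n > 0 else 0.0)
        t += rng.exponential(1.0 / rate)   # competing exponentials: time to next event
        if t > t_end:
            break
        if rng.random() < lam / rate:      # the event is an arrival w.p. lam / rate
            n += 1
        else:                              # otherwise a service completion
            n -= 1
        times.append(t)
        sizes.append(n)
    times, sizes = np.array(times), np.array(sizes)
    waiting = np.maximum(sizes - 1, 0)     # exclude the request currently in service
    return times, sizes, waiting

def time_average(times, values, t_end):
    """Time-weighted mean of a piecewise-constant path sampled at event times."""
    durations = np.diff(np.append(times, t_end))
    return np.sum(values * durations) / t_end

rng = np.random.default_rng(0)
mu, t_end = 4.0, 2_000.0
results = {}
for lam in [1.0, 3.0, 4.0]:
    times, sizes, waiting = simulate_mm1(lam, mu, t_end, rng)
    results[lam] = (times, waiting)
    print(f"lambda = {lam}: time-averaged waiting requests = {time_average(times, waiting, t_end):.3f}")
```

The path is piecewise constant, so a step plot fits it, for example plt.step(times, waiting, where="post") in matplotlib for each entry of results. For \(\lambda = 1\) and \(\lambda = 3\), the time-averaged waiting count can be compared with the steady-state value \(\rho^2/(1-\rho)\) as a sanity check.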