In this assignment you will set up your system and refresh basic probability theory and linear algebra concepts. You must use the PyTorch namespace libraries such as torch.linalg and torch.rand, and in general anything in the torch.xyz namespace, but no derived or other libraries (plotting libraries such as plotly or matplotlib are allowed). The idea is to implement the following from scratch, without implementing every minute component such as random number generators.
Points:
Dev Environment : 20 points
Information Theory: 20 points
MLE : 30 points
Linear Regression : 30 points
General Instructions
We need to see a clear explanation of every stage of developing the solutions, which means that you need to explain the code so that others can follow the writeup, as in a tutorial. Obviously this does not apply to mechanical tasks such as setting up your development environment.
Type your tutorial explanations inline with your notebook code. Equations can be typed in markdown using LaTeX notation. If you prefer plain Python you can include the markdown as a separate file, but you must ensure that all plots are inline in that markdown document and are rendered correctly by GitHub.
Submit in the learning management system of your school (Canvas, Brightspace etc.) following the instructions in your course site (under resources). You submit the URL of a private GitHub repo whose README points to the notebook of each assignment. Each notebook must have its output saved.
Students who haven't changed their nickname in the Discord Back2classroom server to “Firstname Lastname” will get a 10 point penalty. So that your assignment can be graded, include a screenshot of your Discord profile page in the README of your repo.
Development Environment Setup
Ubuntu and Mac users
Install Docker on your system and the VSCode Docker and remote extensions.
Ensure that you also follow this tutorial to set up VSCode properly, i.e. VSCode can access the WSL2 filesystem and work with the remote Docker containers.
If you have an NVIDIA GPU in your system, ensure you have enabled it.
Clone the repo (Windows users: ensure that you clone it on the WSL2 filesystem). Show this with a screenshot below of the terminal where you have cloned the repo.
Build and launch the Docker container inside your desired IDE (if you haven't used an IDE before you can start with VSCode).
Launch the virtual environment with rye sync inside the container and then show a screenshot of your IDE and the terminal with the (your virtual env) prefix.
Select the kernel of your virtual environment (.venv folder) and execute the following code. Save the output of all cells of this notebook before submitting.
Information theory was introduced by Claude Shannon in 1948. It is a mathematical theory that deals with the transmission, processing, utilization, and extraction of information. It has given rise to a wide range of applications, including data compression, cryptography and error correction, and has fueled fields such as AI and cellular communications.
Using this reference, which you need to study before answering the following question, let (x, y) have the following joint distribution:
The exponential distribution is a probability distribution that describes time between events in a Poisson process. There is a strong relationship between the Poisson distribution and the Exponential distribution.
Suppose you model the number of API calls (arrivals) per second towards an LLM inference server. Arrivals per second follow a Poisson distribution with arrival rate 100, which means that 100 API calls are made per second on average. The expected inter-arrival time is 0.01 seconds, because an API call can be expected every 1/100 = 0.01 seconds. The inter-arrival times are modeled by the exponential distribution. The units for the Poisson process are API calls and the units for the exponential are seconds.
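The relationship between the two distributions can be checked numerically. The following is a minimal sketch, not a reference solution: it draws exponential inter-arrival times with rate 100, accumulates them into arrival timestamps, and counts arrivals per one-second window. The sample size `n_events` and the variable names are assumptions made for this illustration.

```python
import torch

torch.manual_seed(0)

rate = 100.0          # arrival rate from the text (API calls per second)
n_events = 200_000    # hypothetical sample size chosen for this sketch

# Inter-arrival times (seconds) are exponential with mean 1 / rate.
inter_arrivals = torch.distributions.Exponential(rate).sample((n_events,)).double()

# Arrival timestamps are the running sum of the inter-arrival times.
arrival_times = torch.cumsum(inter_arrivals, dim=0)

# Count the arrivals that fall in each complete one-second window.
n_seconds = int(arrival_times[-1].item())
window_index = torch.floor(arrival_times[arrival_times < n_seconds]).long()
counts = torch.bincount(window_index, minlength=n_seconds).double()

mean_gap = inter_arrivals.mean().item()    # should be close to 1 / rate
mean_count = counts.mean().item()          # should be close to rate
var_count = counts.var().item()            # Poisson: variance close to the mean

print(f"mean inter-arrival time: {mean_gap:.5f} s")
print(f"mean arrivals per second: {mean_count:.2f}, variance: {var_count:.2f}")
```

The per-second counts should have a mean and a variance that are both close to the rate, which is the signature of a Poisson distribution.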
Videos that may help you understand these distributions:
Task 1
Simulate the inter-arrival times using an exponential distribution with rate parameter 𝜆=100.
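One possible way to approach the simulation, sketched here under the assumption that only `torch.rand` is used as the source of randomness, is inverse-transform sampling: if u is uniform on [0, 1), then -ln(1 - u) / 𝜆 is exponentially distributed with rate 𝜆. The helper name `sample_exponential` and the sample size are hypothetical.

```python
import torch

torch.manual_seed(0)


def sample_exponential(rate: float, n: int) -> torch.Tensor:
    """Draw n exponential samples by inverse-transform sampling."""
    u = torch.rand(n, dtype=torch.float64)   # uniform on [0, 1)
    return -torch.log1p(-u) / rate           # inverse CDF: -ln(1 - u) / rate


lam = 100.0
x = sample_exponential(lam, 10_000)          # hypothetical sample size

# For an exponential distribution both the mean and the standard
# deviation equal 1 / rate.
print(f"sample mean: {x.mean().item():.5f}, sample std: {x.std().item():.5f}")
```

A histogram of `x` plotted with matplotlib or plotly can then be compared against the density 𝜆·exp(-𝜆x).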
Task 2
Use the stochastic gradient descent (SGD) algorithm to minimize the negative log-likelihood of the exponential distribution. Output: (a) the estimated parameter after a number of iterations of your choice and (b) a plot of the value of the objective function over the iteration index.
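As a rough illustration of the idea, the sketch below runs mini-batch SGD on the average negative log-likelihood, which for one sample x is -log(𝜆) + 𝜆x. It is one possible setup, not the required one: optimizing θ = log(𝜆) instead of 𝜆 is a design choice of this sketch that keeps 𝜆 positive, and the learning rate, batch size and iteration count are hypothetical values.

```python
import torch

torch.manual_seed(0)

true_rate = 100.0
x = torch.distributions.Exponential(true_rate).sample((10_000,))

# Assumption of this sketch: optimize theta = log(lambda) so lambda > 0.
theta = torch.zeros((), requires_grad=True)
lr, batch_size, n_iters = 0.05, 64, 2_000   # hypothetical hyperparameters
nll_history = []

for _ in range(n_iters):
    # Draw a random mini-batch of inter-arrival times.
    idx = torch.randint(0, x.numel(), (batch_size,))
    batch = x[idx]

    # Average NLL of Exp(lambda) on the batch: -log(lambda) + lambda * x.
    loss = (-theta + torch.exp(theta) * batch).mean()
    loss.backward()

    # Plain SGD update, written out by hand.
    with torch.no_grad():
        theta -= lr * theta.grad
        theta.grad.zero_()

    nll_history.append(loss.item())

lam_hat = torch.exp(theta).item()
print(f"estimated rate: {lam_hat:.2f}")
```

The list `nll_history` holds the mini-batch objective at every iteration, so plotting it against its index gives the requested objective-versus-iteration curve.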
In class we covered the baseline stochastic gradient descent. Using the linear regression dataset from the class website, develop from scratch the baseline SGD algorithm that can estimate the L2-norm regularized model.
Clearly state the hyper-parameters you used and present the loss vs epoch plot that demonstrates the convergence of the algorithm and the final values of the parameters \(\mathbf w\) of the model.
You can generate the dataset with any number of examples \(m\) you need to demonstrate that the algorithm works.
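A minimal sketch of mini-batch SGD for an L2-regularized linear model follows. It does not use the class dataset: the synthetic data, the true weights, and all hyperparameters are assumptions made for illustration. For simplicity the penalty is applied to every component of w, including the intercept.

```python
import torch

torch.manual_seed(0)

# Hypothetical synthetic data: y = 2 - 3 x + Gaussian noise.
m = 200
x = torch.rand(m, 1) * 2 - 1
X = torch.cat([torch.ones(m, 1), x], dim=1)   # column of ones for the intercept
w_true = torch.tensor([[2.0], [-3.0]])
y = X @ w_true + 0.1 * torch.randn(m, 1)

# Hypothetical hyperparameters.
lr, lam, batch_size, n_epochs = 0.05, 1e-3, 16, 100

w = torch.zeros(2, 1)
loss_history = []


def objective(X, y, w, lam):
    """Mean squared error plus the L2 penalty lam * ||w||^2."""
    residual = X @ w - y
    return (residual ** 2).mean() + lam * (w ** 2).sum()


for epoch in range(n_epochs):
    perm = torch.randperm(m)                  # reshuffle the data every epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of the mini-batch objective with respect to w.
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx) + 2.0 * lam * w
        w -= lr * grad
    loss_history.append(objective(X, y, w, lam).item())

print("estimated w:", w.flatten().tolist())
print(f"final loss: {loss_history[-1]:.4f}")
```

Here `loss_history` stores the full-data objective after each epoch, which is the quantity to plot for the loss vs epoch curve.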