In this assignment you will extend maximum-likelihood (ML) estimation to probability distributions that can be modeled by Gaussian mixtures.
You are required to use numpy and the PyTorch namespace libraries such as torch.linalg and torch.rand, and in general libraries in the torch.* namespace. The idea is to implement the following from scratch without reimplementing every minute component such as random number generators, plotting, etc.
If you are familiar with Keras and not PyTorch, the same rule applies.
Points:
Development environment: 10 points
Code is commented throughout, either inline or via markdown cells: 10 points
Dataset: 20 points
Derivation: 20 points
SGD: 40 points
Development Environment Setup
Ubuntu and macOS users
Install Docker on your system and the VSCode Docker and Remote Development extensions.
Ensure that you also follow this tutorial to set up VSCode properly, i.e., so that VSCode can access the WSL2 filesystem and work with remote Docker containers.
If you have an NVIDIA GPU in your system, ensure it is enabled for use inside the container.
Clone the repo (Windows users: ensure that you clone it on the WSL2 filesystem). Show this with a screenshot below of the terminal where you have cloned the repo.
Build and launch the docker container inside your desired IDE (if you haven't used an IDE before, you can start with VSCode).
Launch the virtual environment with `make start` inside the container and then show a screenshot of your IDE and the terminal with the `(your virtual env)` prefix.
Select the kernel of your virtual environment (.venv folder) and execute the following code. Save the output of all cells of this notebook before submitting.
Dataset
Develop a toy dataset of \(m=1000\) sample points from a Mixture of Gaussians (MoG):
Feature dimensions: 2
Number of Gaussian components: 3
Means: random.
Covariance matrices: diagonal.
Create visualizations for the dataset; one way to generate and plot such data is sketched below.
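As a reference for what this could look like, here is a minimal numpy/matplotlib sketch; the seed, the Dirichlet mixing weights, and the ranges for the means and scales are arbitrary illustrative choices, not part of the spec:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)               # arbitrary seed for reproducibility
m, d, K = 1000, 2, 3                         # samples, feature dimensions, components

pis = rng.dirichlet(np.ones(K))              # random mixing weights summing to 1
mus = rng.uniform(-5, 5, size=(K, d))        # random component means
scales = rng.uniform(0.3, 1.0, size=(K, d))  # per-dimension std devs (diagonal covariances)

z = rng.choice(K, size=m, p=pis)             # latent component of each sample
X = mus[z] + scales[z] * rng.standard_normal((m, d))  # draw from the assigned Gaussian

plt.scatter(X[:, 0], X[:, 1], c=z, s=8)
plt.title("Toy MoG dataset (colored by true component)")
plt.show()
```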
Gradient Formulas
We consider a 2-component Mixture of Gaussians (MoG) with 1-dimensional data. Show that the gradients you need for solving the estimation problem are as follows. Use LaTeX math in markdown format for this task.
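For reference, a standard form that your derivation should reproduce, with parameters \(\theta = (\pi_1, \mu_1, \mu_2, \sigma_1, \sigma_2)\) and \(\pi_2 = 1 - \pi_1\), starts from the NLL

$$
\mathcal{L}(\theta) = -\sum_{i=1}^{m} \log\!\left(\pi_1\,\mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + \pi_2\,\mathcal{N}(x_i \mid \mu_2, \sigma_2^2)\right),
$$

whose gradients, written in terms of the responsibilities \(\gamma_{ik}\) defined at the end of this section, are

$$
\frac{\partial \mathcal{L}}{\partial \mu_k} = -\sum_{i=1}^{m} \gamma_{ik}\,\frac{x_i - \mu_k}{\sigma_k^2},
\qquad
\frac{\partial \mathcal{L}}{\partial \sigma_k} = -\sum_{i=1}^{m} \gamma_{ik}\left(\frac{(x_i - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k}\right).
$$

With the re-parameterization \(s_k = \log \sigma_k\) (see the notes below), the chain rule gives \(\partial \mathcal{L}/\partial s_k = \sigma_k\,\partial \mathcal{L}/\partial \sigma_k\).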
SGD
Implement Stochastic Gradient Descent (SGD) from scratch with the Negative Log-Likelihood (NLL) objective and analytic derivatives to optimize the parameters.
Notes:
Initialize covariance matrices as diagonal.
For a mini-batch \(B\), provide expressions for the gradients below.
Re-parameterize the scale parameter as \(\log \sigma\) to keep optimization stable and avoid invalid (non-positive) variances while applying SGD.
Provide and use the following in your write-up/code:
Responsibility function
The responsibility function for component \(k\) and data point \(i\):
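The standard definition (your notation may differ) is

$$
\gamma_{ik} = \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{2} \pi_j\,\mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}.
$$

To illustrate the overall shape of the trainer, here is a minimal numpy sketch for the 2-component, 1-D case, assuming the \(\log \sigma\) re-parameterization and the gradient expressions above. It holds the mixing weights fixed (updating \(\pi_k\) under the simplex constraint is part of your derivation), and the learning rate, batch size, and initialization are arbitrary illustrative choices:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # 1-D Gaussian density N(x | mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sgd_mog(x, pi=(0.5, 0.5), lr=0.05, epochs=100, batch_size=32, seed=0):
    """Fit mu_k and s_k = log(sigma_k) of a 2-component 1-D MoG by mini-batch SGD."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi)
    mu = rng.standard_normal(2)   # random initialization of the means
    s = np.zeros(2)               # log sigma = 0  ->  sigma = 1
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(len(x)), max(1, len(x) // batch_size)):
            xb = x[idx][:, None]                       # mini-batch, shape (B, 1)
            sigma = np.exp(s)
            p = pi * normal_pdf(xb, mu, sigma)         # weighted densities, shape (B, 2)
            gamma = p / p.sum(axis=1, keepdims=True)   # responsibilities gamma_{ik}
            # Analytic NLL gradients, averaged over the mini-batch:
            g_mu = -(gamma * (xb - mu) / sigma**2).mean(axis=0)
            g_s = -(gamma * (((xb - mu) / sigma) ** 2 - 1)).mean(axis=0)
            mu -= lr * g_mu
            s -= lr * g_s
    return mu, np.exp(s)
```

For example, on a 1-D sample `x` drawn from two Gaussians, `mu_hat, sigma_hat = sgd_mog(x)` should recover the component means and scales up to label permutation.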