# Insert your answer here and feel free to add markdown cells as needed
Projections, Linear Regression and SGD
For the exercises below, you can use the numpy and scipy libraries.
Problem 1: Simulation (20 points)
Review any of the probability theory links provided on your course site. This exercise refers to Example 6.6 of the Math for ML book.
Problem 1A (15 points)
Simulate (sample from) the bivariate normal distribution with the given parameters, obtaining a plot similar to Figure 6.8b, which shows the simulation result for a different bivariate Gaussian distribution. You can generate \(m = 200\) samples/points. (10 points)
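As a starting point, here is a minimal sampling sketch using numpy's `Generator.multivariate_normal`; the `mu` and `Sigma` values below are placeholders, so substitute the actual parameters from Example 6.6.

```python
import numpy as np

# Placeholder parameters -- replace with the mean and covariance
# from Example 6.6 of the Math for ML book.
mu = np.array([0.0, 2.0])
Sigma = np.array([[0.3, -1.0],
                  [-1.0, 5.0]])

m = 200                             # number of samples required
rng = np.random.default_rng(0)      # seeded for reproducibility
samples = rng.multivariate_normal(mu, Sigma, size=m)  # shape (m, 2)
```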
Problem 1B (5 points)
Plot the contours of the bivariate Gaussian distribution and the simulated points in the same plot. (5 points)
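A sketch of the contour-plus-scatter plot, reusing `mu`, `Sigma`, and `samples` from the sketch above and evaluating the density with `scipy.stats.multivariate_normal`; the grid limits are arbitrary and should be adjusted to cover the samples.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Evaluate the density on a grid covering the sampled points.
x, y = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-6, 10, 200))
grid = np.dstack((x, y))                      # shape (200, 200, 2)
pdf = multivariate_normal(mu, Sigma).pdf(grid)

plt.contour(x, y, pdf, levels=10)             # density contours
plt.scatter(samples[:, 0], samples[:, 1], s=8, alpha=0.5)  # simulated points
plt.xlabel("$x_1$"); plt.ylabel("$x_2$")
plt.show()
```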
Problem 2: Projection (20 points)
You may want to review the linear algebra videos or the other linear algebra links provided on your course site.
Simulate a 3-dimensional (3D) Gaussian random vector with the following covariance matrix by sampling \(m = 1000\) 3D vectors from this distribution.
\[ \begin{bmatrix} 4 & 2 & 1 \\ 2 & 3 & 1.5 \\ 1 & 1.5 & 2 \\ \end{bmatrix} \]
Using the Singular Value Decomposition (SVD) of the covariance matrix, compute the projection of the \(m\) simulated vectors onto the subspace spanned by the first two principal components (equivalently, the first two left singular vectors of the covariance matrix).
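A minimal sketch of the simulation and projection, assuming a zero mean (the problem specifies only the covariance). Since the covariance matrix is symmetric positive definite, its left singular vectors coincide with its eigenvectors, i.e., the principal components.

```python
import numpy as np

Sigma3 = np.array([[4.0, 2.0, 1.0],
                   [2.0, 3.0, 1.5],
                   [1.0, 1.5, 2.0]])
m = 1000
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), Sigma3, size=m)  # (m, 3), zero mean assumed

# SVD of the covariance matrix; for a symmetric positive-definite matrix
# the left singular vectors are the eigenvectors (principal components).
U, S, Vt = np.linalg.svd(Sigma3)
W = U[:, :2]          # first two principal components, shape (3, 2)
Z = X @ W             # coordinates in the 2D subspace, shape (m, 2)
```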
Problem 2A (5 points)
What determines the principal components? Show the vectors that denote the first two principal components.
Problem 2B (5 points)
Plot the projected vectors in the subspace of the first two principal components.
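A plotting sketch, reusing `Z` from the projection sketch above:

```python
import matplotlib.pyplot as plt

# Scatter of the coordinates in the PC1/PC2 subspace.
plt.scatter(Z[:, 0], Z[:, 1], s=8, alpha=0.5)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.axis("equal")
plt.show()
```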
Problem 2C (10 points)
Reverse the projection to map back to the original 3D space and create a scatter plot of the reconstructed points. Are the correlations between components of the reconstructed points identical to, similar (but not identical) to, or different from those implied by the original covariance matrix?
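A sketch of the reconstruction and the correlation comparison, reusing `X`, `Z`, and `W` from above; `np.corrcoef` gives the component-wise correlations to compare against those implied by the original covariance matrix.

```python
import numpy as np
import matplotlib.pyplot as plt

# Map the 2D coordinates back into the original 3D space (rank-2 reconstruction).
X_rec = Z @ W.T                                # shape (m, 3)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(X_rec[:, 0], X_rec[:, 1], X_rec[:, 2], s=5, alpha=0.4)
plt.show()

# Component-wise correlations: original samples vs. reconstruction.
print(np.corrcoef(X, rowvar=False))
print(np.corrcoef(X_rec, rowvar=False))
```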
# Insert your answer here and feel free to add markdown cells as needed
Problem 3: Stochastic Gradient Descent (30 points)
In class we covered the baseline stochastic gradient descent algorithm. Using the linear regression example from the class notes, develop the baseline SGD algorithm from scratch.
Clearly state the hyperparameters you used and present the loss vs. epoch plot that demonstrates the convergence of the algorithm.
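A minimal baseline-SGD sketch on synthetic data (a stand-in for the class-notes example, which is not reproduced here); the learning rate and epoch count are assumed values, not prescribed ones.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic data as a stand-in for the class-notes regression example.
n, d = 200, 2
X = rng.normal(size=(n, d))
true_w, true_b = np.array([2.0, -3.0]), 0.5
y = X @ true_w + true_b + 0.1 * rng.normal(size=n)

# Assumed hyperparameters -- tune as needed.
lr, epochs = 0.01, 100

w, b = np.zeros(d), 0.0
losses = []
for epoch in range(epochs):
    for i in rng.permutation(n):          # one random sample per update
        err = X[i] @ w + b - y[i]         # residual for sample i
        w -= lr * err * X[i]              # gradient of 0.5 * err**2 w.r.t. w
        b -= lr * err                     # gradient w.r.t. b
    losses.append(np.mean((X @ w + b - y) ** 2))  # full-data MSE per epoch

plt.plot(losses)
plt.xlabel("epoch"); plt.ylabel("MSE loss")
plt.show()
```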
# Insert your answer here and feel free to add markdown cells as needed
Problem 4: SGD Enhancements (30 points)
In this exercise you will implement enhancements to your Problem 3 implementation (the linear regression problem) that can improve the convergence speed of the algorithm. Implement the following enhancements from scratch and compare the convergence speed of each to the baseline SGD algorithm:
- Momentum (15 points)
- Adam (15 points)
Clearly state the hyperparameters you used and present the loss vs. epoch plot that demonstrates the convergence of each algorithm compared to the baseline SGD algorithm. You can include all plots in the same figure.
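A sketch of the two update rules, written as drop-in replacements for the plain gradient step in the Problem 3 loop; the default hyperparameter values below are common choices (e.g., the Adam paper's defaults), not prescribed ones.

```python
import numpy as np

def sgd_momentum_step(grad, params, state, lr=0.01, beta=0.9):
    # Momentum: accumulate an exponentially decaying sum of past
    # gradients and step along that velocity instead of the raw gradient.
    state["v"] = beta * state.get("v", 0.0) + grad
    return params - lr * state["v"]

def adam_step(grad, params, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected first- and second-moment estimates of the
    # gradient, with a per-coordinate adaptive step size.
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["s"] = beta2 * state.get("s", 0.0) + (1 - beta2) * grad**2
    m_hat = state["m"] / (1 - beta1**t)        # bias-corrected first moment
    s_hat = state["s"] / (1 - beta2**t)        # bias-corrected second moment
    return params - lr * m_hat / (np.sqrt(s_hat) + eps)
```

Each optimizer keeps its own `state` dict across updates; running all three variants over the same data and recording the per-epoch loss, as in Problem 3, yields the comparison plot.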
# Insert your answer here and feel free to add markdown cells as needed