Optimization algorithms for linear regression

Task 1: Stochastic Gradient Descent

In class we covered baseline stochastic gradient descent and used it to solve a regularized linear regression problem. For the same dataset, develop the baseline SGD algorithm from scratch using only torch.xyz namespaced libraries, and show explicitly the equations for calculating the gradients and updating the weights. You can typeset the equations in your notebook in LaTeX.
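For concreteness, with a per-sample L2-regularized squared loss the gradient and update equations take the form

$$\ell_i(\mathbf{w}, b) = \big(\mathbf{w}^\top \mathbf{x}_i + b - y_i\big)^2 + \lambda \lVert \mathbf{w} \rVert_2^2,$$

$$\nabla_{\mathbf{w}} \ell_i = 2\big(\mathbf{w}^\top \mathbf{x}_i + b - y_i\big)\,\mathbf{x}_i + 2\lambda \mathbf{w}, \qquad \frac{\partial \ell_i}{\partial b} = 2\big(\mathbf{w}^\top \mathbf{x}_i + b - y_i\big),$$

$$\mathbf{w} \leftarrow \mathbf{w} - \eta\, \nabla_{\mathbf{w}} \ell_i, \qquad b \leftarrow b - \eta\, \frac{\partial \ell_i}{\partial b}.$$

Below is a minimal sketch of such an implementation. The synthetic data and the hyperparameter values eta (learning rate), lam (regularization strength), and n_epochs are illustrative placeholders, not values prescribed by the assignment; substitute the course dataset and your own choices.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for the course dataset: n samples, d features.
n, d = 100, 1
X = torch.randn(n, d)
y = 3.0 * X[:, 0] + 1.0 + 0.1 * torch.randn(n)

# Parameters and illustrative hyperparameters.
w = torch.zeros(d)
b = torch.zeros(1)
eta, lam, n_epochs = 0.01, 0.01, 50

losses = []
for epoch in range(n_epochs):
    for i in torch.randperm(n):                     # visit samples in random order
        x_i, y_i = X[i], y[i]
        err = torch.dot(w, x_i) + b - y_i           # prediction error for one sample
        grad_w = 2.0 * err * x_i + 2.0 * lam * w    # hand-derived gradient, no autograd
        grad_b = 2.0 * err
        w -= eta * grad_w                           # SGD weight update
        b -= eta * grad_b
    # Track the full-dataset regularized loss once per epoch.
    epoch_loss = ((X @ w + b - y) ** 2).mean() + lam * (w ** 2).sum()
    losses.append(epoch_loss.item())
```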

Clearly state the hyperparameters you used, and present a loss vs. epoch plot that demonstrates the convergence of the algorithm; also plot the final hypothesis.
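Assuming the `losses` list from the sketch above, the convergence plot might be produced along these lines (matplotlib is used here only for plotting; the torch-only restriction applies to the algorithm itself):

```python
import matplotlib.pyplot as plt

plt.plot(range(1, len(losses) + 1), losses)  # loss recorded once per epoch
plt.xlabel("epoch")
plt.ylabel("regularized MSE loss")
plt.title("Baseline SGD convergence")
plt.show()
```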

Task 2: Momentum

In this exercise you start from your Task 1 implementation (the regularized linear regression problem) and study convergence speed by implementing an alternative from scratch: the momentum algorithm.
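One common form of the momentum update keeps a velocity vector that accumulates an exponentially decaying average of past gradients,

$$v \leftarrow \beta v + \nabla_{\boldsymbol{\theta}} \ell_i, \qquad \boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta v,$$

which reduces to baseline SGD when $\beta = 0$. A minimal sketch reusing X, y, eta, lam, and n_epochs from the Task 1 snippet above; beta is an illustrative momentum coefficient, not a prescribed value:

```python
# Restart from the same initialization so the comparison with Task 1 is fair.
w = torch.zeros(d)
b = torch.zeros(1)
beta = 0.9                      # illustrative momentum coefficient
v_w = torch.zeros_like(w)       # velocity for the weights
v_b = torch.zeros_like(b)       # velocity for the bias

momentum_losses = []
for epoch in range(n_epochs):
    for i in torch.randperm(n):
        x_i, y_i = X[i], y[i]
        err = torch.dot(w, x_i) + b - y_i
        grad_w = 2.0 * err * x_i + 2.0 * lam * w
        grad_b = 2.0 * err
        v_w = beta * v_w + grad_w   # decaying average of past gradients
        v_b = beta * v_b + grad_b
        w -= eta * v_w              # momentum step in place of the plain SGD step
        b -= eta * v_b
    epoch_loss = ((X @ w + b - y) ** 2).mean() + lam * (w ** 2).sum()
    momentum_losses.append(epoch_loss.item())
```

Plotting `losses` and `momentum_losses` on the same axes then gives the side-by-side convergence comparison the task asks for.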

Clearly state the hyperparameters you used, and present loss vs. epoch plots that demonstrate the convergence of each algorithm; also plot the final hypotheses. You can include all compatible plots in the same figure.
