# Optimization algorithms for linear regression

## Problem 1: Stochastic Gradient Descent (50 points)

In class we covered the baseline stochastic gradient descent (SGD) algorithm. Using the linear regression example from the class notes, implement the baseline SGD algorithm from scratch. Feel free to generate more examples than shown in the class notes.

Clearly state the hyperparameters you used, present a loss-vs-epoch plot that demonstrates the convergence of the algorithm, and plot the final hypothesis.
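
A minimal sketch of the kind of SGD loop this asks for is shown below. The synthetic data (true parameters 2 and -3), the learning rate, and the epoch count are illustrative assumptions, not required values:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic data: y = 2x - 3 + noise (true parameters chosen for illustration).
n = 200
X = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 * X - 3.0 + rng.normal(scale=0.1, size=n)

# Hypothesis h(x) = w*x + b, squared-error loss L = (h(x) - y)^2 / 2.
w, b = 0.0, 0.0
lr, epochs = 0.1, 50          # hypothetical hyperparameters
losses = []

for epoch in range(epochs):
    for i in rng.permutation(n):          # visit samples in random order
        err = (w * X[i] + b) - y[i]       # prediction error on one sample
        w -= lr * err * X[i]              # gradient of the loss w.r.t. w
        b -= lr * err                     # gradient of the loss w.r.t. b
    losses.append(np.mean(((w * X + b) - y) ** 2) / 2)  # full-data loss per epoch

print(w, b)                   # should approach the true (2, -3)
plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()
```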

## Problem 2: SGD Enhancements (50 points)

In this exercise you will implement enhancements to your Problem 1 implementation (the linear regression problem) that can improve the convergence speed of the algorithm. Implement the following enhancements from scratch and compare the convergence speed of each to the baseline SGD algorithm (a sketch of the two update rules follows the reporting instructions below):

1. Momentum (15 points)

2. Adam (15 points)

Clearly state the hyperparameters you used and present a loss-vs-epoch plot that demonstrates the convergence of each algorithm, along with a plot of the final hypothesis. Include the result of Problem 1 so that each enhancement can be compared against the baseline SGD algorithm; you may include all compatible plots in the same figure.
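
For reference, one common form of the two update rules is sketched below. The momentum formulation (velocity accumulates the raw gradient, as in d2l.ai) and all default coefficient values are conventions that vary across texts, and the function names and signatures here are purely illustrative:

```python
import numpy as np

def momentum_step(theta, grad, v, lr=0.01, beta=0.9):
    """One momentum update: v is an exponentially weighted sum of past gradients."""
    v = beta * v + grad
    theta = theta - lr * v
    return theta, v

def adam_step(theta, grad, m, s, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: mean of gradients
    s = beta2 * s + (1 - beta2) * grad**2     # second moment: mean of squared gradients
    m_hat = m / (1 - beta1**t)                # bias correction for early steps
    s_hat = s / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
    return theta, m, s
```

In both cases the per-sample gradient from your Problem 1 loop replaces `grad`, and the state variables (`v`, `m`, `s`) start at zero.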

Note: You will find example implementations of the above algorithms in the https://d2l.ai/ book. You are free to use the book's code as a reference, but you should implement the algorithms from scratch.