Assignment 1b

In this assignment you will use PyTorch to implement from scratch a regularization method for training a linear regression model, as well as a logistic classifier. You must use only libraries in the torch.xyz namespace, such as torch.linalg and torch.rand, and no derived or other libraries. The idea is to implement the following from scratch without implementing every minute component, such as random number generators.


Optimization algorithms for linear regression

Task 1: Stochastic Gradient Descent

In class we covered baseline stochastic gradient descent (SGD) and used it to solve a regularized linear regression problem. For the same dataset, develop the baseline SGD algorithm from scratch using only torch.xyz namespaced libraries, and show explicitly the equations for calculating the gradients and updating the weights. You can type the equations in your notebook in LaTeX format.
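For orientation, one common way to write these equations is shown here. It assumes a mean squared error loss with an L2 (ridge) penalty evaluated on a mini-batch B. The symbols \lambda (regularization strength), \eta (learning rate), and B are notation chosen for this sketch, not necessarily the notation used in class.

```latex
\begin{align}
L_B(\mathbf{w}) &= \frac{1}{|B|} \sum_{i \in B} \left(\mathbf{w}^\top \mathbf{x}_i - y_i\right)^2 + \lambda \lVert \mathbf{w} \rVert_2^2 \\
\nabla_{\mathbf{w}} L_B(\mathbf{w}) &= \frac{2}{|B|} \sum_{i \in B} \left(\mathbf{w}^\top \mathbf{x}_i - y_i\right) \mathbf{x}_i + 2 \lambda \mathbf{w} \\
\mathbf{w}_{t+1} &= \mathbf{w}_t - \eta \, \nabla_{\mathbf{w}} L_B(\mathbf{w}_t)
\end{align}
```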

Clearly state the hyperparameters you used, present the loss vs. epoch plot that demonstrates the convergence of the algorithm, and plot the final hypothesis.
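As a rough starting point, the sketch below shows mini-batch SGD for a ridge-regularized linear model with a hand-derived gradient and no autograd. The synthetic 1-D dataset, the helper names (make_data, ridge_loss, ridge_grad, sgd), and all hyperparameter values are assumptions made for illustration; they stand in for the dataset and settings from class.

```python
import torch

def make_data(n=200, seed=0):
    """Synthetic 1-D regression data (an assumption, not the class dataset)."""
    g = torch.Generator().manual_seed(seed)
    x = torch.rand(n, 1, generator=g) * 2 - 1            # inputs in [-1, 1)
    y = 3.0 * x + 0.5 + 0.1 * torch.randn(n, 1, generator=g)
    X = torch.cat([torch.ones(n, 1), x], dim=1)          # prepend a bias column
    return X, y, g

def ridge_loss(X, y, w, lam):
    """Mean squared error plus an L2 penalty on all weights (bias included)."""
    r = X @ w - y
    return (r ** 2).mean() + lam * (w ** 2).sum()

def ridge_grad(X, y, w, lam):
    """Hand-derived gradient of ridge_loss with respect to w."""
    r = X @ w - y
    return 2.0 * X.T @ r / X.shape[0] + 2.0 * lam * w

def sgd(X, y, lr=0.1, lam=1e-3, epochs=50, batch_size=16, g=None):
    """Mini-batch SGD; returns the final weights and the per-epoch loss."""
    n, d = X.shape
    w = torch.zeros(d, 1)
    history = []
    for _ in range(epochs):
        perm = torch.randperm(n, generator=g)            # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            w = w - lr * ridge_grad(X[idx], y[idx], w, lam)
        history.append(ridge_loss(X, y, w, lam).item())  # full-data loss
    return w, history

X, y, g = make_data()
w_sgd, loss_sgd = sgd(X, y, g=g)
```

The list loss_sgd holds one loss value per epoch, which is the series to plot against the epoch index. The final hypothesis is the line X @ w_sgd drawn over the data.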

Task 2: Momentum

In this exercise you start from the implementation of Task 1 (the regularized linear regression problem) and study the convergence speed by implementing from scratch an alternative: the momentum algorithm.

Clearly state the hyperparameters you used, present the loss vs. epoch plot that demonstrates the convergence of each algorithm, and plot the final hypothesis. You can include all compatible plots in the same figure.
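A minimal sketch of the comparison follows, assuming the heavy-ball form of momentum (v = beta * v + gradient, then w = w - lr * v), which is one of several conventions in use. Setting beta to 0 recovers plain SGD, so a single function produces both loss curves. The synthetic data, the names loss_fn, grad_fn, and train, and the hyperparameter values are illustrative assumptions.

```python
import torch

# Synthetic 1-D regression data (an assumption, not the class dataset).
g = torch.Generator().manual_seed(0)
n = 200
x = torch.rand(n, 1, generator=g) * 2 - 1
y = 3.0 * x + 0.5 + 0.1 * torch.randn(n, 1, generator=g)
X = torch.cat([torch.ones(n, 1), x], dim=1)              # bias column + feature

def loss_fn(w, lam):
    """Full-data mean squared error plus an L2 penalty."""
    r = X @ w - y
    return (r ** 2).mean() + lam * (w ** 2).sum()

def grad_fn(Xb, yb, w, lam):
    """Hand-derived mini-batch gradient of the ridge loss."""
    r = Xb @ w - yb
    return 2.0 * Xb.T @ r / Xb.shape[0] + 2.0 * lam * w

def train(lr, beta, lam=1e-3, epochs=30, batch_size=16, seed=1):
    """SGD with heavy-ball momentum; beta=0.0 gives plain SGD."""
    gen = torch.Generator().manual_seed(seed)            # same shuffles per run
    w = torch.zeros(2, 1)
    v = torch.zeros_like(w)                              # velocity
    history = []
    for _ in range(epochs):
        perm = torch.randperm(n, generator=gen)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            v = beta * v + grad_fn(X[idx], y[idx], w, lam)
            w = w - lr * v
        history.append(loss_fn(w, lam).item())
    return w, history

w_plain, loss_plain = train(lr=0.01, beta=0.0)
w_mom, loss_mom = train(lr=0.01, beta=0.9)
```

Both runs share the learning rate and the shuffling seed, so any difference between loss_plain and loss_mom comes from the momentum term alone. Plotting the two lists on the same axes gives the convergence comparison.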

Source: Optimization algorithms for linear regression

Logistic Regression

You are interviewing with Google’s data science team, which is responsible for predicting the Click Through Rate (CTR) of ads placed on multiple web properties. Your hiring manager, keen on testing you out, suggests that you download this dataset and asks you to code up a model that predicts the CTR based on Logistic Regression.

Task 1: Data Preprocessing

Preprocess the data you are given to your liking. This may include dropping columns you won't use, addressing noisy or missing data, etc.

Use Pandas as a dataframe abstraction for this task; you can easily convert dataframes to PyTorch tensors for later processing. You can learn about Pandas here:

https://www.youtube.com/watch?v=PcvsOaixUh8
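The preprocessing steps and the conversion to tensors could look roughly like the sketch below. The toy dataframe and its column names (ad_id, daily_time_on_site, age, clicked) are invented for illustration and are not the columns of the actual dataset. Median imputation and standardization are just one reasonable choice.

```python
import pandas as pd
import torch

# Hypothetical toy frame standing in for the ad dataset; column names are invented.
df = pd.DataFrame({
    "ad_id": [101, 102, 103, 104, 105],
    "daily_time_on_site": [68.9, None, 80.2, 74.1, 59.0],
    "age": [35, 31, 26, 29, 35],
    "clicked": [0, 1, 0, 0, 1],
})

# Drop a column that is only an identifier and carries no predictive signal.
df = df.drop(columns=["ad_id"])

# Fill the missing value with the column median.
median_time = df["daily_time_on_site"].median()
df["daily_time_on_site"] = df["daily_time_on_site"].fillna(median_time)

# Standardize the features so that gradient steps are comparably scaled.
features = df.drop(columns=["clicked"])
features = (features - features.mean()) / features.std()

# Convert to float32 tensors: X has shape (N, D), y has shape (N, 1).
X = torch.tensor(features.to_numpy(), dtype=torch.float32)
y = torch.tensor(df["clicked"].to_numpy(), dtype=torch.float32).unsqueeze(1)
```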

Task 2: Logistic Regression

Implement the logistic regression solution to the prediction problem that can work with Stochastic Gradient Descent (SGD).
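One possible shape for such an implementation is sketched here, using a hand-written sigmoid, binary cross-entropy loss, and gradient instead of autograd. The synthetic two-feature dataset, the function names (sigmoid, bce_loss, bce_grad, fit), and the hyperparameter values are assumptions for illustration and say nothing about the CTR dataset itself.

```python
import torch

def sigmoid(z):
    """Logistic function written out explicitly."""
    return 1.0 / (1.0 + torch.exp(-z))

def bce_loss(X, y, w, b, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    p = sigmoid(X @ w + b).clamp(eps, 1 - eps)
    return -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

def bce_grad(X, y, w, b):
    """Hand-derived gradients of the mean cross-entropy for w and b."""
    err = sigmoid(X @ w + b) - y                 # predicted probability minus label
    return X.T @ err / X.shape[0], err.mean()

def fit(X, y, lr=0.5, epochs=100, batch_size=32, seed=0):
    """Mini-batch SGD on the cross-entropy loss."""
    gen = torch.Generator().manual_seed(seed)
    n, d = X.shape
    w, b = torch.zeros(d, 1), torch.zeros(())
    for _ in range(epochs):
        perm = torch.randperm(n, generator=gen)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            gw, gb = bce_grad(X[idx], y[idx], w, b)
            w, b = w - lr * gw, b - lr * gb
    return w, b

# Synthetic binary data (an assumption): labels drawn from a known logistic model.
gen = torch.Generator().manual_seed(0)
X = torch.randn(400, 2, generator=gen)
true_w = torch.tensor([[2.0], [-1.0]])
y = (torch.rand(400, 1, generator=gen) < sigmoid(X @ true_w)).float()

w, b = fit(X, y)
accuracy = ((sigmoid(X @ w + b) > 0.5).float() == y).float().mean().item()
```

Because the labels are sampled from a noisy model, the accuracy cannot reach 1.0 even with the true weights. On real data, evaluate on a held-out split instead of the training set.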

Show clearly all equations of the gradient and include comments in markdown explaining every stage of processing. Also, highlight any enhancements you may have done to improve performance.
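For reference, the gradient of the mean binary cross-entropy loss takes the following standard form. The notation (N examples, weights \mathbf{w}, bias b, predicted probability p_i) is chosen for this sketch; any regularization term you add contributes its own extra gradient term.

```latex
\begin{align}
p_i &= \sigma(\mathbf{w}^\top \mathbf{x}_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
L(\mathbf{w}, b) &= -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \\
\nabla_{\mathbf{w}} L &= \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)\, \mathbf{x}_i, \qquad
\frac{\partial L}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)
\end{align}
```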

Plot the final precision vs recall curve of your classifier. Clearly explain the tradeoff between the two quantities and the shape of the curve.
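The points of the curve can be computed with torch alone, as in the sketch below. It sorts examples by predicted score and treats each score in turn as the decision threshold. The function name precision_recall_curve and the five-example input are made up for illustration, and the sketch assumes there are no tied scores and at least one positive label.

```python
import torch

def precision_recall_curve(scores, labels):
    """Precision and recall at every score threshold, highest score first."""
    order = torch.argsort(scores, descending=True)
    hits = labels[order].float()
    tp = torch.cumsum(hits, dim=0)               # true positives among the top k
    k = torch.arange(1, len(scores) + 1)         # number predicted positive
    precision = tp / k
    recall = tp / hits.sum()
    return precision, recall

# Toy scores and labels, invented for illustration.
scores = torch.tensor([0.9, 0.8, 0.7, 0.4, 0.2])
labels = torch.tensor([1, 1, 0, 1, 0])
precision, recall = precision_recall_curve(scores, labels)
# precision: 1.0, 1.0, 2/3, 0.75, 0.6
# recall:    1/3, 2/3, 2/3, 1.0,  1.0
```

Lowering the threshold can only keep recall the same or raise it, while precision may move in either direction. This is why the curve typically has a sawtooth shape instead of falling smoothly.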

Source: Logistic Regression