In this assignment you will demonstrate the curse of dimensionality via Monte Carlo simulation and perform a logistic regression classification task with SGD.
You are required to use NumPy and the PyTorch namespace libraries such as torch.linalg and torch.rand, and in general anything in the torch.xyz namespace. The idea is to implement the following from scratch, without reimplementing every minute component such as random number generators, plotting, etc.
If you are familiar with Keras rather than PyTorch, the same rule applies.
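As a hedged illustration of the kind of Monte Carlo experiment that fits these rules, the sketch below estimates what fraction of the cube [-1, 1]^d lies inside the unit ball as the dimension d grows. The choice of experiment, the function name fraction_inside_unit_ball, the sample size and the list of dimensions are all assumptions made for illustration, not the prescribed setup. It relies only on torch.rand and torch.linalg.norm.

```python
import torch


def fraction_inside_unit_ball(dim: int, n_samples: int = 100_000) -> float:
    """Monte Carlo estimate of the share of the cube [-1, 1]^dim inside the unit ball.

    Hypothetical helper for illustration; not part of the assignment code.
    """
    # Sample points uniformly in the cube [-1, 1]^dim.
    points = 2.0 * torch.rand(n_samples, dim) - 1.0
    # A point lies inside the ball when its Euclidean norm is at most 1.
    norms = torch.linalg.norm(points, dim=1)
    return (norms <= 1.0).float().mean().item()


torch.manual_seed(0)  # fixed seed so repeated runs give the same estimates
for dim in (2, 5, 10, 20):
    frac = fraction_inside_unit_ball(dim)
    print(f"d={dim:2d}  fraction inside unit ball = {frac:.5f}")
```

In two dimensions the exact value is pi/4 (about 0.785). The estimate collapses towards zero as d increases, which is one common way of showing that almost all of the volume of a high-dimensional cube sits in its corners.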
Points:
Dev Environment : 10 points
Simulation : 25 points
Logistic Regression : 50 points
Code commented throughout, either inline or via markdown cells : 15 points
Development Environment Setup
Ubuntu and Mac users
Install Docker on your system, along with the VSCode Docker and Remote extensions.
Ensure that you also follow this tutorial to set up VSCode properly, so that VSCode can access the WSL2 filesystem and work with remote Docker containers.
If you have an NVIDIA GPU in your system, ensure you have enabled it.
Clone the repo (Windows users: ensure that you clone it on the WSL2 filesystem). Show this with a screenshot below of the terminal where you cloned the repo.
Build and launch the Docker container inside your desired IDE (if you haven't used an IDE before, you can start with VSCode).
Launch the virtual environment with make start inside the container and then show a screenshot of your IDE and the terminal with the (your virtual env) prefix.
Select the kernel of your virtual environment (.venv folder) and execute the following code. Save the output of all cells of this notebook before submitting.
You are interviewing with Google’s ad team, and one of their tasks is predicting the Click Through Rate (CTR) of ads they place on web or mobile properties. Your hiring manager, keen on testing you out, suggests that you download this dataset and asks you to code up a model that predicts the CTR using logistic regression.
Data Preprocessing
Preprocess the data you are given to your liking. This may include dropping columns you won't use, addressing noisy or missing data, etc.
Use Pandas as a dataframe abstraction for this task; you can easily convert dataframes to PyTorch tensors for later processing. You can learn about Pandas here:
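The preprocessing steps mentioned above could look roughly like the sketch below. The toy dataframe and its column names (ad_id, device, hour, click) are invented for illustration and do not describe the real dataset; the imputation choices (median for numeric gaps, an "unknown" category for missing labels) are likewise only one reasonable option among several.

```python
import pandas as pd
import torch

# Hypothetical toy frame standing in for the real dataset; every column
# name below is an assumption made for illustration only.
df = pd.DataFrame({
    "ad_id": ["a1", "a2", "a3", "a4"],             # identifier, dropped below
    "device": ["mobile", "web", None, "mobile"],   # categorical with a gap
    "hour": [9.0, 14.0, float("nan"), 21.0],       # numeric with a gap
    "click": [0, 1, 0, 1],                         # binary target
})

df = df.drop(columns=["ad_id"])                       # drop an unused column
df["hour"] = df["hour"].fillna(df["hour"].median())   # impute numeric gaps
df["device"] = df["device"].fillna("unknown")         # mark missing categories
df = pd.get_dummies(df, columns=["device"], dtype=float)  # one-hot encode

# Split into features and target, then convert to float32 tensors.
features = df.drop(columns=["click"])
X = torch.tensor(features.to_numpy(dtype="float32"))
y = torch.tensor(df["click"].to_numpy(dtype="float32"))

print(list(features.columns))
print(X.shape, y.shape)
```

Going through to_numpy with an explicit dtype keeps the tensor type predictable, since mixed integer and float columns would otherwise be upcast to float64.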
SGD
Implement a logistic regression solution to the prediction problem that is trained with Stochastic Gradient Descent (SGD).
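A minimal sketch of what such an implementation might look like is given below, assuming minibatch SGD on the mean binary cross-entropy with a hand-written gradient rather than autograd. The function name train_logreg_sgd, the hyperparameter defaults and the synthetic data are assumptions for illustration; the real dataset and any tuning are left to you.

```python
import torch


def train_logreg_sgd(X, y, lr=0.1, epochs=50, batch_size=32, seed=0):
    """Fit weights w and bias b by minibatch SGD on the mean binary cross-entropy.

    Hypothetical helper: names and default values are illustrative assumptions.
    """
    gen = torch.Generator().manual_seed(seed)
    n, d = X.shape
    w = torch.zeros(d)
    b = torch.zeros(1)
    for _ in range(epochs):
        perm = torch.randperm(n, generator=gen)    # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            p = torch.sigmoid(xb @ w + b)          # predicted click probability
            err = p - yb                           # derivative of the loss w.r.t. the logit
            grad_w = xb.T @ err / len(idx)         # mean gradient for the weights
            grad_b = err.mean()                    # mean gradient for the bias
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic stand-in data (an assumption, not the CTR dataset).
torch.manual_seed(0)
X = torch.randn(500, 3)
true_w = torch.tensor([2.0, -1.0, 0.5])
y = (torch.rand(500) < torch.sigmoid(X @ true_w)).float()

w, b = train_logreg_sgd(X, y)
accuracy = ((torch.sigmoid(X @ w + b) > 0.5).float() == y).float().mean().item()
print("weights:", w, "bias:", b, "train accuracy:", accuracy)
```

Writing the gradient by hand keeps every step visible for the explanation that the grading asks for; the same loop could be checked against torch.autograd as a sanity test.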
Show clearly all equations of the gradient and include comments in markdown explaining every stage of processing. Also, highlight any enhancements you made to improve performance.
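For reference, one standard way to write the gradient equations is sketched below, assuming the mean binary cross-entropy loss over a minibatch B. The symbols (w for weights, b for bias, x_i and y_i for the features and label of example i, eta for the learning rate) are notation chosen here, not taken from the assignment.

```latex
\begin{aligned}
p_i &= \sigma(w^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
L_B(w, b) &= -\frac{1}{|B|} \sum_{i \in B} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big] \\
\nabla_w L_B &= \frac{1}{|B|} \sum_{i \in B} (p_i - y_i)\, x_i, \qquad
\frac{\partial L_B}{\partial b} = \frac{1}{|B|} \sum_{i \in B} (p_i - y_i) \\
w &\leftarrow w - \eta\, \nabla_w L_B, \qquad b \leftarrow b - \eta\, \frac{\partial L_B}{\partial b}
\end{aligned}
```

The compact form of the gradient follows from the identity \sigma'(z) = \sigma(z)(1 - \sigma(z)), which cancels the denominators that appear when differentiating the two log terms.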
Plot the final precision vs recall curve of your classifier. Clearly explain the tradeoff between the two quantities and the shape of the curve.
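The quantities behind that curve can be computed as in the sketch below, which assumes each position in the score ranking is used as a decision threshold and that there are no tied scores. The function name and the five-example input are hypothetical.

```python
import torch


def precision_recall_curve(scores: torch.Tensor, labels: torch.Tensor):
    """Precision and recall after predicting the top-k scored examples positive, for k = 1..N.

    Hypothetical helper; assumes binary labels in {0, 1} and no tied scores.
    """
    order = torch.argsort(scores, descending=True)
    sorted_labels = labels[order].float()
    tp = torch.cumsum(sorted_labels, dim=0)    # true positives among the top k
    k = torch.arange(1, len(scores) + 1)       # number of predicted positives
    precision = tp / k
    recall = tp / sorted_labels.sum()
    return precision, recall


# Tiny illustrative input (invented values, not model output).
scores = torch.tensor([0.9, 0.8, 0.7, 0.4, 0.2])
labels = torch.tensor([1, 1, 0, 1, 0])
precision, recall = precision_recall_curve(scores, labels)
for p, r in zip(precision.tolist(), recall.tolist()):
    print(f"precision={p:.3f}  recall={r:.3f}")
```

Lowering the threshold admits more predicted positives, so recall can only stay flat or rise, while precision usually falls because more false positives are let in. That is why the curve tends to run from the upper left to the lower right, often with a sawtooth shape rather than a smooth decline. The two returned tensors can be passed to a plotting library with recall on the x-axis and precision on the y-axis.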