Recommendation Systems

A recommendation system helps users find compelling content in a large corpus. For example, the Google Play Store provides millions of apps, while YouTube provides billions of videos, and more are added every day. How can users find new and compelling content? Search is one way to access content, but a recommendation engine can also surface items that users might not have thought to search for on their own.

In this assignment you will use the MovieLens dataset called ml-latest. You are free to experiment and develop your code with the smaller ml-latest-small dataset, but you must submit your work using ml-latest, which is much larger and will lead to better results. The ml-latest dataset consists of several files:

genome-scores.csv: the relevance score of a particular tag for a particular movie. Tags are things like “action”, “romance”, “violence”, etc. The relevance score ranges from 0 to 1; the higher the score, the more relevant the tag is to the movie.
genome-tags.csv: the tag name for each tag id.
links.csv: the id of the movie on MovieLens, the id of the movie on IMDB, and the id of the movie on TMDB.
movies.csv: the id of the movie on MovieLens, the title of the movie, and a list of genres associated with the movie.
ratings.csv: the id of the movie on MovieLens, the id of the user rating the movie, the rating, and the timestamp of the rating.
tags.csv: the id of the movie on MovieLens, the id of the user tagging the movie, the tag, and the timestamp of the tag.
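
As a quick orientation to the file layout, the following minimal sketch parses a ratings.csv-style file using only the standard library. It assumes the header `userId,movieId,rating,timestamp`; the in-memory sample rows and the helper name `read_ratings` are hypothetical stand-ins, not part of the dataset or of any library.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-in for ratings.csv.
# Assumption: the real file uses the header userId,movieId,rating,timestamp.
SAMPLE_RATINGS_CSV = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,3.5,964981247
2,10,5.0,964982224
"""


def read_ratings(handle):
    """Yield (user_id, movie_id, rating) triples from a ratings.csv-style file."""
    for row in csv.DictReader(handle):
        yield int(row["userId"]), int(row["movieId"]), float(row["rating"])


# For the real dataset, replace io.StringIO(...) with open("ratings.csv", newline="").
ratings = list(read_ratings(io.StringIO(SAMPLE_RATINGS_CSV)))

# Group ratings per movie to get a first feel for how sparse the data is.
ratings_per_movie = defaultdict(list)
for user_id, movie_id, rating in ratings:
    ratings_per_movie[movie_id].append(rating)

mean_rating = {m: sum(rs) / len(rs) for m, rs in ratings_per_movie.items()}
```

The same (user, movie, rating) triples are the only input a rating-based matrix factorization model needs; the timestamp and the other files are optional extras.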

Task 1: SVD for Recommendations (30 points)

Go over the recommendation system overview and write below what matrix factorization can offer and what challenges it faces. To do so, consult the Surprise library and choose the SVD API of surprise.prediction_algorithms. Describe all the equations involved and explain why they take the form they do.
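
As a starting point for that discussion, the equations below summarize the biased matrix factorization model as described in the Surprise documentation for its SVD algorithm. The notation is an assumption of this sketch: μ is the global mean rating, b_u and b_i are user and item biases, p_u and q_i are latent factor vectors, γ is the learning rate, and λ is the regularization strength.

```latex
% Predicted rating of user u for item i
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

% Regularized squared error minimized over the known training ratings
\min_{b, p, q} \sum_{r_{ui} \in R_{\text{train}}}
  \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda \left( b_i^2 + b_u^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)

% Stochastic gradient descent updates, with e_{ui} = r_{ui} - \hat{r}_{ui}
b_u \leftarrow b_u + \gamma \left( e_{ui} - \lambda b_u \right)
b_i \leftarrow b_i + \gamma \left( e_{ui} - \lambda b_i \right)
p_u \leftarrow p_u + \gamma \left( e_{ui} \, q_i - \lambda p_u \right)
q_i \leftarrow q_i + \gamma \left( e_{ui} \, p_u - \lambda q_i \right)
```

Each update moves a parameter along the negative gradient of the loss for one observed rating: the error term pulls the prediction toward the true rating, while the λ term shrinks the parameter toward zero to limit overfitting on sparse data.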

NOTE1: In this task you are expected to do some reading and consultation of APIs. Be as descriptive as possible so that a novice computer scientist can understand your answer. There is no minimum or maximum “page limit”. Pay particular attention to this reference.

NOTE2: Although we have recommended the Surprise library, it is worth mentioning that Microsoft has a recommender system toolkit. Since recommendation systems are almost everywhere you look in commercial / consumer applications, you may want to spend the time after this assignment to learn more about the field.

Write your answer here or in a separate markdown file + images

Task 2: Implement the SVD based matrix factorization algorithm (40 points)

Implement the matrix factorization algorithm on the MovieLens data using the Surprise API. Provide a clear explanation of the code you wrote, and report and explain the recommendation metric on a test dataset.
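
To make the mechanics concrete before reaching for the library, here is a from-scratch sketch of biased matrix factorization trained with stochastic gradient descent, evaluated with RMSE on held-out ratings. It is not the Surprise API: the function names, the toy ratings, and the default values are hypothetical, and clipping predictions to the 0.5 to 5 range is an assumption about the MovieLens rating scale. In Surprise itself the corresponding model is `surprise.SVD`, whose `n_factors`, `n_epochs`, `lr_all`, and `reg_all` arguments play the roles of the parameters used here.

```python
import math
import random


def train_biased_mf(train, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit mu, b_u, b_i, p_u, q_i by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in train) / len(train)  # global mean rating
    users = sorted({u for u, _, _ in train})
    items = sorted({i for _, i, _ in train})
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    # Small random factors so the dot product starts near zero.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in train:
            pred = mu + b_u[u] + b_i[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - pred
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]  # use the old values for both updates
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return mu, b_u, b_i, p, q


def predict(model, u, i):
    """Estimate a rating; unknown users or items fall back to the known terms."""
    mu, b_u, b_i, p, q = model
    est = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
    if u in p and i in q:
        est += sum(pf * qf for pf, qf in zip(p[u], q[i]))
    return min(5.0, max(0.5, est))  # assumed rating scale: 0.5 to 5


def rmse(model, data):
    """Root mean squared error of the model on (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in data) / len(data))


# Hypothetical toy ratings, split by hand into train and test.
train = [(1, 10, 5.0), (1, 20, 3.0), (2, 10, 4.0), (2, 30, 1.0),
         (3, 20, 2.0), (3, 30, 1.5), (4, 10, 4.5), (4, 30, 2.0)]
test = [(2, 20, 2.5), (3, 10, 4.0)]

baseline = train_biased_mf(train, n_epochs=0)  # predicts roughly the global mean
model = train_biased_mf(train)
train_rmse, test_rmse = rmse(model, train), rmse(model, test)
```

RMSE is reported on ratings the model never saw during training, which is what "recommendation metric on a test dataset" asks for; comparing it against the untrained baseline shows how much the learned biases and factors contribute.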

# Task 2 code starts here

Task 3: Perform hyperparameter tuning (30 points)

In this task, you will borrow candidate hyperparameter values from here but implement a simple random search to tune them. You can use Optuna for this exercise. Report the best hyperparameters and the corresponding recommendation metric on a test dataset.
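
The shape of a simple random search can be sketched in plain Python as follows. Everything here is illustrative: the search space values are hypothetical (not the candidate values the task refers to), the parameter names merely mirror the arguments of Surprise's SVD, and `stand_in_objective` is a made-up placeholder for "train the model with these parameters and return the validation RMSE". With Optuna, a comparable effect is available through its `RandomSampler`.

```python
import math
import random

# Hypothetical search space; names mirror Surprise's SVD arguments.
SEARCH_SPACE = {
    "n_factors": [20, 50, 100, 150],  # list: pick one value uniformly
    "n_epochs": [10, 20, 30],
    "lr_all": (0.002, 0.02),          # tuple: sample log-uniformly in [low, high]
    "reg_all": (0.01, 0.1),
}


def sample_params(rng):
    """Draw one random configuration from SEARCH_SPACE."""
    params = {}
    for name, spec in SEARCH_SPACE.items():
        if isinstance(spec, list):
            params[name] = rng.choice(spec)
        else:
            low, high = spec
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return params


def random_search(objective, n_trials=20, seed=0):
    """Evaluate n_trials random configurations and keep the lowest score."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample_params(rng)
        trials.append((objective(params), params))
    best_score, best_params = min(trials, key=lambda t: t[0])
    return best_params, best_score, trials


def stand_in_objective(params):
    """Placeholder for: train SVD with params, return validation RMSE."""
    return (0.85
            + 0.05 * abs(math.log10(params["lr_all"] / 0.005))
            + abs(params["reg_all"] - 0.04)
            + abs(params["n_factors"] - 100) / 1000)


best_params, best_score, trials = random_search(stand_in_objective)
```

In a real run, the objective would score each configuration on a validation split (or by cross-validation), and only the single best configuration would then be evaluated once on the test set, so that the reported test metric is not biased by the search.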

# Task 3 code starts here