CNN Featurizers and Similarity Search

Introduction

Typically, when we train a Convolutional Neural Network (CNN) as an end-to-end image classifier, we feed an image into the network and propagate it forward through every layer (feed forward).

We then obtain our posterior probabilities at the end of the network.

However, there is no “rule” that says we must let the image propagate forward through the entire network, including the head. Instead, we can:

Stop propagation at a layer before the head of the network (such as an activation or pooling layer).
Extract the activations at this layer.
Treat the values as a feature vector.

These feature vectors can then be used in downstream tasks such as classification. Our aim is to build a system that, given a query image, returns a number of images that strongly resemble it. This task is called similarity search. A naive way to perform it would be to compare images based on pixel values, patches, or some other high-level feature taken from the image itself. Instead, you are asked to use the ResNet-50 architecture to produce features that can represent a concept, in this case a face with specific characteristics.
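To see why the naive pixel-based comparison mentioned above is fragile, here is a minimal sketch using NumPy. The toy 4x4 arrays and the `pixel_distance` helper are illustrative assumptions, not part of the assignment.

```python
import numpy as np


def pixel_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two same-shaped images, pixel by pixel."""
    diff = a.astype(np.float32) - b.astype(np.float32)
    return float(np.linalg.norm(diff))


# Toy grayscale "images": a bright 2x2 square, the same square shifted
# one pixel to the right, and an empty image.
img = np.zeros((4, 4), dtype=np.uint8)
img[1:3, 1:3] = 255
shifted = np.roll(img, shift=1, axis=1)
blank = np.zeros_like(img)

d_shift = pixel_distance(img, shifted)
d_blank = pixel_distance(img, blank)
print(d_shift, d_blank)
```

In this toy case the shifted copy ends up exactly as far from the original as the blank image does, even though it shows the same object. Features taken from a pre-trained CNN are generally less sensitive to such small changes.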

Part 1 Loading of Dataset in Colab (10 points)

Create a Jupyter notebook (e.g., on Google Colab) and download the LFW dataset from here.

You can manually download the dataset using the above link and then upload it to Colab or, alternatively, you can issue the commands shown below in Colab:

!wget http://vis-www.cs.umass.edu/lfw/lfw.tgz
!tar -xvf /content/lfw.tgz
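Once the archive is extracted, it can help to collect the image paths in a fixed order before moving on. The sketch below assumes the archive unpacks into an `lfw/` folder in the current working directory, with one subfolder per person containing `.jpg` files. `list_images` is a hypothetical helper name.

```python
from pathlib import Path


def list_images(root: str) -> list[Path]:
    """Return every .jpg file under root, sorted so the order is reproducible."""
    return sorted(Path(root).rglob("*.jpg"))


# Assumed location of the extracted dataset; adjust if tar unpacked elsewhere.
if Path("lfw").is_dir():
    paths = list_images("lfw")
    people = {p.parent.name for p in paths}
    print(f"{len(paths)} images of {len(people)} people")
```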

Part 2 Using CNN for Feature Extraction (30 points)

Use ResNet-50 to extract feature vectors from raw images. You can use the TensorFlow or PyTorch APIs to:

Obtain a ResNet-50 model pre-trained on a dataset such as ImageNet.
Perform necessary preprocessing on the images before feeding them into the network.
Extract the features from the penultimate layer of the network (before the fully connected layer - the classification head).
Store the features in a dictionary, where the key is the name of the image and the value is the feature vector.
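One possible way to carry out the steps above, sketched with PyTorch and torchvision (version 0.13 or later for the `weights` API). The choice of the `IMAGENET1K_V2` weights, the helper names `build_extractor` and `extract_features`, and the batch size are assumptions made for illustration. Replacing the final `fc` layer with an identity makes the forward pass return the 2048-dimensional output of the global average pooling layer.

```python
from pathlib import Path


def build_extractor(pretrained: bool = True):
    """Return (model, preprocess) for a ResNet-50 without its classification head."""
    # Imported here so the helpers can be loaded without the deep-learning stack.
    import torch
    from torchvision import models

    weights = models.ResNet50_Weights.IMAGENET1K_V2
    model = models.resnet50(weights=weights if pretrained else None)
    model.fc = torch.nn.Identity()  # output is now the pooled 2048-d vector
    model.eval()
    # The weights object carries the matching resize/crop/normalize pipeline.
    return model, weights.transforms()


def extract_features(image_paths, model, preprocess, batch_size: int = 32) -> dict:
    """Map each image file name to its feature vector (a NumPy array)."""
    import torch
    from PIL import Image

    features = {}
    with torch.no_grad():
        for start in range(0, len(image_paths), batch_size):
            chunk = image_paths[start:start + batch_size]
            batch = torch.stack(
                [preprocess(Image.open(p).convert("RGB")) for p in chunk]
            )
            outputs = model(batch)  # shape: (len(chunk), 2048)
            for path, vector in zip(chunk, outputs):
                features[Path(path).name] = vector.numpy()
    return features
```

Using the file name as the key relies on file names being unique across person folders, which should be checked for the dataset at hand. An equivalent approach in TensorFlow would load the Keras ResNet50 without its top layer and with average pooling.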

Part 3 Retrieving most similar images (30 points)

Use a nearest neighbor algorithm such as this to obtain the 10 most similar images to each query image.
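As a baseline for this retrieval step, here is a brute-force nearest-neighbor sketch in NumPy. It is a stand-in, not necessarily the algorithm the assignment links to. It assumes the features are stored in a dictionary keyed by image name, and it uses cosine similarity as the metric, which is one common choice among several. `build_index` and `most_similar` are hypothetical helper names.

```python
import numpy as np


def build_index(features: dict[str, np.ndarray]):
    """Stack the feature dict into an L2-normalized matrix plus matching names."""
    names = list(features)
    matrix = np.stack([features[n] for n in names]).astype(np.float32)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-12
    return names, matrix


def most_similar(query_name: str, names: list, matrix: np.ndarray, k: int = 10):
    """Return up to k (name, cosine similarity) pairs, excluding the query itself."""
    query_idx = names.index(query_name)
    scores = matrix @ matrix[query_idx]  # cosine similarity, rows are unit length
    scores[query_idx] = -np.inf          # never return the query as its own match
    k = min(k, len(names) - 1)
    top = np.argsort(-scores)[:k]
    return [(names[i], float(scores[i])) for i in top]
```

A full scan like this costs one matrix-vector product per query, which is usually acceptable for a dataset of this size. An approximate nearest-neighbor library becomes worthwhile when the collection grows much larger.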

Part 4 - Streamlit App (30 points)

See this example app and develop an app that showcases the similarity search system you have developed. The app should offer two possibilities: selecting an image from the test dataset or uploading an image. It should then display the 10 most similar images to the query image. The app must be deployed to a Streamlit Space on Hugging Face.