Reverse Visual Search
In many domains we are interested in finding artifacts that are similar to a query artifact. In this project you are going to implement a system that can find artifacts when the queries are visual. The queries you will test your system on are images of faces (30 points) and videos of people (40 points). You will also document your approach, clearly showing its advantages over the baseline. The documentation is worth 30 points, but to earn them you need to write it in Markdown (*.md) or reStructuredText (*.rst) format.
In the following we use the term person of interest (PoI) to indicate the person that we reverse search for in images and videos. The PoI may appear together with others in the datasets, but the system should be able to retrieve the images / videos in which the PoI appears under all conditions except complete occlusion of the face. See bonus points for partial occlusion.
Images - LFW Dataset
This dataset will be used for images.
Videos - YouTube Faces
This dataset will be used for videos. We will post an alternative link to a Google Drive download folder in the class Slack. For the description of each file, please consult the authors' website.
Reverse Image Search - Baseline (10 points)
You will use this reverse image search system as the baseline prototype. We suggest you start here to understand how the baseline system works for faces. AWS has recently launched OpenSearch; please select ES (Elasticsearch) when you run the baseline example.
Reverse Image Search - Siamese Networks (20 points)
In this step you will need to:
Mirror the baseline in terms of workflow. You can use Elastiknn (see here how ES can be used inside Colab). You can also use Weaviate or Milvus for similarity search. You will need to beat the baseline above, which means that you first need to feed the baseline with LFW data to obtain the baseline performance on LFW.
Improve over the baseline. You may want to look at networks such as FaceNet and others quoted here, as well as try various similarity search algorithms.
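To compare your system with the baseline you need a retrieval step and a score. The following is a minimal sketch, not the baseline's actual code: it assumes face embeddings are already available as plain lists of floats, ranks a gallery by cosine similarity with a brute-force scan, and scores the result with precision@k. The names `top_k` and `precision_at_k`, the toy embeddings, and the choice of precision@k as the metric are all illustrative assumptions; in practice the scan would be replaced by the vector index of your choice.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def top_k(query, gallery, k):
    """Brute-force search: return the k (id, score) pairs most similar to query.

    gallery maps an image id to its embedding (hypothetical layout).
    """
    scored = [(image_id, cosine_similarity(query, emb))
              for image_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def precision_at_k(retrieved_ids, labels, query_label, k):
    """Fraction of the first k retrieved ids that show the queried person."""
    hits = sum(1 for image_id in retrieved_ids[:k]
               if labels[image_id] == query_label)
    return hits / k


# Toy 2-D embeddings standing in for real face embeddings (assumed data).
gallery = {"a1": [1.0, 0.1], "a2": [0.9, 0.2], "b1": [0.0, 1.0]}
labels = {"a1": "A", "a2": "A", "b1": "B"}
query = [1.0, 0.0]  # embedding of a query image of person "A"

retrieved = [image_id for image_id, _ in top_k(query, gallery, k=2)]
score = precision_at_k(retrieved, labels, "A", k=2)
```

Running the same scoring function on the baseline's results and on your own gives two numbers that can be compared directly, provided both use the same queries and the same k.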
Reverse Video Search (40 points)
Here you will demonstrate that the system is able to retrieve the videos involving a person of interest from a query video that contains the person of interest. The query video can be one of the videos in the database.
Note that you may want to work with a cleaner version of the dataset that uses an alignment library as demonstrated below:
Clearly explain your ideas on how best to obtain the embeddings for the query / database videos. Perhaps a straightforward way is to average the per-frame embeddings, but you can think of other methods. Your notebook must be able to return the videos of the PoI that exist in the database, excluding the query video.
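The averaging idea mentioned above can be sketched as follows. This is only one possible approach under stated assumptions: each video is assumed to be already reduced to a list of per-frame face embeddings (plain lists of floats), the video embedding is their L2-normalized mean, and the query video is dropped from the results. The names `mean_embedding` and `search_videos`, the video ids, and the toy data are hypothetical.

```python
import math


def mean_embedding(frame_embeddings):
    """Average per-frame embeddings and rescale the mean to unit length."""
    if not frame_embeddings:
        raise ValueError("a video needs at least one frame embedding")
    count = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    mean = [sum(frame[d] for frame in frame_embeddings) / count
            for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0.0 else mean


def search_videos(query_id, video_frames, k):
    """Rank database videos against the query video, excluding the query itself.

    video_frames maps a video id to its list of per-frame embeddings.
    """
    embeddings = {vid: mean_embedding(frames)
                  for vid, frames in video_frames.items()}
    query = embeddings[query_id]
    scored = [(vid, sum(x * y for x, y in zip(query, emb)))
              for vid, emb in embeddings.items()
              if vid != query_id]  # the query video must not be returned
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D per-frame embeddings standing in for real ones (assumed data).
video_frames = {
    "poi_clip_1": [[1.0, 0.0], [0.8, 0.2]],
    "poi_clip_2": [[0.9, 0.1], [1.0, 0.1]],
    "other_clip": [[0.0, 1.0], [0.1, 0.9]],
}

ranked = [vid for vid, _ in search_videos("poi_clip_1", video_frames, k=2)]
```

A plain mean treats every frame equally, so blurred or partially occluded frames pull the video embedding away from the PoI; weighting frames by detection confidence or keeping several embeddings per video are alternatives worth discussing in your write-up.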
Documentation (30 points)
Remember to document everything: setup instructions written so that someone else can replicate your experiments, the methods and methodologies you used, how each method works (e.g. if you use FaceNet you need to explain Siamese Networks), the experiments you ran, etc.