Visual Genome and Inpainting

Introduction

In this project you are tasked with understanding Diffusion Models (DMs) and how they can be used to provide an instruction-based way to remove objects from a scene.

Image inpainting is the task of erasing unwanted pixels from an image and filling them in a semantically consistent and realistic way. Traditionally, the pixels to be erased are specified with binary masks. From an application point of view, the user has to create a mask for every object they would like to remove, which can be time-consuming and error-prone.
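To make the role of the binary mask concrete, here is a minimal sketch of the erasure step only (not the filling step, which is what the diffusion model is for). The toy image format, the function name erase_masked_pixels and the fill value are assumptions made for illustration; they do not come from any dataset or library used in the project.

```python
from typing import List

# Hypothetical toy formats, chosen only for this illustration.
Image = List[List[int]]  # grayscale image as rows of pixel intensities
Mask = List[List[int]]   # binary mask: 1 = pixel to erase, 0 = pixel to keep


def erase_masked_pixels(image: Image, mask: Mask, fill: int = 0) -> Image:
    """Return a copy of `image` with every masked pixel replaced by `fill`."""
    same_rows = len(image) == len(mask)
    if not same_rows or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [fill if m == 1 else p for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
# The user has to draw this mask by hand for every object to remove.
mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]
erased = erase_masked_pixels(image, mask)
```

An inpainting model would then take the erased image together with the mask and synthesize plausible content for the masked region.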

Your project must be in a separate repo and you will be required to submit a link to the repo in the final report.

Milestone 1 - Familiarization with GQA (10 points)

Download all the data made available with the GQA paper from here. The GQA dataset consists of 85K real-world images with their corresponding scene graphs. Scene graphs provide simplified representations of images by describing them in terms of objects, attributes, and relations. The dataset is large, so we suggest you download it into your Google Drive from Colab using wget or the gdown package.
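As a rough illustration of what "objects, attributes, and relations" means in practice, the sketch below builds a tiny scene graph and flattens it into triples. The dictionary layout and field names are assumptions for this example and should be checked against the actual scene graph files once downloaded; relation_triples and attribute_pairs are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# Toy scene graph for one image. The layout is an assumption for illustration.
scene_graph: Dict = {
    "objects": {
        "o1": {
            "name": "man",
            "attributes": ["tall"],
            "relations": [{"name": "holding", "object": "o2"}],
        },
        "o2": {
            "name": "umbrella",
            "attributes": ["red", "open"],
            "relations": [],
        },
    }
}


def relation_triples(graph: Dict) -> List[Tuple[str, str, str]]:
    """List (subject, relation, object) triples using object names."""
    objects = graph["objects"]
    triples = []
    for obj in objects.values():
        for rel in obj["relations"]:
            target = objects[rel["object"]]
            triples.append((obj["name"], rel["name"], target["name"]))
    return triples


def attribute_pairs(graph: Dict) -> List[Tuple[str, str]]:
    """List (object name, attribute) pairs."""
    return [
        (obj["name"], attr)
        for obj in graph["objects"].values()
        for attr in obj["attributes"]
    ]


triples = relation_triples(scene_graph)
pairs = attribute_pairs(scene_graph)
```

Triples of this kind are a convenient starting point for exploring the dataset, for example counting the most frequent relations or finding all images that contain a given object.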

You also need to download the Inst-inpaint dataset - this is also of considerable size and you can

https://theaisummer.com/diffusion-models/

In this project you are tasked to create a visual genome from video scenes and segment the scenes based on the contexts.

The graph becomes a context, and the graph embeddings can be used to find scenes that possess the same context as the query scene.
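The retrieval idea above can be sketched as a nearest-neighbour search over embedding vectors. This is only one possible approach, assuming each scene graph has already been turned into a fixed-length vector by some embedding method; the scene names, the vectors and the choice of cosine similarity are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_scenes(
    query: Sequence[float], scene_embeddings: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (scene id, similarity) pairs, most similar scene first."""
    scored = [
        (scene_id, cosine_similarity(query, emb))
        for scene_id, emb in scene_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up 3-dimensional embeddings; real graph embeddings would be larger.
scene_embeddings = {
    "kitchen_a": [0.9, 0.1, 0.0],
    "street": [0.0, 0.2, 0.9],
    "kitchen_b": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
ranking = rank_scenes(query, scene_embeddings)
```

Scenes whose embeddings point in a similar direction to the query are ranked first, which is one simple way to interpret "possessing the same context".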

https://visualgenome.org/

https://arxiv.org/pdf/2006.09199.pdf