Getting to know the Torchvision library

Numerical computation must be efficient in any real-time system, and computer vision is no exception. PyTorch is a popular deep learning library that provides a powerful tensor library for numerical computation. The aim here is to learn and demonstrate the basic operations of PyTorch tensors.

Learn the basics

Use this notebook to learn the basics of PyTorch tensors, following along with the video. You can also use this excellent book chapter.
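As a quick warm-up before the notebook, here is a minimal sketch of a few basic tensor operations. The values and variable names are arbitrary and chosen only for illustration; they are not taken from the notebook or the video.

```python
import torch

# Build a 2x3 tensor from nested Python lists, and a tensor of ones with the same shape.
a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = torch.ones(2, 3)

total = a + b                # elementwise addition, shape (2, 3)
product = a @ b.T            # matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)
col_means = a.mean(dim=0)    # mean over the row dimension, shape (3,)
reshaped = a.reshape(3, 2)   # same six values laid out as 3 rows of 2
```

Each operation returns a new tensor, so `a` and `b` are left unchanged.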

Torchvision

Torchvision is a package that provides popular datasets, model architectures, and common image transformations for computer vision. It is a part of the PyTorch project and is widely used in the deep learning community for tasks such as image classification, object detection, and segmentation. Here we will touch upon the basics of Torchvision.

Transforming image tensors

Download the video (yt-dlp or a similar tool works) and process it into images. You can sample the video so that you end up with 1000 images in total.
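One way to pick which frames to keep is to spread the samples evenly over the whole video. The sketch below only computes the frame indices. The helper name `sample_frame_indices` is hypothetical, and actually decoding and saving those frames (for example with OpenCV or ffmpeg) is left to the tool you prefer.

```python
def sample_frame_indices(total_frames: int, num_samples: int = 1000) -> list[int]:
    """Return up to num_samples evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # The video is shorter than the requested sample count: keep every frame.
        return list(range(total_frames))
    step = total_frames / num_samples  # fractional stride, always > 1 here
    return [int(i * step) for i in range(num_samples)]
```

For a video with 30000 frames, this keeps every 30th frame, starting from frame 0.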

Create a custom Torchvision dataset that loads the images and applies transformations.
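A minimal sketch of such a dataset follows. It assumes the sampled frames have been saved as image files in a single folder. The class name `FrameDataset` and its arguments are hypothetical, and the dataset returns images only, since no labels are mentioned here.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class FrameDataset(Dataset):
    """Loads image files from one folder and applies an optional transform."""

    def __init__(self, root, transform=None, extensions=(".jpg", ".jpeg", ".png")):
        # Sort the paths so that indexing is deterministic across runs.
        self.paths = sorted(
            p for p in Path(root).iterdir() if p.suffix.lower() in extensions
        )
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Convert to RGB so every sample has three channels.
        image = Image.open(self.paths[index]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image
```

An instance can be passed to `torch.utils.data.DataLoader` once the transform returns tensors of a common size.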

Using this image as an example and the Transforms V2 page as a reference, understand the transformations and why we need to apply them.

Resize each image to 224x224 and normalize it using the per-channel mean and standard deviation of the dataset's images.

Visualizing the dataset

Using the FiftyOne documentation, load the dataset into FiftyOne's internal MongoDB database as a FiftyOne dataset, and visualize the Torchvision dataset in the FiftyOne App. Note that the App UI can be launched from a notebook cell.
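A minimal sketch of this step follows, assuming FiftyOne is installed and that the image file paths are available as a list (for instance from the custom dataset). The function name `build_fiftyone_dataset` and the dataset name `"video-frames"` are hypothetical. Check the FiftyOne documentation for the options that fit your setup.

```python
def build_fiftyone_dataset(image_paths, name="video-frames"):
    """Create a FiftyOne dataset with one sample per image path."""
    # Imported lazily so this function can be defined where FiftyOne is not installed.
    import fiftyone as fo

    # from_images registers each file path as a sample in FiftyOne's database.
    return fo.Dataset.from_images([str(p) for p in image_paths], name=name)


# In a notebook cell (assumes `paths` is your list of image files):
# import fiftyone as fo
# dataset = build_fiftyone_dataset(paths)
# session = fo.launch_app(dataset)
```

FiftyOne stores file paths and metadata rather than pixel data, so the images must stay on disk where the paths point.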