Assignment 3
Points:
Task 1 is 50 points.
Task 2 is 50 points.
Sports Analytics
Multi-Object Tracking (MOT) is a core visual ability that humans possess, allowing them to perform kinetic tasks and coordinate other tasks. The AI community has recognized the importance of MOT through a series of competitions.
This assignment gives you the opportunity to apply probabilistic reasoning to sports analytics, a sizable market for AI. The object classes in this assignment are person and ball, and you will demonstrate the ability to reason over time using Kalman filters.
Task 1: YOLO11
You Only Look Once (YOLO)
YOLO11 is the latest iteration in the Ultralytics YOLO series of real-time, CNN-based object detectors.
The goal of this task is to run and analyze the YOLO11 (You Only Look Once) object detection algorithm. Instead of building the model from scratch, you will use a state-of-the-art pre-trained model from the ultralytics library to perform inference on a video.
Your primary task is to create a Jupyter/Colab notebook that not only runs the detection but also explains and visualizes the core stages of the YOLO11 pipeline. You will deconstruct the single-shot detection process by inspecting feature maps and by showing the effect of post-processing techniques such as Non-Maximum Suppression (NMS).
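To make the NMS step concrete, the following is a minimal sketch of greedy NMS written with NumPy. It is an illustration under stated assumptions, not necessarily the exact procedure the ultralytics library applies internally: boxes are assumed to be in (x1, y1, x2, y2) pixel format, and the function names and the 0.5 threshold are chosen here for illustration only.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Width and height of the overlap are clipped at zero for disjoint boxes.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep


# Two near-duplicate boxes around one player and a separate third box.
boxes = [[100, 100, 200, 300], [105, 110, 205, 305], [400, 120, 480, 320]]
scores = [0.90, 0.75, 0.80]
kept = nms(boxes, scores)
print(kept)
```

In a notebook, drawing the boxes on the chosen frame before and after calling a function like this is one way to visualize what the suppression removes.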
A reference video is provided here (for reference only).
Dataset
You will be provided with a short video file (use the same video as the test video). For your analysis and visualizations, select one representative frame from this video. The classes of interest, ‘person’ and ‘sports ball’, are both part of the MS COCO dataset, on which the recommended pre-trained model has been trained.
Requirements
Your submission must be a single notebook file (.ipynb) that includes all code, markdown explanations, and output visualizations.
Please note that both ball and person are included in the COCO dataset, so you can use a backbone pre-trained on COCO.
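As a sketch of how the two classes of interest might be isolated, the snippet below filters detections by COCO class index. It assumes the 80-class COCO ordering used by COCO-pretrained Ultralytics models, where person is index 0 and sports ball is index 32, and it assumes detections are rows of (x1, y1, x2, y2, confidence, class_id). The helper names, the "yolo11n.pt" weights file, and the 0.25 confidence cut-off are illustrative choices, not requirements of the assignment.

```python
import numpy as np

# Assumed COCO class indices in the 80-class ordering: person = 0, sports ball = 32.
PERSON, SPORTS_BALL = 0, 32


def keep_classes(detections, wanted=(PERSON, SPORTS_BALL), min_conf=0.25):
    """Keep rows (x1, y1, x2, y2, confidence, class_id) of the wanted classes."""
    detections = np.asarray(detections, dtype=float).reshape(-1, 6)
    mask = np.isin(detections[:, 5], wanted) & (detections[:, 4] >= min_conf)
    return detections[mask]


def detect_frame(model, frame):
    """Run an already-loaded Ultralytics YOLO model on one frame.

    Returns an (N, 6) array of (x1, y1, x2, y2, confidence, class_id).
    The `classes` argument asks the library to return only those class ids.
    """
    result = model.predict(frame, classes=[PERSON, SPORTS_BALL], verbose=False)[0]
    b = result.boxes
    return np.column_stack(
        [b.xyxy.cpu().numpy(), b.conf.cpu().numpy(), b.cls.cpu().numpy()]
    )


# Usage with the real detector (assumes `pip install ultralytics`):
#   from ultralytics import YOLO
#   model = YOLO("yolo11n.pt")
#   dets = detect_frame(model, frame)

# Hand-made detections standing in for model output: a person, a ball,
# a car (class 2), and a low-confidence ball.
raw = [
    [10, 20, 50, 120, 0.90, 0],
    [60, 60, 70, 70, 0.60, 32],
    [0, 0, 30, 30, 0.80, 2],
    [5, 5, 9, 9, 0.10, 32],
]
filtered = keep_classes(raw)
print(filtered)
```

The ball is small and often blurred, so a lower confidence threshold for class 32 than for class 0 may be worth experimenting with.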
Task 2: Deep-SORT
Read this and this paper to understand the Deep-SORT algorithm. You can also watch the video below that explains the implementation of Deep-SORT.
Draw the architecture of the tracking solution using a diagramming tool that is compatible with GitHub rendering, such as Excalidraw.
Write a summary of the key components of the architecture above, including the equations of the Kalman filter, and explain what the Hungarian algorithm does. For the latter, you may benefit from going through this implementation.
Implement Deep-SORT so that it works with the object detector from the earlier step. You are free to use the code from the video, but you also need to detect and track the soccer ball.
Submit your video with all the bounding boxes of the players and the soccer ball superimposed on the test video below.
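For the Kalman filter named in the summary step above, the following is a minimal constant-velocity sketch in NumPy. It is deliberately simplified: the Deep SORT paper describes an eight-dimensional state (box center, aspect ratio, height, and their velocities), whereas this example tracks only a box center with the state (cx, cy, vx, vy). The class name and the noise values are assumptions picked for illustration. The comments give the standard predict and update equations.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter over the state x = (cx, cy, vx, vy), measuring (cx, cy)."""

    def __init__(self, cx, cy, dt=1.0, process_var=1.0, meas_var=10.0):
        self.x = np.array([cx, cy, 0.0, 0.0])  # initial state, zero velocity
        self.P = np.eye(4) * 100.0             # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],      # motion model: position += velocity * dt
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],       # measurement model: observe position only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var       # process noise covariance
        self.R = np.eye(2) * meas_var          # measurement noise covariance

    def predict(self):
        # x' = F x          P' = F P F^T + Q
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # y = z - H x'      S = H P' H^T + R      K = P' H^T S^-1
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        # x = x' + K y      P = (I - K H) P'
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]


# A center moving 2 px/frame in x and 1 px/frame in y, measured without noise.
kf = ConstantVelocityKalman(cx=0.0, cy=0.0)
for t in range(1, 21):
    kf.predict()
    kf.update([2.0 * t, 1.0 * t])
est_vx, est_vy = kf.x[2], kf.x[3]
print(est_vx, est_vy)
```

When a detection is missing for a frame, calling only predict() carries the track forward on its estimated velocity, which is what lets a tracker bridge short occlusions.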
Test video for submission.
The test video results shown at the beginning of this page were generated using this implementation. You may also consult this code, especially for the superposition part.
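To illustrate what the Hungarian algorithm contributes to the tracker, the sketch below matches predicted track boxes to new detections by solving a minimum-cost assignment with SciPy's linear_sum_assignment. It uses 1 - IoU as the only cost, which is a simplification: Deep SORT, as described in the papers, also uses motion gating and an appearance distance. The function names and the 0.3 minimum IoU are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(tracks, detections):
    """Pairwise IoU between track boxes and detection boxes, as (x1, y1, x2, y2)."""
    t = np.asarray(tracks, dtype=float).reshape(-1, 4)
    d = np.asarray(detections, dtype=float).reshape(-1, 4)
    x1 = np.maximum(t[:, None, 0], d[None, :, 0])
    y1 = np.maximum(t[:, None, 1], d[None, :, 1])
    x2 = np.minimum(t[:, None, 2], d[None, :, 2])
    y2 = np.minimum(t[:, None, 3], d[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_t = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    area_d = (d[:, 2] - d[:, 0]) * (d[:, 3] - d[:, 1])
    return inter / (area_t[:, None] + area_d[None, :] - inter + 1e-9)


def associate(tracks, detections, min_iou=0.3):
    """Return (matches, unmatched_track_ids, unmatched_detection_ids)."""
    cost = 1.0 - iou_matrix(tracks, detections)
    rows, cols = linear_sum_assignment(cost)  # minimizes the total cost
    # Reject assigned pairs whose overlap is too small to be the same object.
    matches = [(int(r), int(c)) for r, c in zip(rows, cols)
               if cost[r, c] <= 1.0 - min_iou]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


# Two existing tracks; three detections arriving in a different order,
# the last of which overlaps no track.
tracks = [[100, 100, 200, 300], [400, 120, 480, 320]]
detections = [[405, 125, 483, 322], [102, 104, 203, 302], [700, 50, 720, 70]]
matches, lost_tracks, new_dets = associate(tracks, detections)
print(matches, lost_tracks, new_dets)
```

In a full tracker, matched pairs would feed the Kalman update, unmatched detections would start new tracks, and unmatched tracks would age until they are deleted.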
All YouTube videos can be downloaded using the pytube library.