Assignment 3
Points:
Task 1: 20 points
Task 2: 20 points
Task 3: 20 points
Task 4: 40 points
Task 5: 25 points (extra credit) - adds 25 points to the total Assignments grade. This credit cannot be transferred to exams or the project.
Visual Inertial Odometry (VIO) and SLAM
Note that some tasks involve ROS and others do not. Those that do have ROS in the task title.
Introduction
VIO is a technique used in robotics and computer vision to estimate the position and orientation of a camera or sensor in 3D space by combining visual information from images with inertial measurements from sensors like Inertial Measurement Units (IMUs). This approach is particularly useful in scenarios where GPS signals are weak or unavailable, such as indoors or in urban environments. VIO is often integrated with SLAM (Simultaneous Localization and Mapping) to build a map of the environment while simultaneously tracking the camera’s position within that map.
Task 1: Camera Calibration (ROS)
To implement VIO the robot needs a camera sensor, and that camera needs to be calibrated. The simulated camera obviously does not need to be calibrated, but we still want you to understand the process, so you will calibrate a real camera, such as your laptop's built-in camera.
The calibration process typically involves capturing multiple images of a known calibration pattern, such as a checkerboard, from different angles and distances. The images are then processed to extract feature points, and the camera parameters are estimated using optimization techniques. The result is an estimate of the camera's intrinsic parameters - focal length, optical center, and lens distortion coefficients - which can be used to correct lens distortion and accurately map 3D points in the world to 2D image coordinates.
Study sections 13.1 and 13.2 of Peter Corke’s textbook “Robotics, Vision and Control” to understand the camera calibration process. Print out the checkerboard, attach it to a piece of cardboard, and take images of it from different angles and distances as demonstrated in Fig. 13.11. You can use the book’s GitHub repo and the underlying code of the chapter 13 notebook to do so. We want to see your own images rather than the images included in the book (which come from OpenCV).
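If you prefer to work directly with OpenCV rather than the book's toolbox, the sketch below shows the usual checkerboard workflow. The board size (9x6 inner corners), square size (25 mm), and image directory are assumptions - adjust them to match your printout and capture setup.

```python
# Minimal checkerboard calibration sketch with plain OpenCV.
# Assumptions (adjust to your setup): 9x6 inner corners, 25 mm squares,
# images saved as ./calib_images/*.jpg from your laptop camera.
import glob
import cv2
import numpy as np

pattern_size = (9, 6)   # inner corners per row, per column
square_size = 0.025     # meters

# 3D coordinates of the corners in the board frame (z = 0 plane)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_points, img_points = [], []
image_size = None

for fname in glob.glob("calib_images/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
    obj_points.append(objp)
    img_points.append(corners)

# K is the 3x3 intrinsic matrix (fx, fy, cx, cy); dist holds the distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("RMS reprojection error:", rms)
print("Intrinsic matrix K:\n", K)
print("Distortion coefficients:", dist.ravel())
```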
In practice the intrinsic parameters are programmed into ROS 2 via the camera_info topic. Check whether these are set to non-ideal values - in simulation they are typically set to ideal values (e.g., zero distortion).
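To inspect what the simulated camera publishes, you can echo the topic (ros2 topic echo /camera/camera_info) or subscribe from a small node like the sketch below. The topic name /camera/camera_info is an assumption - use whatever your robot's camera plugin actually publishes.

```python
# Minimal rclpy node that prints the intrinsics published on camera_info.
# The topic name '/camera/camera_info' is an assumption; check `ros2 topic list`.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import CameraInfo


class CameraInfoEcho(Node):
    def __init__(self):
        super().__init__("camera_info_echo")
        self.create_subscription(
            CameraInfo, "/camera/camera_info", self.callback, 10)

    def callback(self, msg):
        # msg.k is the flattened 3x3 intrinsic matrix, msg.d the distortion coefficients
        self.get_logger().info(f"K = {list(msg.k)}")
        self.get_logger().info(f"D = {list(msg.d)} (model: {msg.distortion_model})")


def main():
    rclpy.init()
    rclpy.spin(CameraInfoEcho())


if __name__ == "__main__":
    main()
```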
Task 2: Apply Camera Calibration (ROS)
In the maze environment, place objects on tables (you can reuse the objects from the previous assignment) and use the camera to estimate the relative pose of the objects. You can use the book’s functions in section 13.2.1, which wrap OpenCV’s solvePnP function, to estimate the pose of an object given its 3D coordinates in the world and its corresponding 2D image coordinates.
You need the geometry of your created objects, such as a cube, to achieve this task. Alternatively, feel free to place fiducial markers on the objects and use the aruco library to estimate the pose of the objects.
Establish a correspondence between a world coordinate system and the fixed objects in the scene. Use the relative pose estimate to estimate the pose of the camera in the world coordinate system. The camera pose is represented by a rotation matrix and a translation vector, which can be obtained from the extrinsic parameters of the camera.
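A minimal sketch of the pose-estimation step with cv2.solvePnP is given below. The cube size, the intrinsics, and the pixel coordinates are placeholders - substitute your Task 1 calibration and the corners you actually detect (manually, or via an ArUco/checkerboard detector).

```python
# Object pose from known 3D geometry and its 2D image projections, then
# camera pose in the object/world frame. All numbers below are placeholders.
import cv2
import numpy as np

# Intrinsics from Task 1 (placeholder values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # use your estimated distortion coefficients

# Four top corners of a 10 cm cube, expressed in the object's own frame.
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0],
                          [0.0, 0.1, 0.0]], dtype=np.float32)

# Corresponding pixel coordinates detected in the image (placeholders).
image_points = np.array([[310.0, 250.0],
                         [420.0, 255.0],
                         [415.0, 350.0],
                         [305.0, 345.0]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)   # rotation of the object frame w.r.t. the camera

# Invert the transform to get the camera pose in the object (world) frame.
R_cam = R.T
t_cam = -R.T @ tvec
print("Camera position in world frame:", t_cam.ravel())
```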
Task 3: Showcase StellaVSLAM
StellaVSLAM (a well-maintained fork of OpenVSLAM) is a flexible visual SLAM system that supports multiple camera models and is designed for monocular, stereo, and RGB-D inputs. It uses Oriented FAST and Rotated BRIEF (ORB) features to detect, track, and map the environment in 3D. It builds and maintains a pose graph and a map of landmarks while estimating the camera’s trajectory. The aim here is to integrate the Python bindings and replicate the mapping and localization process using Python. Note that C++ is commonly used in real-time robotics applications and therefore in StellaVSLAM; you don’t need to become an expert in C++ to perform this task.
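To get a feel for the ORB features StellaVSLAM relies on, the short sketch below detects and matches ORB keypoints between two frames using OpenCV; the file names are placeholders.

```python
# Detect and match ORB features between two consecutive frames (file names are placeholders).
import cv2

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (ORB descriptors are binary).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)
cv2.imwrite("orb_matches.png", vis)
print(f"{len(matches)} matches found")
```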
1. Clone the StellaVSLAM repository and follow the instructions that allow you to launch the Dockerfile.Desktop container with the PangolinViewer.
2. Follow the instructions in the StellaVSLAM documentation to run the example with the Equirectangular Datasets and the aist_living_lab_1 video. Note that you have to download the data first on your host and map the directory after the -v flag. The video at the top of this page shows the end result.
3. Repeat the same for a video of your own indoor space (room, dorm, lab) using the laptop webcam - if you have a desktop you can use a USB-connected web camera. You need to follow these instructions and you also need to use the calibration parameters you obtained in Task 1. These parameters are included in example/aist/equirectangular.yaml for the camera that shot the video in step 2; you need to create a corresponding file for your camera.
4. Publish a demo video on your YouTube channel explaining and showcasing step 3.
Task 4: Understand VSLAM
To fully understand how VSLAM works, you need to follow the two videos below and implement a VSLAM algorithm in Python. You can use any Python library to do this, including OpenCV, PyTorch, RVC3Python, or FilterPy.
For this task you can elect to work with the Python bindings of StellaVSLAM, but this means that you need to understand the associated C++ code first and describe in your report what each called C++ function does. Working with the aforementioned libraries is preferable if you just want to show the principles of VSLAM without following the specific algorithmic choices made by StellaVSLAM.
You will use the same KITTI dataset for this task as the one used in the videos. The dataset is available here. You can use the kitti library to download the dataset and convert it to a format that can be used with StellaVSLAM.
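As a starting point for the principles (not a full VSLAM system - there is no mapping, loop closure, or scale recovery here), the sketch below chains two-view relative poses from ORB matches into a monocular visual odometry trajectory on a KITTI sequence. The sequence path and intrinsics are placeholders; take the real intrinsics from the KITTI calibration files of the sequence you download.

```python
# Minimal monocular visual odometry sketch (frame-to-frame, no mapping or loop closure).
# The sequence path and intrinsic matrix are placeholders; use the values from the
# KITTI calibration files for your sequence.
import glob
import cv2
import numpy as np

K = np.array([[718.856, 0.0, 607.1928],
              [0.0, 718.856, 185.2157],
              [0.0, 0.0, 1.0]])          # placeholder KITTI-like intrinsics

orb = cv2.ORB_create(nfeatures=3000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

frames = sorted(glob.glob("kitti/sequences/00/image_0/*.png"))
R_total, t_total = np.eye(3), np.zeros((3, 1))
trajectory = [t_total.ravel().copy()]

prev = cv2.imread(frames[0], cv2.IMREAD_GRAYSCALE)
kp_prev, des_prev = orb.detectAndCompute(prev, None)

for fname in frames[1:]:
    img = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    kp, des = orb.detectAndCompute(img, None)
    matches = matcher.match(des_prev, des)

    pts_prev = np.float32([kp_prev[m.queryIdx].pt for m in matches])
    pts_cur = np.float32([kp[m.trainIdx].pt for m in matches])

    # Two-view geometry: essential matrix + cheirality check give R, t (t only up to scale).
    E, mask = cv2.findEssentialMat(pts_cur, pts_prev, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts_cur, pts_prev, K, mask=mask)

    # Chain the relative motion into a global (scale-ambiguous) trajectory.
    t_total = t_total + R_total @ t
    R_total = R_total @ R
    trajectory.append(t_total.ravel().copy())

    kp_prev, des_prev = kp, des

np.savetxt("trajectory.txt", np.array(trajectory))
print(f"Estimated {len(trajectory)} poses")
```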
Task 5: Integrate StellaVSLAM into ROS (Extra Credit)
StellaVSLAM can be integrated into ROS and help with navigation. Integrate StellaVSLAM and showcase that it can work with the nav2 stack to help the robot navigate in the maze. This will be used in a class setting, so keep notes in a tutorial-like format that show how the principles of Recursive State Estimation (RSE), with the help of additional algorithms, achieve a globally consistent localization estimate without a priori knowledge of the maze map.
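As a reminder of the RSE idea your notes should connect to, the sketch below runs a one-dimensional Kalman filter predict/update cycle (a Gaussian instance of the Bayes filter); the noise values and measurements are made up for illustration.

```python
# One-dimensional Kalman filter as a minimal example of Recursive State Estimation:
# alternate a motion ("predict") step with a measurement ("update") step.
# All noise values and measurements below are made up for illustration.
x, P = 0.0, 1.0          # initial belief: mean and variance of the robot's position
Q, R = 0.05, 0.2         # motion noise and measurement noise variances

controls = [1.0, 1.0, 1.0]          # commanded forward motion per step
measurements = [0.9, 2.1, 2.95]     # noisy position readings (e.g., from VSLAM)

for u, z in zip(controls, measurements):
    # Predict: incorporate the motion command, uncertainty grows.
    x, P = x + u, P + Q
    # Update: fuse the measurement, uncertainty shrinks.
    K = P / (P + R)
    x, P = x + K * (z - x), (1 - K) * P
    print(f"belief: mean={x:.3f}, variance={P:.3f}")
```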