Behavioral Cloning with Multimodal Models

PhD-level students who commit to working the equivalent of 20 working days (one full month) after the course ends, during June-July 2025, towards a publication or demonstration of ROS-based CUAs (Computer-Using Agents) will receive 20 credits in the “projects and assignments” grading category. These credits cannot be transferred to other grading categories such as exams. Work on the extra-credit task will be done as a team, and we need three committed team members.

Understand and apply VideoLlama3 (paper, code) to develop Computer-Using Agents trained from multimodal human demonstrations (behavioral cloning). We need to develop the reinforcement learning pipeline for a task that will be announced later.

In this project you will be exposed to the engineering of Computer-Using Agents (CUAs) that can operate Line of Business (LoB) applications much as a human would. The CUA will be trained from multimodal human or synthetic inputs and from human demonstrations (behavioral cloning). The computer tasks are associated with the Robotics LoB applications; in the absence of a real robot LoB application, we will use the ROS toolchain.
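At its core, behavioral cloning treats recorded demonstrations as supervised (observation, action) pairs and fits a policy that maximizes the likelihood of the demonstrated actions. The sketch below shows the simplest discrete (tabular) case, where the maximum-likelihood policy is just the empirical action frequency per observation. It is an illustration only, not VideoLlama3 training code: in the project the policy would be a multimodal model consuming screenshots and instructions. The `Step` schema, the string-encoded observations, and action labels such as `"type:roscore"` are hypothetical placeholders.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded demonstration step (hypothetical schema)."""
    observation: str  # stand-in for an encoded screenshot / UI state
    action: str       # stand-in for a UI action, e.g. "type:roscore"


def fit_tabular_policy(demos):
    """Maximum-likelihood behavioral cloning for discrete observations.

    Returns {observation: {action: probability}}, where
    pi(a | s) = count(s, a) / count(s) over all demonstration steps.
    """
    counts = defaultdict(Counter)
    for episode in demos:
        for step in episode:
            counts[step.observation][step.action] += 1
    policy = {}
    for obs, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[obs] = {a: n / total for a, n in action_counts.items()}
    return policy


def act(policy, observation, default="noop"):
    """Pick the most likely demonstrated action; fall back for unseen states."""
    dist = policy.get(observation)
    if not dist:
        return default
    return max(dist, key=dist.get)


# Usage: three toy demonstrations of starting a ROS session (hypothetical).
demos = [
    [Step("desktop", "open:terminal"), Step("terminal", "type:roscore")],
    [Step("desktop", "open:terminal"), Step("terminal", "type:roslaunch")],
    [Step("desktop", "open:terminal"), Step("terminal", "type:roscore")],
]
policy = fit_tabular_policy(demos)
print(act(policy, "terminal"))  # prints "type:roscore"
print(act(policy, "browser"))   # prints "noop" (state never demonstrated)
```

The tabular estimate is the exact minimizer of the same negative log-likelihood (cross-entropy) objective that is optimized by gradient descent when a neural policy is fine-tuned on demonstrations. The fallback for unseen observations also hints at the main weakness of pure behavioral cloning: the agent has no guidance outside the states that humans demonstrated.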