Real-Time RAG Systems in Robotics

Introduction

Retrieval-Augmented Generation (RAG) is a recent paradigm for knowledge-intensive language tasks. It combines the strengths of retrieval-based and generation-based models: the system first retrieves relevant information from a large corpus and then conditions a generative model on it to produce a coherent, domain-specific response.
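The retrieve-then-generate loop described above can be sketched in a few lines. This is a minimal illustration, assuming a toy bag-of-words retriever with cosine similarity in place of a learned retriever; the names `retrieve` and `build_prompt` are hypothetical, and the generation step is left out, so the sketch stops at the augmented prompt that would be passed to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (toy bag-of-words retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the query with retrieved context; a generator LLM would answer from this prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."


corpus = [
    "Place the largest block at the bottom of the stack.",
    "The gripper must open fully before approaching a block.",
    "Charge the robot battery at the docking station.",
]
query = "Which block goes at the bottom of the stack?"
prompt = build_prompt(query, retrieve(query, corpus, k=1))
print(prompt)
```

Swapping `retrieve` for a dense-embedding retriever leaves `build_prompt` unchanged, which is the main appeal of keeping the two stages separate.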

In this project, we explore RAG systems for robotics applications and build two RAG systems with different personas:

A Robot RAG (R-RAG) helps robot agents perform domain-specific tasks, such as block stacking, more effectively. The R-RAG system augments the robot's existing capabilities by supplying additional background knowledge, sensory information, and context.
A Person RAG (P-RAG) understands the environment, including the robot agent, and responds to prompts from human coworkers such as “What is your block-stacking status right now?”
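One way to realize the two personas is to share the retrieval machinery and vary only the instructions and the context each persona receives. The sketch below assumes a hypothetical timestamped state log for the robot; `Persona`, `StateEvent`, `latest_events`, and `build_persona_prompt` are illustrative names, not part of an existing library. A simple recency filter stands in for retrieval so that a "right now" question is answered from current state rather than stale records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona config: whom the RAG system serves and how it frames context."""
    name: str
    audience: str
    instruction: str


# Hypothetical instructions for the two personas.
R_RAG = Persona("R-RAG", "robot",
                "Return background and sensory context that helps the robot complete its task.")
P_RAG = Persona("P-RAG", "human coworker",
                "Answer the coworker's question about the robot and its environment in plain language.")


@dataclass(frozen=True)
class StateEvent:
    """One timestamped observation of the robot or environment (assumed log format)."""
    timestamp: float  # seconds since start of episode
    text: str


def latest_events(log: list[StateEvent], now: float, window_s: float) -> list[StateEvent]:
    """Keep only events from the last `window_s` seconds, newest first."""
    recent = [e for e in log if now - window_s <= e.timestamp <= now]
    return sorted(recent, key=lambda e: e.timestamp, reverse=True)


def build_persona_prompt(persona: Persona, query: str, context: list[StateEvent]) -> str:
    """Assemble a persona-specific prompt from the query and the selected state events."""
    lines = "\n".join(f"[t={e.timestamp:.1f}s] {e.text}" for e in context)
    return f"{persona.name} ({persona.audience}): {persona.instruction}\nContext:\n{lines}\nQuery: {query}"


log = [
    StateEvent(10.0, "Picked up red block."),
    StateEvent(12.5, "Placed red block on blue block; stack height is 2."),
    StateEvent(3.0, "Calibrated gripper."),
]
prompt = build_persona_prompt(
    P_RAG,
    "What is your block-stacking status right now?",
    latest_events(log, now=13.0, window_s=5.0),
)
print(prompt)
```

Calling `build_persona_prompt` with `R_RAG` instead would package the same recent context for the robot's planner rather than for a coworker.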

The RAG system consists of three main components:

Milestone 0: Environment and Tooling
Ingestion Pipelines
Featurization Pipelines
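The featurization stage turns ingested text into vectors that the retriever can search. As a minimal sketch, assuming no external embedding model is available, the block below uses signed feature hashing (the "hashing trick") to produce fixed-size vectors and stores them in a brute-force in-memory index; a production system would more likely use a learned embedding model and an approximate nearest-neighbor index. `hashed_features` and `VectorIndex` are hypothetical names.

```python
import hashlib
import math
import re


def hashed_features(text: str, dim: int = 256) -> list[float]:
    """Map text to a fixed-size, L2-normalized vector using signed feature hashing."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable hash: Python's built-in hash() is salted per process for str.
        h = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:4], "little")
        # A pseudo-random sign per token keeps collisions from systematically inflating similarity.
        sign = -1.0 if (h >> 31) & 1 else 1.0
        vec[h % dim] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class VectorIndex:
    """Brute-force in-memory index that ranks stored chunks by cosine similarity."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim
        self.items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self.items.append((hashed_features(text, self.dim), text))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = hashed_features(query, self.dim)
        # Vectors are unit length (or all zeros), so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), text) for v, text in self.items]
        return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


docs = [
    "Place the largest block at the bottom of the stack.",
    "Open the gripper fully before approaching a block.",
    "Charge the battery at the docking station.",
]
index = VectorIndex()
for doc in docs:
    index.add(doc)
print(index.search("Which block goes at the bottom?", k=1))
```

Feature hashing needs no vocabulary or training step, which makes it a convenient placeholder while the rest of the pipeline is being built; replacing `hashed_features` with a learned embedding leaves `VectorIndex` unchanged.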

We ingest multiple media sources, such as ROS2 3D simulations, YouTube videos, PDF manuals, instructions, LLMs, and available datasets, to train the domain-specific LLMs behind these RAG systems.
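Whatever the media type, each source eventually has to be reduced to text chunks that carry enough metadata to trace a passage back to its origin, whether the chunks feed a retrieval index or a training dataset. The sketch below assumes that modality-specific extraction (PDF parsing, video transcription, exporting ROS2 simulation logs) has already produced plain text; `Chunk`, `chunk_words`, `ingest`, and the `source_type` labels are hypothetical. It splits text into overlapping fixed-size word windows so that content near a chunk boundary appears in two chunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A unit of text plus the metadata needed to trace it back to its source."""
    source_type: str  # assumed labels, e.g. "pdf_manual", "video_transcript", "ros2_sim_log"
    source_id: str
    index: int        # position of this chunk within its source
    text: str


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the previous window already reaches the end of the text.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def ingest(sources: dict[tuple[str, str], str], size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Normalize already-extracted text from heterogeneous sources into tagged chunks."""
    chunks = []
    for (source_type, source_id), text in sources.items():
        for i, piece in enumerate(chunk_words(text, size, overlap)):
            chunks.append(Chunk(source_type, source_id, i, piece))
    return chunks


sources = {
    ("pdf_manual", "gripper_manual"): "Open the gripper fully before approaching a block.",
    ("video_transcript", "stacking_demo"): "Place the largest block at the bottom.",
    ("ros2_sim_log", "episode_001"): "t=12.5 stack_height=2",
}
chunks = ingest(sources, size=4, overlap=1)
for c in chunks:
    print(c.source_type, c.source_id, c.index, repr(c.text))
```

The tiny `size` and `overlap` values here only keep the example readable; real chunk sizes would be tuned to the retriever and the model's context length.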

By supplying relevant context on demand, RAG enables the robot to perform complex tasks autonomously.