Vision Foundation Models for an embodiment and environment agnostic scene representation for robotic manipulation

Conference paper, Year: 2024

Abstract

Traditional Imitation Learning (IL) approaches often rely on teleoperation to collect training data, which keeps the action and observation spaces consistent between training and deployment. However, teleoperation slows data acquisition, distorts expert behavior, and yields data whose quality suffers when the operator lacks teleoperation skills. Training IL policies directly on human demonstrations avoids these limitations, but it requires visual representations that are agnostic to both embodiment and environment. Recent Vision Foundation Models, such as Grounded-Segment-Anything (Grounded-SAM), offer a solution: they extract meaningful scene information and filter out irrelevant details without manual annotation. In this work, we collected 50 human video demonstrations of a manipulation task from the RLBench benchmark. We evaluated Grounded-SAM's ability to automatically annotate objects of interest and proposed a 3D visual representation using depth maps. This representation was used to train a diffusion policy, which successfully generalized to simulated robot deployment in RLBench despite being trained exclusively on real-world human demonstrations. Our results demonstrate that efficient training can be achieved with just 50 demonstrations and half an hour of training time.
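
The abstract does not detail how the depth maps and Grounded-SAM annotations are combined. As a rough illustration only, the sketch below lifts the depth pixels inside a segmentation mask to 3D camera-frame points with a standard pinhole camera model. The function name `backproject_masked_depth`, the boolean-grid mask format, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_masked_depth(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift masked depth pixels to 3D points in the camera frame.

    Hypothetical helper (not the authors' code). Assumes a pinhole camera
    with focal lengths (fx, fy) and principal point (cx, cy), a depth map
    in metric units, and a boolean object mask of the same shape.
    """
    points: List[Point3D] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the object mask or without valid depth.
            if not keep or z <= 0:
                continue
            # Standard pinhole back-projection of pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 example: only the two masked pixels with valid depth are kept.
toy_depth = [[2.0, 2.0], [0.0, 2.0]]
toy_mask = [[True, False], [True, True]]
toy_points = backproject_masked_depth(toy_depth, toy_mask, 1.0, 1.0, 0.0, 0.0)
```

Because only masked object pixels are kept, the resulting points carry no information about the demonstrator's hand or the background, which is one plausible way to obtain the embodiment and environment agnosticism the abstract describes.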

Main file
iros_workshop_2024-2.pdf (1.77 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04751375, version 1 (24-10-2024)

Identifiers

  • HAL Id: hal-04751375, version 1

Cite

Kevin Riou, Kevin Subrin, Patrick Le Callet. Vision Foundation Models for an embodiment and environment agnostic scene representation for robotic manipulation. International Conference on Intelligent Robots and Systems (IROS), on Brain over Brawn Workshop (BoB) (https://bob-workshop.github.io/), Oct 2024, Abu Dhabi, United Arab Emirates. ⟨hal-04751375⟩