Improving the generalization capabilities of general-purpose robotic agents
has long been a significant challenge actively pursued by the research community.
Existing approaches often rely on collecting large-scale real-world robotic
data, such as the RT-1 dataset. However, these approaches typically suffer from
low data efficiency, which limits their applicability in open-domain scenarios with
novel objects and diverse backgrounds. In this paper, we propose a novel paradigm
that effectively leverages language-grounded segmentation masks generated by
state-of-the-art foundation models to address a wide range of pick-and-place
robot manipulation tasks in everyday scenarios. By integrating the precise
semantics and geometry conveyed by these masks into our multi-view policy model,
our approach perceives accurate object poses and enables sample-efficient
learning. Moreover, this design facilitates generalization to grasping novel
objects whose shapes resemble those observed during training. Our approach
consists of two steps. First, we introduce a series of foundation
models to accurately ground natural-language instructions across multiple tasks.
Second, we develop a Multi-modal Multi-view Policy Model that incorporates
RGB images, semantic masks, and robot proprioceptive states as inputs to
jointly predict precise and executable robot actions. Extensive real-world
experiments conducted on a Franka Emika robot arm validate the effectiveness of
our proposed paradigm. Real-world demos are available on YouTube
(https://www.youtube.com/watch?v=1m9wNzfp_4E) and Bilibili
(https://www.bilibili.com/video/BV178411Z7H2/).
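As a loose illustration of the policy interface described above, the following is a minimal sketch of a multi-modal multi-view policy that consumes per-view RGB images, segmentation masks, and a proprioceptive state vector to predict an action. This is not the authors' implementation: the module structure, feature dimensions, concatenation-based fusion, and the 7-DoF action head are all assumptions for illustration.

```python
# Minimal sketch of a multi-modal multi-view policy (illustrative only).
# Architecture choices below -- encoder sizes, fusion by concatenation,
# a 7-DoF action head -- are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class MultiViewPolicy(nn.Module):
    def __init__(self, num_views: int = 2, proprio_dim: int = 8, action_dim: int = 7):
        super().__init__()
        # Shared CNN encoder over a 4-channel input: RGB (3) + semantic mask (1).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64) per view
        )
        # MLP head fusing all view features with the proprioceptive state.
        self.head = nn.Sequential(
            nn.Linear(64 * num_views + proprio_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, rgb, mask, proprio):
        # rgb: (B, V, 3, H, W), mask: (B, V, 1, H, W), proprio: (B, proprio_dim)
        x = torch.cat([rgb, mask], dim=2)             # stack mask as a 4th channel
        feats = [self.encoder(x[:, v]) for v in range(x.shape[1])]
        fused = torch.cat(feats + [proprio], dim=-1)  # concatenate views + state
        return self.head(fused)                       # predicted action


# Usage example with random tensors standing in for camera observations.
policy = MultiViewPolicy()
rgb = torch.rand(1, 2, 3, 128, 128)
mask = torch.rand(1, 2, 1, 128, 128)
proprio = torch.rand(1, 8)
action = policy(rgb, mask, proprio)  # shape: (1, 7)
```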