Deep Object-Centric Representations for Generalizable Robot Learning
Robotic manipulation in complex open-world scenarios requires both reliable
physical manipulation skills and effective and generalizable perception. In
this paper, we propose a method in which general-purpose pretrained visual models
serve as an object-centric prior for the perception system of a learned policy.
We devise an object-level attentional mechanism that can be used to determine
relevant objects from a few trajectories or demonstrations, and then
immediately incorporate those objects into a learned policy. A task-independent
meta-attention locates possible objects in the scene, and a task-specific
attention identifies which objects are predictive of the trajectories. The
scope of the task-specific attention is easily adjusted by showing
demonstrations with distractor objects or with diverse relevant objects. Our
results indicate that this approach exhibits good generalization across object
instances using very few samples, and can be used to learn a variety of
manipulation tasks using reinforcement learning.
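To make the two-stage attention concrete, here is a minimal PyTorch sketch. The class name, feature dimension, and pooling scheme are illustrative assumptions; the abstract does not specify the architecture.

```python
# Hedged sketch: a task-specific attention that pools candidate object
# features by learned relevance. Names and dimensions are assumptions.
import torch
import torch.nn as nn

class TaskSpecificAttention(nn.Module):
    """Weights candidate object features by task relevance and pools them."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Learned query standing in for "which objects are predictive
        # of the trajectories".
        self.task_query = nn.Parameter(torch.randn(feat_dim))

    def forward(self, object_feats: torch.Tensor) -> torch.Tensor:
        # object_feats: (num_objects, feat_dim), e.g., crops proposed by a
        # task-independent meta-attention over a pretrained visual model.
        scores = object_feats @ self.task_query           # (num_objects,)
        weights = torch.softmax(scores, dim=0)            # relevance weights
        return (weights.unsqueeze(-1) * object_feats).sum(dim=0)

# The pooled feature would feed the learned policy network.
pooled = TaskSpecificAttention()(torch.randn(5, 256))    # 5 candidate objects
```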
Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions
Perceiving and manipulating 3D articulated objects in diverse environments is
essential for home-assistant robots. Recent studies have shown that point-level
affordance provides actionable priors for downstream manipulation tasks.
However, existing works primarily focus on single-object scenarios with
homogeneous agents, overlooking the realistic constraints imposed by the
environment and the agent's morphology, e.g., occlusions and physical
limitations. In this paper, we propose an environment-aware affordance
framework that incorporates both object-level actionable priors and environment
constraints. Unlike object-centric affordance approaches, learning
environment-aware affordance faces the challenge of combinatorial explosion due
to the complexity of various occlusions, characterized by their quantities,
geometries, positions and poses. To address this and enhance data efficiency,
we introduce a novel contrastive affordance learning framework capable of
training on scenes containing a single occluder and generalizing to scenes with
complex occluder combinations. Experiments demonstrate the effectiveness of our
proposed approach in learning affordance considering environment constraints.
Project page at https://chengkaiacademycity.github.io/EnvAwareAfford/
Comment: In 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
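As a rough illustration of the contrastive training idea (train on single-occluder scenes, generalize to occluder combinations), here is a hedged InfoNCE-style sketch in PyTorch. The pairing scheme, temperature, and tensor shapes are assumptions, not the paper's exact loss.

```python
# Hedged sketch of contrastive affordance learning under occlusion.
import torch
import torch.nn.functional as F

def contrastive_affordance_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss: a point's affordance embedding computed in a
    single-occluder scene (positive) is pulled toward its embedding in
    the occluder-free scene (anchor) and pushed away from negatives."""
    anchor = F.normalize(anchor, dim=-1)        # (D,)
    positive = F.normalize(positive, dim=-1)    # (D,)
    negatives = F.normalize(negatives, dim=-1)  # (N, D)
    logits = torch.cat([(anchor * positive).sum().view(1),
                        negatives @ anchor]) / tau
    # The positive pair sits at index 0 of the logits.
    return F.cross_entropy(logits.unsqueeze(0),
                           torch.zeros(1, dtype=torch.long))

loss = contrastive_affordance_loss(torch.randn(64), torch.randn(64),
                                   torch.randn(16, 64))
```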
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
It is a long-standing problem in robotics to develop agents capable of
executing diverse manipulation tasks from visual observations in unstructured
real-world environments. To achieve this goal, the robot needs to have a
comprehensive understanding of the 3D structure and semantics of the scene. In
this work, we present GNFactor, a visual behavior cloning agent for
multi-task robotic manipulation with Generalizable Neural Feature
Fields. GNFactor jointly optimizes a generalizable neural
field (GNF) as a reconstruction module and a Perceiver Transformer as a
decision-making module, leveraging a shared deep 3D voxel representation. To
incorporate semantics in 3D, the reconstruction module utilizes a
vision-language foundation model (e.g., Stable Diffusion) to distill
rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3
real robot tasks and perform detailed ablations on 10 RLBench tasks with a
limited number of demonstrations. We observe a substantial improvement of
GNFactor over current state-of-the-art methods in seen and unseen tasks,
demonstrating the strong generalization ability of GNFactor. Our project
website is https://yanjieze.com/GNFactor/ .
Comment: CoRL 2023 Oral.
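The core design is a shared voxel feature consumed by both a reconstruction head and a policy head, trained under a joint objective. Below is a hedged PyTorch sketch; the layer sizes, the 10-channel voxel input, the 0.01 reconstruction weight, and the placeholder targets are all assumptions, and the real GNF renders views via volume rendering and distills Stable Diffusion features, which is elided here.

```python
# Hedged sketch of GNFactor's joint optimization over a shared voxel feature.
import torch
import torch.nn as nn

class GNFactorSketch(nn.Module):
    """Shared voxel encoder feeding a reconstruction head and a policy head."""
    def __init__(self, in_ch=10, voxel_dim=64, action_dim=8):
        super().__init__()
        self.voxel_encoder = nn.Sequential(
            nn.Conv3d(in_ch, voxel_dim, 3, padding=1), nn.ReLU())
        self.recon_head = nn.Conv3d(voxel_dim, 4, 1)   # stands in for the GNF
        self.policy_head = nn.Sequential(nn.Flatten(), nn.LazyLinear(action_dim))

    def forward(self, voxels):
        shared = self.voxel_encoder(voxels)   # the shared deep 3D voxel feature
        return self.policy_head(shared), self.recon_head(shared)

model = GNFactorSketch()
action_logits, recon = model(torch.randn(1, 10, 16, 16, 16))
# Behavior-cloning term (placeholder target) plus weighted reconstruction term.
loss = nn.functional.mse_loss(action_logits, torch.zeros_like(action_logits)) \
     + 0.01 * recon.pow(2).mean()
loss.backward()
```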
Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies
Grasping objects of different shapes and sizes - a foundational, effortless
skill for humans - remains a challenging task in robotics. Although model-based
approaches can predict stable grasp configurations for known object models,
they struggle to generalize to novel objects and often operate in a
non-interactive open-loop manner. In this work, we present a reinforcement
learning framework that learns the interactive grasping of various
geometrically distinct real-world objects by continuously controlling an
anthropomorphic robotic hand. We explore several explicit representations of
object geometry as input to the policy. Moreover, we propose to inform the
policy implicitly through signed distances and show that this is naturally
suited to guide the search through a shaped reward component. Finally, we
demonstrate that the proposed framework is able to learn even in more
challenging conditions, such as targeted grasping from a cluttered bin.
Necessary pre-grasping behaviors such as object reorientation and utilization
of environmental constraints emerge in this case. Videos of learned interactive
policies are available at
https://maltemosbach.github.io/geometry_aware_grasping_policies
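To illustrate how signed distances can shape the reward, here is a hedged NumPy sketch. The clipping thresholds, weights, and sphere SDF are invented for illustration and are not the paper's reward.

```python
# Hedged sketch: reward shaping from a signed distance function (SDF).
import numpy as np

def shaped_reward(fingertip_positions, object_sdf, grasp_success: bool) -> float:
    """Pull fingertips toward (but not into) the object surface, as
    measured by an SDF, plus a sparse success bonus."""
    dists = np.array([object_sdf(p) for p in fingertip_positions])
    proximity = -np.clip(dists, 0.0, 0.2).mean()       # closer is better
    penetration = -np.clip(-dists, 0.0, None).sum()    # penalize interpenetration
    return proximity + 0.5 * penetration + (10.0 if grasp_success else 0.0)

# Example with a sphere SDF of radius 5 cm centered at the origin.
sphere_sdf = lambda p: np.linalg.norm(p) - 0.05
r = shaped_reward([np.array([0.06, 0.0, 0.0]),
                   np.array([0.0, 0.07, 0.0])], sphere_sdf, False)
```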
The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation
In policy learning for robotic manipulation, sample efficiency is of
paramount importance. Thus, learning and extracting more compact
representations from camera observations is a promising avenue. However,
current methods often assume full observability of the scene and struggle with
scale invariance. In many tasks and settings, this assumption does not hold as
objects in the scene are often occluded or lie outside the field of view of the
camera, rendering the camera observation ambiguous with regard to their
location. To tackle this problem, we present BASK, a Bayesian approach to
tracking scale-invariant keypoints over time. Our approach successfully
resolves inherent ambiguities in images, enabling keypoint tracking on
symmetrical objects and occluded and out-of-view objects. We employ our method
to learn challenging multi-object robot manipulation tasks from wrist camera
observations and demonstrate superior utility for policy learning compared to
other representation learning techniques. Furthermore, we show outstanding
robustness towards disturbances such as clutter, occlusions, and noisy depth
measurements, as well as generalization to unseen objects both in simulation
and real-world robotic experiments.
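The Bayesian core of such keypoint tracking is a recursive belief update over candidate locations, which lets ambiguity persist while a keypoint is occluded or out of view. Below is a minimal sketch over a discrete grid; the grid, the random stand-in likelihood, and the update form are illustrative assumptions, not BASK's actual formulation.

```python
# Hedged sketch: recursive Bayesian belief over candidate keypoint locations.
import numpy as np

def bayes_update(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """Fuse a per-cell observation likelihood (e.g., descriptor similarity
    from the current wrist-camera frame) into the posterior."""
    posterior = prior * likelihood
    return posterior / posterior.sum()

belief = np.full(100, 1.0 / 100)        # uniform prior over 100 grid cells
for _ in range(5):                      # each frame sharpens the belief
    likelihood = np.random.rand(100)    # stand-in for descriptor matching
    belief = bayes_update(belief, likelihood)
keypoint_cell = belief.argmax()         # most probable keypoint location
```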