Deep Projective Rotation Estimation through Relative Supervision
Orientation estimation lies at the core of a variety of vision and robotics
tasks such as camera and object pose estimation. Deep learning offers a way to
develop image-based orientation estimators; however, such estimators often
require training on a large labeled dataset, which can be time-intensive to
collect. In this work, we explore whether self-supervised learning from
unlabeled data can be used to alleviate this issue. Specifically, we assume
access to estimates of the relative orientation between neighboring poses, such
as those obtained via a local alignment method. While self-supervised
learning has been used successfully for translational object keypoints, in this
work, we show that naively applying relative supervision to the rotation
group SO(3) will often fail to converge due to the non-convexity of the
rotation space. To tackle this challenge, we propose a new algorithm for
self-supervised orientation estimation which utilizes Modified Rodrigues
Parameters to stereographically project the closed manifold of SO(3) to the
open manifold of R^3, allowing the optimization to be done in an
open Euclidean space. We empirically validate the benefits of the proposed
algorithm for the rotation averaging problem in two settings: (1) direct
optimization on rotation parameters, and (2) optimization of parameters of a
convolutional neural network that predicts object orientations from images. In
both settings, we demonstrate that our proposed algorithm is able to converge
to a consistent relative orientation frame much faster than algorithms that
purely operate in the SO(3) space. Additional information can be found at
https://sites.google.com/view/deep-projective-rotation/home.
Comment: Conference on Robot Learning (CoRL), 2022. Supplementary material is
available at https://sites.google.com/view/deep-projective-rotation/home
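
As a minimal sketch of the projection described above: Modified Rodrigues
Parameters stereographically map a unit quaternion (the double cover of SO(3))
to a point in open R^3, where unconstrained optimization is possible. Function
names here are illustrative, not the authors' API.

import numpy as np

def quat_to_mrp(q):
    """Stereographic projection of a unit quaternion q = (w, x, y, z).

    Projects from the pole w = -1, so it is undefined for 2*pi rotations.
    """
    return q[1:] / (1.0 + q[0])

def mrp_to_quat(p):
    """Inverse projection back to a unit quaternion."""
    s = np.dot(p, p)
    return np.concatenate(([(1.0 - s) / (1.0 + s)], 2.0 * p / (1.0 + s)))

# Round-trip check on a rotation about the x-axis.
q = np.array([np.cos(0.3), np.sin(0.3), 0.0, 0.0])
assert np.allclose(mrp_to_quat(quat_to_mrp(q)), q)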
TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
How do we imbue robots with the ability to efficiently manipulate unseen
objects and transfer relevant skills based on demonstrations? End-to-end
learning methods often fail to generalize to novel objects or unseen
configurations. Instead, we focus on the task-specific pose relationship
between relevant parts of interacting objects. We conjecture that this
relationship is a generalizable notion of a manipulation task that can transfer
to new objects in the same category; examples include the relationship between
the pose of a pan relative to an oven or the pose of a mug relative to a mug
rack. We call this task-specific pose relationship "cross-pose" and provide a
mathematical definition of this concept. We propose a vision-based system that
learns to estimate the cross-pose between two objects for a given manipulation
task using learned cross-object correspondences. The estimated cross-pose is
then used to guide a downstream motion planner to manipulate the objects into
the desired pose relationship (placing a pan into the oven or the mug onto the
mug rack). We demonstrate our method's capability to generalize to unseen
objects, in some cases after training on only 10 demonstrations in the real
world. Results show that our system achieves state-of-the-art performance in
both simulated and real-world experiments across a number of tasks.
Supplementary information and videos can be found at
https://sites.google.com/view/tax-pose/home.
Comment: Conference on Robot Learning (CoRL), 2022. Supplementary material is
available at https://sites.google.com/view/tax-pose/home
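
Turning learned cross-object correspondences into a rigid cross-pose requires
a pose solver; a standard choice for this step is weighted least-squares
alignment (Procrustes) via SVD. The sketch below is a generic numpy
implementation of that solver, not the paper's exact pipeline.

import numpy as np

def weighted_procrustes(src, tgt, w):
    """Find R, t minimizing sum_i w[i] * ||R @ src[i] + t - tgt[i]||^2.

    src, tgt: (N, 3) corresponding points; w: (N,) non-negative weights.
    """
    w = w / w.sum()
    mu_src = (w[:, None] * src).sum(0)            # weighted centroids
    mu_tgt = (w[:, None] * tgt).sum(0)
    H = (w[:, None] * (src - mu_src)).T @ (tgt - mu_tgt)  # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_tgt - R @ mu_src
    return R, t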
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
We present Universal Manipulation Interface (UMI) -- a data collection and
policy learning framework that allows direct skill transfer from in-the-wild
human demonstrations to deployable robot policies. UMI employs hand-held
grippers coupled with careful interface design to enable portable, low-cost,
and information-rich data collection for challenging bimanual and dynamic
manipulation demonstrations. To facilitate deployable policy learning, UMI
incorporates a carefully designed policy interface with inference-time latency
matching and a relative-trajectory action representation. The resulting learned
policies are hardware-agnostic and deployable across multiple robot platforms.
Equipped with these features, the UMI framework unlocks new robot manipulation
capabilities, enabling zero-shot generalizable dynamic, bimanual, precise, and
long-horizon behaviors simply by changing the training data for each task. We
demonstrate UMI's versatility and efficacy with comprehensive real-world
experiments, where policies learned via UMI zero-shot generalize to novel
environments and objects when trained on diverse human demonstrations. UMI's
hardware and software system is open-sourced at https://umi-gripper.github.io.
Comment: Project website: https://umi-gripper.github.io
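
As an illustration of the relative-trajectory action representation mentioned
above, the sketch below expresses future end-effector waypoints in the frame
of the current pose, which is one way such actions become independent of where
a particular robot's base sits. It assumes 4x4 homogeneous transforms, and the
names are illustrative rather than UMI's actual code.

import numpy as np

def to_relative_actions(current_pose, future_poses):
    """Express world-frame waypoints relative to the current pose."""
    world_to_current = np.linalg.inv(current_pose)
    return [world_to_current @ T for T in future_poses]

def to_world_actions(current_pose, relative_actions):
    """Map relative actions back to the world frame at execution time."""
    return [current_pose @ T for T in relative_actions]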
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Large, high-capacity models trained on diverse datasets have shown remarkable successes in efficiently tackling downstream applications. In domains from NLP to computer vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io
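
The released data follows a standardized episodic format; a typical access
pattern looks like the RLDS-style sketch below. The dataset name is a
placeholder, not an official identifier, and feature keys can vary across
sub-datasets.

import tensorflow_datasets as tfds

# Placeholder name; substitute one of the released Open X-Embodiment datasets.
ds = tfds.load("some_xembodiment_dataset", split="train")
for episode in ds.take(1):
    # In RLDS-style data, each episode nests a dataset of per-timestep dicts.
    for step in episode["steps"]:
        observation, action = step["observation"], step["action"]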