Deep Projective Rotation Estimation through Relative Supervision
Orientation estimation is core to a variety of vision and robotics tasks,
such as camera and object pose estimation. Deep learning has offered a way to
develop image-based orientation estimators; however, such estimators often
require training on a large labeled dataset, which can be time-intensive to
collect. In this work, we explore whether self-supervised learning from
unlabeled data can be used to alleviate this issue. Specifically, we assume
access to estimates of the relative orientation between neighboring poses, such
as can be obtained via a local alignment method. While self-supervised
learning has been used successfully for translational object keypoints, in this
work, we show that naively applying relative supervision to the rotation
group SO(3) will often fail to converge due to the non-convexity of the
rotation space. To tackle this challenge, we propose a new algorithm for
self-supervised orientation estimation which utilizes Modified Rodrigues
Parameters to stereographically project the closed manifold of SO(3) to the
open manifold of R^3, allowing the optimization to be done in an
open Euclidean space. We empirically validate the benefits of the proposed
algorithm for the rotation averaging problem in two settings: (1) direct
optimization on rotation parameters, and (2) optimization of parameters of a
convolutional neural network that predicts object orientations from images. In
both settings, we demonstrate that our proposed algorithm is able to converge
to a consistent relative orientation frame much faster than algorithms that
purely operate in the SO(3) space. Additional information can be found at
https://sites.google.com/view/deep-projective-rotation/home.
Comment: Conference on Robot Learning (CoRL), 2022.
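To make the projection concrete, here is a minimal numerical sketch (our own illustration, not code from the paper; the helper names are hypothetical) of the Modified Rodrigues Parameters map, which stereographically projects a unit quaternion representing a rotation to a point in open Euclidean R^3, and back:

```python
import numpy as np

def quat_to_mrp(q):
    """Stereographically project a unit quaternion (w, x, y, z) to R^3.

    Modified Rodrigues Parameters: p = v / (1 + w), where v = (x, y, z),
    projecting from the antipodal point w = -1.
    """
    w, v = q[0], q[1:]
    return v / (1.0 + w)

def mrp_to_quat(p):
    """Inverse stereographic projection back to the unit quaternion."""
    n2 = np.dot(p, p)
    w = (1.0 - n2) / (1.0 + n2)
    v = 2.0 * p / (1.0 + n2)
    return np.concatenate(([w], v))

# Round trip on a 90-degree rotation about the z-axis.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
p = quat_to_mrp(q)
q_back = mrp_to_quat(p)
```

Because the image of the projection is all of R^3 (an open Euclidean space), gradient-based optimizers can update the parameters freely without any manifold retraction step.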
TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
How do we imbue robots with the ability to efficiently manipulate unseen
objects and transfer relevant skills based on demonstrations? End-to-end
learning methods often fail to generalize to novel objects or unseen
configurations. Instead, we focus on the task-specific pose relationship
between relevant parts of interacting objects. We conjecture that this
relationship is a generalizable notion of a manipulation task that can transfer
to new objects in the same category; examples include the relationship between
the pose of a pan relative to an oven or the pose of a mug relative to a mug
rack. We call this task-specific pose relationship "cross-pose" and provide a
mathematical definition of this concept. We propose a vision-based system that
learns to estimate the cross-pose between two objects for a given manipulation
task using learned cross-object correspondences. The estimated cross-pose is
then used to guide a downstream motion planner to manipulate the objects into
the desired pose relationship (placing a pan into the oven or the mug onto the
mug rack). We demonstrate our method's capability to generalize to unseen
objects, in some cases after training on only 10 demonstrations in the real
world. Results show that our system achieves state-of-the-art performance in
both simulated and real-world experiments across a number of tasks.
Supplementary information and videos can be found at
https://sites.google.com/view/tax-pose/home.
Comment: Conference on Robot Learning (CoRL), 2022.
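The step from correspondences to a relative pose can be illustrated with a classical building block. TAX-Pose's estimator itself is learned, but given a set of cross-object point correspondences, a rigid transform can be recovered in closed form with the standard SVD-based least-squares (Kabsch/Procrustes) solution. A minimal sketch on synthetic correspondences:

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t,
    via the SVD-based Kabsch/Procrustes solution."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    # Correct for a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Synthetic correspondences: rotate and translate a point cloud.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true
R, t = fit_rigid_transform(src, dst)
```

Because the solution depends only on centered correspondence pairs, a formulation like this is naturally equivariant to translation, which echoes the equivariance property noted for the cross-pose formulation.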
Object Pose Estimation without Direct Supervision
Currently, robot manipulation is a special-purpose tool, restricted to isolated environments with a fixed set of objects. In order to make robot manipulation more general, robots need to be able to perceive and interact with a large number of objects in cluttered scenes. Traditionally, object pose has been used as a representation to facilitate these interactions. While object pose has many benefits, several limitations become apparent when we investigate how to train an object pose estimator. Typically, to train pose estimators, we need to collect a large dataset of annotated object images for supervision. In addition to this data collection being a potentially costly endeavor, most pose estimators trained on such datasets do not account for uncertainty in pose predictions, nor do they generalize to novel objects outside of the training dataset. Further, the pose representation itself does not capture task-specific object interactions.
In this thesis we explore different methods of alleviating these limitations of training object pose estimators. First, we develop methods that can predict the pose uncertainty induced by both our training distribution and the ambiguities caused by object occlusions and symmetries. The ability to predict this uncertainty allows the robot to better understand what it does and does not know about the object’s position and orientation and how that may affect task completion. Second, we propose a method that can estimate the pose of objects that were unknown at training time. To solve this problem, we introduce a novel method for zero-shot object pose estimation in clutter that combines classical pose hypothesis generation and a learned scoring function. Third, we evaluate the convergence properties of learning pose estimation from relative pose annotations using gradient-based optimization methods. We find that naively using such supervision can lead to poor convergence. Using this analysis, we develop a method to better leverage relative annotations when training pose estimators using gradient-based optimization. Finally, we develop a method to model the object-to-object relationships required for completing a task. Rather than separately estimating the pose of each object, we show how we can learn to estimate a task-specific relative pose from a small number of demonstrations that generalizes to novel objects. We find that such a formulation is naturally translationally equivariant and is able to focus on the components of each object that are key to completing the given task.
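The idea of learning from relative pose annotations can be illustrated on a toy planar version of the rotation-averaging setting. The sketch below is our own illustration, not the thesis's method: it anchors one orientation as the reference frame and runs gradient descent on a chordal loss over relative-orientation residuals, recovering absolute orientations that are consistent with every relative label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth absolute orientations as planar angles (a toy stand-in for SO(3)).
n = 6
theta = rng.uniform(-np.pi, np.pi, n)

# Relative-orientation labels between a reference pose 0 and its neighbors,
# as a local alignment method might provide.
labels = {j: theta[j] - theta[0] for j in range(1, n)}

# Gradient descent on the chordal loss sum_j |e^{i est_j} - e^{i (est_0 + d_j)}|^2,
# holding est_0 = 0 fixed to anchor the global reference frame.
est = rng.uniform(-np.pi, np.pi, n)
est[0] = 0.0
lr = 0.1
for _ in range(2000):
    for j, d in labels.items():
        r = est[j] - est[0] - d          # residual angle for edge (0, j)
        est[j] -= lr * 2.0 * np.sin(r)   # d/d est_j of (2 - 2 cos r)

# The estimates match every relative label up to multiples of 2*pi,
# i.e. they form a consistent relative orientation frame.
err = max(abs(np.angle(np.exp(1j * (est[j] - est[0] - d))))
          for j, d in labels.items())
```

Note that the loss is periodic rather than convex in the angle parameters; with denser measurement graphs and poor initialization, descent of this kind can stall in spurious minima, which is the convergence failure the thesis analyzes.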