
    Deep Projective Rotation Estimation through Relative Supervision

    Orientation estimation is core to a variety of vision and robotics tasks such as camera and object pose estimation. Deep learning has offered a way to develop image-based orientation estimators; however, such estimators often require training on a large labeled dataset, which can be time-intensive to collect. In this work, we explore whether self-supervised learning from unlabeled data can be used to alleviate this issue. Specifically, we assume access to estimates of the relative orientation between neighboring poses, such as can be obtained via a local alignment method. While self-supervised learning has been used successfully for translational object keypoints, in this work we show that naively applying relative supervision to the rotation group SO(3) will often fail to converge due to the non-convexity of the rotation space. To tackle this challenge, we propose a new algorithm for self-supervised orientation estimation which utilizes Modified Rodrigues Parameters to stereographically project the closed manifold of SO(3) to the open manifold of $\mathbb{R}^{3}$, allowing the optimization to be done in an open Euclidean space. We empirically validate the benefits of the proposed algorithm for the rotation averaging problem in two settings: (1) direct optimization over rotation parameters, and (2) optimization of the parameters of a convolutional neural network that predicts object orientations from images. In both settings, we demonstrate that our proposed algorithm converges to a consistent relative orientation frame much faster than algorithms that operate purely in SO(3). Additional information can be found at https://sites.google.com/view/deep-projective-rotation/home.
    Comment: Conference on Robot Learning (CoRL), 2022. Supplementary material is available at https://sites.google.com/view/deep-projective-rotation/home
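    The projection the abstract describes can be illustrated with a short, self-contained sketch (not the authors' code; the quaternion convention (w, x, y, z) and function names are assumptions): Modified Rodrigues Parameters stereographically map a unit quaternion, a double cover of SO(3), to a point in R^3, where unconstrained gradient steps can be taken before mapping back to a rotation.

```python
import numpy as np

def quat_to_mrp(q):
    """Stereographically project a unit quaternion (w, x, y, z) to
    Modified Rodrigues Parameters in R^3."""
    w, v = q[0], np.asarray(q[1:], dtype=float)
    if w < 0:                      # use the antipodal quaternion to avoid
        w, v = -w, -v              # the singularity at w = -1
    return v / (1.0 + w)

def mrp_to_quat(p):
    """Inverse map: recover a unit quaternion from MRP coordinates."""
    p = np.asarray(p, dtype=float)
    n2 = p @ p
    return np.concatenate(([(1.0 - n2) / (1.0 + n2)], 2.0 * p / (1.0 + n2)))

# Round-trip check on an arbitrary unit quaternion.
q = np.array([0.8, 0.6, 0.0, 0.0])
assert np.allclose(mrp_to_quat(quat_to_mrp(q)), q)
```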

    TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation

    How do we imbue robots with the ability to efficiently manipulate unseen objects and transfer relevant skills based on demonstrations? End-to-end learning methods often fail to generalize to novel objects or unseen configurations. Instead, we focus on the task-specific pose relationship between relevant parts of interacting objects. We conjecture that this relationship is a generalizable notion of a manipulation task that can transfer to new objects in the same category; examples include the relationship between the pose of a pan relative to an oven or the pose of a mug relative to a mug rack. We call this task-specific pose relationship "cross-pose" and provide a mathematical definition of this concept. We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task using learned cross-object correspondences. The estimated cross-pose is then used to guide a downstream motion planner to manipulate the objects into the desired pose relationship (placing a pan into the oven or the mug onto the mug rack). We demonstrate our method's capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments across a number of tasks. Supplementary information and videos can be found at https://sites.google.com/view/tax-pose/home.
    Comment: Conference on Robot Learning (CoRL), 2022. Supplementary material is available at https://sites.google.com/view/tax-pose/home
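    Turning learned cross-object correspondences into a rigid relative transform is typically done with a (weighted) least-squares alignment. The sketch below illustrates that correspondence-to-pose step with the standard Kabsch/Procrustes solution; it is an illustration under assumed inputs, not the TAX-Pose implementation, and the function and argument names are hypothetical.

```python
import numpy as np

def estimate_cross_pose(src_pts, tgt_pts, weights=None):
    """Weighted least-squares rigid transform (R, t) mapping predicted
    correspondences src_pts onto tgt_pts (Kabsch / Procrustes)."""
    src_pts = np.asarray(src_pts, dtype=float)
    tgt_pts = np.asarray(tgt_pts, dtype=float)
    w = np.ones(len(src_pts)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    src_c = (w[:, None] * src_pts).sum(axis=0)      # weighted centroids
    tgt_c = (w[:, None] * tgt_pts).sum(axis=0)
    H = (w[:, None] * (src_pts - src_c)).T @ (tgt_pts - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t                                     # R @ src + t ~= tgt
```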

    Object Pose Estimation without Direct Supervision

    Currently, robot manipulation is a special-purpose tool, restricted to isolated environments with a fixed set of objects. In order to make robot manipulation more general, robots need to be able to perceive and interact with a large number of objects in cluttered scenes. Traditionally, object pose has been used as a representation to facilitate these interactions. While object pose has many benefits, several limitations become apparent when we investigate how to train an object pose estimator. Typically, to train pose estimators, we need to collect a large dataset of annotated object images for supervision. In addition to this data collection being a potentially costly endeavor, most pose estimators trained on such datasets do not account for uncertainty in pose predictions, nor do they generalize to novel objects outside of the training dataset. Further, the pose representation itself does not capture task-specific object interactions.

    In this thesis we explore different methods of alleviating these limitations of training object pose estimators. First, we develop methods that can predict the pose uncertainty induced by both our training distribution and the ambiguities caused by object occlusions and symmetries. The ability to predict this uncertainty allows the robot to better understand what it does and does not know about the object’s position and orientation and how that may affect task completion. Second, we propose a method that can estimate the pose of objects that were unknown at training time. To solve this problem, we introduce a novel method for zero-shot object pose estimation in clutter that combines classical pose hypothesis generation with a learned scoring function. Third, we evaluate the convergence properties of learning pose estimation from relative pose annotations using gradient-based optimization methods. We find that naively using such supervision can lead to poor convergence. Using this analysis, we develop a method to better leverage relative annotations when training pose estimators using gradient-based optimization. Finally, we develop a method to model the object-to-object relationships required for completing a task. Rather than separately estimating the pose of each object, we show how we can learn to estimate a task-specific relative pose from a small number of demonstrations that generalizes to novel objects. We find that such a formulation is naturally translationally equivariant and is able to focus on the components of each object that are key to completing the given task.
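    As a rough illustration of the zero-shot pipeline sketched above (classical hypothesis generation followed by scoring), the snippet below ranks candidate poses with a toy geometric score standing in for the learned scoring function; all names and the 4x4 pose convention are assumptions, not the thesis implementation.

```python
import numpy as np

def chamfer_score(pose, model_pts, observed_pts):
    """Toy geometric score: negative mean distance from the transformed
    model points to their nearest observed points (a stand-in for a
    learned scoring network)."""
    R, t = pose[:3, :3], pose[:3, 3]
    xformed = model_pts @ R.T + t
    d = np.linalg.norm(xformed[:, None, :] - observed_pts[None, :, :], axis=-1)
    return -d.min(axis=1).mean()

def rank_pose_hypotheses(hypotheses, model_pts, observed_pts, score_fn=chamfer_score):
    """Score a list of candidate 4x4 poses and return them best-first."""
    scores = np.array([score_fn(T, model_pts, observed_pts) for T in hypotheses])
    order = np.argsort(-scores)
    return [hypotheses[i] for i in order], scores[order]
```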