55 research outputs found

    6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics

    We present a novel technique to estimate the 6D pose of objects from single images where the 3D geometry of the object is only given approximately and not as a precise 3D model. To achieve this, we employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel. In addition to the 3D coordinates, our model also estimates the pixel-wise coordinate error to discard correspondences that are likely wrong. This allows us to generate multiple 6D pose hypotheses for the object, which we then refine iteratively using a highly efficient region-based approach. We also introduce a novel pixel-wise posterior formulation with which we can estimate the probability of each hypothesis and select the most likely one. As we show in experiments, our approach is capable of dealing with extreme visual conditions including overexposure, high contrast, and low signal-to-noise ratio. This makes it a powerful technique for the particularly challenging task of estimating the pose of tumbling satellites for in-orbit robotic applications. Our method achieves state-of-the-art performance on the SPEED+ dataset and won the SPEC2021 post-mortem competition. Comment: preprint
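    A minimal sketch of the correspondence-to-hypothesis step described above (not the authors' code): pixels are filtered by the predicted coordinate error, and several RANSAC-PnP solutions on random correspondence subsets yield pose hypotheses for subsequent refinement and posterior-based selection. Function names, thresholds, and the use of OpenCV's solvePnPRansac are illustrative assumptions.

```python
# Sketch: dense 2D-to-3D correspondences -> multiple 6D pose hypotheses.
# Thresholds and subset sizes are illustrative assumptions, not the paper's values.
import numpy as np
import cv2

def pose_hypotheses(coords_3d, coord_err, K, err_thresh=0.05, n_hyp=5, seed=0):
    """coords_3d: (H, W, 3) predicted model coordinates per pixel.
    coord_err : (H, W) predicted per-pixel coordinate error.
    K         : (3, 3) camera intrinsics."""
    ys, xs = np.nonzero(coord_err < err_thresh)           # keep reliable pixels only
    pts_2d = np.stack([xs, ys], axis=1).astype(np.float64)
    pts_3d = coords_3d[ys, xs].astype(np.float64)
    if len(pts_2d) < 4:                                    # PnP needs at least 4 points
        return []
    rng = np.random.default_rng(seed)
    hypotheses = []
    for _ in range(n_hyp):
        # Each hypothesis is estimated from a random subset of correspondences.
        idx = rng.choice(len(pts_2d), size=min(500, len(pts_2d)), replace=False)
        ok, rvec, tvec, _ = cv2.solvePnPRansac(
            pts_3d[idx], pts_2d[idx], K, None, reprojectionError=3.0)
        if ok:
            hypotheses.append((rvec, tvec))
    return hypotheses  # to be refined and scored, e.g. with a pixel-wise posterior
```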

    Self-Supervised Object-in-Gripper Segmentation from Robotic Motions

    Accurate object segmentation is a crucial task in the context of robotic manipulation. However, creating sufficient annotated training data for neural networks is particularly time-consuming and often requires manual labeling. To this end, we propose a simple yet robust solution for learning to segment unknown objects grasped by a robot. Specifically, we exploit motion and temporal cues in RGB video sequences. Using optical flow estimation, we first learn to predict segmentation masks of our given manipulator. Then, these annotations are used in combination with motion cues to automatically distinguish between the background, the manipulator, and the unknown grasped object. In contrast to existing systems, our approach is fully self-supervised and independent of precise camera calibration, 3D models, or potentially imperfect depth data. We perform a thorough comparison with alternative baselines and approaches from the literature. The object masks and views are shown to be suitable training data for segmentation networks that generalize to novel environments and also allow for watertight 3D reconstruction. Comment: 15 pages, 11 figures. Video: https://www.youtube.com/watch?v=srEwuuIIgz
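    As an illustration of the motion-cue labeling idea (a rough sketch, not the authors' pipeline), the snippet below derives coarse background / manipulator / object labels from dense optical flow and a previously learned manipulator mask; the flow threshold and label encoding are assumptions made for this example.

```python
# Sketch: automatic labels from motion cues; assumes a learned manipulator mask is given.
import numpy as np
import cv2

BACKGROUND, MANIPULATOR, OBJECT = 0, 1, 2

def auto_label(frame_prev, frame_curr, manipulator_mask, flow_thresh=1.0):
    """frame_prev/frame_curr: uint8 grayscale frames; manipulator_mask: bool (H, W)."""
    flow = cv2.calcOpticalFlowFarneback(frame_prev, frame_curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    moving = magnitude > flow_thresh                   # pixels moving with the arm
    labels = np.full(frame_curr.shape, BACKGROUND, dtype=np.uint8)
    labels[moving & manipulator_mask] = MANIPULATOR    # predicted robot pixels
    labels[moving & ~manipulator_mask] = OBJECT        # moving but not robot: grasped object
    return labels
```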

    "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences

    We present a novel framework for self-supervised grasped-object segmentation with a robotic manipulator. Our method successively learns an agnostic foreground segmentation followed by a distinction between manipulator and object solely by observing the motion between consecutive RGB frames. In contrast to previous approaches, we propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge. Furthermore, while the motion of the manipulator and the object are substantial cues for our algorithm, we present means to robustly deal with distracting objects moving in the background, as well as with completely static scenes. Our method depends neither on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data. Through extensive experimental evaluation we demonstrate the superiority of our framework and provide detailed insights into its capability to deal with the aforementioned extreme cases of motion. We also show that training a semantic segmentation network with the automatically labeled data achieves results on par with manually annotated training data. Code and pretrained models will be made publicly available. Comment: 8 pages, 6 figures
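    The following is a hedged sketch of the general idea of fusing motion and appearance cues in a single trainable network; the actual architecture in the paper differs, and the layer sizes and class layout here are placeholders.

```python
# Sketch: two-branch segmentation network fusing RGB (semantic) and flow (motion) cues.
import torch
import torch.nn as nn

class MotionSemanticSegNet(nn.Module):
    def __init__(self, n_classes=3):   # background / manipulator / object (assumed layout)
        super().__init__()
        self.rgb_branch = nn.Sequential(    # appearance / semantic cues
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.flow_branch = nn.Sequential(   # motion cues between consecutive frames
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, n_classes, 1)  # joint prediction from fused features

    def forward(self, rgb, flow):
        fused = torch.cat([self.rgb_branch(rgb), self.flow_branch(flow)], dim=1)
        return self.head(fused)             # per-pixel class logits
```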

    Robust Probabilistic Robot Arm Keypoint Detection Exploiting Kinematic Knowledge

    We propose PK-ROKED, a novel probabilistic deep-learning algorithm to detect keypoints of a robotic manipulator in camera images and to robustly estimate the positioning inaccuracies w.r.t. the camera frame. Our algorithm uses monocular images as a primary input source and augments these with prior knowledge about the keypoint locations based on the robot's forward kinematics. As output, the network provides 2D image coordinates of the keypoints and an associated uncertainty measure, where the latter is obtained using Monte Carlo dropout. In experiments on two different robotic systems, we show that our network provides superior detection results compared to the state of the art. We furthermore analyze the precision of different estimation approaches to obtain an uncertainty measure.
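    A minimal sketch of the Monte Carlo dropout step named in the abstract (the model interface and number of passes are assumptions): dropout stays active at inference time, and the spread over repeated stochastic forward passes serves as the per-keypoint uncertainty measure.

```python
# Sketch: MC-dropout keypoint inference; assumes model(image, prior_kp) -> (K, 2) coords.
import torch

@torch.no_grad()
def mc_dropout_keypoints(model, image, prior_keypoints, n_samples=30):
    model.train()                        # keep dropout layers stochastic at test time
    samples = torch.stack([model(image, prior_keypoints) for _ in range(n_samples)])
    mean = samples.mean(dim=0)           # (K, 2) expected 2D keypoint locations
    std = samples.std(dim=0)             # (K, 2) per-coordinate uncertainty
    return mean, std
```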

    Bayesian Active Learning for Sim-to-Real Robotic Perception

    While learning from synthetic training data has recently gained increased attention, in real-world robotic applications there are still performance deficiencies due to the so-called Sim-to-Real gap. In practice, this gap is hard to resolve with only synthetic data. Therefore, we focus on an efficient acquisition of real data within a Sim-to-Real learning pipeline. Concretely, we employ deep Bayesian active learning to minimize manual annotation efforts and devise an autonomous learning paradigm to select the data that is considered useful for the human expert to annotate. To achieve this, a Bayesian Neural Network (BNN) object detector providing reliable uncertainty estimates is adapted to infer the informativeness of the unlabeled data. Furthermore, to cope with misalignments of the label distribution in uncertainty-based sampling, we develop an effective randomized sampling strategy that performs favorably compared to other, more complex alternatives. In our experiments on object classification and detection, we show the benefits of our approach and provide evidence that labeling efforts can be reduced significantly. Finally, we demonstrate the practical effectiveness of this idea in a grasping task on an assistive robot.
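    As an illustration of a randomized, uncertainty-weighted acquisition step (a sketch under assumed interfaces, not necessarily the authors' exact strategy): instead of always taking the top-k most uncertain images, unlabeled samples are drawn with probability proportional to their informativeness score from the detector.

```python
# Sketch: randomized acquisition for active learning; scores come from a BNN detector.
import numpy as np

def select_for_annotation(uncertainty_scores, k, seed=0):
    """uncertainty_scores: (N,) informativeness of each unlabeled image."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(uncertainty_scores, dtype=np.float64)
    probs = scores / scores.sum()                        # normalize to a distribution
    return rng.choice(len(scores), size=k, replace=False, p=probs)

# Example (hypothetical): query 100 of 10,000 unlabeled images for human annotation.
# chosen = select_for_annotation(detector_uncertainties, k=100)
```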

    Towards Robust Perception of Unknown Objects in the Wild

    To be able to interact in dynamic and cluttered environments, detection and instance segmentation of only known objects is often not sufficient. Our recently proposed Instance Stereo Transformer (INSTR) addresses this problem by yielding pixel-wise instance masks of unknown items on dominant horizontal surfaces without requiring potentially noisy depth maps. To further boost the application of INSTR in a robotic domain, we propose two improvements: First, we extend the network to semantically label all non-object pixels, and experimentally validate that the additional explicit semantic information further enhances the object instance predictions. Second, since knowledge about some detected objects is often readily available, we utilize dropout as an approximation of Bayesian inference to robustly classify the detected instances into known and unknown categories. The overall framework is well suited for various robotic applications, e.g. stone segmentation in planetary environments or an unknown-object grasping setting.
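    A rough sketch of the known/unknown decision mentioned above (an illustration of dropout-based Bayesian inference, not the INSTR implementation): each detected instance is classified several times with dropout enabled, and high predictive entropy marks it as unknown. The classifier interface and threshold are assumptions.

```python
# Sketch: MC-dropout classification of a detected instance into known / unknown.
import torch
import torch.nn.functional as F

@torch.no_grad()
def classify_known_unknown(classifier, instance_crop, n_samples=25, entropy_thresh=1.0):
    classifier.train()                              # dropout active at inference
    probs = torch.stack([F.softmax(classifier(instance_crop), dim=-1)
                         for _ in range(n_samples)]).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    if entropy.item() > entropy_thresh:
        return "unknown", None                      # uncertain prediction: novel object
    return "known", probs.argmax(dim=-1).item()     # confident prediction: known class
```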

    ReSyRIS: A Real-Synthetic Rock Instance Segmentation Dataset for Training and Benchmarking

    The exploration of our solar system to understand its creation and to investigate the potential for life on other celestial bodies is a fundamental drive of humankind. After early telescope-based observation, Apollo 11 was the first space mission able to collect samples on the lunar surface and bring them back to Earth for analysis. In recent years this trend has accelerated again, and many successor missions have been (or are in the process of being) launched into space for extra-terrestrial sample extraction. Yet, the abundance of potential failures makes these missions extremely challenging. For operations aimed at deeper parts of the solar system, the operational working distance extends even further, and communication delay and limited bandwidth increase complexity. Consequently, sample extraction missions are designed to be more autonomous in order to carry out large parts of the task without human intervention. One specific sub-task particularly suitable for automation is the identification of relevant extraction candidates. While several approaches for rock sample identification exist, they are often limited by the available training data, by the lack of suitable annotations for it, and by the unclear performance of the algorithms in extra-terrestrial environments due to inadequate test data. To address these issues, we present ReSyRIS (Real-Synthetic Rock Instance Segmentation Dataset), which consists of real-world images together with their manually created synthetic counterparts. The real-world part was collected in a quasi-extra-terrestrial environment on Mt. Etna in Sicily and focuses on recordings of several rock sample sites. Every scene is re-created in OAISYS, a Blender-based data generation pipeline for unstructured outdoor environments, for which the required meshes and textures are extracted from the volcano site. This allows not only precise reconstruction of the scenes in a synthetic environment, but also the generation of highly realistic training data with automatic annotations in a similar fashion to the real recordings. We finally investigate the generalization capability of a neural network trained on incrementally altered versions of the synthetic data to explore potential sim-to-real gaps. The real-world dataset together with the OAISYS config files to create its synthetic counterpart is publicly available at https://rm.dlr.de/resyris_en. With this novel benchmark on extra-terrestrial stone instance segmentation we hope to further push the boundaries of autonomous rock sample extraction.

    Multi-path Learning for Object Pose Estimation Across Domains

    We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that not only describes an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": while the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not have to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types, and datasets. We systematically investigate the learned encodings, their generalization, and iterative refinement strategies on the ModelNet40 and T-LESS datasets. Despite training jointly on multiple objects, our 6D Object Detection pipeline achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches. Comment: To appear at CVPR 2020; Code will be available here: https://github.com/DLR-RM/AugmentedAutoencoder/tree/multipat
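    Below is a condensed sketch of the "multi-path learning" scheme stated above (placeholder modules, not the published code): one shared encoder, one decoder per training object, and the reconstruction loss for each view is only propagated through the decoder belonging to that view's object id.

```python
# Sketch: single-encoder-multi-decoder autoencoder for multi-path learning.
import torch
import torch.nn as nn

class MultiPathAE(nn.Module):
    def __init__(self, encoder, make_decoder, n_objects):
        super().__init__()
        self.encoder = encoder                                # shared across all objects
        self.decoders = nn.ModuleList([make_decoder() for _ in range(n_objects)])

    def forward(self, views, object_ids):
        codes = self.encoder(views)                           # common latent space
        recon = torch.stack([self.decoders[oid](codes[i:i + 1])[0]
                             for i, oid in enumerate(object_ids)])
        return recon

# Training step (sketch): L2 reconstruction against clean target views, so gradients
# only flow through each view's own decoder while the encoder learns shared features.
# loss = ((model(augmented_views, object_ids) - target_views) ** 2).mean()
```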