5,873 research outputs found

    Efficient monocular pose estimation for complex 3D models

    Get PDF
    Trabajo presentado al ICRA celebrado en Seattle (US) del 26 al 30 de mayo de 2015.We propose a robust and efficient method to estimate the pose of a camera with respect to complex 3D textured models of the environment that can potentially contain more than 100, 000 points. To tackle this problem we follow a top down approach where we combine high-level deep network classifiers with low level geometric approaches to come up with a solution that is fast, robust and accurate. Given an input image, we initially use a pre-trained deep network to compute a rough estimation of the camera pose. This initial estimate constrains the number of 3D model points that can be seen from the camera viewpoint. We then establish 3D-to-2D correspondences between these potentially visible points of the model and the 2D detected image features. Accurate pose estimation is finally obtained from the 2D-to-3D correspondences using a novel PnP algorithm that rejects outliers without the need to use a RANSAC strategy, and which is between 10 and 100 times faster than other methods that use it. Two real experimentsdealing with very large and complex 3D models demonstrate the effectiveness of the approach.This work has been partially funded by the Spanish Ministry of Economy and Competitiveness under projects ERANet Chistera project ViSen PCIN-2013-047, PAU+ DPI2011-27510 and ROBOT-INT-COOP DPI2013-42458-P, and by the EU project ARCAS FP7-ICT-2011-28761.Peer Reviewe

    Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation

    Full text link
    Estimating the 6D pose of objects using only RGB images remains challenging because of problems such as occlusion and symmetries. It is also difficult to construct 3D models with precise texture without expert knowledge or specialized scanning devices. To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. An auto-encoder architecture is designed to estimate the 3D coordinates and expected errors per pixel. These pixel-wise predictions are then used in multiple stages to form 2D-3D correspondences to directly compute poses with the PnP algorithm with RANSAC iterations. Our method is robust to occlusion by leveraging recent achievements in generative adversarial training to precisely recover occluded parts. Furthermore, a novel loss function, the transformer loss, is proposed to handle symmetric objects by guiding predictions to the closest symmetric pose. Evaluations on three different benchmark datasets containing symmetric and occluded objects show our method outperforms the state of the art using only RGB images.Comment: Accepted at ICCV 2019 (Oral

    Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View

    Get PDF
    We propose a method for predicting the 3D shape of a deformable surface from a single view. By contrast with previous approaches, we do not need a pre-registered template of the surface, and our method is robust to the lack of texture and partial occlusions. At the core of our approach is a {\it geometry-aware} deep architecture that tackles the problem as usually done in analytic solutions: first perform 2D detection of the mesh and then estimate a 3D shape that is geometrically consistent with the image. We train this architecture in an end-to-end manner using a large dataset of synthetic renderings of shapes under different levels of deformation, material properties, textures and lighting conditions. We evaluate our approach on a test split of this dataset and available real benchmarks, consistently improving state-of-the-art solutions with a significantly lower computational time.Comment: Accepted at CVPR 201

    Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

    Get PDF
    We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results.Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencode

    Probabilistic Combination of Noisy Points and Planes for RGB-D Odometry

    Full text link
    This work proposes a visual odometry method that combines points and plane primitives, extracted from a noisy depth camera. Depth measurement uncertainty is modelled and propagated through the extraction of geometric primitives to the frame-to-frame motion estimation, where pose is optimized by weighting the residuals of 3D point and planes matches, according to their uncertainties. Results on an RGB-D dataset show that the combination of points and planes, through the proposed method, is able to perform well in poorly textured environments, where point-based odometry is bound to fail.Comment: Accepted to TAROS 201

    Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty

    Full text link
    This work proposes a robust visual odometry method for structured environments that combines point features with line and plane segments, extracted through an RGB-D camera. Noisy depth maps are processed by a probabilistic depth fusion framework based on Mixtures of Gaussians to denoise and derive the depth uncertainty, which is then propagated throughout the visual odometry pipeline. Probabilistic 3D plane and line fitting solutions are used to model the uncertainties of the feature parameters and pose is estimated by combining the three types of primitives based on their uncertainties. Performance evaluation on RGB-D sequences collected in this work and two public RGB-D datasets: TUM and ICL-NUIM show the benefit of using the proposed depth fusion framework and combining the three feature-types, particularly in scenes with low-textured surfaces, dynamic objects and missing depth measurements.Comment: Major update: more results, depth filter released as opensource, 34 page
    • …
    corecore