Efficient monocular pose estimation for complex 3D models
Presented at ICRA, held in Seattle (US) from 26 to 30 May 2015. We propose a robust and efficient method to estimate the pose of a camera with respect to complex 3D textured models of the environment that can potentially contain more than 100,000 points. To tackle this problem we follow a top-down approach where we combine high-level deep network classifiers with low-level geometric approaches to come up with a solution that is fast, robust and accurate. Given an input image, we initially use a pre-trained deep network to compute a rough estimate of the camera pose. This initial estimate constrains the number of 3D model points that can be seen from the camera viewpoint. We then establish 3D-to-2D correspondences between these potentially visible points of the model and the 2D detected image features. Accurate pose estimation is finally obtained from the 2D-to-3D correspondences using a novel PnP algorithm that rejects outliers without the need to use a RANSAC strategy, and which is between 10 and 100 times faster than other methods that use it. Two real experiments dealing with very large and complex 3D models demonstrate the effectiveness of the approach. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness under projects ERANet Chistera project ViSen PCIN-2013-047, PAU+ DPI2011-27510 and ROBOT-INT-COOP DPI2013-42458-P, and by the EU project ARCAS FP7-ICT-2011-28761. Peer Reviewed
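The visibility step described in this abstract, where a rough camera pose limits which model points are considered for matching, might be sketched as follows. This is only an illustration under stated assumptions: a pinhole camera, a world-to-camera pose (R, t), and a hypothetical `margin` parameter to tolerate error in the coarse pose. The function name and all parameters are invented here and are not taken from the paper.

```python
def visible_points(points, R, t, fx, fy, cx, cy, width, height, margin=0.0):
    """Return indices of 3D points that project inside the image.

    Assumptions (not from the paper): pinhole model, X_cam = R * X_world + t,
    R given as a 3x3 nested list, t as a length-3 sequence. `margin` widens
    the image bounds (in pixels) to tolerate a coarse pose estimate.
    """
    kept = []
    for i, (x, y, z) in enumerate(points):
        # Transform the model point into the camera frame.
        xc = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
        yc = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
        zc = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
        if zc <= 0:
            continue  # behind the camera, cannot be seen
        # Perspective projection to pixel coordinates.
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if -margin <= u < width + margin and -margin <= v < height + margin:
            kept.append(i)
    return kept
```

Only the surviving indices would then be matched against 2D image features. Occlusion by the model itself is ignored in this sketch.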
Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation
Estimating the 6D pose of objects using only RGB images remains challenging
because of problems such as occlusion and symmetries. It is also difficult to
construct 3D models with precise texture without expert knowledge or
specialized scanning devices. To address these problems, we propose a novel
pose estimation method, Pix2Pose, that predicts the 3D coordinates of each
object pixel without textured models. An auto-encoder architecture is designed
to estimate the 3D coordinates and expected errors per pixel. These pixel-wise
predictions are then used in multiple stages to form 2D-3D correspondences to
directly compute poses with the PnP algorithm with RANSAC iterations. Our
method is robust to occlusion by leveraging recent achievements in generative
adversarial training to precisely recover occluded parts. Furthermore, a novel
loss function, the transformer loss, is proposed to handle symmetric objects by
guiding predictions to the closest symmetric pose. Evaluations on three
different benchmark datasets containing symmetric and occluded objects show our
method outperforms the state of the art using only RGB images. Comment: Accepted at ICCV 2019 (Oral)
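The idea of guiding predictions to the closest symmetric pose can be illustrated with a loss that takes the minimum error over a set of symmetry transforms applied to the target coordinates. The sketch below is an interpretation of the abstract, not the authors' implementation: the per-pixel L1 error, the finite list of symmetry rotations, and all function names are assumptions made for illustration.

```python
def apply_rotation(R, p):
    """Rotate 3D point p by the 3x3 matrix R (nested lists)."""
    return tuple(R[r][0] * p[0] + R[r][1] * p[1] + R[r][2] * p[2] for r in range(3))


def mean_l1(pred, target):
    """Mean per-pixel L1 distance between two lists of 3D coordinates."""
    total = sum(abs(a - b) for p, q in zip(pred, target) for a, b in zip(p, q))
    return total / len(pred)


def symmetric_min_loss(pred, target, sym_rotations):
    """Smallest loss over all symmetry-equivalent versions of the target.

    `sym_rotations` is an assumed, finite list of rotations under which the
    object looks the same; it should include the identity.
    """
    return min(
        mean_l1(pred, [apply_rotation(R, q) for q in target])
        for R in sym_rotations
    )
```

With this form, a prediction that matches any symmetry-equivalent target incurs no penalty, so an object with a 180-degree symmetry is not pushed toward an arbitrary one of its two indistinguishable poses.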
Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View
We propose a method for predicting the 3D shape of a deformable surface from
a single view. By contrast with previous approaches, we do not need a
pre-registered template of the surface, and our method is robust to the lack of
texture and partial occlusions. At the core of our approach is a
geometry-aware deep architecture that tackles the problem as usually done in
analytic solutions: first perform 2D detection of the mesh and then estimate a
3D shape that is geometrically consistent with the image. We train this
architecture in an end-to-end manner using a large dataset of synthetic
renderings of shapes under different levels of deformation, material
properties, textures and lighting conditions. We evaluate our approach on a
test split of this dataset and available real benchmarks, consistently
improving state-of-the-art solutions with a significantly lower computational
time. Comment: Accepted at CVPR 201
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
We propose a real-time RGB-based pipeline for object detection and 6D pose
estimation. Our novel 3D orientation estimation is based on a variant of the
Denoising Autoencoder that is trained on simulated views of a 3D model using
Domain Randomization. This so-called Augmented Autoencoder has several
advantages over existing methods: It does not require real, pose-annotated
training data, generalizes to various test sensors and inherently handles
object and view symmetries. Instead of learning an explicit mapping from input
images to object poses, it provides an implicit representation of object
orientations defined by samples in a latent space. Our pipeline achieves
state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D
domain. We also evaluate on the LineMOD dataset where we can compete with other
synthetically trained approaches. We further increase performance by correcting
3D orientation estimates to account for perspective errors when the object
deviates from the image center and show extended results. Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencode
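One way to read "an implicit representation of object orientations defined by samples in a latent space" is as a lookup: latent codes of rendered views are stored with their known orientations, and a test code is assigned the orientation of its most similar stored code. The sketch below assumes cosine similarity and a small in-memory codebook; both choices, and every name in the code, are illustrative assumptions, not details taken from the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_orientation(query, codebook):
    """Return the orientation label whose latent code best matches `query`.

    `codebook` is an assumed list of (latent_vector, orientation) pairs,
    e.g. built by encoding synthetic views rendered at known rotations.
    """
    best = max(codebook, key=lambda entry: cosine_similarity(query, entry[0]))
    return best[1]
```

Because the encoder is never asked to output a rotation directly, views that look identical (for example, under object symmetry) can simply map to nearby latent codes.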
Probabilistic Combination of Noisy Points and Planes for RGB-D Odometry
This work proposes a visual odometry method that combines point and plane
primitives extracted from a noisy depth camera. Depth measurement uncertainty
is modelled and propagated through the extraction of geometric primitives to
the frame-to-frame motion estimation, where pose is optimized by weighting the
residuals of 3D point and plane matches according to their uncertainties.
Results on an RGB-D dataset show that the combination of points and planes,
through the proposed method, is able to perform well in poorly textured
environments, where point-based odometry is bound to fail. Comment: Accepted to TAROS 201
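Weighting residuals by their uncertainties can be illustrated with inverse-variance weighting, shown here on a scalar quantity rather than a full 6-DoF pose. This is a deliberately simplified stand-in, not the paper's estimator: the function names are hypothetical, and each point or plane match is assumed to contribute one residual with a known variance.

```python
def weighted_cost(residuals, variances):
    """Sum of squared residuals, each divided by its variance."""
    return sum(r * r / v for r, v in zip(residuals, variances))


def weighted_estimate(measurements, variances):
    """Scalar value minimizing the weighted cost: the inverse-variance mean.

    A noisy measurement (large variance) gets a small weight, so it pulls
    the estimate less than a precise one.
    """
    weights = [1.0 / v for v in variances]
    return sum(w * m for w, m in zip(weights, measurements)) / sum(weights)
```

For example, if a point match and a plane match suggest different values of the same motion parameter, the estimate lands closer to whichever match has the smaller propagated variance.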
Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty
This work proposes a robust visual odometry method for structured
environments that combines point features with line and plane segments,
extracted through an RGB-D camera. Noisy depth maps are processed by a
probabilistic depth fusion framework based on Mixtures of Gaussians to denoise
and derive the depth uncertainty, which is then propagated throughout the
visual odometry pipeline. Probabilistic 3D plane and line fitting solutions are
used to model the uncertainties of the feature parameters and pose is estimated
by combining the three types of primitives based on their uncertainties.
Performance evaluation on RGB-D sequences collected in this work and two public
RGB-D datasets (TUM and ICL-NUIM) shows the benefit of using the proposed depth
fusion framework and combining the three feature types, particularly in scenes
with low-textured surfaces, dynamic objects and missing depth measurements. Comment: Major update: more results, depth filter released as open source, 34
pages