Real-Time Seamless Single Shot 6D Object Pose Prediction
We propose a single-shot approach for simultaneously detecting an object in
an RGB image and predicting its 6D pose without requiring multiple stages or
having to examine multiple hypotheses. Unlike a recently proposed single-shot
technique for this task (Kehl et al., ICCV'17) that only predicts an
approximate 6D pose that must then be refined, ours is accurate enough not to
require additional post-processing. As a result, it is much faster - 50 fps on
a Titan X (Pascal) GPU - and more suitable for real-time processing. The key
component of our method is a new CNN architecture inspired by the YOLO network
design that directly predicts the 2D image locations of the projected vertices
of the object's 3D bounding box. The object's 6D pose is then estimated using a
PnP algorithm.
For single object and multiple object pose estimation on the LINEMOD and
OCCLUSION datasets, our approach substantially outperforms other recent
CNN-based approaches when they are all used without post-processing. During
post-processing, a pose refinement step can be used to boost the accuracy of
the existing methods, but at 10 fps or less, they are much slower than our
method.
Comment: CVPR 201
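The representation above can be sketched in a few lines: the network regresses the 2D image locations of the eight projected corners of the object's 3D bounding box, and a PnP solver (e.g. cv2.solvePnP) then recovers the 6D pose from those 2D-3D pairs. The following is a minimal illustration of the geometry only, with made-up intrinsics and pose; it is not the paper's code.

```python
import numpy as np

def bbox_corners(w, h, d):
    """The 8 corners of an axis-aligned box centered at the object origin."""
    return np.array([[x, y, z] for x in (-w / 2, w / 2)
                               for y in (-h / 2, h / 2)
                               for z in (-d / 2, d / 2)])

def project(points_3d, K, R, t):
    """Pinhole projection of Nx3 points to Nx2 pixel coordinates."""
    cam = points_3d @ R.T + t          # transform into the camera frame
    uv = cam @ K.T                     # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

# Illustrative intrinsics and pose (assumptions, not from the paper).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # identity rotation for the sketch
t = np.array([0.0, 0.0, 1.0])          # object 1 m in front of the camera
corners_2d = project(bbox_corners(0.1, 0.1, 0.1), K, R, t)
```

Given `corners_2d` predicted by the network instead of computed as here, inverting this projection with PnP yields the pose.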
Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation
Estimating the 6D pose of objects using only RGB images remains challenging
because of problems such as occlusion and symmetries. It is also difficult to
construct 3D models with precise texture without expert knowledge or
specialized scanning devices. To address these problems, we propose a novel
pose estimation method, Pix2Pose, that predicts the 3D coordinates of each
object pixel without textured models. An auto-encoder architecture is designed
to estimate the 3D coordinates and expected errors per pixel. These pixel-wise
predictions are then used in multiple stages to form 2D-3D correspondences to
directly compute poses with the PnP algorithm with RANSAC iterations. Our
method is robust to occlusion by leveraging recent achievements in generative
adversarial training to precisely recover occluded parts. Furthermore, a novel
loss function, the transformer loss, is proposed to handle symmetric objects by
guiding predictions to the closest symmetric pose. Evaluations on three
different benchmark datasets containing symmetric and occluded objects show our
method outperforms the state of the art using only RGB images.
Comment: Accepted at ICCV 2019 (Oral)
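The symmetry-handling idea behind the transformer loss can be sketched as a minimum over the object's symmetry transforms: the predicted per-pixel 3D coordinates are compared against every symmetric version of the target, and the smallest error is used, so the network is free to converge to the closest symmetric pose. The function below is an illustrative reconstruction with an L1 error, not the paper's exact formulation.

```python
import numpy as np

def symmetric_min_loss(pred, target, sym_rotations):
    """Minimum mean L1 error over the object's symmetry transforms.

    pred, target: (N, 3) arrays of 3D object coordinates.
    sym_rotations: iterable of (3, 3) rotation matrices forming the
    symmetry group (including the identity).
    """
    errors = [np.abs(pred - target @ R.T).mean() for R in sym_rotations]
    return min(errors)

# Example: an object with a 180-degree symmetry about the z-axis.
identity = np.eye(3)
flip_z = np.diag([-1.0, -1.0, 1.0])
target = np.array([[0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.0]])
pred = target @ flip_z.T               # prediction in the flipped frame
loss = symmetric_min_loss(pred, target, [identity, flip_z])  # → 0.0
```

A plain L1 loss would penalize `pred` here even though it corresponds to a pose indistinguishable from the target; the minimum over the symmetry group does not.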
iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects
We address the task of 6D pose estimation of known rigid objects from single
input images in scenarios where the objects are partly occluded. Recent
RGB-D-based methods are robust to moderate degrees of occlusion. For RGB
inputs, no previous method works well for partly occluded objects. Our main
contribution is to present the first deep learning-based system that estimates
accurate poses for partly occluded objects from RGB-D and RGB input. We achieve
this with a new instance-aware pipeline that decomposes 6D object pose
estimation into a sequence of simpler steps, where each step removes specific
aspects of the problem. The first step localizes all known objects in the image
using an instance segmentation network, and hence eliminates surrounding
clutter and occluders. The second step densely maps pixels to 3D object surface
positions, so-called object coordinates, using an encoder-decoder network, and
hence eliminates object appearance. The third, and final, step predicts the 6D
pose using geometric optimization. We demonstrate that we significantly
outperform the state-of-the-art for pose estimation of partly occluded objects
for both RGB and RGB-D input.
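The final geometric-optimization step operates on the dense pixel-to-object-coordinate predictions from the second stage. One common realization, sketched below under assumed inputs, is RANSAC-style scoring: candidate poses are ranked by how many 2D-3D correspondences reproject within a pixel threshold (a real system would also generate candidates from minimal point samples, e.g. via cv2.solvePnPRansac).

```python
import numpy as np

def project(points_3d, K, R, t):
    """Pinhole projection of Nx3 points to Nx2 pixel coordinates."""
    cam = points_3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def count_inliers(pts_2d, pts_3d, K, R, t, thresh=3.0):
    """Number of correspondences reprojecting within `thresh` pixels."""
    err = np.linalg.norm(project(pts_3d, K, R, t) - pts_2d, axis=1)
    return int((err < thresh).sum())

def best_pose(pts_2d, pts_3d, K, candidates):
    """Pick the candidate (R, t) with the most reprojection inliers."""
    return max(candidates, key=lambda p: count_inliers(pts_2d, pts_3d, K, *p))

# Synthetic example (all values illustrative): dense correspondences
# generated from a known pose, plus one deliberately wrong candidate.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
pts_3d = rng.uniform(-0.05, 0.05, size=(50, 3))
R_true, t_true = np.eye(3), np.array([0.0, 0.0, 1.0])
pts_2d = project(pts_3d, K, R_true, t_true)
wrong = (np.eye(3), np.array([0.1, 0.0, 1.0]))   # shifted by 10 cm
pose = best_pose(pts_2d, pts_3d, K, [wrong, (R_true, t_true)])
```

Because the earlier segmentation step has already removed clutter and occluders, the correspondences fed to this stage come only from pixels on the object, which keeps the inlier scoring meaningful even under partial occlusion.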