Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors), has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state of the art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features we a) propose an unsupervised feature learnt from
depth-invariant patches using a Sparse Autoencoder and b) offer an extensive
evaluation of various state of the art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired by realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the
art on both public datasets and our own.
Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
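The next-best-view idea in the abstract above (pick the view expected to
reduce uncertainty the most) can be illustrated with a minimal sketch. It
assumes that a predicted distribution over pose hypotheses is already available
for each candidate view. In the paper that estimate is learned from the
clustering in the Hough Forest leaf nodes, which is not reproduced here. The
function names, view labels, and probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def next_best_view(current_probs, predicted_probs_per_view):
    """Return (view, gain) for the view with the largest entropy reduction.

    current_probs: distribution over hypotheses from the current view.
    predicted_probs_per_view: assumed mapping from a candidate view to the
        distribution we expect to see after moving there.
    """
    h_now = entropy(current_probs)
    gains = {
        view: h_now - entropy(probs)
        for view, probs in predicted_probs_per_view.items()
    }
    best = max(gains, key=gains.get)
    return best, gains[best]


# Illustrative numbers only: three hypotheses, three candidate views.
current = [0.4, 0.3, 0.3]
predicted = {
    "left": [0.5, 0.3, 0.2],
    "top": [0.9, 0.05, 0.05],
    "right": [0.34, 0.33, 0.33],
}
best_view, gain = next_best_view(current, predicted)
print(best_view, round(gain, 3))
```

With these made-up numbers the "top" view is selected because its predicted
distribution is the most peaked, so it has the lowest entropy.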
Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking
In this paper, we propose a generative framework that unifies depth-based 3D
facial pose tracking and face model adaptation on-the-fly, in unconstrained
scenarios with heavy occlusions and arbitrary facial expression variations.
Specifically, we introduce a statistical 3D morphable model that flexibly
describes the distribution of points on the surface of the face model, with an
efficient switchable online adaptation that gradually captures the identity of
the tracked subject and rapidly constructs a suitable face model when the
subject changes. Moreover, unlike prior art that employed ICP-based facial pose
estimation, to improve robustness to occlusions, we propose a ray visibility
constraint that regularizes the pose based on the face model's visibility with
respect to the input point cloud. Ablation studies and experimental results on
the Biwi and ICT-3DHP datasets demonstrate that the proposed framework is
effective and outperforms competing state-of-the-art depth-based methods.
Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation
We introduce a novel method for robust and accurate 3D object pose estimation
from a single color image under large occlusions. Following recent approaches,
we first predict the 2D projections of 3D points related to the target object
and then compute the 3D pose from these correspondences using a geometric
method. Unfortunately, as the results of our experiments show, predicting these
2D projections using a regular CNN or a Convolutional Pose Machine is highly
sensitive to partial occlusions, even when these methods are trained with
partially occluded examples. Our solution is to predict heatmaps from multiple
small patches independently and to accumulate the results to obtain accurate
and robust predictions. Training subsequently becomes challenging because
patches with similar appearances but different positions on the object
correspond to different heatmaps. However, we provide a simple yet effective
solution to deal with such ambiguities. We show that our approach outperforms
existing methods on two challenging datasets: the Occluded LineMOD dataset and
the YCB-Video dataset, both exhibiting cluttered scenes with highly occluded
objects. Project website:
https://www.tugraz.at/institute/icg/research/team-lepetit/research-projects/robust-object-pose-estimation
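The accumulation step described in the abstract above (independent heatmaps
from many small patches, summed so that a few occluded patches cannot dominate)
can be sketched as follows. This is a toy illustration, not the authors'
network: the per-patch heatmaps are synthetic Gaussians, and the helper names,
grid size, and coordinates are assumptions made for the example.

```python
import math


def gaussian_heatmap(height, width, center, sigma=1.5):
    """Synthetic stand-in for one patch's predicted heatmap."""
    cy, cx = center
    return [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def accumulate_heatmaps(heatmaps):
    """Sum per-patch heatmaps and return (total, (row, col) of the peak)."""
    height, width = len(heatmaps[0]), len(heatmaps[0][0])
    total = [
        [sum(h[y][x] for h in heatmaps) for x in range(width)]
        for y in range(height)
    ]
    peak = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: total[p[0]][p[1]],
    )
    return total, peak


# Four patches agree on the projection at (10, 20); one patch, imagined to lie
# on an occluder, votes for a wrong location at (25, 5).
votes = [gaussian_heatmap(32, 32, (10, 20)) for _ in range(4)]
votes.append(gaussian_heatmap(32, 32, (25, 5)))
total, peak = accumulate_heatmaps(votes)
print(peak)
```

Because the votes are summed, the single outlier contributes a weak secondary
mode while the agreeing patches determine the peak.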