569 research outputs found
Latent-Class Hough Forests for 3D object detection and pose estimation of rigid objects
In this thesis we propose a novel framework, Latent-Class Hough Forests, for the problem of 3D object detection and pose estimation in heavily cluttered and occluded scenes. Firstly, we adapt the state-of-the-art template-based representation, LINEMOD [34, 36], into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. In training, rather than explicitly collecting representative negative samples, our method is trained on positive samples only and we treat the class distributions at the leaf nodes as latent variables. During the inference process we iteratively update these distributions, providing accurate estimation of background clutter and foreground occlusions and thus a better detection rate. Furthermore, as a by-product, the latent class distributions can provide accurate occlusion aware segmentation masks, even in the multi-instance scenario. In addition to an existing public dataset, which contains only single-instance sequences with large amounts of clutter, we have collected a new, more challenging, dataset for multiple-instance detection containing heavy 2D and 3D clutter as well as foreground occlusions. We evaluate the Latent-Class Hough Forest on both of these datasets where we outperform state-of-the art methods.Open Acces
Stratified decision forests for accurate anatomical landmark localization in cardiac images
Accurate localization of anatomical landmarks is an important step in medical imaging, as it provides useful prior information for subsequent image analysis and acquisition methods. It is particularly useful for initialization of automatic image analysis tools (e.g. segmentation and registration) and detection of scan planes for automated image acquisition. Landmark localization has been commonly performed using learning based approaches, such as classifier and/or regressor models. However, trained models may not generalize well in heterogeneous datasets when the images contain large differences due to size, pose and shape variations of organs. To learn more data-adaptive and patient specific models, we propose a novel stratification based training model, and demonstrate its use in a decision forest. The proposed approach does not require any additional training information compared to the standard model training procedure and can be easily integrated into any decision tree framework. The proposed method is evaluated on 1080 3D highresolution and 90 multi-stack 2D cardiac cine MR images. The experiments show that the proposed method achieves state-of-theart landmark localization accuracy and outperforms standard regression and classification based approaches. Additionally, the proposed method is used in a multi-atlas segmentation to create a fully automatic segmentation pipeline, and the results show that it achieves state-of-the-art segmentation accuracy
Recovering 6D Object Pose: A Review and Multi-modal Analysis
A large number of studies analyse object detection and pose estimation at
visual level in 2D, discussing the effects of challenges such as occlusion,
clutter, texture, etc., on the performances of the methods, which work in the
context of RGB modality. Interpreting the depth data, the study in this paper
presents thorough multi-modal analyses. It discusses the above-mentioned
challenges for full 6D object pose estimation in RGB-D images comparing the
performances of several 6D detectors in order to answer the following
questions: What is the current position of the computer vision community for
maintaining "automation" in robotic manipulation? What next steps should the
community take for improving "autonomy" in robotics while handling objects? Our
findings include: (i) reasonably accurate results are obtained on
textured-objects at varying viewpoints with cluttered backgrounds. (ii) Heavy
existence of occlusion and clutter severely affects the detectors, and
similar-looking distractors is the biggest challenge in recovering instances'
6D. (iii) Template-based methods and random forest-based learning algorithms
underlie object detection and 6D pose estimation. Recent paradigm is to learn
deep discriminative feature representations and to adopt CNNs taking RGB images
as input. (iv) Depending on the availability of large-scale 6D annotated depth
datasets, feature representations can be learnt on these datasets, and then the
learnt representations can be customized for the 6D problem
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors), has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state of the art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features we a) propose an unsupervised feature learnt from
depth-invariant patches using a Sparse Autoencoder and b) offer an extensive
evaluation of various state of the art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired from realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms state of the art
both on public and on our datasets.Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
- …