3D Model Assisted Image Segmentation
The problem of segmenting a given image into coherent regions is important in computer vision, and many industrial applications require segmenting a known object into its components. Examples include identifying individual parts of a component for proces
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors) has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state-of-the-art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features, we a) propose an unsupervised feature learnt from
depth-invariant patches using a Sparse Autoencoder and b) offer an extensive
evaluation of various state-of-the-art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired by realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the art
on both public datasets and our own.
Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
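The abstract frames next-best-view selection as estimating the reduction of uncertainty in other views. As a rough illustration only, the sketch below scores candidate views by expected entropy reduction over a discrete set of pose hypotheses. It does not reproduce the authors' Hough-Forest leaf-clustering estimator: the hypothesis probabilities, the per-view `view_outcomes` labels, and the names `entropy`, `expected_entropy` and `next_best_view` are hypothetical, and it assumes each view yields a deterministic, noise-free observation for each hypothesis.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_entropy(prior, view_outcomes):
    """Expected posterior entropy after observing from one candidate view.

    prior: probability of each pose hypothesis (sums to 1).
    view_outcomes: hypothetical observation label each hypothesis would
    produce from this view; hypotheses sharing a label stay ambiguous.
    """
    groups = defaultdict(list)
    for p, outcome in zip(prior, view_outcomes):
        groups[outcome].append(p)
    total = 0.0
    for ps in groups.values():
        mass = sum(ps)  # probability of seeing this observation
        if mass > 0:
            total += mass * entropy([p / mass for p in ps])
    return total


def next_best_view(prior, views):
    """Index of the view with the largest expected entropy reduction."""
    h0 = entropy(prior)
    gains = [h0 - expected_entropy(prior, v) for v in views]
    best = max(range(len(views)), key=lambda i: gains[i])
    return best, gains


# Four equally likely hypotheses; view 1 separates all of them.
prior = [0.25] * 4
views = [[0, 0, 1, 1], [0, 1, 2, 3], [0, 0, 0, 0]]
best, gains = next_best_view(prior, views)
print(best, gains)
```

Here view 1 removes all 2 bits of uncertainty, view 0 removes 1 bit, and view 2 removes none, so view 1 is chosen.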
Depth-aware convolutional neural networks for accurate 3D pose estimation in RGB-D images
Most recent approaches to 3D pose estimation from RGB-D images use a two-stage pipeline. First, they learn a classifier (typically a random forest) that predicts the position of each input pixel on the object surface. These estimates are then used to define an energy function that is minimized with respect to the object pose. In this paper, we focus on the first stage and propose a novel classifier based on a depth-aware Convolutional Neural Network. This classifier learns a scale-adaptive regression model that yields very accurate pixel-level predictions, so the pose can then be estimated with a simple RANSAC-based scheme, without optimizing complex ad hoc energy functions. Our experiments on publicly available datasets show that our approach achieves remarkable improvements over state-of-the-art methods.
Peer reviewed. Postprint (author's final draft).
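The second stage described above can be illustrated with a minimal sketch, assuming the per-pixel predictions are 3D object-surface coordinates paired with camera-space points back-projected from the depth map. The abstract only says a "simple RANSAC-based scheme" is used; the 3-point Kabsch (SVD) solver, the inlier threshold (assumed to be in meters), the iteration count, and the names `kabsch` and `ransac_pose` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # correct a reflection if needed
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_pose(obj_pts, cam_pts, iters=200, thresh=0.01, seed=0):
    """Hypothetical RANSAC over 3D-3D correspondences.

    obj_pts: (N, 3) predicted object-surface coordinates, one per pixel (assumed).
    cam_pts: (N, 3) camera-space points back-projected from depth (assumed).
    Returns the pose refit on all inliers and the boolean inlier mask.
    """
    rng = np.random.default_rng(seed)
    n = len(obj_pts)
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)  # minimal sample
        R, t = kabsch(obj_pts[idx], cam_pts[idx])
        err = np.linalg.norm(obj_pts @ R.T + t - cam_pts, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("no pose hypothesis with enough inliers")
    R, t = kabsch(obj_pts[best], cam_pts[best])  # refit on the consensus set
    return R, t, best
```

Because each pixel already carries a full 3D correspondence, three points suffice per hypothesis, which is what keeps the pose stage simple once the per-pixel predictions are accurate.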
ALET (Automated Labeling of Equipment and Tools): A Dataset, a Baseline and a Usecase for Tool Detection in the Wild
Robots collaborating with humans in realistic environments will need to be
able to detect the tools that can be used and manipulated. However, there is no
available dataset or study that addresses this challenge in real settings. In
this paper, we fill this gap by providing an extensive dataset (METU-ALET) for
detecting farming, gardening, office, stonemasonry, vehicle, woodworking and
workshop tools. The scenes correspond to sophisticated environments with or
without humans using the tools. The scenes we consider introduce several
challenges for object detection, including the small scale of the tools, their
articulated nature, occlusion, inter-class invariance, etc. Moreover, we train
and compare several state-of-the-art deep object detectors (including Faster
R-CNN, Cascade R-CNN, RepPoint and RetinaNet) on our dataset. We observe that
the detectors have particular difficulty detecting small-scale tools or
tools that are visually similar to parts of other tools. This in turn underscores
the importance of our dataset and paper. With the dataset, the code and the
trained models, our work provides a basis for further research into tools and
their use in robotics applications.
Comment: 7 pages, 4 figures
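The abstract reports that detectors struggle most with small-scale tools. One common way to quantify this (assumed here; the abstract does not state the evaluation protocol) is to bucket ground-truth boxes with the COCO area thresholds (small below 32² pixels, medium below 96² pixels, large otherwise) and measure recall at IoU ≥ 0.5 per bucket. COCO buckets by segmentation area; this sketch approximates it with box area, ignores classes and confidence scores, and uses greedy one-to-one matching. The box format `(x1, y1, x2, y2)` and all function names are hypothetical.

```python
def box_area(b):
    """Area of an (x1, y1, x2, y2) box; zero if degenerate."""
    x1, y1, x2, y2 = b
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a, b):
    """Intersection-over-union of two axis-aligned 2D boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def size_bucket(b):
    """COCO-style size bucket, approximated with box area (assumption)."""
    area = box_area(b)
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def recall_by_size(gts, preds, thr=0.5):
    """Fraction of ground-truth boxes matched at IoU >= thr, per size bucket."""
    hits = {"small": 0, "medium": 0, "large": 0}
    totals = {"small": 0, "medium": 0, "large": 0}
    used = set()  # each prediction may match at most one ground truth
    for g in gts:
        bucket = size_bucket(g)
        totals[bucket] += 1
        best_j, best_iou = None, thr
        for j, p in enumerate(preds):
            if j in used:
                continue
            v = iou(g, p)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            used.add(best_j)
            hits[bucket] += 1
    return {k: hits[k] / totals[k] for k in totals if totals[k] > 0}


# A 20x20 tool is missed while a 200x200 one is found.
print(recall_by_size([(0, 0, 20, 20), (0, 0, 200, 200)], [(0, 0, 190, 200)]))
```

Reporting recall per bucket, rather than one aggregate number, makes the small-object failure mode described above directly visible.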
Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework discovers the optimal location of the object using a generalization of the structural latent SVM formulation to 3D, together with a new loss function defined over the 3D space during training. We evaluate our method on two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our approach outperforms state-of-the-art methods as well as a number of baseline approaches on both 3D and 2D object recognition tasks.
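The abstract mentions a structural latent SVM trained with a loss defined over 3D space, but does not give that loss. A minimal sketch, assuming axis-aligned 3D boxes and a loss of one minus 3D intersection-over-union (a common choice, not necessarily the authors'), together with the margin-rescaled loss-augmented step that structural SVM training uses to find the most violating output. The box format, the `scores` stand-in for the model score, and all function names are hypothetical; the latent variables (here, the segmentation hypotheses) are omitted.

```python
def volume(box):
    """Volume of an axis-aligned box ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    lo, hi = box
    v = 1.0
    for k in range(3):
        v *= max(0.0, hi[k] - lo[k])
    return v


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    lo = tuple(max(a[0][k], b[0][k]) for k in range(3))
    hi = tuple(min(a[1][k], b[1][k]) for k in range(3))
    inter = volume((lo, hi))
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def loss_3d(gt, pred):
    """Assumed 3D loss: 0 for a perfect box, 1 for no overlap."""
    return 1.0 - iou_3d(gt, pred)


def most_violating(gt, candidates, scores):
    """Margin-rescaled loss-augmented inference:
    argmax_i loss_3d(gt, candidates[i]) + scores[i],
    where scores[i] stands in for the model score w . phi(x, y_i)."""
    return max(range(len(candidates)),
               key=lambda i: loss_3d(gt, candidates[i]) + scores[i])


gt = ((0, 0, 0), (2, 2, 2))
shifted = ((1, 0, 0), (3, 2, 2))   # overlaps half of gt along x
far = ((5, 5, 5), (6, 6, 6))       # no overlap
print(loss_3d(gt, shifted))        # 1 - 4/12
print(most_violating(gt, [gt, shifted, far], [1.0, 0.5, 0.2]))
```

Measuring the loss in 3D rather than on the image plane penalizes hypotheses that look well aligned in 2D but sit at the wrong depth.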