Cumulative object categorization in clutter
In this paper we present an approach based on scene- or part-graphs for geometrically categorizing touching and
occluded objects. We use additive RGBD feature descriptors and hash graph configuration parameters to describe the spatial arrangement of constituent parts. Our experiments show that this method outperforms our earlier part-voting and sliding-window classification. We evaluated the approach on cluttered scenes and on a 3D dataset containing over 15,000 Kinect scans of more than 100 objects grouped into general geometric categories. Additionally, color, geometric, and combined features were compared on the categorization tasks.
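The part-graph hashing idea lends itself to a short illustration: quantize the pairwise geometry between constituent parts, hash the resulting configuration keys, and let keys vote cumulatively for categories. The Python sketch below is a loose interpretation under assumed bin widths and a simple vote table; the descriptor fields and parameters are illustrative, not the authors' implementation.

import numpy as np
from collections import defaultdict
from itertools import combinations

# Assumed quantization steps for the pairwise geometry (not the paper's values).
DIST_BIN = 0.02    # metres per centroid-distance bin
ANGLE_BIN = 15.0   # degrees per normal-angle bin

def pair_key(ci, cj, ni, nj):
    """Hashable key for the relation between two parts (centroid, unit normal)."""
    d = int(np.linalg.norm(cj - ci) / DIST_BIN)
    a = int(np.degrees(np.arccos(np.clip(np.dot(ni, nj), -1.0, 1.0))) / ANGLE_BIN)
    return (d, a)

def hash_scene_graph(parts):
    """parts: list of (centroid, normal) pairs; returns all configuration keys."""
    return [pair_key(ci, cj, ni, nj)
            for (ci, ni), (cj, nj) in combinations(parts, 2)]

# Each key votes for the categories it co-occurred with during training,
# so evidence accumulates over all part pairs of a cluttered scene.
votes = defaultdict(lambda: defaultdict(int))

def train(parts, category):
    for k in hash_scene_graph(parts):
        votes[k][category] += 1

def categorize(parts):
    tally = defaultdict(int)
    for k in hash_scene_graph(parts):
        for category, n in votes[k].items():
            tally[category] += n
    return max(tally, key=tally.get) if tally else None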
Semantic Pose using Deep Networks Trained on Synthetic RGB-D
In this work we address the problem of indoor scene understanding from RGB-D
images. Specifically, we propose to find instances of common furniture classes,
their spatial extent, and their pose with respect to generalized class models.
To accomplish this, we use a deep, wide, multi-output convolutional neural
network (CNN) that predicts class, pose, and location of possible objects
simultaneously. To overcome the lack of large annotated RGB-D training sets
(especially those with pose), we use an on-the-fly rendering pipeline that
generates realistic cluttered room scenes in parallel to training. We then
perform transfer learning on the relatively small amount of publicly available
annotated RGB-D data, and find that our model is able to successfully annotate
even highly challenging real scenes. Importantly, our trained network is able
to understand noisy and sparse observations of highly cluttered scenes with a
remarkable degree of accuracy, inferring class and pose from a very limited set
of cues. Additionally, our neural network is only moderately deep and computes
class, pose and position in tandem, so the overall run-time is significantly
faster than existing methods, estimating all output parameters in parallel
on a GPU in seconds.
Comment: ICCV 2015 Submission
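As a rough illustration of the multi-output design described above, the sketch below wires one shared convolutional trunk to separate class, pose and location heads so that all three predictions come out of a single forward pass. It is a minimal sketch assuming a 4-channel RGB-D input, discretized pose bins and a box-style location output; the layer sizes are assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class MultiOutputNet(nn.Module):
    """Shared trunk with separate class, pose and location heads."""
    def __init__(self, n_classes=10, n_pose_bins=16):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2, padding=2), nn.ReLU(),   # RGB-D input
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cls_head = nn.Linear(128, n_classes)     # object class
        self.pose_head = nn.Linear(128, n_pose_bins)  # discretized pose
        self.loc_head = nn.Linear(128, 4)             # box-style location

    def forward(self, x):
        h = self.trunk(x)
        # One forward pass yields all three outputs in tandem.
        return self.cls_head(h), self.pose_head(h), self.loc_head(h)

net = MultiOutputNet()
rgbd = torch.randn(1, 4, 128, 128)  # stand-in for a rendered RGB-D crop
cls_logits, pose_logits, box = net(rgbd)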
Interactive Perception Based on Gaussian Process Classification for House-Hold Objects Recognition and Sorting
We present an interactive perception model for
object sorting based on Gaussian Process (GP) classification
that is capable of recognizing objects categories from point
cloud data. In our approach, FPFH features are extracted from
point clouds to describe the local 3D shape of objects and
a Bag-of-Words coding method is used to obtain an object-level
vocabulary representation. Multi-class Gaussian Process
classification is employed to provide a probabilistic estimate of
the identity of the object and serves a key role in the interactive
perception cycle by modelling perception confidence. We show
results from simulated input data on both SVM and GP based
multi-class classifiers to validate the recognition accuracy of our
proposed perception model. Our results demonstrate that by
using a GP-based classifier, we obtain true positive classification
rates of up to 80%. Our semi-autonomous object sorting
experiments show that the proposed GP-based interactive
sorting approach outperforms random sorting by up to 30%
when applied to scenes comprising configurations of household
objects.
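A minimal sketch of the described pipeline, assuming per-point FPFH descriptors have already been extracted (e.g. with PCL or Open3D); random vectors stand in for them here. KMeans builds the Bag-of-Words vocabulary, and scikit-learn's multi-class GaussianProcessClassifier supplies the class probabilities that the interactive cycle would use as perception confidence. Vocabulary size and kernel are assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
n_objects, n_points, n_words = 40, 200, 32

# Stand-ins for per-point FPFH descriptors (33-D) of segmented objects.
fpfh = [rng.random((n_points, 33)) for _ in range(n_objects)]
labels = rng.integers(0, 3, n_objects)  # three assumed object categories

# Visual vocabulary over all local descriptors.
codebook = KMeans(n_clusters=n_words, n_init=10, random_state=0)
codebook.fit(np.vstack(fpfh))

def bow(descriptors):
    """Object-level representation: normalized visual-word histogram."""
    h = np.bincount(codebook.predict(descriptors), minlength=n_words).astype(float)
    return h / h.sum()

X = np.array([bow(d) for d in fpfh])

# Multi-class GP classifier; predict_proba provides the confidence
# that would drive the interactive perception cycle.
gp = GaussianProcessClassifier(kernel=1.0 * RBF(1.0)).fit(X, labels)
confidence = gp.predict_proba(X[:1])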
SegICP: Integrated Deep Semantic Segmentation and Pose Estimation
Recent robotic manipulation competitions have highlighted that sophisticated
robots still struggle to achieve fast and reliable perception of task-relevant
objects in complex, realistic scenarios. To improve these systems' perceptual
speed and robustness, we present SegICP, a novel integrated solution to object
recognition and pose estimation. SegICP couples convolutional neural networks
and multi-hypothesis point cloud registration to achieve both robust pixel-wise
semantic segmentation and accurate, real-time 6-DOF pose estimation
for relevant objects. Our architecture achieves 1 cm position error and
$<5^\circ$ angle error in real time without an initial seed. We evaluate and
benchmark SegICP against an annotated dataset generated by motion capture.
Comment: IROS camera-ready
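The coupling described above can be sketched as follows: a segmentation mask from a CNN crops the scene cloud to one object, and multi-hypothesis ICP registers the object model against the crop, keeping the best-scoring pose. This is an illustrative sketch using Open3D's registration API; the mask source, yaw-only seed rotations and thresholds are assumptions, not SegICP's actual implementation.

import numpy as np
import open3d as o3d

def segment_cloud(scene, mask_indices):
    """Keep only the scene points the segmentation network labeled as the object."""
    return scene.select_by_index(mask_indices)

def multi_hypothesis_icp(model, crop, n_seeds=8, thresh=0.02):
    """Run ICP from several yaw seeds and keep the best-scoring alignment."""
    best = None
    for k in range(n_seeds):
        init = np.eye(4)
        init[:3, :3] = o3d.geometry.get_rotation_matrix_from_xyz(
            (0.0, 0.0, 2.0 * np.pi * k / n_seeds))
        result = o3d.pipelines.registration.registration_icp(
            model, crop, thresh, init,
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        if best is None or result.fitness > best.fitness:
            best = result
    return best.transformation  # estimated 6-DOF pose of the object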
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors) have
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state of the art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features, we a) propose unsupervised features learnt from
depth-invariant patches using a Sparse Autoencoder and b) offer an extensive
evaluation of various state of the art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired by realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the art
both on public datasets and on our own.
Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
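The unsupervised feature-learning step a) can be illustrated with a small sparse autoencoder over depth patches: reconstruct each patch while penalizing the code with an L1 term so the learned features stay sparse. Patch size, code dimension and the sparsity weight below are assumptions, not the paper's settings.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, patch_dim=32 * 32, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, code_dim), nn.Sigmoid())
        self.decoder = nn.Linear(code_dim, patch_dim)

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sparsity_weight = 1e-3  # assumed L1 weight

patches = torch.rand(256, 32 * 32)  # stand-in for depth-invariant patches
for _ in range(100):
    recon, code = model(patches)
    # Reconstruction error plus an L1 penalty that keeps the codes sparse.
    loss = nn.functional.mse_loss(recon, patches) + sparsity_weight * code.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    features = model.encoder(patches)  # patch features fed to the Hough Forest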
Fast and Robust Detection of Fallen People from a Mobile Robot
This paper deals with the problem of detecting fallen people lying on the
floor by means of a mobile robot equipped with a 3D depth sensor. In the
proposed algorithm, inspired by semantic segmentation techniques, the 3D scene
is over-segmented into small patches. Fallen people are then detected by means
of two SVM classifiers: the first one labels each patch, while the second one
captures the spatial relations between them. This novel approach proved to be
robust and fast. Indeed, thanks to the use of small patches, fallen people in
real cluttered scenes with objects side by side are correctly detected.
Moreover, the algorithm can be executed on a mobile robot fitted with a
standard laptop, making it possible to exploit the 2D environmental map built by
the robot and the multiple points of view obtained during the robot navigation.
Additionally, this algorithm is robust to illumination changes since it does
not rely on RGB data but on depth data. All the methods have been thoroughly
validated on the IASLAB-RGBD Fallen Person Dataset, which is published online
as a further contribution. It consists of several static and dynamic sequences
with 15 different people and 2 different environments.
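A schematic of the two-stage SVM cascade described above: a first SVM labels each over-segmented patch, and a second SVM checks whether the spatial relations between positively labeled patches are consistent with a fallen person. Feature dimensions and the pairwise relation encoding are illustrative placeholders, trained here on random stand-in data.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Stage 1: per-patch geometric features (e.g. height and normal statistics).
patch_feats = rng.random((500, 6))
patch_labels = rng.integers(0, 2, 500)   # 1 = body part, 0 = other
patch_svm = SVC(probability=True).fit(patch_feats, patch_labels)

# Stage 2: features for pairs of neighbouring positive patches
# (relative position, size ratio, score product).
pair_feats = rng.random((300, 4))
pair_labels = rng.integers(0, 2, 300)    # 1 = layout consistent with a person
relation_svm = SVC().fit(pair_feats, pair_labels)

def detect_fallen_person(patch_f, pair_f):
    """A detection fires only when both classifiers agree."""
    part_scores = patch_svm.predict_proba(patch_f)[:, 1]
    return bool(part_scores.max() > 0.5 and relation_svm.predict(pair_f).any())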