LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Deep neural network (DNN) architectures have been shown to outperform
traditional pipelines for object segmentation and pose estimation using RGBD
data, but the performance of these DNN pipelines is directly tied to how
representative the training data is of the true data. Hence a key requirement
for employing these methods in practice is to have a large set of labeled data
for your specific robotic manipulation task, a requirement that is not
generally satisfied by existing datasets. In this paper we develop a pipeline
to rapidly generate high quality RGBD data with pixelwise labels and object
poses. We use an RGBD camera to collect video of a scene from multiple
viewpoints and leverage existing reconstruction techniques to produce a dense
3D reconstruction. We label the 3D reconstruction using human-assisted
ICP fitting of object meshes. By reprojecting the results of labeling the 3D
scene we can produce labels for each RGBD image of the scene. This pipeline
enabled us to collect over 1,000,000 labeled object instances in just a few
days. We use this dataset to answer questions related to how much training data
is required, and of what quality the data must be, to achieve high performance
from a DNN architecture.
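The reprojection step described above can be sketched as follows, a minimal illustration assuming pinhole intrinsics and known per-frame camera poses from the reconstruction (function and variable names are hypothetical, not from the paper's code):

```python
import numpy as np

def project_labels(points_world, labels, T_world_cam, K, img_shape):
    """Render a per-pixel label image by projecting labeled 3D points
    from the reconstruction into one RGBD frame."""
    h, w = img_shape
    # Transform world-frame points into the camera frame.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    label_img = np.zeros((h, w), dtype=np.int32)  # 0 = background
    depth_buf = np.full((h, w), np.inf)
    for p, lab in zip(pts_cam, labels):
        if p[2] <= 0:                  # skip points behind the camera
            continue
        uvw = K @ p                    # pinhole projection
        u = int(round(uvw[0] / uvw[2]))
        v = int(round(uvw[1] / uvw[2]))
        if 0 <= u < w and 0 <= v < h and p[2] < depth_buf[v, u]:
            depth_buf[v, u] = p[2]     # simple z-buffer: nearest point wins
            label_img[v, u] = lab
    return label_img
```

Because the scene is labeled once in 3D, this projection can be repeated for every camera pose in the video, which is what makes the per-frame labeling cost effectively zero.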
Semantic Pose using Deep Networks Trained on Synthetic RGB-D
In this work we address the problem of indoor scene understanding from RGB-D
images. Specifically, we propose to find instances of common furniture classes,
their spatial extent, and their pose with respect to generalized class models.
To accomplish this, we use a deep, wide, multi-output convolutional neural
network (CNN) that predicts class, pose, and location of possible objects
simultaneously. To overcome the lack of large annotated RGB-D training sets
(especially those with pose), we use an on-the-fly rendering pipeline that
generates realistic cluttered room scenes in parallel to training. We then
perform transfer learning on the relatively small amount of publicly available
annotated RGB-D data, and find that our model is able to successfully annotate
even highly challenging real scenes. Importantly, our trained network is able
to understand noisy and sparse observations of highly cluttered scenes with a
remarkable degree of accuracy, inferring class and pose from a very limited set
of cues. Additionally, our neural network is only moderately deep and computes
class, pose and position in tandem, so the overall run-time is significantly
faster than existing methods, estimating all output parameters simultaneously
in parallel on a GPU in seconds.
Comment: ICCV 2015 submission.
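The multi-output idea, one shared trunk feeding separate class, pose, and location heads so all three are computed in a single forward pass, can be sketched as below (a toy numpy illustration with made-up layer sizes; the paper's actual CNN is deeper and convolutional):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
D_IN, D_FEAT, N_CLASS, N_POSE_BINS = 128, 64, 10, 8

W_trunk = rng.normal(size=(D_IN, D_FEAT)) * 0.1   # shared feature extractor
W_cls = rng.normal(size=(D_FEAT, N_CLASS)) * 0.1  # class head
W_pose = rng.normal(size=(D_FEAT, N_POSE_BINS)) * 0.1  # discretized pose head
W_loc = rng.normal(size=(D_FEAT, 3)) * 0.1        # 3D location head

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """One shared feature vector feeds three heads, so class, pose,
    and location come out of a single pass over the input."""
    feat = np.maximum(x @ W_trunk, 0.0)  # shared trunk with ReLU
    return softmax(feat @ W_cls), softmax(feat @ W_pose), feat @ W_loc
```

Sharing the trunk is what makes the joint prediction cheaper than running three separate networks, which is the source of the runtime advantage the abstract claims.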
Rapid Pose Label Generation through Sparse Representation of Unknown Objects
Deep Convolutional Neural Networks (CNNs) have been successfully deployed on
robots for 6-DoF object pose estimation through visual perception. However,
obtaining labeled data on a scale required for the supervised training of CNNs
is a difficult task, exacerbated if the object is novel and a 3D model is
unavailable. To this end, this work presents an approach for rapidly generating
real-world, pose-annotated RGB-D data for unknown objects. Our method not only
circumvents the need for a prior 3D object model (textured or otherwise) but
also bypasses complicated setups of fiducial markers, turntables, and sensors.
With the help of a human user, we first source minimalistic labelings of an
ordered set of arbitrarily chosen keypoints over a set of RGB-D videos. Then,
by solving an optimization problem, we combine these labels under a world frame
to recover a sparse, keypoint-based representation of the object. The sparse
representation leads to the development of a dense model and the pose labels
for each image frame in the set of scenes. We show that the sparse model can
also be efficiently used for scaling to a large number of new scenes. We
demonstrate the practicality of the generated labeled dataset by training a
pipeline for 6-DoF object pose estimation and a pixel-wise segmentation
network.
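Once a sparse keypoint model exists in a world frame, recovering a pose label for a new frame reduces to rigidly aligning the model keypoints with their observed 3D positions. A minimal sketch of that alignment step using the Kabsch algorithm (the paper's full optimization is richer; this only illustrates the core fit):

```python
import numpy as np

def rigid_align(model_pts, observed_pts):
    """Best-fit rotation R and translation t such that
    R @ model + t approximates the observed keypoints (Kabsch, no scale)."""
    mu_m = model_pts.mean(axis=0)
    mu_o = observed_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_m
    return R, t
```

With the camera pose of each frame known from reconstruction, the recovered object-to-world transform directly yields the per-frame 6-DoF pose label.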