RGB-D datasets using Microsoft Kinect or similar sensors: a survey
RGB-D data has proven to be a very useful representation of indoor scenes for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in selecting suitable datasets for evaluating their algorithms.
3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
Unsupervised object modeling is important in robotics, especially for
handling a large set of objects. We present a method for unsupervised 3D object
discovery, reconstruction, and localization that exploits multiple instances of
an identical object contained in a single RGB-D image. The proposed method does
not rely on segmentation, scene knowledge, or user input, and thus is easily
scalable. Our method aims to find recurrent patterns in a single RGB-D image by
utilizing appearance and geometry of the salient regions. We extract keypoints
and match them in pairs based on their descriptors. We then generate triplets
of the keypoints matching with each other using several geometric criteria to
minimize false matches. The relative poses of the matched triplets are computed
and clustered to discover sets of triplet pairs with similar relative poses.
Triplets belonging to the same set are likely to belong to the same object and
are used to construct an initial object model. Detecting the remaining instances
with the initial object model using RANSAC allows the model to be further
expanded and refined. The automatically generated object models are both compact and
descriptive. We show quantitative and qualitative results on RGB-D images with
various objects including some from the Amazon Picking Challenge. We also
demonstrate the use of our method in an object picking scenario with a robotic
arm.
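The relative-pose step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each matched keypoint triplet is given as three non-collinear 3D points, attaches an orthonormal frame to each triplet, and derives the rigid transform between the two frames. The function names (`triplet_frame`, `relative_pose`, `rotation_angle_between`) are hypothetical, and the rotation-angle helper is only one plausible way to compare poses before clustering.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]  # fails for zero-length input (collinear triplet)

def triplet_frame(p0, p1, p2):
    """Orthonormal axes (x, y, z) attached to a non-collinear point triplet."""
    x = normalize(sub(p1, p0))
    z = normalize(cross(x, sub(p2, p0)))
    y = cross(z, x)
    return [x, y, z]

def relative_pose(triplet_a, triplet_b):
    """Rigid transform (R, t) such that R * a + t = b for matched triplets."""
    fa = triplet_frame(*triplet_a)
    fb = triplet_frame(*triplet_b)
    # R maps the axes of frame A onto the axes of frame B.
    R = [[sum(fb[k][i] * fa[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    a0, b0 = triplet_a[0], triplet_b[0]
    t = [b0[i] - sum(R[i][j] * a0[j] for j in range(3)) for i in range(3)]
    return R, t

def rotation_angle_between(R1, R2):
    """Angle in radians of the rotation taking R1 to R2 (assumed pose distance)."""
    trace = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)
```

Triplet pairs whose relative poses differ by only a small rotation angle and a small translation offset would then fall into the same cluster; the thresholds for "small" are left open here because the abstract does not state them.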
Robust Dense Mapping for Large-Scale Dynamic Environments
We present a stereo-based dense mapping algorithm for large-scale dynamic
urban environments. In contrast to other existing methods, we simultaneously
reconstruct the static background, the moving objects, and the potentially
moving but currently stationary objects separately, which is desirable for
high-level mobile robotic tasks such as path planning in crowded environments.
We use both instance-aware semantic segmentation and sparse scene flow to
classify objects as either background, moving, or potentially moving, thereby
ensuring that the system is able to model objects with the potential to
transition from static to dynamic, such as parked cars. Given camera poses
estimated from visual odometry, both the background and the (potentially)
moving objects are reconstructed separately by fusing the depth maps computed
from the stereo input. In addition to visual odometry, sparse scene flow is
also used to estimate the 3D motions of the detected moving objects, in order
to reconstruct them accurately. A map pruning technique is further developed to
improve reconstruction accuracy and reduce memory consumption, leading to
increased scalability. We evaluate our system thoroughly on the well-known
KITTI dataset. Our system is capable of running on a PC at approximately 2.5Hz,
with the primary bottleneck being the instance-aware semantic segmentation,
which is a limitation we hope to address in future work. The source code is
available from the project website (http://andreibarsan.github.io/dynslam).
Comment: Presented at IEEE International Conference on Robotics and Automation
(ICRA), 201
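The three-way classification described in this abstract (background, moving, potentially moving) can be sketched as follows. This is an illustrative approximation, not the released DynSLAM code: it assumes each segmented instance carries a semantic label and a list of sparse 3D scene flow vectors that have already been compensated for camera ego-motion. The class set `MOVABLE_CLASSES`, the function `classify_instance`, and the `motion_threshold` value are all hypothetical.

```python
import math

# Hypothetical set of semantic classes that are able to move.
MOVABLE_CLASSES = {"car", "truck", "bus", "person", "bicycle"}

def classify_instance(label, flow_vectors, motion_threshold=0.1):
    """Return 'background', 'moving', or 'potentially_moving' for one instance.

    label            -- semantic class from instance segmentation
    flow_vectors     -- ego-motion-compensated 3D flow vectors (dx, dy, dz)
    motion_threshold -- assumed per-frame displacement (meters) above which
                        the instance counts as moving
    """
    if label not in MOVABLE_CLASSES:
        return "background"
    if not flow_vectors:
        # No motion evidence: the object could move but is treated as stationary.
        return "potentially_moving"
    magnitudes = sorted(math.sqrt(dx * dx + dy * dy + dz * dz)
                        for dx, dy, dz in flow_vectors)
    median = magnitudes[len(magnitudes) // 2]  # median is robust to outlier matches
    return "moving" if median > motion_threshold else "potentially_moving"
```

Under these assumptions a parked car is labeled `potentially_moving`, so it can be reconstructed separately from the static background and promoted to `moving` once its flow exceeds the threshold.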