RGB-D datasets using Microsoft Kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
RGBDTAM: A Cost-Effective and Accurate RGB-D Tracking and Mapping System
Simultaneous localization and mapping using RGB-D cameras has been a fertile research topic in the last decade, due to the suitability of such sensors for indoor robotics. In this paper we propose a direct RGB-D SLAM algorithm with state-of-the-art accuracy and robustness at a low cost. Our experiments on the TUM RGB-D dataset [34] show better accuracy and robustness, in real time on a CPU, than direct RGB-D SLAM systems that make use of the GPU. Our approach has two key ingredients. First, the combination of a semi-dense photometric error and a dense geometric error for pose tracking (see Figure 1), which we demonstrate to be the most accurate alternative. Second, a model of the multi-view constraints and their errors in the mapping and tracking threads, which adds extra information over other approaches. We release an open-source implementation of our approach. The reader is referred to a video of our results for a more illustrative visualization of its performance.
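
To make the two-term tracking error concrete, here is a minimal Python/NumPy sketch of a combined photometric-plus-geometric cost for direct RGB-D pose tracking. It is an illustration, not the authors' implementation: the function names, the pinhole intrinsics fx, fy, cx, cy, and the fixed weights w_photo and w_geo are assumptions made for the example.

    import numpy as np

    def se3_transform(T, pts):
        """Apply a 4x4 rigid-body transform to an (N, 3) array of points."""
        return pts @ T[:3, :3].T + T[:3, 3]

    def project(pts, fx, fy, cx, cy):
        """Pinhole projection of (N, 3) camera-frame points to pixel coords."""
        u = fx * pts[:, 0] / pts[:, 2] + cx
        v = fy * pts[:, 1] / pts[:, 2] + cy
        return u, v

    def tracking_cost(T, ref_pts, ref_intensity, cur_gray, cur_depth,
                      fx, fy, cx, cy, w_photo=1.0, w_geo=0.5):
        """Combined photometric + geometric error for a direct RGB-D tracker.

        ref_pts: (N, 3) points back-projected from the reference keyframe
                 (semi-dense: only high-gradient pixels carry photometric terms).
        ref_intensity: (N,) grayscale values at those reference pixels.
        cur_gray, cur_depth: current grayscale and depth images.
        """
        pts_c = se3_transform(T, ref_pts)          # points in the current frame
        u, v = project(pts_c, fx, fy, cx, cy)
        ui, vi = np.round(u).astype(int), np.round(v).astype(int)
        h, w = cur_gray.shape
        valid = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h) & (pts_c[:, 2] > 0)

        # Photometric residual: intensity constancy between the two frames.
        r_photo = cur_gray[vi[valid], ui[valid]] - ref_intensity[valid]
        # Geometric residual: predicted depth vs. measured depth.
        r_geo = cur_depth[vi[valid], ui[valid]] - pts_c[valid, 2]

        return w_photo * np.sum(r_photo ** 2) + w_geo * np.sum(r_geo ** 2)

In a tracker, this cost would be minimized over the pose T (e.g. by Gauss-Newton on the SE(3) manifold); the sketch only evaluates it for a given pose.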
A Non-Rigid Map Fusion-Based RGB-Depth SLAM Method for Endoscopic Capsule Robots
In the gastrointestinal (GI) tract endoscopy field, ingestible wireless capsule endoscopy is considered a minimally invasive, novel diagnostic technology for inspecting the entire GI tract and diagnosing various diseases and pathologies. Since the development of this technology, medical device companies and many research groups have made significant progress in turning such passive capsule endoscopes into robotic active capsule endoscopes that achieve almost all the functions of current active flexible endoscopes. However, the use of robotic capsule endoscopy still faces some challenges. One such challenge is the precise localization of these active devices in the 3D world, which is essential for a precise three-dimensional (3D) map of the inner organ. A reliable 3D map of the explored inner organ could assist doctors in making more intuitive and correct diagnoses. In this paper, we propose, to our knowledge for the first time in the literature, a visual simultaneous localization and mapping (SLAM) method specifically developed for endoscopic capsule robots. The proposed RGB-Depth SLAM method is capable of capturing comprehensive, dense, globally consistent surfel-based maps of the inner organs explored by an endoscopic capsule robot in real time. This is achieved by using dense frame-to-model camera tracking and windowed surfel-based fusion, coupled with frequent model refinement through non-rigid surface deformations.
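
The windowed surfel-based fusion mentioned above can be illustrated with a short sketch. Below is a minimal, hypothetical Python example of folding an associated depth measurement into a map surfel by a confidence-weighted running average; the Surfel fields and the update rule are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    class Surfel:
        """One map element: position, normal, color, radius, confidence weight."""
        def __init__(self, pos, normal, color, radius, weight=1.0):
            self.pos = np.asarray(pos, float)
            self.normal = np.asarray(normal, float)
            self.color = np.asarray(color, float)
            self.radius = radius
            self.weight = weight

    def fuse_measurement(s, pos, normal, color, radius, w=1.0):
        """Fold a new associated measurement into an existing surfel by a
        confidence-weighted running average, then renormalize the normal."""
        total = s.weight + w
        s.pos    = (s.weight * s.pos    + w * np.asarray(pos, float))    / total
        s.color  = (s.weight * s.color  + w * np.asarray(color, float))  / total
        s.normal = (s.weight * s.normal + w * np.asarray(normal, float)) / total
        s.normal /= np.linalg.norm(s.normal)
        s.radius = min(s.radius, radius)   # keep the finer observation
        s.weight = total
        return s

Non-rigid refinement, as described in the abstract, would additionally deform the accumulated surfels so that loop closures stay globally consistent; that step is beyond this sketch.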
Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization
Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure, and loop closure detection. Recent random-forest-based methods exploit randomly sampled pixel-comparison features to predict 3D world locations for 2D image locations, which guide the camera pose optimization. However, these image features are sampled randomly in the images, without considering spatial structure or geometric information, leading to large errors or outright failure in the presence of poorly textured areas or motion blur. Line segment features are more robust in these environments. In this work, we propose to jointly exploit points and lines within the framework of uncertainty-driven regression forests. The proposed approach is thoroughly evaluated on three publicly available datasets against several strong state-of-the-art baselines in terms of several different error metrics. Experimental results prove the efficacy of our method, showing superior or on-par state-of-the-art performance.
Comment: published as a conference paper at the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
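
For readers unfamiliar with the baseline this paper improves on, here is a minimal Python sketch of the classic scene-coordinate regression forest step: a depth-normalized two-pixel comparison feature evaluated at each split node, leading to a leaf that stores a distribution over 3D world locations. The node layout (a plain dict) and the feature details are assumptions for illustration, not this paper's point-and-line formulation.

    import numpy as np

    def pixel_comparison_feature(rgb, depth, p, offset1, offset2, channel):
        """Depth-normalized two-pixel color difference, the usual split
        feature in scene-coordinate regression forests. Offsets are scaled
        by 1/depth so the response is roughly depth-invariant."""
        d = max(float(depth[p[1], p[0]]), 1e-3)
        q1 = (np.asarray(p) + np.round(np.asarray(offset1) / d)).astype(int)
        q2 = (np.asarray(p) + np.round(np.asarray(offset2) / d)).astype(int)
        h, w = depth.shape
        q1 = np.clip(q1, [0, 0], [w - 1, h - 1])
        q2 = np.clip(q2, [0, 0], [w - 1, h - 1])
        return float(rgb[q1[1], q1[0], channel]) - float(rgb[q2[1], q2[0], channel])

    def traverse(node, rgb, depth, p):
        """Walk one tree: at each split, compare the feature response to a
        threshold; a leaf stores a (mean, covariance) over 3D world points,
        which downstream pose optimization consumes as a 2D-3D constraint."""
        while not node["is_leaf"]:
            r = pixel_comparison_feature(rgb, depth, p,
                                         node["offset1"], node["offset2"],
                                         node["channel"])
            node = node["left"] if r < node["threshold"] else node["right"]
        return node["mean_xyz"], node["cov_xyz"]

The paper's contribution is to add line-segment features alongside such point features and to weight predictions by their uncertainty; the sketch shows only the point-based baseline.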
CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction
Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM. Our fusion scheme privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g. along low-textured regions, and vice versa. We demonstrate the use of depth prediction for estimating the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, yielding semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach.
Comment: 10 pages, 6 figures, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, June 2017. The first two authors contributed equally to this paper.
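
The described fusion scheme can be sketched as a per-pixel, uncertainty-weighted average that falls back to the CNN prediction wherever SLAM has no estimate (e.g. low-texture regions, where semi-dense SLAM leaves gaps). The variance maps slam_var and cnn_var below are assumed inputs; this is a minimal Python illustration in the spirit of the abstract, not the paper's exact scheme.

    import numpy as np

    def fuse_depths(cnn_depth, slam_depth, slam_var, cnn_var):
        """Per-pixel inverse-variance fusion of CNN-predicted depth with
        semi-dense SLAM depth. Where SLAM has no estimate (NaN), the CNN
        prediction is kept alone, so the fused map stays dense."""
        fused = cnn_depth.copy()
        has_slam = np.isfinite(slam_depth)
        w_slam = 1.0 / slam_var[has_slam]
        w_cnn = 1.0 / cnn_var[has_slam]
        fused[has_slam] = (w_slam * slam_depth[has_slam] +
                           w_cnn * cnn_depth[has_slam]) / (w_slam + w_cnn)
        return fused

Because the CNN predicts metrically scaled depth, anchoring the fusion to it also fixes the absolute scale that pure monocular SLAM cannot observe, which is the scale-recovery point made in the abstract.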