SPLODE: Semi-Probabilistic Point and Line Odometry with Depth Estimation from RGB-D Camera Motion
Active depth cameras suffer from several limitations, which cause incomplete
and noisy depth maps, and may consequently affect the performance of RGB-D
Odometry. To address this issue, this paper presents a visual odometry method
based on point and line features that leverages both measurements from a depth
sensor and depth estimates from camera motion. Depth estimates are generated
continuously by a probabilistic depth estimation framework for both types of
features to compensate for the lack of depth measurements and inaccurate
feature depth associations. The framework explicitly models the uncertainty of
triangulating depth from both point and line observations to validate and
obtain precise estimates. Furthermore, depth measurements are exploited by
propagating them through a depth map registration module and using a
frame-to-frame motion estimation method that considers 3D-to-2D and 2D-to-3D
reprojection errors independently. Results on RGB-D sequences captured in
large indoor and outdoor scenes, where depth sensor limitations are critical,
show that the combination of depth measurements and estimates through our
approach is able to overcome the absence and inaccuracy of depth measurements.
Comment: IROS 201
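To make the 3D-to-2D reprojection error mentioned in this abstract concrete, here is a minimal sketch for a single point feature under a pinhole camera model. It is an illustration under stated assumptions, not the paper's implementation: the function names, the nested-list rotation format, and the intrinsics tuple `(fx, fy, cx, cy)` are all hypothetical, and line features, the 2D-to-3D error, and uncertainty weighting are left out.

```python
import math


def transform(rotation, translation, point_3d):
    """Apply a rigid motion (3x3 rotation as nested lists, then translation)."""
    return tuple(
        sum(rotation[i][j] * point_3d[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point_3d, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera frame to pixel coordinates."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error_3d_to_2d(point_prev, pixel_curr, rotation, translation, intrinsics):
    """Pixel distance between a previous-frame 3D point, moved into the
    current frame by the candidate motion, and its matched 2D observation."""
    u, v = project(transform(rotation, translation, point_prev), *intrinsics)
    return math.hypot(u - pixel_curr[0], v - pixel_curr[1])


# Hypothetical values: identity rotation, 0.5 m sideways translation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = (500.0, 500.0, 320.0, 240.0)
error = reprojection_error_3d_to_2d(
    (0.0, 0.0, 2.0), (448.0, 244.0), IDENTITY, (0.5, 0.0, 0.0), INTRINSICS
)
```

A frame-to-frame motion estimator would typically search for the rotation and translation that minimize the sum of such errors over all matched features.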
Direct Monocular Odometry Using Points and Lines
Most visual odometry algorithms for a monocular camera focus on points,
either by feature matching or by direct alignment of pixel intensity, while
ignoring a common but important geometric entity: edges. In this paper, we
propose an odometry algorithm that combines points and edges to benefit from
the advantages of both direct and feature-based methods. It works better in
texture-less environments and is also more robust to lighting changes and fast
motion because it increases the convergence basin. We maintain a depth map for
the keyframe. In the tracking part, the camera pose is recovered by minimizing
both the photometric error and the geometric error to the matched edge in a
probabilistic framework. In the mapping part, edges are used to speed up stereo
matching and increase its accuracy. On various public datasets, our algorithm
achieves performance better than or comparable to state-of-the-art monocular
odometry methods. In some challenging texture-less environments, our algorithm
reduces the state estimation error by over 50%.
Comment: ICRA 201
Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty
This work proposes a robust visual odometry method for structured
environments that combines point features with line and plane segments,
extracted through an RGB-D camera. Noisy depth maps are processed by a
probabilistic depth fusion framework based on Mixtures of Gaussians to denoise
and derive the depth uncertainty, which is then propagated throughout the
visual odometry pipeline. Probabilistic 3D plane and line fitting solutions are
used to model the uncertainties of the feature parameters and pose is estimated
by combining the three types of primitives based on their uncertainties.
Performance evaluation on RGB-D sequences collected in this work and two public
RGB-D datasets (TUM and ICL-NUIM) shows the benefit of using the proposed depth
fusion framework and combining the three feature types, particularly in scenes
with low-textured surfaces, dynamic objects and missing depth measurements.
Comment: Major update: more results, depth filter released as opensource, 34
page
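As a rough illustration of how fusing noisy depth observations can both denoise a measurement and yield an uncertainty, the sketch below uses inverse-variance weighting of independent Gaussian observations. This is a deliberate simplification with a single Gaussian per pixel, not the Mixture of Gaussians framework the abstract describes; the function name and the `(depth, variance)` input format are assumptions made for the example.

```python
def fuse_depths(observations):
    """Fuse independent Gaussian depth observations of the same pixel.

    observations: iterable of (depth_in_meters, variance) pairs.
    Returns (fused_depth, fused_variance) by inverse-variance weighting.
    """
    weight_sum = 0.0
    weighted_depth = 0.0
    for depth, variance in observations:
        if variance <= 0:
            raise ValueError("variance must be positive")
        weight_sum += 1.0 / variance
        weighted_depth += depth / variance
    if weight_sum == 0.0:
        raise ValueError("at least one observation is required")
    return weighted_depth / weight_sum, 1.0 / weight_sum


# Hypothetical readings of one pixel from two frames with equal noise.
fused_depth, fused_variance = fuse_depths([(2.0, 0.04), (2.2, 0.04)])
```

The fused variance is smaller than either input variance, which is the quantity a pipeline of this kind could propagate to later stages such as feature fitting and pose estimation.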
RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance to benchmark the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D-simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras
Visual scene understanding is an important capability that enables robots to
purposefully act in their environment. In this paper, we propose a novel
approach to object-class segmentation from multiple RGB-D views using deep
learning. We train a deep neural network to predict object-class semantics that
are consistent across several viewpoints in a semi-supervised way. At test time,
the semantic predictions of our network can be fused more consistently in
semantic keyframe maps than predictions of a network trained on individual
views. We base our network architecture on a recent single-view deep learning
approach to RGB and depth fusion for semantic object-class segmentation and
enhance it with multi-scale loss minimization. We obtain the camera trajectory
using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth
annotated frames in order to enforce multi-view consistency during training. At
test time, predictions from multiple views are fused into keyframes. We propose
and analyze several methods for enforcing multi-view consistency during
training and testing. We evaluate the benefit of multi-view consistency
training and demonstrate that pooling of deep features and fusion over multiple
views outperform single-view baselines on the NYUDv2 benchmark for semantic
segmentation. Our end-to-end trained network achieves state-of-the-art
performance on the NYUDv2 dataset in single-view segmentation as well as
multi-view semantic fusion.
Comment: the 2017 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS 2017
Real-time Monocular Object SLAM
We present a real-time object-based SLAM system that leverages the largest
object database to date. Our approach comprises two main components: 1) a
monocular SLAM algorithm that exploits object rigidity constraints to improve
the map and find its real scale, and 2) a novel object recognition algorithm
based on bags of binary words, which provides live detections with a database
of 500 3D objects. The two components work together and benefit each other: the
SLAM algorithm accumulates information from the observations of the objects,
anchors object features to special map landmarks and sets constraints on the
optimization. At the same time, objects partially or fully located within the
map are used as a prior to guide the recognition algorithm, achieving higher
recall. We evaluate our proposal on five real environments, showing improvements
in the accuracy of the map and in efficiency with respect to other
state-of-the-art techniques.