Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection
This paper proposes a novel method to estimate the global scale of a 3D
reconstructed model within a Kalman filtering-based monocular SLAM algorithm.
Our Bayesian framework integrates height priors over the detected objects
belonging to a set of broad predefined classes, based on recent advances in
fast generic object detection. Each observation is produced on a single frame,
so no data association process across video frames is needed. This is because
we associate the height priors with the sizes of image regions at locations
where map feature projections fall within the object detection regions.
We present very promising results of this approach obtained on several
experiments with different object classes. (Comment: Int. Workshop on Visual Odometry, CVPR, July 2017.)
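Under a pinhole camera model, each detection yields one scale observation that can be fused Kalman-style with the rest. A minimal sketch, assuming a pinhole model and first-order uncertainty propagation; the function and variable names are illustrative, not the paper's notation:

```python
import numpy as np

def scale_from_detection(h_px, focal_px, z_slam, height_prior_m, prior_sigma_m):
    """One scale observation from a single detection.

    Pinhole model: metric depth Z = H * f / h_px, where H is the class
    height prior and h_px the detection's pixel height. The global scale
    is the ratio of metric depth to the SLAM map's (arbitrary-unit) depth.
    """
    z_metric = height_prior_m * focal_px / h_px
    s = z_metric / z_slam
    # Propagate the height prior's uncertainty to the scale (first order).
    sigma = (prior_sigma_m * focal_px / h_px) / z_slam
    return s, sigma

def fuse_scale(observations):
    """Inverse-variance (Kalman-style) fusion of independent scale estimates."""
    s = np.array([o[0] for o in observations])
    var = np.array([o[1] ** 2 for o in observations])
    w = 1.0 / var
    s_hat = np.sum(w * s) / np.sum(w)
    return s_hat, np.sqrt(1.0 / np.sum(w))
```

Each additional detection tightens the fused scale estimate, which is why no cross-frame data association is required.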
Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation
How do computers and intelligent agents view the world around them? Feature
extraction and representation constitute one of the basic building blocks towards
answering this question. Traditionally, this has been done with carefully
engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is
no ``one size fits all'' approach that satisfies all requirements. In recent
years, the rising popularity of deep learning has resulted in a myriad of
end-to-end solutions to many computer vision problems. These approaches, while
successful, tend to lack scalability and can't easily exploit information
learned by other systems. Instead, we propose SAND features, a dedicated deep
learning solution to feature extraction capable of providing hierarchical
context information. This is achieved by employing sparse relative labels
indicating relationships of similarity/dissimilarity between image locations.
The nature of these labels results in an almost infinite set of dissimilar
examples to choose from. We demonstrate how the selection of negative examples
during training can be used to modify the feature space and vary its
properties. To demonstrate the generality of this approach, we apply the
proposed features to a multitude of tasks, each requiring different properties.
This includes disparity estimation, semantic segmentation, self-localisation
and SLAM. In all cases, we show how incorporating SAND features results in
better or comparable results to the baseline, whilst requiring little to no
additional training. Code can be found at:
https://github.com/jspenmar/SAND_features (Comment: CVPR201
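The sparse similarity/dissimilarity labels described above fit a contrastive formulation, where the choice of negatives shapes the learned feature space. A generic hinge-style contrastive loss on a pair of pixel features, as a sketch only; this is not claimed to be the paper's exact loss:

```python
import numpy as np

def pixel_contrastive_loss(f_a, f_b, similar, margin=1.0):
    """Hinge-style contrastive loss on features at two image locations.

    `similar` is the sparse relative label. Dissimilar pairs are only
    penalized inside `margin`, so which negatives are sampled determines
    which regions of the feature space get pushed apart.
    """
    d = np.linalg.norm(np.asarray(f_a, dtype=float) - np.asarray(f_b, dtype=float))
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Because almost every pixel pair is a valid negative, sampling negatives from different scales or semantic regions varies the properties of the resulting features.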
Eliminating Scale Drift in Monocular SLAM Using Depth from Defocus
© 2017 IEEE. This letter presents a novel approach to correcting errors caused by accumulated scale drift in monocular SLAM. It is shown that the metric scale can be estimated using information gathered through monocular SLAM together with image blur due to defocus. A nonlinear least squares optimization problem is formulated to integrate depth estimates from defocus into monocular SLAM. An algorithm is presented that processes the keyframe and feature location estimates generated by a monocular SLAM algorithm to correct for scale drift in selected local regions of the environment. The proposed algorithm is experimentally evaluated by processing the output of ORB-SLAM to obtain accurate metric-scale maps from a monocular camera without any prior knowledge about the scene.
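The core alignment step can be illustrated with the linear special case: finding the scale that best maps a local region's SLAM depths onto sparse metric depth-from-defocus estimates. A minimal sketch; the letter itself solves a richer nonlinear least-squares formulation, and the names below are illustrative:

```python
import numpy as np

def fit_local_scale(d_slam, d_defocus):
    """Least-squares scale s minimizing sum_i (s * d_slam_i - d_defocus_i)^2.

    d_slam: feature depths in the SLAM map's arbitrary units for one local
    region; d_defocus: metric depth estimates from defocus at the same
    features. The closed form is the 1-D normal-equation solution.
    """
    d_slam = np.asarray(d_slam, dtype=float)
    d_defocus = np.asarray(d_defocus, dtype=float)
    return float(d_slam @ d_defocus / (d_slam @ d_slam))
```

Applying such a fit per local region, rather than globally, is what allows drifting scale to be corrected region by region.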
Estimation of Absolute Scale in Monocular SLAM Using Synthetic Data
This paper addresses the problem of scale estimation in monocular SLAM by
estimating absolute distances between camera centers of consecutive image
frames. These estimates would improve the overall performance of classical (not
deep) SLAM systems and allow metric feature locations to be recovered from a
single monocular camera. We propose several network architectures that lead to
an improvement of scale estimation accuracy over the state of the art. In
addition, we exploit a possibility to train the neural network only with
synthetic data derived from a computer graphics simulator. Our key insight is
that, using only synthetic training inputs, we can achieve similar scale
estimation accuracy as that obtained from real data. This fact indicates that
fully annotated simulated data is a viable alternative to existing
deep-learning-based SLAM systems trained on real (unlabeled) data. Our
experiments with unsupervised domain adaptation also show that the difference
in visual appearance between simulated and real data does not affect scale
estimation results. Our method operates with low-resolution images (0.03MP),
which makes it practical for real-time SLAM applications with a monocular
camera.
J-MOD: Joint Monocular Obstacle Detection and Depth Estimation
In this work, we propose an end-to-end deep architecture that jointly learns
to detect obstacles and estimate their depth for MAV flight applications. Most
of the existing approaches either rely on Visual SLAM systems or on depth
estimation models to build 3D maps and detect obstacles. However, for the task
of avoiding obstacles this level of complexity is not required. Recent works
have proposed multi-task architectures that perform both scene understanding and
depth estimation. We follow their track and propose a specific architecture to
jointly estimate depth and obstacles, without the need to compute a global map,
but maintaining compatibility with a global SLAM system if needed. The network
architecture is devised to exploit joint information from the obstacle
detection task, which produces more reliable bounding boxes, and from the depth
estimation task, increasing the robustness of both to scenario changes. We call
this architecture J-MOD. We test the effectiveness of our approach with
experiments on sequences with different appearance and focal lengths and
compare it to state-of-the-art multi-task methods that jointly perform semantic
segmentation and depth estimation. In addition, we show the integration in a
full system using a set of simulated navigation experiments where a MAV
explores an unknown scenario and plans safe trajectories by using our detection
model.
Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames
Monocular simultaneous localization and mapping (SLAM) is emerging in
advanced driver assistance systems and autonomous driving, because a single
camera is cheap and easy to install. Conventional monocular SLAM has two major
challenges leading to inaccurate localization and mapping. First, it is
challenging to estimate scales in localization and mapping. Second,
conventional monocular SLAM uses inappropriate mapping factors such as dynamic
objects and low-parallax areas in mapping. This paper proposes an improved
real-time monocular SLAM that resolves the aforementioned challenges by
efficiently using deep learning-based semantic segmentation. To achieve the
real-time execution of the proposed method, we apply semantic segmentation only
to downsampled keyframes in parallel with mapping processes. In addition, the
proposed method corrects scales of camera poses and three-dimensional (3D)
points, using a ground plane estimated from road-labeled 3D points and the real
camera height. The proposed method also removes inappropriate corner features
labeled as moving objects and low parallax areas. Experiments with eight video
sequences demonstrate that the proposed monocular SLAM system achieves
trajectory tracking accuracy significantly better than existing
state-of-the-art monocular SLAM systems and comparable to stereo SLAM systems.
The proposed system achieves real-time tracking on a standard CPU, potentially
with standard GPU support, whereas existing segmentation-aided monocular SLAM
systems do not.
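The ground-plane scale correction described above can be sketched directly: fit a plane to the road-labeled 3D points, measure the camera center's distance to it in map units, and take the ratio to the known real camera height. A minimal sketch under the assumption of a least-squares plane fit via SVD; variable names are illustrative:

```python
import numpy as np

def scale_from_ground_plane(road_points, cam_center, real_cam_height_m):
    """Scale = real camera height / camera height above the fitted road plane.

    road_points: (N, 3) road-labeled 3D points in map units; cam_center:
    camera center in the same frame; real_cam_height_m: the known metric
    mounting height of the camera above the road.
    """
    P = np.asarray(road_points, dtype=float)
    centroid = P.mean(axis=0)
    # Plane normal = right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(P - centroid)
    normal = vt[-1]
    map_height = abs(np.dot(np.asarray(cam_center, dtype=float) - centroid, normal))
    return real_cam_height_m / map_height
```

Multiplying camera poses and 3D points by this factor restores metric scale without stereo or extra sensors.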
Visual SLAM in Changing Environments
This thesis investigates the problem of Visual Simultaneous Localization and Mapping (vSLAM) in
changing environments. The vSLAM problem is to sequentially estimate the pose of a device with
mounted cameras in a map generated based on images taken with those cameras. vSLAM algorithms
face two main challenges in changing environments: moving objects and temporal appearance
changes. Moving objects cause problems in pose estimation if they are mistaken for static objects.
Moving objects also cause problems for loop closure detection (LCD), which is the problem of
detecting whether a previously visited place has been revisited. The same moving object observed
in two different places may cause false loop closures to be detected. Temporal appearance changes
such as those brought about by time of day or weather changes cause long-term data association
errors for LCD. These cause difficulties in recognizing previously visited places after they have
undergone appearance changes. Focus is placed on LCD, which turns out to be the
part of vSLAM that changing environments affect the most. In addition, several
techniques and algorithms for
Visual Place Recognition (VPR) in challenging conditions that could be used in the context of
LCD are surveyed and the performance of two state-of-the-art modern VPR algorithms in changing
environments is assessed in an experiment in order to measure their applicability for LCD. The
most severe performance degrading appearance changes are found to be those caused by change in
season and illumination. Several algorithms and techniques that perform well in loop closure related
tasks in specific environmental conditions are identified as a result of the survey. Finally, a limited
experiment on the Nordland dataset implies that the tested VPR algorithms are usable as is or can
be modified for use in long-term LCD. As a part of the experiment, a new simple neighborhood
consistency check was also developed, evaluated, and found to be effective at reducing false positives
output by the tested VPR algorithms.
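The idea of a neighborhood consistency check can be illustrated generically: accept a loop-closure candidate only if neighboring query frames also match database frames at a consistent offset. A sketch of this kind of check, not the thesis's exact formulation; all names and thresholds are assumptions:

```python
def neighborhood_consistent(matches, i, window=2, tolerance=3, min_support=2):
    """Accept the match for query frame i only if its neighbors agree.

    matches: dict mapping query frame index -> best-matching database index.
    A neighbor j = i + d supports the match if its own match lies within
    `tolerance` of the expected offset matches[i] + d. Requiring
    `min_support` supporting neighbors rejects isolated false positives.
    """
    if i not in matches:
        return False
    support = 0
    for d in range(-window, window + 1):
        if d == 0:
            continue
        j = i + d
        if j in matches and abs(matches[j] - (matches[i] + d)) <= tolerance:
            support += 1
    return support >= min_support
```

An isolated spurious match has no agreeing neighbors and is filtered out, which is how such a check reduces VPR false positives.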