Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.

Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin assert joint first authorship.
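The cascade idea described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: all names are assumed, each relocaliser is modelled as a function returning a pose hypothesis together with a geometric score, and the cascade stops at the first tier whose best hypothesis scores well enough.

```python
def relocalise_cascade(frame, relocalisers, score_threshold):
    """Try relocalisers in order (fastest first), falling back to
    slower, more accurate ones only when needed.

    relocalisers: list of callables, each returning (pose, score),
    where a higher score means a geometrically more plausible pose.
    """
    best = None
    for relocalise in relocalisers:
        pose, score = relocalise(frame)
        if best is None or score > best[1]:
            best = (pose, score)
        if score >= score_threshold:
            # A fast tier already produced a good enough hypothesis,
            # so we avoid running the slower tiers at all.
            return pose
    # No tier met the threshold: return the best hypothesis found.
    return best[0]
```

The key design point is that the slower tiers are only ever invoked when the cheaper ones fail, which is what lets the overall system stay real-time in the common case.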
Implementation of a Blind navigation method in outdoors/indoors areas
According to WHO statistics, the number of visually impaired people is
increasing annually. One of the most critical necessities for visually impaired
people is the ability to navigate safely. This paper proposes a navigation
system based on visual SLAM and the YOLO algorithm, using a monocular camera. The
proposed system consists of three steps: obstacle distance estimation, path
deviation detection, and next-step prediction. Using the ORB-SLAM algorithm,
the proposed method creates a map from a predefined route and guides the users
to stay on the route while notifying them if they deviate from it.
Additionally, the system utilizes the YOLO algorithm to detect obstacles along
the route and alert the user. Experimental results, obtained using a
laptop camera, show that the proposed system runs at 30 frames per second
while guiding the user along predefined routes of 11 meters, both indoors
and outdoors. The accuracy of the positioning system is 8 cm, and the system
notifies users if they deviate from the predefined route by more than 60
cm.

Comment: 14 pages, 6 figures and 6 tables.
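The path-deviation check described above can be sketched as a simple distance test against the mapped route. This is a minimal sketch under assumed names, treating the route as a list of 2D waypoints and using the 60 cm threshold reported in the abstract; the actual system works on the ORB-SLAM map.

```python
import math

# Threshold reported in the abstract (illustrative constant name).
DEVIATION_THRESHOLD_M = 0.6

def distance_to_route(position, route):
    """Euclidean distance from `position` to the nearest route waypoint."""
    return min(math.dist(position, waypoint) for waypoint in route)

def has_deviated(position, route, threshold=DEVIATION_THRESHOLD_M):
    """True when the user has strayed more than `threshold` metres
    from the predefined route and should be alerted."""
    return distance_to_route(position, route) > threshold
```

A denser route sampling (or point-to-segment distance) would tighten the check; the waypoint version keeps the idea visible.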
Loosely-Coupled Semi-Direct Monocular SLAM
We propose a novel semi-direct approach for monocular simultaneous
localization and mapping (SLAM) that combines the complementary strengths of
direct and feature-based methods. The proposed pipeline loosely couples direct
odometry and feature-based SLAM to perform three levels of parallel
optimizations: (1) photometric bundle adjustment (BA) that jointly optimizes
the local structure and motion, (2) geometric BA that refines keyframe poses
and associated feature map points, and (3) pose graph optimization to achieve
global map consistency in the presence of loop closures. This is achieved in
real-time by limiting the feature-based operations to marginalized keyframes
from the direct odometry module. Exhaustive evaluation on two benchmark
datasets demonstrates that our system outperforms the state-of-the-art
monocular odometry and SLAM systems in terms of overall accuracy and
robustness.

Comment: Accepted for publication in IEEE Robotics and Automation Letters.
Watch video demo at: https://youtu.be/j7WnU7ZpZ8
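The loose coupling described above can be summarised structurally: only keyframes marginalized by the direct odometry front-end are handed to the feature-based back-end, so the costlier geometric BA and pose-graph stages never touch live frames. The sketch below uses assumed class and method names to illustrate that data flow, not the paper's actual code.

```python
class LooselyCoupledSLAM:
    """Structural sketch: direct odometry runs on every frame,
    while the feature-based back-end only sees marginalized keyframes."""

    def __init__(self, backend):
        self.backend = backend      # feature-based SLAM module (assumed API)
        self.keyframes = []

    def on_frame(self, frame, is_marginalized_keyframe):
        # Level 1 (photometric BA / direct odometry) would process
        # every incoming frame here.
        if is_marginalized_keyframe:
            # Levels 2-3 (geometric BA, pose-graph optimization) are
            # restricted to marginalized keyframes, which is what keeps
            # the combined system real-time.
            self.keyframes.append(frame)
            self.backend.process_keyframe(frame)
```

The design choice this illustrates is that the two pipelines share no per-frame state: the front-end stays fast, and the back-end's global consistency work is amortised over keyframes only.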
InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure
Volumetric models have become a popular representation for 3D scenes in
recent years. One breakthrough leading to their popularity was KinectFusion,
which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM
has since also been tackled with very similar approaches. Representing the
reconstruction volumetrically as a TSDF leads to most of the simplicity and
efficiency that can be achieved with GPU implementations of these systems.
However, this representation is memory-intensive and limits applicability to
small-scale reconstructions. Several avenues have been explored to overcome
this. With the aim of summarizing them and providing for a fast, flexible 3D
reconstruction pipeline, we propose a new, unifying framework called InfiniTAM.
The idea is that steps like camera tracking, scene representation and
integration of new data can easily be replaced and adapted to the user's needs.
This report describes the technical implementation details of InfiniTAM v3,
the third version of our InfiniTAM system. We have added various new features,
as well as making numerous enhancements to the low-level code that
significantly improve our camera tracking performance. The new features that we
expect to be of most interest are (i) a robust camera tracking module; (ii) an
implementation of Glocker et al.'s keyframe-based random ferns camera
relocaliser; (iii) a novel approach to globally-consistent TSDF-based
reconstruction, based on dividing the scene into rigid submaps and optimising
the relative poses between them; and (iv) an implementation of Keller et al.'s
surfel-based reconstruction approach.

Comment: This article largely supersedes arXiv:1410.0925 (it describes version 3 of the InfiniTAM framework).
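The TSDF representation at the heart of KinectFusion-style systems fuses each new depth measurement into a per-voxel truncated signed distance via a weighted running average. The sketch below shows that standard update for a single voxel; it is illustrative rather than InfiniTAM's actual code, and the truncation band and weight cap are assumed values.

```python
TRUNCATION_M = 0.05   # mu: truncation band (assumed value)
MAX_WEIGHT = 100.0    # weight cap so the map can adapt to change

def fuse_measurement(tsdf, weight, sdf_measured):
    """Fuse one signed-distance measurement into a voxel.

    tsdf, weight: the voxel's current truncated signed distance and
    integration weight. Returns the updated (tsdf, weight) pair.
    """
    # Clamp the measurement to the truncation band [-mu, mu].
    sdf = max(-TRUNCATION_M, min(TRUNCATION_M, sdf_measured))
    # Weighted running average of all measurements seen so far.
    new_tsdf = (tsdf * weight + sdf) / (weight + 1.0)
    new_weight = min(weight + 1.0, MAX_WEIGHT)
    return new_tsdf, new_weight
```

It is exactly this per-voxel simplicity that makes the representation so GPU-friendly, and also why it is memory-intensive: every voxel in the truncation band stores a distance and a weight, motivating the submap decomposition described above.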
ORB-SLAM: A Versatile and Accurate Monocular SLAM System
This paper presents ORB-SLAM, a feature-based monocular simultaneous localization and mapping (SLAM) system that operates in real time, in small and large indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.
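The "survival of the fittest" selection described above can be sketched as two culling rules: a map point must be observed from enough keyframes to survive, and a keyframe is redundant when most of its points are also well observed elsewhere. The sketch below is illustrative (names and thresholds are assumptions modelled loosely on ORB-SLAM's published policy, not its code).

```python
MIN_POINT_OBSERVATIONS = 3   # assumed survival threshold for map points
REDUNDANCY_RATIO = 0.9       # fraction of redundant points that triggers culling

def cull_map_points(points):
    """points: dict mapping point_id -> set of observing keyframe ids.
    Keep only points seen from enough keyframes."""
    return {pid: obs for pid, obs in points.items()
            if len(obs) >= MIN_POINT_OBSERVATIONS}

def is_redundant_keyframe(kf_id, points):
    """A keyframe is redundant when at least REDUNDANCY_RATIO of the
    points it observes are also seen by 3 or more other keyframes."""
    observed = [obs for obs in points.values() if kf_id in obs]
    if not observed:
        return False
    redundant = sum(1 for obs in observed if len(obs - {kf_id}) >= 3)
    return redundant / len(observed) >= REDUNDANCY_RATIO
```

Culling along these lines is what keeps the map compact: it only grows when the scene content genuinely changes, which is the property the abstract credits for lifelong operation.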