26 research outputs found
Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin
assert joint first authorshi
InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure
Volumetric models have become a popular representation for 3D scenes in
recent years. One breakthrough leading to their popularity was KinectFusion,
which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM
has since also been tackled with very similar approaches. Representing the
reconstruction volumetrically as a TSDF leads to most of the simplicity and
efficiency that can be achieved with GPU implementations of these systems.
However, this representation is memory-intensive and limits applicability to
small-scale reconstructions. Several avenues have been explored to overcome
this. With the aim of summarizing them and providing for a fast, flexible 3D
reconstruction pipeline, we propose a new, unifying framework called InfiniTAM.
The idea is that steps like camera tracking, scene representation and
integration of new data can easily be replaced and adapted to the user's needs.
This report describes the technical implementation details of InfiniTAM v3,
the third version of our InfiniTAM system. We have added various new features,
as well as making numerous enhancements to the low-level code that
significantly improve our camera tracking performance. The new features that we
expect to be of most interest are (i) a robust camera tracking module; (ii) an
implementation of Glocker et al.'s keyframe-based random ferns camera
relocaliser; (iii) a novel approach to globally-consistent TSDF-based
reconstruction, based on dividing the scene into rigid submaps and optimising
the relative poses between them; and (iv) an implementation of Keller et al.'s
surfel-based reconstruction approach.Comment: This article largely supersedes arxiv:1410.0925 (it describes version
3 of the InfiniTAM framework
Image-Based Localization Using Context
Image-based localization problem consists of estimating the 6 DoFcamera pose by matching the image to a 3D point cloud (or equivalent)representing a 3D environment. The robustness and accuracyof current solutions is not objective and quantifiable. Wehave completed a comparative analysis of the main state of the artapproaches, namely Brute Force Matching, Approximate NearestNeighbour Matching, Embedded Ferns Classification, ACG Localizer(Using Visual Vocabulary) and Keyframe Matching Approach.The results of the study revealed major deficiencies in each approachmainly in search space reduction, clustering, feature matchingand sensitivity to where the query image was taken. Then, wechoose to focus on one common major problem that is reducingthe search space. We propose to create a new image-based localizationapproach based on reducing the search space by usingglobal descriptors to find candidate keyframes in the database thensearch against the 3D points that are only seen from these candidatesusing local descriptors stored in a 3D cloud map
Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization
Camera relocalization plays a vital role in many robotics and computer vision
tasks, such as global localization, recovery from tracking failure and loop
closure detection. Recent random forests based methods exploit randomly sampled
pixel comparison features to predict 3D world locations for 2D image locations
to guide the camera pose optimization. However, these image features are only
sampled randomly in the images, without considering the spatial structures or
geometric information, leading to large errors or failure cases with the
existence of poorly textured areas or in motion blur. Line segment features are
more robust in these environments. In this work, we propose to jointly exploit
points and lines within the framework of uncertainty driven regression forests.
The proposed approach is thoroughly evaluated on three publicly available
datasets against several strong state-of-the-art baselines in terms of several
different error metrics. Experimental results prove the efficacy of our method,
showing superior or on-par state-of-the-art performance.Comment: published as a conference paper at 2018 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS
Efficient Image-Based Localization Using Context
Image-Based Localization (IBL) is the problem of computing the position and orientation of a camera with respect to a geometric representation of the scene. A fundamental building block of IBL is searching the space of a saved 3D representation of the scene for correspondences to a query image. The robustness and accuracy of the IBL approaches in the literature are not objective and quantifiable.
First, this thesis presents a detailed description and study of three different 3D modeling packages based on SFM to reconstruct a 3D map of an environment. The packages tested are VSFM, Bundler and PTAM. The objective is to assess the mapping ability of each of the techniques and choose the best one to use for reconstructing the IBL 3D map. The study results show that image matching which is the bottleneck of SFM, SLAM and IBL plays the major role in favour of VSFM. This will result in using wrong matches in building the 3D map. It is crucial for IBL to choose the software that provides the best quality of points, \textit{i.e.} the largest number of correct 3D points. For this reason, VSFM will be chosen to reconstruct the 3D maps for IBL.
Second, this work presents a comparative study of the main approaches, namely Brute Force Matching, Tree-Based Approach, Embedded Ferns Classification, ACG Localizer, Keyframe Approach, Decision Forest, Worldwide Pose Estimation and MPEG Search Space Reduction. The objective of the comparative analysis was to first uncover the specifics of each of these techniques and thereby understand the advantages and disadvantages of each of them. The testing was performed on Dubrovnik Dataset where the localization is determined with respect to a 3D cloud map which was computed using a Structure-from-Motion approach. The study results show that the current state of the art IBL solutions still face challenges in search space reduction, feature matching, clustering, and the quality of the solution is not consistent across all query images.
Third, this work addresses the search space problem in order to solve the IBL problem. The Gist-based Search Space Reduction (GSSR), an efficient alternative to the available search space solutions, is proposed. It relies on GIST descriptors to considerably reduce search space and computational time, while at the same exceeding the state of the art in localization accuracy. Experiments on the 7 scenes datasets of Microsoft Research reveal considerable speedups for GSSR versus tree-based approaches, reaching a 4 times faster speed for the Heads dataset, and reducing the search space by an average of 92% while maintaining a better accuracy
Collaborative large-scale dense 3D reconstruction with online inter-agent pose optimisation
Reconstructing dense, volumetric models of real-world 3D scenes is important for many tasks, but capturing large scenes can take significant time, and the risk of transient changes to the scene goes up as the capture time increases. These are good reasons to want instead to capture several smaller sub-scenes that can be joined to make the whole scene. Achieving this has traditionally been difficult: joining sub-scenes that may never have been viewed from the same angle requires a high-quality camera relocaliser that can cope with novel poses, and tracking drift in each sub-scene can prevent them from being joined to make a consistent overall scene. Recent advances, however, have significantly improved our ability to capture medium-sized sub-scenes with little to no tracking drift: real-time globally consistent reconstruction systems can close loops and re-integrate the scene surface on the fly, whilst new visual-inertial odometry approaches can significantly reduce tracking drift during live reconstruction. Moreover, high-quality regression forest-based relocalisers have recently been made more practical by the introduction of a method to allow them to be trained and used online. In this paper, we leverage these advances to present what to our knowledge is the first system to allow multiple users to collaborate interactively to reconstruct dense, voxel-based models of whole buildings using only consumer-grade hardware, a task that has traditionally been both time-consuming and dependent on the availability of specialised hardware. Using our system, an entire house or lab can be reconstructed in under half an hour and at a far lower cost than was previously possible