Scene Coordinate and Correspondence Learning for Image-Based Localization
Scene coordinate regression has become an essential part of current camera
re-localization methods. Different variants, such as regression forests and
deep learning methods, have been successfully applied to estimate the
corresponding camera pose given a single input image. In this work, we propose
to regress the scene coordinates pixel-wise for a given RGB image by using deep
learning. Compared to recent methods, which usually employ RANSAC to obtain
a robust pose estimate from the established point correspondences, we propose
to regress confidences of these correspondences, which allows us to immediately
discard erroneous predictions and improve the initial pose estimates. Finally,
the resulting confidences can be used to score initial pose hypotheses and aid
in pose refinement, offering a generalized solution to this task.
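As a concrete illustration, here is a minimal sketch (not the authors' published code) of pixel-wise scene coordinate regression with a per-correspondence confidence head in PyTorch; the backbone, layer sizes, and the 0.5 confidence threshold are illustrative assumptions:

import torch
import torch.nn as nn

class SceneCoordNet(nn.Module):
    # Fully convolutional net: RGB image -> per-pixel (x, y, z) plus confidence.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.coord_head = nn.Conv2d(256, 3, 1)  # 3D scene coordinates
        self.conf_head = nn.Conv2d(256, 1, 1)   # correspondence confidence

    def forward(self, rgb):
        feat = self.backbone(rgb)
        coords = self.coord_head(feat)               # (B, 3, H/8, W/8)
        conf = torch.sigmoid(self.conf_head(feat))   # (B, 1, H/8, W/8)
        return coords, conf

# Discard low-confidence correspondences before pose estimation (e.g. PnP + RANSAC):
net = SceneCoordNet()
coords, conf = net(torch.randn(1, 3, 480, 640))
keep = conf.flatten() > 0.5                          # assumed threshold
points_3d = coords.flatten(2).squeeze(0).t()[keep]   # (N_kept, 3) surviving matches

Filtering on the regressed confidences is what allows erroneous predictions to be discarded before pose estimation, and the same scores can then re-rank competing pose hypotheses during refinement.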
Adversarial Networks for Camera Pose Regression and Refinement
Despite recent advances on the topic of direct camera pose regression using
neural networks, accurately estimating the camera pose of a single RGB image
still remains a challenging task. To address this problem, we introduce a novel
framework based, in its core, on the idea of implicitly learning the joint
distribution of RGB images and their corresponding camera poses using a
discriminator network and adversarial learning. Our method not only regresses
the camera pose from a single image, but also offers a purely RGB-based
solution for camera pose refinement using the discriminator network.
Further, we show that our method can effectively be used to optimize the
predicted camera poses and thus improve the localization accuracy. To this end,
we validate our proposed method on the publicly available 7-Scenes dataset,
improving upon the results of direct camera pose regression methods.
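To make the adversarial setup concrete, the following sketch (all architectures and hyperparameters are illustrative assumptions, not taken from the paper) pairs a pose regressor with a discriminator that scores (image, pose) pairs, then refines a predicted pose by gradient ascent on the discriminator score using only the RGB input:

import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    # Direct pose regression: RGB image -> 3D translation + unit quaternion.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose = nn.Linear(64, 7)

    def forward(self, rgb):
        return self.pose(self.features(rgb))

class PoseDiscriminator(nn.Module):
    # Scores how plausible a pose is for a given image (the joint distribution idea).
    def __init__(self):
        super().__init__()
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.score = nn.Sequential(nn.Linear(32 + 7, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, rgb, pose):
        return self.score(torch.cat([self.img_enc(rgb), pose], dim=1))

# RGB-only refinement: hold the image fixed, treat the predicted pose as a free
# variable, and ascend the discriminator's plausibility score.
regressor, disc = PoseRegressor(), PoseDiscriminator()
img = torch.randn(1, 3, 224, 224)
pose = regressor(img).detach().requires_grad_(True)
opt = torch.optim.Adam([pose], lr=1e-3)
for _ in range(50):
    loss = -disc(img, pose).mean()   # maximize plausibility of the pose
    opt.zero_grad(); loss.backward(); opt.step()

In training, the discriminator would be fit to separate real (image, pose) pairs from regressed ones; the refinement loop above only exploits the trained discriminator, which is what keeps the refinement purely RGB-based.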
Hierarchical Scene Coordinate Classification and Regression for Visual Localization
Visual localization is critical to many applications in computer vision and
robotics. To address single-image RGB localization, state-of-the-art
feature-based methods match local descriptors between a query image and a
pre-built 3D model. Recently, deep neural networks have been exploited to
regress the mapping between raw pixels and 3D coordinates in the scene, and
thus the matching is implicitly performed by the forward pass through the
network. However, in a large and ambiguous environment, learning such a
regression task directly can be difficult for a single network. In this work,
we present a new hierarchical scene coordinate network to predict pixel scene
coordinates in a coarse-to-fine manner from a single RGB image. The network
consists of a series of output layers, each of them conditioned on the previous
ones. The final output layer predicts the 3D coordinates and the others produce
progressively finer discrete location labels. The proposed method outperforms
the baseline regression-only network and allows us to train compact models
which scale robustly to large environments. It sets a new state-of-the-art for
single-image RGB localization performance on the 7-Scenes, 12-Scenes, Cambridge
Landmarks datasets, and three combined scenes. Moreover, for large-scale
outdoor localization on the Aachen Day-Night dataset, we present a hybrid
approach which outperforms existing scene coordinate regression methods, and
significantly reduces the performance gap w.r.t. explicit feature matching
methods.
Comment: CVPR 2020
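A minimal sketch of the coarse-to-fine idea (layer widths and the choice of two label levels are assumptions for illustration): each output layer is conditioned on the predictions of the previous, coarser ones, and only the last layer regresses metric 3D coordinates.

import torch
import torch.nn as nn

class HierarchicalSceneCoordNet(nn.Module):
    # Coarse-to-fine: discrete region labels first, 3D coordinates last.
    def __init__(self, n_coarse=25, n_fine=100):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.coarse_head = nn.Conv2d(128, n_coarse, 1)
        # Each finer head sees the backbone features plus all coarser predictions.
        self.fine_head = nn.Conv2d(128 + n_coarse, n_fine, 1)
        self.coord_head = nn.Conv2d(128 + n_coarse + n_fine, 3, 1)

    def forward(self, rgb):
        feat = self.backbone(rgb)
        coarse = self.coarse_head(feat).softmax(dim=1)                       # coarse region labels
        fine = self.fine_head(torch.cat([feat, coarse], 1)).softmax(dim=1)   # finer labels
        coords = self.coord_head(torch.cat([feat, coarse, fine], 1))         # metric 3D coordinates
        return coarse, fine, coords

Splitting the mapping into classification stages keeps each sub-task local and less ambiguous, which is what lets a compact model of this kind scale to large environments better than a single monolithic regressor.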
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence
Deep learning based localization and mapping has recently attracted
significant attention. Instead of creating hand-designed algorithms through
exploitation of physical models or geometric theories, deep learning based
solutions provide an alternative to solve the problem in a data-driven way.
Benefiting from ever-increasing volumes of data and computational power, these
methods are rapidly evolving into a new area that offers accurate and robust
systems to track motion and estimate scenes and their structure for real-world
applications. In this work, we provide a comprehensive survey, and propose a
new taxonomy for localization and mapping using deep learning. We also discuss
the limitations of current models, and indicate possible future directions. A
wide range of topics are covered, from learning odometry estimation, mapping,
to global localization and simultaneous localization and mapping (SLAM). We
revisit the problem of perceiving self-motion and scene understanding with
on-board sensors, and show how to solve it by integrating these modules into a
prospective spatial machine intelligence system (SMIS). It is our hope that
this work can connect emerging works from robotics, computer vision and machine
learning communities, and serve as a guide for future researchers to apply deep
learning to tackle localization and mapping problems.
Comment: 26 pages, 10 figures. Project website:
https://github.com/changhao-chen/deep-learning-localization-mapping