1,579 research outputs found
From Coarse to Fine: Robust Hierarchical Localization at Large Scale
Robust and accurate visual localization is a fundamental capability for
numerous applications, such as autonomous driving, mobile robotics, or
augmented reality. It remains, however, a challenging task, particularly for
large-scale environments and in presence of significant appearance changes.
State-of-the-art methods not only struggle with such scenarios, but are often
too resource intensive for certain real-time applications. In this paper we
propose HF-Net, a hierarchical localization approach based on a monolithic CNN
that simultaneously predicts local features and global descriptors for accurate
6-DoF localization. We exploit the coarse-to-fine localization paradigm: we
first perform a global retrieval to obtain location hypotheses and only later
match local features within those candidate places. This hierarchical approach
incurs significant runtime savings and makes our system suitable for real-time
operation. By leveraging learned descriptors, our method achieves remarkable
localization robustness across large variations of appearance and sets a new
state-of-the-art on two challenging benchmarks for large-scale localization.Comment: Camera-ready for CVPR 201
Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Many robotics applications require precise pose estimates despite operating
in large and changing environments. This can be addressed by visual
localization, using a pre-computed 3D model of the surroundings. The pose
estimation then amounts to finding correspondences between 2D keypoints in a
query image and 3D points in the model using local descriptors. However,
computational power is often limited on robotic platforms, making this task
challenging in large-scale environments. Binary feature descriptors
significantly speed up this 2D-3D matching, and have become popular in the
robotics community, but also strongly impair the robustness to perceptual
aliasing and changes in viewpoint, illumination and scene structure. In this
work, we propose to leverage recent advances in deep learning to perform an
efficient hierarchical localization. We first localize at the map level using
learned image-wide global descriptors, and subsequently estimate a precise pose
from 2D-3D matches computed in the candidate places only. This restricts the
local search and thus allows to efficiently exploit powerful non-binary
descriptors usually dismissed on resource-constrained devices. Our approach
results in state-of-the-art localization performance while running in real-time
on a popular mobile platform, enabling new prospects for robotics research.Comment: CoRL 2018 Camera-ready (fix typos and update citations
Keyframe-based visual–inertial odometry using nonlinear optimization
Combining visual and inertial measurements has become popular in mobile robotics, since the two sensing modalities offer complementary characteristics that make them the ideal choice for accurate visual–inertial odometry or simultaneous localization and mapping (SLAM). While historically the problem has been addressed with filtering, advancements in visual estimation suggest that nonlinear optimization offers superior accuracy, while still tractable in complexity thanks to the sparsity of the underlying problem. Taking inspiration from these findings, we formulate a rigorously probabilistic cost function that combines reprojection errors of landmarks and inertial terms. The problem is kept tractable and thus ensuring real-time operation by limiting the optimization to a bounded window of keyframes through marginalization. Keyframes may be spaced in time by arbitrary intervals, while still related by linearized inertial terms. We present evaluation results on complementary datasets recorded with our custom-built stereo visual–inertial hardware that accurately synchronizes accelerometer and gyroscope measurements with imagery. A comparison of both a stereo and monocular version of our algorithm with and without online extrinsics estimation is shown with respect to ground truth. Furthermore, we compare the performance to an implementation of a state-of-the-art stochastic cloning sliding-window filter. This competitive reference implementation performs tightly coupled filtering-based visual–inertial odometry. While our approach declaredly demands more computation, we show its superior performance in terms of accuracy
- …