Understanding the Limitations of CNN-based Absolute Camera Pose Regression
Visual localization is the task of accurate camera pose estimation in a known
scene. It is a key problem in computer vision and robotics, with applications
including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality.
Traditionally, the localization problem has been tackled using 3D geometry.
Recently, end-to-end approaches based on convolutional neural networks have
become popular. These methods learn to directly regress the camera pose from an
input image. However, they do not achieve the same level of pose accuracy as 3D
structure-based methods. To understand this behavior, we develop a theoretical
model for camera pose regression. We use our model to predict failure cases for
pose regression techniques and verify our predictions through experiments. We
furthermore use our model to show that pose regression is more closely related
to pose approximation via image retrieval than to accurate pose estimation via
3D structure. A key result is that current approaches do not consistently
outperform a handcrafted image retrieval baseline. This clearly shows that
additional research is needed before pose regression algorithms are ready to
compete with structure-based methods.
Comment: Initial version of a paper accepted to CVPR 201
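As a rough illustration of the retrieval-based pose approximation that the abstract compares pose regression against, the sketch below (all names and data are illustrative, not taken from the paper) approximates a query pose by averaging the poses of the nearest database images in global-descriptor space:

```python
import numpy as np

def retrieval_pose_baseline(query_desc, db_descs, db_poses, k=3):
    """Toy retrieval baseline: approximate the query pose by averaging
    the poses of the k nearest database images in descriptor space.
    Translations only, for simplicity; rotations would need e.g.
    quaternion averaging."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    nearest = np.argsort(dists)[:k]
    return db_poses[nearest].mean(axis=0)

# Toy example: 4 database images with 8-D descriptors and 3-D positions.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(4, 8))
db_poses = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
# A query that looks almost identical to database image 1.
query = db_descs[1] + 0.01 * rng.normal(size=8)
print(retrieval_pose_baseline(query, db_descs, db_poses, k=1))
```

Such a baseline can only return poses near those of the database images, which is exactly the approximation behavior the paper argues pose regression shares.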
Calibrated and Partially Calibrated Semi-Generalized Homographies
In this paper, we propose the first minimal solutions for estimating the
semi-generalized homography given a perspective and a generalized camera. The
proposed solvers use five 2D-2D image point correspondences induced by a scene
plane. One of them assumes the perspective camera to be fully calibrated, while
the other solver estimates the unknown focal length together with the absolute
pose parameters. This setup is particularly important in structure-from-motion
and image-based localization pipelines, where a new camera is localized in each
step with respect to a set of known cameras and 2D-3D correspondences might not
be available. As a consequence of a clever parametrization and the elimination
ideal method, our approach only needs to solve a univariate polynomial of
degree five or three. The proposed solvers are stable and efficient as
demonstrated by a number of synthetic and real-world experiments.
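The final step of such minimal solvers, reducing the problem to the real roots of a low-degree univariate polynomial, can be sketched as follows (the coefficients here are made up for illustration; in the actual solvers they come from the five point correspondences):

```python
import numpy as np

# Illustrative only: the proposed solvers reduce estimation to a
# univariate polynomial of degree five (or three); each real root is
# then back-substituted to recover a candidate homography/pose.
coeffs = [1.0, -3.0, 2.0, 4.0, -5.0, 1.0]   # degree-5 polynomial (toy values)
roots = np.roots(coeffs)

# Keep only (numerically) real roots; an odd-degree real polynomial
# always has at least one.
real_roots = roots[np.abs(roots.imag) < 1e-9].real
for r in sorted(real_roots):
    print(f"candidate root: {r:.6f}")
```

Solving a single degree-five (or degree-three) polynomial is what makes these solvers fast and numerically stable compared with larger elimination templates.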
AtLoc: Attention Guided Camera Localization
Deep learning has achieved impressive results in camera localization, but
current single-image techniques typically suffer from a lack of robustness,
leading to large outliers. To some extent, this has been tackled by sequential
(multi-image) or geometry-constraint approaches, which can learn to reject
dynamic objects and handle varying illumination to achieve better performance. In
this work, we show that attention can be used to force the network to focus on
more geometrically robust objects and features, achieving state-of-the-art
performance on common benchmarks, even when using only a single image as input.
Extensive experimental evidence is provided through public indoor and outdoor
datasets. Through visualization of the saliency maps, we demonstrate how the
network learns to reject dynamic objects, yielding superior global camera pose
regression performance. The source code is available at
https://github.com/BingCS/AtLoc
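The attention idea the abstract describes can be sketched, very loosely, as a self-attention reweighting of a flattened CNN feature map before pose regression. The code below is a minimal non-learned stand-in, not AtLoc's actual module:

```python
import numpy as np

def attention_pool(features):
    """Minimal (non-learned) self-attention over a flattened CNN feature
    map. `features` is (N, D): N spatial locations, D channels. Each
    output location is a softmax-weighted mixture of all locations, so
    consistently salient regions dominate the pooled representation."""
    d = features.shape[1]
    scores = features @ features.T / np.sqrt(d)       # (N, N) similarities
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ features                         # reweighted features

feats = np.random.default_rng(1).normal(size=(16, 32))  # e.g. a 4x4 map, 32 channels
out = attention_pool(feats)
print(out.shape)
```

In the learned version, query/key/value projections would let the network down-weight locations on dynamic objects, which is what the saliency-map visualizations in the paper illustrate.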
Can You Trust Your Pose? Confidence Estimation in Visual Localization
Camera pose estimation in large-scale environments is still an open question
and, despite recent promising results, it may still fail in some situations.
The research so far has focused on improving subcomponents of estimation
pipelines, to achieve more accurate poses. However, there is no guarantee for
the result to be correct, even though the correctness of pose estimation is
critically important in several visual localization applications, such as
autonomous navigation. In this paper we draw attention to a novel research
question, pose confidence estimation, where we aim to quantify how reliable
the visually estimated pose is. We develop a novel confidence measure to fulfil
this task and show that it can be flexibly applied to different datasets, indoor
or outdoor, and to various visual localization pipelines. We also show that the
proposed techniques can be used to accomplish a secondary goal: improving the
accuracy of existing pose estimation pipelines. Finally, the proposed approach
is computationally lightweight and adds only a negligible increase to the
computational effort of pose estimation.
Comment: To appear in ICPR 202
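To make the idea of pose confidence concrete, here is a toy confidence measure (not the paper's actual metric): the fraction of 2D-3D correspondences whose reprojection error under the estimated pose falls below an inlier threshold. All values below are illustrative:

```python
import numpy as np

def reprojection_confidence(errors_px, inlier_thresh=12.0):
    """Toy pose-confidence score (illustrative, not the paper's measure):
    the fraction of correspondences whose reprojection error (in pixels)
    is below a threshold. A high ratio means the estimated pose is
    consistent with many observations."""
    errors = np.asarray(errors_px, dtype=float)
    return float((errors < inlier_thresh).mean())

good = reprojection_confidence([1.2, 0.8, 3.5, 2.0, 90.0])   # 4 of 5 inliers
bad = reprojection_confidence([40.0, 55.0, 2.0, 80.0])       # 1 of 4 inliers
print(good, bad)
```

A downstream system (e.g. an autonomous-navigation stack) could then reject or re-localize whenever the score drops below a chosen threshold, which is the kind of use case the abstract motivates.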