Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Visual localization enables autonomous vehicles to navigate in their
surroundings and augmented reality applications to link virtual to real worlds.
Practical visual localization approaches need to be robust to a wide variety of
viewing conditions, including day-night changes, as well as weather and seasonal
variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera
pose estimates. In this paper, we introduce the first benchmark datasets
specifically designed for analyzing the impact of such factors on visual
localization. Using carefully created ground truth poses for query images taken
under a wide variety of conditions, we evaluate the impact of various factors
on 6DOF camera pose estimation accuracy through extensive experiments with
state-of-the-art localization approaches. Based on our results, we draw
conclusions about the difficulty of different conditions, showing that
long-term localization is far from solved, and propose promising avenues for
future work, including sequence-based localization approaches and the need for
better local features. Our benchmark is available at visuallocalization.net.
Comment: Accepted to CVPR 2018 as a spotlight.
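As an aside on how such a benchmark typically measures accuracy, here is a minimal Python sketch of 6DOF pose error computation: translation error between camera centres and the angle of the relative rotation. The threshold buckets are illustrative values in the spirit of the benchmark, not necessarily its official ones.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (metres) and rotation error (degrees) between an
    estimated and a ground-truth camera pose.

    Poses are world-to-camera: x_cam = R @ x_world + t, so the camera
    centre is c = -R.T @ t.
    """
    c_est = -R_est.T @ t_est
    c_gt = -R_gt.T @ t_gt
    t_err = np.linalg.norm(c_est - c_gt)

    # Rotation error: angle of the relative rotation R_est @ R_gt.T.
    cos_angle = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err, r_err

# Illustrative accuracy buckets (metres, degrees): one then reports the
# fraction of query images localized within each threshold pair.
THRESHOLDS = [(0.25, 2.0), (0.5, 5.0), (5.0, 10.0)]

def within_threshold(t_err, r_err, t_max, r_max):
    return t_err <= t_max and r_err <= r_max
```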
Video Registration in Egocentric Vision under Day and Night Illumination Changes
With the spread of wearable devices and head-mounted cameras, a wide range of
applications requiring precise user localization is now possible. In this paper
we propose to treat the problem of obtaining the user position with respect to
a known environment as a video registration problem. Video registration, i.e.
the task of aligning an input video sequence to a pre-built 3D model, relies on
a matching process of local keypoints extracted on the query sequence to a 3D
point cloud. The overall registration performance is tightly tied to the
quality of this 2D-3D matching, and can degrade under abrupt environmental
changes such as the difference in lighting between day and night. To
effectively register an egocentric video sequence under these
conditions, we propose to tackle the source of the problem: the matching
process. To overcome the shortcomings of standard matching techniques, we
introduce a novel embedding space that allows us to obtain robust matches by
jointly taking into account local descriptors, their spatial arrangement and
their temporal robustness. The proposal is evaluated using unconstrained
egocentric video sequences both in terms of matching quality and resulting
registration performance using different 3D models of historical landmarks. The
results show that the proposed method can outperform state-of-the-art
registration algorithms, in particular when dealing with the challenges of
night and day sequences.
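For context, the standard 2D-3D matching step that such a registration pipeline builds on can be sketched as below: nearest-neighbour descriptor matching against descriptors attached to the 3D model, followed by PnP with RANSAC (using OpenCV). The embedding proposed in the paper would replace the plain descriptor distance used here; all names and parameter values are illustrative.

```python
import cv2
import numpy as np

def register_frame(query_desc, query_kpts_2d, model_desc, model_pts_3d, K):
    """Baseline 2D-3D registration: nearest-neighbour descriptor matching
    followed by PnP + RANSAC.

    query_desc:    (N, D) descriptors from the query frame
    query_kpts_2d: (N, 2) keypoint locations in the query frame
    model_desc:    (M, D) descriptors attached to the 3D model points
    model_pts_3d:  (M, 3) point cloud coordinates
    K:             (3, 3) camera intrinsics
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(query_desc.astype(np.float32),
                           model_desc.astype(np.float32), k=2)
    # Lowe-style ratio test to discard ambiguous matches.
    good = []
    for pair in knn:
        if len(pair) == 2 and pair[0].distance < 0.8 * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:
        return None  # not enough matches to estimate a pose

    pts_2d = np.float32([query_kpts_2d[m.queryIdx] for m in good])
    pts_3d = np.float32([model_pts_3d[m.trainIdx] for m in good])

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, distCoeffs=None, reprojectionError=8.0)
    return (rvec, tvec, inliers) if ok else None
```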
Adversarial Training for Adverse Conditions: Robust Metric Localisation using Appearance Transfer
We present a method of improving visual place recognition and metric
localisation under very strong appearance change. We learn an invertible
generator that can transform the conditions of images, e.g. from day to
night, summer to winter, etc. This image transforming filter is explicitly
designed to aid and abet feature-matching using a new loss based on SURF
detector and dense descriptor maps. A network is trained to output synthetic
images optimised for feature matching given only an input RGB image, and these
generated images are used to localize the robot against a previously built map
using traditional sparse matching approaches. We benchmark our results using
multiple traversals of the Oxford RobotCar Dataset over a year-long period,
using one traversal as a map and the other to localise. We show that this
method significantly improves place recognition and localisation under changing
and adverse conditions, while reducing the number of mapping runs needed to
successfully achieve reliable localisation.
Comment: Accepted at ICRA 2018.
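As a rough illustration of a loss designed to "aid and abet feature matching", the PyTorch sketch below penalizes differences between dense feature maps of the translated image and the target-condition image. It is a generic perceptual-style stand-in, not the paper's SURF-based formulation; `feat_extractor` is a placeholder for any frozen network returning a list of feature maps.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(feat_extractor, fake_img, real_img):
    """Generic stand-in for a feature-matching loss: encourage the
    generator's output to yield the same dense feature maps as the
    target-condition image, so downstream matchers see similar features.

    feat_extractor: frozen network returning a list of feature maps
                    (the paper instead uses SURF detector/descriptor maps).
    """
    with torch.no_grad():
        real_feats = feat_extractor(real_img)   # fixed targets
    fake_feats = feat_extractor(fake_img)       # gradients flow to generator
    loss = sum(F.l1_loss(f, r) for f, r in zip(fake_feats, real_feats))
    return loss / len(fake_feats)
```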
D2-Net: A Trainable CNN for Joint Detection and Description of Local Features
In this work we address the problem of finding reliable pixel-level
correspondences under difficult imaging conditions. We propose an approach
where a single convolutional neural network plays a dual role: It is
simultaneously a dense feature descriptor and a feature detector. By postponing
the detection to a later stage, the obtained keypoints are more stable than
their traditional counterparts based on early detection of low-level
structures. We show that this model can be trained using pixel correspondences
extracted from readily available large-scale SfM reconstructions, without any
further annotations. The proposed method obtains state-of-the-art performance
on both the difficult Aachen Day-Night localization dataset and the InLoc
indoor localization benchmark, as well as competitive performance on other
benchmarks for image matching and 3D reconstruction.
Comment: Accepted at CVPR 2019.
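The describe-then-detect idea can be approximated in a few lines of PyTorch: treat each location of a dense CNN feature map as a descriptor, and detect keypoints where the strongest channel response is also a spatial local maximum. This is a simplified stand-in for D2-Net's soft detection, not the authors' code.

```python
import torch
import torch.nn.functional as F

def detect_and_describe(feature_map, top_k=512):
    """Simplified describe-then-detect keypoint extraction from a dense
    CNN feature map of shape (1, C, H, W).

    A location is kept if its strongest channel response is also a spatial
    local maximum in a 3x3 window (a hard approximation of the joint
    spatial/channel-wise soft detection in D2-Net).
    """
    b, c, h, w = feature_map.shape
    assert b == 1
    # Channel-wise selection: strongest channel response at each pixel.
    score, _ = feature_map.max(dim=1, keepdim=True)        # (1, 1, H, W)
    # Spatial non-maximum suppression via 3x3 max pooling.
    local_max = F.max_pool2d(score, kernel_size=3, stride=1, padding=1)
    is_peak = (score == local_max).squeeze()               # (H, W)

    ys, xs = torch.nonzero(is_peak, as_tuple=True)
    strengths = score.squeeze()[ys, xs]
    order = strengths.argsort(descending=True)[:top_k]
    ys, xs = ys[order], xs[order]

    # Descriptors are the L2-normalized feature vectors at the keypoints.
    desc = feature_map[0, :, ys, xs].t()                   # (K, C)
    desc = F.normalize(desc, dim=1)
    return torch.stack([xs, ys], dim=1), desc
```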
Night-to-Day Image Translation for Retrieval-based Localization
Visual localization is a key step in many robotics pipelines, allowing the
robot to (approximately) determine its position and orientation in the world.
An efficient and scalable approach to visual localization is to use image
retrieval techniques. These approaches identify the image most similar to a
query photo in a database of geo-tagged images and approximate the query's pose
via the pose of the retrieved database image. However, image retrieval across
drastically different illumination conditions, e.g. day and night, is still a
problem with unsatisfactory results, even in this age of powerful neural
models. This is due to the lack of a suitably diverse dataset with true
correspondences to perform end-to-end learning. A recent class of neural models
allows for realistic translation of images among visual domains with relatively
little training data and, most importantly, without ground-truth pairings. In
this paper, we explore the task of accurately localizing images captured from
two traversals of the same area in both day and night. We propose ToDayGAN - a
modified image-translation model to alter nighttime driving images to a more
useful daytime representation. We then compare the daytime and translated night
images to obtain a pose estimate for the night image using the known 6-DOF
position of the closest day image. Our approach improves localization
performance by over 250% compared to the current state-of-the-art, under
standard metrics in multiple categories.
Comment: Published in ICRA 2019.
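A minimal sketch of the retrieval pipeline that such a translation model plugs into: translate the night query to a day-like image, compute a global descriptor, retrieve the nearest geo-tagged day image, and inherit its pose. Here `translator` and `global_descriptor` are placeholders (e.g. ToDayGAN and a retrieval network such as NetVLAD), not a fixed API.

```python
import numpy as np

def localize_night_query(night_img, translator, global_descriptor,
                         db_descriptors, db_poses):
    """Retrieval-based localization of a night image via day translation.

    translator:        maps a night image to a day-like image
    global_descriptor: maps an image to an L2-normalized vector
    db_descriptors:    (M, D) descriptors of geo-tagged daytime images
    db_poses:          list of M known 6-DOF poses, one per database image
    """
    day_like = translator(night_img)
    q = global_descriptor(day_like)   # (D,) unit-norm query vector
    # Cosine similarity reduces to a dot product for unit-norm vectors.
    sims = db_descriptors @ q
    best = int(np.argmax(sims))
    # Approximate the query pose by the pose of the retrieved day image.
    return db_poses[best], best
```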