Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
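To make the idea concrete, here is a minimal sketch of descriptor learning with semantic scene completion as an auxiliary task. This is not the authors' model: the architecture, grid sizes, class count, and loss below are all assumptions; only the principle (a bottleneck descriptor trained to reconstruct a completed semantic volume from a partial observation) follows the abstract.

```python
# Minimal sketch (not the paper's code): a 3D encoder-decoder whose
# bottleneck serves as a localization descriptor, trained with semantic
# scene completion as an auxiliary task. All shapes are assumptions.
import torch
import torch.nn as nn

class DescriptorNet(nn.Module):
    def __init__(self, num_classes=12, latent_dim=128):
        super().__init__()
        # Encoder: partial semantic voxel grid -> compact descriptor.
        self.encoder = nn.Sequential(
            nn.Conv3d(num_classes, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8 * 8, latent_dim),
        )
        # Decoder: descriptor -> completed semantic voxel grid (auxiliary task).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8 * 8),
            nn.Unflatten(1, (64, 8, 8, 8)),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, partial_grid):
        descriptor = self.encoder(partial_grid)
        completed = self.decoder(descriptor)   # per-class logits
        return descriptor, completed

# The training signal: reconstruct the full semantic grid from a partial
# observation, forcing the descriptor to encode high-level 3D geometry
# and semantics rather than raw appearance.
net = DescriptorNet()
partial = torch.randn(2, 12, 32, 32, 32)       # toy batch of partial grids
target = torch.randint(0, 12, (2, 32, 32, 32)) # toy completed labels
desc, logits = net(partial)
loss = nn.functional.cross_entropy(logits, target)
```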
Semantic Cross-View Matching
Matching cross-view images is challenging because their appearance and
viewpoints differ significantly. While low-level features based on gradient
orientations or filter responses vary drastically with such viewpoint changes,
the semantic content of an image remains largely invariant. Consequently,
semantically labeled regions can be used for cross-view matching. In this
paper, we explore
this idea and propose an automatic method for detecting and representing the
semantic information of an RGB image with the goal of performing cross-view
matching with a (non-RGB) geographic information system (GIS). A segmented
image forms the input to our system with segments assigned to semantic concepts
such as traffic signs, lakes, roads, foliage, etc. We design a descriptor to
robustly capture both the presence of semantic concepts and the spatial layout
of those segments. Pairwise distances between the descriptors extracted from
the GIS map and the query image are then used to generate a shortlist of the
most promising locations with similar semantic concepts in a consistent spatial
layout. An experimental evaluation with challenging query images and a large
urban area shows promising results.
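A minimal sketch of the kind of descriptor the abstract describes: per-cell histograms of semantic labels over a coarse spatial grid, so that it captures both which concepts are present and their rough layout, followed by a distance-based shortlist. The grid size, concept set, and distance metric below are assumptions, not the paper's exact design.

```python
# Minimal sketch (assumptions throughout, not the paper's descriptor).
import numpy as np

NUM_CONCEPTS = 5  # e.g. traffic sign, lake, road, foliage, building

def semantic_descriptor(label_map, grid=4):
    """label_map: HxW array of semantic concept ids (0..NUM_CONCEPTS-1)."""
    h, w = label_map.shape
    cells = []
    for i in range(grid):
        for j in range(grid):
            # Histogram of concept labels inside each spatial cell.
            cell = label_map[i * h // grid:(i + 1) * h // grid,
                             j * w // grid:(j + 1) * w // grid]
            hist = np.bincount(cell.ravel(), minlength=NUM_CONCEPTS)
            cells.append(hist / max(cell.size, 1))
    return np.concatenate(cells)  # length grid*grid*NUM_CONCEPTS

def shortlist(query_desc, map_descs, k=10):
    """Rank GIS-map locations by descriptor distance to the query image."""
    dists = np.linalg.norm(map_descs - query_desc, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: compare one query against descriptors precomputed on the GIS map.
query = semantic_descriptor(np.random.randint(0, NUM_CONCEPTS, (240, 320)))
gis = np.stack([semantic_descriptor(np.random.randint(0, NUM_CONCEPTS, (240, 320)))
                for _ in range(100)])
best = shortlist(query, gis)
```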
Video Registration in Egocentric Vision under Day and Night Illumination Changes
With the spread of wearable devices and head mounted cameras, a wide range of
applications requiring precise user localization is now possible. In this paper
we propose to treat the problem of obtaining the user position with respect to
a known environment as a video registration problem. Video registration, i.e.,
the task of aligning an input video sequence to a pre-built 3D model, relies on
matching local keypoints extracted from the query sequence to a 3D point
cloud. The overall registration performance is tightly coupled to the quality
of this 2D-3D matching, and can degrade under harsh environmental conditions
such as the steep lighting changes between day and night. To effectively
register an egocentric video sequence under these
conditions, we propose to tackle the source of the problem: the matching
process. To overcome the shortcomings of standard matching techniques, we
introduce a novel embedding space that allows us to obtain robust matches by
jointly taking into account local descriptors, their spatial arrangement and
their temporal robustness. The proposal is evaluated on unconstrained
egocentric video sequences, in terms of both matching quality and the
resulting registration performance, using different 3D models of historical
landmarks. The results show that the proposed method outperforms
state-of-the-art registration algorithms, in particular when dealing with the
challenges of night and day sequences.
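One plausible, heavily assumed reading of such a joint embedding: concatenate each local descriptor with its normalized image position and a temporal-stability score, then match by nearest neighbor with a ratio test. The weights and the stability measure below are illustrative stand-ins, not the paper's formulation.

```python
# Minimal sketch (an assumption-laden stand-in for the paper's embedding).
import numpy as np

def embed(descs, keypoints, stability, img_size, w_pos=0.5, w_time=0.5):
    """descs: Nxd local descriptors; keypoints: Nx2 pixel coords;
    stability: N values in [0,1], e.g. fraction of frames a track survives."""
    pos = keypoints / np.asarray(img_size, dtype=float)  # normalize to [0,1]
    # Appearance, spatial arrangement, and temporal robustness in one vector.
    return np.hstack([descs, w_pos * pos, w_time * stability[:, None]])

def match(query_emb, model_emb, ratio=0.8):
    """Lowe-style ratio test in the joint embedding space."""
    matches = []
    for qi, q in enumerate(query_emb):
        d = np.linalg.norm(model_emb - q, axis=1)
        first, second = np.argsort(d)[:2]
        if d[first] < ratio * d[second]:
            matches.append((qi, first))
    return matches

# Toy usage with random data.
rng = np.random.default_rng(0)
q_emb = embed(rng.standard_normal((50, 64)), rng.uniform(0, 320, (50, 2)),
              rng.uniform(0, 1, 50), (320, 240))
m_emb = embed(rng.standard_normal((500, 64)), rng.uniform(0, 320, (500, 2)),
              rng.uniform(0, 1, 500), (320, 240))
pairs = match(q_emb, m_emb)
```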
Night-to-Day Image Translation for Retrieval-based Localization
Visual localization is a key step in many robotics pipelines, allowing the
robot to (approximately) determine its position and orientation in the world.
An efficient and scalable approach to visual localization is to use image
retrieval techniques. These approaches identify the image most similar to a
query photo in a database of geo-tagged images and approximate the query's pose
via the pose of the retrieved database image. However, image retrieval across
drastically different illumination conditions, e.g. day and night, is still a
problem with unsatisfactory results, even in this age of powerful neural
models. This is due to a lack of a suitably diverse dataset with true
correspondences to perform end-to-end learning. A recent class of neural models
allows for realistic translation of images among visual domains with relatively
little training data and, most importantly, without ground-truth pairings. In
this paper, we explore the task of accurately localizing images captured from
two traversals of the same area in both day and night. We propose ToDayGAN - a
modified image-translation model to alter nighttime driving images to a more
useful daytime representation. We then compare the daytime and translated night
images to obtain a pose estimate for the night image using the known 6-DOF
position of the closest day image. Our approach improves localization
performance by over 250% compared to the current state of the art, as measured
by standard metrics across multiple categories.
Comment: Published in ICRA 2019
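The retrieval step lends itself to a short sketch. Here `to_day` stands in for a trained ToDayGAN-style generator and `global_desc` for an arbitrary whole-image descriptor; both are hypothetical placeholders, as is the pose layout.

```python
# Minimal sketch (hypothetical names throughout): retrieval-based pose
# estimation via night-to-day translation.
import numpy as np

def localize(night_img, day_descs, day_poses, to_day, global_desc):
    """Return the 6-DOF pose of the most similar daytime database image."""
    translated = to_day(night_img)          # night -> synthetic day image
    q = global_desc(translated)             # whole-image descriptor
    dists = np.linalg.norm(day_descs - q, axis=1)
    return day_poses[np.argmin(dists)]      # approximate the query's pose

# Toy usage with stand-in functions.
day_descs = np.random.rand(100, 256)
day_poses = np.random.rand(100, 6)          # e.g. (x, y, z, roll, pitch, yaw)
pose = localize(np.zeros((256, 256, 3)),
                day_descs, day_poses,
                to_day=lambda img: img,                 # identity stand-in
                global_desc=lambda img: np.random.rand(256))
```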