Orientation-Guided Contrastive Learning for UAV-View Geo-Localisation
Retrieving relevant multimedia content is one of the main problems in a world
that is increasingly data-driven. With the proliferation of drones, high-quality aerial footage is now available to a wide audience for the first time.
Integrating this footage into applications can enable GPS-less geo-localisation
or location correction.
In this paper, we present an orientation-guided training framework for
UAV-view geo-localisation. Through hierarchical localisation, the orientations of the UAV images are estimated in relation to the satellite imagery. We propose a lightweight prediction module for these pseudo-labels which predicts the orientation between the different views based on the contrastively learned embeddings. We experimentally demonstrate that this prediction supports the
training and outperforms previous approaches. The extracted pseudo-labels also
enable aligned rotation of the satellite image as an augmentation to further strengthen generalisation. During inference, we no longer need this
orientation module, which means that no additional computations are required.
We achieve state-of-the-art results on both the University-1652 and University-160k datasets.
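As an illustration of the idea (a minimal sketch, not the authors' code), the orientation module can be pictured as a small classification head over the pair of contrastively learned embeddings; the embedding size and the number of orientation bins below are assumptions:

```python
# Sketch of a lightweight orientation head: predicts the relative
# orientation between a UAV embedding and a satellite embedding, here
# discretised into 8 bins (an assumption), from a shared contrastive encoder.
import torch
import torch.nn as nn

class OrientationHead(nn.Module):
    def __init__(self, embed_dim: int = 512, num_bins: int = 8):
        super().__init__()
        # Concatenate both view embeddings and classify the rotation bin.
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_bins),
        )

    def forward(self, uav_emb: torch.Tensor, sat_emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([uav_emb, sat_emb], dim=-1))

# During training, the predicted bin can serve as a pseudo-label used to
# rotate the satellite image; at inference the head is simply dropped.
uav = torch.randn(4, 512)
sat = torch.randn(4, 512)
logits = OrientationHead()(uav, sat)  # shape: (4, 8)
```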
Satellite Image Based Cross-view Localization for Autonomous Vehicle
Existing spatial localization techniques for autonomous vehicles mostly use a
pre-built 3D-HD map, often constructed using a survey-grade 3D mapping vehicle,
which is not only expensive but also laborious. This paper shows that by using
an off-the-shelf high-definition satellite image as a ready-to-use map, we are
able to achieve cross-view vehicle localization with satisfactory accuracy, providing a cheaper and more practical route to localization. While the
utilization of satellite imagery for cross-view localization is an established
concept, the conventional methodology focuses primarily on image retrieval.
This paper introduces a novel approach to cross-view localization that departs
from the conventional image retrieval method. Specifically, our method develops
(1) a Geometric-align Feature Extractor (GaFE) that leverages measured 3D
points to bridge the geometric gap between ground and overhead views, (2) a
Pose Aware Branch (PAB) adopting a triplet loss to encourage pose-aware feature
extraction, and (3) a Recursive Pose Refine Branch (RPRB) using the
Levenberg-Marquardt (LM) algorithm to align the initial pose towards the true
vehicle pose iteratively. Our method is validated on the KITTI and Ford Multi-AV Seasonal datasets as the ground view and Google Maps as the satellite view. The
results demonstrate the superiority of our method in cross-view localization
with median spatial and angular errors at the meter and degree level, respectively.
Comment: Accepted by ICRA202
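As a toy illustration of the recursive pose refinement (not the paper's implementation), the sketch below runs a Levenberg-Marquardt loop on a 3-DoF pose (x, y, yaw); the residual function is a hypothetical stand-in for the learned feature-based residual:

```python
# Toy Levenberg-Marquardt refinement in the spirit of the Recursive Pose
# Refine Branch: iteratively update (x, y, yaw) to minimise a residual.
import numpy as np

def residual(pose: np.ndarray) -> np.ndarray:
    # Hypothetical residual: distance of projected points to targets;
    # the real branch compares learned ground and satellite feature maps.
    x, y, yaw = pose
    pts = np.array([[1.0, 0.0], [0.0, 1.0]])
    R = np.array([[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]])
    proj = pts @ R.T + np.array([x, y])
    target = np.array([[1.2, 0.1], [-0.05, 1.1]])
    return (proj - target).ravel()

def lm_refine(pose, iters=20, lam=1e-2, eps=1e-6):
    for _ in range(iters):
        r = residual(pose)
        # Numerical Jacobian of the residual w.r.t. the pose parameters.
        J = np.stack(
            [(residual(pose + eps * np.eye(3)[i]) - r) / eps for i in range(3)],
            axis=1,
        )
        # Damped normal equations: (J^T J + lam I) delta = -J^T r
        delta = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
        pose = pose + delta
    return pose

print(lm_refine(np.zeros(3)))  # refined (x, y, yaw)
```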
Optimal Feature Transport for Cross-View Image Geo-Localization
This paper addresses the problem of cross-view image geo-localization, where
the geographic location of a ground-level street-view query image is estimated
by matching it against a large-scale aerial map (e.g., a high-resolution
satellite image). State-of-the-art deep-learning based methods tackle this
problem as deep metric learning which aims to learn global feature
representations of the scene seen by the two different views. Although promising results have been obtained by such deep metric learning methods, they fail
to exploit a crucial cue relevant for localization, namely, the spatial layout
of local features. Moreover, little attention is paid to the obvious domain gap
(between aerial view and ground view) in the context of cross-view
localization. This paper proposes a novel Cross-View Feature Transport (CVFT)
technique to explicitly establish cross-view domain transfer that facilitates
feature alignment between ground and aerial images. Specifically, we implement
the CVFT as network layers, which transport features from one domain to the
other, leading to more meaningful feature similarity comparison. Our model is
differentiable and can be learned end-to-end. Experiments on large-scale
datasets have demonstrated that our method has remarkably boosted the
state-of-the-art cross-view localization performance, e.g., on the CVUSA
dataset, with significant improvements for top-1 recall from 40.79% to 61.43%,
and for top-10 from 76.36% to 90.49%. We expect the key insight of the paper
(i.e., explicitly handling domain difference via domain transport) will prove
to be useful for other similar problems in computer vision as well.
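The feature-transport idea can be sketched with a standard entropic optimal-transport (Sinkhorn) solver: given per-cell ground and aerial features, compute a transport plan and use it to rearrange ground features into the aerial layout. This is an illustrative approximation under assumed shapes, not the CVFT layer itself:

```python
# Minimal Sinkhorn sketch of cross-view feature transport.
import torch

def sinkhorn(cost: torch.Tensor, n_iters: int = 50, eps: float = 0.1) -> torch.Tensor:
    """Entropic-regularised optimal-transport plan with uniform marginals."""
    cost = cost / cost.mean()                   # scale for numerical stability
    K = torch.exp(-cost / eps)                  # (n, m) Gibbs kernel
    u = torch.full((cost.shape[0],), 1.0 / cost.shape[0])
    v = torch.full((cost.shape[1],), 1.0 / cost.shape[1])
    a, b = u.clone(), v.clone()
    for _ in range(n_iters):                    # alternating marginal scaling
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]          # transport plan P

ground = torch.randn(64, 128)                   # 64 ground cells, 128-d features
aerial = torch.randn(64, 128)                   # 64 aerial cells (assumed grid)
P = sinkhorn(torch.cdist(ground, aerial))       # cost = pairwise feature distance
transported = P.T @ ground * ground.shape[0]    # ground features in aerial layout
```

Because every Sinkhorn iteration is differentiable, such a transport layer can be trained end-to-end, which is the property the paper exploits.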
Wide-Area Geolocalization with a Limited Field of View Camera in Challenging Urban Environments
Cross-view geolocalization, a supplement or replacement for GPS, localizes an
agent within a search area by matching ground-view images to overhead images.
Significant progress has been made assuming a panoramic ground camera.
Panoramic cameras' high complexity and cost make non-panoramic cameras more widely applicable, but also more challenging to localize, since they yield less scene overlap between ground and overhead images. This paper presents Restricted FOV
Wide-Area Geolocalization (ReWAG), a cross-view geolocalization approach that
combines a neural network and particle filter to globally localize a mobile
agent with only odometry and a non-panoramic camera. ReWAG creates pose-aware
embeddings and provides a strategy to incorporate particle pose into the
Siamese network, improving localization accuracy by a factor of 100 compared to
a vision transformer baseline. This extended work also presents ReWAG*, which
improves upon ReWAG's generalization ability in previously unseen environments.
ReWAG* repeatedly converges accurately on a dataset of images we have collected
in Boston with a 72-degree field-of-view (FOV) camera, a location and FOV that ReWAG* was not trained on.
Comment: 10 pages, 16 figures. Extension of ICRA 2023 paper arXiv:2209.1185
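A schematic particle-filter update in the spirit of this approach is sketched below; `embed_ground` and `embed_satellite_at` are hypothetical placeholders for pose-aware encoders, and all parameters are assumptions rather than the paper's values:

```python
# Particle filter sketch: propagate particles with odometry, then weight
# them by the similarity between ground and satellite-patch embeddings.
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(-100, 100, size=(N, 3))   # (x, y, heading)
weights = np.full(N, 1.0 / N)

def embed_ground(image):                          # placeholder encoder
    return rng.standard_normal(128)

def embed_satellite_at(pose):                     # placeholder encoder
    return rng.standard_normal(128)

def pf_step(odometry, ground_image):
    global particles, weights
    # 1. Motion update: propagate particles with noisy odometry.
    particles += odometry + rng.normal(0, 0.5, particles.shape)
    # 2. Measurement update: weight by cosine similarity of embeddings.
    g = embed_ground(ground_image)
    sims = np.array([
        np.dot(g, s) / (np.linalg.norm(g) * np.linalg.norm(s) + 1e-8)
        for s in (embed_satellite_at(p) for p in particles)
    ])
    weights *= np.exp(sims)                       # soft likelihood
    weights /= weights.sum()
    # 3. Resample when the effective sample size drops too low.
    if 1.0 / np.sum(weights ** 2) < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)

pf_step(np.array([1.0, 0.0, 0.01]), ground_image=None)
```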
Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer
Image retrieval-based cross-view localization methods often lead to very
coarse camera pose estimation, due to the limited sampling density of the
database satellite images. In this paper, we propose a method to increase the
accuracy of a ground camera's location and orientation by estimating the
relative rotation and translation between the ground-level image and its
matched/retrieved satellite image. Our approach designs a geometry-guided
cross-view transformer that combines the benefits of conventional geometry and
learnable cross-view transformers to map the ground-view observations to an
overhead view. Given the synthesized overhead view and observed satellite
feature maps, we construct a neural pose optimizer with strong global
information embedding ability to estimate the relative rotation between them.
After aligning their rotations, we develop an uncertainty-guided spatial
correlation to generate a probability map of the vehicle locations, from which
the relative translation can be determined. Experimental results demonstrate
that our method significantly outperforms the state-of-the-art. Notably, both the likelihood of restricting the vehicle's lateral pose to within 1 m of its Ground Truth (GT) value on the cross-view KITTI dataset and the likelihood of restricting the vehicle's orientation to within a small angular threshold of its GT value are substantially improved over prior methods.
Comment: Accepted to ICCV 202
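At a high level, the spatial-correlation step can be approximated by densely cross-correlating the synthesised overhead feature map with the satellite feature map and normalising the response into a location probability map; the shapes below are assumptions, and the uncertainty weighting is omitted for brevity:

```python
# Sketch of dense spatial correlation for translation estimation.
import torch
import torch.nn.functional as F

sat_feat = torch.randn(1, 64, 128, 128)      # satellite feature map (assumed)
overhead = torch.randn(1, 64, 32, 32)        # synthesised ground->overhead map

# Treat the overhead map as a correlation kernel (dense cross-correlation).
score = F.conv2d(sat_feat, overhead)                      # (1, 1, 97, 97)
prob = F.softmax(score.flatten(), dim=0).view_as(score)   # location probability

# The arg-max cell gives the most likely relative translation.
idx = prob.flatten().argmax()
row, col = divmod(idx.item(), prob.shape[-1])
```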