Satellite Image Based Cross-view Localization for Autonomous Vehicle
Existing spatial localization techniques for autonomous vehicles mostly use a
pre-built 3D-HD map, often constructed using a survey-grade 3D mapping vehicle,
which is not only expensive but also laborious. This paper shows that by using
an off-the-shelf high-definition satellite image as a ready-to-use map, we are
able to achieve cross-view vehicle localization with satisfactory accuracy,
providing a cheaper and more practical alternative. While using satellite
imagery for cross-view localization is an established idea, the conventional
methodology treats it primarily as an image-retrieval problem. This paper
introduces a novel approach that departs from image retrieval. Specifically,
our method develops
(1) a Geometric-align Feature Extractor (GaFE) that leverages measured 3D
points to bridge the geometric gap between ground and overhead views, (2) a
Pose Aware Branch (PAB) adopting a triplet loss to encourage pose-aware feature
extraction, and (3) a Recursive Pose Refine Branch (RPRB) using the
Levenberg-Marquardt (LM) algorithm to align the initial pose towards the true
vehicle pose iteratively. Our method is validated on KITTI and Ford Multi-AV
Seasonal datasets as ground view and Google Maps as the satellite view. The
results demonstrate the superiority of our method in cross-view localization
with median spatial and angular errors at the meter and degree level,
respectively.
Comment: Accepted by ICRA202
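The pose-aware training signal used by the Pose Aware Branch (PAB) above is a triplet loss. The abstract gives no implementation details, so the following is a minimal numpy sketch of a generic hinge-style triplet loss, with invented toy embeddings, not the paper's actual network or feature dimensions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive embedding toward the
    anchor and push the negative at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: a feature at the true pose (positive) vs. one at a
# clearly wrong pose (negative).
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])
print(triplet_loss(anchor, positive, negative))  # → 0.0 (already separated)
```

When the negative is already more than `margin` farther from the anchor than the positive, the loss is zero and gradients vanish, which is why such methods typically mine hard negatives during training.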
Optimal Feature Transport for Cross-View Image Geo-Localization
This paper addresses the problem of cross-view image geo-localization, where
the geographic location of a ground-level street-view query image is estimated
by matching it against a large scale aerial map (e.g., a high-resolution
satellite image). State-of-the-art deep-learning based methods tackle this
problem as deep metric learning which aims to learn global feature
representations of the scene seen by the two different views. Although such
deep metric learning methods obtain promising results, they fail
to exploit a crucial cue relevant for localization, namely, the spatial layout
of local features. Moreover, little attention is paid to the obvious domain gap
(between aerial view and ground view) in the context of cross-view
localization. This paper proposes a novel Cross-View Feature Transport (CVFT)
technique to explicitly establish cross-view domain transfer that facilitates
feature alignment between ground and aerial images. Specifically, we implement
the CVFT as network layers, which transports features from one domain to the
other, leading to more meaningful feature similarity comparison. Our model is
differentiable and can be learned end-to-end. Experiments on large-scale
datasets have demonstrated that our method has remarkably boosted the
state-of-the-art cross-view localization performance, e.g., on the CVUSA
dataset, with significant improvements for top-1 recall from 40.79% to 61.43%,
and for top-10 from 76.36% to 90.49%. We expect the key insight of the paper
(i.e., explicitly handling domain difference via domain transport) will prove
to be useful for other similar problems in computer vision as well.
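The feature-transport idea above is an optimal-transport formulation; the abstract does not specify the solver, but entropy-regularized transport plans of this kind are commonly computed with Sinkhorn iterations. The sketch below is a standard Sinkhorn routine over a made-up cost matrix, not the CVFT network layer itself:

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations:
    returns a transport plan P with (near-)uniform row and column
    marginals that moves mass cheaply under the given cost matrix."""
    n, m = cost.shape
    K = np.exp(-cost / eps)          # Gibbs kernel
    r = np.full(n, 1.0 / n)          # uniform source marginal
    c = np.full(m, 1.0 / m)          # uniform target marginal
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):         # alternating marginal projections
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

np.random.seed(0)
cost = np.random.rand(4, 4)          # stand-in cross-view feature distances
P = sinkhorn(cost)
print(P.sum(axis=0))                 # ≈ [0.25, 0.25, 0.25, 0.25]
```

In a CVFT-like setting, the resulting plan `P` would be used to transport (re-weight and re-arrange) features from one view's domain into the other's before computing similarity.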
Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator
In this paper, we introduce a novel approach to fine-grained cross-view
geo-localization. Our method aligns a warped ground image with a corresponding
GPS-tagged satellite image covering the same area using homography estimation.
We first employ a differentiable spherical transform, adhering to geometric
principles, to accurately align the perspective of the ground image with the
satellite map. This transformation effectively places ground and aerial images
in the same view and on the same plane, reducing the task to an image alignment
problem. To address challenges such as occlusion, small overlapping range, and
seasonal variations, we propose a robust correlation-aware homography estimator
to align similar parts of the transformed ground image with the satellite
image. Our method achieves sub-pixel resolution and meter-level GPS accuracy by
mapping the center point of the transformed ground image to the satellite image
using a homography matrix and determining the orientation of the ground camera
using a point above the central axis. Operating at a speed of 30 FPS, our
method outperforms state-of-the-art techniques, reducing the mean metric
localization error by 21.3% and 32.4% in same-area and cross-area
generalization tasks on the VIGOR benchmark, respectively, and by 34.4% on the
KITTI benchmark in same-area evaluation.
Comment: 19 pages. Reducing the cross-view geo-localization problem to a 2D
image alignment problem by utilizing BEV transformation, and completing the
alignment process with a correlation-aware homography estimator. Code:
https://github.com/xlwangDev/HC-Ne
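The final localization step described above, mapping the center of the transformed ground image into the satellite image through the estimated homography, reduces to warping a point through a 3x3 matrix. The sketch below uses a made-up pure-translation homography, not output from the paper's estimator:

```python
import numpy as np

def warp_point(H, pt):
    """Map a pixel through a 3x3 homography using homogeneous
    coordinates, then de-homogenize."""
    x, y = pt
    v = H @ np.array([x, y, 1.0])
    return v[:2] / v[2]

# Hypothetical homography: a pure translation by (10, 20) pixels from
# ground-image coordinates to satellite-image coordinates.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0,  1.0]])
center = (128.0, 128.0)
print(warp_point(H, center))  # → [138. 148.]
```

Warping a second point above the center the same way and taking the angle between the two mapped points would give the camera heading, mirroring the orientation step the abstract describes.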
Image-based Geolocalization by Ground-to-2.5D Map Matching
We study the image-based geolocalization problem, aiming to localize
ground-view query images on cartographic maps. Current methods often utilize
cross-view localization techniques to match ground-view query images with 2D
maps. However, the performance of these methods is unsatisfactory due to
significant cross-view appearance differences. In this paper, we lift
cross-view matching to a 2.5D space, where heights of structures (e.g., trees
and buildings) provide geometric information to guide the cross-view matching.
We propose a new approach to learning representative embeddings from
multi-modal data. Specifically, we establish a projection relationship between
2.5D space and 2D aerial-view space. The projection is further used to combine
multi-modal features from the 2.5D and 2D maps using an effective
pixel-to-point fusion method. By encoding crucial geometric cues, our method
learns discriminative location embeddings for matching panoramic images and
maps. Additionally, we construct the first large-scale ground-to-2.5D map
geolocalization dataset to validate our method and facilitate future research.
Both single-image based and route based localization experiments are conducted
to test our method. Extensive experiments demonstrate that the proposed method
achieves significantly higher localization accuracy and faster convergence than
previous 2D map-based approaches.
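The projection relationship above can be sketched minimally. The function below is hypothetical: the paper's pixel-to-point fusion is learned, whereas this toy version only illustrates projecting 2.5D points (x, y, height) orthographically onto an aerial grid and concatenating each point's height cue with the 2D map feature found there:

```python
import numpy as np

def project_and_fuse(points_25d, map_feats, resolution=1.0):
    """Orthographically project 2.5D points (x, y, h) onto the aerial
    feature grid and fuse each point's height with the 2D map feature at
    the projected pixel (a toy stand-in for pixel-to-point fusion)."""
    H, W, C = map_feats.shape
    fused = []
    for x, y, h in points_25d:
        u = int(np.clip(x / resolution, 0, W - 1))  # column (east)
        v = int(np.clip(y / resolution, 0, H - 1))  # row (north)
        # Height drops out of the 2D projection but is kept as a feature.
        fused.append(np.concatenate([[h], map_feats[v, u]]))
    return np.stack(fused)

map_feats = np.zeros((8, 8, 3))            # dummy 8x8 aerial feature map
pts = [(2.0, 3.0, 5.0), (7.5, 0.0, 12.0)]  # (x, y, height) samples
out = project_and_fuse(pts, map_feats)
print(out.shape)  # (2, 4): one fused vector per point
```

The key point the sketch preserves is that an orthographic top-down projection discards height from the coordinates, so height must be injected as a feature, which is exactly the geometric cue the 2.5D formulation adds over flat 2D maps.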