RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
We study an important, yet largely unexplored problem of large-scale
cross-modal visual localization by matching ground RGB images to a
geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior
works were demonstrated on small datasets and did not lend themselves to
scaling up for large-scale applications. To enable large-scale evaluation, we
introduce a new dataset containing over 550K pairs (covering a 143 km^2 area) of
RGB and aerial LIDAR depth images. We propose a novel joint embedding based
method that effectively combines the appearance and semantic cues from both
modalities to handle drastic cross-modal variations. Experiments on the
proposed dataset show that our model achieves a strong result of a median rank
of 5 in matching across a large test set of 50K location pairs collected from a
14 km^2 area. This represents a significant advancement over prior works in
performance and scale. We conclude with qualitative results to highlight the
challenging nature of this task and the benefits of the proposed model. Our
work provides a foundation for further research in cross-modal visual
localization.

Comment: ACM Multimedia 202
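The retrieval setup described above, matching a query embedding against a large geo-referenced gallery and reporting the median rank of the true match, can be sketched as follows. This is a minimal, generic illustration of joint-embedding retrieval evaluation, not the paper's actual model; the function name and the use of cosine similarity are assumptions.

```python
import numpy as np

def median_rank(query_emb, ref_emb):
    """Rank each query's true match among all references by cosine
    similarity and return the median rank (1 = perfect retrieval).
    Assumes a paired dataset: the true match of query i is reference i."""
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sims = q @ r.T                       # (num_queries, num_refs)
    true_sims = np.diag(sims)            # similarity to the true match
    # Rank = 1 + number of references scored strictly higher.
    ranks = 1 + (sims > true_sims[:, None]).sum(axis=1)
    return float(np.median(ranks))

# Toy check: identical query/reference embeddings give a median rank of 1.
emb = np.eye(4)
print(median_rank(emb, emb))  # 1.0
```

In the paper's setting the two embedding sets would come from the RGB branch and the rendered-LIDAR-depth branch of a joint embedding network, evaluated over the 50K-pair test set.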
Sketch Me That Shoe
This project received support from the European Union’s Horizon 2020 research and innovation programme under grant agreement #640891, the Royal Society and Natural Science Foundation of China (NSFC) joint grants #IE141387 and #61511130081, and the China Scholarship Council (CSC). We gratefully acknowledge the support of NVIDIA Corporation for the donation of the GPUs used for this research.
Cross-View Visual Geo-Localization for Outdoor Augmented Reality
Precise estimation of global orientation and location is critical to ensure a
compelling outdoor Augmented Reality (AR) experience. We address the problem of
geo-pose estimation by cross-view matching of query ground images to a
geo-referenced aerial satellite image database. Recently, neural network-based
methods have shown state-of-the-art performance in cross-view matching.
However, most of the prior works focus only on location estimation, ignoring
orientation, which cannot meet the requirements of outdoor AR applications. We
propose a new transformer neural network-based model and a modified triplet
ranking loss for joint location and orientation estimation. Experiments on
several benchmark cross-view geo-localization datasets show that our model
achieves state-of-the-art performance. Furthermore, we present an approach to
extend the single image query-based geo-localization approach by utilizing
temporal information from a navigation pipeline for robust continuous
geo-localization. Experimentation on several large-scale real-world video
sequences demonstrates that our approach enables high-precision and stable AR
insertion.

Comment: IEEE VR 202
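The triplet ranking loss mentioned above can be sketched in its standard margin-based form, pulling the matching aerial view closer to the ground query than any non-matching one by at least a margin. This is the generic formulation only; the paper's specific modification for joint location and orientation is not reproduced here, and the function name and margin value are assumptions.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet ranking loss on L2-normalized embeddings.
    anchor: ground-query embeddings, positive: matching aerial embeddings,
    negative: non-matching aerial embeddings; all of shape (batch, dim)."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negative)
    d_pos = np.linalg.norm(a - p, axis=-1)  # distance to the true match
    d_neg = np.linalg.norm(a - n, axis=-1)  # distance to the non-match
    # Penalize triplets where the true match is not closer by `margin`.
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

When the positive already sits well inside the margin the loss is zero, so training focuses on hard triplets where the non-matching aerial image still scores too close to the query.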
Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery
Street-view imagery provides us with novel experiences to explore different
places remotely. Carefully calibrated street-view images (e.g. Google Street
View) can be used for different downstream tasks, e.g. navigation and map
feature extraction. As personal high-quality cameras have become much more affordable
and portable, enormous numbers of crowdsourced street-view images are
uploaded to the internet, but commonly with missing or noisy sensor
information. To prepare this hidden treasure for "ready-to-use" status,
determining missing location information and camera orientation angles are two
equally important tasks. Recent methods have achieved high performance on
geo-localization of street-view images by cross-view matching with a pool of
geo-referenced satellite imagery. However, most of the existing works focus
more on geo-localization than estimating the image orientation. In this work,
we restate the importance of finding fine-grained orientation for street-view
images, formally define the problem and provide a set of evaluation metrics to
assess the quality of the orientation estimation. We propose two methods to
improve the granularity of the orientation estimation, achieving 82.4% and
72.3% accuracy for images with estimated angle errors below 2 degrees on the
CVUSA and CVACT datasets, corresponding to 34.9% and 28.2% absolute improvements
compared to previous works. Integrating fine-grained orientation estimation in
training also improves performance on geo-localization, giving top-1 recall of
95.5%/85.5% and 86.8%/80.4% for orientation-known/unknown tests on the two
datasets.

Comment: This paper has been accepted by ACM Multimedia 2022. The version
contains additional supplementary material.
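One common way to recover a street-view image's azimuth against a polar-transformed satellite descriptor is circular cross-correlation along the horizontal (azimuth) axis, picking the rotation that best aligns the two feature maps. The sketch below illustrates that general idea only; it is not the paper's exact method, and the function name and feature shapes are assumptions.

```python
import numpy as np

def estimate_orientation(ground_feat, sat_feat):
    """Estimate the azimuth offset between a ground-panorama descriptor
    and a polar-transformed satellite descriptor, both of shape
    (height, width), where width spans 360 degrees of azimuth.
    Returns (shift, degrees): rolling sat_feat by `shift` columns
    best aligns it with ground_feat."""
    # FFT along the azimuth axis turns circular correlation into an
    # elementwise product, so all rotations are scored at once.
    G = np.fft.fft(ground_feat, axis=1)
    S = np.fft.fft(sat_feat, axis=1)
    corr = np.fft.ifft(G * np.conj(S), axis=1).real.sum(axis=0)
    shift = int(np.argmax(corr))
    degrees = 360.0 * shift / ground_feat.shape[1]
    return shift, degrees
```

Scoring every rotation at coarse feature resolution and then refining around the best candidate is one natural route to the fine-grained (sub-2-degree) accuracy the abstract reports.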