Optimal Feature Transport for Cross-View Image Geo-Localization
This paper addresses the problem of cross-view image geo-localization, where
the geographic location of a ground-level street-view query image is estimated
by matching it against a large scale aerial map (e.g., a high-resolution
satellite image). State-of-the-art deep-learning based methods tackle this
problem as deep metric learning which aims to learn global feature
representations of the scene seen by the two different views. Although such deep
metric learning methods obtain promising results, they fail
to exploit a crucial cue relevant for localization, namely, the spatial layout
of local features. Moreover, little attention is paid to the obvious domain gap
(between aerial view and ground view) in the context of cross-view
localization. This paper proposes a novel Cross-View Feature Transport (CVFT)
technique to explicitly establish cross-view domain transfer that facilitates
feature alignment between ground and aerial images. Specifically, we implement
the CVFT as network layers, which transport features from one domain to the
other, leading to more meaningful feature similarity comparison. Our model is
differentiable and can be learned end-to-end. Experiments on large-scale
datasets have demonstrated that our method has remarkably boosted the
state-of-the-art cross-view localization performance, e.g., on the CVUSA
dataset, with significant improvements for top-1 recall from 40.79% to 61.43%,
and for top-10 from 76.36% to 90.49%. We expect the key insight of the paper
(i.e., explicitly handling domain difference via domain transport) will prove
to be useful for other similar problems in computer vision as well.
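
To make the idea of transporting features between domains concrete, here is a minimal sketch. It is not the authors' implementation. It assumes an entropy-regularised transport plan computed with Sinkhorn iterations, uniform marginals, and a cost matrix passed in directly (the abstract describes the transport as learned network layers). The names `sinkhorn`, `transport_features`, `eps`, and `n_iters` are hypothetical.

```python
import numpy as np


def sinkhorn(cost, n_iters=50, eps=0.1):
    """Approximate an entropy-regularised transport plan (assumed technique).

    cost: (n, m) array of costs between n source and m target locations.
    Returns an (n, m) non-negative plan whose columns each sum to 1/m.
    """
    n, m = cost.shape
    kernel = np.exp(-cost / eps)      # Gibbs kernel of the cost
    a = np.full(n, 1.0 / n)           # uniform source marginal (assumption)
    b = np.full(m, 1.0 / m)           # uniform target marginal (assumption)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)          # rescale rows towards marginal a
        v = b / (kernel.T @ u)        # rescale columns to match marginal b
    return u[:, None] * kernel * v[None, :]


def transport_features(source_feats, cost):
    """Move (n, c) source features onto m target locations.

    Each target location receives a convex combination of source features,
    weighted by the transport plan.
    """
    plan = sinkhorn(cost)
    m = plan.shape[1]
    # Columns of the plan sum to 1/m, so multiply by m to get weights summing to 1.
    return (plan.T @ source_feats) * m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ground = rng.random((6, 4))       # 6 spatial locations, 4 channels (toy sizes)
    cost = rng.random((6, 6))         # stand-in for a learned cost matrix
    aerial_like = transport_features(ground, cost)
    print(aerial_like.shape)
```

In this sketch the transported features could then be compared with the aerial-view features using an ordinary distance, which is the kind of similarity comparison the abstract refers to. Because every step is a differentiable array operation, the same structure could in principle sit inside a network trained end-to-end.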
Multi-level Feedback Joint Representation Learning Network Based on Adaptive Area Elimination for Cross-view Geo-localization
Cross-view geo-localization refers to the task of matching the same geographic target using images obtained from different platforms, such as drone-view and satellite-view. However, the view angle of images obtained through different platforms will vary greatly, which can bring great challenges to the cross-view geo-localization task. Therefore, we propose a multi-level feedback joint representation learning network based on adaptive area elimination to solve the cross-view geo-localization problem. In our network model, we first process the extracted global features to obtain part-level and patch-level features. We then utilize these features as feedback to the global features to extract the contextual information in the global features and improve the robustness of the extracted features. In addition, as images obtained from different platforms differ, there will always be some interference when matching images. Therefore, we introduce an adaptive area elimination strategy to erase the interference information in the global features and assist the model in obtaining crucial information. On this basis, the feature correlation loss function is designed to constrain learning when using global feature information, thereby eliminating the possible interference, which can improve the network model performance. Finally, a series of experiments is carried out using two well-known benchmarks, namely University-1652 and SUES-200, and the experimental results show that the proposed network model achieves competitive results, thereby demonstrating the effectiveness of the proposed model.