2,901 research outputs found
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Aerial scene recognition is a fundamental task in remote sensing and has
recently received increased interest. While visual information from overhead
images, processed with powerful models and efficient algorithms, yields
considerable scene recognition performance, it still suffers from variation
in ground objects, lighting conditions, and so on. Inspired by the
multi-channel perception theory in cognitive science, in this paper we explore
a novel audiovisual aerial scene recognition task that uses both images and
sounds as input in order to improve recognition performance. Based on the
observation that certain sound events are more likely to be heard at a given
geographic location, we propose to exploit knowledge from sound events to
improve aerial scene recognition. For this purpose, we have constructed a new
dataset named the AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE).
With the help of this dataset, we evaluate three proposed approaches for
transferring sound event knowledge to the aerial scene recognition task in a
multimodal learning framework, and show the benefit of exploiting audio
information for aerial scene recognition. The source code is publicly
available for reproducibility purposes.
Comment: ECCV 2020
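The multimodal framework this abstract describes — an image branch and a sound-event branch combined in one classifier — can be sketched in a minimal, purely illustrative form as late fusion. All dimensions, encoders, and weights below are hypothetical stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes; the paper does not specify these.
IMG_DIM, AUD_DIM, N_CLASSES = 512, 128, 13

def encode_image(image_feat, W_img):
    # Stand-in for a CNN image encoder: a single linear projection.
    return image_feat @ W_img

def encode_audio(audio_feat, W_aud):
    # Stand-in for a sound-event encoder carrying the audio knowledge.
    return audio_feat @ W_aud

def fuse_and_classify(image_feat, audio_feat, W_img, W_aud, W_cls):
    # Late fusion: concatenate the two modality embeddings, then classify.
    z = np.concatenate([encode_image(image_feat, W_img),
                        encode_audio(audio_feat, W_aud)])
    logits = z @ W_cls
    # Softmax over aerial-scene classes.
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Randomly initialized weights, for shape illustration only.
W_img = rng.normal(size=(IMG_DIM, 64))
W_aud = rng.normal(size=(AUD_DIM, 64))
W_cls = rng.normal(size=(128, N_CLASSES))

probs = fuse_and_classify(rng.normal(size=IMG_DIM),
                          rng.normal(size=AUD_DIM),
                          W_img, W_aud, W_cls)
print(probs.shape)
```

In a trained system the two encoders would be deep networks and the fusion weights learned jointly; this sketch only shows how the two modalities meet in one prediction.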
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, deep
learning, a major breakthrough in the field, has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to
everything? Or should we resist a 'black-box' solution? There are
controversial opinions in the remote sensing community. In this article, we
analyze the challenges of using deep learning for remote sensing data
analysis, review the recent advances, and provide resources to make deep
learning in remote sensing ridiculously simple to start with. More
importantly, we encourage remote sensing scientists to bring their expertise
into deep learning and to use it as an implicit general model to tackle
unprecedented, large-scale, influential challenges such as climate change and
urbanization.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
We study an important, yet largely unexplored problem of large-scale
cross-modal visual localization by matching ground RGB images to a
geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior
works were demonstrated on small datasets and did not lend themselves to
scaling up for large-scale applications. To enable large-scale evaluation, we
introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of
RGB and aerial LIDAR depth images. We propose a novel joint embedding based
method that effectively combines the appearance and semantic cues from both
modalities to handle drastic cross-modal variations. Experiments on the
proposed dataset show that our model achieves a strong result of a median rank
of 5 in matching across a large test set of 50K location pairs collected from a
14 km^2 area. This represents a significant advancement over prior works in
performance and scale. We conclude with qualitative results to highlight the
challenging nature of this task and the benefits of the proposed model. Our
work provides a foundation for further research in cross-modal visual
localization.
Comment: ACM Multimedia 2020
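The joint-embedding retrieval setup described above — projecting ground RGB and aerial LIDAR depth features into a shared space and ranking gallery items by similarity — can be sketched as follows. The feature sizes, the shared random projection, and the noisy toy pairs are all illustrative assumptions; the paper's actual model learns two towers jointly from appearance and semantic cues:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes; the paper's real features and model differ.
N_LOCATIONS, FEAT_DIM, EMB_DIM = 200, 256, 32

# A single random projection stands in for both trained towers here,
# so that the toy matched pairs land close together in the joint space.
W = rng.normal(size=(FEAT_DIM, EMB_DIM))

def embed(x):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)  # unit-normalize

# Toy paired features: depth features are noisy copies of RGB features.
rgb_feats = rng.normal(size=(N_LOCATIONS, FEAT_DIM))
depth_feats = rgb_feats + 0.1 * rng.normal(size=(N_LOCATIONS, FEAT_DIM))

queries = embed(rgb_feats)    # ground RGB embeddings
gallery = embed(depth_feats)  # aerial LIDAR depth embeddings

sim = queries @ gallery.T     # cosine similarity (unit-norm embeddings)
true_sim = sim[np.arange(N_LOCATIONS), np.arange(N_LOCATIONS)]
# Rank of the true match for each query (1 = best) and the median rank,
# the same retrieval metric the abstract reports.
ranks = 1 + (sim > true_sim[:, None]).sum(axis=1)
median_rank = int(np.median(ranks))
print("median rank:", median_rank)
```

Because the toy pairs are strongly correlated, the median rank here is near 1; the hard part the paper addresses is making real cross-modal pairs behave this way at 50K-gallery scale.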
RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model
Pre-trained Vision-Language Foundation Models (VLMs) utilizing extensive
image-text paired data have demonstrated unprecedented image-text association
capabilities, achieving remarkable results across various downstream tasks. A
critical challenge is how to make use of existing large-scale pre-trained
VLMs, which are trained on common objects, to perform domain-specific transfer
and accomplish domain-related downstream tasks. In this paper, we propose a
new framework that includes the Domain Foundation Model (DFM), bridging the gap
between the General Foundation Model (GFM) and domain-specific downstream
tasks. Moreover, we present an image-text paired dataset in the field of remote
sensing (RS), RS5M, which has 5 million RS images with English descriptions.
The dataset is obtained by filtering publicly available image-text paired
datasets and by captioning label-only RS datasets with a pre-trained VLM.
Together, these constitute the first large-scale RS image-text paired dataset.
Additionally, we tried several Parameter-Efficient Fine-Tuning (PEFT) methods
on RS5M to implement the DFM. Experimental results show that our proposed
dataset is highly effective for various tasks, improving upon the baseline in
zero-shot classification tasks, and obtaining good results in both
Vision-Language Retrieval and Semantic Localization tasks.
\url{https://github.com/om-ai-lab/RS5M}
Comment: RS5M dataset v
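The Parameter-Efficient Fine-Tuning step mentioned above can be illustrated with LoRA-style low-rank adapters, one common PEFT technique; the abstract does not say which PEFT variants were tried, and the layer sizes and scaling below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer size; real VLM weight matrices are much larger.
D_IN, D_OUT, RANK, ALPHA = 256, 256, 8, 16.0

W = rng.normal(size=(D_IN, D_OUT))         # frozen pretrained weight
A = rng.normal(size=(D_IN, RANK)) * 0.01   # trainable low-rank factor
B = np.zeros((RANK, D_OUT))                # zero-init: no change at start

def lora_forward(x):
    # Frozen path plus low-rank adapter path, scaled by alpha / rank.
    return x @ W + (ALPHA / RANK) * (x @ A @ B)

x = rng.normal(size=D_IN)
# With B zero-initialized, the adapted layer reproduces the frozen one,
# so fine-tuning starts exactly from the pretrained behavior.
assert np.allclose(lora_forward(x), x @ W)

# Only the adapters train, which is the point of PEFT: far fewer
# trainable parameters than the full weight matrix.
adapter_params = A.size + B.size
full_params = W.size
print(adapter_params, full_params)
```

Here the adapters hold 4,096 parameters versus 65,536 in the frozen matrix; at realistic VLM scale that gap is what makes domain transfer on a 5M-pair dataset tractable.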