Learning to Evaluate Performance of Multi-modal Semantic Localization

Author: Chen Jialiang
Li Chongyang
Li Shouke
Mao Yongqiang
Pan Zhaoying
Sun Xian
Wang Hongqi
Yuan Zhiqiang
Zhang Wenkai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/09/2022

Semantic localization (SeLo) refers to the task of obtaining the most relevant locations in large-scale remote sensing (RS) images using semantic information such as text. As an emerging task based on cross-modal retrieval, SeLo achieves semantic-level retrieval with only caption-level annotation, which demonstrates its great potential in unifying downstream tasks. Although SeLo has been addressed in successive works, no work has yet systematically explored and analyzed this urgent direction. In this paper, we thoroughly study this field and provide a complete benchmark in terms of metrics and test data to advance the SeLo task. First, based on the characteristics of this task, we propose multiple discriminative evaluation metrics to quantify the performance of the SeLo task. The devised significant area proportion, attention shift distance, and discrete attention distance are used to evaluate the generated SeLo map at the pixel level and the region level. Next, to provide standard evaluation data for the SeLo task, we contribute a diverse, multi-semantic, multi-objective Semantic Localization Testset (AIR-SLT). AIR-SLT consists of 22 large-scale RS images and 59 test cases with different semantics, and aims to provide a comprehensive evaluation for retrieval models. Finally, we analyze the SeLo performance of RS cross-modal retrieval models in detail, explore the impact of different variables on this task, and provide a complete benchmark for the SeLo task. We have also established a new paradigm for RS referring expression comprehension, and demonstrated the great advantage of SeLo in semantics by combining it with tasks such as detection and road extraction. The proposed evaluation metrics, semantic localization testsets, and corresponding scripts are openly accessible at github.com/xiaoyuan1996/SemanticLocalizationMetrics. Comment: 19 pages, 11 figures

arXiv.org e-Print Archive

A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning

Author: Dong Hongwei
Qiang Wenwen
Si Lingyu
Sun Fuchun
Xu Fanjiang
Yu Junzhi
Zhai Wenlong
Zheng Changwen
Publication date: 28/06/2023

Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks. On this basis, we express DS using deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in specific applications where the performance gap between dual modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously being updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method.
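
As a rough illustration of what "channel-wise independent" output features could mean in practice, the sketch below computes a decorrelation penalty: the sum of squared off-diagonal entries of the channel correlation matrix over a batch. This is an assumption made for illustration only; the abstract does not specify the loss, decorrelation is only a proxy for independence, and the function name `channel_correlation_penalty` is hypothetical.

```python
import math


def channel_correlation_penalty(features, eps=1e-8):
    """Sum of squared off-diagonal channel correlations (illustrative only).

    `features` is a batch given as a list of samples, each a list of
    channel values. A penalty near 0 means the channels are (linearly)
    uncorrelated across the batch.
    """
    n = len(features)
    c = len(features[0])
    # Per-channel mean over the batch.
    means = [sum(row[j] for row in features) / n for j in range(c)]
    # Center every channel.
    centered = [[row[j] - means[j] for j in range(c)] for row in features]
    # Per-channel standard deviation (eps avoids division by zero).
    stds = [math.sqrt(sum(r[j] ** 2 for r in centered) / n) + eps
            for j in range(c)]
    penalty = 0.0
    for i in range(c):
        for j in range(c):
            if i != j:
                cov = sum(r[i] * r[j] for r in centered) / n
                penalty += (cov / (stds[i] * stds[j])) ** 2
    return penalty


# Two channels that vary independently of each other.
uncorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# Second channel is exactly twice the first.
correlated = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

low = channel_correlation_penalty(uncorrelated)
high = channel_correlation_penalty(correlated)
```

In a training setting such a term would typically be added to the task loss with a weighting factor; how the paper actually combines its terms is not stated in the abstract.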

arXiv.org e-Print Archive

Special Issue on Advances in Deep Learning

Author: Bottino Andrea
Cumani Sandro
Gragnaniello Diego
Kim Wonjoon
Publication venue: MDPI AG
Publication date: 01/01/2020

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio della Ricerca - Università di Salerno

Scale-wise interaction fusion and knowledge distillation network for aerial scene recognition

Author: An M
Hu Z
Lei T
Nandi AK
Ning H
Sun H
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 04/03/2023

Data availability statement: Data sharing is not applicable to this article as no new data were created or analysed in this study. Copyright © 2023 The Authors. Aerial scene recognition (ASR) has attracted great attention due to its increasingly essential applications. Most ASR methods adopt a multi-scale architecture because both global and local features play important roles in ASR. However, the existing multi-scale methods neglect the effective interactions among different scales and various spatial locations when fusing global and local features, leading to a limited ability to deal with the challenges of large-scale variation and complex backgrounds in aerial scene images. In addition, existing methods may suffer from poor generalisation due to millions of to-be-learnt parameters and inconsistent predictions between global and local features. To tackle these problems, this study proposes a scale-wise interaction fusion and knowledge distillation (SIF-KD) network for learning robust and discriminative features with scale-invariant and background-independent information. The main highlights of this study include two aspects. On the one hand, a global-local feature collaborative learning scheme is devised for extracting scale-invariant features so as to tackle the large-scale variation problem in aerial scene images. Specifically, a plug-and-play multi-scale context attention fusion module is proposed for collaboratively fusing the context information between global and local features. On the other hand, a scale-wise knowledge distillation scheme is proposed to produce more consistent predictions by distilling the predictive distribution between different scales during training. Comprehensive experimental results show that the proposed SIF-KD network achieves the best overall accuracy, with 99.68%, 98.74% and 95.47% on the UCM, AID and NWPU-RESISC45 datasets, respectively, compared with state-of-the-art methods.
National Natural Science Foundation of China. Grant Numbers: 62201452, 2271296, 62201453; Natural Science Basic Research Program of Shaanxi. Grant Number: 2022JQ-592; Key Research and Development Program of Shaanxi Province. Grant Number: 2021JC-47; Shaanxi Provincial Education Department. Grant Number: 22JK0568
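
The idea of "distilling the predictive distribution between different scales" mentioned in the abstract above can be sketched with a generic temperature-softened KL-divergence loss. This is a minimal sketch under stated assumptions, not the SIF-KD formulation: the abstract does not give the loss, the choice of the global branch as the reference distribution is an assumption, and the names `scale_distillation_loss`, `global_logits` and `local_logits` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def scale_distillation_loss(global_logits, local_logits, temperature=2.0):
    """Generic distillation loss between two scales (assumed form).

    Softens both predictions with the same temperature and measures how
    far the local-scale distribution is from the global-scale one. The
    T^2 factor is the usual rescaling used with softened targets.
    """
    p = softmax(global_logits, temperature)
    q = softmax(local_logits, temperature)
    return (temperature ** 2) * kl_divergence(p, q)


# Identical predictions give zero loss; disagreeing ones give a positive loss.
same = scale_distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = scale_distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

Minimising such a term during training pushes the two branches toward consistent predictions, which is the effect the abstract attributes to its scale-wise distillation scheme.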

Brunel University Research Archive

A Transformer and Visual Foundation Model-Based Method for Cross-View Remote Sensing Image Retrieval

Author: C. Yin
J. Luo
Q. Ye
Publication venue: Copernicus Publications
Publication date: 01/05/2024

Retrieving UAV images that lack POS information with georeferenced satellite orthoimagery is challenging due to the differences in viewing angles. Most existing methods rely on deep neural networks with a large number of parameters, leading to substantial time and financial investments in network training. Consequently, these methods may not be well-suited for downstream tasks that have high timeliness requirements. In this work, we propose a cross-view remote sensing image retrieval method based on a transformer and a visual foundation model. We investigated the potential of the visual foundation model for extracting common features from cross-view images. Training is only conducted on a small, self-designed retrieval head, alleviating the burden of network training. Specifically, we designed a CVV module to optimize the features extracted from the visual foundation model, making these features better suited to cross-view image retrieval tasks. We also designed an MLP head to achieve similarity discrimination. The method is verified on a publicly available dataset containing multiple scenes. Our method shows excellent results in terms of both efficiency and accuracy on 15 sub-datasets (10 or 50 scene categories) derived from the public dataset, which holds practical value in engineering applications with streamlined scene categories and constrained computational resources. Furthermore, we initiated a comprehensive discussion and conducted ablation experiments on the network design to validate its efficacy. Additionally, we analyzed the presence of overfitting within the network and deliberated on the limitations of our study, proposing potential avenues for future enhancements.

Directory of Open Access Journals

Copernicus Publications

Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks

Author: Phan Vu Minh Hieu
Publication venue: School of Electrical, Computer and Telecommunications Engineering
Publication date: 01/01/2023

Over the past few years, deep learning (DL) has been achieving state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations. Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with many billions of parameters, which have a large model size and a slow inference speed. This restricts the application of DNNs in resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting. When incrementally learning new tasks, the model performance on old tasks significantly drops. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which the model operates are always evolving, a robust neural network needs to have this continual learning ability for adapting to new changes.

Research Online