21 research outputs found
Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery
Fine-grained object recognition, which aims to identify the type of an object
among a large number of subcategories, is an emerging application as
increasing image resolution exposes new details in image data. Traditional
fully supervised algorithms fail on this problem, where the classes of
interest exhibit low between-class variance, high within-class variance, and
small sample sizes. We study an even more extreme scenario named
zero-shot learning (ZSL) in which no training example exists for some of the
classes. ZSL aims to build a recognition model for new unseen categories by
relating them to seen classes that were previously learned. We establish this
relation by learning a compatibility function between image features extracted
via a convolutional neural network and auxiliary information that describes the
semantics of the classes of interest by using training samples from the seen
classes. Then, we show how knowledge transfer can be performed for the unseen
classes by maximizing this function during inference. We introduce a new data
set that contains 40 different types of street trees in 1-ft spatial resolution
aerial data, and evaluate the performance of this model with manually annotated
attributes, a natural language model, and a scientific taxonomy as auxiliary
information. The experiments show that the proposed model achieves 14.3%
recognition accuracy for the classes with no training examples, which is
significantly better than a random guess accuracy of 6.3% for 16 test classes,
and three other ZSL algorithms.
Comment: G. Sumbul, R. G. Cinbis, S. Aksoy, "Fine-Grained Object Recognition
and Zero-Shot Learning in Remote Sensing Imagery", IEEE Transactions on
Geoscience and Remote Sensing (TGRS), in press.
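The compatibility-function idea above can be sketched as a bilinear score between image features and class auxiliary embeddings, maximized over unseen classes at inference; all function and variable names below are illustrative, and the paper's actual training objective is not reproduced here.

```python
import numpy as np

def compatibility(img_feat, class_embs, W):
    """Bilinear compatibility f(x, c) = x^T W a_c between one image
    feature vector x and the auxiliary embedding a_c of every class."""
    return img_feat @ W @ class_embs.T  # shape: (num_classes,)

def predict_unseen(img_feat, unseen_class_embs, W):
    """Zero-shot inference: pick the unseen class whose auxiliary
    embedding maximizes the learned compatibility function."""
    scores = compatibility(img_feat, unseen_class_embs, W)
    return int(np.argmax(scores))
```

During training, W would be fit on seen-class samples only; at test time the same W transfers knowledge to classes with no training examples.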
A Deep Multi-Attention Driven Approach for Multi-Label Remote Sensing Image Classification
Deep learning (DL) based methods have become popular in the framework of remote sensing (RS) image scene classification. Most of the existing DL based methods assume that training images are annotated by single labels; however, RS images typically contain multiple classes and thus can simultaneously be associated with multi-labels. Despite the success of existing methods in describing the information content of very high resolution aerial images with RGB bands, any direct adaptation to high-dimensional, high-spatial-resolution RS images falls short of accurately modeling the spectral and spatial information content. To address this problem, this paper presents a novel approach in the framework of the multi-label classification of high-dimensional RS images. The proposed approach is based on three main steps. The first step describes the complex spatial and spectral content of image local areas by a novel K-Branch CNN that includes spatial resolution specific CNN branches. The second step initially characterizes the importance scores of different local areas of each image and then defines a global descriptor for each image based on these scores. This is achieved by a novel multi-attention strategy that utilizes bidirectional long short-term memory networks. The final step achieves the classification of RS image scenes with multi-labels. Experiments carried out on BigEarthNet (which is a large-scale Sentinel-2 benchmark archive) show the effectiveness of the proposed approach in terms of multi-label classification accuracy compared to the state-of-the-art approaches. The code of the proposed approach is publicly available at https://gitlab.tubit.tu-berlin.de/rsim/MAML-RSIC.
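The second step above (scoring local areas and pooling them into a global descriptor) can be illustrated with a minimal attention-style pooling sketch; a plain linear scorer stands in for the paper's bidirectional LSTM, and all names are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def global_descriptor(local_descs, score_w):
    """Attention-style pooling: score each local-area descriptor,
    normalize the scores to importance weights, and form a weighted-sum
    global descriptor for the whole image. (The paper derives the scores
    with a bidirectional LSTM; a linear scorer stands in for it here.)"""
    scores = softmax(local_descs @ score_w)  # (num_areas,) importance weights
    return scores @ local_descs              # (feat_dim,) global descriptor
```

The global descriptor then feeds the final multi-label classification step.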
Federated Learning Across Decentralized and Unshared Archives for Remote Sensing Image Classification
Federated learning (FL) enables the collaboration of multiple deep learning
models to learn from decentralized data archives (i.e., clients) without
accessing data on clients. Although FL offers ample opportunities in knowledge
discovery from distributed image archives, it is seldom considered in remote
sensing (RS). In this paper, for the first time in RS, we present a comparative
study of state-of-the-art FL algorithms. To this end, we initially provide a
systematic review of the FL algorithms presented in the computer vision
community for image classification problems, and select several
state-of-the-art FL algorithms based on their effectiveness with respect to
training data heterogeneity across clients (known as non-IID data). After
presenting an extensive overview of the selected algorithms, a theoretical
comparison of the algorithms is conducted based on their: 1) local training
complexity; 2) aggregation complexity; 3) learning efficiency; 4) communication
cost; and 5) scalability in terms of the number of clients. As the
classification task, we consider the multi-label classification (MLC) problem,
since RS images typically contain multiple classes and thus can simultaneously
be associated with multi-labels. After the theoretical comparison, experimental
analyses are presented to compare them under different decentralization
scenarios in terms of MLC performance. Based on our comprehensive analyses, we
finally derive a guideline for selecting suitable FL algorithms in RS. The code
of this work will be publicly available at https://git.tu-berlin.de/rsim/FL-RS.
Comment: Submitted to the IEEE Geoscience and Remote Sensing Magazine.
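As background for the FL algorithms compared above, the core server-side step most of them share is iterative model averaging. A minimal FedAvg-style aggregation round might look like the following sketch (illustrative, not any specific algorithm from the study):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """One aggregation round of iterative model averaging (FedAvg-style):
    the server averages the clients' model parameters, weighted by local
    dataset size, without ever accessing the clients' data.

    client_weights: list of per-client parameter lists (NumPy arrays).
    client_sizes:   number of local training samples per client."""
    total = sum(client_sizes)
    avg = [np.zeros_like(w) for w in client_weights[0]]
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            avg[i] += (n / total) * w
    return avg
```

Non-IID data across clients makes plain averaging degrade, which is precisely what the more advanced algorithms surveyed above try to mitigate.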
Learning Across Decentralized Multi-Modal Remote Sensing Archives with Federated Learning
The development of federated learning (FL) methods, which aim to learn from
distributed databases (i.e., clients) without accessing data on clients, has
recently attracted great attention. Most of these methods assume that the
clients are associated with the same data modality. However, remote sensing
(RS) images in different clients can be associated with different data
modalities that can improve the classification performance when jointly used.
To address this problem, in this paper we introduce a novel multi-modal FL
framework that aims to learn from decentralized multi-modal RS image archives
for RS image classification problems. The proposed framework is made up of
three modules: 1) multi-modal fusion (MF); 2) feature whitening (FW); and 3)
mutual information maximization (MIM). The MF module performs iterative model
averaging to learn without accessing data on clients in the case that clients
are associated with different data modalities. The FW module aligns the
representations learned among the different clients. The MIM module maximizes
the similarity of images from different modalities. Experimental results show
the effectiveness of the proposed framework compared to iterative model
averaging, which is a widely used algorithm in FL. The code of the proposed
framework is publicly available at https://git.tu-berlin.de/rsim/MM-FL.
Comment: Accepted at IEEE International Geoscience and Remote Sensing
Symposium (IGARSS) 2023.
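The FW module's goal of aligning representations across clients can be illustrated with classical batch whitening, which transforms a feature batch to zero mean and (approximately) identity covariance; this is a generic sketch under that interpretation, not the paper's exact formulation.

```python
import numpy as np

def whiten(feats, eps=1e-5):
    """Feature whitening: center a batch of feature vectors and rotate/
    rescale it so its covariance is (approximately) the identity.
    Applying the same normalization on every client is one simple way to
    align representation statistics across clients (illustrative only)."""
    x = feats - feats.mean(axis=0)
    cov = (x.T @ x) / (len(x) - 1)
    # ZCA-style inverse square root of the covariance matrix.
    vals, vecs = np.linalg.eigh(cov)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return x @ w
```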
A Comparative Study of Deep Learning Loss Functions for Multi-Label Remote Sensing Image Classification
This paper analyzes and compares different deep learning loss functions in
the framework of multi-label remote sensing (RS) image scene classification
problems. We consider seven loss functions: 1) cross-entropy loss; 2) focal
loss; 3) weighted cross-entropy loss; 4) Hamming loss; 5) Huber loss; 6)
ranking loss; and 7) sparseMax loss. All the considered loss functions are
analyzed for the first time in RS. After a theoretical analysis, an
experimental analysis is carried out to compare the considered loss functions
in terms of their: 1) overall accuracy; 2) class imbalance awareness (i.e.,
behavior when the number of samples associated with each class significantly
varies); 3) convexity and differentiability; and 4) learning efficiency (i.e.,
convergence speed). On the basis of our analysis, some guidelines are derived
for a proper selection of a loss function in multi-label RS scene
classification problems.
Comment: Accepted at IEEE International Geoscience and Remote Sensing
Symposium (IGARSS) 2020. For code visit:
https://gitlab.tubit.tu-berlin.de/rsim/RS-MLC-Losses
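Two of the losses compared above, cross-entropy and focal loss, differ only in a modulating factor that down-weights well-classified labels, which is one reason focal loss is often preferred under class imbalance. A per-label sketch for the multi-label case (illustrative implementation, not the paper's code):

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Per-label binary cross-entropy for multi-label targets, where p is
    the predicted probability and y is the binary label per class."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Focal loss: scales cross-entropy by (1 - p_t)^gamma, shrinking the
    contribution of easy labels so rare classes dominate the gradient."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)  # probability of the true label
    return -((1 - p_t) ** gamma) * np.log(p_t)
```

With gamma = 0 the focal loss reduces exactly to cross-entropy, which makes the comparison between the two a one-parameter ablation.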
Deep Active Learning for Multi-Label Classification of Remote Sensing Images
In this letter, we introduce deep active learning (AL) for multi-label
classification (MLC) problems in remote sensing (RS). In particular, we
investigate the effectiveness of several AL query functions for MLC of RS
images. Unlike the existing AL query functions (which are defined for
single-label classification or semantic segmentation problems), each query
function in this paper is based on the evaluation of two criteria: i)
multi-label uncertainty; and ii) multi-label diversity. The multi-label
uncertainty criterion is associated with the confidence of the deep neural
networks (DNNs) in correctly assigning multi-labels to each image. To assess
this criterion, we investigate three strategies: i) learning multi-label loss
ordering; ii) measuring temporal discrepancy of multi-label predictions; and
iii) measuring magnitude of approximated gradient embeddings. The multi-label
diversity criterion is associated with the selection of a set of images that
are as diverse as possible from each other, which prevents redundancy among them. To
assess this criterion, we exploit a clustering-based strategy. We combine each
of the above-mentioned uncertainty strategies with the clustering-based
diversity strategy, resulting in three different query functions. All the
considered query functions are introduced for the first time in the framework
of MLC problems in RS. Experimental results obtained on two benchmark archives
show that these query functions result in the selection of a highly informative
set of samples at each iteration of the AL process.
Comment: Accepted to IEEE Geoscience and Remote Sensing Letters.
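The two criteria above can be combined in a toy query function: rank unlabeled samples by an uncertainty score, then enforce diversity among the most uncertain candidates. The greedy farthest-point step below is a simple stand-in for the paper's clustering-based diversity strategy; all names are illustrative.

```python
import numpy as np

def query_samples(features, uncertainty, num_query):
    """Toy AL query: restrict to a pool of the most uncertain samples,
    then greedily pick samples that are far apart in feature space so the
    queried batch is both uncertain and diverse."""
    # Uncertainty step: keep an oversized pool of the most uncertain samples.
    candidates = np.argsort(-uncertainty)[: num_query * 3]
    chosen = [candidates[0]]
    # Diversity step: farthest-point selection within the pool.
    while len(chosen) < num_query:
        dists = [min(np.linalg.norm(features[c] - features[s])
                     for s in chosen) for c in candidates]
        chosen.append(candidates[int(np.argmax(dists))])
    return chosen
```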
Bigearthnet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding
© 2019 IEEE. This paper presents BigEarthNet, a new large-scale multi-label Sentinel-2 benchmark archive. BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of: i) 120 × 120 pixels for 10 m bands; ii) 60 × 60 pixels for 20 m bands; and iii) 20 × 20 pixels for 60 m bands. Unlike most of the existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much more suitable as a training source in the context of deep learning. This paper first addresses the limitations of the existing archives and then describes the properties of BigEarthNet. Experimental results obtained in the framework of RS image scene classification problems show that a shallow Convolutional Neural Network (CNN) architecture trained on BigEarthNet provides much higher accuracy compared to a state-of-the-art CNN model pre-trained on ImageNet (which is a very popular large-scale benchmark archive in computer vision). BigEarthNet opens up promising directions to advance operational RS applications and research in massive Sentinel-2 image archives.
Informative and Representative Triplet Selection for Multilabel Remote Sensing Image Retrieval
Learning the similarity between remote sensing (RS) images forms the foundation for content-based RS image retrieval (CBIR). Recently, deep metric learning approaches that map the semantic similarity of images into an embedding (metric) space have become very popular in RS. A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar (negative) images relative to a reference image called an anchor. Choosing triplets is a difficult task, particularly for multi-label RS CBIR, where each training image is annotated by multiple class labels. To address this problem, in this article we propose a novel triplet sampling method in the framework of deep neural networks (DNNs) defined for multi-label RS CBIR problems. The proposed method selects a small set of the most representative and informative triplets based on two main steps. In the first step, a set of anchors that are diverse from each other in the embedding space is selected from the current mini-batch using an iterative algorithm. In the second step, different sets of positive and negative images are chosen for each anchor by evaluating the relevancy, hardness, and diversity of the images among each other based on a novel strategy. Experimental results obtained on two multi-label benchmark archives show that the selection of the most informative and representative triplets in the context of DNNs results in: 1) a reduction in the computational complexity of the training phase of the DNNs without any significant loss in performance; and 2) an increase in learning speed, since informative triplets allow fast convergence. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/image-retrieval-from-triplets.
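As background for the triplet selection above: a triplet contributes a nonzero gradient only when it violates the margin of the standard triplet loss, so skipping such "easy" triplets is one intuition behind selecting a small informative subset. A minimal sketch (illustrative, not the paper's sampling method):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss in the embedding space: pull the
    positive toward the anchor, push the negative at least `margin`
    farther away than the positive."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def is_informative(anchor, positive, negative, margin=0.2):
    """A triplet is informative (yields a nonzero loss and gradient) only
    when it violates the margin; easy triplets can be skipped, which is
    one reason careful triplet selection reduces training cost."""
    return triplet_loss(anchor, positive, negative, margin) > 0.0
```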