A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery
Remote sensing data has been widely used for various Earth Observation (EO) missions such as land use and cover classification, weather forecasting, agricultural management, and environmental monitoring. Most existing remote sensing models are based on supervised learning, which requires large and representative human-labelled datasets for model training; collecting such data is costly and time-consuming. Recently, self-supervised learning (SSL) has enabled models to learn representations from orders of magnitude more unlabelled data. Such representations have been proven to boost the performance of downstream tasks and hold potential for remote sensing applications. The success of SSL depends heavily on a pre-designed pretext task, which introduces an inductive bias into the model from a large amount of unlabelled data. Since remote sensing imagery has rich spectral information beyond the standard RGB colour space, the pretext tasks established in computer vision for RGB images may not extend straightforwardly to the multi/hyperspectral domain. To address this challenge, this work designs a novel SSL framework that learns representations from both the spectral and spatial information of unlabelled data. The framework contains two novel pretext tasks, one for object-based and one for pixel-based remote sensing data analysis. Evaluation on two typical downstream tasks (a multi-label land cover classification task on Sentinel-2 multispectral datasets and a ground soil parameter retrieval task on hyperspectral datasets) demonstrates that the representation obtained through the proposed SSL achieves a significant improvement in model performance.
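To make the pixel-based pretext idea concrete, here is a minimal sketch, assuming a PyTorch implementation, of masked spectral reconstruction: random bands of each pixel's spectral profile are hidden and a small network is trained to predict them. The module names, layer sizes, and masking scheme are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SpectralMaskedAutoencoder(nn.Module):
    """Toy pixel-level SSL pretext: reconstruct masked spectral bands."""
    def __init__(self, n_bands: int = 12, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_bands, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.decoder = nn.Linear(hidden, n_bands)

    def forward(self, x: torch.Tensor, mask_ratio: float = 0.5):
        # x: (batch, n_bands) spectral profiles of individual pixels
        mask = (torch.rand_like(x) < mask_ratio).float()
        z = self.encoder(x * (1.0 - mask))   # encode only the visible bands
        recon = self.decoder(z)              # predict the full profile
        # loss is computed only on the masked bands
        loss = ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
        return loss, z

model = SpectralMaskedAutoencoder(n_bands=12)
pixels = torch.rand(256, 12)                 # unlabelled Sentinel-2-like pixels
loss, representation = model(pixels)
loss.backward()
```

After pre-training, the encoder output `z` would serve as the pixel representation passed to a downstream head such as a soil-parameter regressor.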
MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification
Vision Transformer (ViT) models have recently emerged as powerful and versatile models for various visual tasks. A recent work called PMF achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models. However, PMF employs full fine-tuning for learning the downstream tasks, leading to significant overfitting and storage issues, especially in the remote sensing domain. To tackle these issues, we turn to recently proposed parameter-efficient tuning methods, such as VPT, which update only the newly added prompt parameters while keeping the pre-trained backbone frozen. Inspired by VPT, we propose the Meta Visual Prompt Tuning (MVP) method. Specifically, we integrate the VPT method into the meta-learning framework and tailor it to the remote sensing domain, resulting in an efficient framework for Few-Shot Remote Sensing Scene Classification (FS-RSSC). Furthermore, we introduce a novel data augmentation strategy based on patch embedding recombination to enhance the representation and diversity of scenes for classification. Experimental results on the FS-RSSC benchmark demonstrate the superior performance of the proposed MVP over existing methods in various settings, such as various-way-various-shot, various-way-one-shot, and cross-domain adaptation.
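The core mechanism MVP inherits from VPT is easy to sketch: learnable prompt tokens are prepended to the patch embeddings while the pre-trained transformer stays frozen, so only the prompts and a small head are trained. The sketch below, assuming PyTorch, uses a generic transformer encoder as a stand-in backbone; all names, dimensions, and the pooling choice are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    """VPT-style shallow prompt tuning over a frozen token backbone."""
    def __init__(self, backbone: nn.Module, embed_dim: int = 768,
                 n_prompts: int = 10, n_classes: int = 45):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # freeze pre-trained weights
            p.requires_grad = False
        # the only new trainable parameters: prompt tokens and a linear head
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (batch, n_patches, embed_dim) from a frozen patch embed
        b = patch_tokens.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        feats = self.backbone(tokens)         # frozen transformer blocks
        return self.head(feats.mean(dim=1))   # pooled features -> classifier

# stand-in backbone for illustration only
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2)
model = PromptedViT(backbone)
logits = model(torch.randn(4, 196, 768))      # stand-in patch embeddings
```

In a meta-learning setting such as MVP, the prompt parameters would be the quantities adapted per episode, which is what keeps the per-task storage cost small.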
Towards Out-of-Distribution Detection for Remote Sensing
In remote sensing, distributional mismatch between the training and test data may arise for several reasons, including unseen classes in the test data, differences in the geographic area, and multi-sensor differences. Deep learning-based models may behave in unexpected ways when subjected to test data that has such distributional shifts from the training data, also called out-of-distribution (OOD) examples. Vulnerability to OOD data severely reduces the reliability of deep learning-based models. In this work, we address this issue by proposing a model to quantify the distributional uncertainty of deep learning-based remote sensing models. In particular, we adopt a Dirichlet Prior Network for remote sensing data. The approach seeks to maximize the representation gap between the in-domain and OOD examples for a better identification of unknown examples at test time. Experimental results on three exemplary test scenarios show that the proposed model can detect OOD images in remote sensing.
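A minimal sketch of how a Dirichlet Prior Network separates distributional from data uncertainty: the network's logits parameterize Dirichlet concentrations, and the mutual information between the label and the categorical parameters flags likely OOD inputs. The threshold and shapes below are illustrative assumptions, not the paper's exact criterion.

```python
import torch

def dirichlet_uncertainty(logits: torch.Tensor):
    # logits: (batch, n_classes); concentrations must be positive
    alpha = torch.exp(logits)
    alpha0 = alpha.sum(dim=1, keepdim=True)        # total evidence per input
    probs = alpha / alpha0                         # expected class probabilities
    # E[H[Cat(pi)]] under the Dirichlet (expected data uncertainty)
    expected_entropy = -(probs * (
        torch.digamma(alpha + 1.0) - torch.digamma(alpha0 + 1.0))).sum(dim=1)
    # H[E[pi]] (total uncertainty); their gap is distributional uncertainty
    entropy_of_expected = -(probs * probs.clamp(min=1e-8).log()).sum(dim=1)
    mutual_info = entropy_of_expected - expected_entropy
    return probs, mutual_info

probs, mi = dirichlet_uncertainty(torch.randn(8, 10))
is_ood = mi > 0.5                                  # assumed flagging threshold
```

High mutual information means the model expects very different categorical distributions to be plausible, which is the signature of an input far from the training distribution.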
Unsupervised domain adaptation semantic segmentation of high-resolution remote sensing imagery with invariant domain-level prototype memory
Semantic segmentation is a key technique in the automatic interpretation of high-resolution remote sensing (HRS) imagery and has drawn much attention in the remote sensing community. Deep convolutional neural networks (DCNNs) have been successfully applied to the HRS imagery semantic segmentation task due to their hierarchical representation ability. However, the heavy dependence on large amounts of densely annotated training data and the sensitivity to variation in data distribution severely restrict the potential application of DCNNs to the semantic segmentation of HRS imagery. This study proposes a novel unsupervised domain adaptation semantic segmentation network (MemoryAdaptNet) for the semantic segmentation of HRS imagery. MemoryAdaptNet constructs an output-space adversarial learning scheme to bridge the domain distribution discrepancy between the source and target domains and to reduce the influence of domain shift. Specifically, we embed an invariant feature memory module to store invariant domain-level context information, because the features obtained from adversarial learning tend to represent only the variant features of the current limited inputs. A category attention-driven invariant domain-level context aggregation module then integrates this stored context into the current pseudo-invariant features to further augment the pixel representations. An entropy-based pseudo-label filtering strategy is used to update the memory module with high-confidence pseudo-invariant features from the current target images. Extensive experiments under three cross-domain tasks indicate that the proposed MemoryAdaptNet is remarkably superior to state-of-the-art methods.
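The entropy-based filtering step admits a compact illustration: target-domain pixels whose predictive entropy is low contribute their features to a running per-class memory. The sketch below, in PyTorch, uses an exponential moving average as the update rule; all names, shapes, thresholds, and the EMA choice are illustrative assumptions, not MemoryAdaptNet itself.

```python
import torch

def update_memory(memory, feats, probs, threshold=0.2, momentum=0.9):
    # memory: (n_classes, dim) per-class invariant feature memory
    # feats:  (n_pixels, dim) target-domain pixel features
    # probs:  (n_pixels, n_classes) softmax outputs for the same pixels
    entropy = -(probs * probs.clamp(min=1e-8).log()).sum(dim=1)
    keep = entropy < threshold                 # high-confidence pixels only
    pseudo = probs.argmax(dim=1)               # pseudo labels
    for c in pseudo[keep].unique():
        sel = keep & (pseudo == c)
        class_mean = feats[sel].mean(dim=0)
        memory[c] = momentum * memory[c] + (1.0 - momentum) * class_mean
    return memory

memory = torch.zeros(6, 128)                   # e.g., 6 land-cover classes
feats = torch.randn(1024, 128)
probs = torch.softmax(torch.randn(1024, 6), dim=1)
memory = update_memory(memory, feats, probs)
```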
Hyperspectral image representation through alpha-trees
α-trees provide a hierarchical representation of an image as partitions into regions of increasing heterogeneity. This model, inspired by the single-linkage paradigm, has recently been revisited for grayscale images and has been successfully used in the field of remote sensing. This article shows how this representation can be adapted, according to different strategies, to more complex data, here hyperspectral images. The measure of distance between two neighbouring pixels is known to be a key element for the quality of the underlying tree, but the usual metrics are not satisfactory. We show that a relevant solution for understanding hyperspectral data relies on prior learning of the metric to be used and on the exploitation of domain knowledge.
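To show where the pixel dissimilarity enters, here is a minimal sketch of single-linkage α-partitioning on a 4-adjacent image graph: edges are merged in order of increasing dissimilarity up to a level α, yielding the α-connected flat zones at that level (the full α-tree stacks these partitions over all α). The per-pixel `dist` argument is where a learned, domain-informed hyperspectral metric would plug in; the Euclidean default is an illustrative assumption.

```python
import numpy as np

def alpha_partition(image, alpha, dist=None):
    # image: (H, W, B) hyperspectral cube; returns (H*W,) region labels
    h, w, _ = image.shape
    dist = dist or (lambda a, b: float(np.linalg.norm(a - b)))
    parent = list(range(h * w))                 # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]       # path halving
            i = parent[i]
        return i

    edges = []                                  # (dissimilarity, i, j), 4-adjacency
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((dist(image[y, x], image[y, x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((dist(image[y, x], image[y + 1, x]), i, i + w))
    for d, i, j in sorted(edges):
        if d > alpha:
            break                               # remaining edges exceed alpha
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                     # merge the two flat zones
    return np.array([find(i) for i in range(h * w)])

cube = np.random.rand(8, 8, 50)                 # toy 50-band image
labels = alpha_partition(cube, alpha=0.9)
```

Swapping the default metric for a learned one changes which edges merge first, which is exactly the lever the article exploits.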
A generic Self-Supervised Learning (SSL) framework for representation learning from spectral–spatial features of unlabeled remote sensing imagery
Remote sensing data has been widely used for various Earth Observation (EO) missions such as land use and cover classification, weather forecasting, agricultural management, and environmental monitoring. Most existing remote-sensing-data-based models are based on supervised learning that requires large and representative human-labeled data for model training, which is costly and time-consuming. The recent introduction of self-supervised learning (SSL) enables models to learn a representation from orders of magnitude more unlabeled data. The success of SSL is heavily dependent on a pre-designed pretext task, which introduces an inductive bias into the model from a large amount of unlabeled data. Since remote sensing imagery has rich spectral information beyond the standard RGB color space, it may not be straightforward to extend to the multi/hyperspectral domain the pretext tasks established in computer vision based on RGB images. To address this challenge, this work proposes a generic self-supervised learning framework based on remote sensing data at both the object and pixel levels. The method contains two novel pretext tasks, one for object-based and one for pixel-based remote sensing data analysis methods. One pretext task is used to reconstruct the spectral profile from the masked data, which can be used to extract a representation of pixel information and improve the performance of downstream tasks associated with pixel-based analysis. The second pretext task is used to identify objects from multiple views of the same object in multispectral data, which can be used to extract a representation and improve the performance of downstream tasks associated with object-based analysis. The results of two typical downstream task evaluation exercises (a multi-label land cover classification task on Sentinel-2 multispectral datasets and a ground soil parameter retrieval task on hyperspectral datasets) demonstrate that the proposed SSL method learns a target representation that covers both spatial and spectral information from massive unlabeled data. A comparison with currently available SSL methods shows that the proposed method, which emphasizes both spectral and spatial features, outperforms existing SSL methods on multi- and hyperspectral remote sensing datasets. We believe that this approach has the potential to be effective in a wider range of remote sensing applications, and we will explore its utility in more remote sensing applications in the future.
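The object-level pretext task described above, identifying the same object across multiple views, can be illustrated with an InfoNCE-style contrastive loss: two augmented views of the same multispectral patch are pulled together in embedding space while other patches in the batch act as negatives. The encoder, augmentations, and temperature below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # z1, z2: (batch, dim) embeddings of two views of the same patches
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # pairwise view similarities
    targets = torch.arange(z1.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Sequential(               # stand-in patch encoder
    torch.nn.Flatten(), torch.nn.Linear(12 * 32 * 32, 128))
patches = torch.rand(64, 12, 32, 32)         # 12-band multispectral patches
view1 = patches + 0.05 * torch.randn_like(patches)  # toy augmentations
view2 = torch.flip(patches, dims=[3])
loss = info_nce(encoder(view1), encoder(view2))
loss.backward()
```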
State-of-the-art and gaps for deep learning on limited training data in remote sensing
Deep learning usually requires big data, with respect to both volume and variety. However, most remote sensing applications have only limited training data, of which a small subset is labeled. Herein, we review three state-of-the-art approaches in deep learning to combat this challenge. The first topic is transfer learning, in which some aspects of one domain, e.g., features, are transferred to another domain. The next is unsupervised learning, e.g., autoencoders, which operate on unlabeled data. The last is generative adversarial networks, which can generate realistic-looking data that can fool both a deep learning network and a human. The aim of this article is to raise awareness of this dilemma, to direct the reader to existing work, and to highlight current gaps that need solving.
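The transfer-learning recipe surveyed above is the most common workaround in practice and fits in a few lines: reuse a pretrained backbone, freeze its features, and train only a new head on the small labeled remote sensing set. The choice of torchvision's ResNet-18 and the class count below are illustrative assumptions, not a recommendation from the article.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():              # freeze the transferred features
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new trainable head

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(16, 3, 224, 224)        # a small labeled batch
labels = torch.randint(0, 10, (16,))
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```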