634 research outputs found
Statistical and Machine Learning Models for Remote Sensing Data Mining - Recent Advancements
This book is a reprint of the Special Issue entitled "Statistical and Machine Learning Models for Remote Sensing Data Mining - Recent Advancements" that was published in Remote Sensing, MDPI. It provides insights into both core technical challenges and some selected critical applications of satellite remote sensing image analytics.
Scalable computing for earth observation - Application on Sea Ice analysis
In recent years, deep learning (DL) networks have shown considerable improvements and have become a preferred methodology in many different applications. These networks have outperformed classical techniques, particularly in large data settings. In satellite earth observation, for example, DL algorithms have demonstrated the ability to accurately learn complicated nonlinear relationships in input data, and have thereby contributed to advances in the field. However, the training process of these networks has heavy computational overheads. The reason is two-fold: the sizable complexity of these networks and the high number of training samples needed to learn all parameters comprising these architectures. Although the quantity of training data enhances the accuracy of the trained models in general, the computational cost may restrict the amount of analysis that can be done. This issue is particularly critical in satellite remote sensing, where a myriad of satellites generate an enormous amount of data daily, and acquiring in-situ ground truth for building a large training dataset is a fundamental prerequisite.
This dissertation considers various aspects of deep learning based sea ice monitoring from Synthetic Aperture Radar (SAR) data. In this application, labeling data is very costly and time-consuming. In some cases, it is not even achievable due to challenges in establishing the required domain knowledge, specifically when it comes to monitoring Arctic sea ice with SAR, which is the application domain of this thesis. Because the Arctic is remote, has long dark seasons, and has a very dynamic weather system, the collection of reliable in-situ data is very demanding. In addition to the challenges of interpreting SAR data of sea ice, this issue makes SAR-based sea ice analysis with DL networks a complicated process.
We propose novel DL methods to cope with the problems of scarce training data and address the computational cost of the training process. We analyze DL network capabilities based on self-designed architectures and learning strategies, such as transfer learning for sea ice classification. We also address the scarcity of training data by proposing a novel deep semi-supervised learning method based on SAR data for incorporating unlabeled data information into the training process. Finally, a new distributed DL method that can be used in a semi-supervised manner is proposed to address the computational complexity of deep neural network training.
A Billion-scale Foundation Model for Remote Sensing Images
As the potential of foundation models in visual tasks has garnered
significant attention, pretraining these models before downstream tasks has
become a crucial step. The three key factors in pretraining foundation models
are the pretraining method, the size of the pretraining dataset, and the number
of model parameters. Recently, research in the remote sensing field has focused
primarily on the pretraining method and the size of the dataset, with limited
emphasis on the number of model parameters. This paper addresses this gap by
examining the effect of increasing the number of model parameters on the
performance of foundation models in downstream tasks such as rotated object
detection and semantic segmentation. We pretrained foundation models with
varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to
determine whether performance in downstream tasks improved with an increase in
parameters. To the best of our knowledge, this is the first billion-scale
foundation model in the remote sensing field. Furthermore, we propose an
effective method for scaling up and fine-tuning a vision transformer in the
remote sensing field. To evaluate general performance in downstream tasks, we
employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object
detection, and the Potsdam and LoveDA datasets for semantic segmentation.
Experimental results demonstrated that, across all benchmark datasets and
downstream tasks, the performance of the foundation models and data efficiency
improved as the number of parameters increased. Moreover, our models achieve
state-of-the-art performance on several datasets including DIOR-R, Potsdam,
and LoveDA. Comment: This work has been submitted to the IEEE for possible publication
Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks
Over the past few years, deep learning (DL) has been achieving state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations.
Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with billions of parameters, which have a large model size and a slow inference speed. This restricts the application of DNNs in resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting: when incrementally learning new tasks, the model performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which the model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.
Intelligent Data Analytics using Deep Learning for Data Science
Nowadays, data science stimulates the interest of academics and practitioners because it can assist in the extraction of significant insights from massive amounts of data. From the years 2018 through 2025, the Global Datasphere is expected to rise from 33 Zettabytes to 175 Zettabytes, according to the International Data Corporation. This dissertation proposes an intelligent data analytics framework that uses deep learning to tackle several difficulties when implementing a data science application. These difficulties include dealing with high inter-class similarity, the availability and quality of hand-labeled data, and designing a feasible approach for modeling significant correlations in features gathered from various data sources. The proposed intelligent data analytics framework employs a novel strategy for improving data representation learning by incorporating supplemental data from various sources and structures. First, the research presents a multi-source fusion approach that utilizes confident learning techniques to improve the data quality from many noisy sources. Meta-learning methods based on advanced techniques such as the mixture of experts and differential evolution combine the predictive capacity of individual learners with a gating mechanism, ensuring that only the most trustworthy features or predictions are integrated to train the model. Then, a Multi-Level Convolutional Fusion is presented to train a model on the correspondence between local-global deep feature interactions to identify easily confused samples of different classes. The convolutional fusion is further enhanced with the power of Graph Transformers, aggregating the relevant neighboring features in graph-based input data structures and achieving state-of-the-art performance on a large-scale building damage dataset. 
Finally, weakly-supervised strategies, noise regularization, and label propagation are proposed to train a model on sparse input labeled data, ensuring the model's robustness to errors and supporting the automatic expansion of the training set. The suggested approaches outperformed competing strategies in effectively training a model on a large-scale dataset of 500k photos, with just about 7% of the images annotated by a human. The proposed framework's capabilities have benefited various data science applications, including fluid dynamics, geometric morphometrics, building damage classification from satellite pictures, disaster scene description, and storm-surge visualization.
A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery
Remote sensing data has been widely used for various Earth Observation (EO)
missions such as land use and cover classification, weather forecasting,
agricultural management, and environmental monitoring. Most existing remote
sensing data-based models are based on supervised learning that requires large
and representative human-labelled data for model training, which is costly and
time-consuming. Recently, self-supervised learning (SSL) has enabled models to
learn a representation from orders of magnitude more unlabelled data. This
representation has been proven to boost the performance of downstream tasks and
has potential for remote sensing applications. The success of SSL is heavily
dependent on a pre-designed pretext task, which introduces an inductive bias
into the model from a large amount of unlabelled data. Since remote sensing
imagery has rich spectral information beyond the standard RGB colour space, the
pretext tasks established in computer vision based on RGB images may not
extend straightforwardly to the multi/hyperspectral domain. To address
this challenge, this work designs a novel SSL framework that is capable of
learning representations from both the spectral and spatial information of unlabelled
data. The framework contains two novel pretext tasks for object-based and
pixel-based remote sensing data analysis methods, respectively. Evaluation on
two typical downstream tasks (a multi-label land cover classification
task on Sentinel-2 multispectral datasets and a ground soil parameter retrieval
task on hyperspectral datasets) demonstrates that the
representation obtained through the proposed SSL framework achieves a significant
improvement in model performance.
Unlocking the capabilities of explainable few-shot learning in remote sensing
Recent advancements have significantly improved the efficiency and
effectiveness of deep learning methods for image-based remote sensing tasks.
However, the requirement for large amounts of labeled data can limit the
applicability of deep neural networks to existing remote sensing datasets. To
overcome this challenge, few-shot learning has emerged as a valuable approach
for enabling learning with limited data. While previous research has evaluated
the effectiveness of few-shot learning methods on satellite-based datasets,
little attention has been paid to exploring the applications of these methods
to datasets obtained from UAVs, which are increasingly used in remote sensing
studies. In this review, we provide an up-to-date overview of both existing and
newly proposed few-shot classification techniques, along with appropriate
datasets that are used for both satellite-based and UAV-based data. Our
systematic approach demonstrates that few-shot learning can effectively adapt to
the broader and more diverse perspectives that UAV-based platforms can provide.
We also evaluate some state-of-the-art (SOTA) few-shot approaches on a UAV disaster scene
classification dataset, yielding promising results. We emphasize the importance
of integrating explainable AI (XAI) techniques like attention maps and prototype analysis to
increase the transparency, accountability, and trustworthiness of few-shot
models for remote sensing. Key challenges and future research directions are
identified, including tailored few-shot methods for UAVs, extending to unseen
tasks like segmentation, and developing optimized XAI techniques suited for
few-shot remote sensing problems. This review aims to provide researchers and
practitioners with an improved understanding of few-shot learning's capabilities
and limitations in remote sensing, while highlighting open problems to guide
future progress in efficient, reliable, and interpretable few-shot methods. Comment: Under review, once the paper is accepted, the copyright will be
transferred to the corresponding journal
Rich Feature Distillation with Feature Affinity Module for Efficient Image Dehazing
Single-image haze removal is a long-standing hurdle for computer vision
applications. Several works have focused on transferring advances from
image classification, detection, and segmentation to the niche of image
dehazing, primarily focusing on contrastive learning and knowledge
distillation. However, these approaches prove computationally expensive,
raising concern regarding their applicability to on-the-edge use-cases. This
work introduces a simple, lightweight, and efficient framework for single-image
haze removal, exploiting rich "dark-knowledge" information from a lightweight
pre-trained super-resolution model via the notion of heterogeneous knowledge
distillation. We designed a feature affinity module to maximize the flow of
rich feature semantics from the super-resolution teacher to the student
dehazing network. In order to evaluate the efficacy of our proposed framework,
its performance as a plug-and-play setup to a baseline model is examined. Our
experiments are carried out on the RESIDE-Standard dataset to demonstrate the
robustness of our framework to the synthetic and real-world domains. The
extensive qualitative and quantitative results provided establish the
effectiveness of the framework, achieving gains of up to 15% (PSNR) while
reducing the model size by 20 times. Comment: Preprint version. Accepted at Opti
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
Vision-language models, such as contrastive language-image pre-training
(CLIP), have demonstrated impressive results in natural image domains. However,
these models often struggle when applied to specialized domains like remote
sensing, and adapting to such domains is challenging due to the limited number
of image-text pairs available for training. To address this, we propose S-CLIP,
a semi-supervised learning method for training CLIP that utilizes additional
unpaired images. S-CLIP employs two pseudo-labeling strategies specifically
designed for contrastive learning and the language modality. The caption-level
pseudo-label is given by a combination of captions of paired images, obtained
by solving an optimal transport problem between unpaired and paired images. The
keyword-level pseudo-label is given by a keyword in the caption of the nearest
paired image, trained through partial label learning that assumes a candidate
set of labels for supervision instead of the exact one. By combining these
objectives, S-CLIP significantly enhances the training of CLIP using only a few
image-text pairs, as demonstrated in various specialist domains, including
remote sensing, fashion, scientific figures, and comics. For instance, S-CLIP
improves CLIP by 10% for zero-shot classification and 4% for image-text
retrieval on the remote sensing benchmark, matching the performance of
supervised CLIP while using three times fewer image-text pairs. Comment: NeurIPS 202
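
The caption-level pseudo-label described in the S-CLIP abstract, a combination of paired captions obtained from an optimal transport problem between unpaired and paired images, can be illustrated with a minimal sketch. This is not the authors' implementation: the entropic (Sinkhorn) solver, the cost of one minus cosine similarity, the uniform marginals, and the names `sinkhorn_pseudo_labels`, `eps`, and `n_iters` are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def sinkhorn_pseudo_labels(unpaired, paired, eps=0.1, n_iters=200):
    """Soft assignment of each unpaired image to the paired captions.

    Assumptions (hypothetical, not taken from the paper): cost is
    1 - cosine similarity, both marginals are uniform, and the plan is
    computed with entropy-regularized Sinkhorn iterations.
    """
    n, m = len(unpaired), len(paired)
    # Gibbs kernel K = exp(-cost / eps)
    kernel = [[math.exp(-(1.0 - cosine(u, p)) / eps) for p in paired]
              for u in unpaired]
    row_mass = 1.0 / n
    col_mass = 1.0 / m
    row_scale = [1.0] * n
    col_scale = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        row_scale = [row_mass / sum(kernel[i][j] * col_scale[j] for j in range(m))
                     for i in range(n)]
        col_scale = [col_mass / sum(kernel[i][j] * row_scale[i] for i in range(n))
                     for j in range(m)]
    plan = [[row_scale[i] * kernel[i][j] * col_scale[j] for j in range(m)]
            for i in range(n)]
    # Normalize each row so it is a distribution over paired captions.
    return [[value / sum(row) for value in row] for row in plan]

# Toy embeddings: two paired images and two unpaired images.
paired_embeddings = [[1.0, 0.0], [0.0, 1.0]]
unpaired_embeddings = [[0.9, 0.1], [0.2, 0.8]]
labels = sinkhorn_pseudo_labels(unpaired_embeddings, paired_embeddings)
```

Each row of `labels` is a soft target over the captions of the paired images, so an unpaired image close to one paired image puts most of its weight on that image's caption. In a training loop this target would stand in for the one-hot match used with paired data.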