Bridging Sensor Gaps via Single-Direction Tuning for Hyperspectral Image Classification
Recently, some researchers started exploring the use of ViTs in tackling HSI
classification and achieved remarkable results. However, the training of ViT
models requires a considerable number of training samples, while hyperspectral
data, due to its high annotation costs, typically has a relatively small number
of training samples. This contradiction has not been effectively addressed. In
this paper, aiming to solve this problem, we propose the single-direction
tuning (SDT) strategy, which serves as a bridge, allowing us to leverage
existing labeled HSI datasets, and even RGB datasets, to enhance the performance on
new HSI datasets with limited samples. The proposed SDT inherits the idea of
prompt tuning, aiming to reuse pre-trained models with minimal modifications
for adaptation to new tasks. But unlike prompt tuning, SDT is custom-designed
to accommodate the characteristics of HSIs. The proposed SDT utilizes a
parallel architecture, an asynchronous cold-hot gradient update strategy, and
unidirectional interaction. It aims to fully harness the potent representation
learning capabilities derived from training on heterologous, even cross-modal
datasets. In addition, we introduce a novel Triplet-structured transformer
(Tri-Former), where spectral attention and spatial attention modules are merged
in parallel to construct the token mixing component, reducing computation
cost, and a 3D convolution-based channel mixer module is integrated to enhance
stability and preserve structural information. Comparison experiments conducted on
three representative HSI datasets captured by different sensors demonstrate that
the proposed Tri-Former achieves better performance than several
state-of-the-art methods. Homologous, heterologous, and cross-modal tuning
experiments verify the effectiveness of the proposed SDT.
Unlocking the capabilities of explainable few-shot learning in remote sensing
Recent advancements have significantly improved the efficiency and
effectiveness of deep learning methods for image-based remote sensing tasks.
However, the requirement for large amounts of labeled data can limit the
applicability of deep neural networks to existing remote sensing datasets. To
overcome this challenge, few-shot learning has emerged as a valuable approach
for enabling learning with limited data. While previous research has evaluated
the effectiveness of few-shot learning methods on satellite-based datasets,
little attention has been paid to exploring the applications of these methods
to datasets obtained from UAVs, which are increasingly used in remote sensing
studies. In this review, we provide an up-to-date overview of both existing and
newly proposed few-shot classification techniques, along with appropriate
datasets that are used for both satellite-based and UAV-based data. Our
systematic approach demonstrates that few-shot learning can effectively adapt to
the broader and more diverse perspectives that UAV-based platforms can provide.
We also evaluate some SOTA few-shot approaches on a UAV disaster scene
classification dataset, yielding promising results. We emphasize the importance
of integrating XAI techniques like attention maps and prototype analysis to
increase the transparency, accountability, and trustworthiness of few-shot
models for remote sensing. Key challenges and future research directions are
identified, including tailored few-shot methods for UAVs, extending to unseen
tasks like segmentation, and developing optimized XAI techniques suited for
few-shot remote sensing problems. This review aims to provide researchers and
practitioners with an improved understanding of few-shot learning's capabilities
and limitations in remote sensing, while highlighting open problems to guide
future progress in efficient, reliable, and interpretable few-shot methods.
Comment: Under review; once the paper is accepted, the copyright will be
transferred to the corresponding journal.
Deep Unsupervised Embedding for Remotely Sensed Images Based on Spatially Augmented Momentum Contrast
Convolutional neural networks (CNNs) have achieved great success when characterizing remote sensing (RS) images. However, the lack of sufficient annotated data (together with the high complexity of the RS image domain) often limits supervised and transfer learning schemes from an operational perspective. Although unsupervised methods can potentially relieve these limitations, they are frequently unable to effectively exploit relevant prior knowledge about the RS domain, which may eventually constrain their final performance. In order to address these challenges, this article presents a new unsupervised deep metric learning model, called spatially augmented momentum contrast (SauMoCo), which has been specially designed to characterize unlabeled RS scenes. Based on the first law of geography, the proposed approach defines spatial augmentation criteria to uncover semantic relationships among land cover tiles. Then, a queue of deep embeddings is constructed to enhance the semantic variety of RS tiles within the considered contrastive learning process, where an auxiliary CNN model serves as an updating mechanism. Our experimental comparison, including different state-of-the-art techniques and benchmark RS image archives, reveals that the proposed approach obtains remarkable performance gains when characterizing unlabeled scenes since it is able to substantially enhance the discrimination ability among complex land cover categories. The source code of this article will be made available to the RS community for reproducible research.
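The abstract above names three ingredients: a contrastive objective, a queue of embeddings, and an auxiliary network that acts as the updating mechanism. The following is a minimal sketch of how such pieces usually fit together in momentum-contrast-style training, not the authors' implementation. The function names, the momentum value `m`, the `temperature`, and the plain-list vectors are all assumptions chosen for illustration; treating the positive as the embedding of a spatially neighboring tile is likewise an assumed reading of the "spatial augmentation" idea.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move the auxiliary (key) encoder slowly toward the query encoder.

    Both arguments are flat lists of floats standing in for network weights;
    m is an assumed momentum coefficient.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def new_queue(max_size):
    """FIFO queue of past key embeddings; the oldest entries drop out first."""
    return deque(maxlen=max_size)


def info_nce(query, positive, negatives, temperature=0.07):
    """Contrastive loss: the query should match the positive, not the negatives.

    Vectors are assumed to be L2-normalized lists of floats. The positive
    would be the embedding of a nearby tile (assumption), and the negatives
    would come from the queue.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    queue = new_queue(max_size=3)
    for emb in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
        queue.append(emb)
    loss = info_nce([1.0, 0.0], [1.0, 0.0], list(queue))
    print("contrastive loss:", loss)
    print("updated key weights:", momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9))
```

The queue decouples the number of negatives from the batch size, and the slow momentum update keeps the queued embeddings roughly consistent with the current key encoder.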
Image Quality Is Not All You Want: Task-Driven Lens Design for Image Classification
In computer vision, it has long been taken for granted that high-quality
images obtained through well-designed camera lenses would lead to superior
results. However, we find that this common perception is not a
"one-size-fits-all" solution for diverse computer vision tasks. We demonstrate
that task-driven and deep-learned simple optics can actually deliver better
visual task performance. The Task-Driven lens design approach, which relies
solely on a well-trained network model for supervision, is proven to be capable
of designing lenses from scratch. Experimental results demonstrate the designed
image classification lens ("TaskLens") exhibits higher accuracy than
conventional imaging-driven lenses, even with fewer lens elements. Furthermore,
we show that our TaskLens is compatible with various network models while
maintaining enhanced classification accuracy. We propose that TaskLens holds
significant potential, particularly when physical dimensions and cost are
severely constrained.
Comment: Use an image classification network to supervise the lens design from
scratch. The final designs can achieve higher accuracy with fewer optical
elements.
Attention mechanism in deep neural networks for computer vision tasks
“The attention mechanism, which is one of the most important algorithms in the deep learning community, was initially designed in natural language processing for enhancing the feature representation of key sentence fragments over the context. In recent years, the attention mechanism has been widely adopted in solving computer vision tasks by guiding deep neural networks (DNNs) to focus on specific image features for better understanding the semantic information of the image. However, the attention mechanism is not only capable of helping DNNs understand semantics, but also useful for feature fusion, visual cue discovery, and temporal information selection, which are seldom researched. In this study, we take the classic attention mechanism a step further by proposing the Semantic Attention Guidance Unit (SAGU) for multi-level feature fusion to tackle the challenging Biomedical Image Segmentation task. Furthermore, we propose a novel framework that consists of (1) Semantic Attention Unit (SAU), which is an advanced version of SAGU for adaptively bringing high-level semantics to mid-level features, (2) Two-level Spatial Attention Module (TSPAM) for discovering multiple visual cues within the image, and (3) Temporal Attention Module (TAM) for temporal information selection to solve the Video-based Person Re-identification task. To validate our newly proposed attention mechanisms, extensive experiments are conducted on challenging datasets. Our methods obtain competitive performance and outperform state-of-the-art methods. Selected publications are also presented in the Appendix”--Abstract, page iii
Deep learning methods for modelling forest biomass and structures from hyperspectral imagery
Forests affect the environment and ecosystems in multiple ways. Hence, understanding forest processes and vegetation characteristics helps us protect the environment better, preserve biodiversity, and mitigate the hazardous impacts of climate change. There are studies in hyperspectral remote sensing that employ both empirical and artificial intelligence (AI) methods to analyze and predict vegetation parameters. However, these methods have weaknesses. First, the empirical methods are inefficient because they cannot fully utilize a large amount of hyperspectral data. Second, even though the existing AI-based methods can achieve remarkable results, they are only validated on small-scale datasets that have simple forest structures. Thus, a robust technique that can effectively model complex forest structures on large-scale datasets remains an open challenge.
This thesis directly addresses the challenge by proposing a novel deep learning architecture that can jointly learn and model four discrete and twelve continuous forest parameters. The final model is comprised of three 3D convolution layers, a 3D multi-scale convolution block, a shared fully-connected layer, and two fully-connected layers for each learning task. The model uses focal loss to address the class imbalance problem and gradient normalization for multi-task learning.
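For readers unfamiliar with focal loss, the sketch below shows its commonly used form for a single multi-class example: the cross-entropy term is scaled by (1 - p_t)^gamma, so well-classified examples contribute less and rare, hard classes dominate the gradient. The abstract gives no hyperparameters, so `gamma=2.0` and `alpha=1.0` are assumed defaults, and the function names are illustrative rather than taken from the thesis.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one example with an integer class label."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are assumed values; gamma=0 recovers alpha-scaled
    cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    print("easy example, CE:", cross_entropy(logits, 0))
    print("easy example, FL:", focal_loss(logits, 0))
    print("hard example, FL:", focal_loss(logits, 2))
```

With `gamma` set to 0 the modulating factor disappears, which makes it easy to compare against a plain cross-entropy baseline.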
Then, we record and compare the results of our comprehensive experiments. Overall, the proposed model reaches 78.32% class-balanced accuracy for the four classification tasks. For the regression tasks, the model achieves a low average mean absolute error (0.052) and a high Pearson correlation coefficient (0.9) between predicted and target labels. Finally, the shortcomings of the thesis work are discussed and potential research areas for future work are suggested.
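The two regression metrics quoted above can be computed as in the following sketch. It uses the textbook definitions of mean absolute error and the Pearson correlation coefficient; the function names and the toy numbers are illustrative assumptions, not values from the thesis.

```python
import math


def mean_absolute_error(predicted, target):
    """Average of |prediction - target| over all samples."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def pearson_correlation(predicted, target):
    """Covariance of the two series divided by the product of their spreads."""
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (spread_p * spread_t)


if __name__ == "__main__":
    # Toy values for illustration only.
    predicted = [0.10, 0.42, 0.58, 0.91]
    target = [0.12, 0.40, 0.65, 0.88]
    print("MAE:", mean_absolute_error(predicted, target))
    print("Pearson r:", pearson_correlation(predicted, target))
```

The two numbers are complementary: MAE measures the size of the errors in the label's own units, while Pearson's r only measures how linearly the predictions track the targets and would stay high even under a constant offset.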
Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks
Over the past few years, deep learning (DL) has achieved state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations.
Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with billions of parameters, which results in a large model size and slow inference speed. This restricts the application of DNNs in resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting. When incrementally learning new tasks, the model performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which the model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.