TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images
In remote sensing, Change Detection (CD) aims to identify and localize changed
regions between dual-phase images of the same area.
Recently, it has achieved great progress with the advances of deep learning.
However, current methods generally deliver incomplete CD regions and irregular
CD boundaries due to the limited representation ability of the extracted visual
features. To alleviate these issues, in this work we propose a novel
Transformer-based learning framework named TransY-Net for remote sensing image
CD, which improves the feature extraction from a global view and combines
multi-level visual features in a pyramid manner. More specifically, the
proposed framework first utilizes the advantages of Transformers in long-range
dependency modeling. It can help to learn more discriminative global-level
features and obtain complete CD regions. Then, we introduce a novel pyramid
structure to aggregate multi-level visual features from Transformers for
feature enhancement. The pyramid structure, grafted with a Progressive Attention
Module (PAM), improves feature representation by modeling additional
inter-dependencies through spatial and channel attention. Finally, to better
train the whole framework, we adopt deeply-supervised learning with
multiple boundary-aware loss functions. Extensive experiments demonstrate that
our proposed method achieves new state-of-the-art performance on four optical
and two SAR image CD benchmarks. The source code is released at
https://github.com/Drchip61/TransYNet.
Comment: This work is accepted by TGRS 2023. It is an extension of our ACCV 2022 paper and arXiv:2210.0075.
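To illustrate the spatial and channel attention described above, here is a minimal PyTorch sketch of a CBAM-style attention module. The class name, reduction ratio, and exact wiring are assumptions for illustration only and are not taken from the released TransY-Net code.

```python
# Hypothetical sketch of a progressive attention module combining channel and
# spatial attention, in the spirit of the PAM described above. Names and
# hyperparameters are illustrative, not the paper's released implementation.
import torch
import torch.nn as nn

class ProgressiveAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: squeeze-and-excitation style gating.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 convolution over pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)                    # reweight channels
        avg_map = x.mean(dim=1, keepdim=True)           # per-pixel mean over channels
        max_map, _ = x.max(dim=1, keepdim=True)         # per-pixel max over channels
        x = x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))
        return x
```

In a pyramid arrangement such a module would typically be applied at each feature level before the levels are aggregated top-down.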
Fusion of visible and thermal images improves automated detection and classification of animals for drone surveys
Visible and thermal images acquired from drones (unoccupied aircraft systems) have substantially improved animal monitoring. Combining complementary information from both image types provides a powerful approach for automating detection and classification of multiple animal species to augment drone surveys. We compared eight image fusion methods using thermal and visible drone images combined with two supervised deep learning models, to evaluate the detection and classification of white-tailed deer (Odocoileus virginianus), domestic cow (Bos taurus), and domestic horse (Equus caballus). We classified visible and thermal images separately and compared them with the results of image fusion. Fused images provided minimal improvement for cows and horses compared to visible images alone, likely because the size, shape, and color of these species made them conspicuous against the background. For white-tailed deer, which were typically cryptic against their backgrounds and often in shadows in visible images, the added information from thermal images improved detection and classification in fusion methods from 15 to 85%. Our results suggest that image fusion is ideal for surveying animals inconspicuous from their backgrounds, and our approach uses few image pairs to train compared to typical machine-learning methods. We discuss computational and field considerations to improve drone surveys using our fusion approach.
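For readers unfamiliar with image fusion in this setting, the following is a minimal NumPy sketch of pixel-level blending of a co-registered visible/thermal pair. It is only one simple fusion scheme, shown for orientation; it is not one of the eight fusion methods evaluated in the study.

```python
# Minimal, hypothetical pixel-level fusion of a co-registered visible/thermal
# image pair: the thermal band is blended into each RGB channel with a fixed
# weight. Not the specific fusion method used in the study above.
import numpy as np

def blend_visible_thermal(visible_rgb: np.ndarray,
                          thermal: np.ndarray,
                          alpha: float = 0.5) -> np.ndarray:
    """visible_rgb: (H, W, 3) floats in [0, 1]; thermal: (H, W) floats in [0, 1]."""
    fused = (1.0 - alpha) * visible_rgb + alpha * thermal[..., None]
    return np.clip(fused, 0.0, 1.0)
```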
Three-Dimensional Medical Image Fusion with Deformable Cross-Attention
Multimodal medical image fusion plays an instrumental role in several areas
of medical image processing, particularly in disease recognition and tumor
detection. Traditional fusion methods tend to process each modality
independently before combining the features and reconstructing the fused
image. However, this approach often neglects the fundamental commonalities and
disparities between multimodal information. Furthermore, the prevailing
methodologies are largely confined to fusing two-dimensional (2D) medical image
slices, leading to a lack of contextual information in the fused images and,
consequently, a decreased information yield for physicians relative to
three-dimensional (3D) images. In this study, we introduce an innovative
unsupervised feature mutual learning fusion network designed to rectify these
limitations. Our approach incorporates a Deformable Cross Feature Blend (DCFB)
module that facilitates the dual modalities in discerning their respective
similarities and differences. We have applied our model to the fusion of 3D MRI
and PET images obtained from 660 patients in the Alzheimer's Disease
Neuroimaging Initiative (ADNI) dataset. Through the application of the DCFB
module, our network generates high-quality MRI-PET fusion images. Experimental
results demonstrate that our method surpasses traditional 2D image fusion
methods in performance metrics such as Peak Signal to Noise Ratio (PSNR) and
Structural Similarity Index Measure (SSIM). Importantly, the capacity of our
method to fuse 3D images enhances the information available to physicians and
researchers, thus marking a significant step forward in the field. The code
will soon be available online.
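The following is a simplified PyTorch sketch of cross-modal attention between two 3D feature volumes (e.g., MRI and PET), in the spirit of the cross feature blending described above. It uses plain multi-head cross-attention over flattened voxels, omits the deformable sampling of the DCFB module, and all names are illustrative rather than the authors' implementation.

```python
# Hypothetical, simplified cross-modal blending of two 3D feature volumes.
# Plain cross-attention over flattened voxels; the deformable sampling of the
# paper's DCFB module is not reproduced here.
import torch
import torch.nn as nn

class CrossModalBlend(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, C, D, H, W) feature volumes from the two modalities.
        b, c, d, h, w = feat_a.shape
        tokens_a = feat_a.flatten(2).transpose(1, 2)    # (B, D*H*W, C)
        tokens_b = feat_b.flatten(2).transpose(1, 2)
        # Queries from modality A attend to keys/values from modality B.
        blended, _ = self.attn(self.norm(tokens_a), tokens_b, tokens_b)
        blended = tokens_a + blended                    # residual connection
        return blended.transpose(1, 2).reshape(b, c, d, h, w)
```

Applying the block twice with the modalities swapped gives each modality a view of the other's features before the final fusion image is reconstructed.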
AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention
Multi-modal medical image fusion is essential for precise clinical diagnosis
and surgical navigation, since it merges complementary information from
multiple modalities into a single image. The quality of the fused
image depends on the extracted single modality features as well as the fusion
rules for multi-modal information. Although existing deep learning-based fusion
methods can fully exploit the semantic features of each modality, they cannot
distinguish the effective low and high frequency information of each modality
and fuse them adaptively. To address this issue, we propose AdaFuse, in which
multimodal image information is fused adaptively through a frequency-guided
attention mechanism based on the Fourier transform. Specifically, we propose the
cross-attention fusion (CAF) block, which adaptively fuses features of two
modalities in the spatial and frequency domains by exchanging key and query
values, and then calculates the cross-attention scores between the spatial and
frequency features to further guide the spatial-frequential information fusion.
The CAF block enhances the high-frequency features of the different modalities
so that the details in the fused images can be retained. Moreover, we design a
novel loss function composed of structure loss and content loss to preserve
both low and high frequency information. Extensive comparison experiments on
several datasets demonstrate that the proposed method outperforms
state-of-the-art methods in terms of both visual quality and quantitative
metrics. The ablation experiments also validate the effectiveness of the
proposed loss function and fusion strategy.
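To make the frequency-guided cross-attention idea concrete, here is a hypothetical PyTorch sketch in which each modality's features are mapped to the frequency domain with an FFT, queries and keys are exchanged between the two modalities, and the resulting scores reweight the fused features. The module name and exact wiring are assumptions, not the authors' released CAF implementation.

```python
# Hypothetical frequency-guided cross-attention fusion of two modality features.
# Queries come from one modality's spectrum and keys from the other's, and the
# resulting channel-attention scores reweight the concatenated features.
import torch
import torch.nn as nn

class FrequencyGuidedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, C, H, W) features from the two modalities.
        b, c, h, w = feat_a.shape
        # Frequency-domain magnitudes capture the global structure of each modality.
        freq_a = torch.fft.fft2(feat_a, norm="ortho").abs()
        freq_b = torch.fft.fft2(feat_b, norm="ortho").abs()
        # Exchange roles: queries from modality A's spectrum, keys from B's.
        q = self.q_proj(freq_a).flatten(2)                               # (B, C, H*W)
        k = self.k_proj(freq_b).flatten(2)
        scores = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)  # (B, C, C)
        v = self.v_proj(torch.cat([feat_a, feat_b], dim=1)).flatten(2)          # (B, C, H*W)
        return (scores @ v).reshape(b, c, h, w)
```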