554 research outputs found

    TransY-Net:Learning Fully Transformer Networks for Change Detection of Remote Sensing Images

    Full text link
    In the remote sensing field, Change Detection (CD) aims to identify and localize the changed regions from dual-phase images over the same places. Recently, it has achieved great progress with the advances of deep learning. However, current methods generally deliver incomplete CD regions and irregular CD boundaries due to the limited representation ability of the extracted visual features. To relieve these issues, in this work we propose a novel Transformer-based learning framework named TransY-Net for remote sensing image CD, which improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner. More specifically, the proposed framework first utilizes the advantages of Transformers in long-range dependency modeling. It can help to learn more discriminative global-level features and obtain complete CD regions. Then, we introduce a novel pyramid structure to aggregate multi-level visual features from Transformers for feature enhancement. The pyramid structure grafted with a Progressive Attention Module (PAM) can improve the feature representation ability with additional inter-dependencies through spatial and channel attentions. Finally, to better train the whole framework, we utilize the deeply-supervised learning with multiple boundary-aware loss functions. Extensive experiments demonstrate that our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks. The source code is released at https://github.com/Drchip61/TransYNet.Comment: This work is accepted by TGRS2023. It is an extension of our ACCV2022 paper and arXiv:2210.0075

    Fusion of visible and thermal images improves automated detection and classification of animals for drone surveys

    Get PDF
    Visible and thermal images acquired from drones (unoccupied aircraft systems) have substantially improved animal monitoring. Combining complementary information from both image types provides a powerful approach for automating detection and classification of multiple animal species to augment drone surveys. We compared eight image fusion methods using thermal and visible drone images combined with two supervised deep learning models, to evaluate the detection and classification of white-tailed deer (Odocoileus virginianus), domestic cow (Bos taurus), and domestic horse (Equus caballus). We classified visible and thermal images separately and compared them with the results of image fusion. Fused images provided minimal improvement for cows and horses compared to visible images alone, likely because the size, shape, and color of these species made them conspicuous against the background. For white-tailed deer, which were typically cryptic against their backgrounds and often in shadows in visible images, the added information from thermal images improved detection and classification in fusion methods from 15 to 85%. Our results suggest that image fusion is ideal for surveying animals inconspicuous from their backgrounds, and our approach uses few image pairs to train compared to typical machine-learning methods. We discuss computational and field considerations to improve drone surveys using our fusion approach. Supplemental files attached below

    Three-Dimensional Medical Image Fusion with Deformable Cross-Attention

    Full text link
    Multimodal medical image fusion plays an instrumental role in several areas of medical image processing, particularly in disease recognition and tumor detection. Traditional fusion methods tend to process each modality independently before combining the features and reconstructing the fusion image. However, this approach often neglects the fundamental commonalities and disparities between multimodal information. Furthermore, the prevailing methodologies are largely confined to fusing two-dimensional (2D) medical image slices, leading to a lack of contextual supervision in the fusion images and subsequently, a decreased information yield for physicians relative to three-dimensional (3D) images. In this study, we introduce an innovative unsupervised feature mutual learning fusion network designed to rectify these limitations. Our approach incorporates a Deformable Cross Feature Blend (DCFB) module that facilitates the dual modalities in discerning their respective similarities and differences. We have applied our model to the fusion of 3D MRI and PET images obtained from 660 patients in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Through the application of the DCFB module, our network generates high-quality MRI-PET fusion images. Experimental results demonstrate that our method surpasses traditional 2D image fusion methods in performance metrics such as Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). Importantly, the capacity of our method to fuse 3D images enhances the information available to physicians and researchers, thus marking a significant step forward in the field. The code will soon be available online

    AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention

    Full text link
    Multi-modal medical image fusion is essential for the precise clinical diagnosis and surgical navigation since it can merge the complementary information in multi-modalities into a single image. The quality of the fused image depends on the extracted single modality features as well as the fusion rules for multi-modal information. Existing deep learning-based fusion methods can fully exploit the semantic features of each modality, they cannot distinguish the effective low and high frequency information of each modality and fuse them adaptively. To address this issue, we propose AdaFuse, in which multimodal image information is fused adaptively through frequency-guided attention mechanism based on Fourier transform. Specifically, we propose the cross-attention fusion (CAF) block, which adaptively fuses features of two modalities in the spatial and frequency domains by exchanging key and query values, and then calculates the cross-attention scores between the spatial and frequency features to further guide the spatial-frequential information fusion. The CAF block enhances the high-frequency features of the different modalities so that the details in the fused images can be retained. Moreover, we design a novel loss function composed of structure loss and content loss to preserve both low and high frequency information. Extensive comparison experiments on several datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. The ablation experiments also validate the effectiveness of the proposed loss and fusion strategy
    corecore