TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images
In the remote sensing field, Change Detection (CD) aims to identify and
localize changed regions between dual-phase images of the same area.
Recently, it has achieved great progress with the advances of deep learning.
However, current methods generally deliver incomplete CD regions and irregular
CD boundaries due to the limited representational power of the extracted visual
features. To address these issues, in this work we propose a novel
Transformer-based learning framework named TransY-Net for remote sensing image
CD, which improves the feature extraction from a global view and combines
multi-level visual features in a pyramid manner. More specifically, the
proposed framework first utilizes the advantages of Transformers in long-range
dependency modeling. It can help to learn more discriminative global-level
features and obtain complete CD regions. Then, we introduce a novel pyramid
structure to aggregate multi-level visual features from Transformers for
feature enhancement. The pyramid structure grafted with a Progressive Attention
Module (PAM) can improve the feature representation ability with additional
inter-dependencies through spatial and channel attentions. Finally, to better
train the whole framework, we utilize deeply-supervised learning with
multiple boundary-aware loss functions. Extensive experiments demonstrate that
our proposed method achieves a new state-of-the-art performance on four optical
and two SAR image CD benchmarks. The source code is released at
https://github.com/Drchip61/TransYNet.
Comment: This work is accepted by TGRS 2023. It is an extension of our ACCV 2022 paper and arXiv:2210.0075
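The paper defines its Progressive Attention Module (PAM) precisely; purely as an illustration of the channel-attention idea it builds on, the following dependency-free sketch (all names hypothetical, and the learned layers of a real module omitted) re-weights each channel by a sigmoid gate on its global average:

```python
import math

def channel_attention(fmap):
    """Re-weight channels of a feature map (a list of 2-D channel grids)
    by a sigmoid gate on each channel's global average.
    Simplified sketch: a real attention module would learn the gating
    transform instead of using the raw channel mean."""
    gates = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(1.0 / (1.0 + math.exp(-mean)))  # gate in (0, 1)
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(fmap, gates)]
    return scaled, gates

# Two 2x2 channels: the channel with the stronger response gets the larger gate.
fmap = [[[2.0, 2.0], [2.0, 2.0]],      # mean 2.0
        [[-2.0, -2.0], [-2.0, -2.0]]]  # mean -2.0
scaled, gates = channel_attention(fmap)
```

The same gate-then-rescale pattern underlies the inter-channel dependencies that the pyramid structure described above exploits.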
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on the application of synthetic aperture radar combined with deep learning technology, aiming to further promote the development of intelligent SAR image interpretation. A synthetic aperture radar (SAR) is an important active microwave imaging sensor whose all-day, all-weather operating capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, autonomous driving, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to address the above significant challenges and present their innovative and cutting-edge research results when applying deep learning to SAR, in various manuscript types, e.g., articles, letters, reviews, and technical reports.
Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types is facing many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial resolution remote sensing data.
The first work presents a novel model for learning richer multi-scale and global contextual representations in very high-resolution remote sensing images, namely the dense dilated convolutions' merging (DDCM) network. The proposed method is lightweight, flexible and extendable, so that it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Extensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, to extend the vanilla SCG model to be able to capture multi-view context representations with rotation invariance to achieve improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the issue of class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate the proposed framework is computationally efficient and robust to produce improved segmentation results for imbalanced classes.
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely 'what', 'how' and 'where' to effectively fuse multi-source features and efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
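The gated fusion unit itself is defined in the thesis; as a rough, dependency-free sketch of gated fusion in general (the gating function here is a hand-picked stand-in for what a trained unit would learn), two modality feature vectors can be blended element-wise through a sigmoid gate:

```python
import math

def gated_fuse(x, y):
    """Blend two modality feature vectors with an element-wise sigmoid gate.
    Toy sketch: the gate is driven by the difference of the inputs; a trained
    gated fusion unit would learn this mapping from data instead."""
    fused = []
    for a, b in zip(x, y):
        g = 1.0 / (1.0 + math.exp(-(a - b)))  # gate in (0, 1)
        fused.append(g * a + (1.0 - g) * b)   # convex combination of the two
    return fused

# Hypothetical per-feature activations from two modalities.
optical = [0.9, 0.1, 0.5]
sar = [0.2, 0.8, 0.5]
fused = gated_fuse(optical, sar)
```

Because the gate is a convex weight, each fused value stays between the two modality responses, letting the stronger modality dominate position by position.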
More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification
Classification and identification of the materials lying over or beneath the
Earth's surface have long been a fundamental but challenging research topic in
geoscience and remote sensing (RS), and it has garnered growing interest owing
to recent advances in deep learning. Although deep networks have been
successfully applied to single-modality-dominated classification tasks, their
performance inevitably hits a bottleneck in complex scenes that require
fine-grained classification, owing to limited information diversity. In this
work, we provide a baseline solution to the aforementioned
difficulty by developing a general multimodal deep learning (MDL) framework. In
particular, we also investigate a special case of multi-modality learning (MML)
-- cross-modality learning (CML) that exists widely in RS image classification
applications. By focusing on "what", "where", and "how" to fuse, we show
different fusion strategies as well as how to train deep networks and build the
network architecture. Specifically, five fusion architectures are introduced
and developed, and further unified within our MDL framework. More significantly,
our framework is not limited to pixel-wise classification tasks but is also
applicable to spatial information modeling with convolutional neural networks
(CNNs). To validate the effectiveness and superiority of the MDL framework,
extensive experiments related to the settings of MML and CML are conducted on
two different multimodal RS datasets. Furthermore, the codes and datasets will
be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing
to the RS community.
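The five fusion architectures are specified in the paper; to illustrate only the two endpoints of the design space it explores ("where" to fuse), here is a minimal sketch contrasting early fusion (concatenate features, then classify) with late fusion (classify per modality, then combine decisions), with all weights and feature values invented for the example:

```python
def early_fusion(x, y, w):
    """Early fusion: concatenate modality features, then apply one linear score."""
    z = x + y  # list concatenation acts as feature stacking
    return sum(wi * zi for wi, zi in zip(w, z))

def late_fusion(x, y, wx, wy):
    """Late fusion: score each modality separately, then average the decisions."""
    sx = sum(a * b for a, b in zip(wx, x))
    sy = sum(a * b for a, b in zip(wy, y))
    return 0.5 * (sx + sy)

# Toy complementary modalities: each carries information the other lacks.
hsi = [1.0, 0.0]    # hypothetical hyperspectral features
lidar = [0.0, 1.0]  # hypothetical LiDAR features
e = early_fusion(hsi, lidar, [0.5, 0.5, 0.5, 0.5])
l = late_fusion(hsi, lidar, [1.0, 1.0], [1.0, 1.0])
```

Both strategies combine the complementary evidence, but they differ in where the joint representation is formed, which is exactly the axis the framework's five architectures vary.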
Efficient Building Extraction for High Spatial Resolution Images Based on Dual Attention Network
Building extraction from high-spatial-resolution images has become an important research topic in computer vision for urban applications. Because of the rich detail and complex texture features in high-spatial-resolution images, the distribution of buildings is non-uniform and their scales differ markedly, so general methods often confuse buildings with other ground objects. In this paper, a building extraction framework based on a deep residual neural network with a self-attention mechanism is proposed. The mechanism contains two parts: a spatial attention module, which aggregates and relates the local and global features (short- and long-range context information) at each position of a building, and a channel attention module, which improves the representation of comprehensive features (including color, texture, geometric, and high-level semantic features). The combination of the dual attention modules enables buildings to be extracted from complex backgrounds. The effectiveness of our method is validated by experiments conducted on a wide-area high-spatial-resolution image, i.e., Jilin-1 Gaofen 02A imagery. Compared with state-of-the-art segmentation methods, i.e., the DeepLab-v3+, PSPNet, and PSANet algorithms, the proposed dual-attention-network-based method achieved higher accuracy and intersection-over-union in extraction performance and showed the best recognition integrity of buildings.
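The paper's spatial attention module has its own learned projections; as a generic, dependency-free illustration of position attention (the mechanism family it belongs to), each location's output below is a softmax-similarity-weighted sum over all locations, which is how short- and long-range context is related:

```python
import math

def spatial_attention(feats):
    """Position (spatial) attention sketch: each location attends to every
    other location, weighted by softmax over dot-product similarity.
    feats: list of per-position feature vectors (a flattened H*W grid).
    Simplified: real modules apply learned query/key/value projections."""
    out = []
    for q in feats:
        sims = [sum(a * b for a, b in zip(q, k)) for k in feats]
        m = max(sims)                              # numerical stability
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]        # softmax over positions
        out.append([sum(w * v[d] for w, v in zip(weights, feats))
                    for d in range(len(q))])
    return out

# Three positions; the first two share a feature direction, the third differs.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out = spatial_attention(feats)
```

Positions with similar features reinforce each other regardless of their distance in the grid, which is what lets such a module aggregate long-range context for large buildings.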
Review on Active and Passive Remote Sensing Techniques for Road Extraction
Digital maps of road networks are a vital part of digital cities and intelligent transportation. In this paper, we provide a comprehensive review of road extraction based on various remote sensing data sources, including high-resolution images, hyperspectral images, synthetic aperture radar images, and light detection and ranging. This review is divided into three parts. Part 1 provides an overview of the existing data acquisition techniques for road extraction, including data acquisition methods, typical sensors, application status, and prospects. Part 2 underlines the main road extraction methods based on four data sources. In this section, road extraction methods based on different data sources are described and analysed in detail. Part 3 presents the combined application of multisource data for road extraction. Evidently, different data acquisition techniques have unique advantages, and the combination of multiple sources can improve the accuracy of road extraction. The main aim of this review is to provide a comprehensive reference for research on existing road extraction technologies.
Convolutional Neural Networks for Water segmentation using Sentinel-2 Red, Green, Blue (RGB) composites and derived Spectral Indices
Near-real-time water segmentation with medium-resolution satellite imagery plays a critical role in water management. Automated water segmentation of satellite imagery has traditionally been achieved using spectral indices. Spectral water segmentation is limited by environmental factors and requires human expertise to be applied effectively. In recent years, the use of convolutional neural networks (CNNs) for water segmentation has been successful on high-resolution satellite imagery, but to a lesser extent on medium-resolution imagery. Existing studies have been limited to geographically localized datasets, and reported metrics have been benchmarked against a limited range of spectral indices. This study seeks to determine whether a single CNN based on Red, Green, Blue (RGB) image classification can effectively segment water on a global scale and outperform traditional spectral methods. Additionally, this study evaluates the extent to which smaller datasets (with very complex patterns, e.g., harbour megacities) can be used to improve globally applicable CNNs within a specific region. Multispectral imagery from the European Space Agency's Sentinel-2 satellite (10 m spatial resolution) was sourced. Test sites were selected in Florida, New York, and Shanghai to represent a globally diverse range of waterbody typologies. Region-specific spectral water segmentation algorithms were developed at each test site to represent benchmarks of spectral index performance. DeepLabV3-ResNet101 was trained on 33,311 semantically labelled true-colour samples. The resulting model was retrained on three smaller subsets of the data, specific to New York, Shanghai and Florida. CNN predictions reached a maximum mean intersection over union of 0.986 and an F1-score of 0.983. At the Shanghai test site, the CNN's predictions outperformed the spectral benchmark, primarily due to the CNN's ability to process contextual features at multiple scales.
In all test cases, retraining the networks on localized subsets of the dataset improved the localized region's segmentation predictions. The CNNs presented are suitable for cloud-based deployment and could contribute to the wider use of satellite imagery for water management.
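The study's exact spectral benchmarks are region-specific; as a minimal sketch of the general spectral baseline such work starts from, the widely used NDWI (computed from Sentinel-2 green and NIR reflectance) can be thresholded into a water mask and scored with intersection over union. The band values and threshold below are illustrative only:

```python
def ndwi(green, nir):
    """Normalized Difference Water Index: (green - NIR) / (green + NIR),
    computed per pixel on 2-D reflectance grids (lists of rows)."""
    return [[(g - n) / (g + n) if (g + n) else 0.0
             for g, n in zip(grow, nrow)]
            for grow, nrow in zip(green, nir)]

def water_mask(index, thresh=0.0):
    """A positive NDWI is a common (but site-dependent) water threshold."""
    return [[v > thresh for v in row] for row in index]

def iou(pred, truth):
    """Intersection over union between two boolean masks."""
    inter = sum(p and t for prow, trow in zip(pred, truth)
                for p, t in zip(prow, trow))
    union = sum(p or t for prow, trow in zip(pred, truth)
                for p, t in zip(prow, trow))
    return inter / union if union else 1.0

# Toy 2x2 scene: water reflects strongly in green, weakly in NIR.
green = [[0.30, 0.05], [0.28, 0.04]]
nir = [[0.05, 0.30], [0.04, 0.30]]
mask = water_mask(ndwi(green, nir))
truth = [[True, False], [True, False]]
```

The threshold sensitivity of this pipeline is precisely the "environmental factors and human expertise" limitation that motivates the CNN approach above.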
A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery.
Comment: 145 pages with 32 figures
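Of the pre-processing steps the review covers, chipping and normalization are concrete enough to sketch. The following dependency-free illustration (tile size and value range are arbitrary choices, not the review's recommendations) splits a large scene into fixed-size chips and min-max scales pixel values:

```python
def chip(image, size):
    """Split a 2-D image (list of rows) into non-overlapping size x size
    chips, discarding any partial edge tiles for simplicity."""
    h, w = len(image), len(image[0])
    chips = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            chips.append([row[c:c + size] for row in image[r:r + size]])
    return chips

def normalize(tile, lo=0.0, hi=255.0):
    """Min-max scale pixel values into [0, 1] given a known sensor range."""
    return [[(v - lo) / (hi - lo) for v in row] for row in tile]

# A 4x4 toy scene chipped into four 2x2 tiles.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
chips = chip(image, 2)
```

Real pipelines typically add overlap between chips and handle edge padding; both are design choices the review discusses rather than fixed rules.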