A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered, including methods for image normalization and chipping, strategies for addressing data imbalance in training samples, and techniques for overcoming limited data such as augmentation, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery. Comment: 145 pages with 32 figures
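As a concrete illustration of the chipping step discussed in the review, the sketch below tiles a large raster into fixed-size, overlapping chips. Function and parameter names are illustrative, not taken from the paper:

```python
import numpy as np

def chip_image(image, chip_size=256, stride=192):
    """Split an (H, W, C) raster into fixed-size chips with overlap.

    stride < chip_size gives overlapping chips; extra chips are added
    so the right/bottom borders are covered and every chip has the full
    chip_size. Returns the chips and their top-left (row, col) offsets
    for later re-assembly of predictions.
    """
    h, w = image.shape[:2]
    rows = list(range(0, max(h - chip_size, 0) + 1, stride))
    cols = list(range(0, max(w - chip_size, 0) + 1, stride))
    if rows[-1] != h - chip_size:
        rows.append(h - chip_size)
    if cols[-1] != w - chip_size:
        cols.append(w - chip_size)
    chips, offsets = [], []
    for r in rows:
        for c in cols:
            chips.append(image[r:r + chip_size, c:c + chip_size])
            offsets.append((r, c))
    return np.stack(chips), offsets
```

For a 512-by-512 image with these defaults, the function yields a 3-by-3 grid of nine overlapping 256-pixel chips.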
GFF: Gated Fully Fusion for Semantic Segmentation
Semantic segmentation generates comprehensive understanding of scenes through
densely predicting the category for each pixel. High-level features from Deep
Convolutional Neural Networks already demonstrate their effectiveness in
semantic segmentation tasks; however, the coarse resolution of high-level features often leads to inferior results for small and thin objects where detailed information is important. It is natural to consider importing low-level features to compensate for the detail lost in high-level features. Unfortunately, simply combining multi-level features suffers from the semantic gap among them. In this paper, we propose a new architecture, named
Gated Fully Fusion (GFF), to selectively fuse features from multiple levels
using gates in a fully connected way. Specifically, features at each level are
enhanced by higher-level features with stronger semantics and lower-level
features with more details, and gates are used to control the propagation of
useful information which significantly reduces the noises during fusion. We
achieve state-of-the-art results on four challenging scene parsing datasets, including Cityscapes, Pascal Context, COCO-Stuff, and ADE20K. Comment: accepted by AAAI-2020 (oral)
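The gating idea in GFF can be illustrated with a small numpy sketch (a simplified stand-in, not the authors' code): each level's feature map receives a sigmoid gate, keeps its own content where the gate is high, and imports other levels' gated features where it is low.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(features, gate_logits):
    """Fuse same-shape feature maps from several levels with per-pixel gates.

    features: list of (H, W, C) arrays, already resampled to one resolution.
    gate_logits: list of (H, W, 1) arrays, one gate map per level.
    Each level keeps its own information weighted by its gate and
    receives the other levels' features weighted by their gates, which
    suppresses noisy channels during fusion.
    """
    fused = []
    for i, f_i in enumerate(features):
        g_i = sigmoid(gate_logits[i])
        out = g_i * f_i  # keep own useful content
        for j, f_j in enumerate(features):
            if j != i:
                g_j = sigmoid(gate_logits[j])
                # import from other levels only where the own gate is low
                out = out + (1.0 - g_i) * g_j * f_j
        fused.append(out)
    return fused
```

The design choice mirrored here is that gates act as per-pixel switches: a level with confident features keeps them, while uncertain positions are filled from other levels.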
Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types is facing many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial resolution remote sensing data.
The first work presents a novel model, the dense dilated convolutions' merging (DDCM) network, to learn richer multi-scale and global contextual representations in very high-resolution remote sensing images. The proposed method is lightweight, flexible, and extendable, so it can serve as a simple yet effective encoder and decoder module for different classification and semantic mapping challenges. Intensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, to extend the vanilla SCG model to be able to capture multi-view context representations with rotation invariance to achieve improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the issue of class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate the proposed framework is computationally efficient and robust to produce improved segmentation results for imbalanced classes.
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely 'what', 'how', and 'where' to effectively fuse multi-source features and efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms strong baselines on two representative remote sensing datasets with fewer parameters and lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
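The thesis abstract does not spell out its adaptive class weighting loss; a common inverse-log-frequency weighting for cross-entropy, given here purely as an assumed illustration of how class imbalance is typically counteracted, looks like:

```python
import numpy as np

def class_weights_from_labels(labels, num_classes, smooth=1.02):
    """Inverse-log-frequency class weights for an imbalanced label map.

    labels: integer array of ground-truth class ids.
    Rare classes get larger weights; `smooth` keeps the log positive
    and bounds the weight assigned to very rare classes.
    """
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    weights = 1.0 / np.log(smooth + freq)
    return weights

def weighted_cross_entropy(probs, labels, weights):
    """Mean weighted negative log-likelihood over all pixels.

    probs: (N, num_classes) softmax outputs; labels: (N,) class ids.
    Each pixel's loss is scaled by the weight of its true class.
    """
    n = labels.shape[0]
    picked = probs[np.arange(n), labels]
    return float(np.mean(weights[labels] * -np.log(picked + 1e-12)))
```

With such a weighting, minority classes contribute more per pixel to the loss, which pushes the model away from trivially predicting the dominant class.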
A Hyper-pixel-wise Contrastive Learning Augmented Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images and Digital Elevation Model Data
As a hazardous disaster, landslides often bring tremendous losses to humanity, so reliable landslide detection is necessary. However, visual blur and small-sized datasets pose great challenges for the old landslide detection task when using remote sensing data. To reliably extract semantic features, a hyper-pixel-wise contrastive learning augmented segmentation network (HPCL-Net) is proposed, which augments local salient feature extraction from the boundaries of landslides through HPCL and fuses heterogeneous information in the semantic space from high-resolution remote sensing images and digital elevation model (DEM) data. For full utilization of
the precious samples, a global hyper-pixel-wise sample pair queues-based
contrastive learning method, which includes the construction of global queues
that store hyper-pixel-wise samples and the updating scheme of a momentum
encoder, is developed, reliably enhancing the extraction ability of semantic
features. The proposed HPCL-Net is evaluated on a Loess Plateau old landslide dataset, and the experimental results show that the model greatly improves the reliability of old landslide detection compared to the previous old landslide segmentation model: mIoU increases from 0.620 to 0.651, landslide IoU from 0.334 to 0.394, and F1-score from 0.501 to 0.565.
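The global sample-pair queue with a momentum-updated encoder described in the abstract follows the same pattern as MoCo-style contrastive learning. A schematic sketch is below; names and shapes are illustrative, and the real encoders would be networks rather than plain parameter vectors:

```python
import collections
import numpy as np

class MomentumQueue:
    """Global queue of hyper-pixel feature samples with a momentum encoder.

    Keys are produced by a slowly moving copy of the query encoder
    (theta_k <- m * theta_k + (1 - m) * theta_q), so queued features
    stay mutually consistent even though the queue spans many past
    batches.
    """

    def __init__(self, query_params, momentum=0.999, queue_size=1024):
        self.q = np.array(query_params, dtype=float)  # query encoder params
        self.k = self.q.copy()                        # key (momentum) encoder
        self.m = momentum
        self.queue = collections.deque(maxlen=queue_size)

    def momentum_update(self, new_query_params):
        """After a training step, drag the key encoder toward the query encoder."""
        self.q = np.array(new_query_params, dtype=float)
        self.k = self.m * self.k + (1.0 - self.m) * self.q

    def enqueue(self, key_features):
        """Add key features; the oldest fall off once maxlen is reached."""
        for f in key_features:
            self.queue.append(np.asarray(f, dtype=float))
```

The queue is what makes "precious samples" reusable: each hyper-pixel feature serves as a negative or positive for many later batches instead of being seen once.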
Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers
Semantic segmentation necessitates approaches that learn high-level
characteristics while dealing with enormous amounts of data. Convolutional
neural networks (CNNs) can learn unique and adaptive features to achieve this
aim. However, due to the large size and high spatial resolution of remote
sensing images, these networks cannot analyze an entire scene efficiently.
Recently, deep transformers have proven their capability to record global
interactions between different objects in the image. In this paper, we propose
a new segmentation model that combines convolutional neural networks with
transformers, and show that this mixture of local and global feature extraction
techniques provides significant advantages in remote sensing segmentation. In
addition, the proposed model includes two fusion layers that are designed to
represent multi-modal inputs and output of the network efficiently. The input
fusion layer extracts feature maps summarizing the relationship between image
content and elevation maps (DSM). The output fusion layer uses a novel
multi-task segmentation strategy where class labels are identified using
class-specific feature extraction layers and loss functions. Finally, a
fast-marching method is used to convert all unidentified class labels to their
closest known neighbors. Our results demonstrate that the proposed methodology
improves segmentation accuracy compared to state-of-the-art techniques.
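The fast-marching step above assigns each unidentified pixel the label of its closest known neighbor. On a uniform grid, a multi-source BFS (a simpler stand-in for fast marching, which also handles non-uniform propagation speeds) produces the same nearest-neighbor assignment:

```python
from collections import deque

def fill_unknown(labels, unknown=-1):
    """Assign each `unknown` pixel the label of its nearest known pixel.

    labels: 2D list of ints. Multi-source BFS starts from every labeled
    pixel and expands one ring at a time, so the first label to reach an
    unknown pixel comes from a closest known pixel in 4-connected grid
    distance.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    queue = deque((r, c) for r in range(h) for c in range(w)
                  if labels[r][c] != unknown)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and out[nr][nc] == unknown:
                out[nr][nc] = out[r][c]
                queue.append((nr, nc))
    return out
```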
DFPENet-geology: A Deep Learning Framework for High Precision Recognition and Segmentation of Co-seismic Landslides
The following lists two main reasons for the withdrawal. 1. There are some problems in the method and results, and there is a lot of room for improvement. In terms of method, "Pre-trained Datasets (PD)" represents selecting a small amount from the online test set, which easily causes the model to overfit the online test set and prevents robust performance. More importantly, the proposed DFPENet has high redundancy from combining the Attention Gate Mechanism and Gate Convolution Networks, and we need to revisit the section on geological feature fusion; in terms of results, we need to further improve and refine them. 2. arXiv is an open-access repository of electronic preprints without peer review. However, for our own research, we need experts to provide comments on our work, whether negative or positive, and we would use their comments to significantly improve this manuscript. Therefore, we decided to withdraw this manuscript from arXiv, and we will update arXiv with the final accepted manuscript so that more researchers can use our proposed comprehensive and general scheme to recognize and segment seismic landslides more efficiently. Comment: 1. There are some problems in the method and results, and there is a lot of room for improvement. Overall, the proposed DFPENet has high redundancy from combining the Attention Gate Mechanism and Gate Convolution Networks, and we need to further improve and refine the results. 2. For our own research, we need experts to provide comments on our work, whether negative or positive.
Contrastive Transformer: Contrastive Learning Scheme with Transformer innate Patches
This paper presents Contrastive Transformer, a contrastive learning scheme
using the Transformer innate patches. Contrastive Transformer enables existing
contrastive learning techniques, often used for image classification, to
benefit dense downstream prediction tasks such as semantic segmentation. The
scheme performs supervised patch-level contrastive learning, selecting the
patches based on the ground truth mask, subsequently used for hard-negative and
hard-positive sampling. The scheme applies to all vision-transformer
architectures, is easy to implement, and introduces minimal additional memory
footprint. Additionally, the scheme removes the need for huge batch sizes, as
each patch is treated as an image.
We apply and test Contrastive Transformer for the case of aerial image
segmentation, known for low-resolution data, large class imbalance, and similar
semantic classes. We perform extensive experiments to show the efficacy of the
Contrastive Transformer scheme on the ISPRS Potsdam aerial image segmentation
dataset. Additionally, we show the generalizability of our scheme by applying
it to multiple inherently different Transformer architectures. Ultimately, the
results show a consistent increase in mean IoU across all classes. Comment: 7 pages, 3 figures
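Patch-level supervised contrastive selection as described above can be sketched as: label each transformer patch by the majority class of its ground-truth mask region, then pick the hardest positive (same class, least similar) and hardest negative (other class, most similar) for an anchor patch. All names below are illustrative, not from the paper:

```python
import numpy as np

def patch_labels(mask, patch=4):
    """Majority ground-truth class per non-overlapping patch of a 2D mask."""
    h, w = mask.shape
    out = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            vals = mask[r:r + patch, c:c + patch].ravel()
            out.append(np.bincount(vals).argmax())
    return np.array(out)

def hard_pairs(embeddings, labels, anchor):
    """Hardest positive/negative patch indices for one anchor patch.

    Hard positive: same class, least similar to the anchor.
    Hard negative: other class, most similar to the anchor.
    Similarity is cosine similarity of patch embeddings.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e[anchor]
    same = labels == labels[anchor]
    pos_ids = np.where(same & (np.arange(len(labels)) != anchor))[0]
    neg_ids = np.where(~same)[0]
    hard_pos = pos_ids[np.argmin(sim[pos_ids])] if len(pos_ids) else None
    hard_neg = neg_ids[np.argmax(sim[neg_ids])] if len(neg_ids) else None
    return hard_pos, hard_neg
```

Because every patch acts as a sample, one image already supplies many contrastive pairs, which is why the scheme avoids the huge batch sizes of image-level contrastive learning.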
RemoteNet: Remote Sensing Image Segmentation Network based on Global-Local Information
Remotely captured images possess an immense scale and object appearance
variability due to the complex scene. It becomes challenging to capture the
underlying attributes in the global and local context for their segmentation.
Existing networks struggle to capture the inherent features due to the
cluttered background. To address these issues, we propose a remote sensing
image segmentation network, RemoteNet, for semantic segmentation of remote
sensing images. We capture the global and local features by leveraging the
benefits of the transformer and convolution mechanisms. RemoteNet is an
encoder-decoder design that uses multi-scale features. We construct an
attention map module to generate channel-wise attention scores for fusing these
features. We construct a global-local transformer block (GLTB) in the decoder
network to support learning robust representations during the decoding phase. Further, we design a feature refinement module to refine the fused output of the shallow-stage encoder feature and the deepest GLTB feature of the decoder. Experimental findings on two public datasets show the effectiveness of the proposed RemoteNet.
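The channel-wise attention scores used for fusion can be sketched in a squeeze-and-excitation style (an assumption; RemoteNet's exact attention map module may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(feat_a, feat_b, w, b):
    """Fuse two (H, W, C) feature maps with learned channel-wise scores.

    Global average pooling over the concatenated maps summarizes each
    channel; a tiny linear layer (w: (2C, C), b: (C,)) then produces one
    score per output channel, used to weight the element-wise blend of
    the two inputs.
    """
    pooled = np.concatenate([feat_a, feat_b], axis=-1).mean(axis=(0, 1))  # (2C,)
    scores = sigmoid(pooled @ w + b)                                      # (C,)
    return scores * feat_a + (1.0 - scores) * feat_b
```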