LEDCNet: A Lightweight and Efficient Semantic Segmentation Algorithm Using Dual Context Module for Extracting Ground Objects from UAV Aerial Remote Sensing Images
Semantic segmentation for extracting ground objects, such as roads and houses,
from UAV remote sensing images using deep learning has become a more efficient
and convenient method than traditional manual segmentation in the surveying
and mapping field. In recent years, as layers have deepened and complexity has
grown, the number of parameters in convolution-based semantic segmentation
networks has increased considerably, which hinders wide application,
especially in industry. To make the model lightweight while improving its
accuracy, a new lightweight and efficient network for extracting ground
objects from UAV remote sensing images, named LEDCNet, is proposed. The
proposed network adopts an encoder-decoder architecture in which a powerful
lightweight backbone network called LDCNet is developed as the encoder. We
intend to extend LDCNet into a new-generation backbone network for lightweight
semantic segmentation algorithms. In the decoder, a dual multi-scale context
module consisting of an ASPP module and an OCR module is designed to capture
more context information from the feature maps of UAV remote sensing images.
Between the ASPP and OCR modules, an FPN module is used to fuse the
multi-scale features extracted by the ASPP. A private dataset of UAV remote
sensing images containing 2,431 training images, 945 validation images, and
475 test images is constructed. The proposed model performs well on this
dataset, achieving an mIoU of 71.12% with only 1.4M parameters and 5.48G
FLOPs. More extensive experiments on the public LoveDA and CITY-OSM datasets
further verify the effectiveness of the proposed model, with mIoU scores of
65.27% and 74.39%, respectively. All experimental results show that the
proposed model not only lightens the network with few parameters but also
improves segmentation performance.
Comment: 11 pages
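The abstract's multi-scale context capture rests on dilated (atrous) convolution: parallel branches with different dilation rates see different context sizes without extra parameters. Below is a minimal 1-D sketch of that idea; the rates 1/6/12/18 are the classic ASPP rates assumed for illustration, since the paper's exact configuration is not given in the abstract.

```python
# Minimal sketch of the ASPP idea: parallel dilated convolutions whose
# receptive fields grow with the dilation rate, illustrated in 1-D.

def dilated_conv1d(signal, kernel, rate):
    """'Same'-padded 1-D convolution with dilated kernel taps."""
    k = len(kernel)
    span = rate * (k - 1)            # distance spanned by the dilated kernel
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j * rate] for j in range(k))
            for i in range(len(signal))]

def receptive_field(kernel_size, rate):
    """Effective receptive field of one dilated convolution."""
    return rate * (kernel_size - 1) + 1

signal = [0.0] * 8 + [1.0] + [0.0] * 8   # a unit impulse
kernel = [1.0, 1.0, 1.0]

# An ASPP-style head runs the same input through several rates in parallel
# and concatenates the per-rate responses along the channel axis.
branches = {r: dilated_conv1d(signal, kernel, r) for r in (1, 6, 12, 18)}

for r in (1, 6, 12, 18):
    print(r, receptive_field(3, r))      # receptive field widens with rate
```

The impulse response makes the mechanism visible: at rate 6 the three kernel taps land 6 samples apart, so one branch covers 13 samples with only 3 weights.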
A Comparison and Strategy of Semantic Segmentation on Remote Sensing Images
In recent years, with the development of aerospace technology, we use more
and more images captured by satellites to obtain information. However, a large
number of useless raw images, limited data storage, and poor transmission
capability on satellites hinder the use of valuable images. It is therefore
necessary to deploy an on-orbit semantic segmentation model to filter out
useless images before data transmission. In this paper, we present a detailed
comparison of recent deep learning models. Considering the computing
environment of satellites, we compare methods in terms of accuracy, parameter
count, and resource consumption on the same public dataset, and analyze the
relations among these factors. Based on the experimental results, we further
propose a viable on-orbit semantic segmentation strategy. It will be deployed
on the TianZhi-2 satellite, which supports deep learning methods and will be
launched soon.
Comment: 8 pages, 3 figures, ICNC-FSKD 201
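The selection logic behind such an on-orbit strategy can be sketched as a constrained choice: among candidate models with measured accuracy, parameter count, and FLOPs, pick the most accurate one that fits the satellite's compute budget. The candidate numbers below are made-up placeholders, not results from the paper.

```python
# Toy sketch: choose the most accurate model within an on-board budget.
candidates = [
    # (name, mIoU %, params in M, FLOPs in G) -- illustrative values only
    ("model_a", 74.1, 44.0, 90.0),
    ("model_b", 71.5, 12.0, 30.0),
    ("model_c", 68.2, 2.5, 6.0),
]

def pick_on_orbit_model(models, max_params_m, max_flops_g):
    """Return the highest-accuracy model that fits both budgets, or None."""
    feasible = [m for m in models
                if m[2] <= max_params_m and m[3] <= max_flops_g]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m[1])   # best accuracy within budget

best = pick_on_orbit_model(candidates, max_params_m=15.0, max_flops_g=40.0)
print(best[0])
```

Tightening the budget shifts the choice toward smaller models, which is exactly the accuracy-versus-resources trade-off the comparison quantifies.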
Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types is facing many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial resolution remote sensing data.
The first work presents a novel model to learn richer multi-scale and global contextual representations in very high-resolution remote sensing images, namely the dense dilated convolutions' merging (DDCM) network. The proposed method is lightweight, flexible, and extendable, so that it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Intensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
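The core of the SCG idea is that the graph is not given but derived from the features themselves. A minimal numpy sketch of that construction follows; the sigmoid-of-inner-product adjacency and the degree-normalized propagation step are one common parameterization, assumed here for illustration rather than taken from the thesis.

```python
# Sketch of a self-constructing graph: build a soft adjacency matrix from
# node (pixel/region) features, with no prior knowledge graph required.
import numpy as np

def self_constructing_graph(features):
    """features: (n_nodes, d) array -> (n_nodes, n_nodes) soft adjacency."""
    logits = features @ features.T            # pairwise feature similarity
    adj = 1.0 / (1.0 + np.exp(-logits))       # squash to (0, 1)
    np.fill_diagonal(adj, 0.0)                # no self-loops
    return adj

def graph_propagate(features, adj):
    """One round of message passing: degree-normalized neighbor aggregation."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    return (adj @ features) / deg

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))               # 6 nodes, 4-dim features
a = self_constructing_graph(x)
y = graph_propagate(x, a)                     # context-enriched features
```

Because the adjacency is computed from the features, long-range pixel dependencies can be captured in a single step, without the locality constraint of convolutions.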
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, which extends the vanilla SCG model to capture multi-view context representations with rotation invariance for improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate that the proposed framework is computationally efficient and robust, producing improved segmentation results for imbalanced classes.
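The class-weighting idea can be illustrated with a simpler static variant: weight each class inversely to its pixel frequency so rare classes contribute more to the loss. The thesis's adaptive scheme updates weights during training; the inverse-frequency version below only sketches the underlying principle.

```python
# Sketch of class-weighted loss for imbalanced segmentation (static
# inverse-frequency weights; the paper's adaptive weighting is richer).

def inverse_frequency_weights(label_counts):
    """label_counts: {class_id: n_pixels} -> normalized {class_id: weight}."""
    total = sum(label_counts.values())
    raw = {c: total / n for c, n in label_counts.items()}  # 1 / frequency
    norm = sum(raw.values())
    return {c: w * len(raw) / norm for c, w in raw.items()}

def weighted_nll(log_probs, labels, weights):
    """Mean class-weighted negative log-likelihood over pixels."""
    losses = [-weights[y] * lp[y] for lp, y in zip(log_probs, labels)]
    return sum(losses) / len(losses)

counts = {0: 9000, 1: 900, 2: 100}        # heavily imbalanced toy counts
w = inverse_frequency_weights(counts)
assert w[2] > w[1] > w[0]                 # rare classes get larger weight
```

With balanced counts the weights reduce to 1.0 per class, recovering the plain unweighted loss.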
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely 'what', 'how', and 'where' to effectively fuse multi-source features and efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
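A gated fusion unit in the generic sense can be sketched as a learned per-feature convex blend of two modality streams. The exact gating used in MultiModNet is not specified in the abstract; the form below, with hypothetical "optical" and "SAR" feature inputs, is a common, generic choice shown only to illustrate the mechanism.

```python
# Sketch of gated fusion: a sigmoid gate decides, per feature, how much of
# each modality to keep. All tensors here are random placeholders.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_a, feat_b, w_gate, b_gate):
    """feat_a, feat_b: (n, d); w_gate: (2*d, d); b_gate: (d,)."""
    gate = sigmoid(np.concatenate([feat_a, feat_b], axis=1) @ w_gate + b_gate)
    return gate * feat_a + (1.0 - gate) * feat_b   # per-feature blend

rng = np.random.default_rng(1)
optical = rng.standard_normal((5, 8))     # hypothetical optical features
sar = rng.standard_normal((5, 8))         # hypothetical SAR features
w = rng.standard_normal((16, 8)) * 0.1
b = np.zeros(8)
fused = gated_fusion(optical, sar, w, b)
```

Because the gate lies in (0, 1), each fused value stays between the two modality values, so a modality that is uninformative at a given location can be smoothly suppressed.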
Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models
Recent advancements in foundation models (FMs), such as GPT-4 and LLaMA, have
attracted significant attention due to their exceptional performance in
zero-shot learning scenarios. Similarly, in the field of visual learning,
models like Grounding DINO and the Segment Anything Model (SAM) have exhibited
remarkable progress in open-set detection and instance segmentation tasks. It
is undeniable that these FMs will profoundly impact a wide range of real-world
visual learning tasks, ushering in a new paradigm shift for developing such
models. In this study, we concentrate on the remote sensing domain, where the
images are notably dissimilar from those in conventional scenarios. We
developed a pipeline that leverages multiple FMs to facilitate remote sensing
image semantic segmentation tasks guided by text prompts, which we denote as
Text2Seg. The pipeline is benchmarked on several widely-used remote sensing
datasets, and we present preliminary results to demonstrate its effectiveness.
Through this work, we aim to provide insights into maximizing the applicability
of visual FMs in specific contexts with minimal model tuning. The code is
available at https://github.com/Douglas2Code/Text2Seg.
Comment: 10 pages, 6 figures
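The pipeline shape the abstract describes, a text prompt driving an open-set detector whose boxes then prompt a promptable segmenter, can be sketched structurally. The two stage functions below are hypothetical stand-ins, not the real Grounding DINO or SAM APIs; the actual model calls live in the Text2Seg repository.

```python
# Structural sketch of a text-guided segmentation pipeline. Both stages are
# dummy stand-ins so the control flow is runnable without any model weights.

def detect_boxes(image, text_prompt):
    """Stand-in for an open-set detector: text prompt -> candidate boxes."""
    # Pretend the prompt matched two regions; a real detector would score
    # regions against the text embedding.
    return [(10, 10, 50, 50), (60, 20, 90, 70)] if text_prompt else []

def segment_from_boxes(image, boxes):
    """Stand-in for a promptable segmenter: boxes -> binary masks."""
    h, w = image["height"], image["width"]
    masks = []
    for (x0, y0, x1, y1) in boxes:
        mask = [[1 if x0 <= x < x1 and y0 <= y < y1 else 0
                 for x in range(w)] for y in range(h)]
        masks.append(mask)          # real SAM masks follow object boundaries
    return masks

def text2seg_pipeline(image, text_prompt):
    boxes = detect_boxes(image, text_prompt)
    return segment_from_boxes(image, boxes)

image = {"height": 100, "width": 100}     # placeholder for a real image
masks = text2seg_pipeline(image, "building")
```

The point of the sketch is the composition: neither foundation model is fine-tuned; the text prompt is translated into spatial prompts that the segmenter already understands.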
Learning Aerial Image Segmentation from Online Maps
This study deals with semantic segmentation of high-resolution (aerial)
images where a semantic class label is assigned to each pixel via supervised
classification as a basis for automatic map generation. Recently, deep
convolutional neural networks (CNNs) have shown impressive performance and have
quickly become the de-facto standard for semantic segmentation, with the added
benefit that task-specific feature design is no longer necessary. However, a
major downside of deep learning methods is that they are extremely data-hungry,
thus aggravating the perennial bottleneck of supervised classification:
obtaining enough annotated training data. On the other hand, it has been observed
that they are rather robust against noise in the training labels. This opens up
the intriguing possibility to avoid annotating huge amounts of training data,
and instead train the classifier from existing legacy data or crowd-sourced
maps which can exhibit high levels of noise. The question addressed in this
paper is: can training with large-scale, publicly available labels replace a
substantial part of the manual labeling effort and still achieve sufficient
performance? Such data will inevitably contain a significant portion of errors,
but in return virtually unlimited quantities of it are available in larger
parts of the world. We adapt a state-of-the-art CNN architecture for semantic
segmentation of buildings and roads in aerial images, and compare its
performance when using different training data sets, ranging from manually
labeled, pixel-accurate ground truth of the same city to automatic training
data derived from OpenStreetMap data from distant locations. Our results
indicate that satisfactory performance can be obtained with significantly
less manual annotation effort, by exploiting noisy large-scale training data.
Comment: Published in IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
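The label-noise setting the paper studies can be simulated directly: corrupt a clean ground-truth mask at a chosen rate, as OSM-derived labels effectively would be, and measure pixel agreement with the clean mask. The snippet below is purely illustrative of the setup, not of the paper's actual noise statistics.

```python
# Simulate noisy training labels: flip each pixel label to a random other
# class with probability noise_rate, then measure agreement.
import random

def corrupt_labels(mask, noise_rate, n_classes, seed=0):
    """mask: flat list of class ids -> same-length list with label noise."""
    rng = random.Random(seed)
    noisy = []
    for label in mask:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(n_classes) if c != label]))
        else:
            noisy.append(label)
    return noisy

def pixel_agreement(a, b):
    """Fraction of pixels on which two label maps agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

clean = [0, 1, 2] * 1000                  # flat toy mask, 3 classes
noisy = corrupt_labels(clean, noise_rate=0.2, n_classes=3)
print(round(pixel_agreement(clean, noisy), 2))   # roughly 0.8
```

In the paper's terms, the question is how far `noise_rate` can rise, and how spatially structured the errors can be, before segmentation quality degrades below what a small clean dataset would give.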