Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types faces many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges, such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial-resolution remote sensing data.
The first work presents a novel model for learning richer multi-scale and global contextual representations in very high-resolution remote sensing images, namely the dense dilated convolutions' merging (DDCM) network. The proposed method is lightweight, flexible, and extendable, so that it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Extensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
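The core SCG idea, building a latent graph from the node features themselves rather than from a prior knowledge graph, can be illustrated with a minimal numpy sketch. This is an illustration of the general principle only, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_construct_graph(features):
    """Build a latent adjacency matrix from node (pixel) features alone:
    pairwise similarity scores are normalized into edge weights,
    so no prior knowledge graph is required."""
    scores = features @ features.T      # pairwise similarity (N x N)
    return softmax(scores, axis=-1)     # each row sums to 1

def propagate(features, adjacency):
    """One round of message passing: each node aggregates its neighbors."""
    return adjacency @ features

nodes = np.random.rand(16, 8)           # 16 nodes with 8-dim features
A = self_construct_graph(nodes)         # latent graph structure
out = propagate(nodes, A)               # context-enriched node features
```

Because the adjacency is derived from the input itself, such a module can capture long-range pixel dependencies without any hand-designed graph.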
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, which extends the vanilla SCG model to capture multi-view context representations with rotation invariance for improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate that the proposed framework is computationally efficient and robust, producing improved segmentation results for imbalanced classes.
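The class-weighting idea behind such losses can be sketched in a few lines of numpy. This is a generic inverse-frequency weighting, not the paper's exact adaptive formulation:

```python
import numpy as np

def class_weights(labels, num_classes, eps=1.0):
    """Inverse-frequency weights: rare classes get larger weights."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    weights = 1.0 / (counts + eps)      # eps avoids div-by-zero for absent classes
    return weights / weights.sum() * num_classes  # normalize to mean ~1

def weighted_cross_entropy(probs, labels, weights):
    """Per-pixel cross-entropy, scaled by the weight of each pixel's true class."""
    p_true = probs[np.arange(labels.size), labels.ravel()]
    return float(np.mean(weights[labels.ravel()] * -np.log(p_true + 1e-12)))

# toy example: class 1 is rare, so it receives a larger weight
labels = np.array([0, 0, 0, 1])
w = class_weights(labels, num_classes=2)
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
loss = weighted_cross_entropy(probs, labels, w)
```

Up-weighting rare classes this way keeps the gradient signal from being dominated by the majority classes, which is the failure mode the adaptive loss addresses.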
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely 'what', 'how' and 'where' to effectively fuse multi-source features and to efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms the strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery. Comment: 145 pages with 32 figures.
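The normalization and chipping steps mentioned above can be sketched as follows. This is a minimal numpy illustration; the tile size and stride are arbitrary choices for the example, not values prescribed by the review:

```python
import numpy as np

def normalize(img):
    """Per-band standardization to zero mean and unit variance."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std

def chip(img, size=256, stride=128):
    """Cut a large scene into square chips; stride < size gives overlap,
    which increases the number of training samples from one scene."""
    chips = []
    h, w = img.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            chips.append(img[y:y + size, x:x + size])
    return np.stack(chips)

scene = np.random.rand(512, 512, 4)     # e.g. a 4-band aerial tile
chips = chip(normalize(scene), size=256, stride=128)
```

Chipping is needed because full EO scenes are far larger than what fits in GPU memory, while per-band normalization keeps heterogeneous sensors on a comparable scale.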
Imbalance Knowledge-Driven Multi-modal Network for Land-Cover Semantic Segmentation Using Images and LiDAR Point Clouds
Despite the good results that have been achieved in unimodal segmentation,
the inherent limitations of individual data increase the difficulty of
achieving breakthroughs in performance. For that reason, multi-modal learning
is increasingly being explored within the field of remote sensing. Present
multi-modal methods usually map high-dimensional features to low-dimensional
spaces as a preprocessing step before feature extraction to address the
non-negligible domain gap, which inevitably leads to information loss. To address this issue,
in this paper we present our novel Imbalance Knowledge-Driven Multi-modal
Network (IKD-Net) to extract features from raw multi-modal heterogeneous data
directly. IKD-Net is capable of mining imbalance information across modalities
while utilizing a stronger modality to drive the feature-map refinement of the
weaker ones from global and categorical perspectives, by way of two
sophisticated plug-and-play modules: the Global Knowledge-Guided (GKG) and
Class Knowledge-Guided (CKG) gated modules. The whole network is then optimized
using a holistic loss function. While we were developing IKD-Net, we also
established a new dataset called the National Agriculture Imagery Program and
3D Elevation Program Combined dataset in California (N3C-California), which
provides a particular benchmark for multi-modal joint segmentation tasks. In
our experiments, IKD-Net outperformed the benchmarks and state-of-the-art
methods both in the N3C-California and the small-scale ISPRS Vaihingen dataset.
IKD-Net held first place on the real-time leaderboard for the GRSS DFC
2018 challenge evaluation as of this paper's submission.
Dense Dilated Convolutions Merging Network for Land Cover Classification
Land cover classification of remote sensing images is a challenging task due
to limited amounts of annotated data, highly imbalanced classes, frequent
incorrect pixel-level annotations, and an inherent complexity in the semantic
segmentation task. In this article, we propose a novel architecture called the
dense dilated convolutions' merging network (DDCM-Net) to address this task.
The proposed DDCM-Net consists of dense dilated image convolutions merged with
varying dilation rates. This effectively utilizes rich combinations of dilated
convolutions that enlarge the network's receptive fields with fewer parameters
and features compared with the state-of-the-art approaches in the remote
sensing domain. Importantly, DDCM-Net obtains fused local- and global-context
information, in effect incorporating surrounding discriminative capability for
multiscale and complex-shaped objects with similar color and textures in very
high-resolution aerial imagery. We demonstrate the effectiveness, robustness,
and flexibility of the proposed DDCM-Net on the publicly available ISPRS
Potsdam and Vaihingen data sets, as well as the DeepGlobe land cover data set.
Our single model, trained on three-band Potsdam and Vaihingen data sets,
achieves better accuracy in terms of both mean intersection over union (mIoU)
and F1-score compared with other published models trained with more than
three-band data. We further validate our model on the DeepGlobe data set,
achieving a state-of-the-art result of 56.2% mIoU with far fewer parameters and at
a lower computational cost compared with related recent work. Code available at
https://github.com/samleoqh/DDCM-Semantic-Segmentation-PyTorch. Comment: Semantic Segmentation, 12 pages, TGRS-2020 early access in IEEE Transactions on Geoscience and Remote Sensing, 2020.
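The receptive-field benefit of stacking dilated convolutions, which is central to DDCM-Net, follows from simple arithmetic. The sketch below illustrates the general principle only, not the DDCM-Net architecture itself:

```python
def receptive_field(kernel=3, dilations=(1, 2, 4, 8)):
    """Receptive field of a stack of stride-1 convolutions:
    each layer with dilation d adds (kernel - 1) * d pixels."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

def parameters(kernel=3, channels=64, layers=4):
    """Parameter count of the same stack of 3x3 convolutions:
    dilation enlarges the receptive field but adds no parameters."""
    return layers * channels * channels * kernel * kernel

# Four 3x3 layers with dilations (1, 2, 4, 8) see a 31-pixel window,
# versus 9 pixels for four plain 3x3 layers, at identical parameter cost.
wide = receptive_field(dilations=(1, 2, 4, 8))   # 31
plain = receptive_field(dilations=(1, 1, 1, 1))  # 9
```

This is why varying the dilation rates yields large context windows "with fewer parameters and features" than alternatives that add layers or channels.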
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many domains. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning and use it as an
implicit general model to tackle unprecedented, large-scale, influential
challenges, such as climate change and urbanization. Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine.
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years. Comment: This work has been submitted to the IEEE TPAMI for possible publication.
Efficient Semantic Segmentation on Edge Devices
Semantic segmentation is the computer vision task of assigning each pixel of an
image to a class, and it should be performed with both accuracy and efficiency.
Most existing deep FCNs require heavy computation and are very power hungry,
making them unsuitable for real-time applications on portable devices. This project
analyzes current semantic segmentation models to explore the feasibility of
applying these models for emergency response during catastrophic events. We
compare the performance of real-time semantic segmentation models with their
non-real-time counterparts on aerial images under adverse
settings. Furthermore, we train several models on the Flood-Net dataset,
containing UAV images captured after Hurricane Harvey, and benchmark their
execution on special classes such as flooded buildings vs. non-flooded
buildings or flooded roads vs. non-flooded roads. Finally, we developed
a real-time UNet-based model and deployed it on a Jetson AGX Xavier
module.
Multi-Modal Earth Observation and Deep Learning for Urban Scene Understanding
This research explores the nuances of semantic segmentation in remote sensing data through deep learning, with a focus on multi-modal data integration, the impact of label noise, and the need for diverse datasets in Earth Observation (EO). It introduces a novel model named TransFusion, designed to integrate 2D images and 3D point clouds directly, avoiding the complexities common to traditional fusion methods used in the context of semantic segmentation. This approach led to improvements in segmentation accuracy, demonstrated by higher mean Intersection over Union (mIoU) scores on the Vaihingen and Potsdam datasets, indicating the model's capability to better interpret spatial and structural information from multi-modal data.
The study also investigates the effects of label noise, i.e., incorrect annotations in training data, a prevalent issue in remote sensing. Through experiments involving high-resolution aerial images with intentionally inaccurate labels, it was discovered that label noise influences model performance differently across object classes, with the size of an object significantly affecting the model's ability to handle errors. The research highlights that models are somewhat resilient to random noise, although accuracy decreases even with a small proportion of incorrect labels.
Addressing the geographic bias of urban semantic segmentation datasets, which focus primarily on Europe and North America, the research introduces the UAVPal dataset from Bhopal, India. This effort, along with the development of a new dense predictor head for semantic segmentation, aims to better represent diverse urban landscapes globally.
The new segmentation head, which efficiently leverages multi-scale features and notably reduces computational demands, showed improved mIoU scores across various classes and datasets.
Overall, the study contributes to the field of semantic segmentation for EO by improving data fusion methods, offering insights into the effects of label noise, and encouraging the inclusion of diverse geographic data for broader representation. These efforts are steps toward more accurate and efficient remote sensing applications.
SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery
Accurately and timely detecting multiscale small objects that contain tens of
pixels from remote sensing images (RSI) remains challenging. Most of the
existing solutions primarily design complex deep neural networks to learn
strong feature representations for objects separated from the background, which
often results in a heavy computation burden. In this article, we propose an
accurate yet fast object detection method for RSI, named SuperYOLO, which fuses
multimodal data and performs high-resolution (HR) object detection on
multiscale objects by utilizing the assisted super resolution (SR) learning and
considering both the detection accuracy and computation cost. First, we utilize
a symmetric compact multimodal fusion (MF) to extract supplementary information
from various data for improving small object detection in RSI. Furthermore, we
design a simple and flexible SR branch to learn HR feature representations that
can discriminate small objects from vast backgrounds with low-resolution (LR)
input, thus further improving the detection accuracy. Moreover, to avoid
introducing additional computation, the SR branch is discarded in the inference
stage, and the computation of the network model is reduced due to the LR input.
Experimental results show that, on the widely used VEDAI RS dataset, SuperYOLO
achieves an accuracy of 75.09% (in terms of mAP50 ), which is more than 10%
higher than the SOTA large models, such as YOLOv5l, YOLOv5x, and RS designed
YOLOrs. Meanwhile, the parameter size and GFLOPs of SuperYOLO are about 18
and 3.8 times smaller, respectively, than those of YOLOv5x. Our proposed model shows a favorable
accuracy and speed tradeoff compared to the state-of-the-art models. The code
will be open-sourced at https://github.com/icey-zhang/SuperYOLO. Comment: The article is accepted by IEEE Transactions on Geoscience and Remote Sensing.
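The training-only auxiliary-branch pattern behind SuperYOLO's SR head can be sketched schematically. The heads below are toy stand-ins for illustration, not SuperYOLO's actual layers:

```python
import numpy as np

class DetectorWithAuxSR:
    """Detector with a training-only super-resolution branch: the SR head
    contributes a loss during training but is skipped at inference, so
    deployment pays no extra compute for it."""

    def backbone(self, x):
        return x.mean(axis=(0, 1))          # stand-in feature extractor

    def detect_head(self, feats):
        return feats * 2.0                  # stand-in detection output

    def sr_head(self, feats):
        return feats + 1.0                  # stand-in HR reconstruction

    def forward(self, x, training=False):
        feats = self.backbone(x)
        outputs = {"detections": self.detect_head(feats)}
        if training:                        # SR branch runs only in training
            outputs["sr"] = self.sr_head(feats)
        return outputs

model = DetectorWithAuxSR()
train_out = model.forward(np.ones((8, 8, 3)), training=True)
infer_out = model.forward(np.ones((8, 8, 3)), training=False)
```

The SR loss shapes the shared backbone features during training; at inference only the detection path executes, which is how the method keeps low-resolution inputs cheap while retaining the accuracy gains.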
A Comprehensive Review on Computer Vision Analysis of Aerial Data
With the emergence of new technologies in the field of airborne platforms and
imaging sensors, aerial data analysis is becoming very popular, capitalizing on
its advantages over land data. This paper presents a comprehensive review of
the computer vision tasks within the domain of aerial data analysis. While
addressing fundamental aspects such as object detection and tracking, the
primary focus is on pivotal tasks like change detection, object segmentation,
and scene-level analysis. The paper provides a comparison of the various
hyperparameters employed across diverse architectures and tasks. A substantial
section is dedicated to an in-depth discussion on libraries, their
categorization, and their relevance to different domain expertise. The paper
encompasses aerial datasets, the architectural nuances adopted, and the
evaluation metrics associated with all the tasks in aerial data analysis.
Applications of computer vision tasks in aerial data across different domains
are explored, with case studies providing further insights. The paper
thoroughly examines the challenges inherent in aerial data analysis, offering
practical solutions. Additionally, unresolved issues of significance are
identified, paving the way for future research directions in the field of
aerial data analysis. Comment: 112 pages.