Urban Land Cover Classification with Missing Data Modalities Using Deep Convolutional Neural Networks
Automatic urban land cover classification is a fundamental problem in remote
sensing, e.g. for environmental monitoring. The problem is highly challenging,
as classes generally have high intra-class and low inter-class variance.
Techniques to improve urban land cover classification performance in remote
sensing include fusion of data from different sensors with different data
modalities. However, such techniques require all modalities to be available to
the classifier in the decision-making process, i.e. at test time, as well as in
training. If a data modality is missing at test time, current state-of-the-art
approaches generally have no procedure for exploiting information from that
modality. This represents a waste of potentially useful
information. As a remedy, we propose a convolutional neural network (CNN)
architecture for urban land cover classification that is able to embed all
available training modalities in a so-called hallucination network. The network
will in effect replace missing data modalities in the test phase, enabling
fusion capabilities even when data modalities are missing in testing. We
demonstrate the method using two datasets consisting of optical and digital
surface model (DSM) images. We simulate missing modalities by assuming that DSM
images are missing during testing. Our method outperforms both standard CNNs
trained only on optical images and an ensemble of two standard CNNs. We
further evaluate the potential of our method to handle situations where only
some DSM images are missing during testing. Overall, we show that we can
clearly exploit training-time information of the missing modality during
testing.
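As a rough illustration of the idea, the sketch below shows one way a hallucination branch could be trained to mimic the DSM branch's features from optical input alone, so that it can stand in for the missing modality at test time. The branch layout, layer sizes and loss are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

def make_branch(in_ch):
    # Tiny convolutional feature extractor; a stand-in for a real CNN backbone.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    )

class HallucinationFusionNet(nn.Module):
    """Hypothetical layout: optical branch + DSM branch (train time only) +
    hallucination branch that learns to reproduce the DSM features from RGB."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.optical = make_branch(3)       # RGB input
        self.dsm = make_branch(1)           # DSM input, available only in training
        self.halluc = make_branch(3)        # mimics DSM features from RGB
        self.classifier = nn.Conv2d(128, num_classes, 1)  # fused features -> logits

    def forward(self, rgb, dsm=None):
        f_opt = self.optical(rgb)
        f_hal = self.halluc(rgb)
        if dsm is not None:
            # Training: fuse the real DSM features and penalize the mismatch
            # between hallucinated and real DSM features.
            f_dsm = self.dsm(dsm)
            halluc_loss = ((f_hal - f_dsm.detach()) ** 2).mean()
        else:
            # Testing: DSM is missing, substitute the hallucinated features.
            f_dsm, halluc_loss = f_hal, None
        logits = self.classifier(torch.cat([f_opt, f_dsm], dim=1))
        return logits, halluc_loss
```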
Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types is facing many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial resolution remote sensing data.
The first work presents a novel model to learn richer multi-scale and global contextual representations in very high-resolution remote sensing images, namely the dense dilated convolutions' merging (DDCM) network. The proposed method is lightweight, flexible and extendable, so that it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Extensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
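The following is a minimal, hypothetical sketch of an SCG-style layer: it builds a latent adjacency matrix directly from downsampled CNN features and propagates node features over that graph. The pooling size, embedding dimension and single graph-convolution step are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfConstructingGraph(nn.Module):
    """Sketch of a self-constructing graph layer (illustrative, not the thesis code)."""
    def __init__(self, in_ch, node_dim, nodes=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(nodes)      # nodes x nodes grid of graph nodes
        self.proj = nn.Conv2d(in_ch, node_dim, 1)    # project features to node embeddings
        self.gcn = nn.Linear(node_dim, node_dim)     # one graph-convolution step

    def forward(self, x):
        z = self.proj(self.pool(x))                            # (B, D, n, n)
        nodes = z.flatten(2).transpose(1, 2)                   # (B, N, D) with N = n*n
        adj = torch.softmax(nodes @ nodes.transpose(1, 2), -1) # latent adjacency, no prior graph
        out = F.relu(self.gcn(adj @ nodes))                    # propagate and transform node features
        return out, adj
```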
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, which extends the vanilla SCG model to capture multi-view context representations with rotation invariance for improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the issue of class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate that the proposed framework is computationally efficient and robust, producing improved segmentation results for imbalanced classes.
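As a simplified illustration of class-imbalance-aware weighting (not the exact adaptive class weighting loss of the thesis), one can reweight the cross-entropy by inverse class frequencies computed from the current batch:

```python
import torch
import torch.nn.functional as F

def adaptive_class_weighted_ce(logits, target, num_classes, ignore_index=-1, eps=1.0):
    """Illustrative class-weighted cross-entropy: weights are recomputed from the
    current batch's class frequencies, so rare classes receive larger weights.
    This is a simplified stand-in for the adaptive class weighting loss."""
    mask = target != ignore_index
    counts = torch.bincount(target[mask], minlength=num_classes).float()
    weights = counts.sum() / (num_classes * (counts + eps))   # inverse-frequency weights
    return F.cross_entropy(logits, target, weight=weights, ignore_index=ignore_index)
```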
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely 'what', 'how' and 'where' to effectively fuse multi-source features and how to efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
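A gated fusion unit can be sketched, under assumptions, as a learned gate that blends two modality feature maps; the concrete MultiModNet module may differ from this illustration.

```python
import torch
import torch.nn as nn

class GatedFusionUnit(nn.Module):
    """Hypothetical sketch: a learned gate decides, per pixel and channel,
    how much to trust each modality's features before merging them."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_a, feat_b):
        g = self.gate(torch.cat([feat_a, feat_b], dim=1))   # gate values in (0, 1)
        return g * feat_a + (1.0 - g) * feat_b              # convex blend of the two modalities
```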
Overcoming Missing and Incomplete Modalities with Generative Adversarial Networks for Building Footprint Segmentation
The integration of information acquired with different modalities, spatial
resolutions and spectral bands has been shown to improve predictive accuracy. Data
fusion is therefore one of the key challenges in remote sensing. Most prior
work on multi-modal fusion assumes that modalities are always
available during inference. This assumption limits the applications of
multi-modal models since in practice the data collection process is likely to
generate data with missing, incomplete or corrupted modalities. In this paper,
we show that Generative Adversarial Networks can be effectively used to
overcome the problems that arise when modalities are missing or incomplete.
Focusing on semantic segmentation of building footprints with missing
modalities, our approach achieves an improvement of about 2% in
Intersection over Union (IoU) compared with the same network relying only on the
available modality.
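Conceptually, the setup can be approximated as follows: a generator synthesizes the missing modality from the available one, and the segmentation network consumes the synthetic input whenever the real modality is absent. The `Generator` layers and the `segment_with_missing_modality` helper are hypothetical placeholders, not the paper's networks.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy generator mapping the available modality (e.g. RGB) to a synthetic
    version of the missing one (e.g. a 1-channel DSM)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),   # synthetic single-channel modality
        )

    def forward(self, rgb):
        return self.net(rgb)

def segment_with_missing_modality(seg_net, generator, rgb, dsm=None):
    """If the second modality is missing at inference, replace it with the
    generator's output before running the two-stream segmentation network."""
    if dsm is None:
        dsm = generator(rgb)
    return seg_net(rgb, dsm)
```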
Dense Dilated Convolutions Merging Network for Land Cover Classification
Land cover classification of remote sensing images is a challenging task due
to limited amounts of annotated data, highly imbalanced classes, frequent
incorrect pixel-level annotations, and an inherent complexity in the semantic
segmentation task. In this article, we propose a novel architecture called the
dense dilated convolutions' merging network (DDCM-Net) to address this task.
The proposed DDCM-Net consists of dense dilated image convolutions merged with
varying dilation rates. This effectively utilizes rich combinations of dilated
convolutions that enlarge the network's receptive fields with fewer parameters
and features compared with the state-of-the-art approaches in the remote
sensing domain. Importantly, DDCM-Net obtains fused local- and global-context
information, in effect incorporating surrounding discriminative capability for
multiscale and complex-shaped objects with similar color and textures in very
high-resolution aerial imagery. We demonstrate the effectiveness, robustness,
and flexibility of the proposed DDCM-Net on the publicly available ISPRS
Potsdam and Vaihingen data sets, as well as the DeepGlobe land cover data set.
Our single model, trained on three-band Potsdam and Vaihingen data sets,
achieves better accuracy in terms of both mean intersection over union (mIoU)
and F1-score compared with other published models trained with more than
three-band data. We further validate our model on the DeepGlobe data set,
achieving a state-of-the-art result of 56.2% mIoU with far fewer parameters and at
a lower computational cost compared with related recent work. Code is available at
https://github.com/samleoqh/DDCM-Semantic-Segmentation-PyTorch.
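A rough sketch of the core idea, a block of densely merged dilated convolutions, might look like the following; the dilation rates, channel widths and normalization are illustrative guesses rather than the published DDCM-Net configuration.

```python
import torch
import torch.nn as nn

class DDCMBlock(nn.Module):
    """Sketch of a dense dilated convolutions merging block: each dilated
    convolution sees the concatenation of the input and all previous outputs,
    and a final 1x1 convolution merges the collected features."""
    def __init__(self, in_ch, growth=24, dilations=(1, 2, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = in_ch
        for d in dilations:
            self.convs.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=d, dilation=d),
                nn.BatchNorm2d(growth), nn.ReLU(inplace=True),
            ))
            ch += growth                       # dense growth of the running feature stack
        self.merge = nn.Conv2d(ch, growth, 1)  # merge everything collected so far

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        return self.merge(torch.cat(feats, dim=1))
```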
More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification
Classification and identification of the materials lying over or beneath the
Earth's surface have long been a fundamental but challenging research topic in
geoscience and remote sensing (RS) and have attracted growing interest owing to
recent advances in deep learning techniques. Although deep networks
have been successfully applied in single-modality-dominated classification
tasks, their performance inevitably hits a bottleneck in complex scenes
that require fine-grained classification, due to limited information
diversity. In this work, we provide a baseline solution to the aforementioned
difficulty by developing a general multimodal deep learning (MDL) framework. In
particular, we also investigate a special case of multi-modality learning (MML)
-- cross-modality learning (CML) that exists widely in RS image classification
applications. By focusing on "what", "where", and "how" to fuse, we show
different fusion strategies as well as how to train deep networks and build the
network architecture. Specifically, five fusion architectures are introduced
and developed, and further unified in our MDL framework. More significantly,
our framework is not limited to pixel-wise classification tasks but is also
applicable to spatial information modeling with convolutional neural networks
(CNNs). To validate the effectiveness and superiority of the MDL framework,
extensive experiments related to the settings of MML and CML are conducted on
two different multimodal RS datasets. Furthermore, the code and datasets will
be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing
to the RS community.
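Two of the fusion choices such a framework unifies, early (input-level) and late (feature-level) fusion, can be illustrated with the minimal sketch below; the backbones, channel sizes and heads are placeholders, not the MDL-RS implementations.

```python
import torch
import torch.nn as nn

def backbone(in_ch, out_ch=64):
    # Placeholder single-layer encoder standing in for a real backbone.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

class EarlyFusion(nn.Module):
    """Stack the modalities at the input and encode them jointly."""
    def __init__(self, ch_a, ch_b, num_classes):
        super().__init__()
        self.encoder = backbone(ch_a + ch_b)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, a, b):
        return self.head(self.encoder(torch.cat([a, b], dim=1)))

class LateFusion(nn.Module):
    """Encode each modality separately and merge features before classification."""
    def __init__(self, ch_a, ch_b, num_classes):
        super().__init__()
        self.enc_a, self.enc_b = backbone(ch_a), backbone(ch_b)
        self.head = nn.Conv2d(128, num_classes, 1)

    def forward(self, a, b):
        return self.head(torch.cat([self.enc_a(a), self.enc_b(b)], dim=1))
```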
Tile2Vec: Unsupervised representation learning for spatially distributed data
Geospatial analysis lacks methods like the word vector representations and
pre-trained networks that significantly boost performance across a wide range
of natural language and computer vision tasks. To fill this gap, we introduce
Tile2Vec, an unsupervised representation learning algorithm that extends the
distributional hypothesis from natural language -- words appearing in similar
contexts tend to have similar meanings -- to spatially distributed data. We
demonstrate empirically that Tile2Vec learns semantically meaningful
representations on three datasets. Our learned representations significantly
improve performance in downstream classification tasks and, similar to word
vectors, visual analogies can be obtained via simple arithmetic in the latent
space.
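The core training signal can be sketched as a triplet loss that pulls embeddings of spatially nearby tiles together and pushes far-away tiles apart; the margin and distance metric below are illustrative assumptions rather than the paper's exact settings.

```python
import torch.nn.functional as F

def tile_triplet_loss(z_anchor, z_neighbor, z_distant, margin=0.5):
    """Triplet loss in the spirit of Tile2Vec: nearby tiles should embed close
    together, distant tiles should embed far apart (illustrative values)."""
    d_near = F.pairwise_distance(z_anchor, z_neighbor)   # distance to the nearby tile
    d_far = F.pairwise_distance(z_anchor, z_distant)     # distance to the distant tile
    return F.relu(d_near - d_far + margin).mean()
```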
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Imbalance Knowledge-Driven Multi-modal Network for Land-Cover Semantic Segmentation Using Images and LiDAR Point Clouds
Despite the good results that have been achieved in unimodal segmentation,
the inherent limitations of individual data sources make further performance
breakthroughs difficult. For that reason, multi-modal learning
is increasingly being explored within the field of remote sensing. Existing
multi-modal methods usually map high-dimensional features to low-dimensional
spaces as a preprocessing step before feature extraction to address the
nonnegligible domain gap, which inevitably leads to information loss. To address this issue,
in this paper we present our novel Imbalance Knowledge-Driven Multi-modal
Network (IKD-Net) to extract features from raw multi-modal heterogeneous data
directly. IKD-Net is capable of mining imbalance information across modalities
while utilizing the stronger modality to drive the feature-map refinement of the
weaker ones from global and categorical perspectives by way of two
sophisticated plug-and-play modules: the Global Knowledge-Guided (GKG) and
Class Knowledge-Guided (CKG) gated modules. The whole network is then optimized
using a holistic loss function. While we were developing IKD-Net, we also
established a new dataset called the National Agriculture Imagery Program and
3D Elevation Program Combined dataset in California (N3C-California), which
provides a particular benchmark for multi-modal joint segmentation tasks. In
our experiments, IKD-Net outperformed the benchmarks and state-of-the-art
methods on both the N3C-California and the small-scale ISPRS Vaihingen datasets.
IKD-Net ranked first on the real-time leaderboard of the GRSS DFC 2018
challenge as of this paper's submission.
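The guidance idea behind the gated modules can be caricatured as the stronger modality producing a gate that refines the weaker modality's features; the sketch below is a conceptual illustration, not the published GKG/CKG design.

```python
import torch
import torch.nn as nn

class KnowledgeGuidedGate(nn.Module):
    """Conceptual sketch: features from the stronger modality produce a gate
    that refines the weaker modality's feature map (residual form)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, strong_feat, weak_feat):
        g = self.gate(strong_feat)          # per-pixel, per-channel guidance in (0, 1)
        return weak_feat + g * weak_feat    # refined weak-modality features
```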