Residual Shuffling Convolutional Neural Networks for Deep Semantic Image Segmentation Using Multi-Modal Data
In this paper, we address the deep semantic segmentation of aerial imagery based on multi-modal data. Given multi-modal data composed of true orthophotos and the corresponding Digital Surface Models (DSMs), we extract a variety of hand-crafted radiometric and geometric features, which are provided separately and in different combinations as input to a modern deep learning framework. The latter is represented by a Residual Shuffling Convolutional Neural Network (RSCNN) combining the characteristics of a Residual Network with the advantages of atrous convolution and a shuffling operator to achieve dense semantic labeling. Via performance evaluation on a benchmark dataset, we analyze the value of different feature sets for the semantic segmentation task. The derived results reveal that the use of radiometric features yields better classification results than the use of geometric features for the considered dataset. Furthermore, the consideration of data from both modalities leads to an improvement of the classification results. However, the derived results also indicate that the use of all defined features is less favorable than the use of selected features. Consequently, data representations derived via feature extraction and feature selection techniques still provide a gain if used as the basis for deep semantic segmentation.
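The abstract above names a shuffling operator but does not define it. As a rough illustration only, the sketch below assumes the common periodic-shuffling (depth-to-space) layout, in which r*r low-resolution channels are interleaved into a single map upsampled by a factor r. The function name pixel_shuffle and the single-output-channel restriction are assumptions made for this example, not details taken from the paper.

```python
def pixel_shuffle(channels, r):
    """Interleave r*r low-resolution channels into one map upsampled by r.

    channels: list of r*r feature maps, each an H x W list of lists.
    Returns an (H*r) x (W*r) map. Output pixel (y, x) is read from
    channel (y % r) * r + (x % r) at position (y // r, x // r).
    This channel ordering is an assumed convention.
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            out[y][x] = channels[(y % r) * r + (x % r)][y // r][x // r]
    return out


# Four 1x1 channels become a single 2x2 map.
upsampled = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)
```

The operator only rearranges values that the network has already predicted at low resolution, so the upsampling step itself has no parameters.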
Bootstrapped CNNs for Building Segmentation on RGB-D Aerial Imagery
Detection of buildings and other objects from aerial images has various
applications in urban planning and map making. Automated building detection
from aerial imagery is a challenging task, as it is prone to varying lighting
conditions, shadows and occlusions. Convolutional Neural Networks (CNNs) are
robust against some of these variations, although they fail to distinguish easy
and difficult examples. We train a detection algorithm from RGB-D images to
obtain a segmented mask by using the CNN architecture DenseNet. First, we
improve the performance of the model by applying a statistical re-sampling
technique called Bootstrapping and demonstrate that more informative examples
are retained. Second, the proposed method outperforms the non-bootstrapped
version while using only one-sixth of the original training data, and it
obtains a precision-recall break-even of 95.10% on our aerial imagery dataset.
Comment: Published at ISPRS Annals of the Photogrammetry, Remote Sensing and
Spatial Information Sciences
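The abstract reports a precision-recall break-even point without saying how it was computed. One common way to obtain it, sketched here as an assumption rather than as the authors' evaluation protocol, is to rank pixels by predicted score and mark as positive exactly as many pixels as there are true positives in the ground truth; at that cut-off, precision and recall coincide. The function name and the flat score and label lists are hypothetical.

```python
def break_even_point(scores, labels):
    """Precision (= recall) when the top-P scored items are predicted
    positive, where P is the number of ground-truth positives.

    scores: predicted building probabilities, one per pixel.
    labels: ground-truth labels, 1 for building and 0 for background.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("break-even is undefined without positive labels")
    # Sort by score only, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n_pos])
    return true_positives / n_pos


# Two positives; only one of the two top-scored pixels is a true positive.
bep = break_even_point([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```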
Imbalance Knowledge-Driven Multi-modal Network for Land-Cover Semantic Segmentation Using Images and LiDAR Point Clouds
Despite the good results that have been achieved in unimodal segmentation,
the inherent limitations of individual data increase the difficulty of
achieving breakthroughs in performance. For that reason, multi-modal learning
is increasingly being explored within the field of remote sensing. The present
multi-modal methods usually map high-dimensional features to low-dimensional
spaces as a preprocessing step before feature extraction to address the non-negligible
domain gap, which inevitably leads to information loss. To address this issue,
in this paper we present our novel Imbalance Knowledge-Driven Multi-modal
Network (IKD-Net) to extract features from raw multi-modal heterogeneous data
directly. IKD-Net is capable of mining imbalance information across modalities
while utilizing a strong modality to drive the feature map refinement of the
weaker ones in the global and categorical perspectives by way of two
sophisticated plug-and-play modules: the Global Knowledge-Guided (GKG) and
Class Knowledge-Guided (CKG) gated modules. The whole network is then optimized
using a holistic loss function. While we were developing IKD-Net, we also
established a new dataset called the National Agriculture Imagery Program and
3D Elevation Program Combined dataset in California (N3C-California), which
provides a particular benchmark for multi-modal joint segmentation tasks. In
our experiments, IKD-Net outperformed the benchmarks and state-of-the-art
methods on both the N3C-California and the small-scale ISPRS Vaihingen datasets.
As of this paper's submission, IKD-Net ranked first on the real-time leaderboard
for the GRSS DFC 2018 challenge evaluation.
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning, and to use it as an
implicit general model to tackle unprecedented large-scale influential
challenges, such as climate change and urbanization.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine
Advancing Land Cover Mapping in Remote Sensing with Deep Learning
Automatic mapping of land cover in remote sensing data plays an increasingly significant role in several earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Due to the complexity of the real ground surface and environment, accurate classification of land cover types faces many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges such as how to deal with intricate objects and imbalanced classes in multi-spectral and high-spatial-resolution remote sensing data.
The first work presents a novel model to learn richer multi-scale and global contextual representations in very high-resolution remote sensing images, namely the dense dilated convolutions' merging (DDCM) network. The proposed method is lightweight, flexible and extendable, so that it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Extensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
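To make the term dilated convolution concrete, the following is a minimal one-dimensional sketch. It is not the DDCM module itself, whose merging scheme the abstract does not specify. It only shows how spacing the kernel taps by a dilation factor enlarges the receptive field without adding weights. The function name, the absence of padding, and the cross-correlation form (no kernel flip) are assumptions of this example.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid (unpadded) 1-D cross-correlation with dilated kernel taps.

    Tap j of the kernel reads signal[i + j * dilation], so a kernel of
    length k covers (k - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5]
dense = dilated_conv1d(signal, [1, 1, 1], 1)   # each output sees 3 samples
sparse = dilated_conv1d(signal, [1, 1, 1], 2)  # each output sees 5 samples
```

With the same three weights, a dilation of 2 spans five input samples instead of three, which is how stacked dilated layers can gather wider context at low cost.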
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models.
The third work introduces a new framework, namely the multi-view self-constructing graph (MSCG) network, to extend the vanilla SCG model to be able to capture multi-view context representations with rotation invariance to achieve improved segmentation performance. Meanwhile, a novel adaptive class weighting loss function is developed to alleviate the issue of class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate the proposed framework is computationally efficient and robust to produce improved segmentation results for imbalanced classes.
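The abstract does not give the form of the adaptive class weighting loss. As a stand-in that conveys the general idea of counteracting class imbalance, the sketch below computes fixed weights with a simplified median-frequency balancing scheme: each class is weighted by the median class count divided by its own count, so rare classes contribute more to the loss. This is a different, non-adaptive technique chosen for illustration, and the function name is hypothetical.

```python
from collections import Counter
from statistics import median


def median_frequency_weights(labels):
    """Per-class weights: median class count divided by the class count.

    labels: flat iterable of class ids, for example all training pixels.
    Classes rarer than the median get a weight above 1, and more common
    classes get a weight below 1.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    mid = median(counts.values())
    return {cls: mid / n for cls, n in counts.items()}


# Class 0 dominates and class 2 is rare.
weights = median_frequency_weights([0] * 6 + [1] * 3 + [2] * 1)
```

Weights of this kind would typically multiply the per-pixel cross-entropy term of the corresponding class.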
To address the key challenges in multi-modal land cover mapping of remote sensing data, namely, 'what', 'how' and 'where' to effectively fuse multi-source features and to efficiently learn optimal joint representations of different modalities, the last work presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms the strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
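The internals of the gated fusion unit are not described in the abstract. A generic gating pattern, shown here purely as an assumed illustration, blends two modality features element-wise with a sigmoid gate: a gate near 1 keeps the first modality and a gate near 0 keeps the second. In a real network the gate logits would come from learned layers. Here they are passed in directly, and all names are hypothetical.

```python
import math


def gated_fusion(feat_a, feat_b, gate_logits):
    """Element-wise convex blend of two feature vectors.

    fused[i] = g * feat_a[i] + (1 - g) * feat_b[i], with
    g = sigmoid(gate_logits[i]). All three inputs share one length.
    """
    if not (len(feat_a) == len(feat_b) == len(gate_logits)):
        raise ValueError("inputs must have the same length")
    fused = []
    for a, b, z in zip(feat_a, feat_b, gate_logits):
        g = 1.0 / (1.0 + math.exp(-z))
        fused.append(g * a + (1.0 - g) * b)
    return fused


# A zero logit gives g = 0.5, i.e. a plain average of both modalities.
blended = gated_fusion([2.0], [4.0], [0.0])
```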
SynDrone -- Multi-modal UAV Dataset for Urban Scenarios
The development of computer vision algorithms for Unmanned Aerial Vehicle
(UAV) imagery heavily relies on the availability of annotated high-resolution
aerial data. However, the scarcity of large-scale real datasets with
pixel-level annotations poses a significant challenge to researchers as the
limited number of images in existing datasets hinders the effectiveness of deep
learning models that require a large amount of training data. In this paper, we
propose a multimodal synthetic dataset containing both images and 3D data taken
at multiple flying heights to address these limitations. In addition to
object-level annotations, the provided data also include pixel-level labeling
in 28 classes, enabling exploration of the potential advantages in tasks like
semantic segmentation. In total, our dataset contains 72k labeled samples that
allow for effective training of deep architectures showing promising results in
synthetic-to-real adaptation. The dataset will be made publicly available to
support the development of novel computer vision methods targeting UAV
applications.
Comment: Accepted at ICCV Workshops, downloadable dataset with CC-BY license,
8 pages, 4 figures, 8 tables