Land cover mapping at very high resolution with rotation equivariant CNNs: towards small yet accurate models
In remote sensing images, the absolute orientation of objects is arbitrary.
Depending on an object's orientation and on a sensor's flight path, objects of
the same semantic class can be observed in different orientations in the same
image. Equivariance to rotation, in this context understood as responding with
a rotated semantic label map when subject to a rotation of the input image, is
therefore a very desirable feature, in particular for high capacity models,
such as Convolutional Neural Networks (CNNs). If rotation equivariance is
encoded in the network, the model is confronted with a simpler task and does
not need to learn specific (and redundant) weights to address rotated versions
of the same object class. In this work we propose a CNN architecture called
Rotation Equivariant Vector Field Network (RotEqNet) to encode rotation
equivariance in the network itself. By using rotating convolutions as building
blocks and passing only the values corresponding to the maximally
activating orientation throughout the network in the form of orientation
encoding vector fields, RotEqNet treats rotated versions of the same object
with the same filter bank and therefore achieves state-of-the-art performances
even when using very small architectures trained from scratch. We test RotEqNet
in two challenging sub-decimeter resolution semantic labeling problems, and
show that we can perform better than a standard CNN while requiring an order
of magnitude fewer parameters.
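The rotating-convolution idea described above (apply rotated copies of one filter, then keep only the maximally activating orientation as a vector field) can be sketched as follows. This is an illustrative NumPy mock-up, not the trainable RotEqNet layer itself; the filter, image, and number of orientations are arbitrary:

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import correlate2d

def rotating_conv(image, filt, n_orientations=8):
    """Correlate `image` with rotated copies of a single filter, then keep,
    per pixel, only the maximal response and the angle that produced it,
    i.e. an orientation-encoding vector field (magnitude, angle)."""
    angles = np.arange(n_orientations) * (360.0 / n_orientations)
    responses = np.stack([
        correlate2d(image, rotate(filt, a, reshape=False, order=1), mode="same")
        for a in angles
    ])                                      # (n_orientations, H, W)
    best = responses.argmax(axis=0)         # index of max-activating orientation
    magnitude = responses.max(axis=0)       # strongest response per pixel
    angle = np.deg2rad(angles)[best]        # its orientation, in radians
    return magnitude, angle
```

Because every orientation shares one filter bank, a rotated input produces the same magnitudes with shifted angles, which is the equivariance property the abstract describes.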
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to prominence in numerous areas, including computer vision (CV),
speech recognition, and natural language processing. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.
BARNet: Boundary-Aware Refined Network for Automatic Building Extraction in Very High-Resolution Urban Aerial Images
Convolutional neural networks (CNNs) such as U-Net have shown competitive performance in the automatic extraction of buildings from very high-resolution (VHR) remotely sensed imagery. However, due to unstable multi-scale context aggregation, insufficient combination of multi-level features, and a lack of consideration of semantic boundaries, most existing CNNs produce incomplete segmentations for large-scale buildings and predictions with high uncertainty at building boundaries. This paper presents a novel network with an embedded boundary-aware loss, called the Boundary-Aware Refined Network (BARNet), to address these gaps. The distinctive components of BARNet are the gated-attention refined fusion unit (GARFU), the denser atrous spatial pyramid pooling (DASPP) module, and the boundary-aware (BA) loss. The performance of BARNet is tested on two popular benchmark datasets that include various urban scenes and diverse patterns of buildings. Experimental results demonstrate that the proposed method outperforms several state-of-the-art (SOTA) benchmark approaches in both visual interpretation and quantitative evaluation.
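The abstract names a boundary-aware loss but does not specify its form. A common scheme of this kind, sketched here purely as an illustration (the weighting function and parameters `w0` and `sigma` are assumptions, not BARNet's actual loss), is binary cross-entropy with pixels weighted by their distance to the nearest label boundary:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_weighted_ce(pred_prob, target, w0=5.0, sigma=3.0, eps=1e-7):
    """Binary cross-entropy with extra weight near label boundaries.
    `target` is a binary building mask, `pred_prob` the predicted
    foreground probability per pixel."""
    d_in = distance_transform_edt(target)       # distance inside the mask
    d_out = distance_transform_edt(1 - target)  # distance outside the mask
    dist = np.where(target > 0, d_in, d_out)    # distance to nearest boundary
    weight = 1.0 + w0 * np.exp(-(dist ** 2) / (2 * sigma ** 2))
    ce = -(target * np.log(pred_prob + eps)
           + (1 - target) * np.log(1 - pred_prob + eps))
    return float((weight * ce).mean())
```

Up-weighting boundary pixels penalizes exactly the high-uncertainty building edges the abstract identifies as the failure mode of standard CNNs.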
Learning Aerial Image Segmentation from Online Maps
This study deals with semantic segmentation of high-resolution (aerial)
images where a semantic class label is assigned to each pixel via supervised
classification as a basis for automatic map generation. Recently, deep
convolutional neural networks (CNNs) have shown impressive performance and have
quickly become the de-facto standard for semantic segmentation, with the added
benefit that task-specific feature design is no longer necessary. However, a
major downside of deep learning methods is that they are extremely data-hungry,
thus aggravating the perennial bottleneck of supervised classification:
obtaining enough annotated training data. On the other hand, it has been observed
that they are rather robust against noise in the training labels. This opens up
the intriguing possibility to avoid annotating huge amounts of training data,
and instead train the classifier from existing legacy data or crowd-sourced
maps which can exhibit high levels of noise. The question addressed in this
paper is: can training with large-scale, publicly available labels replace a
substantial part of the manual labeling effort and still achieve sufficient
performance? Such data will inevitably contain a significant portion of errors,
but in return virtually unlimited quantities of it are available in larger
parts of the world. We adapt a state-of-the-art CNN architecture for semantic
segmentation of buildings and roads in aerial images, and compare its
performance when using different training data sets, ranging from manually
labeled, pixel-accurate ground truth of the same city to automatic training
data derived from OpenStreetMap data from distant locations. Our results
indicate that satisfactory performance can be obtained with significantly less
manual annotation effort by exploiting noisy large-scale training data.
Comment: Published in IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING.
A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
data preparation are also covered, including methods for image normalization
and chipping, strategies for addressing data imbalance in training samples,
and techniques for overcoming limited data, such as augmentation, transfer
learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery.
Comment: 145 pages with 32 figures.
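The normalization and chipping steps this review covers can be sketched in a few lines; the chip size, stride, and per-band standardization below are one common convention, chosen here for illustration rather than taken from the review:

```python
import numpy as np

def chip_image(raster, chip=256, stride=128):
    """Standardize each band of a (H, W, bands) raster to zero mean and
    unit variance, then split it into overlapping fixed-size chips
    suitable for feeding a segmentation network."""
    mean = raster.mean(axis=(0, 1), keepdims=True)
    std = raster.std(axis=(0, 1), keepdims=True) + 1e-8
    norm = (raster - mean) / std
    h, w = norm.shape[:2]
    chips = [norm[y:y + chip, x:x + chip]
             for y in range(0, h - chip + 1, stride)
             for x in range(0, w - chip + 1, stride)]
    return np.stack(chips)                  # (n_chips, chip, chip, bands)
```

The overlap between neighbouring chips (stride smaller than chip size) is a standard way to avoid edge artifacts when the per-chip predictions are later mosaicked back together.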
EAGLE: Large-scale Vehicle Detection Dataset in Real-World Scenarios using Aerial Imagery
Multi-class vehicle detection from airborne imagery with orientation
estimation is an important task in the near and remote vision domains with
applications in traffic monitoring and disaster management. In the last decade,
we have witnessed significant progress in object detection in ground imagery,
but it is still in its infancy in airborne imagery, mostly due to the scarcity
of diverse and large-scale datasets. Despite being a useful tool for different
applications, current airborne datasets only partially reflect the challenges
of real-world scenarios. To address this issue, we introduce EAGLE (oriEnted
vehicle detection using Aerial imaGery in real-worLd scEnarios), a large-scale
dataset for multi-class vehicle detection with object orientation information
in aerial imagery. It features high-resolution aerial images composed of
different real-world situations with a wide variety of camera sensors,
resolutions, flight altitudes, weather and illumination conditions, haze,
shadow, times of day, cities, countries, occlusions, and camera angles. The
annotation was done by airborne imagery experts using small- and large-vehicle
classes. EAGLE contains 215,986
instances annotated with oriented bounding boxes defined by four points and
orientation, making it by far the largest dataset to date for this task. It
also supports research on haze and shadow removal as well as super-resolution
and in-painting applications. We define three tasks: detection by (1)
horizontal bounding boxes, (2) rotated bounding boxes, and (3) oriented
bounding boxes. We carried out experiments evaluating several state-of-the-art
object detection methods on our dataset to form a baseline. Experiments show
that the EAGLE dataset accurately reflects real-world situations and
correspondingly challenging applications.
Comment: Accepted at ICPR 2020.
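The oriented-bounding-box annotation described above (four corner points plus an orientation) can be derived from a centre/size/angle parameterization. This small sketch uses an illustrative convention (angle in radians, counter-clockwise about the box centre); EAGLE's exact annotation convention may differ:

```python
import numpy as np

def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corner points of an oriented bounding box given
    its centre (cx, cy), width/height (w, h), and rotation angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])                       # 2-D rotation matrix
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])                # (4, 2) corners
```

Dropping theta (taking the min/max of the corners) recovers the horizontal-box task, and keeping it gives the rotated/oriented-box tasks the dataset defines.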