Learning Aerial Image Segmentation from Online Maps
This study deals with semantic segmentation of high-resolution (aerial)
images where a semantic class label is assigned to each pixel via supervised
classification as a basis for automatic map generation. Recently, deep
convolutional neural networks (CNNs) have shown impressive performance and have
quickly become the de-facto standard for semantic segmentation, with the added
benefit that task-specific feature design is no longer necessary. However, a
major downside of deep learning methods is that they are extremely data-hungry,
thus aggravating the perennial bottleneck of supervised classification: obtaining
enough annotated training data. On the other hand, it has been observed
that they are rather robust against noise in the training labels. This opens up
the intriguing possibility to avoid annotating huge amounts of training data,
and instead train the classifier from existing legacy data or crowd-sourced
maps which can exhibit high levels of noise. The question addressed in this
paper is: can training with large-scale, publicly available labels replace a
substantial part of the manual labeling effort and still achieve sufficient
performance? Such data will inevitably contain a significant portion of errors,
but in return virtually unlimited quantities of it are available in larger
parts of the world. We adapt a state-of-the-art CNN architecture for semantic
segmentation of buildings and roads in aerial images, and compare its
performance when using different training data sets, ranging from manually
labeled, pixel-accurate ground truth of the same city to automatic training
data derived from OpenStreetMap data from distant locations. Our results
indicate that satisfying performance can be obtained with significantly less
manual annotation effort, by exploiting noisy large-scale training data.
Comment: Published in IEEE Transactions on Geoscience and Remote Sensing
Learning to Simulate Realistic LiDARs
Simulating realistic sensors is a challenging part in data generation for
autonomous systems, often involving carefully handcrafted sensor design, scene
properties, and physics modeling. To alleviate this, we introduce a pipeline
for data-driven simulation of a realistic LiDAR sensor. We propose a model that
learns a mapping between RGB images and corresponding LiDAR features such as
raydrop or per-point intensities directly from real datasets. We show that our
model can learn to encode realistic effects such as dropped points on
transparent surfaces or high intensity returns on reflective materials. When
applied to naively raycasted point clouds provided by off-the-shelf simulator
software, our model enhances the data by predicting intensities and removing
points based on the scene's appearance to match a real LiDAR sensor. We use our
technique to learn models of two distinct LiDAR sensors and use them to improve
simulated LiDAR data accordingly. Through a sample task of vehicle
segmentation, we show that enhancing simulated point clouds with our technique
improves downstream task performance.
Comment: IROS 2022 paper
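The enhancement step the abstract describes (attaching predicted intensities and removing raydropped points from a naively raycasted cloud) can be sketched as a small post-processing function. The function name, array layout, and the idea of thresholding drops stochastically are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def enhance_point_cloud(points, drop_prob, intensity, seed=0):
    """Post-process a naively raycasted point cloud: attach predicted
    intensities and stochastically remove points flagged as ray drops.

    points:    (N, 3) xyz from the simulator's raycaster
    drop_prob: (N,)   per-point raydrop probability (assumed model output)
    intensity: (N,)   per-point predicted intensity (assumed model output)
    """
    rng = np.random.default_rng(seed)
    # Keep each point with probability (1 - drop_prob).
    keep = rng.random(len(points)) >= drop_prob
    enhanced = np.concatenate([points[keep], intensity[keep, None]], axis=1)
    return enhanced  # (M, 4): x, y, z, intensity

# Toy example: four raycasted points; the model flags points 1 and 3 as drops
# (e.g. rays hitting transparent surfaces) with probability 1.
pts = np.zeros((4, 3))
cloud = enhance_point_cloud(pts,
                            drop_prob=np.array([0.0, 1.0, 0.0, 1.0]),
                            intensity=np.array([0.2, 0.9, 0.4, 0.1]))
```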
A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery.
Comment: 145 pages with 32 figures
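Two of the pre-processing techniques the review covers, normalization and chipping, compose naturally into one routine. A minimal sketch (per-band standardization plus a sliding-window cut; the function name and stride convention are illustrative assumptions):

```python
import numpy as np

def chip_image(image, chip_size, stride):
    """Standardize a scene per band (zero mean, unit variance), then cut it
    into fixed-size chips suitable for training a segmentation network."""
    img = image.astype(np.float32)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8  # avoid divide-by-zero
    img = (img - mean) / std
    h, w, _ = img.shape
    chips = []
    for r in range(0, h - chip_size + 1, stride):
        for c in range(0, w - chip_size + 1, stride):
            chips.append(img[r:r + chip_size, c:c + chip_size])
    return np.stack(chips)

# A 64x64 scene with 4 spectral bands becomes a 2x2 grid of 32x32 chips.
chips = chip_image(np.random.rand(64, 64, 4), chip_size=32, stride=32)
```

With stride equal to chip_size the chips tile the scene without overlap; a smaller stride yields overlapping chips, a common augmentation-adjacent choice when training data is limited.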
Assessing thermal imagery integration into object detection methods on ground-based and air-based collection platforms
Object detection models commonly deployed on uncrewed aerial systems (UAS)
focus on identifying objects in the visible spectrum using Red-Green-Blue (RGB)
imagery. However, there is growing interest in fusing RGB with thermal long
wave infrared (LWIR) images to increase the performance of object detection
machine learning (ML) models. Currently, LWIR ML models have received less
research attention, especially for both ground- and air-based platforms,
leading to a lack of baseline performance metrics evaluating LWIR, RGB and
LWIR-RGB fused object detection models. Therefore, this research contributes
such quantitative metrics to the literature. The results show that the
ground-based blended RGB-LWIR model exhibited superior performance compared to
the RGB or LWIR approaches, achieving a mAP of 98.4%. Additionally, the blended
RGB-LWIR model was also the only object detection model to work in both day and
night conditions, providing superior operational capabilities. This research
additionally contributes a novel labelled training dataset of 12,600 images for
RGB, LWIR, and RGB-LWIR fused imagery, collected from ground-based and
air-based platforms, enabling further multispectral machine-driven object
detection research.
Comment: 18 pages, 12 figures, 2 tables
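The simplest way to realize the RGB-LWIR blending the abstract evaluates is early fusion: stacking a co-registered thermal frame onto the RGB image as a fourth input channel. A hedged sketch (the paper does not specify its fusion mechanism; the function name and per-frame scaling are assumptions):

```python
import numpy as np

def fuse_rgb_lwir(rgb, lwir):
    """Early fusion: append a registered LWIR frame to an RGB image as a
    fourth channel, so a detector sees a single 4-channel input.
    Assumes both frames are co-registered at the same resolution."""
    assert rgb.shape[:2] == lwir.shape[:2], "frames must be co-registered"
    rgb = rgb.astype(np.float32) / 255.0                    # 8-bit RGB -> [0, 1]
    lwir = lwir.astype(np.float32)
    lwir = (lwir - lwir.min()) / (lwir.max() - lwir.min() + 1e-8)  # thermal -> [0, 1]
    return np.dstack([rgb, lwir])                           # (H, W, 4)

# Toy example: a white 2x2 RGB patch fused with a thermal gradient.
rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
lwir = np.array([[290.0, 291.0], [292.0, 293.0]])  # e.g. brightness temperatures
fused = fuse_rgb_lwir(rgb, lwir)
```

Late fusion (running separate RGB and LWIR detectors and merging their detections) is the main alternative; early fusion keeps a single model but requires accurate registration.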
Injecting spatial priors in Earth observation with machine vision
Remote Sensing (RS) imagery with submeter resolution is becoming ubiquitous. Be it from satellites, aerial campaigns or Unmanned Aerial Vehicles, this spatial resolution makes it possible to recognize individual objects and their parts from above. This has driven, during the last few years, considerable interest in the RS community in Computer Vision (CV) methods developed for the automated understanding of natural images. A central element to the success of CV is the use of prior information about the image generation process and the objects these images contain: neighboring pixels are likely to belong to the same object; objects of the same nature tend to look similar independently of their location in the image; certain objects tend to occur in particular geometric configurations; etc. When using RS imagery, additional prior knowledge exists on how the images were formed, since we know roughly the geographical location of the objects (the geospatial prior) and the direction they were observed from (the overhead-view prior). This thesis explores ways of encoding these priors in CV models to improve their performance on RS imagery, with a focus on land-cover and land-use mapping.
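One elementary way to inject the geospatial prior the abstract mentions is to append per-pixel normalized coordinates as extra input channels, letting the model condition on where on Earth a tile comes from. This is an illustrative sketch only, not the thesis's actual encoding; the function name and the linear lat/lon normalization are assumptions:

```python
import numpy as np

def add_geospatial_channels(image, lat_range, lon_range):
    """Append per-pixel normalized latitude/longitude as two extra channels,
    a crude encoding of the geospatial prior for an (H, W, C) tile."""
    h, w, _ = image.shape
    lats = np.linspace(lat_range[0], lat_range[1], h)
    lons = np.linspace(lon_range[0], lon_range[1], w)
    # Normalize against global extents so values are comparable across tiles.
    lat_ch = np.repeat(((lats + 90) / 180)[:, None], w, axis=1)   # (H, W)
    lon_ch = np.repeat(((lons + 180) / 360)[None, :], h, axis=0)  # (H, W)
    return np.dstack([image, lat_ch, lon_ch])                     # (H, W, C+2)

# Hypothetical 4-band tile spanning 0-90 deg N and 0-180 deg E.
tile = np.zeros((2, 3, 4), dtype=np.float32)
out = add_geospatial_channels(tile, lat_range=(0.0, 90.0), lon_range=(0.0, 180.0))
```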