Domain Adaptive Transfer Attack (DATA)-based Segmentation Networks for Building Extraction from Aerial Images
Semantic segmentation models based on convolutional neural networks (CNNs)
have gained much attention in relation to remote sensing and have achieved
remarkable performance for the extraction of buildings from high-resolution
aerial images. However, the issue of limited generalization for unseen images
remains. When there is a domain gap between the training and test datasets,
CNN-based segmentation models trained on the training dataset fail to segment
buildings in the test dataset. In this paper, we propose segmentation networks
based on a domain adaptive transfer attack (DATA) scheme for building
extraction from aerial images. The proposed system combines the domain transfer
and adversarial attack concepts. Based on the DATA scheme, the distribution of
the input images can be shifted to that of the target images while turning
images into adversarial examples against a target network. Defending against
adversarial examples adapted to the target domain can overcome the performance
degradation due to the domain gap and increase the robustness of the
segmentation model. Cross-dataset experiments and an ablation study are
conducted on three different datasets: the Inria aerial image labeling
dataset, the Massachusetts building dataset, and the WHU East Asia dataset.
Compared with the segmentation network without the DATA scheme, the proposed
method improves the overall IoU. Moreover, the proposed method also outperforms
feature adaptation (FA) and output space adaptation (OSA).
Comment: 11 pages, 12 figures
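The abstract reports gains in overall IoU but does not say how the score is aggregated. This sketch assumes one common convention: sum intersections and unions over all test tiles before dividing, which gives a dataset-level IoU for the building class. The alternative would be to average per-tile scores. The function name `overall_iou` and the nested-list mask format are illustrative assumptions, not the authors' code.

```python
from typing import Sequence

# A 2-D binary mask as nested lists: 1 = building, 0 = background.
Mask = Sequence[Sequence[int]]


def overall_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Pooled building IoU: total intersection / total union over all tiles.

    Assumption: 'overall IoU' is aggregated at dataset level, not averaged per tile.
    """
    if len(preds) != len(gts):
        raise ValueError("need exactly one ground-truth mask per prediction")
    inter = 0
    union = 0
    for pred, gt in zip(preds, gts):
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                inter += int(p == 1 and g == 1)  # building in both masks
                union += int(p == 1 or g == 1)   # building in either mask
    # If neither prediction nor ground truth contains buildings, treat as perfect.
    return inter / union if union else 1.0


if __name__ == "__main__":
    tile_a_pred = [[1, 1, 0], [0, 1, 0]]
    tile_a_gt = [[1, 0, 0], [0, 1, 1]]
    tile_b_pred = [[1]]
    tile_b_gt = [[1]]
    print(overall_iou([tile_a_pred, tile_b_pred], [tile_a_gt, tile_b_gt]))  # 0.6
```

For the two toy tiles, the pooled IoU is 3/5 = 0.6. Averaging the per-tile IoUs (0.5 and 1.0) would instead give 0.75, so the aggregation rule matters when comparing reported numbers.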
Learning Aerial Image Segmentation from Online Maps
This study deals with semantic segmentation of high-resolution (aerial)
images where a semantic class label is assigned to each pixel via supervised
classification as a basis for automatic map generation. Recently, deep
convolutional neural networks (CNNs) have shown impressive performance and have
quickly become the de facto standard for semantic segmentation, with the added
benefit that task-specific feature design is no longer necessary. However, a
major downside of deep learning methods is that they are extremely data-hungry,
thus aggravating the perennial bottleneck of supervised classification, namely
obtaining enough annotated training data. On the other hand, it has been observed
that they are rather robust against noise in the training labels. This opens up
the intriguing possibility to avoid annotating huge amounts of training data,
and instead train the classifier from existing legacy data or crowd-sourced
maps which can exhibit high levels of noise. The question addressed in this
paper is: can training with large-scale, publicly available labels replace a
substantial part of the manual labeling effort and still achieve sufficient
performance? Such data will inevitably contain a significant portion of errors,
but in return virtually unlimited quantities of it are available for large
parts of the world. We adapt a state-of-the-art CNN architecture for semantic
segmentation of buildings and roads in aerial images, and compare its
performance when using different training data sets, ranging from manually
labeled, pixel-accurate ground truth of the same city to automatic training
data derived from OpenStreetMap data from distant locations. Our results
indicate that satisfactory performance can be obtained with significantly less
manual annotation effort by exploiting noisy large-scale training data.
Comment: Published in IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
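OpenStreetMap stores roads as centerline polylines rather than as areas, so pixel-wise road labels require some rasterization step. The abstract does not say how this was done. This sketch assumes the simplest option: mark every pixel whose center lies within a fixed half-width of the centerline. The names `rasterize_road` and `point_segment_distance`, and the constant half-width, are hypothetical. A fixed width ignores how road widths vary in reality, which makes it one plausible source of the label noise the abstract refers to.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a point
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rasterize_road(centerline: Sequence[Point], half_width: float,
                   height: int, width: int) -> List[List[int]]:
    """Label (1) every pixel whose center lies within half_width of the polyline."""
    mask = [[0] * width for _ in range(height)]
    segments = list(zip(centerline, centerline[1:]))
    for row in range(height):
        for col in range(width):
            center = (col + 0.5, row + 0.5)
            if any(point_segment_distance(center, a, b) <= half_width
                   for a, b in segments):
                mask[row][col] = 1
    return mask


if __name__ == "__main__":
    # A horizontal road through the middle of a 5 x 5 tile, buffered by 1 pixel.
    for mask_row in rasterize_road([(0.0, 2.5), (5.0, 2.5)], 1.0, 5, 5):
        print(mask_row)
```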
Deep Learning Methods for 3D Aerial and Satellite Data
Recent advances in digital electronics have led to an overabundance of observations from electro-optical (EO) imaging sensors spanning high spatial, spectral and temporal resolution. This unprecedented volume, variety, and velocity is overwhelming our capacity to manage and translate that data into actionable information. Although decades of image processing research have taken the human out of the loop for many important tasks, the human analyst is still an irreplaceable link in the image exploitation chain, especially for more complex tasks requiring contextual understanding, memory, discernment, and learning. If knowledge discovery is to keep pace with the growing availability of data, new processing paradigms are needed in order to automate the analysis of earth observation imagery and ease the burden of manual interpretation.
To address this gap, this dissertation advances fundamental and applied research in deep learning for aerial and satellite imagery. We show how deep learning, a computational model inspired by the human brain, can be used for (1) tracking, (2) classifying, and (3) modeling from a variety of data sources including full-motion video (FMV), Light Detection and Ranging (LiDAR), and stereo photogrammetry. First, we assess the ability of a bio-inspired tracking method to track small targets in aerial videos. The tracker uses three kinds of saliency maps: appearance, location, and motion. Our approach achieves the best overall performance and is the only method capable of handling long-term occlusions.
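The abstract names the three saliency cues (appearance, location, motion) but not how they are combined. As an illustration only, this sketch assumes multiplicative fusion. Each map is min-max normalized to [0, 1], the maps are multiplied pixel-wise, and the peak of the fused map is taken as the target position. The function names and the product rule are hypothetical and do not describe the dissertation's tracker.

```python
from typing import List, Sequence, Tuple

SaliencyMap = Sequence[Sequence[float]]


def normalize(saliency: SaliencyMap) -> List[List[float]]:
    """Min-max normalize to [0, 1]; a constant map becomes all ones (neutral in a product)."""
    lo = min(min(row) for row in saliency)
    hi = max(max(row) for row in saliency)
    span = hi - lo
    if span == 0:
        return [[1.0 for _ in row] for row in saliency]
    return [[(v - lo) / span for v in row] for row in saliency]


def fuse_and_locate(appearance: SaliencyMap, location: SaliencyMap,
                    motion: SaliencyMap) -> Tuple[int, int]:
    """Return (row, col) of the peak of the pixel-wise product of the normalized maps."""
    maps = [normalize(m) for m in (appearance, location, motion)]
    best_score, best_pos = -1.0, (0, 0)
    for r, row in enumerate(maps[0]):
        for c in range(len(row)):
            score = maps[0][r][c] * maps[1][r][c] * maps[2][r][c]
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


if __name__ == "__main__":
    appearance = [[0.2, 0.9, 0.1], [0.3, 0.8, 0.4]]
    location = [[0.5, 0.4, 0.1], [0.6, 0.9, 0.2]]
    motion = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(fuse_and_locate(appearance, location, motion))  # (1, 1)
```

In the example, the appearance map alone peaks at (0, 1). The motion map suppresses that static look-alike, so the fused peak moves to (1, 1). A product rule behaves like a logical AND across cues. A weighted sum would be more forgiving when one cue drops out, for example during an occlusion.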
Second, we evaluate the classification accuracy of a multi-scale fully convolutional network that labels individual points in LiDAR data. Our method uses only the 3D coordinates and the corresponding low-dimensional spectral features of each point. Evaluated on the ISPRS 3D Semantic Labeling Contest, our method placed second with an overall accuracy of 81.6%. Finally, we validate the prediction capability of our neighborhood-aware network for modeling the bare-earth surface from LiDAR and stereo photogrammetry point clouds. The network bypasses the traditionally used ground classification and seamlessly integrates neighborhood features with point-wise and global features to predict a per-point Digital Terrain Model (DTM). We compare our results with two widely used software packages for DTM extraction, ENVI and LAStools. Together, these efforts have the potential to alleviate the manual burden associated with some of the most challenging and time-consuming geospatial processing tasks, with implications for improving our response to issues of global security, emergency management, and disaster response.
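Overall accuracy, the metric behind the reported 81.6%, is the fraction of points whose predicted class matches the reference class. Equivalently, it is the trace of the confusion matrix divided by the total number of points. Here is a minimal sketch; the function names and the class labels are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Counter:
    """Count (reference label, predicted label) pairs; equal pairs form the diagonal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def overall_accuracy(counts: Counter) -> float:
    """Trace of the confusion matrix divided by the number of labelled points."""
    total = sum(counts.values())
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return correct / total if total else 0.0


if __name__ == "__main__":
    reference = ["roof", "roof", "tree", "facade", "tree"]
    predicted = ["roof", "tree", "tree", "facade", "tree"]
    print(overall_accuracy(confusion_counts(reference, predicted)))  # 0.8
```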
Aerial LaneNet: Lane Marking Semantic Segmentation in Aerial Imagery using Wavelet-Enhanced Cost-sensitive Symmetric Fully Convolutional Neural Networks
Knowledge of the placement and appearance of lane markings is a prerequisite
for creating high-precision maps, which are necessary for autonomous driving,
infrastructure monitoring, lane-wise traffic management, and urban planning.
Lane markings are among the important components of such maps because they
convey the rules of the road to drivers. While humans learn these rules, an
autonomous vehicle has to be taught them in order to localize itself.
Therefore, accurate and reliable semantic segmentation of lane markings in
imagery of roads and highways is needed. We introduce an aerial lane marking
dataset based on airborne imagery, which can capture a large area in a short
period of time. In this work, we propose
a Symmetric Fully Convolutional Neural Network enhanced by the wavelet
transform to automatically segment lane markings in aerial imagery. Because
lane marking pixels are heavily outnumbered by background pixels, we use a
customized loss function as well as a new type of data augmentation step. We
achieve very high accuracy in pixel-wise localization of lane markings without
using third-party information. The aerial lane marking dataset used in our
experiments is the first high-quality dataset of its kind and contains a broad
range of situations and classes of lane markings representative of current
transportation systems. This dataset will be publicly available and can
therefore serve as a benchmark dataset for future algorithms in this domain.
Comment: IEEE TGRS 2018 - Accepted
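The abstract mentions a customized, cost-sensitive loss for the strong imbalance between lane-marking and background pixels but does not give its form. One common choice, shown here as an assumption rather than the paper's loss, is binary cross-entropy with per-class weights set by inverse class frequency. That way each class contributes the same total weight. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_frequency_weights(labels: Sequence[int]) -> Tuple[float, float]:
    """Return (w_background, w_marking) so each class gets the same total weight."""
    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return 1.0, 1.0  # only one class present: fall back to unweighted
    return n / (2 * n_neg), n / (2 * n_pos)


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 w_neg: float = 1.0, w_pos: float = 1.0, eps: float = 1e-7) -> float:
    """Mean class-weighted binary cross-entropy over pixels (probs = P(lane marking))."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= w_pos * math.log(p)
        else:
            total -= w_neg * math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    labels = [0] * 9 + [1]   # one lane-marking pixel among ten
    probs = [0.01] * 10      # a model that predicts "background" everywhere
    w_neg, w_pos = inverse_frequency_weights(labels)
    print(weighted_bce(probs, labels), weighted_bce(probs, labels, w_neg, w_pos))
```

A trivial model that predicts background everywhere is penalized far more under the weighted loss than under plain cross-entropy. That penalty is what discourages the network from ignoring the rare class.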
An Approach to Semantically Segmenting Building Components and Outdoor Scenes Based on Multichannel Aerial Imagery Datasets
As-is building modeling plays an important role in energy audits and retrofits. However, to understand the source(s) of energy loss, researchers must know the semantic information of the buildings and outdoor scenes. Thermal information can potentially be used to distinguish objects that have similar surface colors but are composed of different materials. To utilize both the red-green-blue (RGB) color model and thermal information for the semantic segmentation of buildings and outdoor scenes, we deployed and adapted various pioneering deep convolutional neural network (DCNN) tools that combine RGB information with thermal information to improve the semantic and instance segmentation processes. When both types of information are available, the resulting DCNN models allow us to achieve better segmentation performance. In three case studies, we evaluated the proposed DCNN framework on datasets of building components and outdoor scenes to determine whether segmentation performance improved. We observed that fusing RGB and thermal information can help the segmentation task in specific cases, but in other cases it can make the neural networks hard to train or degrade their prediction performance. Additionally, different algorithms perform differently in semantic and instance segmentation.
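The simplest way to feed both RGB and thermal information to a DCNN is early fusion. The co-registered thermal band is stacked as a fourth input channel, so the first convolution sees RGB-T pixels. The abstract does not specify which fusion strategies the deployed tools use, so this is only an assumed illustration. The name `early_fuse` and the temperature-range parameters `t_min` and `t_max` are hypothetical.

```python
from typing import List, Sequence, Tuple

RGBImage = Sequence[Sequence[Tuple[int, int, int]]]  # 8-bit RGB pixels
ThermalImage = Sequence[Sequence[float]]              # e.g. temperatures on the same grid


def early_fuse(rgb: RGBImage, thermal: ThermalImage,
               t_min: float, t_max: float) -> List[List[List[float]]]:
    """Stack RGB (scaled to [0, 1]) and thermal (min-max scaled, clipped) into RGB-T pixels."""
    if len(rgb) != len(thermal) or any(len(a) != len(b) for a, b in zip(rgb, thermal)):
        raise ValueError("RGB and thermal images must be co-registered and equally sized")
    if t_max <= t_min:
        raise ValueError("t_max must be greater than t_min")
    span = t_max - t_min
    fused: List[List[List[float]]] = []
    for rgb_row, t_row in zip(rgb, thermal):
        fused.append([
            [c / 255.0 for c in px] + [min(max((t - t_min) / span, 0.0), 1.0)]
            for px, t in zip(rgb_row, t_row)
        ])
    return fused


if __name__ == "__main__":
    rgb = [[(255, 0, 0), (0, 255, 0)]]
    thermal = [[20.0, 30.0]]
    print(early_fuse(rgb, thermal, t_min=20.0, t_max=40.0))
```

A design caveat, offered as a possibility rather than a finding of the paper: if the thermal band is misregistered or poorly scaled relative to RGB, the extra channel adds noise instead of signal. This is one way fusion could make training harder.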
Deep Learning based Vehicle Detection in Aerial Imagery
The use of airborne platforms equipped with imaging sensors is an essential component of many applications in the field of civil security. Well-known application areas include, among others, the detection of prohibited or criminal activities, traffic monitoring, search and rescue, disaster relief, and environmental monitoring. However, because of the large amount of data to be processed and the resulting cognitive overload, analyzing aerial imagery solely with human analysts is not feasible in practice. Automatic image and video processing algorithms are therefore typically used to support human analysts. A central task is the reliable detection of relevant objects in the camera's field of view before the scene can be interpreted. The low ground resolution caused by the large distance between camera and ground makes object detection in aerial imagery a challenging task, further complicated by motion blur, occlusions, and cast shadows. Although the literature contains a large number of conventional approaches for detecting objects in aerial imagery, their detection accuracy is limited by the representational capacity of the hand-crafted features they use.
This thesis presents a new deep-learning-based approach for detecting objects in aerial imagery. The work focuses on detecting vehicles in aerial images captured vertically from above. The developed approach is based on the Faster R-CNN detector, which achieves higher detection accuracy than other deep-learning-based detection methods. Because Faster R-CNN, like the other deep-learning-based detection methods, was optimized on benchmark datasets, the first step systematically investigates the adaptations needed for the characteristics of aerial imagery, such as the small size of the vehicles to be detected, and identifies the resulting problems. For real-world applications, the main problems are the high number of false detections caused by vehicle-like structures and the considerably increased runtime. Two new approaches are proposed to reduce false detections. Both aim to improve the feature representation with additional context information. The first approach refines the spatial context information by combining features from early and deep layers of the underlying CNN architecture, so that both fine and coarse structures are represented better. The second approach uses semantic segmentation to increase the semantic information content. For this purpose, two variants of integrating semantic segmentation into the detection pipeline are implemented: first, using the semantic segmentation results to filter out implausible detections, and second, explicitly fusing the CNN architectures for detection and segmentation.
Both the refinement of spatial context information and the integration of semantic context information considerably reduce the number of false detections and thus increase detection accuracy. In particular, the sharp drop in false detections in implausible image regions, such as on buildings, shows the increased robustness of the learned feature representations. Two alternative strategies are pursued to reduce runtime. The first replaces the CNN architecture used by default for feature extraction with a runtime-optimized CNN architecture that takes the characteristics of aerial imagery into account, while the second introduces a new module that reduces the search space. With the proposed strategies, both the overall runtime and the runtime of each component of the detection pipeline are considerably reduced. By combining the proposed approaches, both detection accuracy and runtime can be significantly improved compared with the Faster R-CNN baseline. Representative vehicle detection approaches from the literature are outperformed quantitatively and qualitatively on several datasets. Furthermore, the generalizability of the developed approach is demonstrated on unseen images from additional aerial image datasets with differing characteristics.
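The first of the two semantic-segmentation variants described above filters out detections that fall in implausible regions, such as on buildings. This is a minimal sketch of that idea under stated assumptions. Boxes are axis-aligned in pixel coordinates, the segmentation network supplies a per-pixel class map, and a hypothetical threshold `max_fraction` limits the share of implausible pixels allowed inside a box. The function name and the threshold are illustrative, not taken from the thesis.

```python
from typing import Collection, List, Sequence, Tuple

# (x0, y0, x1, y1) with half-open pixel ranges [x0, x1) and [y0, y1).
Box = Tuple[int, int, int, int]


def filter_detections(
    boxes: Sequence[Box],
    class_map: Sequence[Sequence[str]],
    implausible: Collection[str],
    max_fraction: float = 0.5,
) -> List[Box]:
    """Keep boxes whose share of implausible-class pixels is at most max_fraction."""
    height = len(class_map)
    width = len(class_map[0]) if height else 0
    kept: List[Box] = []
    for box in boxes:
        # Clip the box to the image so partially outside boxes are handled.
        x0, y0 = max(box[0], 0), max(box[1], 0)
        x1, y1 = min(box[2], width), min(box[3], height)
        if x1 <= x0 or y1 <= y0:
            continue  # empty after clipping: nothing to judge, drop it
        area = (x1 - x0) * (y1 - y0)
        bad = sum(
            1
            for row in range(y0, y1)
            for col in range(x0, x1)
            if class_map[row][col] in implausible
        )
        if bad / area <= max_fraction:
            kept.append(box)
    return kept


if __name__ == "__main__":
    seg = [
        ["building", "building", "road", "road"],
        ["building", "building", "road", "road"],
        ["road", "road", "road", "road"],
        ["road", "road", "road", "road"],
    ]
    dets = [(0, 0, 2, 2), (2, 2, 4, 4), (1, 1, 3, 3)]
    print(filter_detections(dets, seg, {"building"}))  # [(2, 2, 4, 4), (1, 1, 3, 3)]
```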
- …