6 research outputs found
Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning for Generating a City-Scale Vehicle Dataset
Vehicle classification is a hot computer vision topic, with studies ranging
from ground-view up to top-view imagery. In remote sensing, the usage of
top-view images allows for understanding city patterns, vehicle concentration,
traffic management, and others. However, there are some difficulties when
aiming for pixel-wise classification: (a) most vehicle classification studies
use object detection methods, and most publicly available datasets are designed
for this task, (b) creating instance segmentation datasets is laborious, and
(c) traditional instance segmentation methods underperform on this task since
the objects are small. Thus, the present research objectives are: (1) propose a
novel semi-supervised iterative learning approach using GIS software, (2)
propose a box-free instance segmentation approach, and (3) provide a city-scale
vehicle dataset. The iterative learning procedure considered: (1) label a small
number of vehicles, (2) train on those samples, (3) use the model to classify
the entire image, (4) convert the image prediction into a polygon shapefile,
(5) correct some areas with errors and include them in the training data, and
(6) repeat until results are satisfactory. To separate instances, we considered
vehicle interior and vehicle borders, and the DL model was a U-Net with an
EfficientNet-B7 backbone. When removing the borders, the vehicle interior
becomes isolated, allowing for unique object identification. To recover the
deleted 1-pixel borders, we proposed a simple method to expand each prediction.
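The six-step iterative procedure above can be sketched as a generic loop. The callables below (`train`, `classify`, `vectorize`, `correct`, `satisfied`) are hypothetical stand-ins for the paper's GIS-assisted steps, not its actual implementation:

```python
def iterative_refinement(samples, train, classify, vectorize, correct,
                         satisfied, max_rounds=10):
    """Semi-supervised iterative labelling loop (all callables are
    hypothetical stand-ins for the paper's GIS-assisted steps)."""
    for _ in range(max_rounds):
        model = train(samples)            # (2) train on current labels
        prediction = classify(model)      # (3) classify the entire image
        polygons = vectorize(prediction)  # (4) raster -> polygon shapefile
        fixes = correct(polygons)         # (5) human corrects error areas
        samples = samples + fixes         # fold corrections into training data
        if satisfied(prediction):         # (6) stop when results are good
            return model, samples
    return model, samples
```

The loop simply formalises the described human-in-the-loop cycle; in practice steps (4) and (5) happen in GIS software rather than in code.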
The results show better pixel-wise metrics when compared to Mask R-CNN (82%
against 67% in IoU). On per-object analysis, the overall accuracy, precision,
and recall were greater than 90%. This pipeline applies to any remote sensing
target, being very efficient for segmentation and generating datasets.
Comment: 38 pages, 10 figures, submitted to journal
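The interior/border trick described in the abstract (erase the 1-pixel borders so vehicle interiors become isolated components, then grow each component back by one pixel) can be sketched in plain Python. This is a minimal illustration with a hypothetical label encoding (0 = background, 1 = vehicle interior, 2 = border), not the paper's actual raster pipeline:

```python
from collections import deque

def separate_instances(mask):
    """Label connected vehicle interiors (mask==1) as distinct instances,
    treating border pixels (mask==2) as separators, then expand each
    instance by one pixel to recover the deleted 1-pixel borders."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and labels[y][x] == 0:
                n += 1                      # new instance found
                q = deque([(y, x)])
                labels[y][x] = n
                while q:                    # 4-connected flood fill
                    cy, cx = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = n
                            q.append((ny, nx))
    # 1-pixel expansion: give each border pixel the label of a neighbour
    grown = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 2:
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] > 0:
                        grown[y][x] = labels[ny][nx]
                        break
    return grown
```

With a real prediction raster one would use `scipy.ndimage.label` and `skimage.segmentation.expand_labels` instead of hand-rolled loops; the logic is the same.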
A CNN-Based Method of Vehicle Detection from Aerial Images Using Hard Example Mining
Recently, deep learning techniques have played a practical role in vehicle detection. While much effort has been spent on applying deep learning to vehicle detection, the effective use of training data has not been thoroughly studied, even though it has great potential for improving training results, especially when training data are sparse. In this paper, we propose using hard example mining (HEM) in the training process of a convolutional neural network (CNN) for vehicle detection in aerial images. We apply HEM to stochastic gradient descent (SGD) by calculating the loss value of each example in a batch and keeping the examples with the largest losses as the most informative training data. We picked 100 out of both 500 and 1000 examples for training in one iteration, and we tested different ratios of positive to negative examples in the training data to evaluate how this balance affects performance. In every case, our method outperformed plain SGD. Experimental results on images from New York showed improved performance over a CNN trained with plain SGD, with our method's F1 score being 0.02 higher.
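The HEM selection described above (score every candidate in a batch, keep the k largest-loss examples, and update only on those) can be sketched as follows. `loss_fn` and `update_fn` are hypothetical stand-ins for the network's per-example loss and SGD step:

```python
def mine_hard_examples(losses, k):
    """Return the indices of the k examples with the largest loss values,
    mimicking the selection of the most informative samples
    (e.g. 100 out of 500 or 1000 candidates) for one iteration."""
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:k]

def hem_training_step(examples, loss_fn, update_fn, k):
    """One HEM iteration: score every candidate, keep the k hardest,
    and run the parameter update only on that subset."""
    losses = [loss_fn(x) for x in examples]
    hard = [examples[i] for i in mine_hard_examples(losses, k)]
    update_fn(hard)  # plain SGD step restricted to the hard subset
    return hard
```

In a deep learning framework the ranking would be a single `topk` over the batch loss vector; the point is only that the gradient step sees the hardest examples.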
Applications of deep learning models for environmental and agricultural monitoring in Brazil
Thesis (doctorate) — Universidade de Brasília, Instituto de Ciências Humanas, Departamento de Geografia, Programa de Pós-Graduação em Geografia, 2022.
Algorithms from the new machine learning field known as Deep Learning have gained popularity recently, showing results superior to traditional models in classification and regression tasks. Their history of use in remote sensing is still short, but they have shown similarly superior results in processes such as land use and land cover classification and change detection. The objective of this thesis was to develop methodologies using these algorithms, focused on monitoring critical targets in Brazil through satellite imagery, in order to find high-precision, high-accuracy models to replace methodologies currently in use. Over the course of its development, three articles were produced evaluating the use of these algorithms for the detection of three distinct targets: (a) burnt areas in the Brazilian Cerrado, (b) deforested areas in the Amazon region, and (c) rice fields in the south of Brazil. Despite the articles' similar objectives, their methodologies were made sufficiently distinct in order to expand the known methodological space, providing a theoretical basis to facilitate and encourage the adoption of these algorithms in a national context. The first article evaluated different sample sizes for classifying burnt areas in Landsat-8 imagery. The second article evaluated the use of binary Landsat time series to detect newly deforested areas between the years 2017, 2018, and 2019. The last article used a continuous time series of Sentinel-1 radar (SAR) imagery to delineate rice fields in Rio Grande do Sul. Similar models were used in all articles, though certain models were exclusive to each publication, producing different results. Overall, the results show that Deep Learning algorithms are not only viable for detecting these targets but also outperform existing methods in the literature, representing a highly efficient alternative for classification and change detection of the targets evaluated.
Deep Learning based Vehicle Detection in Aerial Imagery
The use of airborne platforms equipped with imaging sensors is an essential component of many civil-security applications. Well-known application areas include the detection of prohibited or criminal activities, traffic monitoring, search and rescue, disaster relief, and environmental monitoring. However, due to the large volume of data to be processed and the resulting cognitive overload, analysis of aerial imagery by human analysts alone is not feasible in practice. Automatic image- and video-processing algorithms are therefore typically used to support human analysts. A central task is the reliable detection of relevant objects in the camera's field of view before the given scene can be interpreted. The low ground resolution caused by the large distance between camera and ground makes object detection in aerial imagery a challenging task, further complicated by motion blur, occlusions, and cast shadows. Although a variety of conventional approaches to object detection in aerial imagery exist in the literature, their detection accuracy is limited by the representational power of the hand-crafted features they use.
This work presents a new deep-learning-based approach to object detection in aerial imagery, focusing on the detection of vehicles in images captured from directly overhead. The approach builds on the Faster R-CNN detector, which offers higher detection accuracy than other deep-learning-based detection methods. Since Faster R-CNN, like the other deep-learning-based detectors, was optimised on benchmark datasets, the adaptations required for the properties of aerial imagery, such as the small size of the vehicles to be detected, are first systematically investigated and the resulting problems identified. With regard to real applications, the main problems are the high number of false detections caused by vehicle-like structures and the significantly increased runtime. Two new approaches are proposed to reduce false detections, both aiming to improve the feature representation with additional context information. The first approach refines the spatial context information by combining features from early and deep layers of the underlying CNN architecture, so that fine and coarse structures are better represented. The second approach uses semantic segmentation to increase the semantic information content. Two variants of integrating semantic segmentation into the detection process are realised: using the semantic segmentation results to filter out unlikely detections, and explicitly fusing the CNN architectures for detection and segmentation.
Both the refinement of spatial context information and the integration of semantic context information significantly reduce the number of false detections and thus increase detection accuracy. In particular, the sharp drop in false detections in unlikely image regions, such as on buildings, demonstrates the increased robustness of the learned feature representations. Two alternative strategies are pursued to reduce runtime. The first strategy replaces the CNN architecture used by default for feature extraction with a runtime-optimised CNN architecture that accounts for the properties of aerial imagery, while the second strategy comprises a new module for reducing the search space. With the proposed strategies, the overall runtime, as well as the runtime of each component of the detection method, is significantly reduced. Combining the proposed approaches significantly improves both detection accuracy and runtime compared to the Faster R-CNN baseline. Representative approaches to vehicle detection in aerial imagery from the literature are outperformed quantitatively and qualitatively on several datasets. Furthermore, the generalisability of the proposed approach is demonstrated on unseen images from additional aerial image datasets with differing properties.
Mapping agricultural land in support of opium monitoring in Afghanistan with Convolutional Neural Networks (CNNs).
This work investigates the use of advanced image classification techniques for
improving the accuracy and efficiency in determining agricultural areas from
satellite images. The United Nations Office on Drugs and Crime (UNODC) need
to accurately delineate the potential area under opium cultivation as part of their
opium monitoring programme in Afghanistan. They currently use unsupervised
image classification, but this is unable to separate some areas of agriculture from
natural vegetation and requires time-consuming manual editing. This is a
significant task as each image must be classified and interpreted separately. The
aim of this research is to derive information about annual changes in land-use
related to opium cultivation using convolutional neural networks with Earth
observation data.
Supervised machine learning techniques were investigated for agricultural land
classification using training data from existing manual interpretations. Although
pixel-based machine learning techniques achieved high overall classification
accuracy (89%) they had difficulty separating between agriculture and natural
vegetation at some locations.
Convolutional Neural Networks (CNNs) have achieved ground-breaking
performance in computer vision applications. They use localised image features
and offer transfer learning to overcome the limitations of pixel-based methods.
There are challenges related to training CNNs for land cover classification
because of underlying radiometric and temporal variations in satellite image
datasets. CNNs were optimised with a targeted sampling strategy focused on
areas of known confusion (agricultural boundaries and natural vegetation). The
results showed an improved overall classification accuracy of +6%. Localised
differences in agricultural mapping were identified using a new tool called
'localised intersection over union'. This provides greater insight than commonly
used assessment techniques (overall accuracy and the kappa statistic), which are
not suitable for comparing smaller differences in mapping accuracy.
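The 'localised intersection over union' idea can be sketched as IoU computed per tile rather than over the whole map. The non-overlapping-window formulation below is an assumption, since the abstract does not give the exact definition:

```python
def localised_iou(pred, truth, win):
    """Compute intersection-over-union inside each non-overlapping
    win x win tile of two binary maps, returning a coarse grid that
    exposes local mapping differences a single overall-accuracy
    figure would hide."""
    h, w = len(pred), len(pred[0])
    grid = []
    for y0 in range(0, h, win):
        row = []
        for x0 in range(0, w, win):
            inter = union = 0
            for y in range(y0, min(y0 + win, h)):
                for x in range(x0, min(x0 + win, w)):
                    p, t = pred[y][x], truth[y][x]
                    inter += p and t
                    union += p or t
            row.append(inter / union if union else 1.0)  # empty tile: perfect
        grid.append(row)
    return grid
```

Tiles where both maps are empty are scored 1.0 here; one could equally mark them as undefined, which is one of the choices a full definition of the metric would need to pin down.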
A generalised fully convolutional model (FCN) was developed and evaluated
using six years of data and transfer learning. Image datasets were standardised
across image dates and different sensors (DMC, Landsat, and Sentinel-2),
achieving high classification accuracy (up to 95%) with no additional training.
Further fine-tuning with minimal training data and a targeted training strategy
further increased model performance between years (up to +5%).
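One plausible form of the cross-date, cross-sensor standardisation mentioned above is a per-band z-score; the exact normalisation used in the thesis is not stated here, so this is only an illustrative sketch:

```python
def standardise_bands(image):
    """Per-band z-score standardisation (an assumed, common scheme) to
    make imagery from different dates and sensors comparable before it
    is fed to a generalised model. `image` is a list of 2-D bands."""
    out = []
    for band in image:
        vals = [v for row in band for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        std = var ** 0.5 or 1.0  # guard against constant bands
        out.append([[(v - mean) / std for v in row] for row in band])
    return out
```

After this step every band has zero mean and unit variance, so a model trained on one sensor's value range sees comparable inputs from another's.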
The annual changes in agricultural area from 2010 to 2019 were mapped using
the generalised FCN model in Helmand Province, Afghanistan. This provided
new insight into the expansion of agriculture into marginal areas in response to
counter-narcotic and alternative livelihoods policy. New areas of cultivation were
found to contribute to the expansion of opium cultivation in Helmand Province.
The approach demonstrates the use of FCNs for fully automated land cover
classification. They are fast and efficient, can be used to classify satellite imagery
from different sensors and can be continually refined using transfer learning.
The proposed method overcomes the manual effort associated with mapping
agricultural areas within the opium survey while improving accuracy. These
findings have wider implications for improving land cover classification using
legacy data on scalable cloud-based platforms.
Simms, Daniel M. (Associate)
PhD in Environment and Agrifood