The Role of Synthetic Data in Improving Supervised Learning Methods: The Case of Land Use/Land Cover Classification
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management
In remote sensing, Land Use/Land Cover (LULC) maps constitute important assets for
various applications, promoting environmental sustainability and good resource management.
However, their production continues to be a challenging task. Various factors
contribute to the difficulty of generating accurate, timely updated LULC maps,
whether via automatic or photo-interpreted LULC mapping. Data preprocessing, being a
crucial step for any Machine Learning task, is particularly important in the remote sensing
domain due to the overwhelming amount of raw, unlabeled data continuously gathered
from multiple remote sensing missions. However, a significant part of the state of the art
focuses on scenarios with full access to labeled training data with relatively balanced class
distributions. This thesis focuses on the challenges found in automatic LULC classification
tasks, specifically in data preprocessing tasks. We focus on the development of novel
Active Learning (AL) and imbalanced learning techniques, to improve ML performance in
situations with limited training data and/or the existence of rare classes. We also show
that many of the contributions presented are successful not only in remote sensing problems
but also in various other multidisciplinary classification problems. The work presented
in this thesis used open access datasets to test the contributions made in imbalanced
learning and AL. All the data pulling, preprocessing and experiments are made available at
https://github.com/joaopfonseca/publications. The algorithmic implementations are made
available in the Python package ml-research at https://github.com/joaopfonseca/ml-research
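As a rough illustration of the pool-based Active Learning setting the thesis targets, the loop below queries labels for the least-confident predictions; scikit-learn, the synthetic dataset, and the query budget are illustrative assumptions, not the thesis's actual experiments:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# start with a small labeled seed; the rest forms the unlabeled pool
labeled = list(rng.choice(len(X), size=20, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(10):  # ten querying rounds, one label request each
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    # least-confidence sampling: query the pool point whose top class
    # probability is lowest, i.e. where the model is most uncertain
    query = pool[int(np.argmin(proba.max(axis=1)))]
    labeled.append(query)   # an oracle would supply y[query] here
    pool.remove(query)

print(len(labeled), len(pool))  # 30 470
```

In a real LULC workflow the oracle would be a photo-interpreter, and the query criterion is exactly where AL strategies differ.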
Slum image detection and localization using transfer learning: a case study in Northern Morocco
Developing countries face social and economic challenges, including the emergence and proliferation of slums. Slum detection and localization methods typically rely on regular topographic surveys, on visual identification of high-resolution satellite images, or on socio-environmental surveys from land surveys and general population censuses. However, these approaches demand considerable time and effort. To overcome these problems, this paper applies transfer learning to seven well-known pretrained models (MobileNets, InceptionV3, NASNetMobile, Xception, VGG16, EfficientNet, and ResNet50) on a small dataset of medium-resolution satellite imagery. The top three pretrained models achieve accuracies of 98.78%, 97.9%, and 97.56%, respectively. Moreover, MobileNets have the smallest memory size (9.1 MB) and the shortest latency (17.01 s), and can be deployed as needed. The results show the good performance of the top three pretrained models for detecting and localizing slum housing in northern Morocco.
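The feature-extraction flavor of transfer learning used in the paper can be sketched without downloading a large CNN: a small network trained on a "source" split stands in for the pretrained backbone, and its frozen hidden layer feeds a new classifier head. The digits dataset and all names here are assumptions for self-containment, not the paper's setup:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_src, y_src = X[:1000], y[:1000]   # "source" task: pretrain the backbone
X_tgt, y_tgt = X[1000:], y[1000:]   # "target" task: reuse frozen features

backbone = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                         random_state=0).fit(X_src, y_src)

def features(A):
    # frozen forward pass through the backbone's ReLU hidden layer
    return np.maximum(A @ backbone.coefs_[0] + backbone.intercepts_[0], 0)

# train only the lightweight head on the target data
head = LogisticRegression(max_iter=1000).fit(features(X_tgt), y_tgt)
print(round(head.score(features(X_tgt), y_tgt), 2))
```

With ImageNet-pretrained CNNs the pattern is the same: freeze the convolutional base, replace and retrain only the classification head.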
An uncertainty prediction approach for active learning - application to earth observation
Mapping land cover and land use dynamics is crucial in remote sensing, since farmers
are encouraged to either intensify or extend crop use due to the ongoing rise in the world’s
population. A major issue in this area is interpreting and classifying a scene captured in
high-resolution satellite imagery. Several methods have been put forth, including neural
networks, which generate data-dependent models (i.e. the model is biased toward the data), and
static rule-based approaches with thresholds, which are limited in terms of diversity (i.e.
the model lacks diversity in terms of rules). However, the problem of having a machine learning
model that, given a large amount of training data, can classify multiple classes over
Sentinel-2 imagery of different geographic areas while outperforming existing approaches remains open.
On the other hand, supervised machine learning has evolved into an essential part of many
areas due to the increasing number of labeled datasets. Examples include creating classifiers
for applications that recognize images and voices, anticipate traffic, propose products, act
as a virtual personal assistant, and detect online fraud, among many more. Since these
classifiers are highly dependent on their training datasets, without human interaction or
accurate labels the performance of these classifiers on unseen observations
is uncertain. Thus, researchers have attempted to evaluate a number of independent models
using a statistical distance. However, the problem of, given a train-test split and classifiers
modeled over the train set, identifying a prediction error using the relation between train
and test sets remains open.
Moreover, while some training data is essential for supervised machine learning, what
happens if there is insufficient labeled data? After all, assigning labels to unlabeled datasets
is a time-consuming process that may need significant expert human involvement. When
there aren’t enough expert manual labels accessible for the vast amount of openly available
data, active learning becomes crucial. However, given large amounts of training and
unlabeled data, having an active learning model that can reduce the training cost of
the classifier and at the same time assist in labeling new data points remains an open
problem.
From the experimental approaches and findings, the main research contributions, which
concentrate on the issue of optical satellite image scene classification include: building
labeled Sentinel-2 datasets with surface reflectance values; proposal of machine learning
models for pixel-based image scene classification; proposal of a statistical distance based
Evidence Function Model (EFM) to detect ML models misclassification; and proposal of
a generalised sampling approach for active learning that, together with the EFM enables
a way of determining the most informative examples.
Firstly, using a manually annotated Sentinel-2 dataset, Machine Learning (ML) models
for scene classification were developed and their performance was compared to Sen2Cor,
the reference package from the European Space Agency: a micro-F1 value of 84%
was attained by the ML model, a significant improvement over the corresponding
Sen2Cor performance of 59%. Secondly, to quantify the misclassification of the ML models,
the Mahalanobis distance-based EFM was devised. This model achieved, for the labeled
Sentinel-2 dataset, a micro-F1 of 67.89% for misclassification detection. Lastly, EFM was
engineered as a sampling strategy for active learning leading to an approach that attains
the same level of accuracy with only 0.02% of the total training samples when compared
to a classifier trained with the full training set.
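The idea behind a Mahalanobis-distance evidence score can be shown with a toy sketch: measure how far a test point lies from the training distribution of its predicted class, and treat large distances as weak evidence (a likely misclassification). This is a simplified stand-in for intuition only, not the thesis's EFM implementation:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.RandomState(0)
# training pixels of one class, e.g. 3 spectral bands
train = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
mu = train.mean(axis=0)
VI = np.linalg.inv(np.cov(train, rowvar=False))  # inverse covariance

def distance_score(x):
    """Mahalanobis distance to the class's training distribution."""
    return mahalanobis(x, mu, VI)

inlier = rng.normal(0.0, 1.0, size=3)      # looks like the training data
outlier = np.array([6.0, 6.0, 6.0])        # far from the training data
# the far point gets a larger distance, i.e. weaker evidence
print(distance_score(outlier) > distance_score(inlier))  # True
```

Ranking unlabeled points by such a score is also what makes it reusable as an active learning sampling strategy.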
With the help of the above-mentioned research contributions, we were able to provide
an open-source Sentinel-2 image scene classification package which consists of ready-to-use
Python scripts and a ML model that classifies Sentinel-2 L1C images, generating a
20m-resolution RGB image with the six studied classes (Cloud, Cirrus, Shadow, Snow,
Water, and Other) and giving academics a straightforward method for rapidly and effectively
classifying Sentinel-2 scene images. Additionally, an active learning approach that uses, as
a sampling strategy, the observed prediction uncertainty given by the EFM allows labeling
only the most informative points to be used as input to build classifiers.
Joint Energy-based Model for Remote Sensing Image Processing
The peta-scale, continuously increasing amount of publicly available remote sensing information forms an unprecedented archive of Earth observation data. Although advances in deep learning provide tools to exploit large amounts of digital information, most supervised methods rely on accurately annotated sets to train models. Access to large amounts of high-quality annotations proves costly due to the human labor involved. Such limitations have been studied in semi-supervised learning, where unlabeled samples aid the generalization of models trained with limited amounts of labeled data. The Joint Energy-based Model (JEM) is a recent, physics-inspired approach that simultaneously optimizes a supervised task along with a generative process, training a sampler that approximates the data distribution. Despite the promise of this formulation, current JEM implementations are predominantly applied to classification tasks; their potential to improve semantic segmentation remains untapped.
Our work investigates JEM training behavior from a conceptual perspective, studying the mechanisms of loss function divergence that numerically destabilize model optimization. We explore three regularization terms imposed on energy values and optimization gradients to alleviate the training complexity. Our experiments indicate that the proposed regularization mitigates loss function divergences for remote sensing imagery classification, with regularization on the energy values of real samples performing best.
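In the JEM view, a classifier's logits define an energy E(x) = -logsumexp(f(x)), and the best-performing regularizer above penalizes the magnitude of that energy on real samples. A small NumPy sketch of these two quantities; the logits, the true-class index, and alpha are made-up illustrative values:

```python
import numpy as np

def logsumexp(v):
    # numerically stable log-sum-exp
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def energy(logits):
    # JEM energy of an input, derived from the classifier's logits
    return -logsumexp(logits)

logits = np.array([2.0, -1.0, 0.5])       # classifier outputs for one sample
alpha = 0.1                                # regularization strength (assumed)
ce_loss = -logits[0] + logsumexp(logits)   # cross-entropy, true class = 0
reg = alpha * energy(logits) ** 2          # penalty on the energy magnitude
total = ce_loss + reg
print(total > ce_loss)  # True: the regularizer adds a non-negative penalty
```

Keeping energies of real samples near zero bounds the generative term of the joint loss, which is what counteracts the divergences discussed above.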
Additionally, we present an extended definition of JEM for image segmentation, sJEM. In our experiments, the generation branch did not perform as expected: sJEM was unable to generate realistic remote-sensing-like samples, and correspondingly the performance of the sJEM segmentation branch is biased. Initial model optimization runs show that additional research is needed to stabilize the methodology given the spatial auto-correlations in remote sensing multi-spectral imagery. Our insights pave the way for the design of follow-up research to advance sJEM for Earth observation.
Scalable computing for earth observation - Application on Sea Ice analysis
In recent years, Deep Learning (DL) networks have shown considerable improvements and have become a preferred methodology in many different applications. These networks have outperformed other classical techniques, particularly in large data settings. In satellite-based Earth observation, for example, DL algorithms have demonstrated the ability to accurately learn complicated nonlinear relationships in input data, contributing to advances in the field. However, the training process of these networks has heavy computational overheads. The reason is two-fold: the sizable complexity of these networks and the high number of training samples needed to learn all the parameters comprising these architectures. Although the quantity of training data generally enhances the accuracy of the trained models, the computational cost may restrict the amount of analysis that can be done. This issue is particularly critical in satellite remote sensing, where a myriad of satellites generate an enormous amount of data daily, and acquiring in-situ ground truth for building a large training dataset is a fundamental prerequisite.
This dissertation considers various aspects of deep learning based sea ice monitoring from SAR data. In this application, labeling data is very costly and time-consuming. Also, in some cases, it is not even achievable due to challenges in establishing the required domain knowledge, specifically when it comes to monitoring Arctic Sea ice with Synthetic Aperture Radar (SAR), which is the application domain of this thesis. Because the Arctic is remote, has long dark seasons, and has a very dynamic weather system, the collection of reliable in-situ data is very demanding. In addition to the challenges of interpreting SAR data of sea ice, this issue makes SAR-based sea ice analysis with DL networks a complicated process.
We propose novel DL methods to cope with the problem of scarce training data and to address the computational cost of the training process. We analyze DL network capabilities based on self-designed architectures and learning strategies, such as transfer learning, for sea ice classification. We also address the scarcity of training data by proposing a novel deep semi-supervised learning method based on SAR data that incorporates unlabeled data information into the training process. Finally, a new distributed DL method that can be used in a semi-supervised manner is proposed to address the computational complexity of deep neural network training.
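The semi-supervised ingredient above, folding confident pseudo-labels for unlabeled data back into training, can be illustrated generically with scikit-learn's self-training wrapper; the synthetic data and threshold are assumptions, and the thesis's method is a SAR-specific deep network, not this sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=400, random_state=0)
y_semi = y.copy()
y_semi[50:] = -1   # pretend only 50 labels exist; -1 marks "unlabeled"

# iteratively pseudo-label unlabeled points the model is >90% sure about
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000),
                               threshold=0.9)
model.fit(X, y_semi)
print(round(model.score(X, y), 2))
```

The same mechanism scales to deep networks by thresholding softmax confidences between training epochs.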
Deep Learning Based Classification Techniques for Hyperspectral Images in Real Time
Remote sensing can be defined as the acquisition of information from a
given scene without coming into physical contact with it, through the use of sensors, mainly located on aerial
platforms, which capture information in different ranges of the electromagnetic spectrum. The objective of this
thesis is the development of efficient schemes, based on the use of deep learning neural networks, for the
classification of remotely sensed multi- and hyperspectral land cover images. Efficient schemes are those that are
capable of obtaining good results in terms of classification accuracy and that can be computed in a reasonable
amount of time depending on the task performed. Regarding computational platforms, multicore architectures and
Graphics Processing Units (GPUs) are considered.
Artificial Intelligence Tools for Facial Expression Analysis.
Inner emotions show visibly upon the human face and are understood as a basic guide to an individual’s inner world. It is, therefore, possible to determine a person’s attitudes, and the effects of others’ behaviour on their deeper feelings, by examining facial expressions. In real-world applications, machines that interact with people need strong facial expression recognition. This recognition holds advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU activation is a set of local, individual facial muscle movements that occur in unison, constituting a natural facial expression event. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting features (static and dynamic), both nominal hand-crafted and deep learning representations, from each static image of a video. This confirmed the superior ability of pretrained models, which yielded a leap in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases using supervised and unsupervised methods on dynamic sequences. During these processes, the importance of stacking dynamic features on top of static ones was discovered when encoding deep features to learn temporal information by combining the spatial and temporal schemes simultaneously. This study also found that fusing spatial and temporal features gives more long-term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable the learning of invariant information from dynamic textures.
Recently, fresh cutting-edge developments have been achieved by approaches based on Generative Adversarial Networks (GANs). In the second section of this thesis, we propose a model based on an unsupervised DCGAN for facial feature extraction and classification, to achieve the following: the creation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the seven static classes of emotion in the wild. Thorough cross-database experimentation demonstrates that this approach can improve generalization results. Additionally, we showed that the features learnt by the DCGAN process are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.
Detecting Lithium (Li) Mineralizations from Space: Current Research and Future Perspectives
Optical and thermal remote sensing data have been an important tool in geological exploration for certain deposit types. However, present economic and technological advances demand the adaptation of remote sensing data and image processing techniques to the exploration of other raw materials, such as lithium (Li). A bibliometric analysis, using a systematic review approach, was made to understand the recent interest in the application of remote sensing methods to Li exploration. A review of the application studies and developments in this field was also made. Throughout the paper, the addressed topics include: (i) achievements made in Li exploration using remote sensing methods; (ii) the main weaknesses of the approaches; (iii) how to overcome these difficulties; and (iv) the expected research perspectives. We expect that the number of studies concerning this topic will increase in the near future and that remote sensing will become an integrated and fundamental tool in Li exploration.
Development and Applications of Machine Learning Methods for Hyperspectral Data
Hyperspectral remote sensing of the Earth relies on data from passive optical sensors mounted on platforms such as satellites and unmanned aerial vehicles. Hyperspectral data comprise information for identifying materials and for monitoring environmental variables such as soil texture, soil moisture, chlorophyll a, and land cover. Data analysis methods are required to extract information from hyperspectral data. A powerful tool in the analysis of hyperspectral data is machine learning, a subfield of artificial intelligence. Machine learning methods can resolve nonlinear correlations and scale with growing data volumes. Every dataset and every machine learning method brings new challenges that require innovative solutions. The goal of this thesis is the development and application of machine learning methods for hyperspectral remote sensing data. The thesis presents studies addressing three major challenges: (I) datasets containing only a few data points with associated target values, (II) the limited potential of non-deep machine learning methods on hyperspectral data, and (III) differences between the distributions of the training and test datasets.
The studies on challenge (I) lead to the development and release of a framework of self-organizing maps (SOMs) for unsupervised, supervised, and semi-supervised learning. The SOM is applied to a hyperspectral dataset for (semi-)supervised regression of soil moisture and outperforms a standard machine learning method. The SOM framework shows reasonable performance in (semi-)supervised land cover classification and offers additional visualization capabilities that improve the understanding of the underlying dataset. In the studies addressing challenge (II), three innovative one-dimensional convolutional neural network (CNN) architectures are developed. The CNNs are applied to a freely available hyperspectral dataset for soil texture classification, and their performance is compared with two existing CNN approaches and a random forest. The two most important findings can be summarized as follows: first, the CNN approaches perform significantly better than the applied non-deep random forest approach; second, adding information about hyperspectral band numbers to the input layer of a CNN improves per-class performance. The studies on challenge (III) are based on a dataset collected in five different measurement areas in Peru in 2019. The differences between the areas are analyzed with qualitative methods and with unsupervised machine learning methods such as principal component analysis and autoencoders. Based on the results, supervised regression of soil moisture is performed for different combinations of measurement areas. In addition, the dataset is augmented with Monte Carlo methods to study the effects of dataset distribution shift on the regression.
The applied SOM regressor is relatively robust to the noise of the soil moisture sensor and performs well on small datasets, while the applied random forest performs best on the full dataset. The distribution shift makes this regression task difficult; some combinations of measurement areas form a considerably more meaningful training dataset than others. Overall, the presented studies addressing the three major challenges show promising results. Finally, the thesis gives directions for how the developed machine learning methods can be further improved in future research.
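At the core of a SOM framework like the one described above is a simple competitive-learning rule: find each sample's best-matching unit and pull that node, and its grid neighbors, toward the sample. Below is a minimal 1-D SOM in NumPy; the grid size, decay schedules, and random data are illustrative assumptions, not the framework's implementation:

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.rand(200, 3)          # e.g. 3-band reflectance samples in [0, 1]
n_nodes = 10
W = rng.rand(n_nodes, 3)         # codebook vectors on a 1-D grid

for t in range(2000):
    x = data[rng.randint(len(data))]
    bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))  # best-matching unit
    lr = 0.5 * np.exp(-t / 1000)                      # decaying learning rate
    sigma = 3.0 * np.exp(-t / 1000)                   # shrinking neighborhood
    d = np.abs(np.arange(n_nodes) - bmu)              # grid distance to BMU
    h = np.exp(-(d ** 2) / (2 * sigma ** 2))          # neighborhood weights
    W += lr * h[:, None] * (x - W)                    # pull nodes toward x

# quantization error: mean distance from samples to their nearest node
qe = float(np.mean([np.min(np.linalg.norm(W - x, axis=1)) for x in data]))
print(W.shape)  # (10, 3)
```

The neighborhood coupling is what gives the trained grid its topology-preserving layout, and hence the visualization capabilities mentioned above.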
Spectral-Spatial Neural Networks and Probabilistic Graph Models for Hyperspectral Image Classification
Pixel-wise hyperspectral image (HSI) classification has been actively studied since it shares similar characteristics with related computer vision tasks, including image classification, object detection, and semantic segmentation, but also possesses inherent differences. The research surrounding HSI classification sheds light on an approach to bridge computer vision and remote sensing. Modern deep neural networks dominate and repeatedly set new records in all image recognition challenges, largely due to their excellence in extracting discriminative features through multi-layer nonlinear transformation. However, three challenges hinder the direct adoption of convolutional neural networks (CNNs) for HSI classification. First, typical HSIs contain hundreds of spectral channels that encode abundant pixel-wise spectral information, leading to the curse of dimensionality. Second, HSIs usually have relatively small numbers of annotated pixels for training along with large numbers of unlabeled pixels, resulting in the problem of generalization. Third, the scarcity of annotations and the complexity of HSI data induce noisy classification maps, which are a common issue in various types of remotely sensed data interpretation.
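The first challenge, hundreds of highly correlated spectral channels, is commonly met with dimensionality reduction before classification; the PCA sketch below uses a synthetic pixel matrix whose latent-factor construction is an assumption made only to keep the example self-contained:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
latent = rng.normal(size=(1000, 5))          # 5 underlying spectral factors
mixing = rng.normal(size=(5, 200))
# 1000 pixels x 200 correlated bands, plus a little sensor-like noise
pixels = latent @ mixing + 0.01 * rng.normal(size=(1000, 200))

pca = PCA(n_components=10).fit(pixels)
reduced = pca.transform(pixels)
print(reduced.shape)                          # (1000, 10)
# the 5 latent factors dominate the variance of all 200 bands
print(pca.explained_variance_ratio_[:5].sum() > 0.95)  # True
```

Spectral convolution layers, discussed next, are an alternative that learns such a compression jointly with the classifier instead of fixing it beforehand.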
Recent studies show that taking data attributes into account in the design of the fundamental components of deep neural networks can improve their representational capacity and thus help these models achieve better recognition performance. To the best of our knowledge, no research has exploited this finding or proposed corresponding models for supervised HSI classification given enough labeled HSI data. In cases of limited labeled HSI samples for training, conditional random fields (CRFs) are an effective graph model for imposing data-agnostic constraints upon the intermediate outputs of trained discriminators. Although CRFs have been widely used to enhance HSI classification performance, the integration of deep learning and probabilistic graph models in the framework of semi-supervised learning remains an open question.
To this end, this thesis presents supervised spectral-spatial residual networks (SSRNs) and semi-supervised generative adversarial network (GAN)-based models that account for the characteristics of HSIs and make three main contributions. First, spectral and spatial convolution layers are introduced to learn representative HSI features for supervised learning models. Second, GANs composed of spectral/spatial convolution and transposed-convolution layers are proposed to take advantage of adversarial training using limited amounts of labeled data for semi-supervised learning. Third, fully-connected CRFs are adopted to impose smoothness constraints on the predictions of the trained discriminators of the GANs to enhance HSI classification performance. Empirical evidence acquired by experimental comparison to state-of-the-art models validates the effectiveness and generalizability of the SSRN, SS-GAN, and GAN-CRF models.