15 research outputs found
An uncertainty prediction approach for active learning - application to earth observation
Mapping land cover and land usage dynamics is crucial in remote sensing, since farmers
are encouraged to either intensify or extend crop use due to the ongoing rise in the world’s
population. A major issue in this area is interpreting and classifying a scene captured in
high-resolution satellite imagery. Several methods have been put forth, including neural
networks, which generate data-dependent models (i.e. the model is biased toward the data), and
static rule-based approaches with thresholds, which are limited in terms of diversity (i.e.
the model lacks diversity in terms of rules). However, the problem of building a machine learning
model that, given a large amount of training data, can classify multiple classes over
geographically different Sentinel-2 imagery and outperform existing approaches remains open.
On the other hand, supervised machine learning has evolved into an essential part of many
areas due to the increasing number of labeled datasets. Examples include creating classifiers
for applications that recognize images and voices, anticipate traffic, propose products, act
as a virtual personal assistant and detect online fraud, among many more. Since these
classifiers depend heavily on their training datasets, their performance on unseen
observations is uncertain without human interaction or accurate labels. Researchers have
therefore attempted to evaluate a number of independent models using a statistical distance.
However, the problem of identifying the prediction error, given a train-test split and
classifiers modeled over the train set, from the relation between the train and test sets
remains open.
Moreover, while some training data is essential for supervised machine learning, what
happens if there is insufficient labeled data? After all, assigning labels to unlabeled datasets
is a time-consuming process that may need significant expert human involvement. When
there aren’t enough expert manual labels accessible for the vast amount of openly available
data, active learning becomes crucial. However, given a large amount of training and
unlabeled data, building an active learning model that can reduce the training cost of
the classifier and at the same time assist in labeling new data points remains an open
problem.
From the experimental approaches and findings, the main research contributions, which
concentrate on the issue of optical satellite image scene classification, include: building
labeled Sentinel-2 datasets with surface reflectance values; proposal of machine learning
models for pixel-based image scene classification; proposal of a statistical distance-based
Evidence Function Model (EFM) to detect misclassification by ML models; and proposal of
a generalised sampling approach for active learning that, together with the EFM, provides
a way of determining the most informative examples.
Firstly, using a manually annotated Sentinel-2 dataset, Machine Learning (ML) models
for scene classification were developed and their performance was compared to Sen2Cor,
the reference package from the European Space Agency. The ML model attained a micro-F1
value of 84%, a significant improvement over the corresponding Sen2Cor performance of
59%. Secondly, to quantify the misclassification of the ML models, the Mahalanobis
distance-based EFM was devised. This model achieved, for the labeled Sentinel-2 dataset,
a micro-F1 of 67.89% for misclassification detection. Lastly, EFM was engineered as a
sampling strategy for active learning, leading to an approach that attains the same level
of accuracy as a classifier trained with the full training set while using only 0.02% of
the total training samples.
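The abstract states only that the EFM is based on the Mahalanobis distance; it does not give the model's formulation. As a minimal sketch of the underlying distance alone (not the author's EFM), the following computes how far a pixel lies from the distribution of a set of training pixels. The two-band reflectance values and the function name `mahalanobis` are hypothetical and chosen for illustration.

```python
import numpy as np

def mahalanobis(x, samples):
    """Mahalanobis distance from point x to the distribution of `samples`.

    `samples` has one row per observation and one column per feature.
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    diff = x - mean
    # The pseudo-inverse tolerates a singular covariance matrix.
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

# Hypothetical two-band reflectances for training pixels of one class.
train = np.array([
    [0.10, 0.20],
    [0.12, 0.22],
    [0.11, 0.19],
    [0.13, 0.23],
    [0.09, 0.18],
])

near = mahalanobis(np.array([0.11, 0.20]), train)  # resembles the training data
far = mahalanobis(np.array([0.60, 0.05]), train)   # unlike the training data
print(near, far)
```

A pixel that lies far from the training distribution is one on which a classifier's prediction is less well supported, which is the intuition a distance-based uncertainty measure can exploit.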
With the help of the above-mentioned research contributions, we were able to provide
an open-source Sentinel-2 image scene classification package, which consists of ready-to-use
Python scripts and an ML model that classifies Sentinel-2 L1C images, generating a
20m-resolution RGB image with the six studied classes (Cloud, Cirrus, Shadow, Snow,
Water, and Other) and giving academics a straightforward method for rapidly and effectively
classifying Sentinel-2 scene images. Additionally, an active learning approach that uses, as
sampling strategy, the observed prediction uncertainty given by EFM, will allow labeling
only the most informative points to be used as input to build classifiers.
Dynamic Virtual Machine Migration using Network Aware Topology
Clients of applications communicate with the services hosted in virtual machines (VMs), and many applications have clients all over the world. An application is expected to provide faster access and data transmission to its clients if it is geographically close to them, as some research suggests that geographical distance has an impact on quality of service (QoS) [1,2,3]. To provide faster access and data transfer, applications with clients all over the world should therefore be hosted in a data center that is, on average, geographically close to those clients.
Multi-Language Neural Network Model with Advance Preprocessor for Gender Classification over Social Media
This paper describes approaches for the Author Profiling Shared Task
at PAN 2018. The goal was to classify the gender of a Twitter user solely from their
tweets. The paper explores a simple and efficient multi-language model for gender
classification. The approach consists of tweet preprocessing, text representation
and classification model construction. The model achieved its best results on
the English language, with an accuracy of 72.79%; for the Spanish and Arabic
languages the accuracy was 72.20% and 64.36%, respectively.
Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter
Aggression identification and hate speech detection have become an essential part of
tackling cyberharassment and cyberbullying, and automatic aggression identification can
lead to the interception of such trolling. Following this idea, the vista.ue team
participated in the workshop, which included a shared task on ’Aggression Identification’.
A dataset of 15,000 aggression-annotated Facebook posts and comments written in Hindi (in
both Roman and Devanagari script) and English was made available, and different
classification models were designed. This paper presents a model that outperforms Facebook
FastText (Joulin et al., 2016a) and deep learning models over this dataset. In particular,
the system developed for English, when used to classify Twitter text, outperforms all the
systems submitted to the shared task.
Event extraction and representation: A case study for the Portuguese language
Text information extraction is an important natural language processing (NLP) task, which aims to automatically identify, extract, and represent information from text. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places, and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge-based reasoning and inference processes. In this work, we describe in detail our proposal for event extraction from Portuguese documents. The proposed approach is based on a pipeline of specialized natural language processing tools, namely a part-of-speech tagger, a named entity recognizer, a dependency parser, a semantic role labeler, and a knowledge extraction module. The architecture is language-independent, but its modules are language-dependent and can be built using adequate AI (i.e., rule-based or machine learning) methodologies. The developed system was evaluated with a corpus of Portuguese texts, and the results obtained are presented and analysed. The current limitations and future work are discussed in detail.
Sentinel 2 Image Scene Classification: A Comparison Between Bands and Spectral Indices
Given the continuous increase in the global population, food producers are urged to either intensify the use of cropland or expand farmland, making the mapping of land cover and land usage dynamics vital in the area of remote sensing. In this regard, identifying and classifying a high-resolution satellite imagery scene is a prime challenge. Several approaches have been proposed, using either static rule-based thresholds (with limited diversity) or neural networks (with data-dependent limitations). This paper adopts an inductive approach to build classifiers from spectral reflectances, comparing the usefulness of various spectral indices with that of raw band information. More specifically, it considers Sentinel-2 data for six-class scene classification (Water, Shadow, Cirrus, Cloud, Snow and Other). The experimental results show that using raw bands performs equally well, suggesting that raw band information can be used as a replacement for the spectral indices.
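The abstract does not list which spectral indices were compared. As a hedged illustration of the two kinds of feature, the sketch below derives three widely used normalized-difference indices (NDVI, NDWI and NDSI) from Sentinel-2 band reflectances and places them next to the raw band values. The pixel values and the helper name `normalized_difference` are assumptions made for this example.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both values are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

# Hypothetical surface reflectances for one pixel, keyed by Sentinel-2 band:
# B03 = green, B04 = red, B08 = near infrared, B11 = short-wave infrared.
pixel = {"B03": 0.08, "B04": 0.06, "B08": 0.40, "B11": 0.20}

ndvi = normalized_difference(pixel["B08"], pixel["B04"])  # vegetation index
ndwi = normalized_difference(pixel["B03"], pixel["B08"])  # water index
ndsi = normalized_difference(pixel["B03"], pixel["B11"])  # snow index

# Two alternative feature vectors a classifier could be trained on.
raw_features = [pixel[band] for band in sorted(pixel)]
index_features = [ndvi, ndwi, ndsi]
print(raw_features, index_features)
```

Each index is a fixed function of the raw bands, so a sufficiently flexible classifier trained on the bands has access to the same information, which is consistent with the finding reported above.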
Sentinel-2 Image Scene Classification: A Comparison between Sen2Cor and a Machine Learning Approach
Given the continuous increase in the global population, food producers are urged to either intensify the use of cropland or expand farmland, making the mapping of land cover and land usage dynamics vital in the area of remote sensing. In this regard, identifying and classifying a high-resolution satellite imagery scene is a prime challenge. Several approaches have been proposed, using either static rule-based thresholds (with limited diversity) or neural networks (with data-dependent limitations). This paper adopts the inductive approach to learning from surface reflectances. A manually labeled Sentinel-2 dataset was used to build a Machine Learning (ML) model for scene classification, distinguishing six classes (Water, Shadow, Cirrus, Cloud, Snow, and Other). This model was assessed and further compared to the European Space Agency (ESA) Sen2Cor package. The proposed ML model presents a micro-F1 value of 0.84, a considerable improvement over the corresponding Sen2Cor performance of 0.59. Focusing on the problem of optical satellite image scene classification, the main research contributions of this paper are: (a) an extended manually labeled Sentinel-2 database, adding surface reflectance values to an existing dataset; (b) an ensemble-based and a neural-network-based ML model; (c) an evaluation of model sensitivity, bias, and ability to classify multiple classes over geographically different Sentinel-2 imagery; and finally, (d) the benchmarking of the ML approach against the Sen2Cor package.
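For readers unfamiliar with the metric reported above, micro-F1 pools true positives, false positives and false negatives over all classes before computing F1 = 2TP / (2TP + FP + FN). The following is a minimal sketch with made-up labels, not the paper's evaluation code; the function name `micro_f1` is hypothetical.

```python
def micro_f1(y_true, y_pred, classes):
    """Micro-averaged F1: counts are pooled over all classes first."""
    tp = fp = fn = 0
    for c in classes:
        for truth, pred in zip(y_true, y_pred):
            if pred == c and truth == c:
                tp += 1
            elif pred == c and truth != c:
                fp += 1
            elif pred != c and truth == c:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

classes = ["Water", "Shadow", "Cirrus", "Cloud", "Snow", "Other"]
# Made-up labels for five pixels, three of which are classified correctly.
y_true = ["Cloud", "Water", "Snow", "Other", "Cloud"]
y_pred = ["Cloud", "Water", "Other", "Other", "Cirrus"]

score = micro_f1(y_true, y_pred, classes)
print(score)
```

When every pixel carries exactly one label, as in this example, each error counts once as a false positive and once as a false negative, so micro-F1 coincides with overall accuracy.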
Sentinel-2 Image Scene Classification over Alentejo Region Farmland
Given the wide-ranging farmland area, optical satellite images of farms
are used to develop maps that reflect land dynamics and its behavior over
different time frames, crops, and regions under various environmental conditions.
In this regard, it is essential to identify and remove atmospherically
distorted images to prevent misleading information, since their
presence severely restricts the use of optical satellite images for forecasting
harvest dates, yield estimation, and manufacturing control in agriculture
systems. These atmospheric distortions are frequent due to cloud,
shadow, snow, and water cover over farmland. In this work, we developed
a method to identify distorted images of corn crop farmland situated
in the Alentejo Region of Portugal. The results are compared with
the state-of-the-art (SOTA) Sen2Cor algorithm of the European Space
Agency. Experimental results show that the developed image
scene classifier model outperforms Sen2Cor by 10% in F1-measure.
Abbreviating labelling cost for Sentinel-2 image scene classification through active learning
Over the years, due to the growing availability of labeled datasets, supervised machine learning has become an important part of many problem-solving processes. Active learning gains importance when a large amount of freely available data exists but expert manual labels are lacking. This paper proposes an active learning algorithm for the selective choice of training samples in remote sensing image scene classification. Here, the classifier ranks the unlabeled pixels based on predefined heuristics and automatically selects those that are considered the most valuable for improvement; the expert then manually labels the selected pixels and the process is repeated. The system builds the optimal set of samples from a small and non-optimal training set, achieving a predefined classification accuracy. The experimental findings demonstrate that, with the proposed methodology, only 0.02% of the total training samples are required for Sentinel-2 image scene classification while still reaching the same level of accuracy as the complete training dataset. The advantages of the proposed method are highlighted by a comparison with a state-of-the-art active learning method, entropy sampling.
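The ranking step of such a loop can be sketched with entropy sampling, the baseline named in the abstract rather than the proposed method, whose heuristics are not given here. The sketch assumes a classifier has already produced class probabilities for each unlabeled pixel; the pixel identifiers, the probabilities and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, k):
    """Return the ids of the k unlabeled pixels with the highest entropy."""
    ranked = sorted(pool_probs, key=lambda pid: entropy(pool_probs[pid]),
                    reverse=True)
    return ranked[:k]

# Hypothetical class probabilities predicted for four unlabeled pixels.
pool_probs = {
    "px_a": [0.98, 0.01, 0.01],
    "px_b": [0.34, 0.33, 0.33],
    "px_c": [0.70, 0.20, 0.10],
    "px_d": [0.50, 0.50, 0.00],
}

# These pixels would be handed to the expert for manual labeling.
to_label = select_most_uncertain(pool_probs, k=2)
print(to_label)
```

In a full loop, the newly labeled pixels are added to the training set, the classifier is retrained, and the ranking is repeated until the target accuracy is reached.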
Automated Event Extraction Model for Linked Portuguese Documents
In recent times, machine learning has been booming, and researchers are applying it to almost every conceivable case, including the area of linked documents. This article presents a process for automatic event extraction from Portuguese linked documents, whose accuracy (95.00%) was calculated by manual verification. With the help of an ontological structure, extracted events are mapped to a knowledge graph that represents the named entities and the events associated with each document. Such graphs are accessible through SPARQL queries. In this way, the information in the linked documents can be easily accessed through a question-answering approach.