27 research outputs found

    Reversing uncertainty sampling to improve active learning schemes

    Active learning provides promising methods to optimize the cost of manually annotating a dataset. However, practitioners in many areas do not resort to such methods on a large scale because they present technical difficulties and do not provide a guarantee of good performance, especially in skewed distributions with scarcely populated minority classes and an undefined, catch-all majority class, which are very common in human-related phenomena like natural language. In this paper we present a comparison of the simplest active learning technique, pool-based uncertainty sampling, and its opposite, which we call reversed uncertainty sampling. We show that both obtain results comparable to random sampling, arguing for a more insightful approach to active learning.
    Sociedad Argentina de Informática e Investigación Operativa (SADIO)
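    As an illustration of the two strategies compared above, the following is a minimal sketch of pool-based uncertainty sampling and its reversal, assuming a scikit-learn-style classifier with predict_proba; the margin-based uncertainty score and the batch size are illustrative choices, not necessarily the paper's exact setup.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def select_queries(model, X_pool, batch_size=10, reverse=False):
            """Pick the pool instances the model is least (or most) sure about."""
            probs = np.sort(model.predict_proba(X_pool), axis=1)
            margins = probs[:, -1] - probs[:, -2]  # small margin = uncertain
            order = np.argsort(margins)            # most uncertain first
            if reverse:
                order = order[::-1]                # reversed: most certain first
            return order[:batch_size]

        # One active-learning iteration over hypothetical data splits:
        # model = LogisticRegression().fit(X_labeled, y_labeled)
        # to_annotate = select_queries(model, X_pool, reverse=True)
        # ...send X_pool[to_annotate] to the annotators and retrain.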

    Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

    This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, as features for Spanish verb sense disambiguation (VSD). This type of learning technique is called disjoint semi-supervised learning [1]: an unsupervised algorithm is first trained on unlabeled data separately, and then its results (i.e. the word embeddings) are fed to a supervised classifier. Throughout this paper we test two hypotheses: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model's performance when compared to general-domain word embeddings. The performance of a model over the data is measured not only with standard metrics (e.g. accuracy or precision/recall) but also by measuring the model's tendency to overfit the available data, analyzing the learning curve. Measuring this overfitting tendency is important because the amount of available data is small, so we need models that generalize better over the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation.
    Sociedad Argentina de Informática e Investigación Operativa
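    As a sketch of the disjoint semi-supervised pipeline described above: step 1 trains word embeddings on unlabeled text alone (here with gensim's word2vec, gensim >= 4 API), and step 2 feeds them as features to a supervised sense classifier; the toy corpus, window size, and context-averaging scheme are illustrative assumptions.

        import numpy as np
        from gensim.models import Word2Vec
        from sklearn.svm import LinearSVC

        # Step 1: unsupervised. Embeddings are trained on unlabeled sentences only.
        unlabeled = [["el", "banco", "presta", "dinero"],
                     ["se", "sienta", "en", "el", "banco"]]
        emb = Word2Vec(unlabeled, vector_size=50, window=2, min_count=1).wv

        def featurize(context_words):
            """Represent a verb instance as the mean embedding of its context."""
            vecs = [emb[w] for w in context_words if w in emb]
            return np.mean(vecs, axis=0) if vecs else np.zeros(emb.vector_size)

        # Step 2: supervised. The labeled sense data plays no part in step 1.
        # X = np.stack([featurize(ctx) for ctx in labeled_contexts])
        # clf = LinearSVC().fit(X, sense_labels)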

    SuFLexQA: an approach to Question Answering from the lexicon

    We present SuFLexQA, a system for Question Answering that integrates deep linguistic information from verbal lexica into Quepy, a generic framework for translating natural language questions into a query language. We are participating in the QALD-3 contest to assess the main achievements and shortcomings of the system.
    Sociedad Argentina de Informática e Investigación Operativa
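    Rather than guessing Quepy's actual API, here is a generic sketch of the idea the framework implements: match a natural-language question against a template and emit a structured query. The pattern and the SPARQL predicates (rdfs:label, dbo:author) are hypothetical, chosen only to illustrate the translation step.

        import re

        TEMPLATES = [
            (re.compile(r"who (wrote|authored) (?P<work>.+)\?", re.I),
             'SELECT ?person WHERE {{ ?w rdfs:label "{work}" . '
             '?w dbo:author ?person }}'),
        ]

        def question_to_sparql(question):
            """Return a SPARQL query for the first template the question matches."""
            for pattern, query in TEMPLATES:
                match = pattern.match(question)
                if match:
                    return query.format(**match.groupdict())
            return None

        print(question_to_sparql("Who wrote Hopscotch?"))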

    Learning the costs for a string edit distance-based similarity measure for abbreviated language

    We present work in progress on word normalization for user-generated content. The approach is simple and helps reduce the amount of manual annotation characteristic of more classical approaches. First, orthographic variants of a word, mostly abbreviations, are grouped together. From these manually grouped examples, we learn an automated classifier that, given a previously unseen word, determines whether it is an orthographic variant of a known word or an entirely new word. To do that, we calculate the similarity between the unseen word and all known words, and classify the new word as an orthographic variant of its most similar word. The classifier applies a string similarity measure based on the Levenshtein edit distance. To improve the accuracy of this measure, we assign each edit operation an error-based cost. This cost-assignment scheme aims to maximize the distance between similar strings that are variants of different words. This custom similarity measure achieves an accuracy of .68, an important improvement over the .54 obtained by the plain Levenshtein distance.
    Sociedad Argentina de Informática e Investigación Operativa
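    The following is a minimal sketch of a Levenshtein distance with per-operation costs, the kind of weighted measure the abstract describes; the vowel-insertion discount stands in for the learned, error-based costs, since the actual learned values are not given here.

        def weighted_edit_distance(abbrev, word,
                                   ins_cost=None, del_cost=1.0, sub_cost=1.0):
            """Edit distance from `abbrev` to `word` with per-operation costs."""
            if ins_cost is None:
                # Assumption: inserting a vowel is cheap, since abbreviations in
                # user-generated text often drop vowels ("tmb" ~ "tambien").
                ins_cost = lambda c: 0.2 if c in "aeiou" else 1.0
            m, n = len(abbrev), len(word)
            d = [[0.0] * (n + 1) for _ in range(m + 1)]
            for i in range(1, m + 1):
                d[i][0] = d[i - 1][0] + del_cost
            for j in range(1, n + 1):
                d[0][j] = d[0][j - 1] + ins_cost(word[j - 1])
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    same = abbrev[i - 1] == word[j - 1]
                    d[i][j] = min(d[i - 1][j] + del_cost,
                                  d[i][j - 1] + ins_cost(word[j - 1]),
                                  d[i - 1][j - 1] + (0.0 if same else sub_cost))
            return d[m][n]

        # "tmb" ends up much closer to "tambien" (1.6) than under unit
        # costs (4), so the nearest known word wins the classification:
        # nearest = min(known_words, key=lambda w: weighted_edit_distance("tmb", w))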

    Combining semi-supervised and active learning to recognize minority senses in a new corpus

    Paper presented at the 24th International Joint Conference on Artificial Intelligence, Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, July 25-31, 2015.
    In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus from a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus, which is then iteratively enlarged by including both manually annotated examples and automatically labelled examples as training examples for the following iteration. Examples to be annotated are selected in each iteration by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating adaptation of a starting dataset towards the real distribution of a different, unannotated corpus.
    Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
    Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
    Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
    Otras Ciencias de la Computación e Información
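    As a sketch of one iteration of the combined scheme: the bootstrapping side self-labels the examples the model is confident about, while the active learning side sends the uncertain ones to a human annotator; the confidence threshold, batch size, and the `oracle` callable standing in for the annotator are all illustrative assumptions.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def grow_corpus(X_lab, y_lab, X_pool, oracle, conf=0.95, queries=5):
            """One iteration: bootstrap confident self-labels, query uncertain ones."""
            model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
            top = model.predict_proba(X_pool).max(axis=1)
            sure = np.where(top >= conf)[0]                         # bootstrap these
            unsure = np.setdiff1d(np.argsort(top)[:queries], sure)  # annotate these
            X_new = np.vstack([X_lab, X_pool[sure], X_pool[unsure]])
            y_new = np.concatenate([y_lab,
                                    model.predict(X_pool)[sure],          # automatic
                                    [oracle(x) for x in X_pool[unsure]]]) # manual
            keep = np.setdiff1d(np.arange(len(X_pool)),
                                np.concatenate([sure, unsure]))
            return X_new, y_new, X_pool[keep]

        # The uncertain, manually annotated examples are the ones most likely
        # to surface minority senses, counteracting the majority-class bias
        # that pure bootstrapping tends to reinforce.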

    Reversing uncertainty sampling to improve active learning schemes

    Paper presented at the 16º Simposio Argentino de Inteligencia Artificial, 44 Jornadas Argentinas de Informática. Rosario, Argentina, August 31 to September 4, 2015.
    Active learning provides promising methods to optimize the cost of manually annotating a dataset. However, practitioners in many areas do not resort to such methods on a large scale because they present technical difficulties and do not provide a guarantee of good performance, especially in skewed distributions with scarcely populated minority classes and an undefined, catch-all majority class, which are very common in human-related phenomena like natural language. In this paper we present a comparison of the simplest active learning technique, pool-based uncertainty sampling, and its opposite, which we call reversed uncertainty sampling. We show that both obtain results comparable to random sampling, arguing for a more insightful approach to active learning.
    http://44jaiio.sadio.org.ar/asai
    Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
    Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
    Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
    Ciencias de la Computación