Search CORE

33 research outputs found

SuFLexQA: an approach to Question Answering from the lexicon

Authors: Alonso i Alemany Laura; Cardellino Cristian
Publication date: 12/06/2019

We present SuFLexQA, a Question Answering system that integrates deep linguistic information from verbal lexica into Quepy, a generic framework for translating natural language questions into a query language. We are participating in the QALD-3 contest to assess the main achievements and shortcomings of the system.
Sociedad Argentina de Informática e Investigación Operativa

Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

Authors: Alonso i Alemany Laura; Cardellino Cristian
Publication date: 01/09/2017

This work explores the use of word embeddings (also known as word vectors) trained on Spanish corpora as features for Spanish verb sense disambiguation (VSD). This type of learning technique is called disjoint semi-supervised learning [1]: as a first step, an unsupervised algorithm is trained separately on unlabeled data, and its results (i.e., the word embeddings) are then fed to a supervised classifier. Throughout this paper we test two hypotheses: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, compared with more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model's performance compared with general-domain word embeddings. A model's performance is measured not only with standard metrics (e.g., accuracy or precision/recall) but also by analyzing its learning curve to assess its tendency to overfit the available data. Measuring this tendency is important because only a small amount of data is available, so we need models that generalize better on the VSD problem. For the task we use SenSem [2], a corpus and lexicon of disambiguated Spanish and Catalan verbs, as our base resource for experimentation.
Sociedad Argentina de Informática e Investigación Operativa

Servicio de Difusión de la Creación Intelectual
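The two-step "disjoint" scheme described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the embeddings were already trained on unlabeled text (the toy `EMBEDDINGS` table, the window size, the example sentences and the `sense_1`/`sense_2` labels are all hypothetical), builds each instance's features by averaging the embeddings of the words around the target verb, and feeds them to a nearest-centroid classifier that merely stands in for whatever supervised model the paper uses.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# (tokens, index of the target verb, sense label)
Example = Tuple[List[str], int, str]

# Step 1 (done beforehand, unsupervised): embeddings trained on unlabeled text.
# This tiny table is a hypothetical stand-in for real pre-trained vectors.
EMBEDDINGS: Dict[str, Vector] = {
    "banco": [0.9, 0.1],
    "dinero": [0.8, 0.2],
    "río": [0.1, 0.9],
    "agua": [0.2, 0.8],
}
DIM = 2

# Hypothetical labeled instances of the verb "sacar"; the sense labels are invented.
TRAINING_EXAMPLES: List[Example] = [
    (["sacar", "dinero", "banco"], 0, "sense_1"),
    (["sacar", "agua", "río"], 0, "sense_2"),
]


def context_features(tokens: List[str], verb_index: int, window: int = 2) -> Vector:
    """Average the embeddings of known words within `window` tokens of the verb."""
    start = max(0, verb_index - window)
    context = tokens[start:verb_index] + tokens[verb_index + 1:verb_index + 1 + window]
    vectors = [EMBEDDINGS[w] for w in context if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known context words: fall back to a zero vector
    return [sum(values) / len(vectors) for values in zip(*vectors)]


# Step 2 (supervised): a nearest-centroid classifier over the embedding features,
# used here only as a simple stand-in for the paper's supervised models.
def train_centroids(examples: List[Example]) -> Dict[str, Vector]:
    grouped: Dict[str, List[Vector]] = {}
    for tokens, verb_index, sense in examples:
        grouped.setdefault(sense, []).append(context_features(tokens, verb_index))
    return {
        sense: [sum(values) / len(vectors) for values in zip(*vectors)]
        for sense, vectors in grouped.items()
    }


def predict(centroids: Dict[str, Vector], tokens: List[str], verb_index: int) -> str:
    """Return the sense whose centroid is closest (squared Euclidean distance)."""
    features = context_features(tokens, verb_index)

    def squared_distance(sense: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroids[sense], features))

    return min(centroids, key=squared_distance)
```

With these toy values, `predict(train_centroids(TRAINING_EXAMPLES), ["sacar", "dinero"], 0)` returns `"sense_1"`, because the averaged context vector lies closest to that sense's centroid. Only `EMBEDDINGS` changes when swapping general-domain for domain-specific vectors, which is the comparison hypothesis (ii) is about.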

Reversing uncertainty sampling to improve active learning schemes

Authors: Alonso i Alemany Laura; Cardellino Cristian
Publication date: 08/04/2016

Active learning provides promising methods to optimize the cost of manually annotating a dataset. However, practitioners in many areas rarely resort to such methods, because they present technical difficulties and offer no guarantee of good performance, especially on skewed distributions with sparsely populated minority classes and an undefined, catch-all majority class, which are very common in human-related phenomena such as natural language. In this paper we compare the simplest active learning technique, pool-based uncertainty sampling, with its opposite, which we call reversed uncertainty sampling. We show that both obtain results comparable to random sampling, which argues for a more insightful approach to active learning.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
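A minimal Python sketch of the selection strategies compared in this paper, including the random baseline. Two assumptions are not stated in the abstract: uncertainty is scored with least confidence (1 minus the top class probability), and "reversed" uncertainty sampling is read as selecting the instances the classifier is most confident about. The function names and the example probabilities are hypothetical.

```python
import random
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def uncertainty_sampling(pool_probs: List[Sequence[float]], k: int) -> List[int]:
    """Pool-based uncertainty sampling: indices of the k most uncertain instances."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: least_confidence(pool_probs[i]), reverse=True)
    return ranked[:k]


def reversed_uncertainty_sampling(pool_probs: List[Sequence[float]], k: int) -> List[int]:
    """Assumed reading of the reversed variant: the k least uncertain instances."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: least_confidence(pool_probs[i]))
    return ranked[:k]


def random_sampling(pool_size: int, k: int, seed: int = 0) -> List[int]:
    """Random baseline: k distinct indices drawn uniformly from the pool."""
    return random.Random(seed).sample(range(pool_size), k)


# Class probabilities predicted by some classifier for four unlabeled instances.
pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3], [0.99, 0.01]]
print(uncertainty_sampling(pool, 2))           # [1, 2]
print(reversed_uncertainty_sampling(pool, 2))  # [3, 0]
```

In a full active learning loop, the selected indices would be annotated, moved from the pool into the training set, and the classifier retrained before the next round; the paper's comparison amounts to swapping one selection function for another inside that loop.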

SuFLexQA: an approach to Question Answering from the lexicon

Authors: Alonso i Alemany Laura; Cardellino Cristian
Publication date: 01/09/2013

Reversing uncertainty sampling to improve active learning schemes

Authors: Alonso i Alemany Laura; Cardellino Cristian
Publication date: 01/01/2015

Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

Authors: Alonso i Alemany Laura; Cardellino Cristian
Publication date: 01/09/2017

Estudio de métodos semisupervisados para la desambiguación de sentidos verbales del español (A Study of Semi-supervised Methods for Spanish Verb Sense Disambiguation)

Author: Cardellino Cristian Adrián
Publication date: 01/01/2018

This thesis explores the use of semi-supervised techniques for Spanish verb sense disambiguation. The goal is to study how information from unlabeled data, which is available in larger quantities, can help a classifier trained on a small labeled dataset. The thesis starts from the fully supervised verb sense disambiguation task and studies the following semi-supervised techniques, comparing their impact on the original task: word vectors (word embeddings), self-training, active learning, and ladder neural networks.

Repositorio Digital de la Universidad Nacional de Córdoba

Combining semi-supervised and active learning to recognize minority senses in a new corpus

Authors: Alonso i Alemany Laura; Cardellino Cristian Adrián; Teruel Milagro
Publication date: 01/01/2015

Paper presented at the 24th International Joint Conference on Artificial Intelligence, Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, July 25 to 31, 2015.
In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus using a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus; the training set is then iteratively enlarged with both manually annotated and automatically labeled examples, which are used to train the model in the following iteration. In each iteration, the examples to be manually annotated are selected by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating the adaptation of a starting dataset towards the real distribution of a different, unannotated corpus.
Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Affiliation: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Affiliation: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Subject: Other Computer and Information Sciences

Repositorio Digital de la Universidad Nacional de Córdoba
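The iterative scheme described in the abstract above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the one-dimensional nearest-centroid model, the softmax confidence, the `oracle` callback standing in for the human annotator, and the parameters `n_auto` and `n_manual` are all hypothetical. In each iteration, the least confident pool instances go to the annotator (active learning) and the most confident remaining ones are added with their predicted labels (bootstrapping).

```python
import math
from typing import Callable, Dict, List, Tuple

Labeled = List[Tuple[float, str]]


def train(labeled: Labeled) -> Dict[str, float]:
    """Toy 1-D nearest-centroid model: the mean feature value of each class."""
    values: Dict[str, List[float]] = {}
    for x, y in labeled:
        values.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in values.items()}


def predict_proba(model: Dict[str, float], x: float) -> Dict[str, float]:
    """Softmax over negative distances to the class centroids."""
    scores = {y: math.exp(-abs(x - c)) for y, c in model.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


def bootstrap_with_active_learning(
    labeled: Labeled,
    unlabeled: List[float],
    oracle: Callable[[float], str],
    iterations: int,
    n_auto: int,
    n_manual: int,
) -> Tuple[Labeled, List[float]]:
    """Grow `labeled` from `unlabeled` by mixing self-training and active learning."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        model = train(labeled)
        # (confidence, pool index, predicted label), sorted least confident first.
        scored = []
        for i, x in enumerate(pool):
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            scored.append((probs[label], i, label))
        scored.sort()
        # Active learning: the least confident instances are labeled by the oracle.
        manual = scored[:n_manual]
        # Bootstrapping: the most confident remaining instances keep their prediction.
        remaining = scored[n_manual:]
        auto = remaining[max(0, len(remaining) - n_auto):]
        labeled += [(pool[i], oracle(pool[i])) for _, i, _ in manual]
        labeled += [(pool[i], label) for _, i, label in auto]
        used = {i for _, i, _ in manual + auto}
        pool = [x for i, x in enumerate(pool) if i not in used]
    return labeled, pool


# Usage: a toy oracle plays the role of the human annotator.
seed = [(0.0, "A"), (10.0, "B")]
unlabeled = [1.0, 5.25, 9.0, 4.75]
oracle = lambda x: "A" if x < 5 else "B"
grown, rest = bootstrap_with_active_learning(seed, unlabeled, oracle,
                                             iterations=1, n_auto=2, n_manual=1)
```

Keeping the two selection rules separate makes the intuition in the abstract explicit: manual annotation targets instances near the decision boundary, while automatic labeling adds instances close to the class centers.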

Reversing uncertainty sampling to improve active learning schemes

Authors: Alonso i Alemany Laura; Cardellino Cristian Adrián; Teruel Milagro
Publication date: 01/01/2015

Paper presented at the 16º Simposio Argentino de Inteligencia Artificial (16th Argentine Symposium on Artificial Intelligence), 44 Jornadas Argentinas de Informática (44th Argentine Conference on Informatics). Rosario, Argentina, August 31 to September 4, 2015.
Active learning provides promising methods to optimize the cost of manually annotating a dataset. However, practitioners in many areas rarely resort to such methods, because they present technical difficulties and offer no guarantee of good performance, especially on skewed distributions with sparsely populated minority classes and an undefined, catch-all majority class, which are very common in human-related phenomena such as natural language. In this paper we compare the simplest active learning technique, pool-based uncertainty sampling, with its opposite, which we call reversed uncertainty sampling. We show that both obtain results comparable to random sampling, which argues for a more insightful approach to active learning.
http://44jaiio.sadio.org.ar/asai
Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Affiliation: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Affiliation: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Subject: Computer Sciences

Repositorio Digital de la Universidad Nacional de Córdoba

SuFLexQA: an approach to Question Answering from the lexicon

Authors: Alonso i Alemany Laura; Cardellino Cristian
Publication date: 12/06/2019

Servicio de Difusión de la Creación Intelectual