33 research outputs found

    SuFLexQA: an approach to Question Answering from the lexicon

    We present SuFLexQA, a system for Question Answering that integrates deep linguistic information from verbal lexica into Quepy, a generic framework for translating natural language questions into a query language. We are participating in the QALD-3 contest to assess the main achievements and shortcomings of the system. Sociedad Argentina de Informática e Investigación Operativa

    Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

    This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semi-supervised learning [1]: an unsupervised algorithm is first trained separately on unlabeled data, and its results (i.e. the word embeddings) are then fed to a supervised classifier. Throughout this paper we test two hypotheses: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model's performance compared to general-domain word embeddings. Model performance is measured not only with standard metrics (e.g. accuracy or precision/recall) but also by the model's tendency to overfit the available data, analyzed through the learning curve. Measuring this overfitting tendency is important because little data is available, so we need models that generalize better on the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation. Sociedad Argentina de Informática e Investigación Operativa
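
    The two-step pipeline described in this abstract (unsupervised embeddings first, supervised classifier second) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the toy 2-d vectors, the sense labels, the averaging of context vectors, and the nearest-centroid classifier are all assumptions made to keep the example small.

```python
from collections import defaultdict

# Step 1 stand-in: embeddings an unsupervised model would have learned from
# unlabeled text. These toy 2-d vectors are invented for illustration only.
EMBEDDINGS = {
    "dinero": [1.0, 0.0],
    "banco": [0.9, 0.1],
    "tiempo": [0.0, 1.0],
    "tarde": [0.1, 0.9],
}


def featurize(context_words):
    """Represent an instance as the average embedding of its known context words."""
    vectors = [EMBEDDINGS[w] for w in context_words if w in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def train_centroids(labeled):
    """Step 2: a tiny supervised classifier that stores one centroid per sense."""
    grouped = defaultdict(list)
    for context, sense in labeled:
        grouped[sense].append(featurize(context))
    return {
        sense: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for sense, vecs in grouped.items()
    }


def predict(centroids, context_words):
    """Assign the sense whose centroid is closest (squared Euclidean distance)."""
    x = featurize(context_words)

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda sense: dist(centroids[sense]))


# Hypothetical labeled instances: context words plus an invented sense label.
labeled = [(["dinero"], "lose_money"), (["tiempo"], "waste_time")]
centroids = train_centroids(labeled)
prediction_banco = predict(centroids, ["banco"])
prediction_tarde = predict(centroids, ["tarde"])
```

    Because the embeddings are learned without labels, the classifier can place unseen context words such as "banco" near labeled ones such as "dinero", which is the benefit the first hypothesis is about.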

    Reversing uncertainty sampling to improve active learning schemes

    Active learning provides promising methods to optimize the cost of manually annotating a dataset. However, practitioners in many areas do not widely adopt such methods because they present technical difficulties and do not guarantee good performance, especially in skewed distributions with scarcely populated minority classes and an undefined, catch-all majority class, which are very common in human-related phenomena like natural language. In this paper we present a comparison of the simplest active learning technique, pool-based uncertainty sampling, and its opposite, which we call reversed uncertainty sampling. We show that both obtain results comparable to random sampling, arguing for a more insightful approach to active learning. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
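
    The selection step that the abstract compares can be sketched as below. This is a minimal illustration under stated assumptions: uncertainty is measured with the least-confidence score (one minus the top class probability), and "reversed" is taken to mean picking the instances the classifier is most certain about. The abstract does not specify either detail, and the function names and pool values are hypothetical.

```python
def least_confidence(probs):
    """Uncertainty of one prediction: 1 minus the highest class probability."""
    return 1.0 - max(probs)


def select_for_annotation(pool_probs, k, reverse=False):
    """Return the indices of k pool instances to send to the annotator.

    pool_probs: one list of class probabilities per unlabeled instance.
    reverse=False: standard uncertainty sampling (most uncertain first).
    reverse=True: reversed uncertainty sampling (most certain first).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: least_confidence(pool_probs[i]),
        reverse=not reverse,  # descending uncertainty unless reversed
    )
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled instances, three classes.
pool = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
]
most_uncertain = select_for_annotation(pool, k=2)
most_certain = select_for_annotation(pool, k=2, reverse=True)
```

    In a full pool-based loop, the selected instances would be labeled by a human, added to the training set, and the classifier retrained before the next selection round.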

    Estudio de métodos semisupervisados para la desambiguación de sentidos verbales del español

    This thesis explores the use of semi-supervised techniques for Spanish verb sense disambiguation. The goal is to study how information from unlabeled data, which is available in larger quantities, can help a classifier trained on a small labeled dataset. The thesis starts from the fully supervised task of verb sense disambiguation and studies the following semi-supervised techniques, comparing their impact on the original task: word embeddings, self-training, active learning, and ladder networks.

    Combining semi-supervised and active learning to recognize minority senses in a new corpus

    Paper presented at the 24th International Joint Conference on Artificial Intelligence, Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, July 25 to 31, 2015. In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus from a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus, which is then iteratively enlarged by including both manually annotated examples and automatically labelled examples as training examples for the following iteration. Examples to be annotated are selected in each iteration by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating adaptation of a starting dataset towards the real distribution of a different, unannotated corpus. Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Affiliation: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Affiliation: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Subject: Other Computer and Information Sciences

    Reversing uncertainty sampling to improve active learning schemes

    Paper presented at the 16º Simposio Argentino de Inteligencia Artificial, 44 Jornadas Argentinas de Informática. Rosario, Argentina, August 31 to September 4, 2015. Active learning provides promising methods to optimize the cost of manually annotating a dataset. However, practitioners in many areas do not widely adopt such methods because they present technical difficulties and do not guarantee good performance, especially in skewed distributions with scarcely populated minority classes and an undefined, catch-all majority class, which are very common in human-related phenomena like natural language. In this paper we present a comparison of the simplest active learning technique, pool-based uncertainty sampling, and its opposite, which we call reversed uncertainty sampling. We show that both obtain results comparable to random sampling, arguing for a more insightful approach to active learning. http://44jaiio.sadio.org.ar/asai Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Affiliation: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Affiliation: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Subject: Computer Science
