Minimally-Supervised Morphological Segmentation using Adaptor Grammars

Authors: Goldwater Sharon; Sirts Kairit
Publication date: 01/01/2013

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.

Acquiring Word-Meaning Mappings for Natural Language Interfaces

Author: Thompson C.
Publication venue: 'AI Access Foundation'
Publication date: 22/06/2011

This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by WOLFIE is compared to that of lexicons acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.
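The example-selection idea behind active learning can be illustrated with a minimal sketch. It uses generic entropy-based uncertainty sampling, which is an assumption made here for illustration and not necessarily the selection criterion WOLFIE uses. The names `entropy`, `select_for_annotation`, and the toy sentences and probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(pool, predict_proba, batch_size):
    """Return the `batch_size` unlabelled examples the model is least sure about.

    `predict_proba` is a hypothetical callable that maps an example to a list
    of class probabilities; higher entropy means more uncertainty.
    """
    ranked = sorted(pool, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return ranked[:batch_size]


# Toy pool: invented sentences with invented model confidences.
toy_scores = {
    "what is the capital of texas": [0.95, 0.05],
    "which rivers run through ohio": [0.55, 0.45],
    "name the largest state": [0.70, 0.30],
}

picked = select_for_annotation(list(toy_scores), toy_scores.get, batch_size=2)
print(picked)
```

In a full loop, the selected examples would be annotated, added to the training set, and the model retrained before the next round of selection.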

arXiv.org e-Print Archive

Introspective data and corpus data: combination instead of confrontation in the study of German metaphorical idioms of life

Author: Kispál Tamás
Publication date: 01/01/2010

This paper examines how different data types can be combined in a study of German idioms of life using the tools of cognitive metaphor theory. The data sources for conceptual metaphors were mainly metaphors found in the relevant literature. These metaphors are largely introspective in nature. The primary data sources for metaphorical expressions were dictionaries, which also represent introspective data. These data have been complemented by corpus data. The paper discusses the problems of introspective and corpus data raised by the study of German idioms of life. Two case studies demonstrate the advantages of combining data and methods.

Cross-lingual sentiment analysis with machine translation: utility of training corpora and sentiment lexica

Author: Demirtas E.
Publication date: 31/10/2013

Automatic propbank generation for Turkish

Authors: Ak Koray; Yıldız Olcay Taner
Publication venue: 'Assoc. for Computational Linguistics Bulgaria'
Publication date: 01/09/2019

Semantic role labeling (SRL) is an important task for understanding natural languages, where the objective is to analyse propositions expressed by the verb and to identify each word that bears a semantic role. It provides an extensive dataset to enhance NLP applications such as information retrieval, machine translation, information extraction, and question answering. However, creating SRL models is difficult. In some languages, it is even infeasible to create SRL models that have predicate-argument structure, due to a lack of linguistic resources. In this paper, we present our method to create an automatic Turkish PropBank by exploiting parallel data from the translated sentences of English PropBank. Experiments show that our method gives promising results. © 2019 Association for Computational Linguistics (ACL).

Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!

Authors: Daxenberger Johannes; Eger Steffen; Gurevych Iryna; Stab Christian
Publication date: 08/06/2018

Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) direct transfer strategies based on bilingual word embeddings for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at http://github.com/UKPLab/coling2018-xling_argument_mining. Comment: Accepted at Coling 201
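Annotation projection, as named in the abstract above, can be sketched in a few lines: token-level labels on a source sentence are copied to the translated sentence through word alignments. This is a simplified illustration under stated assumptions, not the authors' implementation. The function name `project_labels`, the BIO-style labels, the example sentence pair, and the hand-written alignment are all hypothetical; in practice the alignment would come from an automatic word aligner.

```python
def project_labels(src_labels, alignment, tgt_len, default="O"):
    """Copy token-level labels from a source sentence to its translation.

    `alignment` is an iterable of (source_index, target_index) pairs.
    Target tokens with no aligned source token keep the `default` label.
    """
    tgt_labels = [default] * tgt_len
    for src_i, tgt_j in sorted(alignment):
        # Keep the first non-default label that reaches a target token.
        if tgt_labels[tgt_j] == default:
            tgt_labels[tgt_j] = src_labels[src_i]
    return tgt_labels


# Toy example with an invented English-German pair and a manual alignment.
src_tokens = ["Smoking", "should", "be", "banned", "."]
src_labels = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O"]
tgt_tokens = ["Rauchen", "sollte", "verboten", "werden", "."]
alignment = [(0, 0), (1, 1), (2, 3), (3, 2), (4, 4)]

projected = project_labels(src_labels, alignment, len(tgt_tokens))
print(list(zip(tgt_tokens, projected)))
```

A realistic pipeline would also need to handle one-to-many alignments and repair label sequences that become discontinuous after word reordering; this sketch leaves both out.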

arXiv.org e-Print Archive