Search CORE

202 research outputs found

IMST: A Revisited Turkish Dependency Treebank

Author: Eryiğit Gülşen
Pamay Tuğba
Sulubacak Umut
Publication venue: EGE UNIVERSITY PRESS
Publication date: 09/04/2016
Field of study

In this paper, we present a critical analysis of the dependency annotation framework used in the METU-Sabancı Treebank (MST), and propose new annotation schemes that would alleviate the issues we have identified. Later, we describe our attempt at reannotating the treebank from the ground up using the proposed schemes, and then compare the consistencies of the two versions via cross validation using a dependency parser. According to our experiments, the reannotated version of the original treebank, which we call the ITU-METU-Sabancı Treebank (IMST), demonstrates a labeled attachment score of 75.3% and an unlabeled attachment score of 83.7%, surpassing the corresponding scores of 65.9% and 76.0% for MST by a very large margin.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

TermEval 2020 : shared task on automatic term extraction using the Annotated Corpora for term Extraction Research (ACTER) dataset

Author: Drouin Patrick
Hoste Veronique
Lefever Els
Rigouts Terryn Ayla
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020
Field of study

The TermEval 2020 shared task provided a platform for researchers to work on automatic term extraction (ATE) with the same dataset: the Annotated Corpora for Term Extraction Research (ACTER). The dataset covers three languages (English, French, and Dutch) and four domains, of which the domain of heart failure was kept as a held-out test set on which final f1-scores were calculated. The aim was to provide a large, transparent, qualitatively annotated, and diverse dataset to the ATE research community, with the goal of promoting comparative research and thus identifying strengths and weaknesses of various state-of-the-art methodologies. The results show a lot of variation between different systems and illustrate how some methodologies reach higher precision or recall, how different systems extract different types of terms, how some are exceptionally good at finding rare terms, or are less impacted by term length. The current contribution offers an overview of the shared task with a comparative evaluation, which complements the individual papers by all participants

Ghent University Academic Bibliography

Argumentation Mining in User-Generated Web Discourse

Author: Gurevych Iryna
Habernal Ivan
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2015
Field of study

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

arXiv.org e-Print Archive

TUbiblio

Crossref

Directory of Open Access Journals

TUdatalib Repository (TU Darmstadt)

Parsing Argumentation Structures in Persuasive Essays

Author: Gurevych Iryna
Stab Christian
Publication venue
Publication date: 22/07/2016
Field of study

In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using integer linear programming. We show that our model considerably improves the performance of base classifiers and significantly outperforms challenging heuristic baselines. Moreover, we introduce a novel corpus of persuasive essays annotated with argumentation structures. We show that our annotation scheme and annotation guidelines successfully guide human annotators to substantial agreement. This corpus and the annotation guidelines are freely available for ensuring reproducibility and to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26 October 2015. Revised submission: 15 July 201

arXiv.org e-Print Archive

TUbiblio

Directory of Open Access Journals

TUdatalib Repository (TU Darmstadt)

Edition 1.1 of the PARSEME Shared Task on automatic identification of verbal multiword expressions

Università degli Studi di Napoli L'Orientale: CINECA IRIS

Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables

Author: Araabi Ali
Monz Christof
Niculae Vlad
Publication venue
Publication date: 24/07/2023
Field of study

Despite the tremendous success of Neural Machine Translation (NMT), its performance on low-resource language pairs still remains subpar, partly due to the limited ability to handle previously unseen inputs, i.e., generalization. In this paper, we propose a method called Joint Dropout, that addresses the challenge of low-resource neural machine translation by substituting phrases with variables, resulting in significant enhancement of compositionality, which is a key aspect of generalization. We observe a substantial improvement in translation quality for language pairs with minimal resources, as seen in BLEU and Direct Assessment scores. Furthermore, we conduct an error analysis, and find Joint Dropout to also enhance generalizability of low-resource NMT in terms of robustness and adaptability across different domainsComment: Accepted at MT Summit 202

arXiv.org e-Print Archive

Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions

Author: Parra Escartín Carla
Walsh Abigail
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/08/2018
Field of study

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed

Irish Universities

DCU Online Research Access Service

Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

Author
Publication venue: European Language Resources Association
Publication date: 01/01/2016
Field of study

No abstract available

Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

Author
Publication venue: European Language Resources Association
Publication date: 01/01/2016
Field of study

No abstract available

Enlighten