202 research outputs found

    IMST: A Revisited Turkish Dependency Treebank

    Get PDF
    In this paper, we present a critical analysis of the dependency annotation framework used in the METU-Sabancı Treebank (MST), and propose new annotation schemes that would alleviate the issues we have identified. Later, we describe our attempt at reannotating the treebank from the ground up using the proposed schemes, and then compare the consistencies of the two versions via cross validation using a dependency parser. According to our experiments, the reannotated version of the original treebank, which we call the ITU-METU-Sabancı Treebank (IMST), demonstrates a labeled attachment score of 75.3% and an unlabeled attachment score of 83.7%, surpassing the corresponding scores of 65.9% and 76.0% for MST by a very large margin.Peer reviewe

    TermEval 2020 : shared task on automatic term extraction using the Annotated Corpora for term Extraction Research (ACTER) dataset

    Get PDF
    The TermEval 2020 shared task provided a platform for researchers to work on automatic term extraction (ATE) with the same dataset: the Annotated Corpora for Term Extraction Research (ACTER). The dataset covers three languages (English, French, and Dutch) and four domains, of which the domain of heart failure was kept as a held-out test set on which final f1-scores were calculated. The aim was to provide a large, transparent, qualitatively annotated, and diverse dataset to the ATE research community, with the goal of promoting comparative research and thus identifying strengths and weaknesses of various state-of-the-art methodologies. The results show a lot of variation between different systems and illustrate how some methodologies reach higher precision or recall, how different systems extract different types of terms, how some are exceptionally good at finding rare terms, or are less impacted by term length. The current contribution offers an overview of the shared task with a comparative evaluation, which complements the individual papers by all participants

    Argumentation Mining in User-Generated Web Discourse

    Full text link
    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

    Parsing Argumentation Structures in Persuasive Essays

    Full text link
    In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using integer linear programming. We show that our model considerably improves the performance of base classifiers and significantly outperforms challenging heuristic baselines. Moreover, we introduce a novel corpus of persuasive essays annotated with argumentation structures. We show that our annotation scheme and annotation guidelines successfully guide human annotators to substantial agreement. This corpus and the annotation guidelines are freely available for ensuring reproducibility and to encourage future research in computational argumentation.Comment: Under review in Computational Linguistics. First submission: 26 October 2015. Revised submission: 15 July 201

    Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables

    Full text link
    Despite the tremendous success of Neural Machine Translation (NMT), its performance on low-resource language pairs still remains subpar, partly due to the limited ability to handle previously unseen inputs, i.e., generalization. In this paper, we propose a method called Joint Dropout, that addresses the challenge of low-resource neural machine translation by substituting phrases with variables, resulting in significant enhancement of compositionality, which is a key aspect of generalization. We observe a substantial improvement in translation quality for language pairs with minimal resources, as seen in BLEU and Direct Assessment scores. Furthermore, we conduct an error analysis, and find Joint Dropout to also enhance generalizability of low-resource NMT in terms of robustness and adaptability across different domainsComment: Accepted at MT Summit 202

    Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions

    Get PDF
    This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    Get PDF
    No abstract available

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    Get PDF
    No abstract available
    corecore