355 research outputs found

    Universal Dependencies and Morphology for Hungarian - and on the Price of Universality

    In this paper, we present how the principles of universal dependencies and morphology have been adapted to Hungarian. We report the most challenging grammatical phenomena and our solutions to them. On the basis of the adapted guidelines, we have converted and manually corrected 1,800 sentences from the Szeged Treebank to universal dependency format. We also introduce experiments on this manually annotated corpus for evaluating automatic conversion and the added value of language-specific, i.e. non-universal, annotations. Our results reveal that converting to universal dependencies is not necessarily trivial; moreover, using language-specific morphological features may have an impact on overall performance.

    Universal dependencies for Irish

    Language resources that enable cross-lingual studies have become increasingly valuable for lesser-resourced languages such as Irish, as they allow for easier sharing of resources, thus overcoming the problem of data scarcity. The Universal Dependencies (UD) Project is an initiative aimed at cross-lingual studies of treebanks, linguistic structures and parsing. Its goal is to create a set of multilingual harmonised treebanks that are designed according to a universal annotation scheme. In this paper, we report on the conversion of the Irish Dependency Treebank (IDT) (Lynn, 2016) to a UD version of the treebank which we term the Irish Universal Dependency Treebank (IUDT). We report on the mapping of the IDT labelling scheme to the UD scheme, along with a clear description of the structural changes required in this conversion.

    Towards an open-source universal-dependency treebank for Erzya

    This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.

    On Internal Merge

    Modeling information structure in a cross-linguistic perspective

    This study makes substantial contributions to both the theoretical and computational treatment of information structure, with a specific focus on creating natural language processing applications such as multilingual machine translation systems. The present study first provides cross-linguistic findings regarding information structure meanings and markings. Building upon such findings, the current model represents information structure within the HPSG/MRS framework using Individual Constraints. The primary goal of the present study is to create a multilingual grammar model of information structure for the LinGO Grammar Matrix system. The present study explores the construction of a grammar library for creating customized grammars incorporating information structure and illustrates how the information structure-based model improves performance of transfer-based machine translation.

    Formal Basis of a Language Universal

    Natural Language Processing Resources for Finnish. Corpus Development in the General and Clinical Domains

    Transferred from Doria