Universal Dependencies and Morphology for Hungarian - and on the Price of Universality
In this paper, we present how the principles of universal dependencies and morphology have been adapted to Hungarian. We report the most challenging grammatical phenomena and our solutions to them. On the basis of the adapted guidelines, we have converted and manually corrected 1,800 sentences from the Szeged Treebank to universal dependency format. We also introduce experiments on this manually annotated corpus for evaluating automatic conversion and the added value of language-specific, i.e. non-universal, annotations. Our results reveal that converting to universal dependencies is not necessarily trivial; moreover, using language-specific morphological features may have an impact on overall performance.
Universal dependencies for Irish
Language resources that enable cross-lingual studies have become increasingly valuable for lesser-resourced languages such as Irish, as they allow for easier sharing of resources, thus overcoming the problem of data scarcity. The Universal Dependencies (UD) Project is an initiative aimed at cross-lingual studies of treebanks, linguistic structures and parsing. Its goal is to create a set of multilingual harmonised treebanks that are designed according to a universal annotation scheme. In this paper, we report on the conversion of the Irish Dependency Treebank (IDT) (Lynn, 2016) to a UD version of the treebank which we term the Irish Universal Dependency Treebank (IUDT). We report on the mapping of the IDT labelling scheme to the UD scheme, along with a clear description of the structural changes required in this conversion.
Towards an open-source universal-dependency treebank for Erzya
This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.
Modeling information structure in a cross-linguistic perspective
This study makes substantial contributions to both the theoretical and computational treatment of information structure, with a specific focus on creating natural language processing applications such as multilingual machine translation systems. The present study first provides cross-linguistic findings regarding information structure meanings and markings. Building upon these findings, the current model represents information structure within the HPSG/MRS framework using Individual Constraints. The primary goal of the present study is to create a multilingual grammar model of information structure for the LinGO Grammar Matrix system. The present study explores the construction of a grammar library for creating customized grammars incorporating information structure and illustrates how the information structure-based model improves the performance of transfer-based machine translation.
Natural Language Processing Resources for Finnish. Corpus Development in the General and Clinical Domains