19 research outputs found

    Linguistic Distance between Erzya and Moksha. Dependent Morphology

    The purpose of this article is to outline morphological facts about the two literary languages Erzya and Moksha, which can be used for estimating the distinctive character of these individual language forms. Whereas earlier morphological evaluations of the linguistic distance between Erzya and Moksha have placed them in the area of 90% cohesion, this one does not. This study evaluates the languages on the basis of non-ambiguity, parallel sets of ambiguity and divergent ambiguity. Non-ambiguity is found in combinatory function to morphological formant alignment, e.g. молян go+V+Ind+Prs+ScSg1. Parallel sets of ambiguity are found in combinatory-function set to morphological formant alignment where both languages share the same sets of ambiguous readings, e.g. саизь vs сявозь take+V+Ind+ScPl3+OcSg3, ScPl3+OcPl3. Divergent ambiguity is found in forms with non-symmetric alignments of combinatory functions, e.g. саинек take+V+Ind+Prt1+ScPl1, +Prt1+ScPl1+OcSg3, +Prt1+ScPl1+OcPl3 vs сявоме take+V+Ind+Prt1+ScPl1, сявоськ take+V+Ind+Prt1+ScPl1+OcSg3, +Prt1+ScPl1+OcPl3. This morphological evaluation will establish the preparatory work in syntactic disambiguation necessary for facilitating Erzya↔Moksha machine translation, whereas machine translation will enhance the usage of mutual language resources. Results show that the Erzya and Moksha languages, in the absence of loan words from the 20th century, share less than 50% of their vocabularies, 63% of their regular nominal declensions and 48% of their regular finite conjugations. Peer reviewed
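    The three categories named in this abstract can be illustrated with a minimal Python sketch. It assumes each surface form is represented by the set of its readings, written out in full in the `lemma+Tag+Tag` notation quoted above. The function names `parse_reading` and `classify`, and the rule of comparing the two reading sets directly, are illustrative assumptions and not the study's actual procedure.

```python
def parse_reading(reading: str) -> tuple[str, tuple[str, ...]]:
    """Split a reading such as 'take+V+Ind+ScPl3+OcSg3' into (lemma, tags)."""
    lemma, *tags = reading.split("+")
    return lemma, tuple(tags)


def classify(erzya: set[str], moksha: set[str]) -> str:
    """Compare the reading sets of one Erzya form and one Moksha form.

    Hypothetical rule: identical sets with one reading count as
    non-ambiguity, identical sets with several readings as parallel
    ambiguity, and differing sets as divergent ambiguity.
    """
    a = {parse_reading(r) for r in erzya}
    b = {parse_reading(r) for r in moksha}
    if a != b:
        return "divergent ambiguity"
    return "non-ambiguity" if len(a) == 1 else "parallel ambiguity"


# Readings for саизь / сявозь, expanded from the abbreviated notation.
saiz = {"take+V+Ind+ScPl3+OcSg3", "take+V+Ind+ScPl3+OcPl3"}
# Readings for саинек (Erzya) and сявоме (Moksha).
sainek = {
    "take+V+Ind+Prt1+ScPl1",
    "take+V+Ind+Prt1+ScPl1+OcSg3",
    "take+V+Ind+Prt1+ScPl1+OcPl3",
}
sjavome = {"take+V+Ind+Prt1+ScPl1"}

print(classify(saiz, set(saiz)))
print(classify(sainek, sjavome))
```

    The abstract gives only the Erzya form for the non-ambiguous case, so any Moksha counterpart passed to `classify` for that category would have to be supplied by the reader.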

    Towards an open-source universal-dependency treebank for Erzya

    This article describes the first steps towards an open-source dependency treebank for Erzya based on universal dependency (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the universal-dependency framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research. Peer reviewed

    Transforming Archived Resources with Language Technology : From Manuscripts to Language Documentation

    Publisher Copyright: © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Transcriptions in different languages are a ubiquitous data format in linguistics and in many other fields in the humanities. However, the majority of these resources remain both under-used and under-studied. This may be the case even when the materials have been published in print, but is certainly the case for the majority of unpublished transcriptions. Our paper presents a workflow adapted in the research project Language Documentation Meets Language Technology, which combines text recognition, automatic transliteration and forced alignment into a process which allows us to convert earlier transcribed documents to a structure that is comparable with contemporary language documentation corpora. This has complex practical and methodological considerations. Peer reviewed

    Dependency parsing of code-switching data with cross-lingual feature representations

    Partanen N, KyungTae L, Rießler M, Poibeau T. Dependency parsing of code-switching data with cross-lingual feature representations. In: Pirinen TA, Rießler M, Rueter J, Trosterud T, Tyers FM, eds. Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages. Helsinki: Association for Computational Linguistics; 2018: 1-17

    Internship Report in Community Pharmacy

    Internship report completed as part of the Integrated Master's in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the University of Coimbra

    Universal Dependencies 2.4

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).