14 research outputs found

    UD Annotatrix: An Annotation Tool For Universal Dependencies

    In this paper we introduce UD Annotatrix, a tool for manual annotation of Universal Dependencies. The tool is designed to be tailored to the needs of the Universal Dependencies (UD) community, including that it should operate in fully offline mode, and it is freely available under the GNU GPL licence. We provide some background to the tool, an overview of its development, and an explanation of how it works. We compare it with other widely used tools for Universal Dependencies annotation, describe some features unique to UD Annotatrix, and finally outline some avenues for future work and offer a few concluding remarks.

    A Constraint Grammar POS-Tagger for Tibetan

    This paper describes a rule-based part-of-speech tagger for Tibetan, implemented in Constraint Grammar, with rules operating over sequences of syllables rather than words.

    Rule-Based Machine Translation for the Italian–Sardinian Language Pair

    This paper describes the creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation) and on some of the rules created, as well as on the orthographic model used for Sardinian.

    Suoidne-varra-bleahkka-mála-bihkka-senet-dielku 'hay-blood-ink-paint-tar-mustard-stain' - Should compounds be lexicalized in NLP?

    Source at http://ceur-ws.org/Vol-2769/paper_49.pdf. CEUR Workshop Proceedings home page at http://ceur-ws.org/Vol-2769/. Lexicalizing compounds, in addition to treating them dynamically, is a key element in giving us idiomatic translations and detecting compound errors. We present and evaluate an e-dictionary (NDS) and a grammar checker (GramDivvun) for North Sámi. We achieve a coverage of 98% for NDS queries and of 96% for compound error detection in GramDivvun.

    Samisk språkteknologi i 2021 (Sámi language technology in 2021)

    The article gives an overview of Sámi language technology in 2021, together with a short summary of its history, and then turns to some of the challenges in working with Sámi language technology and with language technology for minorities more generally. The challenges are illustrated with concrete examples. The article concludes by pointing to some ways forward for clearing away the most important obstacles.

    Towards balance and boundaries in public discourse : expressing and perceiving online hate speech (XPEROHS)

    This study presents an overview and preliminary findings from the XPEROHS project on hate speech in online contexts. The data are extracted from large-scale Facebook and Twitter corpora, and the study compares linguistic instantiations of hate speech in Danish and German. Findings are based on four sub-projects involving the semantics and pragmatics of denigration, the covert dynamics of hate speech, perceptions of spoken and written hate speech, and rhetorical hate speech strategies employed in online interaction. The results demonstrate both overt and covert hate speech towards minority groups, especially Muslims, which is symptomatic of larger societal othering processes and stigmatization.

    Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

    This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.