14 research outputs found

UD Annotatrix: An Annotation Tool For Universal Dependencies

Author: Sheyanova M.
Tyers F. M.
Washington Jonathan North
Publication venue: 'Transformative Works and Cultures'
Publication date: 01/01/2017

In this paper we introduce UD Annotatrix, a tool for manual annotation of Universal Dependencies. The tool is designed to meet the needs of the Universal Dependencies (UD) community, including the ability to operate in fully offline mode, and is freely available under the GNU GPL licence. We provide some background to the tool, an overview of its development, and a description of how it works. We compare it with other widely used tools for Universal Dependencies annotation, describe some features unique to UD Annotatrix, and finally outline some avenues for future work and offer a few concluding remarks.

Works

UralicNLP: An NLP Library for Uralic Languages

Author: Hämäläinen Mika
Publication date: 01/01/2019

Peer reviewed

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Helsingin yliopiston digitaalinen arkisto

A Constraint Grammar POS-Tagger for Tibetan

Author: Garrett Edward
Hill Nathan W.
Publication venue: Institute of the Lithuanian Language
Publication date: 01/01/2015

This paper describes a rule-based part-of-speech tagger for Tibetan, implemented in Constraint Grammar, with rules operating over sequences of syllables rather than words.

CiteSeerX

SOAS Research Online

Rule-Based Machine Translation for the Italian–Sardinian Language Pair

Author: Alòs i Font Hèctor
Fronteddu Gianfranco
Martín-Mor Adrià
Tyers Francis M.
Publication date: 01/06/2017

This paper describes the creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation), on some of the rules created, and on the orthographic model used for Sardinian.

Directory of Open Access Journals

Open Access Repository

Suoidne-varra-bleahkka-mála-bihkka-senet-dielku 'hay-blood-ink-paint-tar-mustard-stain' - Should compounds be lexicalized in NLP?

Author: Argese Chiara
Pirinen Tommi
Trosterud Trond
Wiechetek Linda
Publication venue: CEUR-WS
Publication date: 11/12/2020

Source at http://ceur-ws.org/Vol-2769/paper_49.pdf. CEUR Workshop Proceedings home page at http://ceur-ws.org/Vol-2769/. Lexicalizing compounds, in addition to treating them dynamically, is a key element in producing idiomatic translations and detecting compound errors. We present and evaluate an e-dictionary (NDS) and a grammar checker (GramDivvun) for North Sámi. We achieve a coverage of 98% for NDS queries and of 96% for compound error detection in GramDivvun.

Munin - Open Research Archive

Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered

Author: Alnajjar Khalid
Hämäläinen Mika
Partanen Niko
Rueter Jack
Publication venue: 'Linkoping University Electronic Press'
Publication date: 01/05/2021

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Samisk språkteknologi i 2021

Author: Nørstebø Moshagen Sjur
Publication venue: Sprognævnene i Norden
Publication date: 18/08/2022

The article gives an overview of Sámi language technology in 2021, together with a short summary of its history, and then turns to some of the challenges in working with Sámi language technology and with language technology for minorities more generally. The challenges are illustrated with concrete examples. The article concludes by pointing to some ways forward for clearing the most important obstacles.

Tidsskrift.dk (Det Kongelige Bibliotek)

Towards balance and boundaries in public discourse : expressing and perceiving online hate speech (XPEROHS)

Author: Baumgarten N.
Bick E.
Geyer K.
Iversen D.A.
Kleene A.
Lindø A.V.
Neitsch J.
Niebuhr O.
Nielsen R.
Petersen E.N.
Publication venue: University of Southern Denmark
Publication date: 01/12/2019

This study presents an overview and preliminary findings from the XPEROHS project on hate speech in online contexts. The data is extracted from large-scale Facebook and Twitter corpora and is used to compare linguistic instantiations of hate speech in Danish and German. Findings are based on four sub-projects involving the semantics and pragmatics of denigration, the covert dynamics of hate speech, perceptions of spoken and written hate speech, and rhetorical hate speech strategies employed in online interaction. The results demonstrate both overt and covert hate speech towards minority groups, especially Muslims, that is symptomatic of larger societal othering processes and stigmatization.

White Rose Research Online

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Author: Alos i Font Héctor
Bayatlı Sevilay
Khanna Tanmai
Pirinen Flammie
Swanson Daniel
Tang Irene
Tyers Francis Morton
Washington Jonathan North
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Documentação de línguas ameaçadas na era digital

Author: Alnajjar Khalid
Hämäläinen Mika
Rueter Jack
Publication date: 01/01/2021

Peer reviewed

Cadernos Espinosanos (E-Journal)

Helsingin yliopiston digitaalinen arkisto