108 research outputs found

The only option is open: Why should language technology and resources be free?

Author: Tyers Francis
Publication date: 19/10/2011

Proceedings of the NODALIDA 2011 Workshop Visibility and Availability of LT Resources. Editors: Sjur Nørstebø Moshagen and Per Langgård. NEALT Proceedings Series, Vol. 13 (2011), 1–2. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/1697

DSpace at Tartu University Library

An Italian to Catalan RBMT system reusing data from existing language pairs

Author: Ginestí-Rosell Mireia
Toral Antonio
Tyers Francis
Publication date: 01/01/2011

This paper presents an Italian→Catalan RBMT system automatically built by combining the linguistic data of the existing pairs Spanish–Catalan and Spanish–Italian. A lightweight manual postprocessing step is carried out to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.

DCU Online Research Access Service
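The Italian–Catalan abstract describes deriving a new dictionary by combining two existing pairs that share Spanish. A minimal Python sketch of that idea (pivot triangulation), assuming each dictionary is a plain mapping from a Spanish lemma to a set of translations; the function name `triangulate` and the toy entries are hypothetical, and this is not the authors' Apertium tooling:

```python
from collections import defaultdict


def triangulate(pivot_to_src, pivot_to_tgt):
    """Derive a source->target dictionary from two dictionaries sharing a pivot language.

    Hypothetical sketch: pivot_to_src maps a Spanish lemma to a set of Italian
    lemmas, pivot_to_tgt maps a Spanish lemma to a set of Catalan lemmas.
    Part-of-speech tags and other morphological information are ignored here.
    """
    src_to_tgt = defaultdict(set)
    for pivot, src_words in pivot_to_src.items():
        tgt_words = pivot_to_tgt.get(pivot)
        if not tgt_words:
            # No Catalan entry for this Spanish lemma: the gap is left for
            # manual postprocessing (e.g. adding frequent missing words).
            continue
        for src in src_words:
            src_to_tgt[src] |= tgt_words
    return dict(src_to_tgt)


# Toy entries, invented for illustration only.
es_it = {"casa": {"casa"}, "perro": {"cane"}, "coche": {"macchina", "auto"}}
es_ca = {"casa": {"casa"}, "perro": {"gos"}, "coche": {"cotxe"}}
it_ca = triangulate(es_it, es_ca)
```

Here `it_ca` maps `cane` to `{"gos"}` and both `macchina` and `auto` to `{"cotxe"}`. Because a pivot word can be ambiguous, triangulation can also produce spurious pairs, which is one reason a manual pass over automatically derived dictionaries can be needed.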

OmniLingo: Listening- and speaking-based language learning

Author: Howell Nicholas
Tyers Francis M.
Publication date: 10/10/2023

In this demo paper we present OmniLingo, an architecture for distributing data for listening- and speaking-based language learning applications, together with a demonstration client built using the architecture. The architecture is based on the InterPlanetary File System (IPFS) and puts user sovereignty over data at the forefront.

arXiv.org e-Print Archive

Data-Driven Morphological Analysis for Uralic Languages

Author: Silfverberg Miikka
Tyers Francis M.
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2019

This paper describes an initial set of experiments in data-driven morphological analysis of Uralic languages. The paper differs from previous work in that our work covers both lemmatization and generating ambiguous analyses. While hand-crafted finite-state transducers represent the state of the art in morphological analysis for most Uralic languages, we believe that there is a place for data-driven approaches, especially with respect to making up for lack of completeness in the lexicon. We present results for nine Uralic languages that show that, at least for basic nominal morphology for six out of the nine languages, data-driven methods can achieve an F-score of over 90%, providing results that approach those of finite-state techniques. We also compare our system to an earlier approach to Finnish data-driven morphological analysis (Silfverberg and Hulden, 2018) and show that our system outperforms this baseline.
Peer reviewed.

Crossref

Helsingin yliopiston digitaalinen arkisto
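The Uralic abstract reports F-scores for an analyser that may return several analyses per word form. One common way to score such output, assumed here rather than taken from the paper, is to compare the set of analyses for each token against the gold set and micro-average precision and recall. A minimal Python sketch; the function name `analysis_prf` and the tag strings are hypothetical:

```python
def analysis_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over per-token sets of analyses.

    gold and predicted are parallel lists with one set of analysis strings per
    token; a set may hold several analyses when a word form is ambiguous.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # analyses that are both predicted and correct
        fp += len(p - g)  # predicted analyses not in the gold standard
        fn += len(g - p)  # gold analyses the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Finnish "kuusi" is ambiguous ("six" or "spruce"); the tag strings are hypothetical.
gold = [{"talo+N+Sg+Nom"}, {"kuusi+Num+Sg+Nom", "kuusi+N+Sg+Nom"}]
predicted = [{"talo+N+Sg+Nom"}, {"kuusi+N+Sg+Nom"}]
p, r, f = analysis_prf(gold, predicted)
```

In this toy case the system misses one reading of `kuusi`, so precision is 1.0, recall is 2/3 and F1 is 0.8: returning only one analysis for an ambiguous form is penalised through recall.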

Towards an open-source universal-dependency treebank for Erzya

Author: Rueter Jack Michael
Tyers Francis M.
Publication date: 01/01/2018

This article describes the first steps towards an open-source dependency treebank for Erzya based on Universal Dependencies (UD) annotation standards. The treebank contains 610 sentences with 6661 tokens and is based on texts from a range of open-source and public-domain original Erzya sources. This ensures its free availability and extensibility. Texts in the treebank are first morphologically analyzed and disambiguated, after which they are annotated manually for dependency structure. In the article we present some issues in dependency syntax for Erzya and how they are analyzed in the Universal Dependencies framework. Preliminary statistics are given for dependency parsing of Erzya, along with points of interest for future research.
Peer reviewed.

Crossref

Helsingin yliopiston digitaalinen arkisto
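UD treebanks such as the Erzya one are distributed in the CoNLL-U format, in which each word is a tab-separated line, sentences are separated by blank lines and comment lines start with `#`. A minimal sketch of counting sentences and words in such a file, assuming that "tokens" means syntactic words (lines with an integer ID); multiword-token range lines such as `1-2` and empty nodes such as `8.1` are skipped. The function name `conllu_stats` is hypothetical and this is not the authors' tooling:

```python
def conllu_stats(text):
    """Count sentences and syntactic words in a CoNLL-U string."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# sent_id = ..."
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        # Count integer IDs only; skip ranges like "1-2" and empty nodes like "8.1".
        if token_id.isdigit():
            words += 1
    if in_sentence:  # the file may not end with a blank line
        sentences += 1
    return sentences, words
```

Comparing counts obtained with and without the range lines is a quick way to see whether a reported token figure refers to surface tokens or to syntactic words.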

Delineating Turkic non-finite verb forms by syntactic function

Author: Tyers Francis Morton
Washington Jonathan North
Publication venue: Linguistic Society of America
Publication date: 07/10/2019

In this paper, we argue against the primary categories of non-finite verb used in the Turkology literature: “participle” (причастие ‹pričastije›) and “converb” (деепричастие ‹dejepričastije›). We argue that both of these terms conflate several discrete phenomena, and furthermore that they are not coherent as umbrella terms for these phenomena. Based on a detailed study of the non-finite verb morphology and syntax of a wide range of Turkic languages (those presented here are Turkish, Kazakh, Kyrgyz, Tatar, Tuvan, and Sakha), we instead propose delineating these categories according to their morphological and syntactic properties. Specifically, we propose that more accurate categories are verbal noun, verbal adjective, verbal adverb, and infinitive. This approach has far-reaching implications for the study of syntactic phenomena in Turkic languages, ranging from relative clauses to clause chaining.

Proceedings Published by the LSA (Linguistic Society of America)

A morphological analyser for Maltese

Author: Gatt Albert
Ravishankar Vinit
Tyers Francis M.
Publication venue: Elsevier B.V.
Publication date: 01/01/2017

This article describes the development of a free/open-source morphological description of Maltese, originally created as the analysis component of a rule-based machine translation system for Maltese to Arabic and later applied to other tasks. The lexicon formalism we use is lttoolbox, part of the Apertium machine translation platform. An evaluation of the analyser shows that coverage is adequate, at 84.90%, while precision is 92.5% on a large automatically annotated test set and 96.2% on a smaller hand-validated set.
Peer reviewed.

OAR@UM
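Coverage for morphological analysers is often reported as naive coverage, the share of corpus tokens that receive at least one analysis; the Maltese abstract does not say which definition was used. lttoolbox's `lt-proc` writes analyses in the Apertium stream format, where each lexical unit looks like `^surface/analysis1/analysis2$` and an unknown word is marked with `*`, e.g. `^xyz/*xyz$`. A minimal Python sketch that computes naive coverage from such output, assuming no escaped `^`, `$` or `/` characters; the function name `naive_coverage` and the Maltese tag strings in the sample are hypothetical:

```python
import re

# A lexical unit in the Apertium stream format: ^surface/analysis1/...$
# Simplification: escaped characters (\^, \$, \/) are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(stream):
    """Share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        parts = unit.split("/", 1)
        # Unknown words are output as ^form/*form$, so a leading "*" means no analysis.
        if len(parts) == 2 and not parts[1].startswith("*"):
            known += 1
    return known / len(units)


# Hypothetical analyser output: two known Maltese nouns and one unknown form.
sample = "^kelb/kelb<n><m><sg>$ ^qattus/qattus<n><m><sg>$ ^xyzzy/*xyzzy$"
print(naive_coverage(sample))
```

Naive coverage says nothing about whether the analyses are correct, which is why it is usually reported alongside precision on an annotated test set, as in the abstract.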

A Free/Open-Source Morphological Analyser and Generator for Sakha

Author: Ivanova Sardana
Tyers Francis M.
Washington Jonathan
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/06/2022

We present, to our knowledge, the first published morphological analyser and generator for Sakha, a marginalised language of Siberia. The transducer, developed using HFST, has coverage solidly above 90% and high precision. In developing the analyser, we have expanded linguistic knowledge about Sakha and developed strategies for complex grammatical patterns. The transducer is already being used in downstream tasks, including computer-assisted language learning applications for linguistic maintenance and computational linguistics shared tasks.
Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Keyword spotting for audiovisual archival search in Uralic languages

Author: Hjortnæs Nils
Partanen Niko
Tyers Francis M.
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2021

Publisher Copyright: © 2021 IWCLUL 2021 - 7th International Workshop on Computational Linguistics of Uralic Languages, Proceedings. All rights reserved.

In this study we investigate the potential of using automatic speech recognition (ASR) for keyword spotting in four Uralic languages: Finnish, Hungarian, Estonian and Komi. These languages also represent different points on the continuum from high- to low-resource. Although the accuracy of the ASR systems shows there is a long way to go, we show that they still have potential to be useful for downstream tasks such as keyword spotting. By using a simple text search after running ASR, we are already able to achieve an F1 score of between 0.15 and 0.33, a precision of nearly 0.90 for Estonian and Hungarian, and a precision of 0.76 for Komi.
Peer reviewed.

Helsingin yliopiston digitaalinen arkisto
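The keyword-spotting abstract describes running ASR and then a simple text search over the transcripts. A minimal Python sketch of that second step and its scoring, assuming exact whole-word matching on lower-cased transcripts and scoring over (utterance, keyword) pairs; the function names and the Finnish toy transcripts are hypothetical, not the paper's data or code:

```python
def spot_keywords(transcripts, keywords):
    """Return (utterance_id, keyword) pairs where the keyword occurs as a whole word."""
    hits = set()
    for utt_id, text in transcripts.items():
        words = set(text.lower().split())
        for kw in keywords:
            if kw.lower() in words:
                hits.add((utt_id, kw))
    return hits


def precision_recall_f1(predicted, gold):
    """Score predicted (utterance, keyword) pairs against reference pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy ASR output and reference annotation (invented for illustration).
asr = {"u1": "talo on punainen", "u2": "kissa istuu talossa"}
gold = {("u1", "talo"), ("u2", "talo"), ("u2", "kissa")}
found = spot_keywords(asr, ["talo", "kissa"])
p, r, f = precision_recall_f1(found, gold)
```

Exact matching misses the inflected form `talossa` ("in the house"), so in this toy case recall drops to 2/3 while precision stays at 1.0.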