173 research outputs found

    Wh-copying, phases, and successive cyclicity

    Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

    This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium
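    A minimal sketch of the modular pipeline idea described above, in Python for illustration only: Apertium itself is a pipeline of separate command-line tools, and none of the names below are real Apertium interfaces. The point is that each stage maps a text stream to a text stream, which is what makes optional modules (recursive transfer, multi-word handling, anaphora resolution) easy to slot into the chain.

    # Hypothetical sketch of a modular translation pipeline; all stage names are
    # placeholders and do not correspond to Apertium's actual tools or flags.
    from typing import Callable, List

    Stage = Callable[[str], str]

    def run_pipeline(text: str, stages: List[Stage]) -> str:
        """Feed the text through each stage in order, as a shell pipe would."""
        for stage in stages:
            text = stage(text)
        return text

    # Placeholder stages standing in for morphological analysis, disambiguation,
    # lexical and structural transfer, and morphological generation.
    def morph_analysis(t: str) -> str:      return f"analysed({t})"
    def disambiguation(t: str) -> str:      return f"disambiguated({t})"
    def lexical_transfer(t: str) -> str:    return f"transferred({t})"
    def structural_transfer(t: str) -> str: return f"restructured({t})"
    def generation(t: str) -> str:          return f"generated({t})"

    pipeline = [morph_analysis, disambiguation, lexical_transfer,
                structural_transfer, generation]
    print(run_pipeline("una casa blanca", pipeline))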

    Deep learnability: using neural networks to quantify language similarity and learnability

    Learning a second language (L2) usually progresses faster if a learner's L2 is similar to their first language (L1). Yet global similarity between languages is difficult to quantify, obscuring its precise effect on learnability. Further, the combinatorial explosion of possible L1 and L2 language pairs, combined with the difficulty of controlling for idiosyncratic differences across language pairs and language learners, limits the generalisability of the experimental approach. In this study, we present a different approach, employing artificial languages and artificial learners. We built a set of five artificial languages whose underlying grammars and vocabulary were manipulated to ensure a known degree of similarity between each pair of languages. We next built a series of neural network models for each language, and sequentially trained them on pairs of languages. These models thus represented L1 speakers learning L2s. By observing the change in activity of the cells between the L1-speaker model and the L2-learner model, we estimated how much change was needed for the model to learn the new language. We then compared the change for each L1/L2 bilingual model to the underlying similarity across each language pair. The results showed that this approach can not only recover the facilitative effect of similarity on L2 acquisition, but can also offer new insights into the differential effects across different domains of similarity. These findings serve as a proof of concept for a generalisable approach that can be applied to natural languages
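    A minimal sketch of the approach described above, under assumed details (toy random corpora stand in for the artificial languages, the network size is arbitrary, and change is measured as the shift in hidden-state activity on a fixed probe set): train a small LSTM language model on an "L1", continue training the same model on an "L2", and quantify how much its internal activity moves.

    # Hypothetical illustration, not the authors' code: sequential L1 -> L2
    # training of a tiny LSTM language model, measuring hidden-state change.
    import torch
    import torch.nn as nn

    VOCAB = 20                      # size of the shared artificial vocabulary (assumed)
    torch.manual_seed(0)

    class TinyLM(nn.Module):
        """Small LSTM language model over token ids."""
        def __init__(self, vocab: int = VOCAB, dim: int = 32):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.lstm = nn.LSTM(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab)

        def forward(self, x):
            h, _ = self.lstm(self.emb(x))
            return self.out(h), h   # next-token logits and hidden activations

    def train(model: nn.Module, corpus: torch.Tensor, steps: int = 200) -> None:
        """Next-token prediction on a batch of token-id sequences."""
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            logits, _ = model(corpus[:, :-1])
            loss = loss_fn(logits.reshape(-1, VOCAB), corpus[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Toy "languages": random corpora standing in for the artificial grammars.
    l1 = torch.randint(0, VOCAB, (64, 12))
    l2 = torch.randint(0, VOCAB, (64, 12))
    probe = torch.randint(0, VOCAB, (16, 12))    # fixed probe set for comparison

    model = TinyLM()
    train(model, l1)                             # the "L1 speaker"
    with torch.no_grad():
        _, act_l1 = model(probe)
    train(model, l2)                             # the same model now learns the "L2"
    with torch.no_grad():
        _, act_l2 = model(probe)

    # Larger activation change ~ more adaptation needed ~ lower L1/L2 similarity.
    change = (act_l2 - act_l1).pow(2).mean().sqrt().item()
    print(f"RMS hidden-state change after L2 training: {change:.4f}")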

    Diachronic proximity vs. data sparsity in cross-lingual parser projection: a case study on Germanic

    For the study of historical language varieties, the sparsity of training data imposes immense problems on syntactic annotation and the development of NLP tools that automate the process. In this paper, we explore strategies to compensate for the lack of training data by including data from related varieties in a series of annotation projection experiments from English to four old Germanic languages: on dependency syntax projected from English to one or more languages, we train a fragment-aware parser and apply it to the target language. For parser training, we consider small datasets from the target language as a baseline and compare this baseline with models trained on larger datasets from multiple varieties with different degrees of relatedness, thereby balancing sparsity against diachronic proximity. Our experiments show (a) that adding data from related languages to training data in the target language can improve parsing performance, (b) that a parser trained on data from two related languages (and none from the target language) can reach a performance that is not statistically significantly worse than that of a parser
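    As a rough illustration of the annotation projection step referred to above (a hypothetical toy example, not the authors' pipeline): source-side dependency heads are mapped onto aligned target tokens, and tokens with no projected edge stay unattached, yielding the partial trees that a fragment-aware parser can be trained on.

    # Hypothetical sketch of dependency projection through a 1-to-1 word alignment.
    def project_dependencies(src_heads, alignment, n_tgt):
        """
        src_heads: src_heads[i] is the head index of source token i (-1 = root).
        alignment: dict mapping source token index -> target token index.
        n_tgt:     number of target tokens.
        Returns projected head indices for the target sentence, with None for
        tokens that receive no projected edge (the "fragments").
        """
        tgt_heads = [None] * n_tgt
        for s_dep, s_head in enumerate(src_heads):
            if s_dep not in alignment:
                continue                          # dependent not aligned: no edge
            t_dep = alignment[s_dep]
            if s_head == -1:
                tgt_heads[t_dep] = -1             # root projects to root
            elif s_head in alignment:
                tgt_heads[t_dep] = alignment[s_head]
        return tgt_heads

    # English source "she saw him": "saw" is the root, "she" and "him" attach to it.
    src_heads = [1, -1, 1]
    # Toy alignment to a three-token target sentence in the same order.
    alignment = {0: 0, 1: 1, 2: 2}
    print(project_dependencies(src_heads, alignment, 3))    # [1, -1, 1]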