27 research outputs found

HFST—Framework for Compiling and Applying Morphologies

Author: A. Savary
A.V. Aho
C. Allauzen
H. Schmid
J.A. Brzozowski
K. Oflazer
K.R. Beesley
L. Karttunen
M. Huldén
M. Silfverberg
Publication venue: Springer
Publication date: 01/01/2011

HFST (Helsinki Finite-State Technology, hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform, and it supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library. Peer reviewed.

Crossref

Helsingin yliopiston digitaalinen arkisto

Implementation of replace rules using preference operator

Author: Drobac Senka
Silfverberg Miikka
Yli-Jyrä Anssi Mikael
Publication venue: The Association for Computational Linguistics
Publication date: 23/07/2012

We explain the implementation of replace rules with the .r-glc. operator and preference relations. Our modular approach combines various preference constraints to form different replace rules. In addition to describing the method, we present illustrative examples. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

HFST Training Environment and Recent Additions

Author: Axelson Erik
Hardwick Sam
Linden Krister
Publication venue: Northern European Association for Language Technology
Publication date: 01/04/2023

HFST, the Helsinki Finite-State Technology toolkit, was launched in 2009 (Lindén et al., 2009) and has since been used for developing a number of rule-based morphologies for processing natural language. To promote the uptake of the toolkit, a training environment in which linguists can learn to use HFST has been designed in Jupyter. This paper presents an overview of the training environment and some of the recent features added to HFST to keep the run-time size of the transducer reasonably small despite the exceptions and negative constraints that need to be added during practical FST development. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

FIN-CLARIN – en humanistisk forskningsinfrastruktur med betoning på språk [FIN-CLARIN: a humanities research infrastructure with an emphasis on language]

Author: Lindén Krister
Publication venue: Sprognævnene i Norden
Publication date: 01/01/2015

Billions of words and thousands of hours of audio and video are needed as material for humanities research, and for language research in particular. In addition, researchers need tools to refine their own data collections and compare them with general data collections. When a research project ends, storage and distribution sites are needed to make raw data, tools, and research results available and usable. Data, tools, and shared means of use together form a research infrastructure that makes it possible to verify earlier results and to make new findings more efficiently, since not everyone has to start from scratch collecting data and building analysis tools.

Tidsskrift.dk (Det Kongelige Bibliotek)

A Humor új Fo(r)mája [The New Fo(r)m of Humor]

Author: Novák Attila
Publication date: 01/01/2014

Over recent decades, morphological databases for many languages have been created for the MorphoLogic Humor morphological analyzer. Some of these offer very good coverage and accuracy, while others make automatic morphological analysis possible for languages for which no other comparable resource exists. However, the closed license of the Humor analyzer software did not allow these linguistic resources to be distributed freely. Moreover, the implementation of the Humor analyzer does not support the analysis of unknown words (morphological guessing), nor does it allow frequency information to be assigned to individual words or the model to be weighted in other ways. We solved these problems by converting the Humor morphological resources into a finite-state description that addresses all of them and also has an open-source implementation.

University of Szeged

Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation

Author: Hyvärinen Mirka
Linden Krister
Silfverberg Miikka
Publication date: 09/03/2012

Host publication title: Computational Linguistics and Intelligent Text Processing. Host publication sub-title: 13th International Conference, CICLing 2012. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Machine Translation for Crimean Tatar to Turkish

Author: Gökırmak M.
Tyers F. M.
Washington Jonathan North
Publication venue: 'Transformative Works and Cultures'
Publication date: 01/08/2019

In this paper, a machine translation system for Crimean Tatar to Turkish is presented. To our knowledge, this is the first machine translation system made available for public use for Crimean Tatar, and the first such system released as free and open source software. The system was built using Apertium, a free and open source machine translation system, and is currently unidirectional from Crimean Tatar to Turkish. We describe our translation system, evaluate it on parallel corpora, and compare its performance with a neural machine translation system trained on the limited amount of corpora available.

Works

FIN-CLARIN - a humanities research infrastructure with emphasis on language

Author: Linden Bo Krister Johan
Publication date: 01/07/2014

Helsingin yliopiston digitaalinen arkisto

BabyFST : Towards a Finite-State Based Computational Model of Ancient Babylonian

Author: Arppe Antti
Linden Krister
Sahala Aleksi
Silfverberg Miikka
Publication venue: European Language Resources Association (ELRA)
Publication date: 17/05/2020

Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Finite-state Relations Between Two Historically Closely Related Languages

Author: Koskenniemi Kimmo
Publication venue: Northern European Association for Language Technology
Publication date: 01/01/2013

Regular correspondences between historically related languages can be modelled using finite-state transducers (FSTs). A new method is presented by demonstrating it with a bidirectional experiment between Finnish and Estonian. An artificial representation (resembling a proto-language) is established between the two related languages. This representation, AFE (Aligned Finnish-Estonian), is based on the letter-by-letter alignment of the two languages and uses mechanically constructed morphophonemes that represent the corresponding characters. By describing the constraints of this AFE using two-level rules, one may construct useful mappings between the languages. In this way, the badly ambiguous FSTs from Finnish and Estonian to AFE can be composed into a practically unambiguous transducer from Finnish to Estonian. The inverse mapping from Estonian to Finnish is mildly ambiguous. The steps of the proposed method could be repeated as such with dialectal or older written texts. Choosing a set of model words, aligning them, recording the mechanical correspondences, and designing rules for the constraints could be done with limited effort. For the purposes of indexing and searching, the mild ambiguity may be tolerable as such. The ambiguity can be further reduced by composing the resulting FST with a speller or morphological analyser of the standard language. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto