HFST tool for morphology : An efficient open-source package for construction of morphological analyzers

Author: Linden Krister
Pirinen Tommi
Silfverberg Miikka
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

HFST—Framework for Compiling and Applying Morphologies

Author: A. Savary
A.V. Aho
C. Allauzen
H. Schmid
J.A. Brzozowski
K. Oflazer
K.R. Beesley
L. Karttunen
M. Huldén
M. Silfverberg
Publication venue: Springer
Publication date: 01/01/2011

HFST (Helsinki Finite-State Technology, hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform, and it supports extending and improving the descriptions with weights to model statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also makes it possible to exchange transducers between different software providers in order to get the best out of each finite-state library.

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Open-Source Morphology for Endangered Mordvinic Languages

Author: Hämäläinen Mika
Partanen Niko
Rueter Jack
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2020

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

HFST Training Environment and Recent Additions

Author: Axelson Erik
Hardwick Sam
Linden Krister
Publication venue: Northern European Association for Language Technology
Publication date: 01/04/2023

HFST, the Helsinki Finite-State Technology toolkit, was launched in 2009 (Lindén et al., 2009) and has since been used to develop a number of rule-based morphologies for processing natural language. To promote the uptake of the toolkit, a training environment in which linguists can learn to use HFST has been designed in Jupyter. This paper presents an overview of the training environment and some of the recent features added to HFST to keep the run-time size of the transducer reasonably small despite the exceptions and negative constraints that need to be added during practical FST development.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Finite-State Spell-Checking with Weighted Language and Error Models : Building and Evaluating Spell-Checkers with Wikipedia as Corpus

Author: Linden Krister
Pirinen Tommi
Publication date: 01/05/2010

In this paper we present simple methods for constructing and evaluating finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools, and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology tools, and demonstrate rapid building of Northern Sámi and English spell checkers from tools and resources available on the Internet.

Peer reviewed

CiteSeerX

Helsingin yliopiston digitaalinen arkisto

An open, extendible, and fast Turkish morphological analyzer

Author: Avar Begüm
Ercan Gökhan
Yıldız Olcay Taner
Publication venue: Assoc. for Computational Linguistics Bulgaria
Publication date: 01/09/2019

In this paper, we present a two-level morphological analyzer for Turkish which consists of five main components: a finite-state transducer, a rule engine for suffixation, a lexicon, a trie data structure, and an LRU cache. We use Java to implement the finite-state machine logic and the rule engine, and XML to describe the finite-state transducer rules of Turkish, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Empowered with a comprehensive lexicon of 54,000 bare forms, including 19,000 proper nouns, our morphological analyzer is among the most reliable analyzers produced so far. The analyzer is compared with Turkish morphological analyzers in the literature. By using the LRU cache and the trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.

Publisher's Version

Crossref

Isik University Academic Open Access

Entry Generation for New Words by Analogy for Morphological Lexicons

Author: Linden Krister
Publication date: 01/01/2009

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Building a Finnish SOM-based ontology concept tagger and harvester

Author: Nyrkkö Alpo Seppo Antero
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2018

I am developing an automatic machine-learning tool suited to disambiguating the meanings of words that occur in natural language. The computational model is based on a self-organizing map (SOM) and a given Finnish-language Semantic Web ontology. The model learns to recognize occurrences of concepts from a sample text that has been annotated (tagged) with concepts from a previously compiled ontology. The experiment relates to the earlier OntoR experimental setup for tagging English-language concepts, which studied how terms occurring in text input can be linked to cells of the SOM with the help of an annotated example text given as a model. Such a model learns the given concept model from a remarkably small amount of example material and suits applications where not enough data is available to train a deep-learning neural network model. The morphological analysis in the Finnish experiment is based on the OMORFI and HFST tools. The SOM that implements the machine learning is computed with the SOM-PAK software package. In addition to recognizing concepts, the developed computational model is also used to find candidates for new ontology concepts.

Helsingin yliopiston digitaalinen arkisto

Implementation of replace rules using preference operator

Author: Drobac Senka
Silfverberg Miikka
Yli-Jyrä Anssi Mikael
Publication venue: The Association for Computational Linguistics
Publication date: 23/07/2012

We explain the implementation of replace rules with the .r-glc. operator and preference relations. Our modular approach combines various preference constraints to form different replace rules. In addition to describing the method, we present illustrative examples.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

FinnTreeBank: Creating a research resource and service for language researchers with Constraint Grammar

Author: Voutilainen Atro
Publication date: 17/11/2011

Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), 41–49. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

DSpace at Tartu University Library