
    Use of Weighted Finite State Transducers in Part of Speech Tagging

    This paper addresses issues in part-of-speech disambiguation using finite-state transducers and presents two main contributions. The first is the use of finite-state machines for part-of-speech tagging, with linguistic and statistical information represented as weights on the transitions of weighted finite-state transducers. The second is the successful combination of linguistic and statistical techniques for word disambiguation, together with the notion of word classes.
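    The central mechanism, encoding tag-transition and lexical preferences as weights and decoding the cheapest path, can be illustrated independently of any transducer toolkit. Below is a minimal Python sketch; the tag set, transition weights and lexical weights are invented for illustration and merely stand in for the paper's linguistically and statistically derived weights.

    # Toy weighted tagging model: weights behave like negative log probabilities,
    # so lower is better. All tags, transitions and lexical weights are illustrative.
    TAGS = ["DET", "NOUN", "VERB"]

    # Cost of moving from one tag to the next, like a weight on a transducer arc.
    TRANS = {
        ("<s>", "DET"): 0.4, ("<s>", "NOUN"): 1.2, ("<s>", "VERB"): 2.0,
        ("DET", "NOUN"): 0.2, ("DET", "VERB"): 2.5, ("DET", "DET"): 3.0,
        ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 1.5, ("NOUN", "DET"): 2.0,
        ("VERB", "DET"): 0.7, ("VERB", "NOUN"): 1.0, ("VERB", "VERB"): 2.5,
    }

    # Cost of emitting a word with a given tag (the lexical part of the model).
    EMIT = {
        ("DET", "the"): 0.1, ("NOUN", "dog"): 0.3, ("VERB", "dog"): 2.5,
        ("VERB", "barks"): 0.4, ("NOUN", "barks"): 2.0,
    }

    def viterbi(words):
        """Return the cheapest tag sequence under the transition and lexical weights."""
        best = {"<s>": (0.0, [])}                   # tag -> (cost so far, tag path)
        for word in words:
            nxt = {}
            for tag in TAGS:
                emit = EMIT.get((tag, word), 10.0)  # large cost for unseen (tag, word)
                for prev, (cost, path) in best.items():
                    total = cost + TRANS.get((prev, tag), 10.0) + emit
                    if tag not in nxt or total < nxt[tag][0]:
                        nxt[tag] = (total, path + [tag])
            best = nxt
        return min(best.values())

    print(viterbi(["the", "dog", "barks"]))   # best path DET NOUN VERB, total cost about 1.9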

    A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion

    A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved with transformation-based learning. The best system (using a large set of rule templates and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99% phoneme accuracy.
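    The leftmost longest-match strategy itself is easy to illustrate outside a finite-state compiler. The following Python sketch segments a word into graphemes and maps them to phonemes; the grapheme inventory and mappings are invented, Dutch-flavoured examples rather than the paper's rule set.

    # Toy grapheme-to-phoneme conversion by leftmost longest-match replacement.
    RULES = {
        "sch": "sx",   # multi-letter graphemes are tried before shorter ones
        "oe": "u",
        "ij": "EI",
        "aa": "a:",
        "s": "s",
        "o": "O",
        "l": "l",
        "p": "p",
        "n": "n",
    }
    MAX_LEN = max(len(g) for g in RULES)

    def g2p(word):
        """Scan left to right; at each position take the longest grapheme that matches."""
        phonemes, i = [], 0
        while i < len(word):
            for length in range(min(MAX_LEN, len(word) - i), 0, -1):
                chunk = word[i:i + length]
                if chunk in RULES:
                    phonemes.append(RULES[chunk])
                    i += length
                    break
            else:                       # no rule matched: copy the letter unchanged
                phonemes.append(word[i])
                i += 1
        return " ".join(phonemes)

    print(g2p("schoen"))   # -> "sx u n"  ("sch" and "oe" win over single letters)
    print(g2p("slaap"))    # -> "s l a: p"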

    Part-of-Speech Tagging using Parallel Weighted Finite-State Transducers

    We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
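    The scoring idea, a weighted lexicon combined with bigram costs over both tag and lemma sequences, can be sketched without the transducer machinery. In the brute-force Python illustration below, the candidate analyses, weights and bigram costs are all invented.

    from itertools import product

    # Candidate (lemma, tag) analyses per word with lexicon weights (lower is better).
    ANALYSES = {
        "saws": [(("saw", "NOUN"), 1.0), (("saw", "VERB"), 1.5)],
        "cut":  [(("cut", "VERB"), 0.8), (("cut", "NOUN"), 1.2)],
    }

    def tag_bigram_cost(prev_tag, tag):
        # Stand-in for one of the two factored bigram models (tag sequences).
        return 0.2 if (prev_tag, tag) in {("NOUN", "VERB"), ("VERB", "NOUN")} else 1.0

    def lemma_bigram_cost(prev_lemma, lemma):
        # Stand-in for the other factor (lemma sequences).
        return 0.3 if (prev_lemma, lemma) == ("saw", "cut") else 1.0

    def best_analysis(words):
        """Enumerate joint analyses and pick the cheapest one (brute force for clarity)."""
        best = None
        for combo in product(*(ANALYSES[w] for w in words)):
            cost = sum(w for _, w in combo)                  # lexicon weights
            prev_lemma, prev_tag = "<s>", "<s>"
            for (lemma, tag), _ in combo:
                cost += tag_bigram_cost(prev_tag, tag) + lemma_bigram_cost(prev_lemma, lemma)
                prev_lemma, prev_tag = lemma, tag
            if best is None or cost < best[0]:
                best = (cost, [a for a, _ in combo])
        return best

    print(best_analysis(["saws", "cut"]))   # cheapest: [('saw', 'NOUN'), ('cut', 'VERB')]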

    Combining Statistical Models for POS Tagging using Finite-State Calculus


    HFST—Framework for Compiling and Applying Morphologies

    HFST (Helsinki Finite-State Technology, hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform, and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. It also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.
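    As a rough illustration of the kind of workflow HFST supports, the sketch below assumes the hfst Python bindings; the toy analyser entries are invented, and the exact API details may differ between HFST versions.

    import hfst

    # Compile a tiny analyser from a regular expression (illustrative entries only),
    # convert it to optimized-lookup format, and apply it to a surface form.
    analyser = hfst.regex('{cats}:{cat+N+Pl} | {cat}:{cat+N+Sg}')
    analyser.lookup_optimize()
    print(analyser.lookup('cats'))   # expected to return the plural analysis with its weight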

    Joint Morphological and Syntactic Disambiguation

    In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models, and state-of-the-art performance on Hebrew Treebank parsing is demonstrated with the new method. The benefits of joint inference are modest with the current component models, but appear to increase as the components themselves improve.
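    The contrast between pipeline and joint decisions over a morphological lattice can be shown with a deliberately tiny example. In the Python sketch below, the lattice entries, scores and the stand-in "syntactic" scorer are all invented; real lattice parsing replaces the brute-force enumeration with dynamic programming over the lattice.

    from itertools import product

    # Toy lattice: each token has candidate morphological analyses with scores
    # (higher is better here). All analyses and scores are illustrative only.
    LATTICE = [
        [("house/NOUN", 0.6), ("house/VERB", 0.4)],
        [("flies/VERB", 0.3), ("flies/NOUN", 0.7)],
    ]

    def syntax_score(tags):
        # Stand-in for the parser: reward a NOUN VERB clause.
        return 1.0 if tags == ["NOUN", "VERB"] else 0.0

    def pipeline(lattice):
        """Fix the 1-best morphology per token first, then hand it to the parser."""
        picks = [max(cands, key=lambda c: c[1]) for cands in lattice]
        tags = [a.split("/")[1] for a, _ in picks]
        return [a for a, _ in picks], syntax_score(tags)

    def joint(lattice):
        """Search the whole lattice, summing morphological and syntactic scores."""
        best = None
        for combo in product(*lattice):
            tags = [a.split("/")[1] for a, _ in combo]
            score = sum(s for _, s in combo) + syntax_score(tags)
            if best is None or score > best[1]:
                best = ([a for a, _ in combo], score)
        return best

    print(pipeline(LATTICE))  # greedily keeps 'flies/NOUN' and misses the NOUN VERB reading
    print(joint(LATTICE))     # the syntactic model pulls the lattice toward NOUN VERB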

    Guessers for Finite-State Transducer Lexicons

    Language software applications encounter new words, e.g., acronyms, technical terminology, names, or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new, generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
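    One simple way to realise such a guesser, ranking paradigms by the longest word-final string seen in training data, is sketched below in Python. The training pairs and paradigm labels are invented, and this is not the paper's method, which derives the guesser directly from the transducer lexicon.

    from collections import Counter, defaultdict

    # Toy paradigm guesser: learn which inflectional paradigms tend to follow which
    # word-final strings, then rank paradigms for an unseen word by its longest
    # known suffix. Training pairs and paradigm names are illustrative only.
    TRAINING = [
        ("kissa", "paradigm_9"), ("kirja", "paradigm_9"),
        ("talo", "paradigm_1"), ("auto", "paradigm_1"),
        ("rakkaus", "paradigm_40"), ("ystävyys", "paradigm_40"),
    ]

    def build_guesser(pairs, max_suffix=4):
        """Count paradigms per suffix of length 1..max_suffix."""
        by_suffix = defaultdict(Counter)
        for word, paradigm in pairs:
            for k in range(1, min(max_suffix, len(word)) + 1):
                by_suffix[word[-k:]][paradigm] += 1
        return by_suffix

    def guess(word, by_suffix, max_suffix=4):
        """Return paradigms ranked by frequency for the longest suffix seen in training."""
        for k in range(min(max_suffix, len(word)), 0, -1):
            counts = by_suffix.get(word[-k:])
            if counts:
                return [p for p, _ in counts.most_common()]
        return []

    guesser = build_guesser(TRAINING)
    print(guess("onnellisuus", guesser))   # '-us' ending -> paradigm_40 ranked first
    print(guess("koira", guesser))         # '-a' ending -> paradigm_9 ranked first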