
    Use of Weighted Finite State Transducers in Part of Speech Tagging

    This paper addresses issues in part-of-speech disambiguation using finite-state transducers and presents two main contributions. The first is the use of finite-state machines for part-of-speech tagging, with linguistic and statistical information represented as weights on the transitions of weighted finite-state transducers. The second is the successful combination of linguistic and statistical techniques for word disambiguation, together with the notion of word classes.
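    The central mechanism, encoding tag-transition and lexical preferences as weights and decoding the cheapest path, can be illustrated independently of any transducer toolkit. Below is a minimal Python sketch; the tag set, transition weights and lexical weights are invented for illustration and merely stand in for the paper's linguistically and statistically derived weights.

    # Toy weighted tagging model: weights behave like negative log probabilities,
    # so lower is better. All tags, transitions and lexical weights are illustrative.
    TAGS = ["DET", "NOUN", "VERB"]

    # Cost of moving from one tag to the next, like a weight on a transducer arc.
    TRANS = {
        ("<s>", "DET"): 0.4, ("<s>", "NOUN"): 1.2, ("<s>", "VERB"): 2.0,
        ("DET", "NOUN"): 0.2, ("DET", "VERB"): 2.5, ("DET", "DET"): 3.0,
        ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 1.5, ("NOUN", "DET"): 2.0,
        ("VERB", "DET"): 0.7, ("VERB", "NOUN"): 1.0, ("VERB", "VERB"): 2.5,
    }

    # Cost of emitting a word with a given tag (the lexical part of the model).
    EMIT = {
        ("DET", "the"): 0.1, ("NOUN", "dog"): 0.3, ("VERB", "dog"): 2.5,
        ("VERB", "barks"): 0.4, ("NOUN", "barks"): 2.0,
    }

    def viterbi(words):
        """Return the cheapest tag sequence under the transition and lexical weights."""
        best = {"<s>": (0.0, [])}                   # tag -> (cost so far, tag path)
        for word in words:
            nxt = {}
            for tag in TAGS:
                emit = EMIT.get((tag, word), 10.0)  # large cost for unseen (tag, word)
                for prev, (cost, path) in best.items():
                    total = cost + TRANS.get((prev, tag), 10.0) + emit
                    if tag not in nxt or total < nxt[tag][0]:
                        nxt[tag] = (total, path + [tag])
            best = nxt
        return min(best.values())

    print(viterbi(["the", "dog", "barks"]))   # best path DET NOUN VERB, total cost about 1.9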

    A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion

    A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved with transformation-based learning. The best system (using a large set of rule templates and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99% phoneme accuracy.
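    The leftmost longest-match strategy itself is easy to illustrate outside a finite-state compiler. The following Python sketch segments a word into graphemes and maps them to phonemes; the grapheme inventory and mappings are invented, Dutch-flavoured examples rather than the paper's rule set.

    # Toy grapheme-to-phoneme conversion by leftmost longest-match replacement.
    RULES = {
        "sch": "sx",   # multi-letter graphemes are tried before shorter ones
        "oe": "u",
        "ij": "EI",
        "aa": "a:",
        "s": "s",
        "o": "O",
        "l": "l",
        "p": "p",
        "n": "n",
    }
    MAX_LEN = max(len(g) for g in RULES)

    def g2p(word):
        """Scan left to right; at each position take the longest grapheme that matches."""
        phonemes, i = [], 0
        while i < len(word):
            for length in range(min(MAX_LEN, len(word) - i), 0, -1):
                chunk = word[i:i + length]
                if chunk in RULES:
                    phonemes.append(RULES[chunk])
                    i += length
                    break
            else:                       # no rule matched: copy the letter unchanged
                phonemes.append(word[i])
                i += 1
        return " ".join(phonemes)

    print(g2p("schoen"))   # -> "sx u n"  ("sch" and "oe" win over single letters)
    print(g2p("slaap"))    # -> "s l a: p"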

    Part-of-Speech Tagging using Parallel Weighted Finite-State Transducers

    We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
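    The scoring idea, a weighted lexicon combined with bigram costs over both tag and lemma sequences, can be sketched without the transducer machinery. In the brute-force Python illustration below, the candidate analyses, weights and bigram costs are all invented.

    from itertools import product

    # Candidate (lemma, tag) analyses per word with lexicon weights (lower is better).
    ANALYSES = {
        "saws": [(("saw", "NOUN"), 1.0), (("saw", "VERB"), 1.5)],
        "cut":  [(("cut", "VERB"), 0.8), (("cut", "NOUN"), 1.2)],
    }

    def tag_bigram_cost(prev_tag, tag):
        # Stand-in for one of the two factored bigram models (tag sequences).
        return 0.2 if (prev_tag, tag) in {("NOUN", "VERB"), ("VERB", "NOUN")} else 1.0

    def lemma_bigram_cost(prev_lemma, lemma):
        # Stand-in for the other factor (lemma sequences).
        return 0.3 if (prev_lemma, lemma) == ("saw", "cut") else 1.0

    def best_analysis(words):
        """Enumerate joint analyses and pick the cheapest one (brute force for clarity)."""
        best = None
        for combo in product(*(ANALYSES[w] for w in words)):
            cost = sum(w for _, w in combo)                  # lexicon weights
            prev_lemma, prev_tag = "<s>", "<s>"
            for (lemma, tag), _ in combo:
                cost += tag_bigram_cost(prev_tag, tag) + lemma_bigram_cost(prev_lemma, lemma)
                prev_lemma, prev_tag = lemma, tag
            if best is None or cost < best[0]:
                best = (cost, [a for a, _ in combo])
        return best

    print(best_analysis(["saws", "cut"]))   # cheapest: [('saw', 'NOUN'), ('cut', 'VERB')]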

    Combining Statistical Models for POS Tagging using Finite-State Calculus


    HFST—Framework for Compiling and Applying Morphologies

    HFST (Helsinki Finite-State Technology, hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform, and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. It also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.
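    As a rough illustration of the kind of workflow HFST supports, the sketch below assumes the hfst Python bindings; the toy analyser entries are invented, and the exact API details may differ between HFST versions.

    import hfst

    # Compile a tiny analyser from a regular expression (illustrative entries only),
    # convert it to optimized-lookup format, and apply it to a surface form.
    analyser = hfst.regex('{cats}:{cat+N+Pl} | {cat}:{cat+N+Sg}')
    analyser.lookup_optimize()
    print(analyser.lookup('cats'))   # expected to return the plural analysis with its weight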

    Joint Morphological and Syntactic Disambiguation

    In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models, and state-of-the-art performance on Hebrew Treebank parsing is demonstrated with the new method. The benefits of joint inference are modest with the current component models, but appear to increase as the components themselves improve.
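    The contrast between pipeline and joint decisions over a morphological lattice can be shown with a deliberately tiny example. In the Python sketch below, the lattice entries, scores and the stand-in "syntactic" scorer are all invented; real lattice parsing replaces the brute-force enumeration with dynamic programming over the lattice.

    from itertools import product

    # Toy lattice: each token has candidate morphological analyses with scores
    # (higher is better here). All analyses and scores are illustrative only.
    LATTICE = [
        [("house/NOUN", 0.6), ("house/VERB", 0.4)],
        [("flies/VERB", 0.3), ("flies/NOUN", 0.7)],
    ]

    def syntax_score(tags):
        # Stand-in for the parser: reward a NOUN VERB clause.
        return 1.0 if tags == ["NOUN", "VERB"] else 0.0

    def pipeline(lattice):
        """Fix the 1-best morphology per token first, then hand it to the parser."""
        picks = [max(cands, key=lambda c: c[1]) for cands in lattice]
        tags = [a.split("/")[1] for a, _ in picks]
        return [a for a, _ in picks], syntax_score(tags)

    def joint(lattice):
        """Search the whole lattice, summing morphological and syntactic scores."""
        best = None
        for combo in product(*lattice):
            tags = [a.split("/")[1] for a, _ in combo]
            score = sum(s for _, s in combo) + syntax_score(tags)
            if best is None or score > best[1]:
                best = ([a for a, _ in combo], score)
        return best

    print(pipeline(LATTICE))  # greedily keeps 'flies/NOUN' and misses the NOUN VERB reading
    print(joint(LATTICE))     # the syntactic model pulls the lattice toward NOUN VERB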

    Guessers for Finite-State Transducer Lexicons

    Language software applications encounter new words, e.g., acronyms, technical terminology, names, or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new, generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
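    One simple way to realise such a guesser, ranking paradigms by the longest word-final string seen in training data, is sketched below in Python. The training pairs and paradigm labels are invented, and this is not the paper's method, which derives the guesser directly from the transducer lexicon.

    from collections import Counter, defaultdict

    # Toy paradigm guesser: learn which inflectional paradigms tend to follow which
    # word-final strings, then rank paradigms for an unseen word by its longest
    # known suffix. Training pairs and paradigm names are illustrative only.
    TRAINING = [
        ("kissa", "paradigm_9"), ("kirja", "paradigm_9"),
        ("talo", "paradigm_1"), ("auto", "paradigm_1"),
        ("rakkaus", "paradigm_40"), ("ystävyys", "paradigm_40"),
    ]

    def build_guesser(pairs, max_suffix=4):
        """Count paradigms per suffix of length 1..max_suffix."""
        by_suffix = defaultdict(Counter)
        for word, paradigm in pairs:
            for k in range(1, min(max_suffix, len(word)) + 1):
                by_suffix[word[-k:]][paradigm] += 1
        return by_suffix

    def guess(word, by_suffix, max_suffix=4):
        """Return paradigms ranked by frequency for the longest suffix seen in training."""
        for k in range(min(max_suffix, len(word)), 0, -1):
            counts = by_suffix.get(word[-k:])
            if counts:
                return [p for p, _ in counts.most_common()]
        return []

    guesser = build_guesser(TRAINING)
    print(guess("onnellisuus", guesser))   # '-us' ending -> paradigm_40 ranked first
    print(guess("koira", guesser))         # '-a' ending -> paradigm_9 ranked first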