
    Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction

    Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications. In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of the agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms for a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages, such as English, Dutch, French, German and Italian, with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant…
    Comment: Replaces 9504031. Gzipped, uuencoded PostScript file. To appear in Computational Linguistics, Volume 22, No. 1, 1996. Also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
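    To make the candidate-generation idea concrete, the following is a minimal sketch of error-tolerant recognition over a word list compiled into a trie-shaped deterministic automaton. It walks the automaton depth-first while maintaining one row of the standard edit-distance table, and abandons a path as soon as every entry in the row exceeds the threshold t. That row-minimum test is a simple lower bound assumed here for illustration; the paper's own pruning criterion may be formulated differently, and the sketch does not cover the finite state transducer (morphological analysis) case. The names build_trie_dfa and error_tolerant_recognize are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: state -> {symbol: next_state}; state 0 is the start.
Transitions = Dict[int, Dict[str, int]]


def build_trie_dfa(words: List[str]) -> Tuple[Transitions, Set[int]]:
    """Compile a word list into a trie-shaped deterministic automaton."""
    delta: Transitions = {0: {}}
    finals: Set[int] = set()
    for word in words:
        state = 0
        for ch in word:
            nxt = delta[state].get(ch)
            if nxt is None:
                nxt = len(delta)
                delta[state][ch] = nxt
                delta[nxt] = {}
            state = nxt
        finals.add(state)
    return delta, finals


def error_tolerant_recognize(delta: Transitions, finals: Set[int], start: int,
                             y: str, t: int) -> List[str]:
    """Return every string X accepted by the automaton with ed(X, y) <= t."""
    results: List[str] = []

    def dfs(state: int, prefix: str, row: List[int]) -> None:
        # Invariant: row[i] == ed(prefix, y[:i]).
        if state in finals and row[-1] <= t:
            results.append(prefix)
        for ch, nxt in delta[state].items():
            new_row = [row[0] + 1]
            for i in range(1, len(y) + 1):
                cost = 0 if y[i - 1] == ch else 1
                new_row.append(min(new_row[i - 1] + 1,   # y[i-1] left unmatched
                                   row[i] + 1,           # ch left unmatched
                                   row[i - 1] + cost))   # match or substitution
            # No extension of prefix + ch can get below min(new_row), so prune.
            # This also guarantees termination on cyclic automata.
            if min(new_row) <= t:
                dfs(nxt, prefix + ch, new_row)

    dfs(start, "", list(range(len(y) + 1)))
    return sorted(results)


delta, finals = build_trie_dfa(["cat", "car", "cart", "dog"])
print(error_tolerant_recognize(delta, finals, 0, "cas", 1))
```

    With the toy word list above, the misspelling "cas" yields "car" and "cat" at edit distance 1, while "cart" (distance 2) is pruned. Because the search only follows transitions that exist in the automaton, the same routine works unchanged on a minimized or cyclic automaton rather than a trie.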

    Computerization of African languages-French dictionaries

    This paper reports on work done during the DiLAF project, which consists of converting five bilingual African language-French dictionaries, originally in Word format, into XML following the LMF model. The languages processed are Bambara, Hausa, Kanuri, Tamajaq and Songhai-zarma, which are still considered under-resourced with respect to Natural Language Processing tools. Once converted, the dictionaries are available online on the Jibiki platform for lookup and modification. The DiLAF project is presented first, followed by a description of each dictionary. Next, the methodology for converting from .doc format to XML files is presented, followed by a specific point on the use of Unicode. Each step of the conversion into XML and LMF is then detailed. The last part presents the Jibiki lexical resources management platform used for the project.
    Comment: 8 pages
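    As a rough illustration of the final conversion step, the sketch below turns already-parsed dictionary entries into LMF-style XML with Python's standard xml.etree.ElementTree, normalizing every string to Unicode NFC so that a base letter plus a combining tone mark is stored as a single precomposed code point. The element layout (LexicalResource, Lexicon, LexicalEntry, Lemma, Sense, Equivalent, with feat att/val pairs) follows the general pattern of LMF but is an assumption; the abstract does not specify the exact schema used by DiLAF or Jibiki. The entry fields and the sample entry are invented for illustration, not taken from the dictionaries, and the functions are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET
from typing import Dict, List


def add_feat(parent: ET.Element, att: str, val: str) -> None:
    """Attach an LMF-style <feat att=".." val=".."/> with NFC-normalized text."""
    ET.SubElement(parent, "feat", att=att, val=unicodedata.normalize("NFC", val))


def entry_to_lmf(entry: Dict[str, str]) -> ET.Element:
    """Convert one parsed entry (hypothetical fields) into a LexicalEntry."""
    lex_entry = ET.Element("LexicalEntry")
    add_feat(lex_entry, "partOfSpeech", entry["pos"])
    lemma = ET.SubElement(lex_entry, "Lemma")
    add_feat(lemma, "writtenForm", entry["headword"])
    sense = ET.SubElement(lex_entry, "Sense")
    equivalent = ET.SubElement(sense, "Equivalent")  # French translation
    add_feat(equivalent, "language", "fra")
    add_feat(equivalent, "writtenForm", entry["gloss_fr"])
    return lex_entry


def dictionary_to_lmf(entries: List[Dict[str, str]], language_code: str) -> ET.Element:
    """Wrap all entries of one bilingual dictionary in a single Lexicon."""
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    add_feat(lexicon, "language", language_code)
    for entry in entries:
        lexicon.append(entry_to_lmf(entry))
    return resource


# Invented sample: the headword is typed with a decomposed grave accent (a + U+0300).
sample = [{"headword": "ba\u0300", "pos": "noun", "gloss_fr": "exemple"}]
root = dictionary_to_lmf(sample, "bam")  # "bam": ISO 639-3 code for Bambara
print(ET.tostring(root, encoding="unicode"))
```

    Normalizing at serialization time means that lookups on the platform can compare strings code point by code point, regardless of how the original Word documents happened to encode accented or tone-marked letters.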

    Regular Cost Functions, Part I: Logic and Algebra over Words

    The theory of regular cost functions is a quantitative extension of the classical notion of regularity. A cost function associates with each input a non-negative integer value (or infinity), as opposed to languages, which only associate with each input the two values "inside" and "outside". This theory is a continuation of the work on distance automata and similar models. These models of automata have been used successfully to solve the star-height problem, the finite power property, the finite substitution problem, the relative inclusion star-height problem and the boundedness problem for monadic second-order logic over words. Our notion of regularity can be, as in the classical theory of regular languages, equivalently defined in terms of automata, expressions, algebraic recognisability, and a variant of monadic second-order logic. These equivalences are strict extensions of the corresponding classical results. The present paper introduces cost monadic logic, the quantitative extension of monadic second-order logic that we use, and shows that some problems concerning the existence of bounds are decidable for this logic. This is achieved by introducing the corresponding algebraic formalism: stabilisation monoids.
    Comment: 47 pages
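    To show what "associating a value with each input" looks like for an automaton model, here is a small simulator for counter automata in the style of B-automata, one of the automaton presentations of regular cost functions. Each transition applies, per counter, a string of actions: "i" (increment), "r" (reset to 0) and "c" (check, i.e. record the current value). A run is worth the largest checked value, and the automaton's value on a word is the minimum over accepting runs (infinity if there is none). This is one common textbook-style presentation assumed here; the paper itself works with cost monadic logic and stabilisation monoids, and its automaton definitions may differ in detail. All names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# (source_state, letter) -> list of (target_state, one action string per counter)
Delta = Dict[Tuple[int, str], List[Tuple[int, Tuple[str, ...]]]]


def apply_actions(counters: Tuple[int, ...], actions: Tuple[str, ...],
                  checked: int) -> Tuple[Tuple[int, ...], int]:
    """Apply per-counter actions; return new counters and the max checked value."""
    new = list(counters)
    for k, ops in enumerate(actions):
        for op in ops:
            if op == "i":
                new[k] += 1
            elif op == "r":
                new[k] = 0
            elif op == "c":
                checked = max(checked, new[k])
    return tuple(new), checked


def b_automaton_value(delta: Delta, initial: Set[int], final: Set[int],
                      n_counters: int, word: str) -> float:
    """Minimum, over accepting runs, of the largest value checked along the run."""
    configs = {(q, (0,) * n_counters, 0) for q in initial}
    for letter in word:
        step = set()
        for q, counters, checked in configs:
            for target, actions in delta.get((q, letter), []):
                new_counters, new_checked = apply_actions(counters, actions, checked)
                step.add((target, new_counters, new_checked))
        configs = step
    values = [checked for q, _, checked in configs if q in final]
    return min(values) if values else math.inf


# Deterministic example: length of the longest block of consecutive a's.
LONGEST_A_BLOCK: Delta = {(0, "a"): [(0, ("ic",))], (0, "b"): [(0, ("r",))]}

# Nondeterministic example: guess which letter to count, giving min(|u|_a, |u|_b).
MIN_COUNT: Delta = {
    (0, "a"): [(0, ("ic",))], (0, "b"): [(0, ("",))],
    (1, "a"): [(1, ("",))], (1, "b"): [(1, ("ic",))],
}

print(b_automaton_value(LONGEST_A_BLOCK, {0}, {0}, 1, "aabaaab"))
print(b_automaton_value(MIN_COUNT, {0, 1}, {0, 1}, 1, "aab"))
```

    On "aabaaab" the first automaton yields 3, and on "aab" the second yields min(2, 1) = 1. In the theory, such functions are compared only up to the equivalence under which two cost functions are identified when they are bounded on exactly the same sets of words, so exact values matter less than boundedness; questions such as "is this function bounded?" are the kind of existence-of-bounds problems the abstract refers to.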