3 research outputs found

    Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction

    Get PDF
    Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected, and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerantComment: Replaces 9504031. gzipped, uuencoded postscript file. To appear in Computational Linguistics Volume 22 No:1, 1996, Also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.

    Robust Part of Speech Tagging

    Get PDF
    Generally, NLP tools use well-formed and annotated data to learn patterns by using machine learning techniques. However, in this work we will focus on the language used in an on-line platform for machine translation. In this area it is usual to have a framework such the following: a web-page which offer a service of translation between pairs of languages. The problem is that the casual users utilize the service to translate any type of text (cut and paste, single words, bad formatting, snipets, informal language, pre-traductions, etc.). Hence, in this situation we will find very often words with mistakes that make the system provides a bad translation because it is not able to understand the input.The main goal of our work is, once we have identified the problem of dealing with non-standard-input is to develop a robust PoS tagger from the SVMTagger

    A morphological analysis based method for spelling correction

    No full text
    corecore