Search CORE

22 research outputs found

Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction

Author: Oflazer Kemal
Publication date: 21/07/1995

Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of the agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SPARCstation 10/41. For spelling correction in Turkish, error-tolerant …
Comment: Replaces 9504031. Gzipped, uuencoded PostScript file. To appear in Computational Linguistics, Volume 22, No. 1, 1996. Also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.

arXiv.org e-Print Archive

CiteSeerX

Bilkent University Institutional Repository
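The candidate-generation idea in Oflazer's abstract can be illustrated with a small sketch. The Python below is not the paper's implementation: it assumes plain Levenshtein distance (no transpositions), represents the recognizer as a trie-shaped DFA built from a toy word list, and uses hypothetical names (`build_trie`, `error_tolerant_recognize`). It walks the automaton depth-first, extends one row of the edit-distance table per transition, and abandons a path once every entry in that row exceeds the threshold `t`, a simplified form of cut-off pruning.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DFA encoding: state -> {symbol: next_state}; state 0 is the start.
Transitions = Dict[int, Dict[str, int]]


def build_trie(words: List[str]) -> Tuple[Transitions, Set[int]]:
    """Build a trie-shaped DFA from a word list (a toy stand-in for a large FSA)."""
    transitions: Transitions = {0: {}}
    finals: Set[int] = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                new_state = len(transitions)
                transitions[new_state] = {}
                transitions[state][ch] = new_state
            state = transitions[state][ch]
        finals.add(state)
    return transitions, finals


def error_tolerant_recognize(
    x: str, transitions: Transitions, finals: Set[int], t: int
) -> List[Tuple[str, int]]:
    """Return (candidate, distance) for each accepted string within Levenshtein distance t of x."""
    m = len(x)
    results: List[Tuple[str, int]] = []

    def dfs(state: int, prefix: str, row: List[int]) -> None:
        # row[i] is the edit distance between x[:i] and the current prefix.
        if state in finals and row[m] <= t:
            results.append((prefix, row[m]))
        for symbol, nxt in sorted(transitions[state].items()):
            new_row = [row[0] + 1]
            for i in range(1, m + 1):
                cost = 0 if x[i - 1] == symbol else 1
                new_row.append(min(
                    new_row[i - 1] + 1,  # x[i-1] left unmatched
                    row[i] + 1,          # symbol left unmatched
                    row[i - 1] + cost,   # match or substitution
                ))
            # Prune: once every prefix of x is more than t edits away,
            # no extension of this path can come back within t.
            if min(new_row) <= t:
                dfs(nxt, prefix + symbol, new_row)

    dfs(0, "", list(range(m + 1)))
    return results


transitions, finals = build_trie(["cat", "cart", "coat", "dog", "act"])
print(error_tolerant_recognize("cot", transitions, finals, t=1))  # [('cat', 1), ('coat', 1)]
```

Because the search only follows transitions that exist in the automaton, it enumerates candidates directly instead of comparing the input against every word in the list. The same walk also terminates on a cyclic automaton, since the minimum of the row grows once a candidate is longer than the input plus `t`.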

Description of the verbal morphology of Asama: A realizational and implemented approach

Author: Lévêque Dimitri
Pellard Thomas
Publication venue: HAL CCSD
Publication date: 25/09/2019

International audience

Toward a widely usable finite-state morphology workbench for less studied languages, 1: Desiderata

Author: Yli-Jyrä Anssi
Publication date: 01/01/2005

Most of the world’s languages lack electronic word form dictionaries. The linguists who gather such dictionaries could be helped with an efficient morphology workbench that adapts to different environments and uses. A widely usable workbench could be characterized, ideally, as generally applicable, extensible, and freely available (GEA). It seems that such a solution could be implemented in the framework of finite-state methods. The current work defines the GEA desiderata and starts a series of articles concerning these desiderata in finite-state morphology. Subsequent parts will review the state of the art and present an action plan toward creating a widely usable finite-state morphology workbench.
Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Resolving Inflected Text Structures Irregularities Using Rule-Based Models

Author: Al Ajeeli Abid Thyab
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/06/2016

In this paper, a model for irregular inflected text structures in natural language is developed in order to derive stems from given words automatically. The proposed system is designed to work in two directions, forward and backward, an approach called the bi-directional technique: it can deduce morphemes from inflected words and, at the same time, build inflected words from stems. The system is developed using first-order logic techniques. The proposed rule-based model will help researchers carry out further investigation of multilingual applications that serve many real-life needs. These applications range from medical diagnosis systems, machine translation, …, to e-government entities, through teaching expository text structure to facilitate reading comprehension. The proposed model is able to learn how to extract rules from information by applying logic programming techniques to natural language data. Keywords: syntax analysis, irregular plurals, rule-based, bi-directional, inflected words, stems, finite automata

International Institute for Science, Technology and Education (IISTE): E-Journals
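The bi-directional idea in this abstract, where the same knowledge both derives stems from inflected words and builds inflected words from stems, can be sketched in Python. The paper uses first-order logic and logic programming; this sketch swaps in plain Python functions, and the irregular pairs, suffix rules, lexicon, and the names `pluralize` and `singularize` are hypothetical toy data for English plurals, not taken from the paper. The backward direction proposes candidate stems from the shared facts and rules, then keeps only known stems whose forward form reproduces the input word.

```python
from typing import List, Set, Tuple

# Hypothetical toy facts (not from the paper): irregular (stem, plural) pairs.
IRREGULAR: List[Tuple[str, str]] = [("child", "children"), ("mouse", "mice"), ("foot", "feet")]
# Hypothetical regular rules (stem ending, plural ending), tried in order.
RULES: List[Tuple[str, str]] = [("ss", "sses"), ("", "s")]
# Hypothetical stem lexicon used to filter backward analyses.
LEXICON: Set[str] = {"child", "mouse", "foot", "glass", "book"}


def pluralize(stem: str) -> str:
    """Forward direction: stem -> plural; irregular facts take priority over rules."""
    for singular, plural in IRREGULAR:
        if singular == stem:
            return plural
    for stem_end, plural_end in RULES:  # first matching rule wins
        if stem.endswith(stem_end):
            return stem[: len(stem) - len(stem_end)] + plural_end
    raise ValueError(f"no rule for {stem!r}")  # not reached: "" matches every stem


def singularize(word: str) -> List[str]:
    """Backward direction: plural -> candidate stems, using the same facts and rules."""
    candidates = {singular for singular, plural in IRREGULAR if plural == word}
    for stem_end, plural_end in RULES:
        if word.endswith(plural_end):
            candidates.add(word[: len(word) - len(plural_end)] + stem_end)
    # Keep known stems whose forward form reproduces the word, so both
    # directions agree with each other.
    return sorted(s for s in candidates if s in LEXICON and pluralize(s) == word)


print(singularize("glasses"), pluralize("mouse"))  # ['glass'] mice
```

Filtering backward candidates through the forward direction keeps the two directions consistent: `singularize("childs")` returns an empty list, because the forward rule for `child` yields `children`.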

Euskarazko hitz anitzeko unitate lexikalen tratamendu konputazionala (Computational treatment of multi-word lexical units in Basque)

Author: Alegría Loinaz Iñaki
Ezeiza Ramos Nerea
Odriozola Pereira Juan Carlos
Urízar Enbeitia Rubén
Publication venue: Servicio Editorial de la Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua
Publication date: 01/01/2009

Multi-word Lexical Units (MWLUs) are of great importance in language in general, and in Natural Language Processing in particular, since they are not governed by the system's rules of free combination. In this article, we give an overview of the different types of phraseological units, briefly explaining the features of each. Since our priority is to process idioms automatically in Basque texts, we briefly analyze several approaches to the inflectional description of MWLUs and then explain the system we have developed for Basque: (i) a general representation for describing MWLUs in the lexical database for Basque (EDBL), (ii) HABIL, a tool capable of detecting and analyzing them based on the features described in the database, and (iii) a constraint grammar for disambiguating ambiguous MWLUs.

Archivo Digital para la Docencia y la Investigación

Universidad del País Vasco / Euskal Herriko Unibertsitatea: Ciencia - Portal de revistas digitales de la UPV/EHU