500 research outputs found
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
Error-tolerant recognition enables the recognition of strings that deviate
mildly from any string in the regular set recognized by the underlying finite
state recognizer. Such recognition has applications in error-tolerant
morphological processing, spelling correction, and approximate string matching
in information retrieval. After a description of the concepts and algorithms
involved, we give examples from two applications: In the context of
morphological analysis, error-tolerant recognition allows misspelled input word
forms to be corrected, and morphologically analyzed concurrently. We present an
application of this to error-tolerant analysis of agglutinative morphology of
Turkish words. The algorithm can be applied to morphological analysis of any
language whose morphology is fully captured by a single (and possibly very
large) finite state transducer, regardless of the word formation processes and
morphographemic phenomena involved. In the context of spelling correction,
error-tolerant recognition can be used to enumerate correct candidate forms
from a given misspelled string within a certain edit distance. Again, it can be
applied to any language with a word list comprising all inflected forms, or
whose morphology is fully described by a finite state transducer. We present
experimental results for spelling correction for a number of languages. These
results indicate that such recognition works very efficiently for candidate
generation in spelling correction for many European languages such as English,
Dutch, French, German, Italian (and others) with very large word lists of root
and inflected forms (some containing well over 200,000 forms), generating all
candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a
SparcStation 10/41. For spelling correction in Turkish, error-tolerantComment: Replaces 9504031. gzipped, uuencoded postscript file. To appear in
Computational Linguistics Volume 22 No:1, 1996, Also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
Design and implementation of a spelling checker for Turkish
This paper presents the design and implementation of a spelling checker for Turkish. Turkish is an agglutinative language in which words are formed by affixing a sequence of morphemes to a root word. Parsing agglutinative word structures has attracted relatively little attention except for application areas for general purpose morphological processors. Parsing words in such languages even for spelling checking purposes requires substantial morphological and morphophonemic analysis techniques, and spelling correction (not addressed in this paper) is significantly more complicated. In this paper, we present the design and implementation of a morphological root-driven parser for Turkish word structures which has been incorporated into a spelling checking kernel for on-line Turkish text. The agglutinative nature of the language complex word formations, various phonetic harmony rules, and subtle exceptions present certain difficulties not usually encountered in the spelling checking of languages like English and make this a very challenging problem. © 1993 Oxford University Press
Rapid Development of Morphological Descriptions for Full Language Processing Systems
I describe a compiler and development environment for feature-augmented
two-level morphology rules integrated into a full NLP system. The compiler is
optimized for a class of languages including many or most European ones, and
for rapid development and debugging of descriptions of new languages. The key
design decision is to compose morphophonological and morphosyntactic
information, but not the lexicon, when compiling the description. This results
in typical compilation times of about a minute, and has allowed a reasonably
full, feature-based description of French inflectional morphology to be
developed in about a month by a linguist new to the system.Comment: 8 pages, LaTeX (2.09 preferred); eaclap.sty; Procs of Euro ACL-9
Component processes of early reading, spelling, and narrative writing skills in Turkish: a longitudinal study
The study examined: (a) the role of phonological, grammatical, and rapid automatized naming (RAN) skills in reading and spelling development; and (b) the component processes of early narrative writing skills. Fifty-seven Turkish-speaking children were followed from Grade 1 to Grade 2. RAN was the most powerful longitudinal predictor of reading speed and its effect was evident even when previous reading skills were taken into account. Broadly, the phonological and grammatical skills made reliable contributions to spelling performance but their effects were completely mediated by previous spelling skills. Different aspects of the narrative writing skills were related to different processing skills. While handwriting speed predicted writing fluency, spelling accuracy predicted spelling error rate. Vocabulary and working memory were the only reliable longitudinal predictors of the quality of composition content. The overall model, however, failed to explain any reliable variance in the structural quality of the composition
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based continuous possibly large vocabulary speech
recognition system for Korean. Unlike popular n-best techniques developed for
integrating mainly HMM-based speech recognition and natural language processing
in a {\em word level}, which is obviously inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on a {\em morpheme-level} speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with a
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experiment results show that the speaker-dependent continuous
{\em eojeol} (Korean word) recognition and integrated morphological analysis
can be achieved with over 80.6% success rate directly from speech inputs for
the middle-level vocabularies.Comment: latex source with a4 style, 15 pages, to be published in computer
processing of oriental language journa
Detection of semantic errors in Arabic texts
AbstractDetecting semantic errors in a text is still a challenging area of investigation. A lot of research has been done on lexical and syntactic errors while fewer studies have tackled semantic errors, as they are more difficult to treat. Compared to other languages, Arabic appears to be a special challenge for this problem. Because words are graphically very similar to each other, the risk of getting semantic errors in Arabic texts is bigger. Moreover, there are special cases and unique complexities for this language. This paper deals with the detection of semantic errors in Arabic texts but the approach we have adopted can also be applied for texts in other languages. It combines four contextual methods (using statistics and linguistic information) in order to decide about the semantic validity of a word in a sentence. We chose to implement our approach on a distributed architecture, namely, a Multi Agent System (MAS). The implemented system achieved a precision rate of about 90% and a recall rate of about 83%
- …