18,041 research outputs found
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
Error-tolerant recognition enables the recognition of strings that deviate
mildly from any string in the regular set recognized by the underlying finite
state recognizer. Such recognition has applications in error-tolerant
morphological processing, spelling correction, and approximate string matching
in information retrieval. After a description of the concepts and algorithms
involved, we give examples from two applications: In the context of
morphological analysis, error-tolerant recognition allows misspelled input word
forms to be corrected, and morphologically analyzed concurrently. We present an
application of this to error-tolerant analysis of agglutinative morphology of
Turkish words. The algorithm can be applied to morphological analysis of any
language whose morphology is fully captured by a single (and possibly very
large) finite state transducer, regardless of the word formation processes and
morphographemic phenomena involved. In the context of spelling correction,
error-tolerant recognition can be used to enumerate correct candidate forms
from a given misspelled string within a certain edit distance. Again, it can be
applied to any language with a word list comprising all inflected forms, or
whose morphology is fully described by a finite state transducer. We present
experimental results for spelling correction for a number of languages. These
results indicate that such recognition works very efficiently for candidate
generation in spelling correction for many European languages such as English,
Dutch, French, German, Italian (and others) with very large word lists of root
and inflected forms (some containing well over 200,000 forms), generating all
candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a
SparcStation 10/41. For spelling correction in Turkish, error-tolerantComment: Replaces 9504031. gzipped, uuencoded postscript file. To appear in
Computational Linguistics Volume 22 No:1, 1996, Also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
A finite-state approach to arabic broken noun morphology
In this paper, a finite-state computational approach to Arabic broken plural noun morphology is introduced. The paper considers the derivational aspect of the approach, and how generalizations about dependencies in the broken plural noun derivational system of Arabic are captured and handled computationally in this finite-state approach. The approach will be implemented using Xerox finite-state tool
Recommended from our members
Minimally supervised induction of morphology through bitexts
textA knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. There have been consequently many attempts to reduce this cost in the development of morphological systems through the development of unsupervised or minimally supervised algorithms and learning methods for acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language–the source language–to another language–the target. This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typlogical properties of German. The two main tasks, that of clustering and segmentation, are approached as sequential tasks with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, it attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.Linguistic
Three-dimensional coherent X-ray diffraction imaging of a ceramic nanofoam: determination of structural deformation mechanisms
Ultra-low density polymers, metals, and ceramic nanofoams are valued for
their high strength-to-weight ratio, high surface area and insulating
properties ascribed to their structural geometry. We obtain the labrynthine
internal structure of a tantalum oxide nanofoam by X-ray diffractive imaging.
Finite element analysis from the structure reveals mechanical properties
consistent with bulk samples and with a diffusion limited cluster aggregation
model, while excess mass on the nodes discounts the dangling fragments
hypothesis of percolation theory.Comment: 8 pages, 5 figures, 30 reference
Three-dimensional coherent X-ray diffraction imaging of a ceramic nanofoam: determination of structural deformation mechanisms
Ultra-low density polymers, metals, and ceramic nanofoams are valued for
their high strength-to-weight ratio, high surface area and insulating
properties ascribed to their structural geometry. We obtain the labrynthine
internal structure of a tantalum oxide nanofoam by X-ray diffractive imaging.
Finite element analysis from the structure reveals mechanical properties
consistent with bulk samples and with a diffusion limited cluster aggregation
model, while excess mass on the nodes discounts the dangling fragments
hypothesis of percolation theory.Comment: 8 pages, 5 figures, 30 reference
ON MONITORING LANGUAGE CHANGE WITH THE SUPPORT OF CORPUS PROCESSING
One of the fundamental characteristics of language is that it can change over time. One
method to monitor the change is by observing its corpora: a structured language
documentation. Recent development in technology, especially in the field of Natural
Language Processing allows robust linguistic processing, which support the description of
diverse historical changes of the corpora. The interference of human linguist is inevitable as
it determines the gold standard, but computer assistance provides considerable support by
incorporating computational approach in exploring the corpora, especially historical
corpora. This paper proposes a model for corpus development, where corpus are annotated
to support further computational operations such as lexicogrammatical pattern matching,
automatic retrieval and extraction. The corpus processing operations are performed by local
grammar based corpus processing software on a contemporary Indonesian corpus. This
paper concludes that data collection and data processing in a corpus are equally crucial
importance to monitor language change, and none can be set aside
- …