GRAMPAL: A Morphological Processor for Spanish implemented in Prolog
A model for the full treatment of Spanish inflection for verbs, nouns and
adjectives is presented. This model is based on feature unification and it
relies upon a lexicon of allomorphs both for stems and morphemes. Word forms
are built by the concatenation of allomorphs by means of special contextual
features. We make use of standard Definite Clause Grammars (DCG) included in
most Prolog implementations, instead of the typical finite-state approach. This
allows us to take advantage of the declarativity and bidirectionality of Logic
Programming for NLP.
The most salient feature of this approach is simplicity: both the rule and the
lexical components are really straightforward. We have developed a very simple
model for complex phenomena.
Declarativity, bidirectionality, consistency and completeness of the model
are discussed: all and only correct word forms are analysed or generated,
including alternative forms, and gaps in paradigms are preserved. A Prolog
implementation has been developed for both analysis and generation of Spanish
word forms. It consists of only six DCG rules because of our lexicalist
approach, i.e. most information is in the dictionary. Although it is quite
efficient, the current implementation could be improved for analysis by using
the non-logical features of Prolog, especially in word segmentation and
dictionary access.
Comment: 11 pages
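To make the allomorph-concatenation idea concrete, here is a minimal Python sketch; it is not GRAMPAL's Prolog code. It assumes a hypothetical toy lexicon in which each stem and suffix allomorph carries a category and a contextual feature (`ctx`). A single concatenation rule joins a stem and a suffix only when both features agree, and the same relation is queried in both directions for analysis and generation. Prolog DCGs obtain bidirectionality through unification; the sketch simply enumerates the finite relation instead.

```python
from itertools import product

# Hypothetical toy lexicon of allomorphs. "ctx" is a contextual feature: a stem
# allomorph may only combine with a suffix allomorph requiring the same context.
STEMS = [
    {"form": "cuent", "lemma": "contar", "cat": "verb", "ctx": "stressed"},
    {"form": "cont", "lemma": "contar", "cat": "verb", "ctx": "unstressed"},
    {"form": "lápiz", "lemma": "lápiz", "cat": "noun", "ctx": "sg"},
    {"form": "lápic", "lemma": "lápiz", "cat": "noun", "ctx": "pl"},
]
SUFFIXES = [
    {"form": "o", "cat": "verb", "ctx": "stressed", "feats": ("pres", "1", "sg")},
    {"form": "amos", "cat": "verb", "ctx": "unstressed", "feats": ("pres", "1", "pl")},
    {"form": "", "cat": "noun", "ctx": "sg", "feats": ("sg",)},
    {"form": "es", "cat": "noun", "ctx": "pl", "feats": ("pl",)},
]


def word_forms():
    """The single concatenation rule: stem + suffix when category and context agree."""
    for stem, suffix in product(STEMS, SUFFIXES):
        if stem["cat"] == suffix["cat"] and stem["ctx"] == suffix["ctx"]:
            yield stem["form"] + suffix["form"], stem["lemma"], suffix["feats"]


def analyse(form):
    """Analysis direction: surface form -> list of (lemma, features)."""
    return [(lemma, feats) for surface, lemma, feats in word_forms() if surface == form]


def generate(lemma, feats):
    """Generation direction: (lemma, features) -> list of surface forms."""
    return [surface for surface, l, fs in word_forms() if l == lemma and fs == feats]
```

With this lexicon, `analyse("cuento")` returns `[("contar", ("pres", "1", "sg"))]` and `generate("lápiz", ("pl",))` returns `["lápices"]`, while the ill-formed `cuentamos` gets no analysis because the stressed stem cannot combine with a suffix that requires the unstressed one.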
A prototype machine translation system between Turkmen and Turkish
In this work, we present a prototype system for translating Turkmen texts into Turkish. Although machine translation (MT) is a very hard task, it is easier to implement an MT system between closely related languages that have similar syntactic structure and word order. We implement a direct translation system between Turkmen and Turkish that performs word-to-word transfer. We also use a Turkish language model to find the most probable Turkish sentence among all candidate translations generated by our system.
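Word-to-word transfer followed by language-model selection can be sketched as below. This is a minimal Python illustration under stated assumptions: the bilingual dictionary uses placeholder source tokens (`src1`, `src2`, `src3`) with hypothetical Turkish candidates, the Turkish language model is stood in for by a bigram model with add-one smoothing trained on a three-sentence toy corpus, and unknown words are passed through unchanged. Every combination of candidates is scored, which is only practical for short sentences; a real system would search a lattice instead.

```python
import math
from collections import Counter
from itertools import product


def train_bigram(corpus):
    """Count bigrams and their histories over sentences padded with <s> ... </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len({w for s in corpus for w in s.split()} | {"</s>"})
    return histories, bigrams, vocab_size


def log_prob(words, model):
    """Add-one smoothed bigram log-probability of a candidate sentence."""
    histories, bigrams, v = model
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
               for a, b in zip(tokens, tokens[1:]))


def translate(source_words, bilingual, model):
    """Direct transfer: expand every word's candidates, keep the best-scoring sentence."""
    options = [bilingual.get(w, [w]) for w in source_words]  # unknown words pass through
    best = max(product(*options), key=lambda c: log_prob(list(c), model))
    return " ".join(best)


# Hypothetical toy data for illustration only.
CORPUS = ["ben kitap okuyorum", "ben kitap okuyorum", "ben mektup yazıyorum"]
BILINGUAL = {"src1": ["ben"], "src2": ["kitap", "mektup"], "src3": ["okuyorum", "yazıyorum"]}
MODEL = train_bigram(CORPUS)
print(translate(["src1", "src2", "src3"], BILINGUAL, MODEL))  # prints "ben kitap okuyorum"
```

The language model resolves the ambiguity: mixed candidates such as `ben kitap yazıyorum` contain unseen bigrams and score lower than the sentence attested in the corpus.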
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
Error-tolerant recognition enables the recognition of strings that deviate
mildly from any string in the regular set recognized by the underlying finite
state recognizer. Such recognition has applications in error-tolerant
morphological processing, spelling correction, and approximate string matching
in information retrieval. After a description of the concepts and algorithms
involved, we give examples from two applications: In the context of
morphological analysis, error-tolerant recognition allows misspelled input word
forms to be corrected, and morphologically analyzed concurrently. We present an
application of this to error-tolerant analysis of agglutinative morphology of
Turkish words. The algorithm can be applied to morphological analysis of any
language whose morphology is fully captured by a single (and possibly very
large) finite state transducer, regardless of the word formation processes and
morphographemic phenomena involved. In the context of spelling correction,
error-tolerant recognition can be used to enumerate correct candidate forms
from a given misspelled string within a certain edit distance. Again, it can be
applied to any language with a word list comprising all inflected forms, or
whose morphology is fully described by a finite state transducer. We present
experimental results for spelling correction for a number of languages. These
results indicate that such recognition works very efficiently for candidate
generation in spelling correction for many European languages such as English,
Dutch, French, German, Italian (and others) with very large word lists of root
and inflected forms (some containing well over 200,000 forms), generating all
candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a
SparcStation 10/41. For spelling correction in Turkish, error-tolerant …
Comment: Replaces 9504031. Gzipped, uuencoded PostScript file. To appear in
Computational Linguistics, Volume 22, No. 1, 1996. Also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
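At its core, error-tolerant recognition is a depth-first walk of the recognizer that extends one row of the edit-distance table per transition and abandons a path once it can no longer end within the threshold. The Python sketch below makes several assumptions. The recognizer is an acyclic one built as a trie from a word list. The distance is plain Levenshtein (insertion, deletion, substitution), whereas the paper's measure may differ, e.g. in how it treats transpositions. The pruning test is simpler than the paper's cut-off edit distance: a path is dropped when every entry of the current row exceeds the threshold `t`. The function names are illustrative.

```python
_END = None  # key marking an accepting state in the trie


def build_trie(words):
    """Acyclic finite-state recognizer for a word list, stored as nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def error_tolerant_lookup(trie, word, t):
    """Return (candidate, distance) pairs within Levenshtein distance t of word."""
    results = []

    def walk(node, prefix, prev_row):
        # prev_row[j] is the edit distance between prefix and word[:j]
        if node.get(_END) and prev_row[-1] <= t:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.items():
            if ch is _END:
                continue
            row = [prev_row[0] + 1]
            for j in range(1, len(word) + 1):
                cost = 0 if word[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,            # skip a character of word
                               prev_row[j] + 1,           # skip the transition label ch
                               prev_row[j - 1] + cost))   # match or substitute
            # Row minima never decrease along a path, so this branch can be cut safely.
            if min(row) <= t:
                walk(child, prefix + ch, row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(results, key=lambda r: (r[1], r[0]))
```

For the toy list `cat`, `car`, `cart`, `cast`, `dog`, looking up `cat` with `t = 1` returns `cat` at distance 0 and `car`, `cart`, `cast` at distance 1; the `dog` branch is pruned before its last letter is reached, which is where the efficiency of the approach comes from.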
A MT System from Turkmen to Turkish employing finite state and statistical methods
In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment in a word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using finite-state transducers (FSTs), can be easily extended to those languages.
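The analyse, transfer and regenerate treatment of a single inflected word can be illustrated with a small Python sketch. Everything here is an assumption for illustration. A three-entry dictionary stands in for the Turkmen finite-state analyser. Roots are transferred through a toy bilingual root lexicon while the morphological tags are copied unchanged. The Turkish side is a tiny generator covering only plural and locative suffixes, with vowel harmony and consonant assimilation. The sketch shows why surface suffixes cannot simply be copied across: the Turkish suffix shapes are recomputed from the transferred root.

```python
# Hypothetical toy resources: a real system would use finite-state analysers
# and generators; these entries only illustrate the data flow.
TURKMEN_ANALYSES = {
    "öýler": ("öý", ["Noun", "Pl"]),
    "öýde": ("öý", ["Noun", "Loc"]),
    "kitaplar": ("kitap", ["Noun", "Pl"]),
}
ROOT_TRANSFER = {"öý": "ev", "kitap": "kitap"}

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOICELESS = set("fstkçşhp")


def last_vowel(form):
    """Return the last vowel of a Turkish form (drives two-way vowel harmony)."""
    for ch in reversed(form):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    raise ValueError(f"no vowel in {form!r}")


def generate_turkish(root, tags):
    """Toy Turkish generator: plural and locative suffixes only."""
    form = root
    if "Pl" in tags:
        form += "lar" if last_vowel(form) in BACK_VOWELS else "ler"
    if "Loc" in tags:
        onset = "t" if form[-1] in VOICELESS else "d"  # assimilation after voiceless consonants
        form += onset + ("a" if last_vowel(form) in BACK_VOWELS else "e")
    return form


def translate_word(word):
    """Analyse the Turkmen word, transfer its root, copy its tags, regenerate in Turkish."""
    root, tags = TURKMEN_ANALYSES[word]
    return generate_turkish(ROOT_TRANSFER[root], tags)
```

For example, `translate_word("öýler")` gives `evler` and `translate_word("öýde")` gives `evde`; calling the generator directly, `generate_turkish("kitap", ["Noun", "Loc"])` gives `kitapta`, where the voiceless final `p` selects the `t`-initial locative.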
Use of Weighted Finite State Transducers in Part of Speech Tagging
This paper addresses issues in part of speech disambiguation using
finite-state transducers and presents two main contributions to the field. One
of them is the use of finite-state machines for part of speech tagging.
Linguistic and statistical information is represented in terms of weights on
transitions in weighted finite-state transducers. Another contribution is the
successful combination of techniques -- linguistic and statistical -- for word
disambiguation, compounded with the notion of word classes.
Comment: uses psfig, ipamac
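Representing linguistic and statistical information as transition weights turns tagging into a shortest-path problem in the tropical semiring: weights (typically negative log probabilities) are added along a path and the minimum is taken across paths. The Python sketch below flattens a lexical transducer and a tag-transition transducer into two cost dictionaries with hypothetical toy weights, then runs a Viterbi-style min-plus search. It stands in for composing the machines and extracting the best path, and it does not model word classes.

```python
import math

INF = math.inf


def best_tags(words, lexical, transition):
    """Min-plus (tropical semiring) Viterbi search over lexical and transition costs."""
    paths = {"<s>": (0.0, [])}  # last tag -> (accumulated cost, tag sequence)
    for word in words:
        next_paths = {}
        for tag, lex_cost in lexical.get(word, {}).items():
            # Extend every surviving path with this tag and keep the cheapest one.
            cost, seq = min(
                ((prev_cost + transition.get((prev, tag), INF) + lex_cost, prev_seq + [tag])
                 for prev, (prev_cost, prev_seq) in paths.items()),
                key=lambda c: c[0],
            )
            if cost < INF:
                next_paths[tag] = (cost, seq)
        if not next_paths:
            raise ValueError(f"no tag path covers {word!r}")
        paths = next_paths
    cost, seq = min(paths.values(), key=lambda c: c[0])
    return seq, cost


# Hypothetical toy weights (lower cost = more likely).
LEXICAL = {
    "they": {"PRON": 0.1},
    "can": {"AUX": 0.4, "VERB": 1.2, "NOUN": 1.0},
    "fish": {"NOUN": 0.5, "VERB": 0.7},
}
TRANSITION = {
    ("<s>", "PRON"): 0.2,
    ("PRON", "AUX"): 0.5, ("PRON", "VERB"): 0.6, ("PRON", "NOUN"): 3.0,
    ("AUX", "VERB"): 0.2, ("AUX", "NOUN"): 2.5,
    ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 2.0,
    ("NOUN", "NOUN"): 1.5, ("NOUN", "VERB"): 1.0,
}

seq, cost = best_tags("they can fish".split(), LEXICAL, TRANSITION)
print(seq, round(cost, 2))  # ['PRON', 'AUX', 'VERB'] 2.1
```

With these weights, `they can fish` is tagged `PRON AUX VERB` at total cost 2.1, beating the best noun-final reading `PRON VERB NOUN` at 3.0; the lexical and contextual weights jointly decide the ambiguous words.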
FreeLing 3.0: Towards Wider Multilinguality
FreeLing is an open-source multilingual language processing library providing a wide range of analyzers for several languages. It offers text processing and language annotation facilities to NLP application developers, lowering the cost of building those applications.
FreeLing is customizable and extensible, and it has a strong orientation to real-world applications in terms of speed and robustness.
Developers can use the default linguistic resources (dictionaries, lexicons, grammars, etc.), extend or adapt them to specific domains, or, since the library is open source, develop new ones for specific languages or special application needs. This paper describes the general architecture of the library, presents the major changes and improvements included in FreeLing version 3.0, and summarizes some relevant industrial projects in which it has been used.
Postprint (published version)