Use of Weighted Finite State Transducers in Part of Speech Tagging
This paper addresses issues in part of speech disambiguation using
finite-state transducers and presents two main contributions to the field. One
of them is the use of finite-state machines for part of speech tagging.
Linguistic and statistical information is represented in terms of weights on
transitions in weighted finite-state transducers. Another contribution is the
successful combination of techniques -- linguistic and statistical -- for word
disambiguation, compounded with the notion of word classes.
Comment: uses psfig, ipamac
Morphological Disambiguation by Voting Constraints
We present a constraint-based morphological disambiguation system in which
individual constraints vote on matching morphological parses, and
disambiguation of all the tokens in a sentence is performed at the end by
selecting parses that receive the highest votes. This constraint application
paradigm makes the outcome of the disambiguation independent of the rule
sequence, and hence relieves the rule developer from worrying about potentially
conflicting rule sequencing. Our results for disambiguating Turkish indicate
that using about 500 constraint rules and some additional simple statistics, we
can attain a recall of 95-96% and a precision of 94-95% with about 1.01 parses
per token. Our system is implemented in Prolog and we are currently
investigating an efficient implementation based on finite state transducers.
Comment: 8 pages, LaTeX source. To appear in Proceedings of ACL/EACL'97.
Compressed postscript also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/acl97.ps.
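The voting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' Prolog system: the tag names, the two toy constraints, and the function name `disambiguate` are all hypothetical. Every constraint votes on every parse it matches, and selection happens only once at the end, which is why the order of the rules does not affect the outcome.

```python
from collections import defaultdict

def disambiguate(sentence, constraints):
    """Pick the highest-voted parses for each token.

    sentence:    list of tokens, each a list of candidate parses (strings).
    constraints: list of (predicate, vote) pairs; predicate(sentence, i, parse)
                 says whether the constraint matches parse `parse` of token i.
    """
    votes = defaultdict(int)
    # Phase 1: every constraint votes on every parse it matches.
    for i, parses in enumerate(sentence):
        for parse in parses:
            for predicate, vote in constraints:
                if predicate(sentence, i, parse):
                    votes[(i, parse)] += vote
    # Phase 2: selection is done once, after all votes are in.
    result = []
    for i, parses in enumerate(sentence):
        best = max(votes[(i, p)] for p in parses)
        result.append([p for p in parses if votes[(i, p)] == best])
    return result

# Hypothetical toy constraints (not taken from the paper).
def noun_after_det(s, i, p):
    return p == "Noun" and i > 0 and "Det" in s[i - 1]

def verb_after_noun(s, i, p):
    return p == "Verb" and i > 0 and "Noun" in s[i - 1]

constraints = [(noun_after_det, 2), (verb_after_noun, 1)]
sentence = [["Det"], ["Noun", "Verb"], ["Verb", "Noun"]]
print(disambiguate(sentence, constraints))
print(disambiguate(sentence, list(reversed(constraints))))  # same result
```

Ties are kept rather than broken, so a token can retain more than one parse, in line with the abstract's figure of about 1.01 parses per token.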
A finite-state model of German compounds
This paper summarizes the results of my Master's thesis and the main points of a talk I presented at the seminar of the Department of Applied Logic at the Adam Mickiewicz University in Poznań. It gives a short overview of the structure of German compounds and newer research concerning the role of the so-called interfixes. After an introduction to the concept of finite-state transducers, the construction of a transducer used for naive compound segmentation is described. Tag-based finite-state methods for the further analysis of the found segments are given and discussed. Distributional transducer rules, for the construction of which I assume the existence of local and global morphological contexts, are proposed as a means of disambiguating the analyzed naive segmentation results.
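To make "naive compound segmentation" concrete, here is a small sketch that uses plain recursive lexicon lookup instead of the transducer described in the paper. The interfix list, the toy lexicon, and the function name `segment` are assumptions made for illustration only.

```python
# Assumed list of linking elements; "" means the stems are joined directly.
INTERFIXES = ("", "s", "es", "n", "en", "e", "er")

def segment(word, lexicon):
    """Return every split of `word` into lexicon stems joined by optional interfixes."""
    results = []
    for end in range(1, len(word) + 1):
        stem = word[:end]
        if stem not in lexicon:
            continue
        rest = word[end:]
        if not rest:
            results.append([stem])
            continue
        for fix in INTERFIXES:
            # The interfix must be followed by at least one more character.
            if rest.startswith(fix) and len(rest) > len(fix):
                for tail in segment(rest[len(fix):], lexicon):
                    results.append([stem] + ([fix] if fix else []) + tail)
    return results

# Hypothetical toy lexicon.
lexicon = {"arbeit", "zimmer", "stau", "staub", "becken", "ecken"}
print(segment("arbeitszimmer", lexicon))  # one analysis, with interfix "s"
print(segment("staubecken", lexicon))     # two competing analyses
```

The second call returns more than one segmentation, which is the kind of ambiguity the disambiguation rules mentioned in the abstract are meant to resolve.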
On the Disambiguation of Weighted Automata
We present a disambiguation algorithm for weighted automata. The algorithm
admits two main stages: a pre-disambiguation stage followed by a transition
removal stage. We give a detailed description of the algorithm and the proof of
its correctness. The algorithm is not applicable to all weighted automata but
we prove sufficient conditions for its applicability in the case of the
tropical semiring by introducing the *weak twins property*. In particular, the
algorithm can be used with all acyclic weighted automata, relevant to
applications. While disambiguation can sometimes be achieved using
determinization, our disambiguation algorithm in some cases can return a result
that is exponentially smaller than any equivalent deterministic automaton. We
also present some empirical evidence of the space benefits of disambiguation
over determinization in speech recognition and machine translation
applications.
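The sketch below does not implement the disambiguation algorithm from the abstract above. It only illustrates the two notions the abstract relies on: an automaton is ambiguous when some string labels more than one accepting path, and in the tropical semiring the weight of a string is the minimum over its accepting paths of the summed transition weights. The transition format, the example automaton, and the function names are assumptions, and epsilon transitions are not handled.

```python
def accepting_path_weights(transitions, start, finals, string):
    """Weights (sums of transition weights) of all accepting paths labeled `string`.

    transitions: list of (source, label, weight, destination) tuples.
    """
    frontier = [(start, 0.0)]
    for symbol in string:
        frontier = [
            (dst, w + tw)
            for state, w in frontier
            for (src, label, tw, dst) in transitions
            if src == state and label == symbol
        ]
    return [w for state, w in frontier if state in finals]

def tropical_weight(transitions, start, finals, string):
    """Tropical-semiring weight of `string`: min over accepting paths, inf if none."""
    weights = accepting_path_weights(transitions, start, finals, string)
    return min(weights) if weights else float("inf")

# Hypothetical acyclic automaton with two accepting paths for "ab".
transitions = [
    (0, "a", 1.0, 1),
    (0, "a", 3.0, 2),
    (1, "b", 2.0, 3),
    (2, "b", 0.5, 3),
]
paths = accepting_path_weights(transitions, 0, {3}, "ab")
print(len(paths) > 1)                             # True: "ab" is ambiguous here
print(tropical_weight(transitions, 0, {3}, "ab"))
```

An unambiguous automaton equivalent to this one would keep a single accepting path for "ab" carrying the minimum weight.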
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
Error-tolerant recognition enables the recognition of strings that deviate
mildly from any string in the regular set recognized by the underlying finite
state recognizer. Such recognition has applications in error-tolerant
morphological processing, spelling correction, and approximate string matching
in information retrieval. After a description of the concepts and algorithms
involved, we give examples from two applications: In the context of
morphological analysis, error-tolerant recognition allows misspelled input word
forms to be corrected, and morphologically analyzed concurrently. We present an
application of this to error-tolerant analysis of agglutinative morphology of
Turkish words. The algorithm can be applied to morphological analysis of any
language whose morphology is fully captured by a single (and possibly very
large) finite state transducer, regardless of the word formation processes and
morphographemic phenomena involved. In the context of spelling correction,
error-tolerant recognition can be used to enumerate correct candidate forms
from a given misspelled string within a certain edit distance. Again, it can be
applied to any language with a word list comprising all inflected forms, or
whose morphology is fully described by a finite state transducer. We present
experimental results for spelling correction for a number of languages. These
results indicate that such recognition works very efficiently for candidate
generation in spelling correction for many European languages such as English,
Dutch, French, German, Italian (and others) with very large word lists of root
and inflected forms (some containing well over 200,000 forms), generating all
candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a
SparcStation 10/41. For spelling correction in Turkish, error-tolerant …
Comment: Replaces 9504031. Gzipped, uuencoded postscript file. To appear in
Computational Linguistics, Volume 22, No. 1, 1996. Also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
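Candidate generation within a bounded edit distance, as described in the abstract above, can be sketched with a trie standing in for the finite state recognizer. This is a simplified illustration and not the paper's algorithm: it uses plain Levenshtein distance (insertion, deletion, substitution, with no transposition operation), and the word list and function names are hypothetical.

```python
END = ""  # end-of-word marker; the empty string can never be a character key

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def candidates(trie, word, max_dist):
    """Return sorted (form, distance) pairs for trie words within max_dist of `word`."""
    results = []

    def walk(node, prefix, prev_row):
        # prev_row[j] = edit distance between `prefix` and word[:j]
        if END in node and prev_row[-1] <= max_dist:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            row = [prev_row[0] + 1]
            for j in range(1, len(word) + 1):
                cost = 0 if word[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,         # skip a character of `word`
                               prev_row[j] + 1,        # extra character in the trie path
                               prev_row[j - 1] + cost))  # match or substitution
            # Cut off: no extension of this prefix can get back under the threshold.
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(results)

trie = build_trie(["read", "reed", "red", "road", "bread"])
print(candidates(trie, "raed", 1))
```

The cut-off test is what keeps the search cheap on large word lists: a branch is abandoned as soon as every entry in its distance row exceeds the threshold.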