A MT System from Turkmen to Turkish employing finite state and statistical methods
In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment in a word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation in our system. We believe that this approach is valid for most of the Turkic languages, and that the architecture, implemented using FSTs, can be easily extended to those languages.
Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules
We describe a case study in the application of symbolic machine
learning techniques for the discovery of linguistic rules and categories. A
supervised rule induction algorithm is used to learn to predict the correct
diminutive suffix given the phonological representation of Dutch nouns. The
system produces rules which are comparable to rules proposed by linguists.
Furthermore, in the process of learning this morphological task, the phonemes
used are grouped into phonologically relevant categories. We discuss the
relevance of our method for linguistics and language technology.
The Unsupervised Acquisition of a Lexicon from Continuous Speech
We present an unsupervised learning algorithm that acquires a
natural-language lexicon from raw speech. The algorithm is based on the optimal
encoding of symbol sequences in an MDL framework, and uses a hierarchical
representation of language that overcomes many of the problems that have
stymied previous grammar-induction procedures. The forward mapping from symbol
sequences to the speech stream is modeled using features based on articulatory
gestures. We present results on the acquisition of lexicons and language models
from raw speech, text, and phonetic transcripts, and demonstrate that our
algorithm compares very favorably to other reported results with respect to
segmentation performance and statistical efficiency.
Comment: 27-page technical report
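
As a rough illustration of the MDL idea named in the abstract above, the following sketch compares the two-part code length of a toy corpus under two candidate segmentations. It is not the algorithm from the paper: it uses a flat lexicon and unigram token costs rather than a hierarchical representation, and the function name `description_length`, the `bits_per_char` parameter, and the toy vocabulary are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product


def description_length(segmented_corpus, bits_per_char=5.0):
    """Two-part code length in bits: lexicon cost plus corpus cost given the lexicon.

    Simplifying assumptions (not from the paper): each lexicon entry is spelled
    out at a fixed cost per character plus one end marker, and each token in the
    corpus costs -log2 of its relative frequency.
    """
    counts = Counter(word for utterance in segmented_corpus for word in utterance)
    total = sum(counts.values())
    # Cost of writing down every distinct lexicon entry once.
    lexicon_bits = sum((len(word) + 1) * bits_per_char for word in counts)
    # Cost of encoding the corpus as a sequence of lexicon entries.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


# Toy corpus: every determiner-noun-verb combination occurs once.
determiners = ["the", "a", "my"]
nouns = ["dog", "cat", "bird", "fish"]
verbs = ["runs", "sleeps", "eats"]

# Hypothesis 1: each utterance is one unanalysed lexicon entry.
unsegmented = [["".join(words)] for words in product(determiners, nouns, verbs)]
# Hypothesis 2: each utterance is a sequence of three reusable entries.
segmented = [list(words) for words in product(determiners, nouns, verbs)]

dl_unsegmented = description_length(unsegmented)
dl_segmented = description_length(segmented)
print(f"unsegmented: {dl_unsegmented:.1f} bits")
print(f"segmented:   {dl_segmented:.1f} bits")
```

On this toy corpus the segmented hypothesis yields the shorter total code, because a small reusable lexicon is cheaper to write down than 36 whole-utterance entries. An MDL learner of this general kind would search over candidate lexicons for the one that minimises the total.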