Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules
We describe a case study in the application of symbolic machine
learning techniques for the discovery of linguistic rules and categories. A
supervised rule induction algorithm is used to learn to predict the correct
diminutive suffix given the phonological representation of Dutch nouns. The
system produces rules which are comparable to rules proposed by linguists.
Furthermore, in the process of learning this morphological task, the phonemes
used are grouped into phonologically relevant categories. We discuss the
relevance of our method for linguistics and language technology.
Penalizing unknown words’ emissions in HMM POS tagger based on Malay affix morphemes
The challenge in unsupervised Hidden Markov Model (HMM) training for a POS tagger is that the training depends on an untagged corpus; the only supervised data limiting the possible tagging of words is a dictionary. Therefore, training cannot properly map possible tags. The exact morphemes of prefixes, suffixes and circumfixes in the agglutinative Malay language are examined to assign unknown words’ probable tags based on linguistically meaningful affixes, using a morpheme-based POS guessing algorithm for tagging. The algorithm has been integrated into the Viterbi algorithm, which uses HMM-trained parameters for tagging new sentences. In the experiment, the tagger handles unknown words first with character-based prediction, next with the morpheme-based POS guessing algorithm, and lastly with a combination of the two.
Keywords: Malay POS tagger; morpheme-based; HMM
Implicit learning of recursive context-free grammars
Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning
experiments have explored learning of simpler finite-state grammars, while studies exploring context-free grammars have
not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing
features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured
the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both
distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes
even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between
individual words. The structural distinctions drawn from linguistics also proved important as performance was greater for
tail-embedding than centre-embedding structures. The results suggest the plausibility of implicit learning of complex
context-free structures, which model some features of natural languages. They support the relevance of artificial grammar
learning for probing mechanisms of language learning and challenge existing theories and computational models of
implicit learning.
Corpus-based paradigm selection for morphological entries
Volume: 4. Host publication title: Nealt Proceedings Series Vol. 4. Host publication sub-title: Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Peer reviewed