
    SKOPE: A connectionist/symbolic architecture of spoken Korean processing

    Full text link
    Spoken language processing requires the integration of speech and natural language. Moreover, spoken Korean calls for a unique processing methodology because of its linguistic characteristics. This paper presents SKOPE, a connectionist/symbolic spoken Korean processing engine, which emphasizes that: 1) connectionist and symbolic techniques must be applied selectively according to their relative strengths and weaknesses, and 2) the linguistic characteristics of Korean must be fully considered in phoneme recognition, speech and language integration, and morphological/syntactic processing. The design and implementation of SKOPE demonstrate how connectionist/symbolic hybrid architectures can be constructed for spoken agglutinative language processing. SKOPE also introduces many novel ideas for speech and language processing. Phoneme recognition, morphological analysis, and syntactic analysis experiments show that SKOPE is a viable approach to spoken Korean processing.
    Comment: 8 pages, LaTeX, uses aaai.sty & aaai.bst, bibfile: nlpsp.bib; to be presented at the IJCAI-95 workshop on New Approaches to Learning for Natural Language Processing

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    Full text link
    A new, tightly coupled speech and natural language integration model is presented for a TDNN-based, continuous, possibly large-vocabulary speech recognition system for Korean. Unlike the popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the {\em word level}, which are clearly inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on {\em morpheme-level} speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous {\em eojeol} (Korean word) recognition and integrated morphological analysis can be achieved with over an 80.6% success rate directly from speech input for middle-level vocabularies.
    Comment: LaTeX source with a4 style, 15 pages; to be published in the Computer Processing of Oriental Languages journal

    Morphological annotation of Korean with Directly Maintainable Resources

    Get PDF
    This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The granularity of the tagset is 3 to 5 times higher than that of usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes, and transducers for the generation of allomorphs. All can be easily updated, which allows users to control how the system's performance evolves. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words and without applying morphological rules at annotation time, which speeds up annotation to 1,210 words/s. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process
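    The key design move in this abstract, compiling stems and suffix transducers offline into a lexicon of fully inflected words so that annotation time is pure lookup, can be sketched as follows. All lexical entries, tags, and the romanized forms below are invented toy data, not the article's actual resources.

    ```python
    # Sketch: precompile a word lexicon offline, then annotate by lookup only,
    # with no morphological rules applied at annotation time.
    # The stem/suffix entries and tags here are hypothetical illustrations.

    stems = {"mek": "eat/Verb"}           # hypothetical stem lexicon
    suffixes = {"ess-ta": "Past+Decl"}    # hypothetical suffix entry

    def compile_word_lexicon(stems, suffixes):
        """Offline step: precompute an analysis for every stem+suffix form."""
        lexicon = {}
        for stem, stem_tag in stems.items():
            for suffix, suffix_tag in suffixes.items():
                lexicon[stem + suffix] = f"{stem}/{stem_tag}+{suffix}/{suffix_tag}"
        return lexicon

    WORD_LEXICON = compile_word_lexicon(stems, suffixes)

    def annotate(tokens):
        """Annotation time: a pure dictionary lookup per token."""
        return [WORD_LEXICON.get(t, f"{t}/UNKNOWN") for t in tokens]

    print(annotate(["mekess-ta"]))  # -> ['mek/eat/Verb+ess-ta/Past+Decl']
    ```

    Because the expensive combination of stems with suffix allomorphs happens once at compile time, per-token annotation cost is a hash lookup, which is what makes throughputs like the reported 1,210 words/s plausible.
    
    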

    Learning probabilistic patterns: influence of homophony, L1 and frequency

    Get PDF
    In this thesis, I investigate whether learners' avoidance of alternation and neutralization, as well as learners' exposure to their native language (L1), affects how they learn new morpho-phonological patterns. While the effect of individual factors on morpho-phonological learning has been widely studied, whether these factors have a collective effect on learning and interact with the frequency of variants in the input has been understudied. To explore whether there are any interactive effects of these factors, I vary the type of alternation, learners' native languages, and the relative frequency of variants across several repetitions of an experiment. I exposed adult English speakers to an artificial language in which plural forms were probabilistically marked by one of two prefixes. One of the prefixes triggered either a non-neutralizing or a neutralizing alternation that could create homophony. I found that English speakers generally matched the relative input frequencies in their output. However, learners avoided the construction that resulted in a phonological alternation, but only when it was infrequent. This finding suggests that although there is a tendency to avoid alternations, it depends on how frequent the relevant variants are in the input. Moreover, English speakers were poorer at learning the neutralizing alternation than the non-neutralizing alternation, showing a bias against neutralization that can create homophony. Additionally, I replicated the same experiments with Korean speakers, because there is abundant exposure to neutralization in their L1. I found that Korean speakers were successful at learning both neutralizing and non-neutralizing alternations, suggesting that abundant exposure to neutralization can make new neutralization easier to learn.
Finally, I argue for a model which implements the avoidance effect as a discounting of observations that trigger homophony in the training data, rather than requiring a special constraint penalizing neutralization in the grammar. This Discount model correctly predicts the different learning results between English and Korean speakers and provides a straightforward explanation for learners' bias against neutralization and homophony. This approach places the locus of the bias in the learning process rather than in the grammar
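The core mechanism of a discounting account like this, down-weighting training observations that trigger homophony before the learner's frequency matching, can be illustrated with a toy calculation. The counts and the discount factor below are arbitrary illustrations, not the thesis's fitted model or data.

```python
# Toy sketch of a Discount-style model: homophony-triggering observations
# are down-weighted during learning, so the learner's output frequency for
# that variant undershoots its input frequency. Counts and the discount
# factor are hypothetical.

def predicted_proportions(counts, triggers_homophony, discount=0.5):
    """Discount counts of homophony-triggering variants, then normalize."""
    effective = {
        variant: n * (discount if triggers_homophony[variant] else 1.0)
        for variant, n in counts.items()
    }
    total = sum(effective.values())
    return {variant: n / total for variant, n in effective.items()}

counts = {"prefix_A": 70, "prefix_B": 30}           # hypothetical input counts
homophony = {"prefix_A": False, "prefix_B": True}   # B neutralizes contrasts

print(predicted_proportions(counts, homophony))
# prefix_B's predicted rate (15/85 ~ 0.176) falls below its 0.30 input rate
```

A grammar-internal constraint against neutralization is not needed here: the bias emerges entirely from which observations the learner effectively counts, which is what locates it in the learning process rather than the grammar.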

    Finding Structure in Silence: The Role of Pauses in Aligning Speaker Expectations

    Full text link
    The intelligibility of speech relies on the ability of interlocutors to dynamically align their expectations about the rates at which informative changes in signals occur. Exactly how this is achieved remains an open question. We propose that speaker alignment is supported by the statistical structure of spoken signals, and we show how pauses offer a time-invariant template for structuring speech sequences. Consistent with this, we show that pause distributions in conversational English and Korean provide a memoryless information source. We describe how this can facilitate both the initial structuring and the maintenance of predictability in spoken signals over time, and we show how the properties of this signal change predictably with speaker experience. These results indicate that pauses provide a structuring signal that interacts with the morphological and rhythmical structure of languages, allowing speakers at all stages of lifespan development to distinguish signal from noise and maintain mutual predictability in time.
    Comment: 25 pages, 5 figures
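    The claim that pause distributions form a "memoryless information source" can be made concrete with the defining property of the geometric distribution (the discrete analogue of the exponential, and the unique memoryless discrete distribution): having already waited through a pause of length s tells you nothing about how much longer it will last. The parameter below is arbitrary, not estimated from the paper's conversational corpora.

    ```python
    # Illustration of memorylessness for a geometric pause-length model:
    # P(pause > s + t | pause > s) == P(pause > t).
    # The termination probability p is a hypothetical value.

    p = 0.2  # per-step probability that the pause ends

    def survival(t):
        """P(pause length > t) under a geometric distribution."""
        return (1 - p) ** t

    s, t = 3, 5
    conditional = survival(s + t) / survival(s)   # P(X > s+t | X > s)
    print(abs(conditional - survival(t)) < 1e-12)  # -> True: memoryless
    ```

    Memorylessness is what makes such a source a time-invariant template: the same expectation about the next pause applies at any point in the signal, regardless of elapsed history.
    
    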

    Hybrid N-gram Probability Estimation in Morphologically Rich Languages

    Get PDF
    PACLIC 23 / City University of Hong Kong / 3-5 December 2009

    Improving accuracy of Part-of-Speech (POS) tagging using hidden markov model and morphological analysis for Myanmar Language

    Get PDF
    In Natural Language Processing (NLP), word segmentation and Part-of-Speech (POS) tagging are fundamental tasks. POS information is also necessary in NLP preprocessing for applications such as machine translation (MT), information retrieval (IR), etc. Currently, many research efforts develop word segmentation and POS tagging separately, with different methods, to achieve high performance and accuracy. For the Myanmar language, there are likewise separate word segmenters and POS taggers based on statistical approaches such as Neural Networks (NNs) and Hidden Markov Models (HMMs). However, because of the Myanmar language's complex morphological structure, the out-of-vocabulary (OOV) problem persists. To avoid errors and improve segmentation by utilizing POS information, segmentation and labeling should be performed simultaneously. The main goal of developing a POS tagger for any language is to improve tagging accuracy and remove ambiguity in sentences caused by the language's structure. This paper focuses on developing word segmentation and a Part-of-Speech (POS) tagger for the Myanmar language, and presents a comparison of separate word segmentation and POS tagging with joint word segmentation and POS tagging
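    The statistical core of an HMM-based POS tagger like the one this abstract refers to is Viterbi decoding: choosing the tag sequence that maximizes the product of transition and emission probabilities. The tiny tagset, vocabulary, and probabilities below are invented for illustration, not taken from the paper's Myanmar-language model.

    ```python
    # Minimal Viterbi decoder for HMM POS tagging over a hypothetical
    # two-tag model. Log-space scores avoid underflow on long sentences.

    import math

    tags = ["NOUN", "VERB"]
    start = {"NOUN": 0.6, "VERB": 0.4}
    trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
             "VERB": {"NOUN": 0.8, "VERB": 0.2}}
    emit = {"NOUN": {"fish": 0.6, "swim": 0.4},
            "VERB": {"fish": 0.3, "swim": 0.7}}

    def viterbi(words):
        """Return the most probable tag sequence under the toy HMM."""
        score = {t: math.log(start[t] * emit[t][words[0]]) for t in tags}
        back = []
        for w in words[1:]:
            prev = score
            score, ptr = {}, {}
            for t in tags:
                best = max(prev, key=lambda s: prev[s] + math.log(trans[s][t]))
                score[t] = prev[best] + math.log(trans[best][t] * emit[t][w])
                ptr[t] = best
            back.append(ptr)
        # Backtrace from the best final tag.
        path = [max(score, key=score.get)]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return list(reversed(path))

    print(viterbi(["fish", "swim"]))  # -> ['NOUN', 'VERB']
    ```

    A joint segmentation-and-tagging model extends this idea by letting the decoder score candidate word boundaries and tags together, so that POS context can veto segmentations that produce OOV fragments.
    
    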