
    Tone classification of syllable-segmented Thai speech based on multilayer perceptron

    Thai is a monosyllabic, tonal language: tone conveys lexical information about the meaning of a syllable. Thai has five distinctive tones, and each tone is well represented by a single F0 contour pattern. In general, the same base syllable carries a different lexical meaning under a different tone. Thus, to completely recognize a spoken Thai syllable, a speech recognition system must not only recognize the base syllable but also correctly identify its tone, and tone classification is therefore an essential part of a Thai speech recognition system. In this study, a tone classification system for syllable-segmented Thai speech that incorporates the effects of tonal coarticulation, stress, and intonation was developed. An automatic syllable segmentation procedure, which segments the training and test utterances into syllable units, was also developed. Acoustic features, including fundamental frequency (F0), duration, and energy, extracted from the syllable being processed and its neighboring syllables were used as the main discriminating features. A multilayer perceptron (MLP) trained with the backpropagation method was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female Thai speakers who also provided the training speech, and achieved an average accuracy of 91.36%.
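
    The sketch below illustrates, in broad strokes, how syllable-level prosodic features of the kind described above (F0, duration, and energy from the current and neighboring syllables) could be assembled into a context vector and classified with an MLP. It is not the thesis implementation: the feature layout, the dimensions, the use of scikit-learn's MLPClassifier, and the random placeholder data are all assumptions made for illustration.

```python
# Minimal sketch of an MLP tone classifier over syllable-level prosodic features.
# Feature layout, dimensions, and the use of scikit-learn are assumptions made
# for illustration; the thesis trains its own MLP with backpropagation.
import numpy as np
from sklearn.neural_network import MLPClassifier

TONES = ["mid", "low", "falling", "high", "rising"]  # the five Thai tones

def syllable_features(f0_contour, duration, energy):
    """Summarize one syllable: five resampled F0 points plus duration and energy."""
    pos = np.linspace(0.0, 1.0, len(f0_contour))
    f0 = np.interp(np.linspace(0.0, 1.0, 5), pos, f0_contour)
    return np.concatenate([f0, [duration, energy]])

def context_vector(prev_syl, curr_syl, next_syl):
    """Stack the previous, current, and next syllables so the classifier can
    see tonal coarticulation context (21 values in this toy layout)."""
    return np.concatenate([syllable_features(*prev_syl),
                           syllable_features(*curr_syl),
                           syllable_features(*next_syl)])

def random_syllable(rng):
    """A toy syllable: a short random F0 contour plus duration and energy."""
    return rng.normal(200.0, 30.0, size=10), rng.uniform(0.1, 0.4), rng.uniform(0.5, 1.0)

# Placeholder training data: 200 random context vectors with random tone labels.
rng = np.random.default_rng(0)
X = np.stack([context_vector(random_syllable(rng), random_syllable(rng),
                             random_syllable(rng)) for _ in range(200)])
y = rng.integers(0, 5, size=200)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y)
print(TONES[clf.predict(X[:1])[0]])  # predicted tone for the first context vector
```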

    Large vocabulary Cantonese speech recognition using neural networks

    Tsik Chung Wai Benjamin. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 67-70). Contents:
    1. Introduction: 1.1 Automatic Speech Recognition; 1.2 Cantonese Speech Recognition; 1.3 Neural Networks; 1.4 About this Thesis
    2. The Phonology of Cantonese: 2.1 The Syllabic Structure of Cantonese Syllable; 2.2 The Tone System of Cantonese
    3. Review of Automatic Speech Recognition Systems: 3.1 Hidden Markov Model Approach; 3.2 Neural Networks Approach (3.2.1 Multi-Layer Perceptrons (MLP); 3.2.2 Time-Delay Neural Networks (TDNN); 3.2.3 Recurrent Neural Networks); 3.3 Integrated Approach; 3.4 Mandarin and Cantonese Speech Recognition Systems
    4. The Speech Corpus and Database: 4.1 Design of the Speech Corpus; 4.2 Speech Database Acquisition
    5. Feature Parameters Extraction: 5.1 Endpoint Detection; 5.2 Speech Processing; 5.3 Speech Segmentation; 5.4 Phoneme Feature Extraction; 5.5 Tone Feature Extraction
    6. The Design of the System: 6.1 Towards Large Vocabulary System; 6.2 Overview of the Isolated Cantonese Syllable Recognition System; 6.3 The Primary Level: Phoneme Classifiers and Tone Classifier; 6.4 The Intermediate Level: Ending Corrector; 6.5 The Secondary Level: Syllable Classifier (6.5.1 Concatenation with Correction Approach; 6.5.2 Fuzzy ART Approach)
    7. Computer Simulation: 7.1 Experimental Conditions; 7.2 Experimental Results of the Primary Level Classifiers; 7.3 Overall Performance of the System; 7.4 Discussions
    8. Further Works: 8.1 Enhancement on Speech Segmentation; 8.2 Towards Speaker-Independent System; 8.3 Towards Speech-to-Text System
    9. Conclusions
    Bibliography; Appendix A. Cantonese Syllable Full Set List
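
    Only the chapter headings are available here, but the structure they name (primary-level phoneme and tone classifiers feeding a secondary-level syllable classifier, with a "Concatenation with Correction" option) suggests a combination step along the following lines. Everything in this sketch is hypothetical: the scores, the legal-syllable lexicon, and the scoring rule are stand-ins, and the thesis's Fuzzy ART and ending-correction components are not reproduced.

```python
# Illustrative sketch of a two-level decision, loosely following the structure
# named in the table of contents (primary phoneme/tone classifiers feeding a
# secondary syllable classifier). All names, scores, and the lexicon are
# hypothetical placeholders, not values or components from the thesis.

# Hypothetical per-classifier scores for one utterance.
initial_scores = {"g": 0.7, "k": 0.2}        # syllable-initial classifier
final_scores = {"ong": 0.6, "ung": 0.3}      # syllable-final classifier
tone_scores = {1: 0.5, 2: 0.3, 3: 0.2}       # tone classifier

# A tiny legal-syllable lexicon: only attested (initial, final, tone) triples.
LEXICON = {("g", "ong", 1), ("g", "ung", 2), ("k", "ong", 3)}

def best_syllable(initials, finals, tones, lexicon):
    """Concatenate primary-level hypotheses and keep the highest-scoring
    combination that is a legal syllable in the lexicon."""
    candidates = [
        ((i, f, t), si * sf * st)
        for i, si in initials.items()
        for f, sf in finals.items()
        for t, st in tones.items()
        if (i, f, t) in lexicon
    ]
    return max(candidates, key=lambda c: c[1]) if candidates else None

print(best_syllable(initial_scores, final_scores, tone_scores, LEXICON))
# -> the highest-scoring legal combination, here ('g', 'ong', 1)
```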

    Production and perception of tones by Dutch learners of Mandarin

    The function of pitch movements varies across languages. Tone languages, such as Mandarin Chinese, use pitch configurations to differentiate between word forms, whereas non-tone languages (such as Dutch and English) use pitch information mainly at the post-lexical level, e.g., to signal sentential prominence or to delimit prosodic constituents. Learning to use lexical tones is therefore difficult for second-language learners of Mandarin from non-tone backgrounds, who are not accustomed to using pitch information in a lexically contrastive way. This thesis investigates various aspects of the production and perception of tones by beginning and advanced Dutch learners of Mandarin. Through a series of four experiments, it examines the developmental path of university-level Dutch learners of Mandarin in their acquisition of fine-grained tonal coarticulation patterns, their distribution of attention between segments and tones, their phonological processing of tones, and their use of tonal information in spoken word recognition. The mechanisms underlying the learners’ tone acquisition are discussed with reference to current theories and models of second language acquisition and spoken word recognition.
    China Scholarship Council; Leiden University Centre for Linguistics, Theoretical and Experimental Linguistics

    Mispronunciation Detection and Diagnosis in Mandarin-Accented English Speech

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the most significant contributors to mispronunciation and in improving the performance of MDD systems.
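
    As a small worked example of the three figures quoted above, the snippet below computes detection accuracy, diagnostic accuracy, and false rejection rate under the definitions commonly used in MDD work (which may differ in detail from this study's conventions). The counts are invented, chosen only so that the resulting ratios land near the reported values.

```python
# Invented counts for one evaluation run; only the metric definitions matter.
true_accepts = 700       # correctly pronounced phones accepted as correct
false_rejects = 145      # correctly pronounced phones flagged as mispronounced
false_accepts = 60       # mispronounced phones missed (accepted as correct)
true_rejects = 260       # mispronounced phones correctly detected
correct_diagnoses = 197  # detected mispronunciations with the right error label

total = true_accepts + false_rejects + false_accepts + true_rejects
detection_accuracy = (true_accepts + true_rejects) / total
false_rejection_rate = false_rejects / (true_accepts + false_rejects)
diagnostic_accuracy = correct_diagnoses / true_rejects

print(f"detection accuracy:   {detection_accuracy:.1%}")   # ~82.4%
print(f"false rejection rate: {false_rejection_rate:.1%}")  # ~17.2%
print(f"diagnostic accuracy:  {diagnostic_accuracy:.1%}")   # ~75.8%
```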

    Investigating the build-up of precedence effect using reflection masking

    The level of auditory processing involved in the build-up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] was investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to tap lower levels of auditory signal processing, such an approach represents a bottom-up view of the build-up of precedence. Three conditioner configurations designed to measure a possible build-up of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5 to 15 ms. No build-up of reflection suppression was observed for any of the conditioner configurations. Build-up of template (a decrease in RMT for two of the conditioners), on the other hand, was found to be delay-dependent: for five of six listeners, RMT decreased relative to the baseline at reflection delays of 2.5 and 15 ms, while no change in threshold was observed at 5- and 10-ms delays. It is concluded that the low-level auditory processing involved in RMT is not sufficient to realize a build-up of reflection suppression. This confirms suggestions that higher-level processing is involved in precedence-effect build-up. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels.