303 research outputs found

    Efficient Embedded Speech Recognition for Very Large Vocabulary Mandarin Car-Navigation Systems

    Automatic speech recognition (ASR) for a very large vocabulary of isolated words is a difficult task on a resource-limited embedded device. This paper presents a novel fast decoding algorithm for a Mandarin speech recognition system that can simultaneously process hundreds of thousands of items while maintaining high recognition accuracy. The proposed algorithm constructs a semi-tree search network based on Mandarin pronunciation rules to avoid duplicate syllable matching and save redundant memory. Starting from a two-stage fixed-width beam-search baseline system, the algorithm employs a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy to significantly reduce recognition time. The algorithm is aimed at an in-car navigation system in China and was simulated on a standard PC workstation. Experimental results show that the proposed method reduces recognition time nearly 6-fold and memory size nearly 2-fold compared to the baseline system, with less than 1% accuracy degradation on a 200,000-word recognition task.
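The variable beam-width idea in the abstract can be illustrated with a minimal sketch: at each frame, hypotheses scoring too far below the best are discarded, and the beam is tightened when too many survive. This is not the authors' code; the function name, parameters, and the width-adaptation rule are illustrative assumptions.

```python
def prune(hypotheses, max_beam=10.0, min_beam=2.0, target_count=50):
    """Keep hypotheses whose log score lies within an adaptive beam of the best.

    hypotheses: list of (state, log_score) pairs for the current frame.
    The beam narrows geometrically while too many hypotheses survive,
    which is the basic idea behind variable beam-width pruning.
    """
    if not hypotheses:
        return []
    best = max(score for _, score in hypotheses)
    beam = max_beam
    survivors = [h for h in hypotheses if h[1] >= best - beam]
    # Tighten the beam until the surviving set is small enough
    # (or the beam reaches its minimum width).
    while len(survivors) > target_count and beam > min_beam:
        beam *= 0.8
        survivors = [h for h in hypotheses if h[1] >= best - beam]
    return survivors
```

In a frame-synchronous decoder this would run once per frame, after the hypotheses are extended and rescored; the word-level pruning described in the abstract would apply the same idea at word boundaries.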

    Large vocabulary Cantonese speech recognition using neural networks.

    Tsik Chung Wai Benjamin. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 67-70).

    Contents:
    Chapter 1  Introduction
        1.1  Automatic Speech Recognition
        1.2  Cantonese Speech Recognition
        1.3  Neural Networks
        1.4  About this Thesis
    Chapter 2  The Phonology of Cantonese
        2.1  The Syllabic Structure of Cantonese Syllable
        2.2  The Tone System of Cantonese
    Chapter 3  Review of Automatic Speech Recognition Systems
        3.1  Hidden Markov Model Approach
        3.2  Neural Networks Approach
            3.2.1  Multi-Layer Perceptrons (MLP)
            3.2.2  Time-Delay Neural Networks (TDNN)
            3.2.3  Recurrent Neural Networks
        3.3  Integrated Approach
        3.4  Mandarin and Cantonese Speech Recognition Systems
    Chapter 4  The Speech Corpus and Database
        4.1  Design of the Speech Corpus
        4.2  Speech Database Acquisition
    Chapter 5  Feature Parameters Extraction
        5.1  Endpoint Detection
        5.2  Speech Processing
        5.3  Speech Segmentation
        5.4  Phoneme Feature Extraction
        5.5  Tone Feature Extraction
    Chapter 6  The Design of the System
        6.1  Towards Large Vocabulary System
        6.2  Overview of the Isolated Cantonese Syllable Recognition System
        6.3  The Primary Level: Phoneme Classifiers and Tone Classifier
        6.4  The Intermediate Level: Ending Corrector
        6.5  The Secondary Level: Syllable Classifier
            6.5.1  Concatenation with Correction Approach
            6.5.2  Fuzzy ART Approach
    Chapter 7  Computer Simulation
        7.1  Experimental Conditions
        7.2  Experimental Results of the Primary Level Classifiers
        7.3  Overall Performance of the System
        7.4  Discussions
    Chapter 8  Further Works
        8.1  Enhancement on Speech Segmentation
        8.2  Towards Speaker-Independent System
        8.3  Towards Speech-to-Text System
    Chapter 9  Conclusions
    Bibliography
    Appendix A  Cantonese Syllable Full Set List

    The effects of phonological neighborhoods on spoken word recognition in Mandarin Chinese

    Spoken word recognition is influenced by words that differ from the target word by one phoneme (neighbors). In English, words with many neighbors (high neighborhood density) are processed more slowly or less accurately than words with few neighbors. However, little is known about these effects in Mandarin Chinese. The present study examined the effects of neighborhood density and the definition of neighbors in Mandarin Chinese, using an auditory naming task with word sets differing in density level (high vs. low) and neighbor type (words with neighbors ending in a nasal final consonant vs. words without such nasal-final neighbors). Results showed an inhibitory effect of high neighborhood density on reaction times and a difference between nasal-final neighbors and vowel-final neighbors. The findings suggest that neighbors compete for and inhibit word access in Mandarin Chinese, although other factors at the sublexical level may also play a role in the process.
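The neighbor definition used in this literature (one phoneme substituted, added, or deleted) is easy to make concrete. A minimal sketch, with a toy lexicon and function names that are my own assumptions, not the study's materials:

```python
def edit_distance_one(a, b):
    """True if phoneme sequences a and b differ by exactly one
    substitution, addition, or deletion (the standard neighbor criterion)."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one substituted phoneme.
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one phoneme from the longer form must yield the shorter.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def neighborhood_density(target, lexicon):
    """Count the lexicon entries that are phonological neighbors of target.
    Words are represented as tuples of phoneme symbols."""
    return sum(edit_distance_one(target, word) for word in lexicon)
```

For Mandarin one would typically work over syllable-level transcriptions (and decide whether tone counts as part of the phoneme inventory), which is exactly the "definition of neighbors" question the study raises.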

    Linguistic constraints for large vocabulary speech recognition.

    by Roger H.Y. Leung. Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. Includes bibliographical references (leaves 79-84). Abstracts in English and Chinese.

    Contents:
    Chapter 1  Introduction
        1.1  Languages in the World
        1.2  Problems of Chinese Speech Recognition
            1.2.1  Unlimited word size
            1.2.2  Too many homophones
            1.2.3  Difference between spoken and written Chinese
            1.2.4  Word segmentation problem
        1.3  Different Types of Knowledge
        1.4  Chapter Conclusion
    Chapter 2  Foundations
        2.1  Chinese Phonology and Language Properties
            2.1.1  Basic Syllable Structure
        2.2  Acoustic Models
            2.2.1  Acoustic Unit
            2.2.2  Hidden Markov Model (HMM)
        2.3  Search Algorithm
        2.4  Statistical Language Models
            2.4.1  Context-Independent Language Model
            2.4.2  Word-Pair Language Model
            2.4.3  N-gram Language Model
            2.4.4  Backoff n-gram
        2.5  Smoothing for Language Model
    Chapter 3  Lexical Access
        3.1  Introduction
        3.2  Motivation: Phonological and Lexical Constraints
        3.3  Broad Classes Representation
        3.4  Broad Classes Statistic Measures
        3.5  Broad Classes Frequency Normalization
        3.6  Broad Classes Analysis
        3.7  Isolated Word Speech Recognizer using Broad Classes
        3.8  Chapter Conclusion
    Chapter 4  Character and Word Language Model
        4.1  Introduction
        4.2  Motivation
            4.2.1  Perplexity
        4.3  Call Home Mandarin Corpus
            4.3.1  Acoustic Data
            4.3.2  Transcription Texts
        4.4  Methodology: Building Language Model
        4.5  Character Level Language Model
        4.6  Word Level Language Model
        4.7  Comparison of Character Level and Word Level Language Model
        4.8  Interpolated Language Model
            4.8.1  Methodology
            4.8.2  Experiment Results
        4.9  Chapter Conclusion
    Chapter 5  N-gram Smoothing
        5.1  Introduction
        5.2  Motivation
        5.3  Mathematical Representation
        5.4  Methodology: Smoothing Techniques
            5.4.1  Add-one Smoothing
            5.4.2  Witten-Bell Discounting
            5.4.3  Good-Turing Discounting
            5.4.4  Absolute and Linear Discounting
        5.5  Comparison of Different Discount Methods
        5.6  Continuous Word Speech Recognizer
            5.6.1  Experiment Setup
            5.6.2  Experiment Results
        5.7  Chapter Conclusion
    Chapter 6  Summary and Conclusions
        6.1  Summary
        6.2  Further Work
        6.3  Conclusion
    References

    Porting the GALAXY system to Mandarin Chinese

    by Chao Wang. Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (leaves 83-86).

    Pronunciation modeling for Cantonese speech recognition.

    Kam Patgi. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaf 103). Abstracts in English and Chinese.

    Contents:
    Chapter 1  Introduction
        1.1  Automatic Speech Recognition
        1.2  Pronunciation Modeling in ASR
        1.3  Objectives of the Thesis
        1.4  Thesis Outline
    Chapter 2  The Cantonese Dialect
        2.1  Cantonese - A Typical Chinese Dialect
            2.1.1  Cantonese Phonology
            2.1.2  Cantonese Phonetics
        2.2  Pronunciation Variation in Cantonese
            2.2.1  Phone Change and Sound Change
            2.2.2  Notation for Different Sound Units
        2.3  Summary
    Chapter 3  Large-Vocabulary Continuous Speech Recognition for Cantonese
        3.1  Feature Representation of the Speech Signal
        3.2  Probabilistic Framework of ASR
        3.3  Hidden Markov Model for Acoustic Modeling
        3.4  Pronunciation Lexicon
        3.5  Statistical Language Model
        3.6  Decoding
        3.7  The Baseline Cantonese LVCSR System
            3.7.1  System Architecture
            3.7.2  Speech Databases
        3.8  Summary
    Chapter 4  Pronunciation Model
        4.1  Pronunciation Modeling at Different Levels
        4.2  Phone-Level Pronunciation Model and its Application
            4.2.1  IF Confusion Matrix (CM)
            4.2.2  Decision Tree Pronunciation Model (DTPM)
            4.2.3  Refinement of Confusion Matrix
        4.3  Summary
    Chapter 5  Pronunciation Modeling at Lexical Level
        5.1  Construction of PVD
        5.2  PVD Pruning by Word Unigram
        5.3  Recognition Experiments
            5.3.1  Experiment 1 - Pronunciation Modeling in LVCSR
            5.3.2  Experiment 2 - Pronunciation Modeling in Domain-Specific Task
            5.3.3  Experiment 3 - PVD Pruning by Word Unigram
        5.4  Summary
    Chapter 6  Pronunciation Modeling at Acoustic Model Level
        6.1  Hierarchy of HMM
        6.2  Sharing of Mixture Components
        6.3  Adaptation of Mixture Components
        6.4  Combination of Mixture Component Sharing and Adaptation
        6.5  Recognition Experiments
        6.6  Result Analysis
            6.6.1  Performance of Sharing Mixture Components
            6.6.2  Performance of Mixture Component Adaptation
        6.7  Summary
    Chapter 7  Pronunciation Modeling at Decoding Level
        7.1  Search Process in Cantonese LVCSR
        7.2  Model-Level Search Space Expansion
        7.3  State-Level Output Probability Modification
        7.4  Recognition Experiments
            7.4.1  Experiment 1 - Model-Level Search Space Expansion
            7.4.2  Experiment 2 - State-Level Output Probability Modification
        7.5  Summary
    Chapter 8  Conclusions and Suggestions for Future Work
        8.1  Conclusions
        8.2  Suggestions for Future Work
    Appendix I  Base Syllable Table
    Appendix II  Cantonese Initials and Finals
    Appendix III  IF Confusion Matrix
    Appendix IV  Phonetic Question Set
    Appendix V  CDDT and PCDT