Language modeling for speech recognition of spoken Cantonese.

Publication date: 01/01/2009

Yeung, Yu Ting. Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. Includes bibliographical references (leaves 84-93). Abstracts in English and Chinese.

Acknowledgement
Abstract
Chapter 1 --- Introduction
Chapter 1.1 --- Cantonese Speech Recognition
Chapter 1.2 --- Objectives
Chapter 1.3 --- Thesis Outline
Chapter 2 --- Fundamentals of Large Vocabulary Continuous Speech Recognition
Chapter 2.1 --- Problem Formulation
Chapter 2.2 --- Feature Extraction
Chapter 2.3 --- Acoustic Models
Chapter 2.4 --- Decoding
Chapter 2.5 --- Statistical Language Modeling
Chapter 2.5.1 --- N-gram Language Models
Chapter 2.5.2 --- N-gram Smoothing
Chapter 2.5.3 --- Complexity of Language Model
Chapter 2.5.4 --- Class-based Language Model
Chapter 2.5.5 --- Language Model Pruning
Chapter 2.6 --- Performance Evaluation
Chapter 3 --- The Cantonese Dialect
Chapter 3.1 --- Phonology of Cantonese
Chapter 3.2 --- Orthographic Representation of Cantonese
Chapter 3.3 --- Classification of Cantonese Speech
Chapter 3.4 --- Cantonese-English Code-mixing
Chapter 4 --- Rule-based Translation Method
Chapter 4.1 --- Motivations
Chapter 4.2 --- Transformation-based Learning
Chapter 4.2.1 --- Algorithm Overview
Chapter 4.2.2 --- Learning of Translation Rules
Chapter 4.3 --- Performance Evaluation
Chapter 4.3.1 --- The Learnt Translation Rules
Chapter 4.3.2 --- Evaluation of the Rules
Chapter 4.3.3 --- Analysis of the Rules
Chapter 4.4 --- Preparation of Training Data for Language Modeling
Chapter 4.5 --- Discussion
Chapter 5 --- Language Modeling for Cantonese
Chapter 5.1 --- Training Data
Chapter 5.1.1 --- Text Corpora
Chapter 5.1.2 --- Preparation of Formal Cantonese Text Data
Chapter 5.2 --- Training of Language Models
Chapter 5.2.1 --- Language Models for Standard Chinese
Chapter 5.2.2 --- Language Models for Formal Cantonese
Chapter 5.2.3 --- Language Models for Colloquial Cantonese
Chapter 5.3 --- Evaluation of Language Models
Chapter 5.3.1 --- Speech Corpora for Evaluation
Chapter 5.3.2 --- Perplexities of Formal Cantonese Language Models
Chapter 5.3.3 --- Perplexities of Colloquial Cantonese Language Models
Chapter 5.4 --- Speech Recognition Experiments
Chapter 5.4.1 --- Speech Corpora
Chapter 5.4.2 --- Experimental Setup
Chapter 5.4.3 --- Results on Formal Cantonese Models
Chapter 5.4.4 --- Results on Colloquial Cantonese Models
Chapter 5.5 --- Analysis of Results
Chapter 5.6 --- Discussion
Chapter 5.6.1 --- Cantonese Language Modeling
Chapter 5.6.2 --- Interpolated Language Models
Chapter 5.6.3 --- Class-based Language Models
Chapter 6 --- Towards Language Modeling of Code-mixing Speech
Chapter 6.1 --- Data Collection
Chapter 6.1.1 --- Data Collection
Chapter 6.1.2 --- Filtering of Collected Data
Chapter 6.1.3 --- Processing of Collected Data
Chapter 6.2 --- Clustering of Chinese and English Words
Chapter 6.3 --- Language Modeling for Code-mixing Speech
Chapter 6.3.1 --- Language Models from Collected Data
Chapter 6.3.2 --- Class-based Language Models
Chapter 6.3.3 --- Performance Evaluation of Code-mixing Language Models
Chapter 6.4 --- Speech Recognition Experiments with Code-mixing Language Models
Chapter 6.4.1 --- Experimental Setup
Chapter 6.4.2 --- Monolingual Cantonese Recognition
Chapter 6.4.3 --- Code-mixing Speech Recognition
Chapter 6.5 --- Discussion
Chapter 6.5.1 --- Data Collection from the Internet
Chapter 6.5.2 --- Speech Recognition of Code-mixing Speech
Chapter 7 --- Conclusions and Future Work
Chapter 7.1 --- Conclusions
Chapter 7.1.1 --- Rule-based Translation Method
Chapter 7.1.2 --- Cantonese Language Modeling
Chapter 7.1.3 --- Code-mixing Language Modeling
Chapter 7.2 --- Future Works
Chapter 7.2.1 --- Rule-based Translation
Chapter 7.2.2 --- Training Data
Chapter 7.2.3 --- Code-mixing Speech
Chapter A --- Equation Derivation
Chapter A.1 --- Relationship between Average Mutual Information and Perplexity
Bibliography

CUHK Digital Repository

Automatic speech recognition of Cantonese-English code-mixing utterances.

Publication date: 01/01/2005

Chan Yeuk Chi Joyce. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references. Abstracts in English and Chinese.

Chapter 1 --- Introduction
Chapter 1.1 --- Background
Chapter 1.2 --- Previous Work on Code-switching Speech Recognition
Chapter 1.2.1 --- Keyword Spotting Approach
Chapter 1.2.2 --- Translation Approach
Chapter 1.2.3 --- Language Boundary Detection
Chapter 1.3 --- Motivations of Our Work
Chapter 1.4 --- Methodology
Chapter 1.5 --- Thesis Outline
Chapter 1.6 --- References
Chapter 2 --- Fundamentals of Large Vocabulary Continuous Speech Recognition for Cantonese and English
Chapter 2.1 --- Basic Theory of Speech Recognition
Chapter 2.1.1 --- Feature Extraction
Chapter 2.1.2 --- Maximum a Posteriori (MAP) Probability
Chapter 2.1.3 --- Hidden Markov Model (HMM)
Chapter 2.1.4 --- Statistical Language Modeling
Chapter 2.1.5 --- Search Algorithm
Chapter 2.2 --- Word Posterior Probability (WPP)
Chapter 2.3 --- Generalized Word Posterior Probability (GWPP)
Chapter 2.4 --- Characteristics of Cantonese
Chapter 2.4.1 --- Cantonese Phonology
Chapter 2.4.2 --- Variation and Change in Pronunciation
Chapter 2.4.3 --- Syllables and Characters in Cantonese
Chapter 2.4.4 --- Spoken Cantonese vs. Written Chinese
Chapter 2.5 --- Characteristics of English
Chapter 2.5.1 --- English Phonology
Chapter 2.5.2 --- English with Cantonese Accents
Chapter 2.6 --- References
Chapter 3 --- Code-mixing and Code-switching Speech Recognition
Chapter 3.1 --- Introduction
Chapter 3.2 --- Definition
Chapter 3.2.1 --- Monolingual Speech Recognition
Chapter 3.2.2 --- Multilingual Speech Recognition
Chapter 3.2.3 --- Code-mixing and Code-switching
Chapter 3.3 --- Conversation in Hong Kong
Chapter 3.3.1 --- Language Choice of Hong Kong People
Chapter 3.3.2 --- Reasons for Code-mixing in Hong Kong
Chapter 3.3.3 --- How Does Code-mixing Occur?
Chapter 3.4 --- Difficulties for Code-mixing - Specific to Cantonese-English
Chapter 3.4.1 --- Phonetic Differences
Chapter 3.4.2 --- Phonology Difference
Chapter 3.4.3 --- Accent and Borrowing
Chapter 3.4.4 --- Lexicon and Grammar
Chapter 3.4.5 --- Lack of Appropriate Speech Corpus
Chapter 3.5 --- References
Chapter 4 --- Data Collection
Chapter 4.1 --- Data Collection
Chapter 4.1.1 --- Corpus Design
Chapter 4.1.2 --- Recording Setup
Chapter 4.1.3 --- Post-processing of Speech Data
Chapter 4.2 --- A Baseline Database
Chapter 4.2.1 --- Monolingual Spoken Cantonese Speech Data (CUMIX)
Chapter 4.3 --- References
Chapter 5 --- System Design and Experimental Setup
Chapter 5.1 --- Overview of the Code-mixing Speech Recognizer
Chapter 5.1.1 --- Bilingual Syllable / Word-based Speech Recognizer
Chapter 5.1.2 --- Language Boundary Detection
Chapter 5.1.3 --- Generalized Word Posterior Probability (GWPP)
Chapter 5.2 --- Acoustic Modeling
Chapter 5.2.1 --- Speech Corpus for Training of Acoustic Models
Chapter 5.2.2 --- Features Extraction
Chapter 5.2.3 --- Variability in the Speech Signal
Chapter 5.2.4 --- Language Dependency of the Acoustic Models
Chapter 5.2.5 --- Pronunciation Dictionary
Chapter 5.2.6 --- The Training Process of Acoustic Models
Chapter 5.2.7 --- Decoding and Evaluation
Chapter 5.3 --- Language Modeling
Chapter 5.3.1 --- N-gram Language Model
Chapter 5.3.2 --- Difficulties in Data Collection
Chapter 5.3.3 --- Text Data for Training Language Model
Chapter 5.3.4 --- Training Tools
Chapter 5.3.5 --- Training Procedure
Chapter 5.3.6 --- Evaluation of the Language Models
Chapter 5.4 --- Language Boundary Detection
Chapter 5.4.1 --- Phone-based LBD
Chapter 5.4.2 --- Syllable-based LBD
Chapter 5.4.3 --- LBD Based on Syllable Lattice
Chapter 5.5 --- Integration of the Acoustic Model Scores, Language Model Scores and Language Boundary Information
Chapter 5.5.1 --- Integration of Acoustic Model Scores and Language Boundary Information
Chapter 5.5.2 --- Integration of Modified Acoustic Model Scores and Language Model Scores
Chapter 5.5.3 --- Evaluation Criterion
Chapter 5.6 --- References
Chapter 6 --- Results and Analysis
Chapter 6.1 --- Speech Data for Development and Evaluation
Chapter 6.1.1 --- Development Data
Chapter 6.1.2 --- Testing Data
Chapter 6.2 --- Performance of Different Acoustic Units
Chapter 6.2.1 --- Analysis of Results
Chapter 6.3 --- Language Boundary Detection
Chapter 6.3.1 --- Phone-based Language Boundary Detection
Chapter 6.3.2 --- Syllable-based Language Boundary Detection (SYL LB)
Chapter 6.3.3 --- Language Boundary Detection Based on Syllable Lattice (BILINGUAL LBD)
Chapter 6.3.4 --- Observations
Chapter 6.4 --- Evaluation of the Language Models
Chapter 6.4.1 --- Character Perplexity
Chapter 6.4.2 --- Phonetic-to-text Conversion Rate
Chapter 6.4.3 --- Observations
Chapter 6.5 --- Character Error Rate
Chapter 6.5.1 --- Without Language Boundary Information
Chapter 6.5.2 --- With Language Boundary Detector SYL LBD
Chapter 6.5.3 --- With Language Boundary Detector BILINGUAL-LBD
Chapter 6.5.4 --- Observations
Chapter 6.6 --- References
Chapter 7 --- Conclusions and Suggestions for Future Work
Chapter 7.1 --- Conclusion
Chapter 7.1.1 --- Difficulties and Solutions
Chapter 7.2 --- Suggestions for Future Work
Chapter 7.2.1 --- Acoustic Modeling
Chapter 7.2.2 --- Pronunciation Modeling
Chapter 7.2.3 --- Language Modeling
Chapter 7.2.4 --- Speech Data
Chapter 7.2.5 --- Language Boundary Detection
Chapter 7.3 --- References
Appendix A --- Code-mixing Utterances in Training Set of CUMIX
Appendix B --- Code-mixing Utterances in Testing Set of CUMIX
Appendix C --- Usage of Speech Data in CUMIX

CUHK Digital Repository