4 research outputs found

    Phone Merging for Code-switched Speech Recognition

    Speakers in multilingual communities often switch between or mix multiple languages in the same conversation. Automatic Speech Recognition (ASR) of code-switched speech faces many challenges, including the influence of the phones of different languages on each other. This paper shows evidence that phone sharing between languages improves acoustic model performance for Hindi-English code-switched speech. We compare a baseline system built with separate phones for Hindi and English against systems in which the phones were manually merged based on linguistic knowledge. Encouraged by the improved ASR performance after manually merging the phones, we further investigate multiple data-driven methods to identify phones to be merged across the languages. We present a detailed analysis of automatic phone merging in this language pair and its impact on individual phone accuracies and WER. Although the best performance gain of 1.2% WER was observed with manually merged phones, we show experimentally that the manual phone merge is not optimal.
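    The abstract does not spell out the data-driven merging criteria, so the following is only a minimal sketch of one plausible approach: fit a diagonal Gaussian to the aligned feature frames of each phone and propose merging Hindi-English phone pairs whose symmetrised KL divergence is small. The phone-name prefixes (hi_/en_), the frames_by_phone input, and the threshold are all hypothetical, not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL divergence between two diagonal-covariance Gaussians p and q."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def propose_phone_merges(frames_by_phone, threshold=5.0):
    """Fit one diagonal Gaussian per phone from its aligned feature frames and
    propose merging Hindi-English phone pairs whose symmetrised KL divergence
    falls below the threshold (prefixes and threshold are illustrative)."""
    stats = {}
    for phone, frames in frames_by_phone.items():
        frames = np.asarray(frames, dtype=float)
        stats[phone] = (frames.mean(axis=0), frames.var(axis=0) + 1e-6)

    hindi = [p for p in stats if p.startswith("hi_")]
    english = [p for p in stats if p.startswith("en_")]
    merges = []
    for ph in hindi:
        for pe in english:
            mu_h, var_h = stats[ph]
            mu_e, var_e = stats[pe]
            d = 0.5 * (gaussian_kl(mu_h, var_h, mu_e, var_e)
                       + gaussian_kl(mu_e, var_e, mu_h, var_h))
            if d < threshold:
                merges.append((ph, pe, d))
    return sorted(merges, key=lambda m: m[2])
```

    Pairs returned first would be the strongest candidates for sharing a single acoustic unit across the two languages before retraining the acoustic model.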

    Automatic speech recognition of Cantonese-English code-mixing utterances.

    Chan Yeuk Chi Joyce. Thesis (M.Phil.), Chinese University of Hong Kong, 2005. Includes bibliographical references. Abstracts in English and Chinese.

    Chapter 1 Introduction
        1.1 Background
        1.2 Previous Work on Code-switching Speech Recognition
            1.2.1 Keyword Spotting Approach
            1.2.2 Translation Approach
            1.2.3 Language Boundary Detection
        1.3 Motivations of Our Work
        1.4 Methodology
        1.5 Thesis Outline
        1.6 References
    Chapter 2 Fundamentals of Large Vocabulary Continuous Speech Recognition for Cantonese and English
        2.1 Basic Theory of Speech Recognition
            2.1.1 Feature Extraction
            2.1.2 Maximum a Posteriori (MAP) Probability
            2.1.3 Hidden Markov Model (HMM)
            2.1.4 Statistical Language Modeling
            2.1.5 Search Algorithm
        2.2 Word Posterior Probability (WPP)
        2.3 Generalized Word Posterior Probability (GWPP)
        2.4 Characteristics of Cantonese
            2.4.1 Cantonese Phonology
            2.4.2 Variation and Change in Pronunciation
            2.4.3 Syllables and Characters in Cantonese
            2.4.4 Spoken Cantonese vs. Written Chinese
        2.5 Characteristics of English
            2.5.1 English Phonology
            2.5.2 English with Cantonese Accents
        2.6 References
    Chapter 3 Code-mixing and Code-switching Speech Recognition
        3.1 Introduction
        3.2 Definition
            3.2.1 Monolingual Speech Recognition
            3.2.2 Multilingual Speech Recognition
            3.2.3 Code-mixing and Code-switching
        3.3 Conversation in Hong Kong
            3.3.1 Language Choice of Hong Kong People
            3.3.2 Reasons for Code-mixing in Hong Kong
            3.3.3 How Does Code-mixing Occur?
        3.4 Difficulties for Code-mixing - Specific to Cantonese-English
            3.4.1 Phonetic Differences
            3.4.2 Phonology Differences
            3.4.3 Accent and Borrowing
            3.4.4 Lexicon and Grammar
            3.4.5 Lack of Appropriate Speech Corpus
        3.5 References
    Chapter 4 Data Collection
        4.1 Data Collection
            4.1.1 Corpus Design
            4.1.2 Recording Setup
            4.1.3 Post-processing of Speech Data
        4.2 A Baseline Database
            4.2.1 Monolingual Spoken Cantonese Speech Data (CUMIX)
        4.3 References
    Chapter 5 System Design and Experimental Setup
        5.1 Overview of the Code-mixing Speech Recognizer
            5.1.1 Bilingual Syllable / Word-based Speech Recognizer
            5.1.2 Language Boundary Detection
            5.1.3 Generalized Word Posterior Probability (GWPP)
        5.2 Acoustic Modeling
            5.2.1 Speech Corpus for Training of Acoustic Models
            5.2.2 Feature Extraction
            5.2.3 Variability in the Speech Signal
            5.2.4 Language Dependency of the Acoustic Models
            5.2.5 Pronunciation Dictionary
            5.2.6 The Training Process of Acoustic Models
            5.2.7 Decoding and Evaluation
        5.3 Language Modeling
            5.3.1 N-gram Language Model
            5.3.2 Difficulties in Data Collection
            5.3.3 Text Data for Training Language Model
            5.3.4 Training Tools
            5.3.5 Training Procedure
            5.3.6 Evaluation of the Language Models
        5.4 Language Boundary Detection
            5.4.1 Phone-based LBD
            5.4.2 Syllable-based LBD
            5.4.3 LBD Based on Syllable Lattice
        5.5 Integration of the Acoustic Model Scores, Language Model Scores and Language Boundary Information
            5.5.1 Integration of Acoustic Model Scores and Language Boundary Information
            5.5.2 Integration of Modified Acoustic Model Scores and Language Model Scores
            5.5.3 Evaluation Criterion
        5.6 References
    Chapter 6 Results and Analysis
        6.1 Speech Data for Development and Evaluation
            6.1.1 Development Data
            6.1.2 Testing Data
        6.2 Performance of Different Acoustic Units
            6.2.1 Analysis of Results
        6.3 Language Boundary Detection
            6.3.1 Phone-based Language Boundary Detection
            6.3.2 Syllable-based Language Boundary Detection (SYL LBD)
            6.3.3 Language Boundary Detection Based on Syllable Lattice (BILINGUAL LBD)
            6.3.4 Observations
        6.4 Evaluation of the Language Models
            6.4.1 Character Perplexity
            6.4.2 Phonetic-to-text Conversion Rate
            6.4.3 Observations
        6.5 Character Error Rate
            6.5.1 Without Language Boundary Information
            6.5.2 With Language Boundary Detector SYL LBD
            6.5.3 With Language Boundary Detector BILINGUAL-LBD
            6.5.4 Observations
        6.6 References
    Chapter 7 Conclusions and Suggestions for Future Work
        7.1 Conclusion
            7.1.1 Difficulties and Solutions
        7.2 Suggestions for Future Work
            7.2.1 Acoustic Modeling
            7.2.2 Pronunciation Modeling
            7.2.3 Language Modeling
            7.2.4 Speech Data
            7.2.5 Language Boundary Detection
        7.3 References
    Appendix A Code-mixing Utterances in Training Set of CUMIX
    Appendix B Code-mixing Utterances in Testing Set of CUMIX
    Appendix C Usage of Speech Data in CUMIX
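    Chapters 5 and 6 of this thesis revolve around language boundary detection (LBD) and the integration of its output with acoustic and language model scores. As a rough, hypothetical illustration of the underlying idea only (not the thesis's actual phone-, syllable-, or lattice-based detectors), the sketch below picks a single Cantonese-to-English switch point from per-frame log-likelihoods produced by two monolingual models; the input names are assumptions for the example.

```python
import numpy as np

def detect_language_boundary(cantonese_ll, english_ll):
    """Toy single-boundary detector: given per-frame log-likelihoods from a
    Cantonese model and an English model, choose the frame index t that
    maximises the score of 'Cantonese up to t, English afterwards'."""
    cantonese_ll = np.asarray(cantonese_ll, dtype=float)
    english_ll = np.asarray(english_ll, dtype=float)
    # Prefix/suffix sums let every candidate boundary be scored in O(1).
    c_prefix = np.concatenate(([0.0], np.cumsum(cantonese_ll)))
    e_suffix = np.concatenate((np.cumsum(english_ll[::-1])[::-1], [0.0]))
    scores = c_prefix + e_suffix
    t = int(np.argmax(scores))
    return t, float(scores[t])
```

    With t = 0 the utterance is treated as entirely English and with t = len(cantonese_ll) as entirely Cantonese, so purely monolingual utterances are handled by the same search; the thesis's detectors operate on recognised phone or syllable units and lattices rather than raw frames.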