thesis

Using duration information in HMM-based automatic speech recognition.

Abstract

Zhu Yu.Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.Includes bibliographical references (leaves 100-104).Abstracts in English and Chinese.Chapter CHAPTER 1 --- lNTRODUCTION --- p.1Chapter 1.1. --- Speech and its temporal structure --- p.1Chapter 1.2. --- Previous work on the modeling of temporal structure --- p.1Chapter 1.3. --- Integrating explicit duration modeling in HMM-based ASR system --- p.3Chapter 1.4. --- Thesis outline --- p.3Chapter CHAPTER 2 --- BACKGROUND --- p.5Chapter 2.1. --- Automatic speech recognition process --- p.5Chapter 2.2. --- HMM for ASR --- p.6Chapter 2.2.1. --- HMM for ASR --- p.6Chapter 2.2.2. --- HMM-based ASR system --- p.7Chapter 2.3. --- General approaches to explicit duration modeling --- p.12Chapter 2.3.1. --- Explicit duration modeling --- p.13Chapter 2.3.2. --- Training of duration model --- p.16Chapter 2.3.3. --- Incorporation of duration model in decoding --- p.18Chapter CHAPTER 3 --- CANTONESE CONNECTD-DlGlT RECOGNITION --- p.21Chapter 3.1. --- Cantonese connected digit recognition --- p.21Chapter 3.1.1. --- Phonetics of Cantonese and Cantonese digit --- p.21Chapter 3.2. --- The baseline system --- p.24Chapter 3.2.1. --- Speech corpus --- p.24Chapter 3.2.2. --- Feature extraction --- p.25Chapter 3.2.3. --- HMM models --- p.26Chapter 3.2.4. --- HMM decoding --- p.27Chapter 3.3. --- Baseline performance and error analysis --- p.27Chapter 3.3.1. --- Recognition performance --- p.27Chapter 3.3.2. --- Performance for different speaking rates --- p.28Chapter 3.3.3. --- Confusion matrix --- p.30Chapter CHAPTER 4 --- DURATION MODELING FOR CANTONESE DIGITS --- p.41Chapter 4.1. --- Duration features --- p.41Chapter 4.1.1. --- Absolute duration feature --- p.41Chapter 4.1.2. --- Relative duration feature --- p.44Chapter 4.2. --- Parametric distribution for duration modeling --- p.47Chapter 4.3. --- Estimation of the model parameters --- p.51Chapter 4.4. --- Speaking-rate-dependent duration model --- p.52Chapter CHAPTER 5 --- USING DURATION MODELING FOR CANTONSE DIGIT RECOGNITION --- p.57Chapter 5.1. --- Baseline decoder --- p.57Chapter 5.2. --- Incorporation of state-level duration model --- p.59Chapter 5.3. --- Incorporation word-level duration model --- p.62Chapter 5.4. --- Weighted use of duration model --- p.65Chapter CHAPTER 6 --- EXPERIMENT RESULT AND ANALYSIS --- p.66Chapter 6.1. --- Experiments with speaking-rate-independent duration models --- p.66Chapter 6.1.1. --- Discussion --- p.68Chapter 6.1.2. --- Analysis of the error patterns --- p.71Chapter 6.1.3. --- "Reduction of deletion, substitution and insertion" --- p.72Chapter 6.1.4. --- Recognition performance at different speaking rates --- p.75Chapter 6.2. --- Experiments with speaking-rate-dependent duration models --- p.77Chapter 6.2.1. --- Using true speaking rate --- p.77Chapter 6.2.2. --- Using estimated speaking rate --- p.79Chapter 6.3. --- Evaluation on another speech database --- p.80Chapter 6.3.1. --- Experimental setup --- p.80Chapter 6.3.2. --- Experiment results and analysis --- p.82Chapter CHAPTER 7 --- CONCLUSIONS AND FUTUR WORK --- p.87Chapter 7.1. --- Conclusion and understanding of current work --- p.87Chapter 7.2. --- Future work --- p.89Chapter A --- APPENDIX --- p.90BIBLIOGRAPHY --- p.10

    Similar works