An Initial Study on English Continuous Speech Recognition

By 許庭瑋 (TingWei Hsu)

Abstract

This thesis presents a preliminary study of English continuous speech recognition. An English continuous speech recognizer was implemented, and several of its major components, including speech feature extraction, acoustic modeling, and language modeling, were extensively investigated. First, for speech feature extraction, we compared the performance of linear discriminant analysis (LDA) and heteroscedastic linear discriminant analysis (HLDA) with that of conventional Mel-frequency cepstral coefficients (MFCC). Second, for acoustic modeling, we explored intra-word triphone models, a state-tying scheme, and a phone confusion matrix, as well as unsupervised training of acoustic models, to improve recognition results. Finally, for language modeling, both count-merging and model-interpolation approaches were exploited to combine the background and in-domain language model training corpora, enabling better prediction of word occurrences during recognition. The experiments were conducted on the Voice of America (VOA) and English Across Taiwan (EAT) corpora.
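The two language-model combination strategies named in the abstract can be contrasted with a toy sketch. This is only an illustration under simplifying assumptions, not the thesis's implementation: it uses unsmoothed unigram models, and the function names, the toy counts, and the values of `weight` and `lam` are all hypothetical.

```python
from collections import Counter


def unigram_probs(counts):
    """Maximum-likelihood unigram probabilities from raw (possibly weighted) counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def count_merging(background, in_domain, weight):
    """Pool the counts of both corpora, scaling in-domain counts by `weight`,
    then estimate a single model from the merged counts."""
    merged = Counter()
    for w, c in background.items():
        merged[w] += c
    for w, c in in_domain.items():
        merged[w] += weight * c
    return unigram_probs(merged)


def model_interpolation(background, in_domain, lam):
    """Estimate one model per corpus, then mix their probabilities linearly:
    P(w) = lam * P_in(w) + (1 - lam) * P_bg(w)."""
    p_bg = unigram_probs(background)
    p_in = unigram_probs(in_domain)
    vocab = set(p_bg) | set(p_in)
    return {w: lam * p_in.get(w, 0.0) + (1 - lam) * p_bg.get(w, 0.0) for w in vocab}


# Toy word counts (hypothetical): a larger background corpus and a small in-domain one.
background = Counter({"the": 6, "news": 3, "taiwan": 1})
in_domain = Counter({"the": 2, "taiwan": 2})

merged = count_merging(background, in_domain, weight=2.0)
mixed = model_interpolation(background, in_domain, lam=0.5)
```

Count merging combines evidence before estimation, so the larger corpus dominates unless the in-domain counts are up-weighted; interpolation combines the two models after estimation, so the mixing weight controls each model's influence regardless of corpus size.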

Topics: Continuous Speech Recognition; Intra-Word Triphone Model; State Tying; Phone Confusion Matrix
Classification: 42
Year: 2011
OAI identifier: oai:ir.lib.ntnu.edu.tw:309250000Q/74256
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://ir.lib.ntnu.edu.tw/ir/h... (external link)