Robust And Discriminative Feature Extraction Techniques For Large Vocabulary Continuous Speech Recognition

By 張志豪

Abstract

Speech is the primary and most convenient means of communication between people. With the successful development of ever smaller electronic devices and the popularity of wireless communication and networking, it is widely believed that in the near future speech will play a more active role and will serve as the major human-machine interface between people and various kinds of smart devices. Research on automatic speech recognition (ASR) has therefore received growing emphasis, and the development of feature extraction approaches that are both discriminative and robust, as ASR systems deployed in real and diverse environments require, has attracted continuous attention over the past two decades.

With this in mind, in this thesis we studied auditory-perception-based feature extraction and data-driven linear feature transformation for robust speech recognition. For auditory-perception-based feature extraction, we extensively compared the conventional Mel-frequency cepstral coefficients (MFCC) with perceptual linear prediction coefficients (PLPC), and compared various ways of deriving and combining their corresponding time-trajectory information. For data-driven linear feature transformation, we began by showing that linear discriminant analysis (LDA) outperforms principal component analysis (PCA) as a feature transformation for speech recognition. We then investigated several improved approaches, such as heteroscedastic linear discriminant analysis (HLDA) and heteroscedastic discriminant analysis (HDA), which remove the assumption, inherent in the derivation of LDA, that all classes share the same covariance.
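Why a discriminative transform such as LDA can beat PCA is easy to see in a toy case. The sketch below is a minimal, hypothetical two-class, two-dimensional example; the data points and the helper names `pca_direction`, `lda_direction`, and `fisher_ratio` are illustrative assumptions, not taken from the thesis. PCA keeps the direction of largest total variance, whereas two-class Fisher LDA picks w proportional to S_W^{-1}(m_1 - m_2), the direction that maximizes between-class separation relative to within-class scatter. In an actual ASR front end, LDA is typically multi-class (for example, with HMM states as classes) and is applied to higher-dimensional spliced feature vectors, so this only illustrates the principle.

```python
"""Toy comparison of PCA and two-class Fisher LDA directions in 2-D.

Hypothetical data: both classes vary widely along x but are separated only along y.
"""
import math


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, center):
    """Entries (a, b, d) of the symmetric 2x2 scatter matrix [[a, b], [b, d]]."""
    a = b = d = 0.0
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        a += dx * dx
        b += dx * dy
        d += dy * dy
    return a, b, d


def unit(v):
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def pca_direction(points):
    """Leading eigenvector of the total scatter matrix (direction of maximum variance)."""
    a, b, d = scatter(points, mean(points))
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)  # largest eigenvalue
    if abs(b) < 1e-12:  # matrix is already diagonal
        return (1.0, 0.0) if a >= d else (0.0, 1.0)
    return unit((lam - d, b))


def within_scatter(c1, c2):
    """Within-class scatter S_W = S_1 + S_2, returned as (a, b, d)."""
    a1, b1, d1 = scatter(c1, mean(c1))
    a2, b2, d2 = scatter(c2, mean(c2))
    return a1 + a2, b1 + b2, d1 + d2


def lda_direction(c1, c2):
    """Two-class Fisher LDA: w proportional to S_W^{-1} (m1 - m2)."""
    m1, m2 = mean(c1), mean(c2)
    a, b, d = within_scatter(c1, c2)
    det = a * d - b * b
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    return unit(((d * dx - b * dy) / det, (-b * dx + a * dy) / det))


def fisher_ratio(w, c1, c2):
    """Fisher criterion J(w) = (w . (m1 - m2))^2 / (w^T S_W w)."""
    m1, m2 = mean(c1), mean(c2)
    a, b, d = within_scatter(c1, c2)
    between = (w[0] * (m1[0] - m2[0]) + w[1] * (m1[1] - m2[1])) ** 2
    within = a * w[0] ** 2 + 2 * b * w[0] * w[1] + d * w[1] ** 2
    return between / within


class_a = [(-4, 1.2), (-2, 0.8), (0, 1.1), (2, 0.9), (4, 1.0)]
class_b = [(-4, -1.0), (-2, -0.9), (0, -1.1), (2, -0.8), (4, -1.2)]

w_pca = pca_direction(class_a + class_b)
w_lda = lda_direction(class_a, class_b)
print("PCA direction:", w_pca, "J =", fisher_ratio(w_pca, class_a, class_b))
print("LDA direction:", w_lda, "J =", fisher_ratio(w_lda, class_a, class_b))
```

On this toy data, PCA returns a direction very close to the x-axis (high variance, but useless for telling the classes apart), while LDA returns one very close to the y-axis, with a Fisher ratio several orders of magnitude larger.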
Moreover, we proposed using the minimum classification error (MCE) and maximum mutual information (MMI) criteria, each separately, to optimize the transformation matrices, and compared them with the maximum likelihood (ML) criterion. Finally, we further applied the maximum likelihood linear transformation (MLLT) and other robustness techniques, such as feature mean subtraction and/or variance normalization. All experiments were carried out on the Mandarin broadcast news corpus (MATBN), and the initial experimental results were very promising.
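Feature mean subtraction and variance normalization are commonly applied per utterance and per feature dimension. The minimal sketch below assumes that setting; the function name `mean_var_normalize` and the toy frames are hypothetical, and the thesis may have used a different normalization scope (for example, per speaker or over a sliding window). Because a fixed linear channel shows up approximately as an additive constant in the cepstral domain, subtracting the utterance mean removes it, which the shifted copy of the utterance demonstrates.

```python
"""Per-utterance feature mean subtraction with optional variance normalization."""
import math


def mean_var_normalize(frames, normalize_variance=True, eps=1e-8):
    """Normalize each feature dimension over all frames of one utterance.

    frames: list of frames, each a list of floats (e.g. cepstral coefficients).
    Returns new frames with zero mean and, if requested, unit variance per dimension.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    out = [[f[k] - means[k] for k in range(dim)] for f in frames]  # mean subtraction
    if normalize_variance:
        # Population standard deviation per dimension; eps guards constant dimensions.
        stds = [max(math.sqrt(sum(f[k] ** 2 for f in out) / n), eps) for k in range(dim)]
        out = [[f[k] / stds[k] for k in range(dim)] for f in out]
    return out


utterance = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0], [7.0, 25.0]]
# Simulated stationary channel: the same constant offset added to every frame.
shifted = [[x + 4.0, y - 2.5] for x, y in utterance]

print(mean_var_normalize(utterance))
print(mean_var_normalize(shifted))
```

Both calls print the same normalized frames (up to floating-point rounding), since the constant offset is absorbed by the mean subtraction; passing `normalize_variance=False` performs mean subtraction alone.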

Topics: Data-driven Linear Feature Transformation, Principal Component Analysis, Linear Discriminant Analysis, Heteroscedastic Linear Discriminant Analysis, Heteroscedastic Discriminant Analysis, Maximum Likelihood Linear Transformation
Classification: 42
Year: 2010
OAI identifier: oai:ir.lib.ntnu.edu.tw:309250000Q/17966
The full text is not provided here, but it may be available at the following location:
  • http://ir.lib.ntnu.edu.tw/ir/h... (external link)