SIGNAL MODELING WITH NON-UNIFORM TIME SAMPLING OF FEATURES FOR AUTOMATIC SPEECH RECOGNITION

Abstract

This dissertation investigates non-uniform time sampling methods for spectral/temporal feature extraction in speech. Frame-based features were computed by encoding the global spectral shape with a Discrete Cosine Transform. In most current “standard” methods, trajectory (dynamic) features are determined from frame-based parameters using fixed time sampling, i.e., a fixed block length and fixed block spacing. In this research, new methods are proposed and investigated in which block length and/or block spacing are variable. The idea was initially tested with HMM-based isolated word recognition, and a significant performance improvement resulted when variable block length and variable block spacing were applied. An accuracy of 97.9% was obtained on an alphabet recognition task using the ISOLET database. This result i
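The contrast between fixed and variable block sampling can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the use of a second cosine expansion over time to summarize each block, and all block positions and lengths below are assumptions chosen only for illustration.

```python
import math


def dct_coeffs(values, n_coeffs):
    """First n_coeffs unnormalized DCT-II coefficients of a sequence."""
    n_points = len(values)
    return [
        sum(v * math.cos(math.pi * k * (n + 0.5) / n_points)
            for n, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def frame_features(log_spectrum, n_coeffs):
    """Encode the global spectral shape of one frame with a DCT."""
    return dct_coeffs(log_spectrum, n_coeffs)


def trajectory_features(frames, blocks, n_time_coeffs):
    """Summarize each block of frames, one feature track at a time.

    frames: list of per-frame feature vectors.
    blocks: list of (start_frame, block_length) pairs. Equal lengths and
            equally spaced starts give fixed sampling; anything else is
            non-uniform sampling.
    Assumption: each track is summarized by a DCT over time.
    """
    out = []
    for start, length in blocks:
        block = frames[start:start + length]
        feats = []
        for i in range(len(block[0])):
            track = [frame[i] for frame in block]
            feats.extend(dct_coeffs(track, n_time_coeffs))
        out.append(feats)
    return out


# Hypothetical data: 8 frames, 2 frame-based features each.
frames = [[1.0, 2.0] for _ in range(8)]

# Fixed sampling: block length 4, block spacing 4.
fixed = trajectory_features(frames, [(0, 4), (4, 4)], n_time_coeffs=2)

# Variable sampling: a short block followed by a longer one.
variable = trajectory_features(frames, [(0, 2), (2, 6)], n_time_coeffs=2)
```

In both cases each block yields a vector of the same size, so the downstream HMM sees a consistent feature dimension even when block length or spacing changes.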
