Modeling of Spectra and Temporal Trajectories in Speech Processing

Abstract

This work investigates the application of spectral and temporal speech processing algorithms developed for feature extraction in Automatic Speech Recognition (ASR) and for very low bit-rate speech coding. In the first part of the thesis, various spectral feature extraction techniques are investigated for robust parameterization of speech. We focus especially on techniques based on all-pole modeling, which use an autoregressive model as the major processing block to suppress speaker-dependent details in the auditory spectrum. Using the model spectrum is advantageous compared with using the auditory spectrum of the signal directly. The model spectrum can be represented by various types of parameters that differ in their properties (decorrelation, behavior under quantization, robustness to additive and convolutive noise, etc.). We show that even though cepstrum-based speech features are the most widely used for ASR, the best recognition performance is achieved using decorrelated and normalized Line Spectral Frequencies (LSFs). Furthermore, frequency-selective and discrete all-pole modeling approaches are studied, and their beneficial effects on the final speech features are presented. We also take feature normalization techniques into account and discuss their influence on the extracted speech features. The most significant experimental results are achieved on well-known SpeechDat-Ca
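To make the all-pole modeling and LSF representation mentioned above more concrete, the following is a minimal sketch, not the thesis implementation. It assumes a plain autocorrelation sequence as input (the thesis works on an auditory spectrum, which is not reproduced here), uses the standard Levinson-Durbin recursion to fit the autoregressive model, and derives LSFs from the roots of the usual symmetric and antisymmetric polynomials. The function names `levinson_durbin` and `poly_to_lsf` are hypothetical, and the decorrelation and normalization steps named in the abstract are not shown.

```python
import numpy as np


def levinson_durbin(r, order):
    """Fit an all-pole (autoregressive) model to autocorrelation values r[0..order].

    Returns (a, err), where a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error filter, and err is the final prediction error power.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def poly_to_lsf(a, eps=1e-6):
    """Convert prediction polynomial a (a[0] == 1) to LSFs in radians, sorted.

    Builds P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z)
    and keeps the root angles strictly inside (0, pi); the trivial roots
    at z = 1 and z = -1 are discarded via the eps margin.
    """
    a_ext = np.concatenate([np.asarray(a, dtype=float), [0.0]])
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


# Illustrative use on a synthetic frame (assumed data, not from the thesis).
rng = np.random.default_rng(0)
n = np.arange(400)
frame = np.sin(0.3 * n) + 0.5 * np.sin(1.1 * n) + 0.1 * rng.standard_normal(400)
frame = frame * np.hamming(len(frame))
order = 10
full = np.correlate(frame, frame, mode="full")
r = full[len(frame) - 1:len(frame) + order]  # lags 0..order
a, err = levinson_durbin(r, order)
lsf = poly_to_lsf(a)
```

For a stable model of order p, this yields p LSFs in ascending order between 0 and pi. That ordering property is one reason LSFs are often considered convenient for quantization and interpolation.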
