A Gaussian Mixture Model Spectral Representation for Speech Recognition

By Matthew Nicholas Stuttle

Abstract

Most modern speech recognition systems use either Mel-frequency cepstral coefficients or perceptual linear prediction as acoustic features. Recently, there has been some interest in alternative speech parameterisations based on formant features. Formants are the resonant frequencies of the vocal tract, and they form the characteristic shape of the speech spectrum. However, formants are difficult to estimate reliably and robustly from the speech signal, and in some cases they may not be clearly present. Rather than estimating the resonant frequencies, formant-like features can be used instead. Formant-like features use the characteristics of the spectral peaks to represent the spectrum. In this work, novel features are developed by estimating a Gaussian mixture model (GMM) from the speech spectrum. This approach has previously been used successfully in a speech codec. The EM algorithm is used to estimate the parameters of the GMM. The extracted parameters (the means, standard deviations and component weights) can be related to the formant locations, bandwidths and magnitudes, respectively. As the features directly represent the linear spectrum, it is possible to apply techniques for vocal tract length normalisation and additive noise…
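To illustrate how EM can fit a GMM to a spectrum rather than to a set of samples, here is a minimal sketch. It assumes the magnitude spectrum is normalised and treated as a probability mass over a uniform grid of frequency bins, so each bin acts as a data point weighted by its spectral energy. The function names `fit_spectral_gmm` and `warp_gmm`, the initialisation, the variance floor and the fixed iteration count are all hypothetical choices made for this example, not the thesis's actual configuration. The linear frequency warp in `warp_gmm` is likewise an assumed, simplified form of vocal tract length normalisation.

```python
import numpy as np


def fit_spectral_gmm(freqs, spectrum, n_components=4, n_iter=100, min_std=None):
    """Fit a 1-D Gaussian mixture to a magnitude spectrum with EM (sketch).

    The spectrum is normalised to sum to one and treated as a probability
    mass over frequency bins, so each bin is a sample weighted by its
    spectral energy. Assumes a uniform frequency grid.
    """
    freqs = np.asarray(freqs, dtype=float)
    mass_per_bin = np.asarray(spectrum, dtype=float)
    mass_per_bin = mass_per_bin / mass_per_bin.sum()
    if min_std is None:
        min_std = np.min(np.diff(freqs))  # floor std devs at one bin width

    # Hypothetical initialisation: means spread evenly across the band,
    # broad equal standard deviations and equal weights.
    lo, hi = freqs.min(), freqs.max()
    means = np.linspace(lo, hi, n_components + 2)[1:-1]
    stds = np.full(n_components, (hi - lo) / (2 * n_components))
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: posterior of each component for each frequency bin.
        # The constant -0.5*log(2*pi) is omitted; it cancels on normalising.
        z = (freqs[:, None] - means[None, :]) / stds[None, :]
        log_joint = np.log(weights) - np.log(stds) - 0.5 * z**2
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weighted updates, each bin counted by its spectral mass.
        r = resp * mass_per_bin[:, None]
        comp_mass = r.sum(axis=0) + 1e-12  # avoid division by zero
        weights = comp_mass / comp_mass.sum()
        means = (r * freqs[:, None]).sum(axis=0) / comp_mass
        var = (r * (freqs[:, None] - means[None, :]) ** 2).sum(axis=0) / comp_mass
        stds = np.sqrt(np.maximum(var, min_std**2))

    return means, stds, weights


def warp_gmm(means, stds, weights, alpha):
    """Apply a linear frequency warp f -> alpha * f to the GMM parameters.

    Scaling the frequency axis by alpha scales every mean and standard
    deviation by alpha and leaves the component weights unchanged.
    """
    return alpha * np.asarray(means), alpha * np.asarray(stds), np.asarray(weights)


if __name__ == "__main__":
    freqs = np.linspace(0, 4000, 401)  # 10 Hz bins
    # Synthetic two-peak "spectrum", for illustration only.
    spectrum = (0.6 * np.exp(-0.5 * ((freqs - 500) / 100) ** 2)
                + 0.4 * np.exp(-0.5 * ((freqs - 1500) / 100) ** 2))
    means, stds, weights = fit_spectral_gmm(freqs, spectrum, n_components=2)
    print("means:", means, "stds:", stds, "weights:", weights)
```

Following the relation stated in the abstract, the fitted means play the role of formant locations, the standard deviations of bandwidths, and the weights of magnitudes. Because the parameters live on the linear frequency axis, a warp of the spectrum maps to a simple rescaling of those parameters.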

Year: 2003
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.7226
Provided by: CiteSeerX
Full text available at:
  • http://citeseerx.ist.psu.edu/v...
  • http://mi.eng.cam.ac.uk/~mjfg/...