Adaptive training using structured transforms

Authors: Gales MJF, Yu K
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/05/2004

CUED - Cambridge University Engineering Department

ADAPTIVE TRAINING USING STRUCTURED TRANSFORMS

Adaptive training is an important approach for training speech recognition systems on found, non-homogeneous data. Standard adaptive training employs a single transform to represent the unwanted acoustic variability of an utterance. A canonical model representing only the inherent speech variability may then be trained given this set of transforms. For found data there are commonly multiple acoustic factors affecting the speech signal. This paper investigates the use of multiple forms of transformation, structured transforms (ST), to represent the complex non-speech variabilities in an adaptive training framework. Two forms of transform are considered: cluster mean interpolation and constrained MLLR. Re-estimation formulae for estimating the canonical model using both maximum likelihood and minimum phone error training are presented. Experiments comparing ST with standard adaptive training schemes were performed on a conversational telephone speech task. ST were found to significantly reduce the word error rate.

CiteSeerX

ADAPTIVE TRAINING USING STRUCTURED TRANSFORMS

Adaptive training is an important approach for training speech recognition systems on found, non-homogeneous data. The standard approach employs a single transform to represent unwanted acoustic variability. However, for found data there are commonly multiple acoustic factors affecting the speech signal. This paper investigates the use of multiple forms of transformation, structured transforms (ST), to represent the complex non-speech variabilities in an adaptive training framework. Two forms of transformation are considered: cluster mean interpolation and constrained MLLR. Consequently, the canonical model here is a multi-cluster HMM. Both maximum likelihood (ML) and minimum phone error (MPE) re-estimation formulae for the canonical model are presented. This multi-cluster MPE training is also applicable to eigenvoice systems. Experiments comparing ST with standard adaptive training schemes were performed on a conversational telephone speech task. ST were found to significantly reduce the word error rate.

CiteSeerX