Rapid Unsupervised Speaker Adaptation Based on Multi-Template HMM Sufficient Statistics in Noisy Environments

By Randy Gomez, Akinobu Lee, Hiroshi Saruwatari and Kiyohiro Shikano

Abstract

INTERSPEECH 2005: the 9th European Conference on Speech Communication and Technology, September 4-8, 2005, Lisbon, Portugal.

This paper describes a multi-template unsupervised speaker adaptation method based on HMM-Sufficient Statistics. Multiple class-dependent models based on gender and age are used to improve adaptation performance while keeping adaptation time within a few seconds, using just one arbitrary utterance. Adaptation begins by estimating the speaker's class from the N-best neighbor speakers, which are found with Gaussian Mixture Models (GMMs) during speaker selection. The corresponding template model is adopted as the base model, and the adapted model is rapidly constructed from the selected HMM-Sufficient Statistics. Experiments are performed in noisy conditions with office, crowd, booth, and car noise at 20 dB SNR. The proposed multi-template method achieved a word correct rate of 89.5%, compared with 88.0% for the conventional single-template method and a baseline of 85.7% without adaptation. Results using Vocal Tract Length Normalization (VTLN) and supervised Maximum Likelihood Linear Regression (MLLR) are also compared.
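
The adaptation flow summarized in the abstract (score one utterance against per-speaker GMMs, keep the N-best neighbor speakers, infer the speaker class from them, then build model parameters from the stored statistics of the selected speakers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the majority vote used for class estimation, the diagonal-covariance GMMs, and the data layout (`occupancy` and `first_order` arrays per speaker) are all assumptions made for illustration. Only the Gaussian means are re-estimated here; variances, mixture weights, and the role of the class-dependent template as base model are left out.

```python
import numpy as np
from collections import Counter


def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames (T, D) under a diagonal GMM.

    weights: (M,), means: (M, D), variances: (M, D). Hypothetical layout.
    """
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_comp = log_comp + np.log(weights)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.mean())


def select_n_best(frames, speaker_gmms, n_best):
    """Return the n_best training speakers whose GMMs score the utterance highest."""
    scores = {spk: diag_gmm_loglik(frames, *gmm) for spk, gmm in speaker_gmms.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_best]


def estimate_class(selected, speaker_class):
    """Assumed rule: majority vote over the classes of the selected speakers."""
    return Counter(speaker_class[s] for s in selected).most_common(1)[0][0]


def means_from_stats(selected, occupancy, first_order):
    """Pool stored statistics of the selected speakers and re-estimate means.

    occupancy[spk]: (G,) state-occupancy counts per Gaussian.
    first_order[spk]: (G, D) occupancy-weighted sums of observations.
    """
    total_occ = sum(occupancy[s] for s in selected)
    total_first = sum(first_order[s] for s in selected)
    return total_first / total_occ[:, None]
```

Because the per-speaker statistics are computed offline, the online work in this sketch is limited to GMM scoring and a few array sums, which is consistent with the abstract's claim that adaptation takes only a few seconds.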

Year: 2005
OAI identifier: oai:library.naist.jp:10061/8135
