
Improved phonetic and lexical speaker recognition through MAP adaptation

By Robert J. Vogt, Brendan J. Baker and Sridha Sridharan


High-level features such as phone and word n-grams have been shown to be effective for speaker recognition, particularly when used alongside traditional acoustic speaker recognition techniques. The applicability of these high-level recognition systems is impeded by the large amount of training data needed to build robust and stable speaker models. This paper describes an extension to an existing phone n-gram based speaker recognition technique, whereby MAP adaptation is used in the speaker model training process. Results obtained for the NIST 2003 Speaker Recognition Extended Data Task indicate that a significant improvement in performance can be gained through the use of this model estimation technique. In our tests, we were able to improve performance over the baseline system while at the same time halving the training data requirement. Further experimentation using MAP adaptation on word n-gram models also showed improvement over baseline results, suggesting that the technique could be applied to other multinomial distribution feature sets.
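To illustrate the general idea of MAP-adapting an n-gram (multinomial) speaker model from a background model, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a common textbook form of MAP estimation in which the background model acts as a Dirichlet-style prior weighted by a relevance factor `tau`, and it scores with an average log-likelihood ratio. The function names, the value of `tau`, and the toy phone sequences are all hypothetical.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(counts):
    """Maximum-likelihood multinomial estimate from raw counts."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def map_adapt(speaker_counts, background_probs, tau=4.0):
    """Interpolate speaker counts with the background model.

    Assumed form: p(g) = (c(g) + tau * p_bg(g)) / (N + tau), where N is the
    speaker's total count over n-grams the background model knows about.
    With little speaker data the estimate stays close to the background;
    with more data it moves toward the speaker's own frequencies.
    """
    total = sum(speaker_counts.get(g, 0) for g in background_probs)
    return {
        g: (speaker_counts.get(g, 0) + tau * p) / (total + tau)
        for g, p in background_probs.items()
    }


def llr_score(test_counts, speaker_probs, background_probs):
    """Average log-likelihood ratio of speaker model vs. background model."""
    score, total = 0.0, 0
    for g, c in test_counts.items():
        if g in background_probs:  # skip n-grams unseen in the background
            score += c * math.log(speaker_probs[g] / background_probs[g])
            total += c
    return score / total if total else 0.0


# Toy phone sequences (hypothetical data, for illustration only).
background = relative_frequencies(Counter(ngrams("a b a b a c a b".split(), 2)))
speaker_counts = Counter(ngrams("a c a c a".split(), 2))
speaker_model = map_adapt(speaker_counts, background, tau=4.0)
test_counts = Counter(ngrams("a c a c".split(), 2))
score = llr_score(test_counts, speaker_model, background)
```

In this sketch, n-grams the speaker never produced still receive non-zero probability from the background prior, which is one plausible reason such adaptation could reduce the amount of speaker training data required.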

Topics: 080107 Natural Language Processing
Publisher: International Speech Communication Association (ISCA)
Year: 2004
