Improved i-Vector Representation for Speaker Diarization

G Hinton; I McLoughlin; Ian McLoughlin; Kui Wu; N Dehak; P Kenny; S Tranter; Y Song; Yan Song; Yan Xu



Publication date: 1 January 2015
Publisher: Springer Science and Business Media LLC

Abstract

This paper proposes using a pre-trained deep neural network (DNN) to enhance the i-vector representation used for speaker diarization. In effect, we replace the Gaussian Mixture Model (GMM) typically used to train a Universal Background Model (UBM) with a DNN that has been trained on a different large-scale dataset. To train the T-matrix, we use a supervised UBM instead of a traditional unsupervised UBM derived from a single feature type: the DNN calculates the posterior information from filterbank input features, and MFCC features are then used to train the UBM. Next, we jointly use DNN and MFCC features to calculate the zeroth- and first-order Baum-Welch statistics for training an extractor from which we obtain the i-vector. The system is shown to achieve a significant improvement on the NIST 2008 speaker recognition evaluation (SRE) telephone data task compared to state-of-the-art approaches.
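
As a rough illustration of the statistics mentioned in the abstract, the sketch below accumulates zeroth- and first-order Baum-Welch statistics from per-frame posteriors and acoustic features. It assumes the standard definitions (zeroth order: the sum of posteriors per class; first order: the posterior-weighted sum of feature vectors per class) and that the posteriors come from a DNN while the features are MFCCs. The function name `baum_welch_stats` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def baum_welch_stats(
    posteriors: List[List[float]],
    features: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics over one segment.

    posteriors: T frames x C classes (assumed here to be DNN outputs).
    features:   T frames x D dimensions (assumed here to be MFCCs).
    Returns (zeroth, first), where zeroth has length C and first is C x D.
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("posteriors and features must be non-empty and frame-aligned")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            # Zeroth order: soft count of frames assigned to class c.
            zeroth[c] += gamma
            # First order: posterior-weighted sum of the feature vectors.
            for d, x in enumerate(x_t):
                first[c][d] += gamma * x

    return zeroth, first


# Toy example: 3 frames, 2 classes, 2-dimensional features (made-up values).
posteriors = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
features = [[2.0, 4.0], [1.0, 1.0], [3.0, 5.0]]
zeroth, first = baum_welch_stats(posteriors, features)
print(zeroth)  # [1.5, 1.5]
print(first)   # [[2.0, 3.0], [4.0, 7.0]]
```

In a conventional system the posteriors would come from the GMM-UBM components; in the approach described here, that role is taken by the DNN outputs, while the accumulation itself is unchanged.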