An Investigation into Speaker Informed DNN Front-end for LVCSR
Deep Neural Networks (DNNs) have become a standard method in many ASR tasks. Recently, there has been considerable interest in "informed training" of DNNs, where the DNN input is augmented with auxiliary codes such as i-vectors, speaker codes and speaker separation bottleneck (SSBN) features. This paper compares different speaker-informed DNN training methods on an LVCSR task. We discuss the mathematical equivalence between speaker-informed DNN training and "bias adaptation", which uses speaker-dependent biases, and give a detailed analysis of influential factors such as the dimension, discrimination and stability of the auxiliary codes. The analysis is supported by experiments on a meeting recognition task using a bottleneck feature based system. Results show that i-vector based adaptation is also effective in bottleneck feature based systems, not just in hybrid systems. However, all tested methods generalise poorly to unseen speakers. We introduce a system based on speaker classification followed by speaker adaptation of biases, which yields performance equivalent to an i-vector based system, with a 10.4% relative improvement over the baseline on seen speakers. The new approach can serve as a fast alternative, especially for short utterances.
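The equivalence between informed training and bias adaptation can be illustrated for a single affine input layer. If the input is the concatenation of acoustic features x and a per-speaker auxiliary code s, then W[x; s] + b = W_x x + (W_s s + b), and since s is constant for a speaker, the bracketed term acts as a speaker-dependent bias. The sketch below is a minimal illustration under that assumption; the function names and toy weights are hypothetical and not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product with plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(a: Vector, b: Vector) -> Vector:
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]


def informed_layer(w_x: Matrix, w_s: Matrix, b: Vector, x: Vector, s: Vector) -> Vector:
    """Informed input layer: W [x; s] + b, with the auxiliary code s appended to x."""
    w = [rx + rs for rx, rs in zip(w_x, w_s)]  # concatenate weight columns for x and s
    return vadd(matvec(w, x + s), b)          # x + s concatenates the two input lists


def speaker_bias(w_s: Matrix, b: Vector, s: Vector) -> Vector:
    """Speaker-dependent bias W_s s + b, computable once per speaker."""
    return vadd(matvec(w_s, s), b)


def bias_adapted_layer(w_x: Matrix, spk_bias: Vector, x: Vector) -> Vector:
    """Bias-adapted input layer: W_x x plus a per-speaker bias."""
    return vadd(matvec(w_x, x), spk_bias)


# Hypothetical toy values (2 hidden units, 2-dim features, 1-dim auxiliary code).
W_X = [[1.0, 2.0], [0.5, -1.0]]
W_S = [[3.0], [-2.0]]
B = [0.25, 0.5]
X = [1.0, 1.0]
S = [0.5]

informed = informed_layer(W_X, W_S, B, X, S)
adapted = bias_adapted_layer(W_X, speaker_bias(W_S, B, S), X)
```

Both `informed` and `adapted` evaluate to the same pre-activation for every frame of the speaker; the equivalence holds only for the affine part, before any nonlinearity is applied.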
Quality Measures for Speaker Verification with Short Utterances
The performance of automatic speaker verification (ASV) systems degrades as the amount of speech used for enrollment and verification is reduced. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate with short utterances. This work attempts to incorporate supplementary information into the system combination process, namely the quality of the estimated model parameters. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics computed during i-vector extraction, and we use them as side information for combining ASV systems based on the Gaussian mixture model-universal background model (GMM-UBM) and on i-vectors. The proposed methods demonstrate considerable improvement in speaker recognition performance on NIST SRE corpora, especially in short-duration conditions. We also observed improvement over existing systems that use various duration-based quality measures.
Comment: Accepted for publication in Digital Signal Processing: A Review Journal
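The abstract does not give the formula for the proposed quality measures, so the sketch below only shows how the zero-order sufficient statistics N_c = Σ_t γ_t(c) (the summed posterior occupancy of each UBM component c over the frames t) are accumulated against a diagonal-covariance GMM-UBM. The `occupancy_coverage` score is a hypothetical stand-in, assumed purely for illustration: it reports the fraction of components with enough occupancy, reflecting the intuition that short utterances leave many components poorly estimated. It is not the authors' measure.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def zero_order_stats(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> List[float]:
    """Accumulate N_c = sum_t gamma_t(c), the posterior occupancy of each UBM component."""
    n = [0.0] * len(weights)
    for x in frames:
        logp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
        mx = max(logp)  # subtract the max for a numerically stable softmax
        denom = sum(math.exp(lp - mx) for lp in logp)
        for c, lp in enumerate(logp):
            n[c] += math.exp(lp - mx) / denom
    return n


def occupancy_coverage(n: Sequence[float], threshold: float = 0.5) -> float:
    """Hypothetical quality score: fraction of components with occupancy >= threshold."""
    return sum(1 for nc in n if nc >= threshold) / len(n)


# Hypothetical 2-component, 1-dimensional UBM.
WEIGHTS = [0.5, 0.5]
MEANS = [[-2.0], [2.0]]
VARIANCES = [[1.0], [1.0]]

short_stats = zero_order_stats([[-2.0]], WEIGHTS, MEANS, VARIANCES)
longer_stats = zero_order_stats([[-2.0], [2.0]], WEIGHTS, MEANS, VARIANCES)
```

Because the per-frame posteriors sum to one, the zero-order statistics always sum to the number of frames; a one-frame utterance touches only one of the two components here, so its hypothetical coverage score is lower than that of the two-frame utterance.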