302 research outputs found
Joint Bayesian Gaussian discriminant analysis for speaker verification
State-of-the-art i-vector based speaker verification relies on variants of
Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis. We
are mainly motivated by the recent work of the joint Bayesian (JB) method,
which is originally proposed for discriminant analysis in face verification. We
apply JB to speaker verification and make three contributions beyond the
original JB. 1) In contrast to the EM iterations with approximated statistics
in the original JB, the EM iterations with exact statistics are employed and
give better performance. 2) We propose to do simultaneous diagonalization (SD)
of the within-class and between-class covariance matrices to achieve efficient
testing, which has broader application scope than the SVD-based efficient
testing method in the original JB. 3) We scrutinize similarities and
differences between various Gaussian PLDAs and JB, complementing the previous
analysis of comparing JB only with Prince-Elder PLDA. Extensive experiments are
conducted on NIST SRE10 core condition 5, empirically validating the
superiority of JB with faster convergence rate and 9-13% EER reduction compared
with state-of-the-art PLDA.Comment: accepted by ICASSP201
Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System
In this paper, we explore the encoding/pooling layer and loss function in the
end-to-end speaker and language recognition system. First, a unified and
interpretable end-to-end system for both speaker and language recognition is
developed. It accepts variable-length input and produces an utterance level
result. In the end-to-end system, the encoding layer plays a role in
aggregating the variable-length input sequence into an utterance level
representation. Besides the basic temporal average pooling, we introduce a
self-attentive pooling layer and a learnable dictionary encoding layer to get
the utterance level representation. In terms of loss function for open-set
speaker verification, to get more discriminative speaker embedding, center loss
and angular softmax loss is introduced in the end-to-end system. Experimental
results on Voxceleb and NIST LRE 07 datasets show that the performance of
end-to-end learning system could be significantly improved by the proposed
encoding layer and loss function.Comment: Accepted for Speaker Odyssey 201
- …