12 research outputs found
Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification
Frame alignments can be computed by different methods in GMM-based speaker
verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are
able to compare the performance using alignments extracted from the deep neural
networks (DNN) and the conventional hidden Markov model (HMM) in digit-prompted
speaker verification. Based on the different characteristics of these two
alignments, we present a novel content verification method to improve the
system security without much computational overhead. Our experiments on the
RSR2015 Part-3 digit-prompted task show that, the DNN based alignment performs
on par with the HMM alignment. The results also demonstrate the effectiveness
of the proposed Kullback-Leibler (KL) divergence based scoring to reject speech
with incorrect pass-phrases.Comment: accepted by APSIPA ASC 201
Supervector extraction for encoding speaker and phrase information with neural networks for text-dependent speaker verification
In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance from the global average pooling of the temporal dimension. Our system replaces this reduction mechanism by a phonetic phrase alignment model to keep the temporal structure of each phrase since the phonetic information is relevant in the verification task. Moreover, we can apply a convolutional neural network as front-end, and, thanks to the alignment process being differentiable, we can train the network to produce a supervector for each utterance that will be discriminative to the speaker and the phrase simultaneously. This choice has the advantage that the supervector encodes the phrase and speaker information providing good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model using alignment to produce supervectors was evaluated on the RSR2015-Part I database, providing competitive results compared to similar size networks that make use of the global average pooling to extract embeddings. Furthermore, we also evaluated this proposal on the RSR2015-Part II. To our knowledge, this system achieves the best published results obtained on this second part
UIAI System for Short-Duration Speaker Verification Challenge 2020
International audienceIn this work, we present the system description of the UIAI entry for the short-duration speaker verification (SdSV) challenge 2020. Our focus is on Task 1 dedicated to text-dependent speaker verification. We investigate different feature extraction and modeling approaches for automatic speaker verification (ASV) and utterance verification (UV). We have also studied different fusion strategies for combining UV and ASV modules. Our primary submission to the challenge is the fusion of seven subsystems which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set. The single system consisting of a pass-phrase identification based model with phone-discriminative bottleneck features gives a normalized minDCF of 0.118 and achieves 19% relative improvement over the state-of-the-art challenge baseline