Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

Author: lee
martin
martin
park
stafylakis
young
zhong
Publication date: 02/09/2018

Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from deep neural networks (DNN) and from the conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence-based scoring in rejecting speech with incorrect pass-phrases. Comment: accepted by APSIPA ASC 201

arXiv.org e-Print Archive

Crossref

Supervector extraction for encoding speaker and phrase information with neural networks for text-dependent speaker verification

Author: Lleida Eduardo
Miguel Antonio
Mingote Victoria
Ortega Alfonso
Publication venue: MDPI AG
Publication date: 01/01/2019

In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance by global average pooling over the temporal dimension. Our system replaces this reduction mechanism with a phonetic phrase alignment model that keeps the temporal structure of each phrase, since the phonetic information is relevant to the verification task. Moreover, we can apply a convolutional neural network as a front-end and, because the alignment process is differentiable, train the network to produce a supervector for each utterance that is discriminative with respect to the speaker and the phrase simultaneously. This choice has the advantage that the supervector encodes both phrase and speaker information, providing good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model using alignment to produce supervectors was evaluated on the RSR2015-Part I database, providing competitive results compared to similar-size networks that use global average pooling to extract embeddings. We also evaluated this proposal on RSR2015-Part II. To our knowledge, this system achieves the best published results obtained on this second part.
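The contrast the abstract draws between global average pooling and an alignment-based supervector can be illustrated with a small sketch. This is not the authors' model: it assumes a hard, precomputed frame-to-state alignment (the paper describes a differentiable one), and the function names `global_average_pooling` and `alignment_supervector` and the toy data are hypothetical.

```python
import math

def global_average_pooling(frames):
    """Average all frames over time; the frame order is discarded."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def alignment_supervector(frames, states, num_states):
    """Average the frames assigned to each state, then concatenate the means.

    Assumption: `states` is a hard alignment giving one state index per frame.
    """
    dim = len(frames[0])
    supervector = []
    for s in range(num_states):
        assigned = [f for f, st in zip(frames, states) if st == s]
        if assigned:
            mean = [sum(f[d] for f in assigned) / len(assigned) for d in range(dim)]
        else:
            mean = [0.0] * dim  # no frame was aligned to this state
        supervector.extend(mean)
    return supervector

def cosine_similarity(a, b):
    """A basic similarity metric between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy data: the same frames in two different temporal orders.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
reordered = list(reversed(frames))
states = [0, 0, 1, 1]  # hypothetical alignment of positions to phrase states

pooled_similarity = cosine_similarity(
    global_average_pooling(frames), global_average_pooling(reordered))
aligned_similarity = cosine_similarity(
    alignment_supervector(frames, states, 2),
    alignment_supervector(reordered, states, 2))
```

In this toy case the pooled vectors are identical for both orderings, while the supervectors are orthogonal, which is one way to see why keeping the temporal structure helps distinguish phrases.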

Repositorio Universidad de Zaragoza

UIAI System for Short-Duration Speaker Verification Challenge 2020

Author: Kinnunen Tomi
Kumar Sarkar Achintya
Liu Xuechen
Sahidullah Md
Serizel Romain
Tan Zheng-Hua
Vestman Ville
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 26/07/2020

In this work, we present the system description of the UIAI entry for the short-duration speaker verification (SdSV) challenge 2020. Our focus is on Task 1, which is dedicated to text-dependent speaker verification. We investigate different feature extraction and modeling approaches for automatic speaker verification (ASV) and utterance verification (UV). We have also studied different fusion strategies for combining UV and ASV modules. Our primary submission to the challenge is the fusion of seven subsystems, which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set. The single system consisting of a pass-phrase identification based model with phone-discriminative bottleneck features gives a normalized minDCF of 0.118 and achieves a 19% relative improvement over the state-of-the-art challenge baseline.
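For readers unfamiliar with the equal error rate reported above, the following is a minimal sketch of how an EER can be estimated from trial scores. It is not the challenge's official scoring tool: the function name `equal_error_rate`, the simple threshold sweep, and the toy scores are assumptions made for illustration only.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy scores (hypothetical): higher means "more likely the claimed speaker".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)
```

With these toy scores, a threshold of 0.6 rejects one of four target trials and accepts one of four non-target trials, so both error rates meet at 25%.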

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

VBN

HAL-Rennes 1