Search CORE

834 research outputs found

Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

Author: lee
martin
martin
park
stafylakis
young
zhong
Publication date: 02/09/2018

Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from deep neural networks (DNNs) and from the conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence-based scoring in rejecting speech with incorrect pass-phrases. Comment: accepted by APSIPA ASC 201
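
The abstract above does not spell out how the KL divergence-based scoring is computed, so the following is only a minimal sketch of the general idea: compare an expected discrete distribution (for example, per-digit occupancy implied by the prompt) with the one observed from the alignment, and reject when they diverge too much. The function names, the choice of distributions, and the threshold are all hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumption made here to avoid division by zero).
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def accept_content(expected, observed, threshold):
    """Accept the utterance when the observed distribution stays
    within `threshold` nats of the expected one (hypothetical rule)."""
    return kl_divergence(expected, observed) <= threshold
```

With a hypothetical threshold of 0.1 nats, `accept_content([0.5, 0.5], [0.5, 0.5], 0.1)` accepts, while `accept_content([0.5, 0.5], [0.9, 0.1], 0.1)` rejects because the divergence is roughly 0.51 nats.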

arXiv.org e-Print Archive

Crossref

Speaker recognition using frequency filtered spectral energies

Author: Hernando Pericás Francisco Javier
Publication venue: FONDAZIONE UGO BORDONI
Publication date: 01/01/1999

The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first- or second-order FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discrimination between speakers. Even better speaker identification results than with the mel-cepstrum have been obtained on the TIMIT database, especially when white noise was added. In addition, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, has been explored for speaker verification. The combination of hybrid spectral analysis and frequency filtering, which had been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Phoneme and Sub-Phoneme T-Normalization for Text-Dependent Speaker Recognition

Author: Esteve-Elizalde Cristina
Fernández Pozo Rubén
Gonzalez-Rodriguez Joaquin
Hernández Gómez Luis Alfonso
Torre Toledano Doroteo
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2008

Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in the context of text-independent speaker recognition. It is less frequently applied, however, to text-dependent or text-prompted speaker recognition, mainly because its improvement in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm for text-dependent systems. It consists of applying score T-Normalization at the phoneme or sub-phoneme level instead of at the sentence level. Experiments on the YOHO corpus show that, while standard sentence-level T-Norm does not improve the equal error rate (EER), phoneme and sub-phoneme level T-Norm produce relative EER reductions of 18.9% and 20.1%, respectively, on a state-of-the-art HMM-based text-dependent speaker recognition system. Results are even better for working points with low false acceptance rates.
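
The difference between sentence-level and phoneme-level T-Norm can be sketched as follows. T-Norm itself (subtract the cohort mean, divide by the cohort standard deviation) is standard; how the per-phoneme normalized scores are combined is not stated in the abstract, so the simple average used here, the function names, and the data layout are assumptions.

```python
from statistics import mean, pstdev


def t_norm(score, cohort_scores):
    """Normalize a score by the mean and standard deviation of the
    scores that cohort (impostor) models give to the same test data."""
    return (score - mean(cohort_scores)) / pstdev(cohort_scores)


def sentence_level_t_norm(target, cohort):
    """target: per-phoneme scores from the claimed speaker's model.
    cohort: one list of per-phoneme scores per cohort model.
    Scores are averaged over the sentence first, then normalized."""
    return t_norm(mean(target), [mean(c) for c in cohort])


def phoneme_level_t_norm(target, cohort):
    """Normalize each phoneme score against the cohort scores for the
    same phoneme, then average (the averaging is an assumption)."""
    return mean(
        t_norm(s, [c[p] for c in cohort])
        for p, s in enumerate(target)
    )
```

For example, with `target = [2.0, 4.0]` and `cohort = [[0.0, 1.0], [2.0, 5.0]]`, the two variants give different normalized scores because the cohort spread differs from one phoneme to the next.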

CiteSeerX

Archivo Digital UPM

Speaker recognition by means of restricted Boltzmann machine adaptation

Author: Ghahabi Esfahani Omid
Hernando Pericás Francisco Javier
Safari Pooyan
Publication venue: Servicio de Publicaciones de la Universidad Autonoma de Madrid
Publication date: 01/01/2016

Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising universal model training and model adaptation. Taking advantage of the RBM unsupervised learning algorithm, a global model is trained on all available background data. This general speaker-independent model, referred to as URBM, is then adapted to the data of a specific speaker to build a speaker-dependent model. To show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has also been used to produce a vector-based representation for speakers. This vector-based representation, similar to the i-vector, can then be used for speaker recognition with either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database. Peer Reviewed. Postprint (author's final draft)
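
Cosine scoring, mentioned above as one way to compare vector-based speaker representations, amounts to the cosine of the angle between an enrollment vector and a test vector. The sketch below is a generic version; the function name and the toy vectors in the usage note are illustrative and do not come from the paper.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between two speaker vectors of equal length.

    Returns a value in [-1, 1]; higher means the vectors point in
    more similar directions. Both vectors must be non-zero.
    """
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_enroll = math.sqrt(sum(a * a for a in enroll))
    norm_test = math.sqrt(sum(b * b for b in test))
    return dot / (norm_enroll * norm_test)
```

A verification decision would then compare `cosine_score(enroll, test)` against a threshold tuned on development data; identical directions score close to 1 and orthogonal vectors score 0.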

UPCommons. Portal del coneixement obert de la UPC

Evaluation of preprocessors for neural network speaker verification

Author: Salleh Sheikh-Hussain
Publication venue: The University of Edinburgh
Publication date: 01/01/1997

Edinburgh Research Archive