Search CORE

4,997 research outputs found

A Unified Framework for Score Normalization Techniques Applied to Text Independent Speaker Verification

Author: Bengio Samy
Mariéthoz Johnny
Publication venue: IDIAP
Publication date: 10/03/2006
Field of study

The purpose of this paper is to unify several of the state-of-the-art score normalization techniques applied to text-independent speaker verification systems. We propose a new framework for this purpose. The two well-known Z- and T-normalization techniques can be easily interpreted in this framework as different ways to estimate score distributions. This is useful as it helps to understand the various assumptions behind these well-known score normalization techniques, and opens the door for yet more complex solutions. Finally, some experiments on the Switchboard database are performed in order to illustrate the validity of the new proposed framework

Infoscience - École polytechnique fédérale de Lausanne

A Generative Model for Score Normalization in Speaker Recognition

Author: Brummer Niko
Swart Albert
Publication venue
Publication date: 28/09/2017
Field of study

We propose a theoretical framework for thinking about score normalization, which confirms that normalization is not needed under (admittedly fragile) ideal conditions. If, however, these conditions are not met, e.g. under data-set shift between training and runtime, our theory reveals dependencies between scores that could be exploited by strategies such as score normalization. Indeed, it has been demonstrated over and over experimentally, that various ad-hoc score normalization recipes do work. We present a first attempt at using probability theory to design a generative score-space normalization model which gives similar improvements to ZT-norm on the text-dependent RSR 2015 database

arXiv.org e-Print Archive

Crossref

Speaker verification using sequence discriminant support vector machines

Author: Renals S.
Wan V.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resultant SVMs have a very high dimensionality since it is related to the number of parameters in the underlying generative model. To address problems that arise in the resultant optimization we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here reduces the relative error rates by 34% compared to a GMM likelihood ratio system

Crossref

Edinburgh Research Archive

Edinburgh Research Explorer

White Rose Research Online

Speaker recognition by means of restricted Boltzmann machine adaptation

Author: Ghahabi Esfahani Omid
Hernando Pericás Francisco Javier
Safari Pooyan
Publication venue: 'Servicio de Publicaciones de la Universidad Autonoma de Madrid'
Publication date: 01/01/2016
Field of study

Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising a universal model training and model adaptation. Taking advantage of RBM unsupervised learning algorithm, a global model is trained based on all available background data. This general speaker-independent model, referred to as URBM, is further adapted to the data of a specific speaker to build speaker-dependent model. In order to show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has been also utilized to produce a vector-based representation for speakers. This vector-based representation, similar to i-vector, can be further used for speaker recognition using either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Attentive Statistics Pooling for Deep Speaker Embedding

Author: Koshinaka Takafumi
Okabe Koji
Shinoda Koichi
Publication venue: 'International Speech Communication Association'
Publication date: 24/02/2019
Field of study

This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates not only weighted means but also weighted standard deviations. In this way, it can capture long-term variations in speaker characteristics more effectively. An evaluation on the NIST SRE 2012 and the VoxCeleb data sets shows that it reduces equal error rates (EERs) from the conventional method by 7.5% and 8.1%, respectively.Comment: Proc. Interspeech 2018, pp2252--2256. arXiv admin note: text overlap with arXiv:1809.0931

arXiv.org e-Print Archive

Crossref