
    A Monte-Carlo Method For Score Normalization in Automatic Speaker Verification Using Kullback-Leibler Distances

    In this paper, we propose a new score normalization technique for Automatic Speaker Verification (ASV): the D-Norm. The main advantage of this score normalization is that it needs neither additional speech data nor an external speaker population, as opposed to the state-of-the-art approaches. The D-Norm is based on the use of Kullback-Leibler (KL) distances in an ASV context. In a first step, we estimate the KL distances with a Monte-Carlo method and show experimentally that they are correlated with the verification scores. In a second step, we use this correlation to implement a score normalization procedure, the D-Norm. We analyse its performance and compare it to that of a conventional normalization, the Z-Norm. The results show that the performance of the D-Norm is comparable to that of the Z-Norm. We conclude with a discussion of the results and the applications of this work.
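
    To make the first step concrete: the KL divergence between two GMM speaker models has no closed form, so it can be approximated by sampling from one model and averaging log-density ratios. The sketch below illustrates that generic Monte-Carlo estimator; the names and model layout are our own assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_logpdf(x, weights, means, covs):
    """Log-density of a GMM evaluated at the points x (shape (n, d))."""
    comp = np.stack([multivariate_normal.logpdf(x, m, c)
                     for m, c in zip(means, covs)])          # (K, n)
    return np.logaddexp.reduce(np.log(weights)[:, None] + comp, axis=0)

def gmm_sample(n, weights, means, covs, rng):
    """Draw n points from a GMM: pick a component, then sample from it."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[i], covs[i]) for i in idx])

def kl_monte_carlo(p, q, n=10_000, seed=0):
    """KL(p || q) ~ mean of log p(x) - log q(x) over samples x ~ p."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *p, rng)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))
```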

    Security in Voice Authentication

    We evaluate the security of human voice password databases from an information-theoretic point of view. More specifically, we provide a theoretical estimate of the amount of entropy in human voice when processed using conventional GMM-UBM technologies with MFCCs as the acoustic features. The theoretical estimate gives rise to a methodology for analyzing the security level of a corpus of human voice. That is, given a database containing speech signals, we provide a method for estimating the relative entropy (Kullback-Leibler divergence) of the database, thereby establishing the security level of the speaker verification system. To demonstrate this, we analyze the YOHO database, a corpus of voice samples collected from 138 speakers, and show that the amount of entropy extracted is less than 14 bits. We also present a practical attack that impersonates the voice of any speaker within the corpus with 98% success probability using as few as 9 trials. The attack still succeeds at a rate of 62.50% if only 4 attempts are permitted. Further, based on the same attack rationale, we mount an attack on the ALIZE speaker verification system. We show through experimentation that the attacker can impersonate any user in a database of 69 people with about a 25% success rate with only 5 trials; the success rate exceeds 50% when the allowed authentication attempts are increased to 20. Finally, when the practical attack is cast in terms of an entropy metric, we find that the theoretical entropy estimate almost perfectly predicts the success rate of the practical attack, giving further credence to the theoretical model and the associated entropy estimation technique.
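
    A toy model helps relate an attack's empirical success rate to an entropy-style metric. Assuming a uniform-guessing adversary (a deliberate simplification; the paper's own estimator is based on relative entropy), H bits of entropy give a per-trial success probability of 2^-H, so after k trials P(success) = 1 - (1 - 2^-H)^k. The sketch inverts this to get the "effective entropy" implied by a measured success rate.

```python
import math

def success_prob(entropy_bits, trials):
    """P(success) after `trials` guesses against `entropy_bits` bits."""
    return 1.0 - (1.0 - 2.0 ** -entropy_bits) ** trials

def effective_entropy(success_rate, trials):
    """Invert the model: bits of entropy implied by an observed success rate."""
    per_trial = 1.0 - (1.0 - success_rate) ** (1.0 / trials)
    return -math.log2(per_trial)

# Under this simple model, 98% success within 9 trials implies only about
# 1.5 bits of effective entropy -- well below the 14-bit upper bound the
# paper reports for YOHO (its own entropy metric differs from this toy model).
print(round(effective_entropy(0.98, 9), 2))  # ~1.5
```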

    An Investigation of F-ratio Client-Dependent Normalisation on Biometric Authentication Tasks

    This study investigates a new client-dependent normalisation to improve biometric authentication systems. There exist many client-dependent score normalisation techniques applied to speaker authentication, such as Z-Norm, D-Norm and T-Norm. Such normalisation is intended to adjust for the variation across different client models. We propose "F-ratio" normalisation, or F-Norm, applied to face and speaker authentication systems. This normalisation requires only that as few as two client-dependent accesses are available (the more the better). Unlike previous normalisation techniques, F-Norm considers the client and impostor distributions simultaneously. We show that the F-ratio is a natural choice because it is directly related to the Equal Error Rate (EER). It has the effect of centering the client and impostor distributions such that a global threshold can easily be found. Another difference is that F-Norm "interpolates" between client-independent and client-dependent information by introducing a mixture parameter. This parameter can be optimised to maximise the class dispersion (the degree of separability between client and impostor distributions), while the aforementioned normalisation techniques cannot. Unimodal experiments on the XM2VTS multimodal database show that this normalisation is advantageous over Z-Norm, client-dependent threshold normalisation, or no normalisation.
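
    The F-ratio/EER link the abstract leans on is simple to state under the Gaussian score assumption used in this line of work: with F = (mu_C - mu_I) / (sigma_C + sigma_I), the equal error rate is 0.5 - 0.5·erf(F/√2). A minimal sketch, with illustrative names:

```python
import math

def f_ratio(mu_client, sigma_client, mu_impostor, sigma_impostor):
    """F-ratio of the client vs. impostor score distributions."""
    return (mu_client - mu_impostor) / (sigma_client + sigma_impostor)

def eer_from_f_ratio(f):
    """Under Gaussian class-conditional scores, EER = 0.5 - 0.5*erf(f/sqrt(2))."""
    return 0.5 - 0.5 * math.erf(f / math.sqrt(2.0))

# Better-separated distributions -> larger F-ratio -> lower EER.
print(eer_from_f_ratio(f_ratio(1.0, 0.3, 0.0, 0.3)))  # ~0.048
```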

    A Novel Approach to Combining Client-Dependent and Confidence Information in Multimodal Biometrics

    The issues of fusion with client-dependent and confidence information have been well studied separately in biometric authentication. In this study, we propose to take advantage of both sources of information in a discriminative framework. Initially, each source of information is processed on a per-expert basis (plus on a per-client basis for the first source and on a per-example basis for the second). Then, both sources of information are combined using a second-level classifier across the different experts. Although the formulation of such a two-step solution is not new, the novelty lies in the way the sources of prior knowledge are incorporated prior to fusion with the second-level classifier. Because these two sources of information are of very different natures, one often needs to devise special algorithms to combine them. Our framework, which we call "Prior Knowledge Incorporation", has the advantage of using standard machine learning algorithms. Based on 10 × 32 = 320 intramodal and multimodal fusion experiments carried out on the publicly available XM2VTS score-level fusion benchmark database, we find that the generalisation performance of combining both information sources improves over using either or neither of them, thus achieving a new state-of-the-art performance on this database.
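
    A minimal sketch of the "Prior Knowledge Incorporation" idea as we read it: client-dependent information and confidence values are simply appended to the raw expert scores as extra features, so any off-the-shelf second-level classifier can fuse them. The feature layout and the choice of logistic regression below are our assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_features(raw_scores, norm_scores, confidences):
    """Each argument: array of shape (n_accesses, n_experts).
    Concatenating them lets a standard learner see raw scores,
    client-normalised scores, and per-example confidences at once."""
    return np.hstack([raw_scores, norm_scores, confidences])

# Hypothetical usage, with y in {0: impostor, 1: client}:
#   X_train = build_features(raw, client_normed, conf)
#   fusion = LogisticRegression().fit(X_train, y_train)
#   accept = fusion.predict_proba(X_test)[:, 1] > threshold
```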

    Improving Single Modal and Multimodal Biometric Authentication Using F-ratio Client-Dependent Normalisation

    This study investigates a new client-dependent normalisation to improve a single biometric authentication system, as well as its effects on fusion. There exist two families of client-dependent normalisation techniques, often applied to speaker authentication: client-dependent score normalisation and client-dependent threshold normalisation. Examples of the former family are Z-Norm, D-Norm and T-Norm; there is also a vast literature on the latter family. Both families are surveyed in this study. Furthermore, we provide a link between the two families and show that one is a dual representation of the other. These techniques are intended to adjust for the variation across different client models. We propose "F-ratio" normalisation, or F-Norm, applied to face and speaker authentication systems in two contexts: single-modal authentication and fusion of multimodal biometrics. This normalisation requires only that as few as two client-dependent accesses are available (the more the better). Unlike previous normalisation techniques, F-Norm considers the client and impostor distributions simultaneously. We show that the F-ratio is a natural choice because it is directly related to the Equal Error Rate (EER). It has the effect of centering the client and impostor distributions such that a global threshold can easily be found. Another difference is that F-Norm "interpolates" between client-independent and client-dependent information by introducing two mixture parameters. These parameters can be optimised to maximise the class dispersion (the degree of separability between client and impostor distributions), while the aforementioned normalisation techniques cannot. The results of 13 single-modal experiments and 32 fusion experiments carried out on the XM2VTS multimodal database show that, in both contexts, F-Norm is advantageous over Z-Norm, client-dependent score normalisation with EER, and no normalisation.
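
    A sketch of the interpolation idea with two mixture parameters, one per class; the study's exact parameterisation may differ. Each class mean used for normalisation mixes the client-dependent estimate for client j with its client-independent (global) counterpart:

```python
def f_norm_two_param(score, mu_cli_j, mu_cli, mu_imp_j, mu_imp,
                     gamma_cli=0.5, gamma_imp=0.5):
    """Illustrative F-Norm with one mixture parameter per class.
    gamma = 1 uses purely client-dependent statistics; gamma = 0 falls
    back to the global (client-independent) ones, which helps when a
    client has contributed very few accesses."""
    mu_c = gamma_cli * mu_cli_j + (1.0 - gamma_cli) * mu_cli   # client mean
    mu_i = gamma_imp * mu_imp_j + (1.0 - gamma_imp) * mu_imp   # impostor mean
    # After normalisation the impostor mean maps to 0 and the client mean
    # to 1, so a single global threshold can serve every client.
    return (score - mu_i) / (mu_c - mu_i)
```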

    Unconstrained Face Recognition

    Although face recognition has been actively studied over the past decade, state-of-the-art recognition systems yield satisfactory performance only under controlled scenarios, and recognition accuracy degrades significantly when confronted with unconstrained situations due to variations in illumination, pose, etc. In this dissertation, we propose novel approaches that are able to recognize human faces under unconstrained situations. Part I presents algorithms for face recognition under illumination and pose variations. For face recognition across illuminations, we present a generalized photometric stereo approach that models all face appearances belonging to all humans under all lighting conditions. Using a linear generalization, we achieve a factorization of the observation matrix consisting of face appearances of different individuals, each under a different illumination. We resolve ambiguities in the factorization using surface integrability and symmetry constraints. In addition, an illumination-invariant identity descriptor is provided to perform face recognition across illuminations. We further extend the generalized photometric stereo approach to an illuminating light field approach, which is able to recognize faces under both pose and illumination variations. Face appearance lies on a high-dimensional nonlinear manifold. In Part II, we introduce machine learning approaches based on reproducing kernel Hilbert spaces (RKHS) to capture higher-order statistical characteristics of the nonlinear appearance manifold. In particular, we analyze principal components of the RKHS in a probabilistic manner and compute distances such as the Chernoff distance and the Kullback-Leibler divergence between two Gaussian densities in the RKHS. Part III is on face tracking and recognition from video. We first present an enhanced tracking algorithm that models online appearance changes in a video sequence using a mixture model and produces good tracking results in various challenging scenarios. For video-based face recognition, while conventional approaches treat tracking and recognition separately, we present a simultaneous tracking-and-recognition approach. This simultaneous approach, solved using the sequential importance sampling algorithm, improves accuracy in both tracking and recognition. Finally, we propose a unifying framework called probabilistic identity characterization that is able to perform face recognition under registration, illumination, and pose variations, from a still image, a group of still images, or a video sequence.
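
    For reference, the KL divergence between two Gaussian densities, which Part II computes between probabilistic principal components in the RKHS, has the familiar closed form sketched below (shown here in plain input space for illustration):

```python
import numpy as np

def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)) in d dimensions:
    0.5 * [tr(S1^-1 S0) + (mu1-mu0)^T S1^-1 (mu1-mu0) - d + ln(|S1|/|S0|)]."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```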