I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

The I4U consortium was established to facilitate joint entries to the NIST speaker recognition evaluations (SRE). The latest such joint submission was to SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series. The primary objective of this paper is to summarize the results and lessons learned from the twelve sub-systems, and their fusion, submitted to SRE'18. We also intend to present a shared view of the advances, progress, and major paradigm shifts that we have witnessed as SRE participants over the past decade, from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a shift in research challenge from channel compensation to domain adaptation.

arXiv.org e-Print Archive

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

Applying SVMs and weight-based factor analysis to unsupervised adaptation for speaker verification

Author: Bonastre Jean-Francois
Matrouf Driss
Mclaren Mitchell
Vogt Robert
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

This paper presents an extended study on the implementation of support vector machine (SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMM) was recently found to provide significant performance gains in unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated, and such trials were found to significantly aid performance.

Crossref

Queensland University of Technology ePrints Archive

Evolutive HMM for segmentation and indexing tasks (HMM évolutif pour les tâches de segmentation et d'indexation)

Author: BONASTRE Jean-Francois
IGOUNET Stéphane
MEIGNIER Sylvain
Publication venue: GRETSI, Groupe d’Etudes du Traitement du Signal et des Images
Publication date: 01/01/2001

This paper presents an HMM-based method for the task of blind speaker indexing. The method detects speakers and adds them one by one to an evolutive HMM (E-HMM). The proposed solution exploits all of the information (detected speakers) as soon as it becomes available. The proposed system was tested on the "N-segmentation" tasks during the NIST 2001 evaluation campaign.

I-Revues

Effect of Utterance Duration and Phonetic Content on Speaker Identification Using Second-Order Statistical Methods

Author: Ivan Magrin-Chagnolleau
Jean-Francois Bonastre

Second-order statistical methods show very good results for automatic speaker identification in controlled recording conditions [2]. These approaches are generally used on the entire speech material available. In this paper, we study the influence of the content of the test speech material on the performance of such methods, i.e. under a more analytical approach [3]. The goal is to investigate what kind of information is used by these methods, and where it is located in the speech signal. Liquids and glides together, vowels, and more particularly nasal vowels and nasal consonants, are found to be particularly speaker specific: test utterances of 1 second composed mostly of acoustic material from one of these classes provide better speaker identification results than phonetically balanced test utterances, even though the training is done, in both cases, with 15 seconds of phonetically balanced speech. Nevertheless, results with other phoneme classes are never dramatically poor. These results tend to show that the speaker-dependent information captured by long-term second-order statistics is consistently common to all phonetic classes, and that the homogeneity of the test material may improve the quality of the estimates.

CiteSeerX

Bayesian Approach based-Decision in Speaker Verification

Author: Corinne Fredouille
Jean-Francois Bonastre
Teva Merlin

Considering the Bayesian decision framework applied to speaker verification, this paper presents a new way of handling the troublesome anti-speaker model by redefining the hypotheses involved in the classical statistical hypothesis test. This new definition of the hypotheses is then implemented through a speaker-independent normalization technique, named the MAP approach. Besides supporting these new hypotheses, the MAP approach has the advantage of projecting likelihood scores into a probabilistic domain, and therefore of providing the decision threshold with bounded and meaningful values.
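
As a rough illustration of what projecting a verification score into a probabilistic domain can look like, the sketch below applies Bayes' rule to a raw score. It is not the MAP approach of the paper: the Gaussian score models, the function names `gaussian_pdf` and `target_posterior`, and all numeric values are assumptions chosen for illustration only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def target_posterior(score, target_stats, impostor_stats, target_prior=0.5):
    """Posterior probability that a score comes from a target trial.

    target_stats and impostor_stats are (mean, std) pairs, assumed to be
    estimated beforehand on development data (hypothetical values here).
    """
    p_target = gaussian_pdf(score, *target_stats) * target_prior
    p_impostor = gaussian_pdf(score, *impostor_stats) * (1.0 - target_prior)
    return p_target / (p_target + p_impostor)


# Hypothetical score distributions: targets around +2, impostors around -2.
posterior = target_posterior(1.0, (2.0, 1.0), (-2.0, 1.0))
accept = posterior > 0.5  # the threshold now lives in [0, 1]
```

Because the output is a probability, a threshold such as 0.5 has a direct interpretation, which is the property the abstract describes as bounded and meaningful threshold values.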

CiteSeerX

Speaker Utterances Tying Among Speaker Segmented Audio Documents Using Hierarchical Classification: Towards Speaker Indexing of Audio Databases

Author: Ivan Magrin-Chagnolleau
Jean-Francois Bonastre
Sylvain Meignier
Publication date: 01/01/2002

audio data according to the speakers present in the database. It is composed of three steps: (1) segmentation by speakers of each audio document; (2) speaker tying among the various segmented portions of the audio documents; and (3) generation of a speaker-based index. This paper focuses on the second step, the speaker tying task, which has not been addressed in the literature. The result of this task is a classification of the segmented acoustic data into clusters; each cluster should represent one speaker. This paper investigates hierarchical classification approaches for speaker tying. Two new discriminant dissimilarity measures and a new bottom-up algorithm are also proposed. The experiments are conducted on a subset of the Switchboard database, a conversational telephone database, and show that the proposed method achieves very satisfactory speaker tying among various audio documents, with a good level of purity for the clusters, but with a number of clusters significantly higher than the number of speakers.
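
The general shape of bottom-up hierarchical classification can be sketched as follows. This is a generic average-linkage agglomeration, not the algorithm or the discriminant dissimilarity measures proposed in the paper: the function names, the Euclidean dissimilarity, the toy two-dimensional "segments" and the stopping threshold are all assumptions made for illustration.

```python
def euclidean(u, v):
    """Placeholder dissimilarity between two segment representations."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def agglomerate(segments, dissimilarity, threshold):
    """Merge clusters bottom-up until the closest pair exceeds threshold.

    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(segments))]

    def linkage(a, b):
        # Average pairwise dissimilarity between two clusters.
        total = sum(dissimilarity(segments[i], segments[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, p, q = min(
            ((linkage(clusters[p], clusters[q]), p, q)
             for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break  # remaining clusters are considered distinct speakers
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy data: four segments, assumed to come from two speakers.
toy_segments = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
toy_clusters = agglomerate(toy_segments, euclidean, threshold=1.0)
```

The stopping threshold controls the trade-off the abstract mentions: a stricter threshold tends to give purer clusters at the cost of more clusters than speakers.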

CiteSeerX

Chapman University Digital Commons