I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
The I4U consortium was established to facilitate a joint entry to NIST
speaker recognition evaluations (SRE). The latest edition of such joint
submission was in SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U
consortium's participation in the NIST SRE series of evaluations. The primary objective of the
current paper is to summarize the results and lessons learned based on the
twelve sub-systems and their fusion submitted to SRE'18. It is also our
intention to present a shared view on the advances and major
paradigm shifts that we have witnessed as SRE participants over the past decade,
from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm
shift from supervector representation to deep speaker embedding, and a switch
of research challenge from channel compensation to domain adaptation.
Applying SVMs and weight-based factor analysis to unsupervised adaptation for speaker verification
This paper presents an extended study on the implementation of support vector machine (SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMM) was recently found to provide significant performance gains in unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated, and these trials were found to significantly aid performance.
Evolutive HMM for segmentation and indexing tasks
This paper presents an HMM-based method for the task of blind speaker indexing. The method detects speakers and adds them one by one to an evolutive HMM (E-HMM). The proposed solution exploits all of the available information (the speakers detected so far) as soon as it becomes available. The proposed system was tested on the "N-segmentation" tasks of the NIST 2001 evaluation campaign.
EFFECT OF UTTERANCE DURATION AND PHONETIC CONTENT ON SPEAKER IDENTIFICATION USING SECOND-ORDER STATISTICAL METHODS
Second-order statistical methods show very good results for automatic speaker identification in controlled recording conditions [2]. These approaches are generally used on the entire speech material available. In this paper, we study the influence of the content of the test speech material on the performance of such methods, i.e. under a more analytical approach [3]. The goal is to investigate the kind of information that is used by these methods, and where it is located in the speech signal. Liquids and glides together, vowels, and more particularly nasal vowels and nasal consonants, are found to be particularly speaker-specific: test utterances of 1 second, composed mostly of acoustic material from one of these classes, provide better speaker identification results than phonetically balanced test utterances, even though the training is done, in both cases, with 15 seconds of phonetically balanced speech. Nevertheless, results with other phoneme classes are never dramatically poor. These results tend to show that the speaker-dependent information captured by long-term second-order statistics is consistently common to all phonetic classes, and that the homogeneity of the test material may improve the quality of the estimates.
Bayesian Approach based-Decision in Speaker Verification
Considering the Bayesian decision framework as applied to speaker verification, this paper presents a new way of handling the troublesome anti-speaker model by proposing a redefinition of the hypotheses involved in the classical statistical hypothesis test. This new definition of the hypotheses is then implemented through a speaker-independent normalization technique, named the MAP approach. Besides supporting these new hypotheses, the MAP approach has the advantage of projecting likelihood scores into a probabilistic domain, and therefore of providing the decision threshold with bounded and meaningful values.
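As a rough illustration of why projecting scores into a probabilistic domain gives a bounded, interpretable threshold, the sketch below converts a log-likelihood ratio into a posterior probability with Bayes' rule. It is not the MAP normalization described in the abstract, whose details are not given here; the function names, the `target_prior` parameter, and the default threshold of 0.5 are assumptions made for this example.

```python
import math


def llr_to_posterior(llr: float, target_prior: float = 0.5) -> float:
    """Map a log-likelihood ratio to P(target | observation) via Bayes' rule.

    Hypothetical helper: `target_prior` is an assumed prior probability
    that the claimed speaker is the true speaker.
    """
    if not 0.0 < target_prior < 1.0:
        raise ValueError("target_prior must lie strictly between 0 and 1")
    prior_log_odds = math.log(target_prior / (1.0 - target_prior))
    log_odds = llr + prior_log_odds
    # Numerically stable logistic function: the result always lies in [0, 1].
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def accept(llr: float, target_prior: float = 0.5, threshold: float = 0.5) -> bool:
    """Accept the identity claim when the posterior reaches the threshold."""
    return llr_to_posterior(llr, target_prior) >= threshold
```

Unlike a raw likelihood score, whose useful range depends on the models and the data, a threshold on this posterior always lies between 0 and 1 and can be read directly as a required level of confidence.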
Speaker Utterances Tying Among Speaker Segmented Audio Documents Using Hierarchical Classification: Towards Speaker Indexing of Audio Databases
audio data according to the speakers present in the database. It is composed of three steps: (1) segmentation by speakers of each audio document; (2) speaker tying among the various segmented portions of the audio documents; and (3) generation of a speaker-based index. This paper focuses on the second step, the speaker tying task, which has not been addressed in the literature. The result of this task is a classification of the segmented acoustic data into clusters; each cluster should represent one speaker. This paper investigates hierarchical classification approaches for speaker tying. Two new discriminant dissimilarity measures and a new bottom-up algorithm are also proposed. The experiments are conducted on a subset of the Switchboard database, a conversational telephone database, and show that the proposed method achieves very satisfactory speaker tying among various audio documents, with a good level of cluster purity, but with a number of clusters significantly higher than the number of speakers.
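The general idea of bottom-up hierarchical classification, and of measuring cluster purity, can be sketched as follows. This is only a generic agglomerative procedure under stated assumptions: each segment is represented by a small feature vector, Euclidean distance between cluster centroids stands in for the paper's dissimilarity measures (which the abstract does not specify), and `stop_distance` is a hypothetical stopping threshold.

```python
import math
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def bottom_up_clusters(vectors, stop_distance):
    """Merge the two closest clusters until the closest pair is too far apart.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = math.dist(
                centroid([vectors[i] for i in clusters[a]]),
                centroid([vectors[i] for i in clusters[b]]),
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > stop_distance:  # assumed stopping rule, not the paper's
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def purity(clusters, labels):
    """Fraction of segments that carry the majority label of their cluster."""
    majority = sum(
        Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters
    )
    return majority / len(labels)


# Toy data: five segments from two speakers, "A" and "B".
segments = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
speakers = ["A", "A", "B", "B", "A"]
found = bottom_up_clusters(segments, stop_distance=1.0)
```

On this toy data the procedure stops with three clusters for two speakers, each cluster containing a single speaker. This mirrors the kind of outcome the abstract reports: pure clusters, but more of them than there are speakers.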