24 research outputs found
Generative Modelling for Unsupervised Score Calibration
Score calibration enables automatic speaker recognizers to make
cost-effective accept / reject decisions. Traditional calibration requires
supervised data, which is an expensive resource. We propose a 2-component GMM
for unsupervised calibration and demonstrate good performance relative to a
supervised baseline on NIST SRE'10 and SRE'12. A Bayesian analysis demonstrates
that the uncertainty associated with the unsupervised calibration parameter
estimates is surprisingly small.
Comment: Accepted for ICASSP 2014
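The two-component mixture idea can be sketched as follows: fit a 1-D Gaussian mixture to the raw scores by EM without any labels, identify the higher-mean component with targets, and read a calibrated log-likelihood ratio off the two fitted densities. The function names, the quantile-based initialisation and the per-component variances below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def fit_two_gaussians(scores, n_iter=200):
    """EM for a 1-D, two-component Gaussian mixture (a sketch of
    unsupervised score calibration; component 0 = non-targets,
    component 1 = targets, identified by mean order)."""
    s = np.asarray(scores, float)
    # crude initialisation from the score quantiles (an assumption)
    mu = np.array([np.quantile(s, 0.25), np.quantile(s, 0.75)])
    var = np.array([s.var(), s.var()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each score
        ll = -0.5 * ((s[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        r = w * np.exp(ll)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0)
        w = nk / nk.sum()
        mu = (r * s[:, None]).sum(axis=0) / nk
        var = (r * (s[:, None] - mu) ** 2).sum(axis=0) / nk
    if mu[0] > mu[1]:  # ensure component 1 has the higher mean
        mu, var, w = mu[::-1], var[::-1], w[::-1]
    return w, mu, var

def llr(s, mu, var):
    """Calibrated LLR: log density ratio of the two fitted components
    (the mixture weights cancel out of the likelihood ratio)."""
    logp = lambda k: -0.5 * ((s - mu[k]) ** 2 / var[k]
                             + np.log(2 * np.pi * var[k]))
    return logp(1) - logp(0)
```

Once fitted, the LLR can be thresholded at the Bayes decision point for any operating prior, which is what makes the accept/reject decisions cost-effective.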
Constrained speaker linking
In this paper we study speaker linking (a.k.a. partitioning) given
constraints on the distribution of speaker identities over speech recordings.
Specifically, we show that the intractable partitioning problem becomes
tractable when the constraints pre-partition the data into smaller cliques with
non-overlapping speakers. The surprisingly common case where speakers in
telephone conversations are known, but the assignment of channels to identities
is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN
database, where this channel assignment task is at hand, a lightweight speaker
recognition system can quite effectively solve the channel assignment problem,
with 93% of the cliques solved. We further show that the posterior distribution
over channel assignment configurations is well calibrated.
Comment: Submitted to Interspeech 2014, some typos fixed
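The channel-assignment step can be illustrated with a brute-force Bayesian sketch: within one clique, the non-overlapping constraint means each valid assignment is a permutation of the known speakers over the channels, so the posterior is obtained by normalising the summed recognition log-likelihoods over all permutations. The matrix layout and uniform prior here are assumptions for illustration; the paper's lightweight recognizer and scoring details are not reproduced.

```python
import itertools
import numpy as np

def assignment_posterior(loglik):
    """Posterior over channel-to-identity assignments within one clique.

    loglik[i, j] is a (hypothetical) speaker-recognition log-likelihood
    of channel i containing speaker j. With a uniform prior over the
    permutations, the posterior of an assignment is proportional to the
    product of its matched likelihoods.
    """
    n = loglik.shape[0]
    perms = list(itertools.permutations(range(n)))
    scores = np.array([sum(loglik[i, p[i]] for i in range(n)) for p in perms])
    post = np.exp(scores - scores.max())  # subtract max for stability
    post /= post.sum()
    return perms, post
```

A clique counts as "solved" when the MAP permutation matches the true channel assignment; a well-calibrated posterior additionally means the probability mass on each permutation reflects how often it is correct.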
A comparison of linear and non-linear calibrations for speaker recognition
In recent work on both generative and discriminative score-to-log-likelihood-ratio
calibration, it was shown that linear transforms give good
accuracy only for a limited range of operating points. Moreover, these methods
required tailoring of the calibration training objective functions in order to
target the desired region of best accuracy. Here, we generalize the linear
recipes to non-linear ones. We experiment with a non-linear, non-parametric,
discriminative PAV solution, as well as parametric, generative,
maximum-likelihood solutions that use Gaussian, Student's t and
normal-inverse-Gaussian score distributions. Experiments on NIST SRE'12 scores
suggest that the non-linear methods provide wider ranges of optimal accuracy
and can be trained without having to resort to objective function tailoring.Comment: accepted for Odyssey 2014: The Speaker and Language Recognition
Worksho
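A minimal sketch of the non-parametric PAV route: pool-adjacent-violators produces a monotone fit of the 0/1 labels against the sorted scores, i.e. an isotonic estimate of P(target | score), and a logit transform minus the prior log odds turns that posterior into an LLR. The clipping constant and helper names below are assumptions, not the authors' implementation.

```python
import numpy as np

def pav(y):
    """Pool-adjacent-violators: non-decreasing least-squares fit to y.
    Adjacent blocks that violate monotonicity are merged into their
    weighted mean until the sequence is isotonic."""
    out = []
    for v in y:
        out.append([float(v), 1])  # [block mean, block size]
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, n2 = out.pop()
            m1, n1 = out.pop()
            out.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([[m] * n for m, n in out])

def pav_llr(scores, labels):
    """Discriminative, non-parametric score-to-LLR calibration: sort the
    scores, isotonically fit the 0/1 labels to estimate P(target|score),
    then remove the prior log odds. Clipping avoids infinite LLRs at the
    extremes (the clip value is an arbitrary choice here)."""
    order = np.argsort(scores)
    p = np.clip(pav(np.asarray(labels, float)[order]), 1e-6, 1 - 1e-6)
    prior = labels.mean()
    return scores[order], np.log(p / (1 - p)) - np.log(prior / (1 - prior))
```

Because the PAV fit is monotone but otherwise unconstrained, the resulting calibration curve can bend where the data demand it, which is one intuition for why such non-linear maps stay accurate over a wider range of operating points than a single affine transform.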