24 research outputs found
Generative Modelling for Unsupervised Score Calibration
Score calibration enables automatic speaker recognizers to make
cost-effective accept / reject decisions. Traditional calibration requires
supervised data, which is an expensive resource. We propose a 2-component GMM
for unsupervised calibration and demonstrate good performance relative to a
supervised baseline on NIST SRE'10 and SRE'12. A Bayesian analysis demonstrates
that the uncertainty associated with the unsupervised calibration parameter
estimates is surprisingly small.
Comment: Accepted for ICASSP 2014
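The two-component mixture idea can be sketched as follows: fit a 1-D Gaussian mixture to the raw scores by EM without any labels, identify the higher-mean component with targets, and read a calibrated log-likelihood ratio off the two fitted densities. The function names, the quantile-based initialisation and the per-component variances below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def fit_two_gaussians(scores, n_iter=200):
    """EM for a 1-D, two-component Gaussian mixture (a sketch of
    unsupervised score calibration; component 0 = non-targets,
    component 1 = targets, identified by mean order)."""
    s = np.asarray(scores, float)
    # crude initialisation from the score quantiles (an assumption)
    mu = np.array([np.quantile(s, 0.25), np.quantile(s, 0.75)])
    var = np.array([s.var(), s.var()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each score
        ll = -0.5 * ((s[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        r = w * np.exp(ll)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0)
        w = nk / nk.sum()
        mu = (r * s[:, None]).sum(axis=0) / nk
        var = (r * (s[:, None] - mu) ** 2).sum(axis=0) / nk
    if mu[0] > mu[1]:  # ensure component 1 has the higher mean
        mu, var, w = mu[::-1], var[::-1], w[::-1]
    return w, mu, var

def llr(s, mu, var):
    """Calibrated LLR: log density ratio of the two fitted components
    (the mixture weights cancel out of the likelihood ratio)."""
    logp = lambda k: -0.5 * ((s - mu[k]) ** 2 / var[k]
                             + np.log(2 * np.pi * var[k]))
    return logp(1) - logp(0)
```

Once fitted, the LLR can be thresholded at the Bayes decision point for any operating prior, which is what makes the accept/reject decisions cost-effective.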
Constrained speaker linking
In this paper we study speaker linking (a.k.a. partitioning) given
constraints on the distribution of speaker identities over speech recordings.
Specifically, we show that the intractable partitioning problem becomes
tractable when the constraints pre-partition the data into smaller cliques with
non-overlapping speakers. The surprisingly common case where speakers in
telephone conversations are known, but the assignment of channels to identities
is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN
database, where this channel assignment task is at hand, a lightweight speaker
recognition system can quite effectively solve the channel assignment problem,
with 93% of the cliques solved. We further show that the posterior distribution
over channel assignment configurations is well calibrated.
Comment: Submitted to Interspeech 2014, some typos fixed
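The channel-assignment step can be illustrated with a brute-force Bayesian sketch: within one clique, the non-overlapping constraint means each valid assignment is a permutation of the known speakers over the channels, so the posterior is obtained by normalising the summed recognition log-likelihoods over all permutations. The matrix layout and uniform prior here are assumptions for illustration; the paper's lightweight recognizer and scoring details are not reproduced.

```python
import itertools
import numpy as np

def assignment_posterior(loglik):
    """Posterior over channel-to-identity assignments within one clique.

    loglik[i, j] is a (hypothetical) speaker-recognition log-likelihood
    of channel i containing speaker j. With a uniform prior over the
    permutations, the posterior of an assignment is proportional to the
    product of its matched likelihoods.
    """
    n = loglik.shape[0]
    perms = list(itertools.permutations(range(n)))
    scores = np.array([sum(loglik[i, p[i]] for i in range(n)) for p in perms])
    post = np.exp(scores - scores.max())  # subtract max for stability
    post /= post.sum()
    return perms, post
```

A clique counts as "solved" when the MAP permutation matches the true channel assignment; a well-calibrated posterior additionally means the probability mass on each permutation reflects how often it is correct.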
A comparison of linear and non-linear calibrations for speaker recognition
In recent work on both generative and discriminative score-to-log-likelihood-ratio
calibration, it was shown that linear transforms give good
accuracy only for a limited range of operating points. Moreover, these methods
required tailoring of the calibration training objective functions in order to
target the desired region of best accuracy. Here, we generalize the linear
recipes to non-linear ones. We experiment with a non-linear, non-parametric,
discriminative PAV solution, as well as parametric, generative,
maximum-likelihood solutions that use Gaussian, Student's t and
normal-inverse-Gaussian score distributions. Experiments on NIST SRE'12 scores
suggest that the non-linear methods provide wider ranges of optimal accuracy
and can be trained without having to resort to objective function tailoring.Comment: accepted for Odyssey 2014: The Speaker and Language Recognition
Worksho
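A minimal sketch of the non-parametric PAV route: pool-adjacent-violators produces a monotone fit of the 0/1 labels against the sorted scores, i.e. an isotonic estimate of P(target | score), and a logit transform minus the prior log odds turns that posterior into an LLR. The clipping constant and helper names below are assumptions, not the authors' implementation.

```python
import numpy as np

def pav(y):
    """Pool-adjacent-violators: non-decreasing least-squares fit to y.
    Adjacent blocks that violate monotonicity are merged into their
    weighted mean until the sequence is isotonic."""
    out = []
    for v in y:
        out.append([float(v), 1])  # [block mean, block size]
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, n2 = out.pop()
            m1, n1 = out.pop()
            out.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([[m] * n for m, n in out])

def pav_llr(scores, labels):
    """Discriminative, non-parametric score-to-LLR calibration: sort the
    scores, isotonically fit the 0/1 labels to estimate P(target|score),
    then remove the prior log odds. Clipping avoids infinite LLRs at the
    extremes (the clip value is an arbitrary choice here)."""
    order = np.argsort(scores)
    p = np.clip(pav(np.asarray(labels, float)[order]), 1e-6, 1 - 1e-6)
    prior = labels.mean()
    return scores[order], np.log(p / (1 - p)) - np.log(prior / (1 - prior))
```

Because the PAV fit is monotone but otherwise unconstrained, the resulting calibration curve can bend where the data demand it, which is one intuition for why such non-linear maps stay accurate over a wider range of operating points than a single affine transform.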