5,390 research outputs found
Factor analysis modelling for speaker verification with short utterances
This paper examines combining both relevance MAP and subspace speaker adaptation processes to train GMM speaker models for use in speaker verification systems with a particular focus on short utterance lengths. The subspace speaker adaptation method involves developing a speaker GMM mean supervector as the sum of a speaker-independent prior distribution and a speaker dependent offset constrained to lie within a low-rank subspace, and has been shown to provide improvements in accuracy over ordinary relevance MAP when the amount of training data is limited. It is shown through testing on NIST SRE data that combining the two processes provides speaker models which lead to modest improvements in verification accuracy for limited data situations, in addition to improving the performance of the speaker verification system when a larger amount of available training data is available
NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed as
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verificiation task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-art PLDA based speaker verification system.Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the difficult and challenging problem of sensor variability and the source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in match cases; (2) the benefit of combining these features with the mel frequency cepstral coefficients to exploit their discrimination power under uncontrolled conditions (mismatch cases). Consequently, the proposed invariant features result in a performance improvement as demonstrated by a reduction in the equal error rate and the minimum decision cost function compared to the GMM-UBM speaker recognition systems based on MFCC features
Constrained speaker linking
In this paper we study speaker linking (a.k.a.\ partitioning) given
constraints of the distribution of speaker identities over speech recordings.
Specifically, we show that the intractable partitioning problem becomes
tractable when the constraints pre-partition the data in smaller cliques with
non-overlapping speakers. The surprisingly common case where speakers in
telephone conversations are known, but the assignment of channels to identities
is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN
database, where this channel assignment task is at hand, a lightweight speaker
recognition system can quite effectively solve the channel assignment problem,
with 93% of the cliques solved. We further show that the posterior distribution
over channel assignment configurations is well calibrated.Comment: Submitted to Interspeech 2014, some typos fixe
A Generative Model for Score Normalization in Speaker Recognition
We propose a theoretical framework for thinking about score normalization,
which confirms that normalization is not needed under (admittedly fragile)
ideal conditions. If, however, these conditions are not met, e.g. under
data-set shift between training and runtime, our theory reveals dependencies
between scores that could be exploited by strategies such as score
normalization. Indeed, it has been demonstrated over and over experimentally,
that various ad-hoc score normalization recipes do work. We present a first
attempt at using probability theory to design a generative score-space
normalization model which gives similar improvements to ZT-norm on the
text-dependent RSR 2015 database
Multimodal person recognition for human-vehicle interaction
Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, borne out in two case studies
- …