205 research outputs found
Speaker recognition by means of restricted Boltzmann machine adaptation
Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising a universal model training and model adaptation. Taking advantage of RBM unsupervised learning algorithm, a global model is trained based on all available background data. This general speaker-independent model, referred to as URBM, is further adapted to the data of a specific speaker to build speaker-dependent model. In order to show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has been also utilized to produce a vector-based representation for speakers. This vector-based representation, similar to i-vector, can be further used for speaker recognition using either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.Peer ReviewedPostprint (author's final draft
NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed as
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verificiation task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-art PLDA based speaker verification system.Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
A Speaker Verification Backend with Robust Performance across Conditions
In this paper, we address the problem of speaker verification in conditions
unseen or unknown during development. A standard method for speaker
verification consists of extracting speaker embeddings with a deep neural
network and processing them through a backend composed of probabilistic linear
discriminant analysis (PLDA) and global logistic regression score calibration.
This method is known to result in systems that work poorly on conditions
different from those used to train the calibration model. We propose to modify
the standard backend, introducing an adaptive calibrator that uses duration and
other automatically extracted side-information to adapt to the conditions of
the inputs. The backend is trained discriminatively to optimize binary
cross-entropy. When trained on a number of diverse datasets that are labeled
only with respect to speaker, the proposed backend consistently and, in some
cases, dramatically improves calibration, compared to the standard PLDA
approach, on a number of held-out datasets, some of which are markedly
different from the training data. Discrimination performance is also
consistently improved. We show that joint training of the PLDA and the adaptive
calibrator is essential -- the same benefits cannot be achieved when freezing
PLDA and fine-tuning the calibrator. To our knowledge, the results in this
paper are the first evidence in the literature that it is possible to develop a
speaker verification system with robust out-of-the-box performance on a large
variety of conditions
Max-margin Metric Learning for Speaker Recognition
Probabilistic linear discriminant analysis (PLDA) is a popular normalization
approach for the i-vector model, and has delivered state-of-the-art performance
in speaker recognition. A potential problem of the PLDA model, however, is that
it essentially assumes Gaussian distributions over speaker vectors, which is
not always true in practice. Additionally, the objective function is not
directly related to the goal of the task, e.g., discriminating true speakers
and imposters. In this paper, we propose a max-margin metric learning approach
to solve the problems. It learns a linear transform with a criterion that the
margin between target and imposter trials are maximized. Experiments conducted
on the SRE08 core test show that compared to PLDA, the new approach can obtain
comparable or even better performance, though the scoring is simply a cosine
computation
Neural PLDA Modeling for End-to-End Speaker Verification
While deep learning models have made significant advances in supervised
classification problems, the application of these models for out-of-set
verification tasks like speaker recognition has been limited to deriving
feature embeddings. The state-of-the-art x-vector PLDA based speaker
verification systems use a generative model based on probabilistic linear
discriminant analysis (PLDA) for computing the verification score. Recently, we
had proposed a neural network approach for backend modeling in speaker
verification called the neural PLDA (NPLDA) where the likelihood ratio score of
the generative PLDA model is posed as a discriminative similarity function and
the learnable parameters of the score function are optimized using a
verification cost. In this paper, we extend this work to achieve joint
optimization of the embedding neural network (x-vector network) with the NPLDA
network in an end-to-end (E2E) fashion. This proposed end-to-end model is
optimized directly from the acoustic features with a verification cost function
and during testing, the model directly outputs the likelihood ratio score. With
various experiments using the NIST speaker recognition evaluation (SRE) 2018
and 2019 datasets, we show that the proposed E2E model improves significantly
over the x-vector PLDA baseline speaker verification system.Comment: Accepted in Interspeech 2020. GitHub Implementation Repos:
https://github.com/iiscleap/E2E-NPLDA and
https://github.com/iiscleap/NeuralPld
Constrained discriminative speaker verification specific to normalized i-vectors
International audienceThis paper focuses on discriminative trainings (DT) applied to i-vectors after Gaussian probabilistic linear discriminant analysis (PLDA). If DT has been successfully used with non-normalized vectors, this technique struggles to improve speaker detection when i-vectors have been first normalized, whereas the latter option has proven to achieve best performance in speaker verification. We propose an additional normalization procedure which limits the amount of coefficient to discriminatively train, with a minimal loss of accuracy. Adaptations of logistic regression based-DT to this new configuration are proposed, then we introduce a discriminative classifier for speaker verification which is a novelty in the field
- …