421 research outputs found
Pairwise Discriminative Speaker Verification in the I-Vector Space
This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belong to the same speaker or to different speakers. This approach is alternative to the usual discriminative setup that discriminates between a speaker and all the other speakers. We use a discriminative classifier based on a Support Vector Machine (SVM) that is trained to estimate the parameters of a symmetric quadratic function approximating a log-likelihood ratio score without explicit modeling of the i-vector distributions as in the generative Probabilistic Linear Discriminant Analysis (PLDA) models. Training these models is feasible because it is not necessary to expand the i-vector pairs, which would be expensive or even impossible even for medium sized training sets. The results of experiments performed on the tel-tel extended core condition of the NIST 2010 Speaker Recognition Evaluation are competitive with the ones obtained by generative models, in terms of normalized Detection Cost Function and Equal Error Rate. Moreover, we show that it is possible to train a gender- independent discriminative model that achieves state-of-the-art accuracy, comparable to the one of a gender-dependent system, saving memory and execution time both in training and in testin
Weighted LDA techniques for I-vector based speaker verification
This paper introduces the Weighted Linear Discriminant Analysis (WLDA) technique, based upon the weighted pairwise Fisher criterion, for the purposes of improving i-vector speaker verification in the presence of high intersession variability. By taking advantage of the speaker discriminative information that is available in the distances between pairs of speakers clustered in the development i-vector space, the WLDA technique is shown to provide an improvement in speaker verification performance over traditional Linear Discriminant Analysis (LDA) approaches. A similar approach is also taken to extend the recently developed Source Normalised LDA (SNLDA) into Weighted SNLDA (WSNLDA) which, similarly, shows an improvement in speaker verification performance in both matched and mismatched enrolment/verification conditions. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that both WLDA and WSNLDA are viable as replacement techniques to improve the performance of LDA and SNLDA-based i-vector speaker verification
Compensation of Nuisance Factors for Speaker and Language Recognition
The variability of the channel and environment is
one of the most important factors affecting the performance of
text-independent speaker verification systems. The best techniques
for channel compensation are model based. Most of them have
been proposed for Gaussian mixture models, while in the feature
domain blind channel compensation is usually performed. The
aim of this work is to explore techniques that allow more accurate
intersession compensation in the feature domain. Compensating
the features rather than the models has the advantage that the
transformed parameters can be used with models of a different
nature and complexity and for different tasks. In this paper,
we evaluate the effects of the compensation of the intersession
variability obtained by means of the channel factors approach. In
particular, we compare channel variability modeling in the usual
Gaussian mixture model domain, and our proposed feature domain
compensation technique. We show that the two approaches
lead to similar results on the NIST 2005 Speaker Recognition
Evaluation data with a reduced computation cost. We also report
the results of a system, based on the intersession compensation
technique in the feature space that was among the best participants
in the NIST 2006 Speaker Recognition Evaluation. Moreover, we
show how we obtained significant performance improvement in
language recognition by estimating and compensating, in the
feature domain, the distortions due to interspeaker variability
within the same language.
Index Terms—Factor anal
Anti-spoofing Methods for Automatic SpeakerVerification System
Growing interest in automatic speaker verification (ASV)systems has lead to
significant quality improvement of spoofing attackson them. Many research works
confirm that despite the low equal er-ror rate (EER) ASV systems are still
vulnerable to spoofing attacks. Inthis work we overview different acoustic
feature spaces and classifiersto determine reliable and robust countermeasures
against spoofing at-tacks. We compared several spoofing detection systems,
presented so far,on the development and evaluation datasets of the Automatic
SpeakerVerification Spoofing and Countermeasures (ASVspoof) Challenge
2015.Experimental results presented in this paper demonstrate that the useof
magnitude and phase information combination provides a substantialinput into
the efficiency of the spoofing detection systems. Also wavelet-based features
show impressive results in terms of equal error rate. Inour overview we compare
spoofing performance for systems based on dif-ferent classifiers. Comparison
results demonstrate that the linear SVMclassifier outperforms the conventional
GMM approach. However, manyresearchers inspired by the great success of deep
neural networks (DNN)approaches in the automatic speech recognition, applied
DNN in thespoofing detection task and obtained quite low EER for known and
un-known type of spoofing attacks.Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
Acoustic Approaches to Gender and Accent Identification
There has been considerable research on the problems of speaker and language recognition
from samples of speech. A less researched problem is that of accent recognition. Although this
is a similar problem to language identification, di�erent accents of a language exhibit more
fine-grained di�erences between classes than languages. This presents a tougher problem
for traditional classification techniques. In this thesis, we propose and evaluate a number of
techniques for gender and accent classification. These techniques are novel modifications and
extensions to state of the art algorithms, and they result in enhanced performance on gender
and accent recognition.
The first part of the thesis focuses on the problem of gender identification, and presents a
technique that gives improved performance in situations where training and test conditions are
mismatched.
The bulk of this thesis is concerned with the application of the i-Vector technique to accent
identification, which is the most successful approach to acoustic classification to have emerged
in recent years. We show that it is possible to achieve high accuracy accent identification without
reliance on transcriptions and without utilising phoneme recognition algorithms. The thesis
describes various stages in the development of i-Vector based accent classification that improve
the standard approaches usually applied for speaker or language identification, which are
insu�cient. We demonstrate that very good accent identification performance is possible with
acoustic methods by considering di�erent i-Vector projections, frontend parameters, i-Vector
configuration parameters, and an optimised fusion of the resulting i-Vector classifiers we can
obtain from the same data.
We claim to have achieved the best accent identification performance on the test corpus
for acoustic methods, with up to 90% identification rate. This performance is even better than
previously reported acoustic-phonotactic based systems on the same corpus, and is very close
to performance obtained via transcription based accent identification. Finally, we demonstrate
that the utilization of our techniques for speech recognition purposes leads to considerably
lower word error rates.
Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian
Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British
English, Prosody, Speech Recognition
Constrained discriminative speaker verification specific to normalized i-vectors
International audienceThis paper focuses on discriminative trainings (DT) applied to i-vectors after Gaussian probabilistic linear discriminant analysis (PLDA). If DT has been successfully used with non-normalized vectors, this technique struggles to improve speaker detection when i-vectors have been first normalized, whereas the latter option has proven to achieve best performance in speaker verification. We propose an additional normalization procedure which limits the amount of coefficient to discriminatively train, with a minimal loss of accuracy. Adaptations of logistic regression based-DT to this new configuration are proposed, then we introduce a discriminative classifier for speaker verification which is a novelty in the field
Max-margin Metric Learning for Speaker Recognition
Probabilistic linear discriminant analysis (PLDA) is a popular normalization
approach for the i-vector model, and has delivered state-of-the-art performance
in speaker recognition. A potential problem of the PLDA model, however, is that
it essentially assumes Gaussian distributions over speaker vectors, which is
not always true in practice. Additionally, the objective function is not
directly related to the goal of the task, e.g., discriminating true speakers
and imposters. In this paper, we propose a max-margin metric learning approach
to solve the problems. It learns a linear transform with a criterion that the
margin between target and imposter trials are maximized. Experiments conducted
on the SRE08 core test show that compared to PLDA, the new approach can obtain
comparable or even better performance, though the scoring is simply a cosine
computation
- …