Weighted LDA techniques for I-vector based speaker verification
This paper introduces the Weighted Linear Discriminant Analysis (WLDA) technique, based upon the weighted pairwise Fisher criterion, for the purposes of improving i-vector speaker verification in the presence of high intersession variability. By taking advantage of the speaker discriminative information that is available in the distances between pairs of speakers clustered in the development i-vector space, the WLDA technique is shown to provide an improvement in speaker verification performance over traditional Linear Discriminant Analysis (LDA) approaches. A similar approach is also taken to extend the recently developed Source Normalised LDA (SNLDA) into Weighted SNLDA (WSNLDA) which, similarly, shows an improvement in speaker verification performance in both matched and mismatched enrolment/verification conditions. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that both WLDA and WSNLDA are viable as replacement techniques to improve the performance of LDA and SNLDA-based i-vector speaker verification
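The weighted pairwise Fisher criterion mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the weighting function, so the inverse-squared Euclidean distance used as the default here is an assumption, as are all function names; the sketch also assumes the within-speaker scatter is invertible and that no two speaker means coincide. Speaker pairs that lie close together in i-vector space receive larger weights, so the projection works harder to separate them.

```python
import numpy as np

def class_stats(X, labels):
    """Return the speaker ids, per-speaker means and per-speaker sample counts."""
    classes = np.unique(labels)
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    counts = np.array([(labels == c).sum() for c in classes])
    return classes, means, counts

def weighted_between_scatter(X, labels, weight_fn):
    """Pairwise between-speaker scatter, each pair scaled by weight_fn(distance)."""
    _, means, counts = class_stats(X, labels)
    dim = X.shape[1]
    S_b = np.zeros((dim, dim))
    for i in range(len(means) - 1):
        for j in range(i + 1, len(means)):
            diff = means[i] - means[j]
            w = weight_fn(np.linalg.norm(diff))  # assumed: weight depends on Euclidean distance
            S_b += w * counts[i] * counts[j] * np.outer(diff, diff)
    return S_b / counts.sum()

def within_scatter(X, labels):
    """Standard within-speaker scatter, as in classical LDA."""
    classes, means, _ = class_stats(X, labels)
    dim = X.shape[1]
    S_w = np.zeros((dim, dim))
    for c, mu in zip(classes, means):
        centered = X[labels == c] - mu
        S_w += centered.T @ centered
    return S_w

def wlda_projection(X, labels, n_dims, weight_fn=lambda d: d ** -2.0):
    """Top n_dims eigenvectors of S_w^{-1} S_b (hypothetical default weighting)."""
    S_b = weighted_between_scatter(X, labels, weight_fn)
    S_w = within_scatter(X, labels)
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1][:n_dims]
    return evecs[:, order].real
```

With a constant weight of 1 the pairwise scatter reduces to the classical LDA between-class scatter, which is a convenient sanity check for any implementation of this kind.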
Support vector regression in NIST SRE 2008 multichannel core task
Actas de las V Jornadas en Tecnología del Habla (JTH 2008)
This paper explores two alternatives for speaker verification using the Generalized Linear Discriminant Sequence (GLDS) kernel: classical Support Vector Classification (SVC), and Support Vector Regression (SVR), recently proposed by the authors as a more robust approach for telephone speech. In this work we address a more challenging environment, the NIST SRE 2008 multichannel core task, where strong mismatch is introduced by the use of different microphones and recordings from interviews. Channel compensation based on Nuisance Attribute Projection (NAP) has also been investigated in order to analyze its impact on both approaches. Experiments show that, although both techniques show a significant improvement over SVC-GLDS when NAP is used, SVR is also robust to channel mismatch even when channel compensation is not used. This avoids the need for a considerable set of training data adapted to the operational scenario, which is seldom available. Results show similar performance for SVR-GLDS without NAP and SVC-GLDS with NAP. Moreover, SVR-GLDS results are promising, since other configurations and methods for channel compensation can further improve performance.
This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01.
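As a rough illustration of the Nuisance Attribute Projection mentioned in the abstract above, the following sketch builds the projection P = I - U Uᵀ from the top-k directions of within-speaker (session) variability. The function name, the choice of k, and the use of an SVD on speaker-centered vectors are assumptions made for illustration; the abstract does not describe how the authors trained their projection.

```python
import numpy as np

def nap_projection(X, speakers, k):
    """Return P = I - U U^T, removing the top-k within-speaker directions.

    X        : (n_utterances, dim) expanded vectors, one row per recording
    speakers : (n_utterances,) speaker id per row
    k        : number of nuisance directions to remove (hypothetical parameter)
    """
    residuals = X.astype(float).copy()
    for s in np.unique(speakers):
        mask = speakers == s
        # Subtract each speaker's mean so that only session variation remains.
        residuals[mask] -= residuals[mask].mean(axis=0)
    # Right singular vectors of the residuals are the eigenvectors of the
    # within-speaker scatter, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    U = vt[:k].T
    return np.eye(X.shape[1]) - U @ U.T
```

Applying `X @ P` (P is symmetric) to both training and test vectors removes the estimated channel subspace before the SVM kernel is evaluated.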
Compensation of Nuisance Factors for Speaker and Language Recognition
The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.
Index Terms—Factor anal
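The feature-domain compensation described in the abstract above can be sketched under one common formulation, in which each frame is corrected by a posterior-weighted sum of per-Gaussian channel offsets: ô_t = o_t − Σ_m γ_m(t) U_m x. This formulation, the array layout, and every name below are assumptions for illustration; the abstract does not state the exact equations, and the estimation of the channel factors x and the Gaussian posteriors γ is not shown.

```python
import numpy as np

def compensate_features(frames, posteriors, U, x):
    """Subtract the estimated channel offset from every feature frame.

    frames     : (T, D) acoustic feature frames of one utterance
    posteriors : (T, M) occupation probability of each of M Gaussians per frame
    U          : (M, D, R) per-Gaussian blocks of the channel subspace (assumed layout)
    x          : (R,) channel factors estimated for this utterance
    """
    offsets = U @ x                     # (M, D): channel offset of each Gaussian
    return frames - posteriors @ offsets  # (T, D): posterior-weighted correction
```

Because the output is again a sequence of feature frames, it can be passed to any downstream model, which is the advantage the abstract attributes to compensating features rather than models.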
Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009
J. Gonzalez-Dominguez, I. Lopez-Moreno, J. Franco-Pedroso, D. Ramos, D. T. Toledano, and J. Gonzalez-Rodriguez, "Multilevel and Session Variability Compensated Language Recognition: ATVS-UAM Systems at NIST LRE 2009," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 6, pp. 1084-1093, December 2010.
This work presents the systems submitted by the
ATVS Biometric Recognition Group to the 2009 Language Recognition Evaluation (LRE'09), organized by NIST. The new challenges in this LRE edition can be summarized by three main differences with respect to past evaluations. Firstly, the number of languages to be recognized expanded to 23, from 14 in 2007 and 7 in 2005. Secondly, the data variability was increased by including telephone speech excerpts extracted from Voice of America (VOA) radio broadcasts over the Internet in addition to Conversational Telephone Speech (CTS). The third difference was the volume of data: this evaluation involved up to 2 terabytes of speech data for development, an order of magnitude more than in past evaluations. LRE'09 thus required participants to develop robust systems able not only to cope with the session variability problem but also to do so with reasonable computational resources. ATVS participation consisted of state-of-the-art acoustic and high-level systems focussing on these issues. Furthermore, the problem of finding a proper combination and calibration of the information obtained at different levels of the speech signal was widely explored in this submission. In this work, two original contributions were developed. The first contribution was applying a session variability compensation scheme based on Factor Analysis (FA) within the statistics domain in an SVM-supervector (SVM-SV) approach. The second contribution was the use of a novel backend based on anchor models in order to fuse individual systems prior to one-vs-all calibration via logistic regression. Results on both development and evaluation corpora show the robustness and excellent performance of the submitted systems, exemplified by our system ranked 2nd in the 30 second open-set condition, with remarkably scarce computational resources.
This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01. Javier Gonzalez-Dominguez also thanks the Spanish Ministry of Education for supporting his doctoral research under project TEC2006-13141-C03-03. Special thanks are given to Dr. David Van Leeuwen from TNO Human Factors (Utrecht, The Netherlands) for his strong collaboration, valuable discussions and ideas. The authors also thank Dr. Patrick Lucey for his final support with the (non-target) Australian English review of the manuscript.
Intersession Variability Compensation in Language and Speaker Identification
Variability in the channel and session is an important issue in the text-independent speaker recognition task. To date, several techniques providing channel and session variability compensation have been introduced in a number of scientific papers. Such compensation can be implemented in the feature, model and score domains. A relatively new and powerful approach to removing channel distortion is so-called eigenchannel adaptation for Gaussian Mixture Models (GMM). The drawback of the technique is that, in its original implementation, it is not applicable to other types of classifiers, e.g. Support Vector Machines (SVM), GMMs with a different number of Gaussians, or speech recognition using Hidden Markov Models (HMM). A solution is an approximation of the technique: eigenchannel adaptation in the feature domain. Both the original eigenchannel adaptation and eigenchannel adaptation in the feature domain are presented for the speaker recognition task. After good results were achieved in speaker recognition, the contribution of the same techniques was examined in an acoustic language identification system covering 14 languages. In this task the undesired factors are both channel and speaker variability. Results are presented on the NIST Speaker Recognition Evaluation 2006 data and the NIST Language Recognition Evaluation 2007 data.
- …