Compensation of Nuisance Factors for Speaker and Language Recognition
The variability of the channel and environment is
one of the most important factors affecting the performance of
text-independent speaker verification systems. The best techniques
for channel compensation are model based. Most of them have
been proposed for Gaussian mixture models, while in the feature
domain blind channel compensation is usually performed. The
aim of this work is to explore techniques that allow more accurate
intersession compensation in the feature domain. Compensating
the features rather than the models has the advantage that the
transformed parameters can be used with models of a different
nature and complexity and for different tasks. In this paper,
we evaluate the effects of the compensation of the intersession
variability obtained by means of the channel factors approach. In
particular, we compare channel variability modeling in the usual
Gaussian mixture model domain, and our proposed feature domain
compensation technique. We show that the two approaches
lead to similar results on the NIST 2005 Speaker Recognition
Evaluation data with a reduced computation cost. We also report
the results of a system based on the intersession compensation
technique in the feature space that was among the best participants
in the NIST 2006 Speaker Recognition Evaluation. Moreover, we
show how we obtained significant performance improvement in
language recognition by estimating and compensating, in the
feature domain, the distortions due to interspeaker variability
within the same language.
Index Terms—Factor anal
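As a rough illustration of compensating features rather than models, the sketch below subtracts a posterior-weighted channel offset from each feature frame. The abstract does not give the formula, so this form is an assumption: each Gaussian of a mixture model is taken to have a fixed channel offset vector for the utterance, and each frame is shifted by the offsets weighted by that frame's Gaussian occupation probabilities. The function names and the toy numbers are hypothetical, and plain lists are used to keep the example dependency-free.

```python
def compensate_frame(frame, posteriors, offsets):
    """Subtract the posterior-weighted channel offset from one feature frame.

    frame:      list of D feature values
    posteriors: list of M Gaussian occupation probabilities for this frame
    offsets:    M lists of D values, one assumed channel offset per Gaussian
    """
    compensated = list(frame)
    for gamma, offset in zip(posteriors, offsets):
        for d, value in enumerate(offset):
            compensated[d] -= gamma * value
    return compensated


def compensate_utterance(frames, posteriors, offsets):
    """Apply the per-frame compensation to every frame of an utterance."""
    return [compensate_frame(f, p, offsets) for f, p in zip(frames, posteriors)]


# Toy data (hypothetical): two 2-dimensional frames, two Gaussians.
frames = [[1.0, 2.0], [3.0, 4.0]]
posteriors = [[1.0, 0.0], [0.5, 0.5]]
offsets = [[0.5, 0.5], [1.0, -1.0]]

compensated = compensate_utterance(frames, posteriors, offsets)
```

Because the output is again a sequence of feature vectors, it can be passed to any downstream model, which is the advantage the abstract attributes to feature-domain compensation.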
Nuance - Politecnico di Torino's 2012 NIST Speaker Recognition Evaluation System
This paper describes the Nuance-Politecnico di Torino (NPT) speaker recognition system submitted to the NIST SRE12 evaluation campaign. Included are the results of postevaluation tests, focusing on the analysis of the effects of score normalization and condition-dependent calibration. The submitted system combines the results of five acoustic recognizers, all based on Gaussian Mixture Models (GMMs). Each system has its own front end, with features differing by their type and dimension. We illustrate the process of development data selection and configuration of state-of-the-art technology, which contributed to obtaining good performance in all the test conditions proposed in this evaluation.
Audio segmentation-by-classification approach based on factor analysis in broadcast news domain
This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates for within-class variability by using class-dependent factor loading matrices and obtains the scores by computing the log-likelihood ratio of the class model to a non-class model over fixed-length windows. Afterwards, these scores are smoothed by means of different back-end systems to yield longer contiguous segments of the same class. Unlike previous solutions, our proposal does not make use of specific acoustic features and does not need a hierarchical structure. The proposed method is applied to segment and classify audio from TV shows into five acoustic classes: speech, music, speech with music, speech with noise, and others. Compared with a hierarchical system that uses specific acoustic features, the technique achieves a significant error reduction.
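The scoring and smoothing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: per-frame log-likelihoods under a class model and a non-class model are taken as given inputs, the window label would be the class with the highest windowed score, and a simple majority-vote filter stands in for the back-end smoothing systems, which the abstract does not specify. All function names are hypothetical.

```python
from collections import Counter


def window_llr(class_ll, nonclass_ll, win):
    """Mean log-likelihood ratio over consecutive fixed-length windows."""
    scores = []
    for start in range(0, len(class_ll) - win + 1, win):
        diffs = [c - n for c, n in zip(class_ll[start:start + win],
                                       nonclass_ll[start:start + win])]
        scores.append(sum(diffs) / win)
    return scores


def majority_smooth(labels, radius):
    """Replace each window label by the most frequent label in its neighbourhood."""
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - radius), min(len(labels), i + radius + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed


def to_segments(labels):
    """Merge runs of equal labels into (start, end, label) tuples, end exclusive."""
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], i + 1, label)
        else:
            segments.append((i, i + 1, label))
    return segments


# Toy per-frame log-likelihoods (hypothetical values), windows of 2 frames.
scores = window_llr([-1.0, -1.0, -3.0, -3.0], [-2.0, -2.0, -2.0, -2.0], win=2)

# Toy window labels with isolated "music" decisions inside a speech region.
raw = ["speech", "speech", "music", "speech", "speech", "music", "music", "music"]
segments = to_segments(majority_smooth(raw, radius=1))
```

The smoothing radius trades off boundary precision against robustness to isolated misclassified windows; the value used here is arbitrary.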