Cross match-CHMM fusion for speaker adaptation of voice biometric
The most significant factor affecting automatic voice biometric performance is variation in the signal characteristics due to speaker-based variability, conversation-based variability, and technology variability. These variations pose a great challenge to accurately modeling and verifying a speaker. To address these variability effects, the cross match (CM) technique is proposed to provide a speaker model that can adapt to variability over time. Using a limited amount of enrollment utterances, a client barcode is generated and can be updated by cross matching the client barcode with new data. Furthermore, CM adds a dimension of multimodality at the fusion level, since the similarity score from CM can be fused with the score from the default speaker model. The scores need to be normalized before fusion takes place. By fusing CM with a continuous Hidden Markov Model (CHMM), the new adapted model gave significant improvement in the identification and verification tasks: the equal error rate (EER) decreased from 6.51% to 1.23% in speaker identification and from 5.87% to 1.04% in speaker verification. EER also decreased over time (across five sessions) when CM was applied. The best combination of normalization and fusion methods is piecewise-linear normalization with weighted-sum fusion.
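The normalize-then-fuse step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common form of piecewise-linear normalization (0 below a lower threshold, 1 above an upper threshold, linear in between) and a two-score weighted sum. The function names, thresholds, weight, raw scores, and decision threshold below are all hypothetical and are not taken from the paper.

```python
def piecewise_linear_normalize(score, lower, upper):
    """Map a raw score to [0, 1]: 0 at or below `lower`,
    1 at or above `upper`, linear in between."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if score <= lower:
        return 0.0
    if score >= upper:
        return 1.0
    return (score - lower) / (upper - lower)


def weighted_sum_fusion(cm_score, chmm_score, cm_weight=0.5):
    """Fuse two normalized scores; the two weights sum to 1."""
    if not 0.0 <= cm_weight <= 1.0:
        raise ValueError("cm_weight must lie in [0, 1]")
    return cm_weight * cm_score + (1.0 - cm_weight) * chmm_score


# Hypothetical raw scores on different scales (illustrative values only).
cm_raw = 0.62      # assumed CM similarity score
chmm_raw = -41.0   # assumed CHMM log-likelihood score

# Assumed per-matcher thresholds, e.g. estimated on development data.
cm_norm = piecewise_linear_normalize(cm_raw, lower=0.2, upper=0.8)
chmm_norm = piecewise_linear_normalize(chmm_raw, lower=-60.0, upper=-30.0)

fused = weighted_sum_fusion(cm_norm, chmm_norm, cm_weight=0.4)
accept = fused >= 0.5  # assumed decision threshold
print(f"cm={cm_norm:.3f} chmm={chmm_norm:.3f} fused={fused:.3f} accept={accept}")
```

Normalizing first matters because the two matchers produce scores on unrelated scales. Without it, the weighted sum would be dominated by whichever score has the larger numeric range.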
Compensation of Nuisance Factors for Speaker and Language Recognition
The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.
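The contrast between model-domain and feature-domain compensation can be written compactly. The following is a sketch of one common formulation of the channel factors approach, not a formula taken from this abstract. The symbols $\mathbf{U}$ (low-rank channel subspace), $\mathbf{x}$ (channel factors of a session), $\gamma_m(t)$ (posterior occupation of Gaussian $m$ at frame $t$), and $\mathbf{o}(t)$ (feature vector at frame $t$) are assumptions introduced for illustration.

```latex
% Model domain: the session-dependent GMM mean supervector is the
% speaker supervector shifted within the channel subspace.
\boldsymbol{\mu}_{\text{session}} = \boldsymbol{\mu}_{\text{speaker}} + \mathbf{U}\,\mathbf{x}

% Feature domain: the same shift is subtracted from each frame,
% weighted by the occupation posteriors of the Gaussians.
\hat{\mathbf{o}}(t) = \mathbf{o}(t) - \sum_{m} \gamma_m(t)\,\mathbf{U}_m\,\mathbf{x}
```

Here $\mathbf{U}_m$ denotes the rows of $\mathbf{U}$ associated with Gaussian $m$. Under these assumptions, the compensated features $\hat{\mathbf{o}}(t)$ no longer depend on the model used to estimate $\mathbf{x}$, which is consistent with the abstract's point that they can be reused with models of a different nature and for different tasks.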
Index Terms—Factor anal
- …