2,290 research outputs found
Duration mismatch compensation using four-covariance model and deep neural network for speaker verification
International audienceDuration mismatch between enrollment and test utterances still remains a major concern for reliability of real-life speaker recognition applications. Two approaches are proposed here to deal with this case when using the i-vector representation. The first one is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to the case of any shift between i-vectors drawn from two distinct distributions. The second one attempts to map i-vectors of truncated segments of an utterance to the i-vector of the full segment, by the use of deep neural networks (DNN). Our results show that both new approaches outperform the standard PLDA by about 10 % relative, noting that these back-end methods could complement those quantifying the i-vector uncertainty during its extraction process, in the case of duration gap
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratio (SNR) below 12-dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front ends across the full range of noise
levels
Support vector machine regression for robust speaker verification in mismatching and forensic conditions
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-01793-3_50Proceedings of Third International Conference, ICB 2009, Alghero, ItalyIn this paper we propose the use of Support Vector Machine Regression (SVR) for robust speaker verification in two scenarios: i) strong mismatch in speech conditions and ii) forensic environment. The proposed approach seeks robustness to situations where a proper background database is reduced or not present, a situation typical in forensic cases which has been called database mismatch. For the mismatching condition scenario, we use the NIST SRE 2008 core task as a highly variable environment, but with a mostly representative background set coming from past NIST evaluations. For the forensic scenario, we use the Ahumada III database, a public corpus in Spanish coming from real authored forensic cases collected by Spanish Guardia Civil. We show experiments illustrating the robustness of a SVR scheme using a GLDS kernel under strong session variability, even when no session variability is applied, and especially in the forensic scenario, under database mismatch.This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-0
- …