741 research outputs found

    Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

    Get PDF
    We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables projection to the space of training data. Relying on the fact that the intrinsic dimensions of the posterior subspaces are indeed very small and the matrix of all posteriors belonging to a class has a very low rank, we demonstrate how low-dimensional structures enable further enhancement of the posteriors and rectify the spurious errors due to mismatch conditions. The enhanced acoustic modeling method leads to improvements in continuous speech recognition task using hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, where upto 15.4% relative reduction in word error rate (WER) is achieved

    Anti-spoofing Methods for Automatic SpeakerVerification System

    Full text link
    Growing interest in automatic speaker verification (ASV)systems has lead to significant quality improvement of spoofing attackson them. Many research works confirm that despite the low equal er-ror rate (EER) ASV systems are still vulnerable to spoofing attacks. Inthis work we overview different acoustic feature spaces and classifiersto determine reliable and robust countermeasures against spoofing at-tacks. We compared several spoofing detection systems, presented so far,on the development and evaluation datasets of the Automatic SpeakerVerification Spoofing and Countermeasures (ASVspoof) Challenge 2015.Experimental results presented in this paper demonstrate that the useof magnitude and phase information combination provides a substantialinput into the efficiency of the spoofing detection systems. Also wavelet-based features show impressive results in terms of equal error rate. Inour overview we compare spoofing performance for systems based on dif-ferent classifiers. Comparison results demonstrate that the linear SVMclassifier outperforms the conventional GMM approach. However, manyresearchers inspired by the great success of deep neural networks (DNN)approaches in the automatic speech recognition, applied DNN in thespoofing detection task and obtained quite low EER for known and un-known type of spoofing attacks.Comment: 12 pages, 0 figures, published in Springer Communications in Computer and Information Science (CCIS) vol. 66

    Compensation of Nuisance Factors for Speaker and Language Recognition

    Get PDF
    The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language. Index Terms—Factor anal
    • …
    corecore