
    Speaker verification using sequence discriminant support vector machines

    This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resultant SVMs operate in a space of very high dimensionality, since that dimensionality is tied to the number of parameters in the underlying generative model. To address problems that arise in the resultant optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here yields a 34% relative reduction in error rate compared to a GMM likelihood ratio system.
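    The score-space idea above maps a whole variable-length utterance to one fixed-length vector (the gradient of the sequence log-likelihood with respect to the generative model's parameters) so a standard SVM can classify entire sequences. Below is a minimal sketch of the Fisher-kernel special case, assuming a diagonal-covariance background GMM and taking gradients with respect to the component means only; the paper's spherical normalization and the PolyVar setup are not reproduced, and all data and model sizes are toy placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def fisher_score_vector(gmm, X):
    """Map a variable-length sequence X (T x d) to a fixed-length Fisher
    score vector: the gradient of the average frame log-likelihood with
    respect to the means of a diagonal-covariance GMM."""
    post = gmm.predict_proba(X)                  # responsibilities, T x K
    diff = X[:, None, :] - gmm.means_[None]      # T x K x d
    grad = post[:, :, None] * diff / gmm.covariances_[None]
    return grad.mean(axis=0).ravel()             # K*d fixed-length vector

# Toy data: two "speakers", each a list of MFCC-like sequences.
rng = np.random.default_rng(0)
seqs_a = [rng.normal(0.0, 1.0, size=(rng.integers(50, 100), 12)) for _ in range(20)]
seqs_b = [rng.normal(0.5, 1.2, size=(rng.integers(50, 100), 12)) for _ in range(20)]

# Background GMM trained on everything (stands in for a world model).
ubm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0)
ubm.fit(np.vstack(seqs_a + seqs_b))

X = np.array([fisher_score_vector(ubm, s) for s in seqs_a + seqs_b])
y = np.array([0] * len(seqs_a) + [1] * len(seqs_b))

svm = LinearSVC(C=1.0).fit(X, y)
print("train accuracy:", svm.score(X, y))
```

    Averaging the gradient over frames keeps score vectors comparable across utterances of different duration.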

    SVMs for Automatic Speech Recognition: a Survey

    Hidden Markov Models (HMMs) are, undoubtedly, the most widely employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, the preponderance of Markov models remains a fact today. During the last decade, however, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: the Support Vector Machine (SVM). SVMs are effective discriminative classifiers with several outstanding characteristics: their solution is the one with maximum margin; they can deal with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. In the first part we review several techniques for producing the fixed-dimension vectors needed by the original SVM formulation, and then explore more sophisticated techniques based on kernels able to handle sequences of different lengths. Among them is the DTAK kernel, simple and effective, which revives an old speech recognition technique: Dynamic Time Warping (DTW). In the second part, we describe some recent approaches to more complex tasks such as connected-digit recognition and continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
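    As a concrete illustration of the sequence-kernel idea, here is a simplified dynamic-time-alignment kernel in the spirit of DTAK: frame-level RBF similarities are accumulated along a DTW-style alignment path and fed to an SVM as a precomputed Gram matrix. This is a sketch under simplifying assumptions (a basic max-accumulation recursion with path-length normalization), not the exact DTAK recursion, and such a Gram matrix is not guaranteed to be positive semi-definite.

```python
import numpy as np
from sklearn.svm import SVC

def dtak(X, Y, sigma=1.0):
    """Simplified dynamic time alignment kernel: dynamic programming over
    frame-level RBF similarities, normalized by alignment-path length so
    sequences of different duration are comparable."""
    T, U = len(X), len(Y)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2.0 * sigma ** 2))    # frame-level similarities
    G = np.zeros((T + 1, U + 1))            # accumulated similarity
    L = np.zeros((T + 1, U + 1))            # path length for normalization
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            choices = [(G[i-1, j-1], L[i-1, j-1]),
                       (G[i-1, j],   L[i-1, j]),
                       (G[i, j-1],   L[i, j-1])]
            g, l = max(choices, key=lambda c: c[0])
            G[i, j] = g + k[i-1, j-1]
            L[i, j] = l + 1.0
    return G[T, U] / L[T, U]

# Toy isolated-word task: two classes of variable-length sequences.
rng = np.random.default_rng(1)
seqs = [rng.normal(m, 1.0, size=(rng.integers(20, 40), 5))
        for m in (0.0, 1.0) for _ in range(10)]
y = np.array([0] * 10 + [1] * 10)

# Precomputed Gram matrix fed to a standard SVM.
gram = np.array([[dtak(a, b) for b in seqs] for a in seqs])
svm = SVC(kernel='precomputed', C=1.0).fit(gram, y)
print("train accuracy:", svm.score(gram, y))
```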

    Large Margin GMM for discriminative speaker verification

    Gaussian mixture models (GMM), trained using the generative criterion of maximum likelihood estimation, have been the most popular approach in speaker recognition during the last decades. This approach is also widely used in many other classification tasks and applications. Generative learning is not, however, the optimal way to address classification problems. In this paper we first present a new algorithm for discriminative learning of diagonal GMM under a large margin criterion. This algorithm has the major advantage of being highly efficient, which allows fast discriminative GMM training on large-scale databases. We then evaluate its performance on a full NIST speaker verification task using NIST-SRE'2006 data. In particular, we use the popular Symmetrical Factor Analysis (SFA) for session variability compensation. The results show that our system outperforms the state-of-the-art approaches of GMM-SFA and the SVM-based GSL-NAP. Relative reductions of the Equal Error Rate of about 9.33% and 14.88% are respectively achieved over these systems.
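    The paper's exact training algorithm is not reproduced here, but the following sketch illustrates the underlying large margin idea for GMMs: enforce, via a hinge loss, that each target frame scores at least a margin higher under the target GMM than under a competing model, and nudge the diagonal-covariance component means by gradient steps whenever the constraint is violated. Mutating a fitted scikit-learn model's means_ in place, and using a single impostor model, are simplifying assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def loglik(gmm, x):
    return gmm.score_samples(x[None])[0]

def mean_grad(gmm, x):
    """Gradient of log p(x) w.r.t. the component means of a diagonal GMM."""
    post = gmm.predict_proba(x[None])[0]                     # K
    return post[:, None] * (x[None] - gmm.means_) / gmm.covariances_

rng = np.random.default_rng(2)
target = rng.normal(0.0, 1.0, size=(500, 8))     # target-speaker frames
impostor = rng.normal(0.8, 1.0, size=(500, 8))   # impostor frames

gmm_t = GaussianMixture(4, covariance_type='diag', random_state=0).fit(target)
gmm_i = GaussianMixture(4, covariance_type='diag', random_state=0).fit(impostor)

margin, lr = 1.0, 0.01
for epoch in range(5):
    hinge = 0.0
    for x in target:
        # Large margin constraint: log p(x|target) - log p(x|impostor) >= margin.
        slack = margin - (loglik(gmm_t, x) - loglik(gmm_i, x))
        if slack > 0:                            # violated: push the models apart
            hinge += slack
            gmm_t.means_ += lr * mean_grad(gmm_t, x)
            gmm_i.means_ -= lr * mean_grad(gmm_i, x)
    print(f"epoch {epoch}: hinge loss {hinge:.1f}")
```

    Because only the means move and the precisions stay fixed, each update is a cheap closed-form gradient step, which is the kind of efficiency the abstract emphasizes.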

    Combination of SVM and Large Margin GMM modeling for speaker identification

    Most state-of-the-art speaker recognition systems are partially or completely based on Gaussian mixture models (GMM). GMM have been widely and successfully used in speaker recognition during the last decades. They are traditionally estimated from a world model using the generative criterion of Maximum A Posteriori adaptation. In an earlier work, we proposed an efficient algorithm for discriminative learning of GMM with diagonal covariances under a large margin criterion. In this paper, we evaluate the combination of the large margin GMM modeling approach with SVM in the setting of speaker identification. We carry out a full NIST speaker identification task using NIST-SRE'2006 data, in a Symmetrical Factor Analysis compensation scheme. The results show that the two modeling approaches are complementary and that their combination outperforms either system used alone.
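    A common way to combine two such systems, and a minimal stand-in for the fusion studied here, is score-level fusion: normalize each system's per-speaker scores, take a convex combination, and identify the speaker by the argmax. The weights and score values below are hypothetical.

```python
import numpy as np

def fuse(scores_gmm, scores_svm, alpha=0.5):
    """Score-level fusion: z-normalize each system's speaker scores,
    then take a convex combination. The identified speaker is the argmax."""
    z = lambda s: (s - s.mean()) / (s.std() + 1e-9)
    return alpha * z(scores_gmm) + (1.0 - alpha) * z(scores_svm)

# Hypothetical per-speaker scores for one test utterance (5 enrolled speakers).
scores_gmm = np.array([-12.1, -9.8, -11.5, -10.2, -13.0])  # GMM log-likelihoods
scores_svm = np.array([-0.4, 0.9, -0.1, 0.3, -0.8])        # SVM decision values
fused = fuse(scores_gmm, scores_svm)
print("identified speaker:", int(np.argmax(fused)))
```

    Per-system normalization matters because GMM log-likelihoods and SVM decision values live on very different scales; without it, one system would dominate the combination.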

    Speaker Identification Using Discriminative Learning of Large Margin GMM

    Gaussian mixture models (GMM) have been widely and successfully used in speaker recognition during the last decades. They are generally trained using the generative criterion of maximum likelihood estimation. In an earlier work, we proposed an algorithm for discriminative training of GMM with diagonal covariances under a large margin criterion. In this paper, we present a new version of this algorithm which has the major advantage of being computationally highly efficient, and thus well suited to large-scale databases. We evaluate our fast algorithm in a Symmetrical Factor Analysis compensation scheme, carrying out a full NIST speaker identification task using NIST-SRE'2006 data. The results show that our system outperforms the traditional discriminative approach of SVM-GMM supervectors. A 3.5% improvement in speaker identification rate is achieved.
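    For context, the SVM-GMM supervector baseline mentioned above represents each utterance by MAP-adapting the means of a universal background model (UBM) and stacking them into one long vector for an SVM. A minimal sketch, assuming mean-only MAP adaptation; the relevance factor r = 16 is a conventional choice, not a value taken from this paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapted_supervector(ubm, X, r=16.0):
    """MAP-adapt the UBM means to utterance X (T x d) and stack them into
    a GMM supervector, the representation behind the SVM-GMM baseline."""
    post = ubm.predict_proba(X)                      # responsibilities, T x K
    n = post.sum(axis=0)                             # soft counts, K
    ex = post.T @ X / np.maximum(n[:, None], 1e-9)   # per-component data means
    alpha = (n / (n + r))[:, None]                   # adaptation weights
    means = alpha * ex + (1.0 - alpha) * ubm.means_  # interpolate with the UBM
    return means.ravel()                             # K*d supervector

rng = np.random.default_rng(3)
ubm = GaussianMixture(8, covariance_type='diag', random_state=0)
ubm.fit(rng.normal(size=(2000, 12)))                 # toy background data

utt = rng.normal(0.3, 1.0, size=(300, 12))           # one toy utterance
sv = map_adapted_supervector(ubm, utt)
print("supervector dimension:", sv.shape[0])         # K * d
```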
