    Speaker verification using sequence discriminant support vector machines

    This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resultant SVMs have a very high dimensionality, because the dimensionality is related to the number of parameters in the underlying generative model. To address problems that arise in the resultant optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here achieves a 34% relative reduction in error rate compared to a GMM likelihood ratio system.
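The idea of mapping a whole sequence into a fixed-length score space can be illustrated with the classical Fisher score: the gradient of the sequence log-likelihood with respect to the model parameters. The sketch below is only an illustration under simplifying assumptions (a one-dimensional GMM, gradients taken with respect to the component means only, hypothetical function names); it is not the paper's score-space kernel or its spherical normalization.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fisher_score_means(frames, weights, means, variances):
    """Gradient of the sequence log-likelihood w.r.t. each GMM mean.

    The output has one entry per mixture component, so sequences of any
    length map to vectors of the same dimension.
    Toy code: no protection against numerical underflow for outlying frames.
    """
    score = [0.0] * len(means)
    for x in frames:
        # Joint density w_k * N(x | mu_k, var_k) for each component.
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        for k, (m, v) in enumerate(zip(means, variances)):
            posterior = joint[k] / total  # responsibility of component k
            score[k] += posterior * (x - m) / v
    return score


if __name__ == "__main__":
    weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    short_seq = [0.2, -0.7, 1.1]
    long_seq = [0.2, -0.7, 1.1, 0.9, -1.3, 0.4, 2.0]
    # Both sequences yield a 2-dimensional vector, one entry per mean.
    print(fisher_score_means(short_seq, weights, means, variances))
    print(fisher_score_means(long_seq, weights, means, variances))
```

Because the vector length depends on the number of model parameters rather than on the number of frames, a standard SVM can then be trained on whole utterances. With realistic GMMs the vector has one entry per parameter, which is where the very high dimensionality mentioned in the abstract comes from.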

    SVMs for Automatic Speech Recognition: a Survey

    Hidden Markov Models (HMMs) are undoubtedly the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. Despite some achievements, Markov models still predominate today. During the last decade, however, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they can deal with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. In the first part we review several techniques for producing the fixed-dimension vectors needed by the original SVMs. Afterwards we explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which revives an old speech recognition technique: Dynamic Time Warping (DTW). In the second part, we describe some recent approaches to more complex tasks such as connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
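Dynamic Time Warping, mentioned above as the basis of the DTAK kernel, aligns two sequences of different lengths by dynamic programming. The following is a minimal sketch of the classic DTW distance on scalar sequences, assuming an absolute-difference local cost; the function name is hypothetical, and this is plain DTW, not the DTAK kernel itself.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the minimal accumulated cost of aligning the first
    i elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is a time-stretched copy of the first.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The stretched copy aligns at zero cost even though the two sequences differ in length, which is the property that makes DTW-style alignment attractive for comparing utterances of different durations.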

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class only. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures

    AdS/QCD, Light-Front Holography, and Sublimated Gluons

    Gauge/gravity duality leads to a simple, analytical, and phenomenologically compelling nonperturbative approximation to the full light-front QCD Hamiltonian. This approach, called "Light-Front Holography", successfully describes the spectroscopy of light-quark mesons and baryons, their elastic and transition form factors, and other hadronic properties. The bound-state Schrödinger and Dirac equations of the soft-wall AdS/QCD model predict linear Regge trajectories which have the same slope in orbital angular momentum L and radial quantum number n for both mesons and baryons. Light-front holography connects the fifth-dimensional coordinate of AdS space z to an invariant impact separation variable zeta in 3+1 space at fixed light-front time. A key feature is the determination of the frame-independent light-front wavefunctions of hadrons -- the relativistic analogs of the Schrödinger wavefunctions of atomic physics which allow one to compute form factors, transversity distributions, spin properties of the valence quarks, jet hadronization, and other hadronic observables. One thus obtains a one-parameter color-confining model for hadron physics at the amplitude level. AdS/QCD also predicts the form of a non-perturbative effective running coupling and its beta-function with an infrared fixed point which agrees with the effective coupling extracted from measurements of the Bjorken sum rule below 1 GeV^2. This is consistent with a flux-tube interpretation of QCD where soft gluons are sublimated into a color-confining potential for quarks. We discuss a number of phenomenological hadronic properties which support this picture. Comment: Invited talk, presented by SJB at the International Workshop on QCD Green's Functions, Confinement and Phenomenology, 5-9 September 2011, Trento, Italy

    Speaker specific feature based clustering and its applications in language independent forensic speaker recognition

    Forensic speaker recognition (FSR) is the process of determining whether the source of a questioned voice recording (trace) is a specific individual (suspected speaker). The role of the forensic expert is to testify to the value of the voice evidence, using a quantitative measure of that value where possible. It is up to the judge and/or the jury to use this information as an aid in their judgments and decisions. Most existing methods measure inter-utterance similarities directly from spectrum-based characteristics, so the resulting clusters may relate less to speakers than to different acoustic classes. This research addresses this deficiency by projecting language-independent utterances into a reference space equipped to cover the standard voice features underlying the entire utterance set. The resulting projection vectors naturally represent the language-independent voice-like relationships among all the utterances and are therefore more robust against non-speaker interference. Then a clustering approach is proposed based on the peak approximation in order to maximize the similarities between language-independent utterances within all clusters. This method uses the K-medoid, Fuzzy C-means, Gustafson and Kessel, and Gath-Geva algorithms to evaluate the cluster to which each utterance should be allocated, overcoming the disadvantage of traditional hierarchical clustering that the ultimate outcome can only hit the optimum recognition efficiency. The recognition efficiencies of the K-medoid, Fuzzy C-means, Gustafson and Kessel, and Gath-Geva clustering algorithms are 95.2%, 97.3%, 98.5% and 99.7%, and their EERs are 3.62%, 2.91%, 2.82% and 2.61%, respectively. The EER improvement of the Gath-Geva based FSR system compared with Gustafson and Kessel and Fuzzy C-means is 8.04% and 11.49%, respectively.
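The abstract does not say how the relative EER improvements are computed. As a quick arithmetic check, the sketch below evaluates two plausible conventions using only the EER values quoted above; the helper name and the choice of conventions are assumptions for illustration.

```python
def relative_change(other_eer, best_eer, denominator):
    """Percentage gap between two EERs, relative to a chosen denominator."""
    return 100.0 * (other_eer - best_eer) / denominator


# EER values (in percent) as quoted in the abstract.
eer = {"Fuzzy C-means": 2.91, "Gustafson-Kessel": 2.82, "Gath-Geva": 2.61}
best = eer["Gath-Geva"]

for name in ("Gustafson-Kessel", "Fuzzy C-means"):
    # Convention A: gap divided by the Gath-Geva EER.
    vs_best = relative_change(eer[name], best, best)
    # Convention B: gap divided by the EER of the system being compared.
    vs_other = relative_change(eer[name], best, eer[name])
    print(f"{name}: {vs_best:.2f}% (vs Gath-Geva EER), "
          f"{vs_other:.2f}% (vs own EER)")
```

Under convention A the gaps come out at roughly 8.05% and 11.49%, which is close to the quoted 8.04% and 11.49%. Under convention B they are roughly 7.45% and 10.31%. The quoted figures therefore appear to be taken relative to the Gath-Geva EER, although the abstract itself does not confirm this.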