
    Speaker verification using sequence discriminant support vector machines

    This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach discriminates directly between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resulting SVMs have a very high dimensionality, since the dimensionality grows with the number of parameters in the underlying generative model. To address the problems that arise in the resulting optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here reduces the error rate by 34% relative to a GMM likelihood ratio system.
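    The idea of mapping a whole utterance to one fixed-length vector can be sketched as follows. The abstract does not specify the exact score space or normalization, so this is a minimal illustration under stated assumptions: a diagonal-covariance GMM, a score vector made of the average log-likelihood plus its gradient with respect to the component means (a first-order, Fisher-style score space), and one common formulation of spherical normalization that appends a constant d and rescales to unit length. The function names and the parameter d are hypothetical, not taken from the paper. Note how the vector length grows as 1 + K·D, which is why the resulting SVMs become very high-dimensional.

```python
import numpy as np


def log_gaussians(X, means, variances):
    """Per-frame, per-component diagonal-Gaussian log-densities, shape (T, K)."""
    # X: (T, D); means, variances: (K, D)
    diff = X[:, None, :] - means[None, :, :]  # (T, K, D)
    return -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def score_space_vector(X, weights, means, variances):
    """Map a whole sequence X (T x D) to one fixed-length vector.

    Assumed score space: [average log-likelihood, d(avg log-likelihood)/d(means)].
    """
    log_joint = np.log(weights)[None, :] + log_gaussians(X, means, variances)  # (T, K)
    log_px = np.logaddexp.reduce(log_joint, axis=1)                           # (T,)
    gamma = np.exp(log_joint - log_px[:, None])  # component responsibilities (T, K)
    T = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]     # (T, K, D)
    # Gradient of the frame-averaged log-likelihood w.r.t. each mean vector
    grad_means = np.einsum("tk,tkd->kd", gamma, diff / variances[None, :, :]) / T
    return np.concatenate(([log_px.mean()], grad_means.ravel()))


def spherical_normalize(v, d=1.0):
    """Assumed formulation: project v in R^n onto the unit sphere in R^(n+1)
    via (v, d) / sqrt(|v|^2 + d^2)."""
    u = np.append(v, d)
    return u / np.linalg.norm(u)


# Usage on synthetic data: two utterances of different lengths
rng = np.random.default_rng(0)
K, D = 4, 3
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
utt_a = rng.normal(size=(200, D))
utt_b = rng.normal(size=(350, D))
phi_a = spherical_normalize(score_space_vector(utt_a, weights, means, variances))
phi_b = spherical_normalize(score_space_vector(utt_b, weights, means, variances))
print(phi_a.shape, phi_b.shape)  # (14,) (14,)
print(float(phi_a @ phi_b))      # linear kernel value between the two sequences
```

    Both utterances end up as vectors of the same length regardless of their number of frames, so a standard SVM with a linear kernel can compare them directly; after spherical normalization every vector has unit norm.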

    Emotion Recognition from Acted and Spontaneous Speech

    This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts. The first part describes the proposed approaches for emotion recognition using two acted emotional speech databases in different languages. Its main contributions are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space.
    The second part is devoted to emotion recognition from multilingual databases of spontaneous emotional speech, built from telephone recordings obtained from real call centers. The knowledge gained from the experiments on acted speech was exploited to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker’s emotional state on gender recognition performance and proposes a system for automatically identifying successful phone calls in call centers by means of dialogue features.