Speaker verification using sequence discriminant support vector machines
This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resultant SVMs have a very high dimensionality, because the dimensionality is related to the number of parameters in the underlying generative model. To address problems that arise in the resultant optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here achieves a 34% relative reduction in error rate compared to a GMM likelihood ratio system.
Emotion Recognition from Acted and Spontaneous Speech
This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts. The first part describes the proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as "emotion coupling", and a new method for mapping discrete emotions into a two-dimensional space.
The second part of the thesis is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, which are based on telephone recordings obtained from real call centers. The knowledge gained from the experiments with emotion recognition from acted speech was used to design a new approach for classifying seven emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker's emotional state on gender recognition performance and proposes a system for the automatic identification of successful phone calls in call centers by means of dialogue features.
- …