2,870 research outputs found

Emotion Recognition from Acted and Spontaneous Speech

Author: Atassi Hicham
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2014

This doctoral thesis deals with recognizing speakers' emotional states from the speech signal. The thesis is divided into two main parts. The first part describes the proposed approaches for emotion recognition from acted emotional speech, with results reported on two databases in different languages. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space. The second part is devoted to emotion recognition from spontaneous emotional speech drawn from telephone recordings obtained from real call centers. The knowledge gained from the experiments with acted speech was exploited to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker's emotional state on gender recognition performance and proposes a system for automatic identification of successful phone calls in call centers by means of dialogue features.

Digital library of Brno University of Technology

National Repository of Grey Literature

Enhancing the front-end of speaker recognition systems

Author: Ahmed Ahmed Isam
Publication date: 01/07/2019

Portsmouth University Research Portal (Pure)

Speaker segmentation and clustering

Author: Margarita Kotti
Vassiliki Moschou
Constantine Kotropoulos
Publication venue: Elsevier BV
Publication date: 01/01/2008

This survey focuses on two challenging speech processing topics: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, their advantages and disadvantages are indicated, insight into the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved.

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

An investigation of supervector regression for forensic voice comparison on small data

The present paper deals with observer design for a nonlinear lateral vehicle model. The nonlinear model is represented by an exact Takagi-Sugeno (TS) model via the sector nonlinearity transformation. A proportional multiple integral observer (PMIO) based on the TS model is designed to estimate simultaneously the state vector and the unknown input (road curvature). The convergence conditions of the estimation error are expressed in LMI form using Lyapunov theory, which guarantees a bounded error. Simulations are carried out and experimental results are provided to illustrate the proposed observer.

HAL Evry

Crossref

Springer - Publisher Connector

Affective Music Information Retrieval

Author: Wang Hsin-Min
Wang Ju-Chiang
Yang Yi-Hsuan
Publication date: 18/02/2015

Much of the appeal of music lies in its power to convey emotions/moods and to evoke them in listeners. In consequence, the past decade witnessed a growing interest in modeling emotions from musical signals in the music information retrieval (MIR) community. In this article, we present a novel generative approach to music emotion modeling, with a specific focus on the valence-arousal (VA) dimension model of emotion. The presented generative model, called acoustic emotion Gaussians (AEG), better accounts for the subjectivity of emotion perception by the use of probability distributions. Specifically, it learns from the emotion annotations of multiple subjects a Gaussian mixture model in the VA space with prior constraints on the corresponding acoustic features of the training music pieces. Such a computational framework is technically sound, capable of learning in an online fashion, and thus applicable to a variety of applications, including user-independent (general) and user-dependent (personalized) emotion recognition and emotion-based music retrieval. We report evaluations of the aforementioned applications of AEG on a larger-scale emotion-annotated corpus, AMG1608, to demonstrate the effectiveness of AEG and to showcase how evaluations are conducted for research on emotion-based MIR. Directions of future work are also discussed.
Comment: 40 pages, 18 figures, 5 tables, author version

arXiv.org e-Print Archive

CiteSeerX

Fast speaker independent large vocabulary continuous speech recognition [online]

Author: Woszczyna Monika
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/1998

KITopen

Histogram equalization for robust text-independent speaker verification in telephone environments

Author: Skosan Marshalleno
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2005

Word-processed copy. Includes bibliographical references.

Cape Town University OpenUCT