117 research outputs found

Audio-assisted movie dialogue detection

Author: Kotropoulos C.
Kotti M.
Maragos P.
Panagakis Y.
Pitas I.
Ververidis D.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported.
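The features named in this abstract can be illustrated with a minimal sketch. The abstract does not give the estimators, frame rate, or lag range, so everything below is an assumption made for illustration: the helper names (`indicator`, `cross_correlation`, `cross_spectrum_magnitude`), the toy speaking turns, and the choice to estimate the cross-power spectral density as the DFT of the lag-limited cross-correlation.

```python
import cmath

def indicator(segments, n_frames):
    """0/1 sequence: 1 in every frame where the actor speaks (end is exclusive)."""
    out = [0] * n_frames
    for start, end in segments:
        for t in range(start, min(end, n_frames)):
            out[t] = 1
    return out

def cross_correlation(a, b, max_lag):
    """Raw cross-correlation sum_t a[t] * b[t + lag], for lag = -max_lag..max_lag."""
    n = len(a)
    values = []
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                total += a[t] * b[u]
        values.append(total)
    return values

def cross_spectrum_magnitude(xc):
    """Magnitude of the DFT of the cross-correlation sequence.

    Indexing the lags from 0 instead of -max_lag only changes the phase,
    so the magnitude is unaffected.
    """
    n = len(xc)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(xc)))
        for k in range(n)
    ]

# Hypothetical scene: two actors alternate turns of 5 frames each.
actor_a = indicator([(0, 5), (10, 15)], 20)
actor_b = indicator([(5, 10), (15, 20)], 20)

xc = cross_correlation(actor_a, actor_b, max_lag=5)   # index 5 is lag 0
mag = cross_spectrum_magnitude(xc)
```

In this toy case the cross-correlation is zero at lag 0 (the actors never overlap) and peaks at a lag equal to the turn length, which is the kind of alternation pattern a classifier could pick up on.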

Middlesex University Research Repository

Audio-assisted movie dialogue detection

Author: Evangelopoulos G
Kotropoulos C
Kotti M
Maragos P
Panagakis I
Pitas I
Ververidis D
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE

Crossref

Middlesex University Research Repository

DSpace at NTUA

Spiral - Imperial College Digital Repository

Speech Emotion Recognition Considering Local Dynamic Features

Author: BW Schuller
D Ververidis
KS Rao
M Ayadi El
M Hall
M Wollmer
ME Sánchez-Gutiérrez
P Gangamohan
S Johar
Publication venue: Springer Science and Business Media LLC
Publication date: 20/03/2018

Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of speech emotion is a dynamic process, which is reflected in dynamic durations, energies, and other prosodic information when one speaks. In this paper, a novel local dynamic pitch probability distribution feature, obtained by drawing the histogram, is proposed to improve the accuracy of speech emotion recognition. Compared with most previous works, which use global features, the proposed method takes advantage of the local dynamic information conveyed by emotional speech. Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than the traditional global features. Comment: 10 pages, 3 figures, accepted by ISSP 201
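As a rough illustration of a histogram-based local pitch feature, the sketch below computes one normalized pitch histogram per window of a pitch contour. The abstract gives no details, so this is not the paper's method: the function names, the 50 to 500 Hz range, the 9 bins, the window and hop sizes, and the convention that unvoiced frames are stored as 0 are all assumptions.

```python
def pitch_histogram(pitch_hz, lo=50.0, hi=500.0, n_bins=9):
    """Normalized histogram of voiced pitch values; frames with f0 <= 0 are skipped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for f0 in pitch_hz:
        if f0 <= 0:  # assumed marker for unvoiced frames
            continue
        clipped = min(max(f0, lo), hi)
        idx = min(int((clipped - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def local_pitch_features(pitch_hz, window, hop):
    """One histogram per window, so changes along the utterance are preserved."""
    return [
        pitch_histogram(pitch_hz[start:start + window])
        for start in range(0, len(pitch_hz) - window + 1, hop)
    ]

# Hypothetical contour: a low-pitched stretch followed by a higher one.
contour = [0, 100, 110, 120, 0, 200, 210, 220, 230, 0]
features = local_pitch_features(contour, window=5, hop=5)
```

A single global histogram over the whole contour would merge the two stretches; the per-window histograms keep them apart, which is the sense in which the feature is "local".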

arXiv.org e-Print Archive

Crossref

Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

Author: A. Austermann
B. Schuller
B. Yang
C. Busso
C. M. Lee
C. Nass
D. Bitouk
D. J. C. MacKay
D. Ververidis
D. Watson
E. Benetos
E. Fersini
E. I. Konstantinidis
F. Burkhardt
Fabio Paternò
H. Altun
H. Gunes
H. K. Mishra
H. Mixdorff
H. P. Espinosa
I. Guyon
I. R. Murray
J. D. Markel
J. Hirschberg
J. Pittermann
K. Dai
K. R. Scherer
L. B. Jackson
M. Ayadi El
M. Kotti
M. M. Sondhi
M. Pantic
Margarita Kotti
N. Sato
N. Vanello
P. Boersma
P. Ekman
P. N. Juslin
P. Ruvolo
P. Zervas
R. A. Calvo
R. Cowie
R. Tato
R. W. Picard
S. Chandaka
S. Ntalampiras
T. Iliou
T. L. Pao
T. P. Kostoulas
T. Vogt
W. Bosma
W. Minker
Z. Inanoglu
Z. Zeng
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2012

In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a K nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is first carried out with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011

Crossref

Spiral - Imperial College Digital Repository

Tracking the Expression of Annoyance in Call Centers

Author: B Schuller
BM Ben-David
C Ashwin
C Clavel
CN Anagnostopoulos
D Ververidis
F Ringeval
G Paltoglou
JC Kim
JJG Meilán
JM Girard
K Wang
KM Rump
ME Ayadi
P Baranyi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Machine learning researchers have dealt with the identification of emotional cues from speech, since it is a research domain with a large number of potential applications. Many acoustic parameters have been analyzed when searching for cues to identify emotional categories. Classical classifiers and also outstanding computational approaches have then been developed. Experiments have been carried out mainly over induced emotions, even if research has recently been shifting to work over spontaneous emotions. In such a framework, it is worth mentioning that the expression of spontaneous emotions depends on cultural factors, on the particular individual and also on the specific situation. In this work, we were interested in the emotional shifts during conversation. In particular, we aimed to track the annoyance shifts appearing in phone conversations to complaint services. To this end, we analyzed a set of audio files showing different ways to express annoyance. The call center operators found disappointment, impotence or anger as expressions of annoyance. However, our experiments showed that variations of parameters derived from intensity, combined with some spectral information and suprasegmental features, are very robust for each speaker and annoyance rate. The work also discussed the annotation problem arising when dealing with human labelling of subjective events. In this work we proposed an extended rating scale in order to include annotators' disagreements. Our frame classification results validated the chosen annotation procedure. Experimental results also showed that shifts in customer annoyance rates could potentially be tracked during phone calls. Spanish Mineco under grant TIN2014-54288-C4-4-R; H2020 EU under Empathic RIA action number 769872

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Affective Man-Machine Interface: Unveiling human emotions through biosignals

Author: A. Choi
A. Daly
A. Haag
A.C. Rencher
A.J. Fridlund
A.M. Kring
B. Schölkopf
B.L. Frederickson
C. Liu
C.D. Katsis
C.L. Cooper
C.L. Lisetti
C.M. Bishop
C.M.A.V. Ravenswaaij-Arts
D. Grandjean
D. Ververidis
E. Aarts
E. Leon
E.A. Butler
E.L. Broek van den
F. Lotte
F. Nasoz
G.F. Solomon
G.G. Berntson
G.H.E. Gendolla
G.N. Yannakakis
H. Gunes
H.D. Critchley
I.B. Mauss
J. Cacioppo
J. Kim
J. Scheirer
J. Whitehill
J. Zhai
J.A. Healey
J.A. Russel
J.A. Russell
J.H.D.M. Westerink
J.L.H. Schuler
J.T. Cacioppo
K.H. Kim
L.F. Barrett
M. Marwitz
M. Minsky
M. Tulder van
M.B.I. Reaz
P. Carrera
P. Grossman
P. Lukowicz
P. Rani
R. Ader
R. Sinha
R.W. Picard
S.D. Kreibig
S.H. Fairclough
S.K. Yoo
T.M. Cover
T.M. Mitchell
Task Force
W. Boucsein
W. James
Z. Zeng
Publication venue: Springer Verlag
Publication date: 01/01/2010

As has been known for centuries, humans exhibit an electrical profile. This profile is altered through various psychological and physiological processes, which can be measured through biosignals, e.g., electromyography (EMG) and electrodermal activity (EDA). These biosignals can reveal our emotions and, as such, can serve as an advanced man-machine interface (MMI) for empathic consumer products. However, such an MMI requires the correct classification of biosignals into emotion classes. This chapter starts with an introduction to biosignals for emotion detection. Next, a state-of-the-art review of automatic emotion classification is presented. Moreover, guidelines are presented for affective MMI. Subsequently, a study is presented that explores the use of EDA and three facial EMG signals to determine neutral, positive, negative, and mixed emotions, using recordings of 21 people. A range of techniques is tested, which resulted in a generic framework for automated emotion classification with up to 61.31% correct classification of the four emotion classes, without the need for personal profiles. Among various other directives for future research, the results emphasize the need for parallel processing of multiple biosignals.

Crossref

Repository TU/e

Pure OAI Repository

University of Twente Research Information

Cross validation of bi-modal health-related stress assessment

Author: A Marty
A Tawari
B Arnrich
B Kedem
B Schuller
B Schölkopf
D Morrison
D Ververidis
DA Craig
DF Tolin
DM Hilty
DR Ladd
DW Aha
EB Baum
Egon L. van den Broek
EL Broek van den
EN Khalil
F Pallavicini
Frans van der Sluis
IR Murray
J Blascovich
J Krumm
J Sánchez-Meca
J Wolpe
JA Healey
K Domschke
K Nieuwenhuijsen
KR Scherer
LK Hansen
LM Blainlow
M El Ayadi
M Hall
MD Zwaag van der
MG Newman
N Rüscha
P Rani
PL Bartlett
R Banse
R Cowie
R Likert
RB Fillingim
RC Kessler
RG Lyons
RW Picard
S Wu
T Shimamura
TM Cover
Ton Dijkstra
TR Kosten
Publication venue: Springer Verlag
Publication date: 01/01/2011

This study explores the feasibility of objective and ubiquitous stress assessment. 25 post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy session, and each consisted of a "happy" and a "stress triggering" part. Two instruments were chosen to assess the stress level of the patients at various points in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator, and (ii) the subjective unit of distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power, and pitch. To model the emotional state of the patients, 28 parameters were selected from this set by means of a linear regression model and, subsequently, compressed into 11 principal components. The SUD and speech model were cross-validated using 3 machine learning algorithms. Between 90% (2 SUD levels) and 39% (10 SUD levels) correct classification was achieved. The two sessions could be discriminated in 89% (for ST) and 77% (for RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care.
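The feature bookkeeping in this abstract can be made concrete with a small sketch. Three of the five named speech features have simple frame-level definitions, shown below; the abstract does not state the exact definitions, so taking amplitude as the peak absolute sample, power as the mean square, and a zero-crossing as a strict sign change is an assumption, as is the toy frame. The 13 statistical parameters are not named in the abstract and are therefore only counted, not computed.

```python
def amplitude(frame):
    """Peak absolute sample value in the frame (assumed definition)."""
    return max(abs(x) for x in frame)

def power(frame):
    """Mean of the squared samples (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Number of strict sign changes between consecutive samples."""
    return sum(1 for x, y in zip(frame, frame[1:]) if x * y < 0)

# Hypothetical frame of audio samples.
frame = [0.5, -0.5, 0.25, -0.25, 0.5]
frame_features = {
    "amplitude": amplitude(frame),
    "power": power(frame),
    "zero_crossings": zero_crossings(frame),
}

# Dimensions stated in the abstract: 13 parameters per feature, 5 features.
n_speech_features = 5
n_stat_parameters = 13
n_candidates = n_speech_features * n_stat_parameters  # candidate parameters
n_selected = 28    # kept after the linear regression model
n_components = 11  # after principal component compression
```

So the selection step keeps 28 of 65 candidate parameters, and the principal component step reduces those 28 to 11 dimensions before classification.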

Crossref

Springer - Publisher Connector

Copenhagen University Research Information System

Radboud Repository

University of Twente Research Information

Music-aided affective interaction between human and service robot

Author: A Gabrielsson
A Makiko
C Bartneck
C Breazeal
CC Pratt
CL Sidner
D Ververidis
DE Rumelhart
E Cowie
EM Schmidt
Gil-Jin Jang
HA Rowley
HG Lee
J Han
J Ledoux
Jeong-Sik Park
JS Park
KR Scherer
L Franco
LC De Silva
LM Ignacio
M Paleari
M Richins
N Tosa
O Kwon
P Ahrendt
P Ekman
P Vanroose
R Cowie
R Huang
R Nakatsu
RC Arkin
S Giripunje
T Nwe
T Shibata
X Yang
X Zhu
YE Kim
YH Yang
Yong-Ho Seo
Publication venue: Springer Science and Business Media LLC
Publication date: 19/08/2014

This study proposes a music-aided framework for affective interaction of service robots with humans. The framework consists of three systems, for perception, memory, and expression respectively, modeled on the human brain mechanism. We propose a novel approach to identify human emotions in the perception system. Conventional approaches use speech and facial expressions as representative bimodal indicators for emotion recognition. In contrast, our approach uses the mood of music as a supplementary indicator, along with speech and facial expressions, to determine emotions more accurately. For multimodal emotion recognition, we propose an effective decision criterion using records of bimodal recognition results relevant to the musical mood. The memory and expression systems also utilize musical data to provide natural and affective reactions to human emotions. To evaluate our approach, we simulated the proposed human-robot interaction with a service robot, iRobiQ. Our perception system exhibited superior performance over the conventional approach, and most human participants noted favorable reactions toward the music-aided affective interaction.

Crossref

ScholarWorks@UNIST