Cultural dialects of real and synthetic emotional facial expressions
In this article we discuss aspects of designing facial expressions for virtual humans (VHs) with a specific culture. First we explore the notion of culture and its relevance for applications with a VH. Then we give a general scheme for designing emotional facial expressions and identify the stages where a human is involved, either as a real person in some specific role or as a VH displaying facial expressions. We discuss how the display and the emotional meaning of facial expressions may be measured in objective ways, and how the culture of the displayers and the judges may influence the process of analyzing human facial expressions and evaluating synthesized ones. We review psychological experiments on cross-cultural perception of emotional facial expressions. By identifying the culturally critical issues of data collection and interpretation with both real humans and VHs, we aim to provide a methodological reference and inspiration for further research.
Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
In this paper, a psychologically inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions become distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is carried out first with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011
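The cascade idea above can be sketched in a few lines: a first binary decision separates broad emotion groups, and a second stage distinguishes the commonly confused pair within the winning group. This is a minimal illustration only; the two-dimensional features, the specific emotions, and the arousal grouping below are invented for the sketch and are not the paper's actual feature set or cascade structure.

```python
import math

# Hypothetical 2-D features: (mean pitch in Hz, normalized energy).
# The arousal grouping is an assumed, illustrative split.
AROUSAL = {"anger": "high", "joy": "high", "sadness": "low", "neutral": "low"}
TRAIN = [
    ((260.0, 0.9), "anger"),
    ((250.0, 0.8), "joy"),
    ((150.0, 0.2), "sadness"),
    ((180.0, 0.4), "neutral"),
]

def one_nn(sample, labelled):
    """Return the label of the nearest training point (1-NN)."""
    return min(labelled, key=lambda fl: math.dist(sample, fl[0]))[1]

def cascade_classify(sample):
    # Stage 1: binary arousal decision over all training points.
    group = one_nn(sample, [(f, AROUSAL[e]) for f, e in TRAIN])
    # Stage 2: separate the confusable pair inside that group only.
    within = [(f, e) for f, e in TRAIN if AROUSAL[e] == group]
    return one_nn(sample, within)

print(cascade_classify((155.0, 0.25)))  # low-arousal sample -> "sadness"
```

The benefit of the cascade is that each binary stage only has to solve an easier sub-problem, which is why confusable pairs such as anger/joy become separable.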
Evaluation of a transplantation algorithm for expressive speech synthesis
When designing human-machine interfaces it is important to consider not only the bare-bones functionality but also the ease of use and accessibility they provide. For voice-based interfaces, it has been shown that imbuing synthetic voices with expressiveness significantly increases their perceived naturalness, which in turn is very helpful when building user-friendly interfaces. This paper proposes an adaptation-based expressiveness transplantation system capable of copying the emotions of a source speaker into any desired target speaker with just a few minutes of read speech and without requiring the recording of additional expressive data. The system was evaluated through a perceptual test for 3 speakers, showing up to an average of 52% emotion recognition rates relative to the natural voice recognition rates, while at the same time keeping good scores in similarity and naturalness.
Expression of basic emotions in Estonian parametric text-to-speech synthesis
The goal of this study was to conduct modelling experiments, the purpose of which was the expression of three basic emotions (joy, sadness and anger) in Estonian parametric text-to-speech synthesis on the basis of both a male and a female voice. For each emotion, three different test models were constructed and presented for evaluation to subjects in perception tests. The test models were based on the basic emotions' characteristic parameter values, which had been determined on the basis of human speech. In synthetic speech, the test subjects most accurately recognized the emotion of sadness, and least accurately the emotion of joy. The results of the test showed that, in the case of the synthesized male voice, the model with enhanced parameter values performed best for all three emotions, whereas in the case of the synthetic female voice, different emotions called for different models: the model with decreased values was the most suitable one for the expression of joy, and the model with enhanced values was the most suitable for the expression of sadness and anger. Logistic regression was applied to the results of the perception tests in order to determine the significance and contribution of each acoustic parameter in the emotion models, and the possible need to adjust the values of the parameters.
Keywords: Estonian language, emotions, speech synthesis, acoustic model, speech rate, intensity, fundamental frequency
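The final analysis step described above, fitting a logistic regression to perception-test outcomes to gauge an acoustic parameter's contribution, can be sketched with plain gradient descent. The data here are invented for illustration (a hypothetical speech-rate scale factor and binary "emotion recognized" judgments), not the study's Estonian perception-test results.

```python
import math

# Invented toy data: x = speech-rate scaling applied to the synthesis,
# y = 1 if listeners recognized the intended emotion, else 0.
X = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3]
Y = [0, 0, 1, 1, 1, 1]

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

w, b = fit_logistic(X, Y)
# A positive w means the parameter raised recognition odds in this toy data;
# the sign and magnitude of w are what such an analysis reads off.
```

In the study, one such coefficient per acoustic parameter (speech rate, intensity, fundamental frequency, ...) indicates its significance in each emotion model.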
A virtual diary companion
Chatbots and embodied conversational agents show turn-based conversation behaviour. In current research we almost always assume that each utterance of a human conversational partner should be followed by an intelligent and/or empathetic reaction of the chatbot or embodied agent. They are assumed to be alert, trying to please the user. There are other applications that have not yet received much attention and that require a more patient or relaxed attitude, waiting for the right moment to provide feedback to the human partner. Being able and willing to listen is one of the conditions for being successful. In this paper we present some observations on listening-behaviour research and introduce one of our applications, the virtual diary companion.
Speaker and Expression Factorization for Audiobook Data: Expressiveness and Transplantation
Expressive synthesis from text is a challenging problem, for two reasons. First, read text is often highly expressive, to convey the emotion and scenario in the text. Second, since expressive training speech is not always available for different speakers, it is necessary to develop methods to share expressive information across speakers. This paper investigates the approach of using very expressive, highly diverse audiobook data from multiple speakers to build an expressive speech synthesis system. Both problems are addressed by considering a factorized framework where speaker and emotion are modelled in separate sub-spaces of a cluster adaptive training (CAT) parametric speech synthesis system. The sub-spaces for the expressive state of a speaker and the characteristics of the speaker are jointly trained using a set of audiobooks. In this work, the expressive speech synthesis system works in two distinct modes. In the first mode, the expressive information is given by audio data, and an adaptation method is used to extract the expressive information from the audio data. In the second mode, the input to the synthesis system is plain text, and a full expressive synthesis system is examined where the expressive state is predicted from the text. In both modes, the expressive information is shared and transplanted across different speakers. Experimental results show that in both modes, the expressive speech synthesis method proposed in this work significantly improves the expressiveness of the synthetic speech for different speakers. Finally, this paper also examines whether it is possible to predict the expressive states from text for multiple speakers using a single model, or whether the prediction process needs to be speaker specific.
This is the accepted manuscript. The final version is available from IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6995936&filter%3DAND%28p_IS_Number%3A7055953%29
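The factorization idea behind the CAT framework, and the "transplantation" it enables, can be illustrated with a toy linear combination: a model mean is a base component plus weighted contributions from separate speaker and expression cluster bases, so keeping the expression weights while swapping the speaker weights moves an emotion to a new voice. The three-dimensional vectors and weights below are invented for the sketch; a real CAT system operates on HMM state distributions, not tiny lists.

```python
# Toy bases: the speaker sub-space perturbs the first two dimensions,
# the expression sub-space perturbs the others. All values invented.
BASE = [1.0, 2.0, 3.0]
SPEAKER_CLUSTERS = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
EMOTION_CLUSTERS = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

def cat_mean(speaker_w, emotion_w):
    """Mean = base + weighted speaker bases + weighted emotion bases."""
    out = list(BASE)
    for w, basis in zip(speaker_w, SPEAKER_CLUSTERS):
        out = [o + w * b for o, b in zip(out, basis)]
    for w, basis in zip(emotion_w, EMOTION_CLUSTERS):
        out = [o + w * b for o, b in zip(out, basis)]
    return out

# Transplantation: keep the emotion weights, change only the speaker weights,
# so the same expressive state is rendered in a different voice.
sad_speaker_a = cat_mean([1.0, 0.0], emotion_w=[1.0, 0.0])
sad_speaker_b = cat_mean([0.0, 1.0], emotion_w=[1.0, 0.0])
```

Because the two sub-spaces are jointly trained but separately weighted, the two vectors above differ only in the speaker-controlled dimensions while the emotion contribution is shared, which is the property the paper exploits to transplant expressiveness across speakers.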