
    DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

    Diffusion models have shown remarkable success in a variety of downstream generative tasks, yet remain under-explored for the important and challenging task of expressive talking head generation. In this work, we propose the DreamTalk framework to fill this gap, which employs meticulous design to unlock the potential of diffusion models in generating expressive talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network consistently synthesizes high-quality audio-driven face motions across diverse expressions. To enhance the expressiveness and accuracy of lip motions, we introduce a style-aware lip expert that guides lip-sync while remaining mindful of speaking styles. To eliminate the need for an expression reference video or text, an additional diffusion-based style predictor is used to predict the target expression directly from the audio. In this way, DreamTalk can harness powerful diffusion models to generate expressive faces effectively and reduce the reliance on expensive style references. Experimental results demonstrate that DreamTalk is capable of generating photo-realistic talking faces with diverse speaking styles and achieving accurate lip motions, surpassing existing state-of-the-art counterparts. Comment: Project Page: https://dreamtalk-project.github.i
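
    As a rough illustration of the three-component pipeline sketched in this abstract, the Python snippet below shows how a diffusion-based denoising network could be driven by a style code predicted from audio alone; the module names, feature dimensions and the simplified sampling loop are assumptions made for illustration, not DreamTalk's actual code, and the style-aware lip expert (a training-time guide) is omitted.

        # Hypothetical sketch of the described inference flow: a style predictor maps
        # audio to a style code, and a diffusion denoising network iteratively refines
        # random noise into an audio-driven face-motion sequence. All names, shapes and
        # the crude denoising update are illustrative assumptions, not the paper's code.
        import torch
        import torch.nn as nn

        class StylePredictor(nn.Module):
            """Predicts a speaking-style code directly from the audio features."""
            def __init__(self, audio_dim=80, style_dim=128):
                super().__init__()
                self.gru = nn.GRU(audio_dim, style_dim, batch_first=True)

            def forward(self, audio):                      # audio: (B, T, audio_dim)
                _, h = self.gru(audio)
                return h[-1]                               # style code: (B, style_dim)

        class DenoisingNetwork(nn.Module):
            """Predicts the noise on a face-motion sequence, conditioned on audio and style."""
            def __init__(self, motion_dim=64, audio_dim=80, style_dim=128):
                super().__init__()
                self.proj = nn.Linear(motion_dim + audio_dim + style_dim + 1, motion_dim)

            def forward(self, noisy_motion, audio, style, t):
                t_feat = t.view(-1, 1, 1).expand(-1, noisy_motion.size(1), 1)
                s_feat = style.unsqueeze(1).expand(-1, noisy_motion.size(1), -1)
                return self.proj(torch.cat([noisy_motion, audio, s_feat, t_feat], dim=-1))

        @torch.no_grad()
        def sample_motion(audio, predictor, denoiser, steps=50, motion_dim=64):
            """Very simplified diffusion-style sampling of audio-driven face motion."""
            style = predictor(audio)
            x = torch.randn(audio.size(0), audio.size(1), motion_dim)
            for step in reversed(range(steps)):
                t = torch.full((audio.size(0),), step / steps)
                x = x - denoiser(x, audio, style, t) / steps   # crude denoising step
            return x

        audio = torch.randn(1, 100, 80)                    # toy mel-spectrogram-like input
        print(sample_motion(audio, StylePredictor(), DenoisingNetwork()).shape)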

    Neural Cognition and Affective Computing on Cyber Language

    Characterized by its customary symbol system and simple, vivid expression patterns, cyber language acts not only as a tool for convenient communication but also as a carrier of abundant emotions, and it attracts high attention in public opinion analysis, internet marketing, service feedback monitoring, and social emergency management. Based on our multidisciplinary research, this paper presents a classification of the emotional symbols in cyber language, analyzes the cognitive characteristics of different symbols, and puts forward a mechanism model to show the dominant neural activities in that process. Through a comparative study of Chinese, English, and Spanish, which are among the most widely spoken languages in the world, this paper discusses the expressive patterns of emotions in international cyber languages and proposes an intelligent method for affective computing on cyber language in a unified PAD (Pleasure-Arousal-Dominance) emotional space.
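
    As a toy illustration of mapping emotional symbols in cyber language onto a unified PAD space, the short Python sketch below averages hand-picked PAD coordinates over the symbols found in a message; the lexicon and its coordinate values are invented for illustration and are not the paper's data or method.

        # Hypothetical sketch: score a message in Pleasure-Arousal-Dominance space by
        # averaging the PAD coordinates of the emotional symbols it contains. The symbol
        # lexicon and all coordinate values below are illustrative assumptions.
        PAD_LEXICON = {
            ":)":  ( 0.6,  0.3,  0.2),    # (pleasure, arousal, dominance), each in [-1, 1]
            ":(":  (-0.6,  0.2, -0.3),
            ":D":  ( 0.8,  0.6,  0.3),
            "T_T": (-0.7,  0.4, -0.4),
            "!!!": ( 0.0,  0.7,  0.1),
        }

        def pad_score(message):
            """Average the PAD coordinates of every known emotional symbol in the message."""
            hits = [pad for symbol, pad in PAD_LEXICON.items() if symbol in message]
            if not hits:
                return (0.0, 0.0, 0.0)    # neutral point of the PAD space
            return tuple(sum(dim) / len(hits) for dim in zip(*hits))

        print(pad_score("great news :) :D !!!"))   # roughly (0.47, 0.53, 0.2)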

    Continuous Analysis of Affect from Voice and Face

    Human affective behavior is multimodal, continuous and complex. Despite major advances within the affective computing research field, modeling, analyzing, interpreting and responding to human affective behavior remains a challenge for automated systems, as affect and emotions are complex constructs with fuzzy boundaries and substantial individual differences in expression and experience [7]. Therefore, affective and behavioral computing researchers have recently invested increased effort in exploring how to best model, analyze and interpret the subtlety, complexity and continuity (represented along a continuum, e.g., from −1 to +1) of affective behavior in terms of latent dimensions (e.g., arousal, power and valence) and appraisals, rather than in terms of a small number of discrete emotion categories (e.g., happiness and sadness). This chapter aims to (i) give a brief overview of the existing efforts and the major accomplishments in modeling and analysis of emotional expressions in dimensional and continuous space while focusing on open issues and new challenges in the field, and (ii) introduce a representative approach for multimodal continuous analysis of affect from voice and face, and provide experimental results using the audiovisual Sensitive Artificial Listener (SAL) Database of natural interactions. The chapter concludes by posing a number of questions that highlight the significant issues in the field, and by extracting potential answers to these questions from the relevant literature.

    The chapter is organized as follows. Section 10.2 describes theories of emotion, and Sect. 10.3 provides details on the affect dimensions employed in the literature as well as how emotions are perceived from visual, audio and physiological modalities. Section 10.4 summarizes how current technology has been developed, in terms of data acquisition and annotation, and automatic analysis of affect in continuous space, bringing forth a number of issues that need to be taken into account when applying a dimensional approach to emotion recognition, namely, determining the duration of emotions for automatic analysis, modeling the intensity of emotions, determining the baseline, dealing with high inter-subject expression variation, defining optimal strategies for fusion of multiple cues and modalities, and identifying appropriate machine learning techniques and evaluation measures. Section 10.5 presents our representative system that fuses vocal and facial expression cues for dimensional and continuous prediction of emotions in valence and arousal space by employing bidirectional Long Short-Term Memory neural networks (BLSTM-NNs), and introduces an output-associative fusion framework that incorporates correlations between the emotion dimensions to further improve continuous affect prediction. Section 10.6 concludes the chapter.
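
    To make the output-associative fusion idea more concrete, the Python sketch below (using PyTorch; layer sizes and toy inputs are illustrative assumptions, not the chapter's exact architecture) first lets a bidirectional LSTM per modality predict frame-level valence and arousal, and then lets a second regressor re-predict each dimension from both modalities' initial predictions, so that correlations between the emotion dimensions can be exploited.

        # Hypothetical sketch of output-associative fusion for continuous affect prediction.
        # Per-modality BLSTMs predict (valence, arousal); a fusion layer then maps the
        # concatenated predictions of both modalities to the final (valence, arousal).
        import torch
        import torch.nn as nn

        class DimensionalRegressor(nn.Module):
            """BLSTM mapping one modality's frame features to (valence, arousal) per frame."""
            def __init__(self, feat_dim):
                super().__init__()
                self.blstm = nn.LSTM(feat_dim, 32, batch_first=True, bidirectional=True)
                self.head = nn.Linear(64, 2)               # 2 outputs: valence, arousal

            def forward(self, x):                          # x: (B, T, feat_dim)
                h, _ = self.blstm(x)
                return self.head(h)                        # (B, T, 2)

        face_model  = DimensionalRegressor(136)            # toy facial-point feature size
        voice_model = DimensionalRegressor(40)             # toy acoustic feature size
        fusion = nn.Linear(4, 2)                           # output-associative fusion layer

        face_feats  = torch.randn(1, 50, 136)
        voice_feats = torch.randn(1, 50, 40)
        with torch.no_grad():
            both = torch.cat([face_model(face_feats), voice_model(voice_feats)], dim=-1)
            valence_arousal = fusion(both)                 # (1, 50, 2), continuous in time
        print(valence_arousal.shape)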

    eMuu: an embodied emotional character for the ambient intelligent home


    Spanish Expressive Voices: corpus for emotion research in Spanish

    A new emotional multimedia database has been recorded and aligned. The database comprises speech and video recordings of one actor and one actress simulating a neutral state and the Big Six emotions: happiness, sadness, anger, surprise, fear and disgust. Due to its careful design and size (more than 100 minutes per emotion), the recorded database allows comprehensive studies on emotional speech synthesis, prosodic modelling, speech conversion, far-field speech recognition, and speech- and video-based emotion identification. The database has been automatically labelled for prosodic purposes (5% was manually revised). The whole database has been validated through objective and perceptual tests, achieving a validation score as high as 89%.

    Exploiting the robot kinematic redundancy for emotion conveyance to humans as a lower priority task

    Current approaches do not allow robots to execute a task and simultaneously convey emotions to users through their body motions. This paper explores the capabilities of the Jacobian null space of a humanoid robot to convey emotions. A task-priority formulation has been implemented on a Pepper robot which allows the specification of a primary task (waving gesture, transportation of an object, etc.) and exploits the kinematic redundancy of the robot to convey emotions to humans as a lower-priority task. The emotions, defined by Mehrabian as points in the pleasure–arousal–dominance space, generate intermediate motion features (jerkiness, activity and gaze) that carry the emotional information. A map from these features to the joints of the robot is presented. A user study has been conducted in which emotional motions were shown to 30 participants. The results show that happiness and sadness are conveyed very well, calm is conveyed moderately well, and fear is not conveyed well. An analysis of the dependencies between the motion features and the emotions perceived by the participants shows that activity correlates positively with arousal, jerkiness is not perceived by the user, and gaze conveys dominance when activity is low. The results indicate a strong influence of the most energetic motions of the emotional task and point out new directions for further research. Overall, the results show that the null-space approach can be regarded as a promising means of conveying emotions as a lower-priority task.
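
    The task-priority idea in this abstract amounts to computing the joint velocity as the pseudoinverse solution for the primary task plus an emotion-driven joint velocity projected through the Jacobian null-space projector, so the emotional motion cannot disturb the primary task. The short numpy sketch below illustrates this; the 7-DoF Jacobian and the "emotional" joint velocity are random, illustrative assumptions rather than Pepper's actual kinematics.

        # Hypothetical sketch: track a primary task velocity with the Jacobian
        # pseudoinverse and project an emotion-driven joint motion into the null space,
        # so the emotional motion is executed only as a lower-priority task.
        import numpy as np

        def task_priority_velocity(J, x_dot_task, q_dot_emotion):
            """q_dot = J^+ x_dot_task + (I - J^+ J) q_dot_emotion."""
            J_pinv = np.linalg.pinv(J)
            null_projector = np.eye(J.shape[1]) - J_pinv @ J
            return J_pinv @ x_dot_task + null_projector @ q_dot_emotion

        rng = np.random.default_rng(0)
        J = rng.standard_normal((6, 7))                    # toy 6D task Jacobian, 7-DoF robot
        x_dot_task = np.array([0.1, 0.0, 0.05, 0.0, 0.0, 0.0])
        q_dot_emotion = 0.5 * rng.standard_normal(7)       # e.g. jerky, high-activity motion

        q_dot = task_priority_velocity(J, x_dot_task, q_dot_emotion)
        # The null-space term cannot change the primary task velocity:
        print(np.allclose(J @ q_dot, x_dot_task))          # True for this full-row-rank J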