3,164 research outputs found

LaughTalk: Expressive 3D Talking Head Generation with Laughter

Author: Hong Da Hye
Hyun Lee
Ju Janghoon
Nam Suekyeong
Oh Tae-Hyun
Sung-Bin Kim
Publication date: 02/11/2023

Laughter is a unique expression, essential to affirmative social interactions of humans. Although current 3D talking head generation methods produce convincing verbal articulations, they often fail to capture the vitality and subtleties of laughter and smiles despite their importance in social context. In this paper, we introduce a novel task to generate 3D talking heads capable of both articulate speech and authentic laughter. Our newly curated dataset comprises 2D laughing videos paired with pseudo-annotated and human-validated 3D FLAME parameters and vertices. Given our proposed dataset, we present a strong baseline with a two-stage training scheme: the model first learns to talk and then acquires the ability to express laughter. Extensive experiments demonstrate that our method performs favorably compared to existing approaches in both talking head generation and expressing laughter signals. We further explore potential applications on top of our proposed method for rigging realistic avatars. Comment: Accepted to WACV202

arXiv.org e-Print Archive

HeadOn: Real-time Reenactment of Human Portrait Videos

Author: Nießner Matthias
Stamminger Marc
Theobalt Christian
Thies Justus
Zollhöfer Michael
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos. Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'1

arXiv.org e-Print Archive

MPG.PuRe

Highly automated method for facial expression synthesis

Author: Ersotelos Nikolaos
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2010

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The synthesis of realistic facial expressions has been an unexplored area for computer graphics scientists. Over the last three decades, several different construction methods have been formulated in order to obtain natural graphic results. Despite these advancements, current techniques still require costly resources, heavy user intervention and specific training, and outcomes are still not completely realistic. This thesis, therefore, aims to achieve an automated synthesis that will produce realistic facial expressions at a low cost. This thesis proposes a highly automated approach for achieving a realistic facial expression synthesis, which allows for enhanced performance in speed (3 minutes processing time maximum) and quality with a minimum of user intervention. It will also demonstrate a highly technical and automated method of facial feature detection, by allowing users to obtain their desired facial expression synthesis with minimal physical input. Moreover, it will describe a novel approach to the normalization of the illumination settings values between source and target images, thereby allowing the algorithm to work accurately, even in different lighting conditions. Finally, we will present the results obtained from the proposed techniques, together with our conclusions, at the end of the paper.

Brunel University Research Archive

Audio-driven Robot Upper-body Motion Synthesis

Author: Bremner Paul
Celiktutan Oya
Gunes Hatice
Ondras Jan
Publication date: 10/02/2020

Body language is an important aspect of human communication, which an effective human-robot interaction interface should mimic well. The currently available robotic platforms are limited in their ability to automatically generate behaviours that align with their speech. In this paper, we developed a neural network based system that takes audio from a user as an input and generates upper-body gestures including head, hand and hip movements of the user on a humanoid robot, namely, Softbank Robotics’ Pepper. The developed system was evaluated quantitatively as well as qualitatively using web-surveys when driven by natural speech and synthetic speech. We particularly compared the impact of generic and person-specific neural network models on the quality of synthesised movements. We further investigated the relationships between quantitative and qualitative evaluations and examined how the speaker’s personality traits affect the synthesised movements.

UWE Bristol Research Repository

King's Research Portal

Apollo (Cambridge)

Generation of realistic human behaviour

Author: Vougioukas Konstantinos
Publication venue: Computing, Imperial College London
Publication date: 01/08/2022

As the use of computers and robots in our everyday lives increases so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person, whose lips move in accordance with the driving audio. Particular focus is also placed on a) generating spontaneous expression such as blinks, b) achieving audio-visual synchrony and c) transferring or producing natural head motion. The second problem addressed in this thesis is that of video-driven speech reconstruction, which aims at converting a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real-time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system that can be used for any translation problem, including those mapping across different modalities.
The framework is made up of two networks performing translations in opposite directions and can be successfully applied to solve diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction. Open Access

Spiral - Imperial College Digital Repository

Intelligent facial animation: Creating emphatic characters with stimuli based animation

Author: José Mário Figueiredo Serra
Publication date: 11/12/2017

Repositório Aberto da Universidade do Porto