7,320 research outputs found
Animation of generic 3D Head models driven by speech
International audienceIn this paper, a system for speech-driven animation of generic 3D head models is presented. The system is based on the inversion of a joint Audio-Visual Hidden Markov Model to estimate the visual information from speech data. Estimated visual speech features are used to animate a simple face model. The animation of a more complex head model is then obtained by automatically mapping the deformation of the simple model to it. The proposed algorithm allows the animation of 3D head models of arbitrary complexity through a simple setup procedure. The resulting animation is evaluated in terms of intelligibility of visual speech through subjective tests, showing a promising performance
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation) takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.Comment: To appear in CVPR 201
Recommended from our members
Speech Enabled Avatar from a Single Photograph
This paper presents a complete framework for creating speech-enabled 2D and 3D avatars from a single image of a person. Our approach uses a generic facial motion model which represents deformations of the prototype face during speech. We have developed an HMM-based facial animation algorithm which takes into account both lexical stress and coarticulation. This algorithm produces realistic animations of the prototype facial surface from either text or speech. The generic facial motion model is transformed to a novel face geometry using a set of corresponding points between the generic mesh and the novel face. In the case of a 2D avatar, a single photograph of the person is used as input. We manually select a small number of features on the photograph and these are used to deform the prototype surface. The deformed surface is then used to animate the photograph. In the case of a 3D avatar, we use a single stereo image of the person as input. The sparse geometry of the face is computed from this image and used to warp the prototype surface to obtain the complete 3D surface of the person's face. This surface is etched into a glass cube using sub-surface laser engraving (SSLE) technology. Synthesized facial animation videos are then projected onto the etched glass cube. Even though the etched surface is static, the projection of facial animation onto it results in a compelling experience for the viewer. We show several examples of 2D and 3D avatars that are driven by text and speech inputs
An efficient virtual patient image model: interview training in pharmacy
This paper presents the development of a virtual patient simulation by a 3D talking head and its use by pharmacy students as a training aid for patient consultation. The paper concentrates on the virtual patient modeling, its synthesis with a speech engine and facial expression interaction. The virtual patient model is developed in three stages: building a personalized 3D face model; animation of the face model; and speech driven face synthesis. The model is used in conjunction with a training artificial intelligence module that creates several scenarios in which the student oral interview ability is assessed. The final evaluation phase is a randomized controlled trial at three partner universities: The University of Newcastle, Monash University and Charles Stuart University. It shows the potential to revolutionize the way pharmacy students’ training is conducted
LaughTalk: Expressive 3D Talking Head Generation with Laughter
Laughter is a unique expression, essential to affirmative social interactions
of humans. Although current 3D talking head generation methods produce
convincing verbal articulations, they often fail to capture the vitality and
subtleties of laughter and smiles despite their importance in social context.
In this paper, we introduce a novel task to generate 3D talking heads capable
of both articulate speech and authentic laughter. Our newly curated dataset
comprises 2D laughing videos paired with pseudo-annotated and human-validated
3D FLAME parameters and vertices. Given our proposed dataset, we present a
strong baseline with a two-stage training scheme: the model first learns to
talk and then acquires the ability to express laughter. Extensive experiments
demonstrate that our method performs favorably compared to existing approaches
in both talking head generation and expressing laughter signals. We further
explore potential applications on top of our proposed method for rigging
realistic avatars.Comment: Accepted to WACV202
EgoFace: Egocentric Face Performance Capture and Videorealistic Reenactment
Face performance capture and reenactment techniques use multiple cameras and sensors, positioned at a distance from the face or mounted on heavy wearable devices. This limits their applications in mobile and outdoor environments. We present EgoFace, a radically new lightweight setup for face performance capture and front-view videorealistic reenactment using a single egocentric RGB camera. Our lightweight setup allows operations in uncontrolled environments, and lends itself to telepresence applications such as video-conferencing from dynamic environments. The input image is projected into a low dimensional latent space of the facial expression parameters. Through careful adversarial training of the parameter-space synthetic rendering, a videorealistic animation is produced. Our problem is challenging as the human visual system is sensitive to the smallest face irregularities that could occur in the final results. This sensitivity is even stronger for video results. Our solution is trained in a pre-processing stage, through a supervised manner without manual annotations. EgoFace captures a wide variety of facial expressions, including mouth movements and asymmetrical expressions. It works under varying illuminations, background, movements, handles people from different ethnicities and can operate in real time
- …