Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation), takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.
Comment: To appear in CVPR 2019
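The conditioning idea described above can be sketched in a few lines: the model concatenates audio features with a one-hot subject label and regresses per-vertex offsets from a neutral template, so the same audio produces different motion under different style labels. This is a minimal NumPy sketch; the layer sizes, feature dimensions, and the `animate` helper are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS = 12      # toy mesh size (a real face template has thousands of vertices)
AUDIO_DIM = 16    # toy per-frame audio feature dimension
N_SUBJECTS = 12   # matches the 12 training speakers in the dataset

# Toy weights of a one-layer decoder; the real network is much deeper.
W = rng.normal(scale=0.01, size=(AUDIO_DIM + N_SUBJECTS, N_VERTS * 3))

def animate(audio_feat, subject_id):
    """Predict per-vertex offsets from the neutral template.

    Conditioning on a one-hot subject label is what lets the model
    learn distinct per-speaker speaking styles."""
    one_hot = np.zeros(N_SUBJECTS)
    one_hot[subject_id] = 1.0
    x = np.concatenate([audio_feat, one_hot])
    return (x @ W).reshape(N_VERTS, 3)

audio_feat = rng.normal(size=AUDIO_DIM)
offsets_a = animate(audio_feat, subject_id=0)
offsets_b = animate(audio_feat, subject_id=5)
# Same audio, different style label -> different predicted facial motion.
```

At animation time, altering the style label (or interpolating between labels) is one way such a model exposes the animator control over speaking style mentioned above.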
Neural Sign Reenactor: Deep Photorealistic Sign Language Retargeting
In this paper, we introduce a neural rendering pipeline for transferring the
facial expressions, head pose, and body movements of one person in a source
video to another in a target video. We apply our method to the challenging case
of Sign Language videos: given a source video of a sign language user, we can
faithfully transfer the performed manual (e.g., handshape, palm orientation,
movement, location) and non-manual (e.g., eye gaze, facial expressions, mouth
patterns, head, and body movements) signs to a target video in a
photo-realistic manner. Our method can be used for Sign Language Anonymization,
Sign Language Production (synthesis module), as well as for reenacting other
types of full body activities (dancing, acting performance, exercising, etc.).
We conduct detailed qualitative and quantitative evaluations and comparisons,
which demonstrate the realism of our results and the advantages of our method
over existing approaches.
Comment: Accepted at the AI4CC Workshop at CVPR 2023
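A small but essential step in any motion-transfer pipeline of this kind is retargeting driving keypoints into the target person's coordinate frame before rendering, so that body proportions are respected. The sketch below is illustrative only: the `retarget` helper and the per-axis bounding-box scaling are assumptions for exposition, not the paper's method.

```python
import numpy as np

def retarget(src_neutral, src_frame, tgt_neutral):
    """Transfer motion from source to target keypoints.

    Express the source frame as a displacement from the source's neutral
    pose, rescale it to the target's proportions, and apply it to the
    target's neutral pose. All arrays are (num_keypoints, 2)."""
    # Per-axis scale between the two skeleton/face layouts.
    src_extent = src_neutral.max(axis=0) - src_neutral.min(axis=0)
    tgt_extent = tgt_neutral.max(axis=0) - tgt_neutral.min(axis=0)
    scale = tgt_extent / src_extent
    displacement = (src_frame - src_neutral) * scale
    return tgt_neutral + displacement

# Toy usage: a 2-keypoint "skeleton" twice as large in the target.
src_neutral = np.array([[0.0, 0.0], [2.0, 2.0]])
tgt_neutral = np.array([[0.0, 0.0], [4.0, 4.0]])
src_frame = src_neutral + np.array([[1.0, 0.0], [1.0, 0.0]])
tgt_frame = retarget(src_neutral, src_frame, tgt_neutral)
# Source displacement of 1 unit becomes 2 units at the target's scale.
```

In a full pipeline, retargeted keypoints (or dense conditioning maps derived from them) would then drive a neural renderer that produces the photorealistic target frames.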