HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation), takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.
Comment: To appear in CVPR 201
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
We present VideoReTalking, a new system to edit the faces of a real-world
talking head video according to input audio, producing a high-quality and
lip-syncing output video even with a different emotion. Our system disentangles
this objective into three sequential tasks: (1) face video generation with a
canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for
improving photo-realism. Given a talking-head video, we first modify the
expression of each frame according to the same expression template using the
expression editing network, resulting in a video with the canonical expression.
This video, together with the given audio, is then fed into the lip-sync
network to generate a lip-syncing video. Finally, we improve the photo-realism
of the synthesized faces through an identity-aware face enhancement network and
post-processing. We use learning-based approaches for all three steps and all
our modules can be tackled in a sequential pipeline without any user
intervention. Furthermore, our system is a generic approach that does not need
to be retrained to a specific person. Evaluations on two widely-used datasets
and in-the-wild examples demonstrate the superiority of our framework over
other state-of-the-art methods in terms of lip-sync accuracy and visual
quality.
Comment: Accepted by SIGGRAPH Asia 2022 Conference Proceedings. Project page:
https://vinthony.github.io/video-retalking
Head motion synthesis: evaluation and a template motion approach
The use of conversational agents has increased across the world. From providing automated
support for companies to being virtual psychologists, they have moved from
an academic curiosity to an application with real world relevance. While many researchers
have focused on the content of the dialogue and synthetic speech to give the
agents a voice, more recently animating these characters has become a topic of interest.
An additional use for character animation technology is in the film and video game industry
where having characters animated without needing to pay for expensive labour
would save tremendous costs.
When animating characters there are many aspects to consider, for example the way
they walk. However, to truly assist with communication, automated animation needs to
duplicate the body language used when speaking. In particular, conversational agents
are often only an animation of the upper parts of the body, so head motion is one of
the keys to a believable agent. While certain linguistic features are obvious, such as
nodding to indicate agreement, research has shown that head motion also aids understanding
of speech. Additionally, head motion often contains emotional cues, prosodic
information, and other paralinguistic information.
In this thesis we will present our research into synthesising head motion using only
recorded speech as input. During this research we collected a large dataset of head
motion synchronised with speech, examined evaluation methodology, and developed a
synthesis system.
Our dataset is one of the larger ones available. From it we present some statistics
about head motion in general, including differences between read speech and storytelling
speech, and differences between speakers. From this we are able to draw some
conclusions as to what type of source data will be the most interesting in head motion
research, and if speaker-dependent models are needed for synthesis.
In our examination of head motion evaluation methodology we introduce Forced Canonical
Correlation Analysis (FCCA). FCCA shows the difference between head motion
shaped noise and motion capture better than standard methods for objective evaluation
used in the literature. We have shown that for subjective testing it is best practice to
use a variation of MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA)
based testing, adapted for head motion. Through experimentation we have developed
guidelines for the implementation of the test, and the constraints on the length.
Finally we present a new system for head motion synthesis. We make use of simple
templates of motion, automatically extracted from source data, that are warped to
suit the speech features. Our system uses clustering to pick the small motion units,
and a combined HMM and GMM based approach for determining the values of warping
parameters at synthesis time. This results in highly natural looking motion that
outperforms other state of the art systems. Our system requires minimal human intervention
and produces believable motion. The key innovations were the new methods
for segmenting head motion and creating a process similar to language modelling for
synthesising head motion.
3D Human Face Reconstruction and 2D Appearance Synthesis
3D human face reconstruction has been extensively researched for decades due to its wide applications, such as animation, recognition and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains valuable because images are much easier to access and store.
In this dissertation, we first propose three image-based face reconstruction approaches based on different assumptions about the input.
In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses. The camera should be calibrated under this assumption.
As the first approach is limited to videos, the second approach focuses on a single image. This approach also improves the geometry by adding fine-grained detail using shading cues. We propose a novel albedo estimation and linear optimization algorithm in this approach.
In the third approach, we further loosen the constraint on the input to allow arbitrary in-the-wild images. Our proposed approach can robustly reconstruct a high-quality model even with extreme expressions and large poses.
We then explore the applicability of our face reconstructions to four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization, and video face replacement. We demonstrate the great potential of our reconstruction approaches in these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a big obstacle for people to communicate in a face-to-face manner. As a further application, we explore hardware/software solutions for synthesizing the face image in the presence of HMDs. We design two setups (experimental and mobile) which integrate two near-IR cameras and one color camera to solve this problem. With our algorithm and prototype, we can achieve photo-realistic results.
We further propose a deep neural network to solve the HMD removal problem by treating it as a face inpainting problem. This approach doesn't need special hardware and runs in real time with satisfying results.
Final Report to NSF of the Standards for Facial Animation Workshop
The human face is an important and complex communication channel. It is a very familiar and sensitive object of human perception. The facial animation field has grown greatly in the past few years as fast computer graphics workstations have made the modeling and real-time animation of hundreds of thousands of polygons affordable and almost commonplace. Many applications have been developed such as teleconferencing, surgery, information assistance systems, games, and entertainment. To solve these different problems, different approaches for both animation control and modeling have been developed.