Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression
We present techniques for improving performance driven facial animation,
emotion recognition, and facial key-point or landmark prediction using learned
identity-invariant representations. Established approaches to these problems
can work well if sufficient examples and labels for a particular identity are
available and factors of variation are highly controlled. However, labeled
examples of facial expressions, emotions and key-points for new individuals are
difficult and costly to obtain. In this paper we improve the ability of
techniques to generalize to new and unseen individuals by explicitly modeling
previously seen variations related to identity and expression. We use a
weakly-supervised approach in which identity labels are used to learn the
different factors of variation linked to identity separately from factors
related to expression. We show how probabilistic modeling of these sources of
variation allows one to learn identity-invariant representations for
expressions which can then be used to identity-normalize various procedures for
facial expression analysis and animation control. We also show how to extend
the widely used techniques of active appearance models and constrained local
models by replacing the underlying point distribution models, which are
typically constructed using principal component analysis, with
identity-expression factorized representations. We present a wide variety of
experiments in which we consistently improve performance on emotion
recognition, markerless performance-driven facial animation and facial
key-point tracking.
Comment: To appear in Image and Vision Computing Journal (IMAVIS).
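As a rough sketch of the identity-expression factorization idea, the snippet
below fits separate identity and expression shape subspaces from weakly
labeled landmark data. The construction (PCA over per-identity means, then PCA
over the residuals) and all names are illustrative assumptions, not the
paper's actual probabilistic model.

```python
import numpy as np

def fit_factorized_pdm(shapes, identities, n_id=4, n_expr=4):
    """Fit separate identity and expression shape subspaces.

    shapes:     (N, 2K) stacked 2D landmark coordinates
    identities: (N,) integer identity label per example (weak supervision)
    n_id must not exceed the number of distinct identities.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean

    # Identity subspace: PCA over per-identity mean shapes, isolating
    # variation that stays constant within an identity.
    ids = np.unique(identities)
    id_means = np.stack([centered[identities == i].mean(axis=0) for i in ids])
    A = np.linalg.svd(id_means, full_matrices=False)[2][:n_id].T   # (2K, n_id)

    # Expression subspace: PCA over residuals after projecting out each
    # example's identity component, giving an (approximately)
    # identity-invariant basis.
    residual = centered - (centered @ A) @ A.T
    B = np.linalg.svd(residual, full_matrices=False)[2][:n_expr].T # (2K, n_expr)
    return mean, A, B

def encode_expression(shape, mean, A, B):
    """Identity-normalize a shape: remove identity, project onto expressions."""
    centered = shape - mean
    residual = centered - A @ (A.T @ centered)
    return B.T @ residual
```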
MoSculp: Interactive Visualization of Shape and Time
We present a system that allows users to visualize complex human motion via
3D motion sculptures---a representation that conveys the 3D structure swept by
a human body as it moves through space. Given an input video, our system
computes the motion sculpture and provides a user interface for rendering it
in different styles, including the options to insert the sculpture back into
the original video, render it in a synthetic scene or physically print it.
To provide this end-to-end workflow, we introduce an algorithm that estimates
the human's 3D geometry over time from a set of 2D images and develop a
3D-aware image-based rendering approach that embeds the sculpture back into the
scene. By automating the process, our system takes motion sculpture creation
out of the realm of professional artists, and makes it applicable to a wide
range of existing video material.
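As a rough illustration of the sweeping step, the sketch below merges
per-frame body meshes into one "sculpture" mesh. It assumes per-frame 3D
meshes are already available (the paper estimates them from 2D images), and
the function name is ours, not the system's API.

```python
import numpy as np

def sweep_meshes(vertex_frames, faces):
    """Merge per-frame copies of a body mesh into one swept sculpture mesh.

    vertex_frames: list of (V, 3) vertex arrays, one per video frame
    faces:         (F, 3) triangle indices shared by all frames
    """
    all_vertices, all_faces = [], []
    for t, verts in enumerate(vertex_frames):
        offset = t * verts.shape[0]   # re-index faces for this frame's copy
        all_vertices.append(verts)
        all_faces.append(faces + offset)
    return np.vstack(all_vertices), np.vstack(all_faces)

# Toy usage: a triangle translating through space over 5 frames.
base = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
frames = [base + np.array([0.3 * t, 0, 0.1 * t]) for t in range(5)]
verts, tris = sweep_meshes(frames, np.array([[0, 1, 2]]))
print(verts.shape, tris.shape)        # (15, 3) (5, 1)
```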
By providing viewers with 3D information, motion sculptures reveal space-time
motion information that is difficult to perceive with the naked eye, and allow
viewers to interpret how different parts of the object interact over time. We
validate the effectiveness of this approach with user studies, finding that our
motion sculpture visualizations are significantly more informative about motion
than existing stroboscopic and space-time visualization methods.
Comment: UIST 2018. Project page: http://mosculp.csail.mit.edu
A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis
Audio-driven talking-head synthesis is a challenging task that has attracted
increasing attention in recent years. Although existing methods based on 2D
landmarks or 3D face models can synthesize accurate lip synchronization and
rhythmic head poses for an arbitrary identity, they still have limitations,
such as a visible seam (a "cut feeling") in the mouth mapping and a lack of
skin highlights, and the morphed region is blurry compared to the surrounding
face. We propose a Keypoint Based Enhancement (KPBE) method for audio-driven
free-view talking-head synthesis that improves the naturalness of the
generated video. First, an existing method is used as the backend to
synthesize intermediate results. Then we use keypoint decomposition to extract
video-synthesis control parameters from the backend output and the source
image. After that, the control parameters are combined with the source
keypoints and the driving keypoints, and a motion-field based method generates
the final image from the keypoint representation. With this keypoint
representation, we overcome the seam in the mouth mapping and the lack of skin
highlights. Experiments show that our proposed enhancement method improves the
quality of talking-head videos in terms of mean opinion score.
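The sketch below illustrates one way a keypoint representation can drive image
generation: a dense motion field is interpolated from sparse keypoint
displacements and used for backward warping. The Gaussian interpolation and
all names are our illustrative assumptions; the paper relies on a learned
motion-field generator rather than this closed-form rule.

```python
import numpy as np

def keypoint_motion_field(kp_src, kp_drv, height, width, sigma=10.0):
    """Interpolate a dense flow field from sparse keypoint correspondences.

    kp_src, kp_drv: (K, 2) arrays of (x, y) keypoints in source / driving frame
    Returns an (H, W, 2) flow mapping driving-frame pixels back to the source.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).astype(float)        # (H, W, 2)
    disp = kp_src - kp_drv                                  # per-keypoint flow

    # Gaussian weight of each pixel with respect to each driving keypoint.
    diff = grid[..., None, :] - kp_drv[None, None]          # (H, W, K, 2)
    w = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))     # (H, W, K)
    w = w / (w.sum(-1, keepdims=True) + 1e-8)
    return (w[..., None] * disp[None, None]).sum(axis=2)    # (H, W, 2)

def warp(image, flow):
    """Backward-warp a source image with nearest-neighbor sampling."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[sy, sx]

# Toy usage: keypoints jittered between frames on a 64x64 grid.
rng = np.random.default_rng(0)
kp_s = rng.uniform(0, 64, size=(10, 2))
kp_d = kp_s + rng.normal(scale=2.0, size=(10, 2))
print(keypoint_motion_field(kp_s, kp_d, 64, 64).shape)      # (64, 64, 2)
```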
Implicit Warping for Animation with Image Sets
We present a new implicit warping framework for animating sets of source
images by transferring the motion of a driving video. A single cross-modal
attention layer is used to find correspondences between the source
images and the driving image, choose the most appropriate features from
different source images, and warp the selected features. This is in contrast to
existing methods that use explicit flow-based warping, which is designed
for animation using a single source and does not extend well to multiple
sources. The pick-and-choose capability of our framework helps it achieve
state-of-the-art results on multiple datasets for image animation using both
single and multiple source images. The project website is available at
https://deepimagination.cc/implicit_warping/
Comment: To be published at NeurIPS 2022.
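As a rough sketch of the pick-and-choose mechanism, the snippet below applies
a single attention step in which driving-frame queries select features pooled
from all source images. Shapes, names, and the plain dot-product attention are
illustrative assumptions rather than the paper's trained cross-modal layer.

```python
import numpy as np

def implicit_warp(q_drive, k_sources, v_sources):
    """q_drive:   (Nq, D)  queries from the driving frame
       k_sources: list of (Ns, D) key maps, one per source image
       v_sources: list of (Ns, C) value (feature) maps, aligned with the keys
    Returns (Nq, C) warped features for the driving frame."""
    K = np.concatenate(k_sources, axis=0)         # pool keys from all sources
    V = np.concatenate(v_sources, axis=0)         # pool values likewise
    scores = q_drive @ K.T / np.sqrt(q_drive.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ V                               # pick-and-choose features

# Toy usage: two 64-pixel source feature maps, one driving query map.
rng = np.random.default_rng(0)
q = rng.normal(size=(64, 32))
ks = [rng.normal(size=(64, 32)) for _ in range(2)]
vs = [rng.normal(size=(64, 16)) for _ in range(2)]
print(implicit_warp(q, ks, vs).shape)             # (64, 16)
```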