Toward Fine-grained Facial Expression Manipulation
Facial expression manipulation aims to edit a facial expression according to a given
condition. Previous methods edit an input image under the guidance of a discrete
emotion label or an absolute condition (e.g., facial action units) so that it
exhibits the desired expression. However, these methods either alter
condition-irrelevant regions or are inefficient for fine-grained editing. In this
study, we address both issues and propose a novel method. First, we replace the
continuous absolute condition with a relative condition, specifically relative
action units (AUs). With relative AUs, the generator learns to transform only the
regions of interest, which are specified by non-zero relative AUs. Second, our
generator is built on U-Net and strengthened by a Multi-Scale Feature Fusion (MSF)
mechanism for high-quality expression editing. Extensive quantitative and
qualitative experiments demonstrate the improvements of our approach over
state-of-the-art expression editing methods.
Code is available at \url{https://github.com/junleen/Expression-manipulator}
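The abstract describes conditioning a U-Net-style generator on relative AUs so that zero-valued entries leave the corresponding regions untouched. Below is a minimal PyTorch sketch of that idea; the module names, channel sizes, and fusion scheme are illustrative assumptions, not the authors' released architecture.

```python
# Minimal sketch: a U-Net-like generator conditioned on relative action units (AUs),
# with a simple multi-scale feature fusion step. All hyperparameters are assumed.
import torch
import torch.nn as nn

class MSFGenerator(nn.Module):
    def __init__(self, num_aus=17, base_ch=64):
        super().__init__()
        # Encoder: the image is concatenated with relative AUs broadcast as channels.
        self.enc1 = nn.Conv2d(3 + num_aus, base_ch, 4, stride=2, padding=1)
        self.enc2 = nn.Conv2d(base_ch, base_ch * 2, 4, stride=2, padding=1)
        # Multi-scale fusion: upsample deep features and merge with the shallow skip.
        self.fuse = nn.Conv2d(base_ch + base_ch * 2, base_ch * 2, 3, padding=1)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(base_ch * 2, base_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, img, rel_aus):
        # rel_aus: (B, num_aus) relative AU intensities; zero entries mean
        # "leave this region unchanged".
        b, _, h, w = img.shape
        au_map = rel_aus.view(b, -1, 1, 1).expand(-1, -1, h, w)
        f1 = torch.relu(self.enc1(torch.cat([img, au_map], dim=1)))
        f2 = torch.relu(self.enc2(f1))
        f2_up = nn.functional.interpolate(f2, size=f1.shape[-2:], mode="bilinear",
                                          align_corners=False)
        fused = torch.relu(self.fuse(torch.cat([f1, f2_up], dim=1)))
        return self.dec(fused)

gen = MSFGenerator()
out = gen(torch.randn(2, 3, 128, 128), torch.randn(2, 17))
print(out.shape)  # torch.Size([2, 3, 128, 128])
```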
Learning Motion Refinement for Unsupervised Face Animation
Unsupervised face animation aims to generate a human face video based on the
appearance of a source image, mimicking the motion from a driving video.
Existing methods typically adopt a prior-based motion model (e.g., the local
affine motion model or the local thin-plate-spline motion model). While such
models can capture coarse facial motion, artifacts are often observed around
subtle motions in local areas (e.g., the lips and eyes), due to their limited
ability to model finer facial motion. In this work, we design a new unsupervised
face animation approach that learns the coarse and finer motions simultaneously.
In particular, while exploiting the local affine motion model to learn the global
coarse facial motion, we design a novel motion refinement module that compensates
for the local affine model by modeling finer facial motions in local areas. The
motion refinement is learned from the dense correlation between the source and
driving images. Specifically, we first construct a structure correlation volume
based on the keypoint features of the source and driving images. Then, we train a
model to generate the fine facial motions iteratively from low to high resolution.
The learned motion refinements are combined with the coarse motion to generate the
new image. Extensive experiments on widely used benchmarks demonstrate that our
method achieves the best results among state-of-the-art baselines.
Comment: NeurIPS 202
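As a rough illustration of the refinement step described above, the sketch below builds a local correlation volume between source and driving features and predicts a residual flow that is added to a coarse flow. The feature extractors, correlation radius, and iterative coarse-to-fine schedule are assumptions for illustration only.

```python
# Sketch: refine a coarse (e.g., local affine) flow with a residual flow predicted
# from a source/driving correlation volume. Details are assumed, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_correlation(src_feat, drv_feat, radius=3):
    """Correlate each driving location with a (2r+1)^2 neighborhood of the source."""
    b, c, h, w = src_feat.shape
    k = 2 * radius + 1
    patches = F.unfold(src_feat, k, padding=radius).view(b, c, k * k, h, w)
    return (drv_feat.unsqueeze(2) * patches).sum(dim=1) / c ** 0.5  # (B, k*k, H, W)

class MotionRefiner(nn.Module):
    def __init__(self, radius=3):
        super().__init__()
        k = 2 * radius + 1
        self.radius = radius
        self.head = nn.Sequential(
            nn.Conv2d(k * k + 2, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, src_feat, drv_feat, coarse_flow):
        corr = local_correlation(src_feat, drv_feat, self.radius)
        residual = self.head(torch.cat([corr, coarse_flow], dim=1))
        return coarse_flow + residual  # coarse affine motion + learned fine refinement

refiner = MotionRefiner()
flow = refiner(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64),
               torch.zeros(1, 2, 64, 64))
```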
Modeling Caricature Expressions by 3D Blendshape and Dynamic Texture
The problem of deforming an artist-drawn caricature according to a given
normal face expression is of interest in applications such as social media,
animation, and entertainment. This paper presents a solution to the problem, with
an emphasis on enhancing the ability to create desired expressions while
preserving the identity-exaggeration style of the caricature, which is challenging
due to the complicated nature of caricatures. The key to our solution is a novel
method for modeling caricature expressions, which extends the traditional 3DMM
representation to the caricature domain. The method consists of shape modeling and
texture generation for caricatures. A geometric optimization is developed to
create identity-preserving blendshapes for reconstructing accurate and stable
geometric shapes, and a conditional generative adversarial network (cGAN) is
designed to generate dynamic textures under target expressions. The combination of
the shape and texture components allows the non-trivial expressions of a
caricature to be effectively defined by this extension of the popular 3DMM
representation, so that a caricature can be flexibly deformed into arbitrary
expressions with visually good results in both shape and color. The experiments
demonstrate the effectiveness of the proposed method.
Comment: Accepted by the 28th ACM International Conference on Multimedia (ACM MM 2020)
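The two components described above can be pictured as a linear blendshape model for geometry plus a conditional generator for dynamic texture. The sketch below shows that structure in PyTorch; vertex counts, latent dimensions, and network layouts are invented for illustration and are not the paper's implementation.

```python
# Sketch: (1) deform a caricature's neutral mesh with expression blendshape weights,
# (2) generate a UV texture from an expression code with a cGAN-style generator.
import torch
import torch.nn as nn

def blend_shape(neutral_verts, blendshapes, weights):
    """neutral_verts: (V, 3); blendshapes: (K, V, 3) deltas; weights: (K,)."""
    return neutral_verts + torch.einsum("k,kvc->vc", weights, blendshapes)

class TextureGenerator(nn.Module):
    """Conditional generator mapping an expression code to a 256x256 UV texture."""
    def __init__(self, expr_dim=64):
        super().__init__()
        self.fc = nn.Linear(expr_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, expr_code):
        x = self.fc(expr_code).view(-1, 128, 8, 8)
        return self.up(x)  # (B, 3, 256, 256) dynamic texture

verts = blend_shape(torch.zeros(5023, 3), torch.randn(20, 5023, 3), torch.rand(20))
tex = TextureGenerator()(torch.randn(1, 64))
```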
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
Recent works in implicit representations, such as Neural Radiance Fields
(NeRF), have advanced the generation of realistic and animatable head avatars
from video sequences. These implicit methods are still confronted by visual
artifacts and jitters, since the lack of explicit geometric constraints poses a
fundamental challenge in accurately modeling complex facial deformations. In
this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid
representation that encodes explicit dynamic meshes by neural networks to
ensure geometric consistency across various motions and viewpoints. DynTet is
parameterized by coordinate-based networks that learn signed distance,
deformation, and material texture, anchoring the training data to a
predefined tetrahedral grid. Leveraging Marching Tetrahedra, DynTet efficiently
decodes textured meshes with a consistent topology, enabling fast rendering
through a differentiable rasterizer and supervision via a pixel loss. To
enhance training efficiency, we incorporate classical 3D Morphable Models to
facilitate geometry learning and define a canonical space for simplifying
texture learning. These advantages are readily achievable owing to the
effective geometric representation employed in DynTet. Compared with prior
works, DynTet demonstrates significant improvements in fidelity, lip
synchronization, and real-time performance according to various metrics. Beyond
producing stable and visually appealing synthesized videos, our method also
outputs dynamic meshes, which is promising for enabling many emerging applications.
Comment: CVPR 202
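To make the per-vertex prediction concrete, here is a hedged sketch of coordinate-based MLPs that output signed distance, a deformation offset, and a material code for each vertex of a fixed tetrahedral grid, conditioned on a per-frame expression code (e.g., 3DMM coefficients). Mesh extraction via Marching Tetrahedra and differentiable rasterization are omitted, and all names and dimensions are assumptions.

```python
# Sketch: coordinate-based field over a fixed tetrahedral grid, predicting
# SDF + deformation + material per vertex. Downstream mesh extraction is omitted.
import torch
import torch.nn as nn

class TetFieldMLP(nn.Module):
    def __init__(self, cond_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + cond_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            # 1 signed-distance value + 3 deformation offsets + 3 material channels
            nn.Linear(hidden, 1 + 3 + 3),
        )

    def forward(self, grid_verts, cond):
        # grid_verts: (V, 3) fixed tetrahedral grid; cond: (cond_dim,) per frame.
        inp = torch.cat([grid_verts, cond.expand(grid_verts.shape[0], -1)], dim=-1)
        sdf, offset, material = self.net(inp).split([1, 3, 3], dim=-1)
        deformed = grid_verts + offset      # deformed vertex positions
        return sdf, deformed, material      # inputs to Marching Tetrahedra + rasterizer

field = TetFieldMLP()
sdf, verts, mat = field(torch.randn(1000, 3), torch.randn(64))
```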
LEED: Label-Free Expression Editing via Disentanglement
Recent studies on facial expression editing have made very promising progress.
However, existing methods require a large number of expression labels, which are
often expensive and time-consuming to collect. This paper presents an innovative label-free
expression editing via disentanglement (LEED) framework that is capable of
editing the expression of both frontal and profile facial images without
requiring any expression label. The idea is to disentangle the identity and
expression of a facial image in the expression manifold, where the neutral face
captures the identity attribute and the displacement between the neutral image
and the expressive image captures the expression attribute. Two novel losses
are designed for optimal expression disentanglement and consistent synthesis,
including a mutual expression information loss that aims to extract pure
expression-related features and a siamese loss that aims to enhance the
expression similarity between the synthesized image and the reference image.
Extensive experiments over two public facial expression datasets show that LEED
achieves superior facial expression editing qualitatively and quantitatively.
Comment: Accepted to ECCV 202
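The disentanglement idea above, where identity comes from the neutral face and expression is the displacement to the expressive face, can be sketched as follows. The encoder, the exact mutual-information objective, and all names are assumptions; only the siamese-style expression-similarity term is illustrated.

```python
# Sketch: expression as a displacement from the neutral embedding, plus a
# siamese-style loss pulling the synthesized expression toward the reference's.
import torch
import torch.nn.functional as F

def disentangle(encoder, neutral_img, expressive_img):
    id_code = encoder(neutral_img)                  # identity from the neutral face
    expr_code = encoder(expressive_img) - id_code   # expression as a displacement
    return id_code, expr_code

def siamese_expression_loss(encoder, synth_img, ref_img, neutral_synth, neutral_ref):
    # Encourage the synthesized image to carry the same expression as the reference.
    _, expr_synth = disentangle(encoder, neutral_synth, synth_img)
    _, expr_ref = disentangle(encoder, neutral_ref, ref_img)
    return 1.0 - F.cosine_similarity(expr_synth, expr_ref, dim=-1).mean()
```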