Toward Fine-grained Facial Expression Manipulation
Facial expression manipulation aims to edit a facial expression according to a given
condition. Previous methods edit an input image under the guidance of a discrete
emotion label or an absolute condition (e.g., facial action units) so that it
exhibits the desired expression. However, these methods either alter
condition-irrelevant regions or are inefficient for fine-grained editing. In this
study, we address both issues and propose a novel method. First, we replace the
continuous absolute condition with a relative condition, specifically relative
action units (AUs). With relative AUs, the generator learns to transform only the
regions of interest, which are specified by non-zero relative AUs. Second, our
generator is built on U-Net and strengthened by a Multi-Scale Feature Fusion (MSF)
mechanism for high-quality expression editing. Extensive quantitative and
qualitative experiments demonstrate the improvements of our approach over
state-of-the-art expression editing methods.
Code is available at \url{https://github.com/junleen/Expression-manipulator}
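The abstract describes conditioning a U-Net-style generator on relative AUs so that zero-valued entries leave the corresponding regions untouched. Below is a minimal PyTorch sketch of that idea; the module names, channel sizes, and fusion scheme are illustrative assumptions, not the authors' released architecture.

```python
# Minimal sketch: a U-Net-like generator conditioned on relative action units (AUs),
# with a simple multi-scale feature fusion step. All hyperparameters are assumed.
import torch
import torch.nn as nn

class MSFGenerator(nn.Module):
    def __init__(self, num_aus=17, base_ch=64):
        super().__init__()
        # Encoder: the image is concatenated with relative AUs broadcast as channels.
        self.enc1 = nn.Conv2d(3 + num_aus, base_ch, 4, stride=2, padding=1)
        self.enc2 = nn.Conv2d(base_ch, base_ch * 2, 4, stride=2, padding=1)
        # Multi-scale fusion: upsample deep features and merge with the shallow skip.
        self.fuse = nn.Conv2d(base_ch + base_ch * 2, base_ch * 2, 3, padding=1)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(base_ch * 2, base_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, img, rel_aus):
        # rel_aus: (B, num_aus) relative AU intensities; zero entries mean
        # "leave this region unchanged".
        b, _, h, w = img.shape
        au_map = rel_aus.view(b, -1, 1, 1).expand(-1, -1, h, w)
        f1 = torch.relu(self.enc1(torch.cat([img, au_map], dim=1)))
        f2 = torch.relu(self.enc2(f1))
        f2_up = nn.functional.interpolate(f2, size=f1.shape[-2:], mode="bilinear",
                                          align_corners=False)
        fused = torch.relu(self.fuse(torch.cat([f1, f2_up], dim=1)))
        return self.dec(fused)

gen = MSFGenerator()
out = gen(torch.randn(2, 3, 128, 128), torch.randn(2, 17))
print(out.shape)  # torch.Size([2, 3, 128, 128])
```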
Learning Motion Refinement for Unsupervised Face Animation
Unsupervised face animation aims to generate a human face video based on the
appearance of a source image, mimicking the motion from a driving video.
Existing methods typically adopt a prior-based motion model (e.g., the local
affine motion model or the local thin-plate-spline motion model). While such
models can capture coarse facial motion, artifacts are often observed around
subtle motions in local areas (e.g., the lips and eyes), due to their limited
ability to model finer facial motion. In this work, we design a new unsupervised
face animation approach that learns the coarse and finer motions simultaneously.
In particular, while exploiting the local affine motion model to learn the global
coarse facial motion, we design a novel motion refinement module that compensates
for the local affine model by modeling finer facial motions in local areas. The
motion refinement is learned from the dense correlation between the source and
driving images. Specifically, we first construct a structure correlation volume
based on the keypoint features of the source and driving images. Then, we train a
model to generate the fine facial motions iteratively from low to high resolution.
The learned motion refinements are combined with the coarse motion to generate the
new image. Extensive experiments on widely used benchmarks demonstrate that our
method achieves the best results among state-of-the-art baselines.
Comment: NeurIPS 202
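As a rough illustration of the refinement step described above, the sketch below builds a local correlation volume between source and driving features and predicts a residual flow that is added to a coarse flow. The feature extractors, correlation radius, and iterative coarse-to-fine schedule are assumptions for illustration only.

```python
# Sketch: refine a coarse (e.g., local affine) flow with a residual flow predicted
# from a source/driving correlation volume. Details are assumed, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_correlation(src_feat, drv_feat, radius=3):
    """Correlate each driving location with a (2r+1)^2 neighborhood of the source."""
    b, c, h, w = src_feat.shape
    k = 2 * radius + 1
    patches = F.unfold(src_feat, k, padding=radius).view(b, c, k * k, h, w)
    return (drv_feat.unsqueeze(2) * patches).sum(dim=1) / c ** 0.5  # (B, k*k, H, W)

class MotionRefiner(nn.Module):
    def __init__(self, radius=3):
        super().__init__()
        k = 2 * radius + 1
        self.radius = radius
        self.head = nn.Sequential(
            nn.Conv2d(k * k + 2, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, src_feat, drv_feat, coarse_flow):
        corr = local_correlation(src_feat, drv_feat, self.radius)
        residual = self.head(torch.cat([corr, coarse_flow], dim=1))
        return coarse_flow + residual  # coarse affine motion + learned fine refinement

refiner = MotionRefiner()
flow = refiner(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64),
               torch.zeros(1, 2, 64, 64))
```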
Modeling Caricature Expressions by 3D Blendshape and Dynamic Texture
The problem of deforming an artist-drawn caricature according to a given
normal face expression is of interest in applications such as social media,
animation, and entertainment. This paper presents a solution to the problem, with
an emphasis on enhancing the ability to create desired expressions while
preserving the identity-exaggeration style of the caricature, which is challenging
due to the complicated nature of caricatures. The key to our solution is a novel
method for modeling caricature expressions, which extends the traditional 3DMM
representation to the caricature domain. The method consists of shape modeling and
texture generation for caricatures. A geometric optimization is developed to
create identity-preserving blendshapes for reconstructing accurate and stable
geometric shapes, and a conditional generative adversarial network (cGAN) is
designed to generate dynamic textures under target expressions. The combination of
the shape and texture components allows the non-trivial expressions of a
caricature to be effectively defined by this extension of the popular 3DMM
representation, so that a caricature can be flexibly deformed into arbitrary
expressions with visually good results in both shape and color. The experiments
demonstrate the effectiveness of the proposed method.
Comment: Accepted by the 28th ACM International Conference on Multimedia (ACM MM 2020)
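The two components described above can be pictured as a linear blendshape model for geometry plus a conditional generator for dynamic texture. The sketch below shows that structure in PyTorch; vertex counts, latent dimensions, and network layouts are invented for illustration and are not the paper's implementation.

```python
# Sketch: (1) deform a caricature's neutral mesh with expression blendshape weights,
# (2) generate a UV texture from an expression code with a cGAN-style generator.
import torch
import torch.nn as nn

def blend_shape(neutral_verts, blendshapes, weights):
    """neutral_verts: (V, 3); blendshapes: (K, V, 3) deltas; weights: (K,)."""
    return neutral_verts + torch.einsum("k,kvc->vc", weights, blendshapes)

class TextureGenerator(nn.Module):
    """Conditional generator mapping an expression code to a 256x256 UV texture."""
    def __init__(self, expr_dim=64):
        super().__init__()
        self.fc = nn.Linear(expr_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, expr_code):
        x = self.fc(expr_code).view(-1, 128, 8, 8)
        return self.up(x)  # (B, 3, 256, 256) dynamic texture

verts = blend_shape(torch.zeros(5023, 3), torch.randn(20, 5023, 3), torch.rand(20))
tex = TextureGenerator()(torch.randn(1, 64))
```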
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
Recent works in implicit representations, such as Neural Radiance Fields
(NeRF), have advanced the generation of realistic and animatable head avatars
from video sequences. These implicit methods are still confronted by visual
artifacts and jitters, since the lack of explicit geometric constraints poses a
fundamental challenge in accurately modeling complex facial deformations. In
this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid
representation that encodes explicit dynamic meshes by neural networks to
ensure geometric consistency across various motions and viewpoints. DynTet is
parameterized by coordinate-based networks that learn signed distance,
deformation, and material texture, anchoring the training data to a
predefined tetrahedral grid. Leveraging Marching Tetrahedra, DynTet efficiently
decodes textured meshes with a consistent topology, enabling fast rendering
through a differentiable rasterizer and supervision via a pixel loss. To
enhance training efficiency, we incorporate classical 3D Morphable Models to
facilitate geometry learning and define a canonical space for simplifying
texture learning. These advantages are readily achievable owing to the
effective geometric representation employed in DynTet. Compared with prior
works, DynTet demonstrates significant improvements in fidelity, lip
synchronization, and real-time performance according to various metrics. Beyond
producing stable and visually appealing synthesized videos, our method also
outputs dynamic meshes, which is promising for enabling many emerging applications.
Comment: CVPR 202
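To make the per-vertex prediction concrete, here is a hedged sketch of coordinate-based MLPs that output signed distance, a deformation offset, and a material code for each vertex of a fixed tetrahedral grid, conditioned on a per-frame expression code (e.g., 3DMM coefficients). Mesh extraction via Marching Tetrahedra and differentiable rasterization are omitted, and all names and dimensions are assumptions.

```python
# Sketch: coordinate-based field over a fixed tetrahedral grid, predicting
# SDF + deformation + material per vertex. Downstream mesh extraction is omitted.
import torch
import torch.nn as nn

class TetFieldMLP(nn.Module):
    def __init__(self, cond_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + cond_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            # 1 signed-distance value + 3 deformation offsets + 3 material channels
            nn.Linear(hidden, 1 + 3 + 3),
        )

    def forward(self, grid_verts, cond):
        # grid_verts: (V, 3) fixed tetrahedral grid; cond: (cond_dim,) per frame.
        inp = torch.cat([grid_verts, cond.expand(grid_verts.shape[0], -1)], dim=-1)
        sdf, offset, material = self.net(inp).split([1, 3, 3], dim=-1)
        deformed = grid_verts + offset      # deformed vertex positions
        return sdf, deformed, material      # inputs to Marching Tetrahedra + rasterizer

field = TetFieldMLP()
sdf, verts, mat = field(torch.randn(1000, 3), torch.randn(64))
```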
LEED: Label-Free Expression Editing via Disentanglement
Recent studies on facial expression editing have made very promising progress.
However, existing methods require a large number of expression labels, which are
often expensive and time-consuming to collect. This paper presents an innovative label-free
expression editing via disentanglement (LEED) framework that is capable of
editing the expression of both frontal and profile facial images without
requiring any expression label. The idea is to disentangle the identity and
expression of a facial image in the expression manifold, where the neutral face
captures the identity attribute and the displacement between the neutral image
and the expressive image captures the expression attribute. Two novel losses
are designed for optimal expression disentanglement and consistent synthesis,
including a mutual expression information loss that aims to extract pure
expression-related features and a siamese loss that aims to enhance the
expression similarity between the synthesized image and the reference image.
Extensive experiments over two public facial expression datasets show that LEED
achieves superior facial expression editing qualitatively and quantitatively.
Comment: Accepted to ECCV 202
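The disentanglement idea above, where identity comes from the neutral face and expression is the displacement to the expressive face, can be sketched as follows. The encoder, the exact mutual-information objective, and all names are assumptions; only the siamese-style expression-similarity term is illustrated.

```python
# Sketch: expression as a displacement from the neutral embedding, plus a
# siamese-style loss pulling the synthesized expression toward the reference's.
import torch
import torch.nn.functional as F

def disentangle(encoder, neutral_img, expressive_img):
    id_code = encoder(neutral_img)                  # identity from the neutral face
    expr_code = encoder(expressive_img) - id_code   # expression as a displacement
    return id_code, expr_code

def siamese_expression_loss(encoder, synth_img, ref_img, neutral_synth, neutral_ref):
    # Encourage the synthesized image to carry the same expression as the reference.
    _, expr_synth = disentangle(encoder, neutral_synth, synth_img)
    _, expr_ref = disentangle(encoder, neutral_ref, ref_img)
    return 1.0 - F.cosine_similarity(expr_synth, expr_ref, dim=-1).mean()
```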