
    Photorealistic Audio-driven Video Portraits

    Video portraits are common in a variety of applications, such as videoconferencing, news broadcasting, and virtual education and training. We present a novel method to synthesize photorealistic video portraits for an input portrait video, automatically driven by a person’s voice. The main challenge in this task is hallucinating plausible, photorealistic facial expressions from input speech audio. To address this challenge, we employ a parametric 3D face model parameterized by geometry, facial expression, illumination, and other factors, and learn a mapping from audio features to model parameters. The input source audio is first represented as a high-dimensional feature, which is used to predict facial expression parameters of the 3D face model. We then replace the expression parameters computed from the original target video with the predicted ones and rerender the reenacted face. Finally, we generate a photorealistic video portrait from the reenacted synthetic face sequence via a neural face renderer. One appealing feature of our approach is its generalization capability across varied input speech audio, including synthetic speech from text-to-speech software. Extensive experimental results show that our approach outperforms previous general-purpose audio-driven video portrait methods, including a user study in which participants rated our results as more realistic.
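    The abstract does not include code, but the audio-to-expression stage it describes can be sketched in a few lines of PyTorch. Everything below is an illustrative assumption, not the authors' implementation: the class name `AudioToExpression`, the feature dimensions, and the simple MLP architecture are all hypothetical placeholders for the learned mapping from audio features to 3DMM expression coefficients.

```python
import torch
import torch.nn as nn

class AudioToExpression(nn.Module):
    """Hypothetical sketch: map a high-dimensional audio feature to the
    expression coefficients of a parametric 3D face model. Dimensions
    and architecture are assumptions, not taken from the paper."""
    def __init__(self, audio_dim=512, hidden_dim=256, n_expr=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_expr),  # predicted expression coefficients
        )

    def forward(self, audio_feat):   # audio_feat: (batch, audio_dim)
        return self.net(audio_feat)  # (batch, n_expr)

def reenact(target_params, predicted_expr):
    """Keep the target video's identity, pose, and lighting parameters,
    but swap in the audio-predicted expression before rerendering.
    `target_params` is assumed to be a dict of per-frame 3DMM parameters."""
    params = dict(target_params)           # shallow copy
    params["expression"] = predicted_expr  # replace expression only
    return params                          # handed to the neural renderer
```

    The key design point the abstract conveys is that only the expression parameters are replaced; geometry, illumination, and pose come from the target video, which is what lets a single learned audio mapping drive many different portrait videos.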

    Example-Guided Style-Consistent Image Synthesis from Semantic Labeling

    Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image indicating style. We use the term “style” in this problem to refer to implicit characteristics of images: in portraits, “style” includes gender, racial identity, age, and hairstyle; in full-body pictures, it includes clothing; in street scenes, it refers to weather, time of day, and the like. A semantic label map in these cases indicates facial expression, full-body pose, or scene segmentation. We propose a solution to the example-guided image synthesis problem using conditional generative adversarial networks with style consistency. Our key contributions are (i) a novel style consistency discriminator to determine whether a pair of images is consistent in style; (ii) an adaptive semantic consistency loss; and (iii) a training-data sampling strategy for synthesizing results that are style-consistent with the exemplar. We demonstrate the effectiveness of our method on face, dance, and street-view synthesis tasks.
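    A minimal sketch of the style consistency discriminator idea, under stated assumptions: the class name `StyleConsistencyDiscriminator`, the channel-concatenation input scheme, and the PatchGAN-style conv stack below are illustrative guesses, not the architecture from the paper. The intent it captures is that the discriminator scores a *pair* of images for style agreement rather than a single image for realism.

```python
import torch
import torch.nn as nn

class StyleConsistencyDiscriminator(nn.Module):
    """Hypothetical sketch: score whether two images share a style.
    The pair is concatenated along the channel axis and passed through
    a small patch-wise conv net; details are assumptions, not the paper's."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * in_ch, base, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base, 2 * base, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(2 * base, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, img_a, img_b):
        pair = torch.cat([img_a, img_b], dim=1)  # (B, 2*in_ch, H, W)
        return self.net(pair)  # high patch scores = style-consistent pair
```

    In training, a natural (assumed) use is to treat pairs drawn from the same subject or scene as positive, style-consistent examples and cross-style pairs as negatives, so the generator is pushed to match the exemplar's style rather than merely produce a plausible image.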