3D Face Arbitrary Style Transfer
Style transfer of 3D faces has attracted increasing attention. However,
previous methods mainly use images of artistic faces for style transfer while
ignoring arbitrary style images such as abstract paintings. To solve this
problem, we propose a novel method, namely Face-guided Dual Style Transfer
(FDST). To begin with, FDST employs a 3D decoupling module to separate facial
geometry and texture. Then we propose a style fusion strategy for facial
geometry. Subsequently, we design an optimization-based DDSG mechanism for
textures, which guides the style transfer with two style images. Besides the
normal style image input, DDSG can use the original face as an additional
style input that serves as a face prior. In this way, high-quality arbitrary
style transfer results for faces can be obtained. Furthermore, FDST can be applied to many
downstream tasks, including region-controllable style transfer, high-fidelity
face texture reconstruction, large-pose face reconstruction, and artistic face
reconstruction. Comprehensive quantitative and qualitative results show that
our method achieves comparable performance. All source code and pre-trained
weights will be released to the public.
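To make the dual style guidance idea concrete, here is a minimal sketch, not the authors' released code, of an optimization-based texture update driven by two style references: the arbitrary style image and the original face used as a prior. The VGG-19 Gram-matrix losses, layer selection, and loss weights are illustrative assumptions.

```python
# Minimal sketch of dual style guidance: optimize a texture against TWO style
# references, an arbitrary style image and the original face used as a prior.
# VGG-19 Gram losses, layer indices, and weights are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)
STYLE_LAYERS = {1, 6, 11, 20, 29}  # relu1_1 through relu5_1 (assumed choice)

def features(x):
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            feats.append(x)
    return feats

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def dual_style_loss(texture, style_img, face_img, w_style=1.0, w_face=0.5):
    # Combine Gram-matrix losses from the style image and the face prior.
    tex_feats = features(texture)
    with torch.no_grad():
        style_feats = features(style_img)
        face_feats = features(face_img)
    loss = 0.0
    for t, s, p in zip(tex_feats, style_feats, face_feats):
        loss = loss + w_style * F.mse_loss(gram(t), gram(s))
        loss = loss + w_face * F.mse_loss(gram(t), gram(p))
    return loss

# Usage: optimize the texture, initialized from the original face.
face = torch.rand(1, 3, 256, 256, device=device)   # placeholder face texture
style = torch.rand(1, 3, 256, 256, device=device)  # placeholder style image
texture = face.clone().requires_grad_(True)
opt = torch.optim.Adam([texture], lr=0.01)
for step in range(200):
    opt.zero_grad()
    loss = dual_style_loss(texture, style, face)
    loss.backward()
    opt.step()
```

In this toy setup, raising w_face pulls the result back toward the input face while raising w_style pushes it toward the arbitrary style, mirroring the trade-off a face prior input is meant to control.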
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, facial expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements over prior work,
enabling much greater flexibility in creating realistic reenacted output videos.

Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at SIGGRAPH'18
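As a rough illustration of view- and pose-dependent texturing, the sketch below (a simplified stand-in, not the HeadOn renderer) blends candidate frames of the target video with weights that favor captured views whose camera direction best matches the novel view; the cosine-power weighting and the function name view_dependent_blend are assumed, commonly used placeholders.

```python
# Simplified view-dependent texture blending: weight candidate frames by how
# closely their captured viewing direction matches the novel view direction,
# then blend them per pixel. The cosine-power heuristic is an assumption.
import numpy as np

def view_dependent_blend(textures, view_dirs, novel_dir, sharpness=8.0):
    """textures: (N, H, W, 3) candidate texture images,
    view_dirs: (N, 3) unit viewing directions of the captured frames,
    novel_dir: (3,) viewing direction of the novel pose."""
    novel_dir = novel_dir / np.linalg.norm(novel_dir)
    cos = view_dirs @ novel_dir                    # alignment per frame
    w = np.clip(cos, 0.0, None) ** sharpness       # favor well-aligned views
    w = w / (w.sum() + 1e-8)
    return np.tensordot(w, textures, axes=(0, 0))  # weighted per-pixel blend

# Usage with toy data: three captured frames, novel view closest to frame 0.
tex = np.random.rand(3, 64, 64, 3)
dirs = np.array([[0.0, 0.0, 1.0], [0.3, 0.0, 0.95], [0.6, 0.0, 0.8]])
dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
out = view_dependent_blend(tex, dirs, np.array([0.05, 0.0, 1.0]))
print(out.shape)  # (64, 64, 3)
```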
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
Talking head synthesis is a promising approach for the video production
industry. Recently, considerable effort has been devoted in this area to
improving the generation quality or enhancing the model generalization. However,
few works address both issues simultaneously, which is essential for practical
applications. To this end, in this paper, we turn our attention to the emerging
and powerful Latent Diffusion Models, and model talking head generation as an
audio-driven temporally coherent denoising
process (DiffTalk). More specifically, instead of employing audio signals as
the single driving factor, we investigate the control mechanism of the talking
face, and incorporate reference face images and landmarks as conditions for
personality-aware generalized synthesis. In this way, the proposed DiffTalk is
capable of producing high-quality talking head videos in synchronization with
the source audio, and more importantly, it can be naturally generalized across
different identities without any further fine-tuning. Additionally, our
DiffTalk can be gracefully tailored for higher-resolution synthesis with
negligible extra computational cost. Extensive experiments show that the
proposed DiffTalk efficiently synthesizes high-fidelity audio-driven talking
head videos for generalized novel identities. For more video results, please
refer to https://sstzal.github.io/DiffTalk/.

Comment: Project page https://sstzal.github.io/DiffTalk
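For intuition, the following sketch (an assumption-laden illustration rather than the released DiffTalk model) shows one conditional reverse denoising step in which a noisy latent is denoised by a network conditioned on an audio feature, a reference face image, and facial landmarks; the tiny network, feature dimensions, and conditioning scheme are placeholders.

```python
# Sketch of an audio-driven conditional denoising step: a toy network predicts
# the noise in a latent given an audio feature, a reference face image, and
# 68 facial landmarks. Dimensions and the conditioning scheme are assumptions.
import torch
import torch.nn as nn

class CondDenoiser(nn.Module):
    def __init__(self, latent_ch=4, cond_dim=128):
        super().__init__()
        self.audio_proj = nn.Linear(768, cond_dim)   # assumed audio feature size
        self.lmk_proj = nn.Linear(68 * 2, cond_dim)  # 68 2D landmarks
        self.ref_enc = nn.Conv2d(3, latent_ch, 3, padding=1)
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch * 2 + cond_dim, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )

    def forward(self, z_t, t, audio, ref_img, landmarks):
        # t is accepted but not embedded in this toy network.
        b, _, h, w = z_t.shape
        cond = self.audio_proj(audio) + self.lmk_proj(landmarks.flatten(1))
        cond_map = cond[:, :, None, None].expand(b, -1, h, w)
        ref = nn.functional.interpolate(ref_img, size=(h, w))
        x = torch.cat([z_t, self.ref_enc(ref), cond_map], dim=1)
        return self.net(x)  # predicted noise epsilon

def reverse_step(model, z_t, t, alpha_bar, audio, ref_img, landmarks):
    # Standard deterministic (DDIM-style) update from the predicted noise.
    eps = model(z_t, t, audio, ref_img, landmarks)
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    z0 = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()        # clean-latent estimate
    return a_prev.sqrt() * z0 + (1 - a_prev).sqrt() * eps   # previous-step latent

# Usage with dummy tensors.
model = CondDenoiser()
z = torch.randn(1, 4, 32, 32)
audio = torch.randn(1, 768)
ref = torch.randn(1, 3, 256, 256)
lmk = torch.randn(1, 68, 2)
alpha_bar = torch.linspace(0.99, 0.01, 1000)
z_prev = reverse_step(model, z, torch.tensor(500), alpha_bar, audio, ref, lmk)
```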