723 research outputs found
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
In this paper, we present our method for neural face reenactment, called
HyperReenact, that aims to generate realistic talking head images of a source
identity, driven by a target facial pose. Existing state-of-the-art face
reenactment methods train controllable generative models that learn to
synthesize realistic facial images, yet they either produce reenacted faces that are
prone to significant visual artifacts, especially under the challenging
condition of extreme head pose changes, or require expensive few-shot
fine-tuning to better preserve the source identity characteristics. We propose
to address these limitations by leveraging the photorealistic generation
ability and the disentangled properties of a pretrained StyleGAN2 generator, by
first inverting the real images into its latent space and then using a
hypernetwork to perform: (i) refinement of the source identity characteristics
and (ii) facial pose re-targeting, thus eliminating the dependence on
external editing methods that typically produce artifacts. Our method operates
under the one-shot setting (i.e., using a single source frame) and allows for
cross-subject reenactment, without requiring any subject-specific fine-tuning.
We compare our method both quantitatively and qualitatively against several
state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and
VoxCeleb2, demonstrating the superiority of our approach in producing
artifact-free images and exhibiting remarkable robustness even under extreme head
pose changes. We make the code and the pretrained models publicly available at:
https://github.com/StelaBou/HyperReenact. Comment: Accepted for publication in ICCV 2023. Project page:
https://stelabou.github.io/hyperreenact.github.io/ Code:
https://github.com/StelaBou/HyperReenact
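To make the pipeline described in this abstract more tangible (invert the source frame into the generator's latent space, then let a hypernetwork predict generator weight offsets from source-identity and target-pose features), here is a minimal PyTorch sketch. It is not the authors' released implementation: the encoder, hypernetwork, and generator below are toy stand-ins for the pretrained StyleGAN2 components, and all names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 64          # toy latent size (StyleGAN2 uses 512)
IMG = 32             # toy resolution

class ToyGenerator(nn.Module):
    """Frozen stand-in for the pretrained generator; its single layer accepts
    an additive weight offset predicted by the hypernetwork."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT, 3 * IMG * IMG)

    def forward(self, w, delta_w):
        weight = self.fc.weight + delta_w                  # refined / retargeted weights
        img = F.linear(w, weight, self.fc.bias)
        return torch.tanh(img).view(-1, 3, IMG, IMG)

class Encoder(nn.Module):
    """Stand-in for the inversion/feature encoder: maps a frame to a latent
    code plus a compact feature vector (identity or pose, depending on input)."""
    def __init__(self):
        super().__init__()
        self.to_latent = nn.Sequential(nn.Flatten(),
                                       nn.Linear(3 * IMG * IMG, LATENT), nn.ReLU())
        self.to_feat = nn.Linear(LATENT, LATENT)

    def forward(self, img):
        w = self.to_latent(img)
        return w, self.to_feat(w)

class HyperNetwork(nn.Module):
    """Predicts an additive offset for the generator weights, conditioned on
    source-identity and target-pose features (joint refinement + retargeting)."""
    def __init__(self, weight_shape):
        super().__init__()
        self.weight_shape = weight_shape
        out = weight_shape[0] * weight_shape[1]
        self.mlp = nn.Sequential(nn.Linear(2 * LATENT, 128), nn.ReLU(),
                                 nn.Linear(128, out))

    def forward(self, id_feat, pose_feat):
        delta = self.mlp(torch.cat([id_feat, pose_feat], dim=-1))
        return delta.view(-1, *self.weight_shape).mean(0)  # single offset (batch of one)

# One-shot, cross-subject reenactment: one source frame, one driving frame,
# no subject-specific fine-tuning.
enc, gen = Encoder(), ToyGenerator()
hyper = HyperNetwork(tuple(gen.fc.weight.shape))
source = torch.randn(1, 3, IMG, IMG)     # frame of the source identity
driving = torch.randn(1, 3, IMG, IMG)    # frame providing the target head pose
w_src, id_feat = enc(source)             # "inversion" of the source frame
_, pose_feat = enc(driving)              # pose features of the driving frame
reenacted = gen(w_src, hyper(id_feat, pose_feat))
print(reenacted.shape)                   # torch.Size([1, 3, 32, 32])
```

In this sketch the base generator stays frozen and only the predicted offsets vary per source/driving pair, mirroring the one-shot, fine-tuning-free setting described in the abstract.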
Semantic-aware One-shot Face Re-enactment with Dense Correspondence Estimation
One-shot face re-enactment is a challenging task due to the identity mismatch
between source and driving faces. Specifically, the suboptimally disentangled
identity information of driving subjects would inevitably interfere with the
re-enactment results and lead to face shape distortion. To solve this problem,
this paper proposes to use 3D Morphable Model (3DMM) for explicit facial
semantic decomposition and identity disentanglement. Instead of using 3D
coefficients alone for re-enactment control, we take advantage of the
generative ability of the 3DMM to render textured face proxies. These proxies
contain abundant yet compact geometric and semantic information of human faces,
which enable us to compute the face motion field between source and driving
images by estimating dense correspondence. In this way, we can
approximate re-enactment results by warping source images according to the
motion field, and a Generative Adversarial Network (GAN) is adopted to further
improve the visual quality of warping results. Extensive experiments on various
datasets demonstrate the advantages of the proposed method over existing
state-of-the-art methods in both identity preservation and re-enactment
fulfillment.
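The central mechanical step described above, warping the source image according to a dense motion field, can be illustrated with a short PyTorch snippet. This is only a sketch: the random flow tensor is a placeholder for the motion field that the paper estimates from dense correspondence between 3DMM-rendered proxies, and the GAN refinement stage is omitted.

```python
import torch
import torch.nn.functional as F

def warp_by_motion_field(source, flow):
    """source: (B, 3, H, W); flow: (B, H, W, 2) offsets in normalized [-1, 1] coords."""
    B, _, H, W = source.shape
    # Identity sampling grid in normalized coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    # Shift the grid by the estimated motion field and resample the source.
    return F.grid_sample(source, grid + flow, align_corners=True)

source = torch.rand(1, 3, 128, 128)
flow = 0.05 * torch.randn(1, 128, 128, 2)   # placeholder for the 3DMM-derived field
coarse_reenactment = warp_by_motion_field(source, flow)
print(coarse_reenactment.shape)             # torch.Size([1, 3, 128, 128])
```

Backward warping with grid_sample is the standard way to apply such a field: each output pixel samples the source image at its displaced location, producing the coarse reenactment that a refinement network then cleans up.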
MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets
When there is a mismatch between the target identity and the driver identity,
face reenactment suffers severe degradation in the quality of the result,
especially in a few-shot setting. The identity preservation problem, where the
model loses the detailed information of the target leading to a defective
output, is the most common failure mode. The problem has several potential
sources, such as the identity of the driver leaking due to the identity
mismatch or the need to handle unseen large poses. To overcome these problems, we
introduce three components that address them: an image attention block,
target feature alignment, and a landmark transformer. Through attending to and
warping the relevant features, the proposed architecture, called MarioNETte,
produces high-quality reenactments of unseen identities in a few-shot setting.
In addition, the landmark transformer dramatically alleviates the identity
preservation problem by isolating the expression geometry through landmark
disentanglement. Comprehensive experiments are performed to verify that the
proposed framework can generate highly realistic faces, outperforming all other
baselines, even under a significant mismatch of facial characteristics between
the target and the driver. Comment: In AAAI 2020
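As a rough illustration of the image attention idea (driver features attending over few-shot target features, so that identity detail is pulled from the targets rather than inherited from the driver), here is a hedged PyTorch sketch. The class name, dimensions, and residual-plus-norm layout are assumptions for illustration, not the paper's exact architecture; the target feature alignment and landmark transformer components are not modeled here.

```python
import torch
import torch.nn as nn

class ImageAttentionBlock(nn.Module):
    """Cross-attention: driver-pose features (queries) attend over pooled
    few-shot target-identity features (keys/values)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, driver_feat, target_feats):
        """driver_feat: (B, Q, D) features from the driving frame;
        target_feats: (B, K*S, D) features from K few-shot target frames."""
        attended, _ = self.attn(driver_feat, target_feats, target_feats)
        return self.norm(driver_feat + attended)   # residual + norm, as in a standard transformer block

block = ImageAttentionBlock()
driver = torch.randn(2, 64, 256)        # e.g. an 8x8 driver feature map, flattened
targets = torch.randn(2, 4 * 64, 256)   # features from 4 few-shot target frames
out = block(driver, targets)
print(out.shape)                        # torch.Size([2, 64, 256])
```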