Face swapping aims to inject the identity (i.e., the facial features) of a source image into a target image while strictly preserving the target's identity-irrelevant attributes. However, we observe that previous approaches still suffer from source attribute leakage, where the source image's attributes interfere with those of the target image. In this paper, we analyze the latent space of StyleGAN and identify a combination of latents well suited to the face swapping task.
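To make the idea of combining latents concrete, the sketch below shows layer-wise mixing of two StyleGAN W+ codes. This is an illustration only: the function name `mix_latents` and the default layer range are hypothetical, and the actual layer split used by the paper follows from its latent-space analysis, not from this example.

```python
import torch

def mix_latents(w_source: torch.Tensor,
                w_target: torch.Tensor,
                identity_layers: range = range(4, 10)) -> torch.Tensor:
    """Layer-wise mixing of two W+ latent codes of shape [num_layers, 512].

    Hypothetical illustration: copy the source's latents only into the
    layers assumed to control identity, keeping the target's latents
    (pose, background, color) everywhere else.
    """
    w_mixed = w_target.clone()
    for i in identity_layers:
        w_mixed[i] = w_source[i]
    return w_mixed

# Usage with dummy 18-layer W+ codes (StyleGAN2 at 1024x1024):
w_src = torch.randn(18, 512)
w_tgt = torch.randn(18, 512)
w_swap = mix_latents(w_src, w_tgt)  # would be fed to the generator's synthesis network
```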
Based on these findings, we develop a simple yet robust face swapping model, RobustSwap, which is resistant to potential source attribute leakage. Moreover, we exploit the coordination of a 3DMM's implicit and explicit information as guidance to incorporate the structure of the source image and the precise pose of the target image.
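One common way to realize such guidance is to recombine the disentangled coefficients produced by a 3DMM estimator. The sketch below assumes the coefficient naming of off-the-shelf estimators such as Deep3DFaceRecon ('id', 'exp', 'tex', 'angle', 'gamma', 'trans'); the specific split shown is an assumption for illustration, not the paper's exact recipe.

```python
def recombine_3dmm(source_coeffs: dict, target_coeffs: dict) -> dict:
    """Hypothetical recombination of 3DMM coefficients as swap guidance.

    Facial structure is taken from the source, while expression, pose,
    and appearance-related terms stay with the target.
    """
    return {
        "id":    source_coeffs["id"],     # facial shape/structure from source
        "exp":   target_coeffs["exp"],    # expression from target
        "tex":   target_coeffs["tex"],    # skin texture from target
        "angle": target_coeffs["angle"],  # precise head pose from target
        "gamma": target_coeffs["gamma"],  # illumination from target
        "trans": target_coeffs["trans"],  # position from target
    }
```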
Although our method is trained solely on an image dataset without identity labels, it can generate high-fidelity and temporally consistent videos. Through extensive qualitative and quantitative evaluations, we demonstrate that our method achieves significant improvements over previous face swapping models in synthesizing both images and videos. Our project page is available at https://robustswap.github.io/.