298 research outputs found
NARRATE: A Normal Assisted Free-View Portrait Stylizer
In this work, we propose NARRATE, a novel pipeline that enables
simultaneously editing portrait lighting and perspective in a photorealistic
manner. As a hybrid neural-physical face model, NARRATE leverages complementary
benefits of geometry-aware generative approaches and normal-assisted physical
face models. In a nutshell, NARRATE first inverts the input portrait to a
coarse geometry and employs neural rendering to generate images resembling the
input, as well as producing convincing pose changes. However, inversion step
introduces mismatch, bringing low-quality images with less facial details. As
such, we further estimate portrait normal to enhance the coarse geometry,
creating a high-fidelity physical face model. In particular, we fuse the neural
and physical renderings to compensate for the imperfect inversion, resulting in
both realistic and view-consistent novel perspective images. In relighting
stage, previous works focus on single view portrait relighting but ignoring
consistency between different perspectives as well, leading unstable and
inconsistent lighting effects for view changes. We extend Total Relighting to
fix this problem by unifying its multi-view input normal maps with the physical
face model. NARRATE conducts relighting with consistent normal maps, imposing
cross-view constraints and exhibiting stable and coherent illumination effects.
We experimentally demonstrate that NARRATE achieves more photorealistic,
reliable results over prior works. We further bridge NARRATE with animation and
style transfer tools, supporting pose change, light change, facial animation,
and style transfer, either separately or in combination, all at a photographic
quality. We showcase vivid free-view facial animations as well as 3D-aware
relightable stylization, which help facilitate various AR/VR applications like
virtual cinematography, 3D video conferencing, and post-production.Comment: 14 pages,13 figures https://youtu.be/mP4FV3evmy
PhotoApp: Photorealistic Appearance Editing of Head Portraits
Photorealistic editing of portraits is a challenging task as humans are very
sensitive to inconsistencies in faces. We present an approach for high-quality
intuitive editing of the camera viewpoint and scene illumination in a portrait
image. This requires our method to capture and control the full reflectance
field of the person in the image. Most editing approaches rely on supervised
learning using training data captured with setups such as light and camera
stages. Such datasets are expensive to acquire, not readily available and do
not capture all the rich variations of in-the-wild portrait images. In
addition, most supervised approaches only focus on relighting, and do not allow
camera viewpoint editing. Thus, they only capture and control a subset of the
reflectance field. Recently, portrait editing has been demonstrated by
operating in the generative model space of StyleGAN. While such approaches do
not require direct supervision, there is a significant loss of quality when
compared to the supervised approaches. In this paper, we present a method which
learns from limited supervised training data. The training images only include
people in a fixed neutral expression with eyes closed, without much hair or
background variations. Each person is captured under 150 one-light-at-a-time
conditions and under 8 camera poses. Instead of training directly in the image
space, we design a supervised problem which learns transformations in the
latent space of StyleGAN. This combines the best of supervised learning and
generative adversarial modeling. We show that the StyleGAN prior allows for
generalisation to different expressions, hairstyles and backgrounds. This
produces high-quality photorealistic results for in-the-wild images and
significantly outperforms existing methods. Our approach can edit the
illumination and pose simultaneously, and runs at interactive rates.Comment: http://gvv.mpi-inf.mpg.de/projects/PhotoApp
PhotoApp: Photorealistic Appearance Editing of Head Portraits
Photorealistic editing of head portraits is a challenging task as humans are very sensitive to inconsistencies in faces. We present an approach for high-quality intuitive editing of the camera viewpoint and scene illumination (parameterised with an environment map) in a portrait image. This requires our method to capture and control the full reflectance field of the person in the image. Most editing approaches rely on supervised learning using training data captured with setups such as light and camera stages. Such datasets are expensive to acquire, not readily available and do not capture all the rich variations of in-the-wild portrait images. In addition, most supervised approaches only focus on relighting, and do not allow camera viewpoint editing. Thus, they only capture and control a subset of the reflectance field. Recently, portrait editing has been demonstrated by operating in the generative model space of StyleGAN. While such approaches do not require direct supervision, there is a significant loss of quality when compared to the supervised approaches. In this paper, we present a method which learns from limited supervised training data. The training images only include people in a fixed neutral expression with eyes closed, without much hair or background variations. Each person is captured under 150 one-light-at-a-time conditions and under 8 camera poses. Instead of training directly in the image space, we design a supervised problem which learns transformations in the latent space of StyleGAN. This combines the best of supervised learning and generative adversarial modeling. We show that the StyleGAN prior allows for generalisation to different expressions, hairstyles and backgrounds. This produces high-quality photorealistic results for in-the-wild images and significantly outperforms existing methods. Our approach can edit the illumination and pose simultaneously, and runs at interactive rates
{PIE}: {P}ortrait Image Embedding for Semantic Control
Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, however only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study
ReliTalk: Relightable Talking Portrait Generation from a Single Video
Recent years have witnessed great progress in creating vivid audio-driven
portraits from monocular videos. However, how to seamlessly adapt the created
video avatars to other scenarios with different backgrounds and lighting
conditions remains unsolved. On the other hand, existing relighting studies
mostly rely on dynamically lighted or multi-view data, which are too expensive
for creating video portraits. To bridge this gap, we propose ReliTalk, a novel
framework for relightable audio-driven talking portrait generation from
monocular videos. Our key insight is to decompose the portrait's reflectance
from implicitly learned audio-driven facial normals and images. Specifically,
we involve 3D facial priors derived from audio features to predict delicate
normal maps through implicit functions. These initially predicted normals then
take a crucial part in reflectance decomposition by dynamically estimating the
lighting condition of the given video. Moreover, the stereoscopic face
representation is refined using the identity-consistent loss under simulated
multiple lighting conditions, addressing the ill-posed problem caused by
limited views available from a single monocular video. Extensive experiments
validate the superiority of our proposed framework on both real and synthetic
datasets. Our code is released in https://github.com/arthur-qiu/ReliTalk
- …