Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and Editability
GAN inversion aims to invert an input image into the latent space of a
pre-trained GAN. Despite the recent advances in GAN inversion, there remain
challenges to mitigate the tradeoff between distortion and editability, i.e.
reconstructing the input image accurately and editing the inverted image with a
small visual quality drop. The recently proposed pivotal tuning model makes
significant progress towards reconstruction and editability, by using a
two-step approach that first inverts the input image into a latent code, called
pivot code, and then fine-tunes the generator so that the pivot code is
accurately mapped to the input image. Here, we show that both reconstruction
and editability can be improved by a proper design of the pivot code. We
present a simple yet effective method, named cycle encoding, for a high-quality
pivot code. The key idea of our method is to progressively train an encoder in
varying spaces according to a cycle scheme: W->W+->W. This training methodology
preserves the properties of both W and W+ spaces, i.e. high editability of W
and low distortion of W+. To further decrease the distortion, we also propose
to refine the pivot code with an optimization-based method, where a
regularization term is introduced to reduce the degradation in editability.
Qualitative and quantitative comparisons to several state-of-the-art methods
demonstrate the superiority of our approach.
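The cycle scheme described above can be sketched as follows. This is a minimal illustration with hypothetical names and numpy stand-ins for the encoder and generator (`space_for_step`, `to_latent`, and the phase length are assumptions, not the paper's API): the encoder is trained in alternating latent spaces, and a W-space target collapses the per-layer codes into a single broadcast code, while a W+-space target keeps one code per layer.

```python
import numpy as np

# Sketch of the W -> W+ -> W cycle schedule: training progressively in
# varying spaces so the pivot code inherits the editability of W and
# the low distortion of W+.

NUM_LAYERS = 18   # StyleGAN layers; W+ holds one 512-d code per layer
DIM = 512

def space_for_step(step, phase_len):
    """Return which space ('W' or 'W+') the encoder targets at a step."""
    phase = (step // phase_len) % 3
    return ['W', 'W+', 'W'][phase]

def to_latent(raw, space):
    """Map a raw (18, 512) encoder output to a code in the chosen space.
    In W, a single 512-d vector is broadcast to all layers; in W+ each
    layer keeps its own code."""
    if space == 'W':
        w = raw.mean(axis=0)                 # collapse to one code
        return np.tile(w, (NUM_LAYERS, 1))   # (18, 512), all rows equal
    return raw                               # (18, 512), per-layer codes

rng = np.random.default_rng(0)
raw = rng.standard_normal((NUM_LAYERS, DIM))

for step in (0, 1000, 2000):
    space = space_for_step(step, phase_len=1000)
    code = to_latent(raw, space)
    print(step, space, code.shape)
```

The constrained W-space phases act as the editability anchor; the W+ phase in between lowers distortion before the final return to W.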
TriPlaneNet: An Encoder for EG3D Inversion
Recent progress in NeRF-based GANs has introduced a number of approaches for
high-resolution and high-fidelity generative modeling of human heads with a
possibility for novel view rendering. At the same time, one must solve an
inverse problem to be able to re-render or modify an existing image or video.
Despite the success of universal optimization-based methods for 2D GAN
inversion, those applied to 3D GANs may fail to extrapolate the result onto the
novel view, whereas optimization-based 3D GAN inversion methods are
time-consuming and can require at least several minutes per image. Fast
encoder-based techniques, such as those developed for StyleGAN, may also be
less appealing due to the lack of identity preservation. Our work introduces a
fast technique that bridges the gap between the two approaches by directly
utilizing the tri-plane representation presented for the EG3D generative model.
In particular, we build upon a feed-forward convolutional encoder for the
latent code and extend it with a fully-convolutional predictor of tri-plane
numerical offsets. The renderings are similar in quality to the ones produced
by optimization-based techniques and outperform the ones by encoder-based
methods. As we empirically prove, this is a consequence of directly operating
in the tri-plane space, not in the GAN parameter space, while making use of an
encoder-based trainable approach. Finally, we demonstrate significantly more
correct embedding of a face image in 3D than for all the baselines, further
strengthened by a probably symmetric prior enabled during training.
Comment: Project page: https://anantarb.github.io/triplanene
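The two-branch design above can be illustrated with a toy sketch. Everything here is a hypothetical stand-in (shapes, `base_triplanes`, `predict_offsets` are assumptions, not the actual EG3D or TriPlaneNet code): one branch produces base tri-planes from the inverted latent code, and a second fully-convolutional branch predicts numerical offsets of the same shape, so the correction is a plain elementwise addition in tri-plane space rather than in the GAN parameter space.

```python
import numpy as np

# Toy sketch of encoder-based EG3D inversion with tri-plane offsets.

RES, CH = 64, 32   # toy tri-plane resolution / channels (EG3D uses larger)

def base_triplanes(latent, rng):
    # Stand-in for the tri-plane features the generator produces
    # from the inverted latent code.
    return rng.standard_normal((3, CH, RES, RES)) * latent.std()

def predict_offsets(image_feat):
    # Stand-in for the fully-convolutional offset predictor; its output
    # matches the tri-plane shape so it can be added directly.
    return 0.1 * image_feat

rng = np.random.default_rng(1)
latent = rng.standard_normal(512)

planes = base_triplanes(latent, rng)
offsets = predict_offsets(rng.standard_normal((3, CH, RES, RES)))
refined = planes + offsets   # correction applied in tri-plane space,
                             # not in the GAN parameter space
print(refined.shape)
```

Operating directly on the tri-planes is what lets the corrected representation stay consistent when re-rendered from novel views.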
Generative Prior for Unsupervised Image Restoration
The challenge of restoring real-world low-quality images stems from a lack of appropriate training data and the difficulty of determining how the image was degraded. Recently, generative models have demonstrated great potential for creating high-quality images by utilizing the rich and diverse information contained within the model's trained weights and learned latent representations. One popular type of generative model is the generative adversarial network (GAN), and many new methods have been developed to harness the information found in GANs for image manipulation. Our proposed approach is to utilize generative models both to understand the degradation of an image and to restore it. We propose using a combination of cycle consistency losses and self-attention to enhance face images by first learning the degradation and then using this information to train a style-based neural network. We also aim to use the latent representation to achieve a high level of magnification for face images (x64). By incorporating the weights of a pre-trained StyleGAN into a restoration network with a vision transformer layer, we hope to improve the current state of the art in face image restoration. Finally, we present a projection-based image-denoising algorithm named Noise2Code in the latent space of the VQGAN model with a fixed-point regularization strategy. The fixed-point condition follows the observation that the pre-trained VQGAN affects clean and noisy images in drastically different ways. Unlike previous projection-based image restoration in the latent space, both the denoising network and the VQGAN model parameters are jointly trained, although the latter is not needed during testing. We report experimental results demonstrating that the proposed Noise2Code approach is conceptually simple, computationally efficient, and generalizable to real-world degradation scenarios.
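The fixed-point condition behind Noise2Code can be illustrated with a toy example. This is a hedged sketch, not the actual VQGAN: a 1-D nearest-codebook quantizer stands in for the encode-quantize-decode round trip, and `fixed_point_penalty` is a hypothetical name for the regularizer. A clean signal that already lies on the codebook grid is a fixed point of the round trip, whereas a noisy one is not, which is the observation the regularization exploits.

```python
import numpy as np

# Toy illustration of a fixed-point regularizer: the pre-trained
# quantizer leaves clean inputs (nearly) unchanged but moves noisy ones.

codebook = np.linspace(-1.0, 1.0, 9)   # toy 1-D codebook, spacing 0.25

def vq_roundtrip(x):
    # Nearest-codebook-entry quantization as a stand-in for the
    # VQGAN encode -> quantize -> decode round trip.
    idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx]

def fixed_point_penalty(x):
    # Small when x is (close to) a fixed point of the round trip;
    # a denoiser output can be regularized with this term.
    return float(np.mean((vq_roundtrip(x) - x) ** 2))

clean = codebook[[1, 4, 7]]   # already on the codebook grid
noisy = clean + 0.09          # perturbed off the grid
print(fixed_point_penalty(clean), fixed_point_penalty(noisy))
```

In the sketched setup, the penalty is exactly zero for the clean signal and positive for the noisy one, so minimizing it pushes the denoiser output toward the quantizer's fixed points.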
StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation
Domain adaptation of GANs is a problem of fine-tuning the state-of-the-art
GAN models (e.g. StyleGAN) pretrained on a large dataset to a specific domain
with few samples (e.g. painting faces, sketches, etc.). While there are a great
number of methods that tackle this problem in different ways, there are still
many important questions that remain unanswered.
In this paper, we provide a systematic and in-depth analysis of the domain
adaptation problem of GANs, focusing on the StyleGAN model. First, we perform a
detailed exploration of the most important parts of StyleGAN that are
responsible for adapting the generator to a new domain depending on the
similarity between the source and target domains. As a result of this in-depth
study, we propose new efficient and lightweight parameterizations of StyleGAN
for domain adaptation. Particularly, we show there exist directions in
StyleSpace (StyleDomain directions) that are sufficient for adapting to similar
domains and they can be reduced further. For dissimilar domains, we propose
Affine and AffineLight parameterizations that allow us to outperform
existing baselines in few-shot adaptation in the low-data regime. Finally, we
examine StyleDomain directions and discover their many surprising properties
that we apply for domain mixing and cross-domain image morphing.
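The idea of a lightweight affine parameterization can be sketched as follows. The class name, shapes, and initialization are illustrative assumptions, not the paper's implementation: instead of fine-tuning all generator weights, only a per-channel scale and shift applied to the style codes is learned, which keeps the adapted parameter count tiny compared with full StyleGAN fine-tuning.

```python
import numpy as np

# Hypothetical sketch of an affine-style adapter for domain adaptation:
# the frozen generator is steered by a learned per-channel scale/shift.

DIM = 512   # style-code dimensionality

class AffineAdapter:
    def __init__(self):
        self.scale = np.ones(DIM)    # trainable, identity at init
        self.shift = np.zeros(DIM)   # trainable, zero at init

    def __call__(self, style):
        # Per-channel affine transform of a style code.
        return self.scale * style + self.shift

    def num_params(self):
        return self.scale.size + self.shift.size

adapter = AffineAdapter()
style = np.random.default_rng(2).standard_normal(DIM)
assert np.allclose(adapter(style), style)   # identity at initialization
print(adapter.num_params())                 # 1024 trainable parameters
```

Identity initialization means the adapted generator starts exactly at the pretrained source model, and only the 1024 affine parameters move during few-shot adaptation.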