3,266 research outputs found
In-Domain GAN Inversion for Faithful Reconstruction and Editability
Generative Adversarial Networks (GANs) have significantly advanced image
synthesis through mapping randomly sampled latent codes to high-fidelity
synthesized images. However, applying well-trained GANs to real image editing
remains challenging. A common solution is to find an approximate latent code
that can adequately recover the input image to edit, which is also known as GAN
inversion. To invert a GAN model, prior works typically focus on reconstructing
the target image at the pixel level, yet few studies are conducted on whether
the inverted result can well support manipulation at the semantic level. This
work fills in this gap by proposing in-domain GAN inversion, which consists of
a domain-guided encoder and a domain-regularized optimizer, to regularize the
inverted code in the native latent space of the pre-trained GAN model. In this
way, we manage to sufficiently reuse the knowledge learned by GANs for image
reconstruction, facilitating a wide range of editing applications without any
retraining. We further make comprehensive analyses on the effects of the
encoder structure, the starting inversion point, as well as the inversion
parameter space, and observe the trade-off between the reconstruction quality
and the editing property. Such a trade-off sheds light on how a GAN model
represents an image with various semantics encoded in the learned latent
distribution. Code, models, and demo are available at the project page:
https://genforce.github.io/idinvert/
Unsupervised Domain Adaptation GAN Inversion for Image Editing
Existing GAN inversion methods work brilliantly for high-quality image
reconstruction and editing while struggling with finding the corresponding
high-quality images for low-quality inputs. Therefore, recent works are
directed toward leveraging the supervision of paired high-quality and
low-quality images for inversion. However, these methods are infeasible in
real-world scenarios and further hinder performance improvement. In this paper,
we resolve this problem by introducing Unsupervised Domain Adaptation (UDA)
into the Inversion process, namely UDA-Inversion, for both high-quality and
low-quality image inversion and editing. Particularly, UDA-Inversion first
regards the high-quality and low-quality images as the source domain and
unlabeled target domain, respectively. Then, a discrepancy function is
presented to measure the difference between two domains, after which we
minimize the source error and the discrepancy between the distributions of two
domains in the latent space to obtain accurate latent codes for low-quality
images. Without direct supervision, constructive representations of
high-quality images can be spontaneously learned and transformed into
low-quality images based on unsupervised domain adaptation. Experimental
results indicate that UDA-Inversion is the first to achieve performance
comparable to supervised methods on low-quality images across multiple domain
datasets. We hope this work provides a unique inspiration for latent embedding
distributions in image processing tasks.
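The objective described in this abstract (source error plus a discrepancy between the two latent distributions) can be illustrated with a minimal sketch. The abstract does not specify the discrepancy function, so the squared distance between mean latent codes used here is a stand-in assumption, and the names `mean_discrepancy`, `uda_objective`, and `weight` are hypothetical.

```python
def mean_vector(codes):
    """Average a list of equal-length latent codes component by component."""
    n = len(codes)
    return [sum(code[i] for code in codes) / n for i in range(len(codes[0]))]


def mean_discrepancy(source_codes, target_codes):
    """Stand-in discrepancy (an assumption, not the paper's function):
    squared Euclidean distance between the mean latent codes of two domains."""
    mu_s = mean_vector(source_codes)
    mu_t = mean_vector(target_codes)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def uda_objective(source_error, source_codes, target_codes, weight=1.0):
    """Source-domain error plus a weighted discrepancy between domains."""
    return source_error + weight * mean_discrepancy(source_codes, target_codes)


# Toy latent codes for high-quality (source) and low-quality (target) images.
hq_codes = [[0.0, 1.0], [2.0, 3.0]]
lq_codes = [[1.0, 1.0], [3.0, 3.0]]
objective = uda_objective(0.5, hq_codes, lq_codes, weight=2.0)
```

Minimizing such an objective pulls the latent codes of the unlabeled low-quality images toward the distribution of the high-quality ones while keeping the source error low.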
InfoScrub: Towards Attribute Privacy by Targeted Obfuscation
Personal photos of individuals, when shared online, exhibit a myriad of
memorable details but also reveal a wide range of private information and
potentially entail privacy risks (e.g., online harassment, tracking). To
mitigate such risks, it is crucial to study techniques that allow individuals
to limit the private information leaked in visual data. We tackle this problem
in a novel image obfuscation framework: to maximize entropy on inferences over
targeted privacy attributes, while retaining image fidelity. We approach the
problem based on an encoder-decoder style architecture, with two key novelties:
(a) introducing a discriminator to perform bi-directional translation
simultaneously from multiple unpaired domains; (b) predicting an image
interpolation which maximizes uncertainty over a target set of attributes. We
find our approach generates obfuscated images faithful to the original input
images, and additionally increases uncertainty by 6.2 (or up to 0.85
bits) over the non-obfuscated counterparts.
Comment: 20 pages, 7 figures
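The uncertainty that this abstract reports in bits can be read as the Shannon entropy of an attribute classifier's predicted distribution. The sketch below only illustrates that measurement; the function names and the example probabilities are hypothetical and are not taken from the paper.

```python
import math


def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def uncertainty_gain(original_probs, obfuscated_probs):
    """Increase in entropy over a target attribute after obfuscation."""
    return entropy_bits(obfuscated_probs) - entropy_bits(original_probs)


# Hypothetical predictions for one binary attribute.
confident = [0.9, 0.1]   # classifier output on the original image
uncertain = [0.5, 0.5]   # classifier output on the obfuscated image
gain = uncertainty_gain(confident, uncertain)
```

For a binary attribute the entropy is capped at 1 bit, which is reached when the classifier is maximally uncertain.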
Out-of-domain GAN inversion via Invertibility Decomposition for Photo-Realistic Human Face Manipulation
The fidelity of Generative Adversarial Networks (GAN) inversion is impeded by
Out-Of-Domain (OOD) areas (e.g., background, accessories) in the image.
Detecting the OOD areas beyond the generation ability of the pre-trained model
and blending these regions with the input image can enhance fidelity. The
"invertibility mask" figures out these OOD areas, and existing methods predict
the mask with the reconstruction error. However, the estimated mask is usually
inaccurate due to the influence of the reconstruction error in the In-Domain
(ID) area. In this paper, we propose a novel framework that enhances the
fidelity of human face inversion by designing a new module to decompose the
input images to ID and OOD partitions with invertibility masks. Unlike previous
works, our invertibility detector is simultaneously learned with a spatial
alignment module. We iteratively align the generated features to the input
geometry and reduce the reconstruction error in the ID regions. Thus, the OOD
areas are more distinguishable and can be precisely predicted. Then, we improve
the fidelity of our results by blending the OOD areas from the input image with
the ID GAN inversion results. Our method produces photo-realistic results for
real-world human face image inversion and manipulation. Extensive experiments
demonstrate our method's superiority over existing methods in the quality of
GAN inversion and attribute manipulation.
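The blending step described in this abstract can be sketched as a per-pixel mix controlled by the invertibility mask. This is a minimal illustration on flat lists of pixel values, assuming a mask value of 1 marks an OOD pixel taken from the input and 0 marks an ID pixel taken from the inversion result; the function name `blend` and the toy values are hypothetical.

```python
def blend(input_pixels, inverted_pixels, ood_mask):
    """Keep input pixels where the mask marks OOD regions (mask = 1) and
    inverted pixels where the region is in-domain (mask = 0)."""
    if not (len(input_pixels) == len(inverted_pixels) == len(ood_mask)):
        raise ValueError("all three sequences must have the same length")
    return [
        m * x + (1.0 - m) * g
        for x, g, m in zip(input_pixels, inverted_pixels, ood_mask)
    ]


# Toy example: three pixels with a hard OOD, a hard ID, and a soft mask value.
input_pixels = [0.2, 0.8, 0.5]
inverted_pixels = [0.1, 0.4, 0.9]
ood_mask = [1.0, 0.0, 0.5]
blended = blend(input_pixels, inverted_pixels, ood_mask)
```

A soft mask value between 0 and 1 mixes the two sources, which avoids a hard seam at the boundary between the regions.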
GAN Inversion: A Survey
GAN inversion aims to invert a given image back into the latent space of a
pretrained GAN model, for the image to be faithfully reconstructed from the
inverted code by the generator. As an emerging technique to bridge the real and
fake image domains, GAN inversion plays an essential role in enabling the
pretrained GAN models such as StyleGAN and BigGAN to be used for real image
editing applications. Meanwhile, GAN inversion also provides insights on the
interpretation of GAN's latent space and how the realistic images can be
generated. In this paper, we provide an overview of GAN inversion with a focus
on its recent algorithms and applications. We cover important techniques of GAN
inversion and their applications to image restoration and image manipulation.
We further elaborate on some trends and challenges for future directions.
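Optimization-based inversion, one family of techniques in this area, searches for a latent code whose generated output matches the target image. The sketch below shows the idea on a toy linear "generator" with plain gradient descent on the squared reconstruction error. The linear generator, the step size, and all names are illustrative assumptions; real methods use a pretrained network and typically add perceptual or regularization terms.

```python
def generate(weights, z):
    """Toy linear generator: maps a latent code z to an 'image' W z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def invert(weights, target, steps=200, lr=0.1):
    """Find z minimizing ||G(z) - target||^2 by gradient descent."""
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(weights, z), target)]
        # Gradient of the squared error with respect to z: 2 * W^T (W z - x).
        grad = [
            2.0 * sum(row[j] * r for row, r in zip(weights, residual))
            for j in range(len(z))
        ]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 "pixels", 2 latent dims
true_code = [0.5, -1.0]
target_image = generate(weights, true_code)
recovered_code = invert(weights, target_image)
```

Because the toy target was produced by the generator itself, the recovered code converges to `true_code`; real images generally lie outside the generator's range, so only an approximate code can be found.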
Spatial-Contextual Discrepancy Information Compensation for GAN Inversion
Most existing GAN inversion methods either achieve accurate reconstruction
but lack editability or offer strong editability at the cost of fidelity.
Hence, how to balance the distortion-editability trade-off is a significant
challenge for GAN inversion. To address this challenge, we introduce a novel
spatial-contextual discrepancy information compensation-based GAN-inversion
method (SDIC), which consists of a discrepancy information prediction network
(DIPN) and a discrepancy information compensation network (DICN). SDIC follows
a "compensate-and-edit" paradigm and successfully bridges the gap in image
details between the original image and the reconstructed/edited image. On the
one hand, DIPN encodes the multi-level spatial-contextual information of the
original and initial reconstructed images and then predicts a
spatial-contextual guided discrepancy map with two hourglass modules. In this
way, a reliable discrepancy map that models the contextual relationship and
captures fine-grained image details is learned. On the other hand, DICN
incorporates the predicted discrepancy information into both the latent code
and the GAN generator with different transformations, generating high-quality
reconstructed/edited images. This effectively compensates for the loss of image
details during GAN inversion. Both quantitative and qualitative experiments
demonstrate that our proposed method achieves an excellent
distortion-editability trade-off at a fast inference speed for both image
inversion and editing tasks.