145 research outputs found
Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation
Deep neural networks achieve superior performance for learning from
independent and identically distributed (i.i.d.) data. However, their
performance deteriorates significantly when handling out-of-distribution (OoD)
data, where the training and test data are drawn from different distributions. In
this paper, we explore utilizing generative models as a data augmentation
source for improving the out-of-distribution robustness of neural classifiers.
Specifically, we develop a simple yet effective method called Generative
Interpolation to fuse generative models trained on multiple domains for
synthesizing diverse OoD samples. Training a generative model directly on the
source domains tends to suffer from mode collapse and sometimes amplifies the
data bias. Instead, we first train a StyleGAN model on one source domain and
then fine-tune it on the other domains, resulting in many correlated generators
whose model parameters share the same initialization and are thus aligned. We
then linearly interpolate the model parameters of the generators to spawn new
sets of generators. Such interpolated generators are used as an extra data
augmentation source to train the classifiers. The interpolation coefficients
can flexibly control the augmentation direction and strength. In addition, a
style-mixing mechanism is applied to further improve the diversity of the
generated OoD samples. Our experiments show that the proposed method explicitly
increases the diversity of training domains and achieves consistent
improvements over baselines across datasets and multiple distribution
shifts.
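
As a rough illustration of the parameter-interpolation step, the sketch below mixes the weights of two aligned generators; the function and variable names are hypothetical, and the actual method operates on StyleGAN generators fine-tuned from a shared initialization.

```python
import copy
import torch

def interpolate_generators(gen_a, gen_b, alpha):
    """Linearly interpolate the parameters of two aligned generators.

    Assumes gen_b was fine-tuned from gen_a, so the two share an
    architecture and initialization and their parameters line up.
    alpha sets the interpolation coefficient (augmentation strength).
    """
    state_a, state_b = gen_a.state_dict(), gen_b.state_dict()
    mixed = {
        k: torch.lerp(v, state_b[k], alpha) if v.is_floating_point() else v
        for k, v in state_a.items()
    }
    gen_mix = copy.deepcopy(gen_a)  # new generator carrying the mixed weights
    gen_mix.load_state_dict(mixed)
    return gen_mix
```

Sampling different values of alpha spawns a family of generators whose outputs serve as the extra OoD training data for the classifier.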
In-Domain GAN Inversion for Faithful Reconstruction and Editability
Generative Adversarial Networks (GANs) have significantly advanced image
synthesis through mapping randomly sampled latent codes to high-fidelity
synthesized images. However, applying well-trained GANs to real image editing
remains challenging. A common solution is to find an approximate latent code
that can adequately recover the input image to edit, which is also known as GAN
inversion. To invert a GAN model, prior works typically focus on reconstructing
the target image at the pixel level, yet few studies examine whether the
inverted result can support manipulation at the semantic level. This
work fills this gap by proposing in-domain GAN inversion, which consists of
a domain-guided encoder and a domain-regularized optimizer, to regularize the
inverted code in the native latent space of the pre-trained GAN model. In this
way, we manage to sufficiently reuse the knowledge learned by GANs for image
reconstruction, facilitating a wide range of editing applications without any
retraining. We further conduct comprehensive analyses of the effects of the
encoder structure, the starting inversion point, and the inversion
parameter space, and observe a trade-off between reconstruction quality
and editability. Such a trade-off sheds light on how a GAN model
represents an image with various semantics encoded in the learned latent
distribution. Code, models, and demo are available at the project page:
https://genforce.github.io/idinvert/
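
A minimal sketch of the two-stage idea, assuming a pretrained generator G and a domain-guided encoder E (hypothetical handles, not the released API): the encoder supplies the starting latent code, and the optimizer refines it while a regularizer keeps the code in the generator's native latent space.

```python
import torch
import torch.nn.functional as F

def in_domain_invert(G, E, image, steps=500, lr=0.01, lam=2.0):
    """Encoder-initialized, domain-regularized inversion (sketch only).

    G is a pretrained generator and E a domain-guided encoder; the
    loss terms and the weight lam are illustrative, and the paper's
    full objective also involves perceptual features.
    """
    z = E(image).detach().clone().requires_grad_(True)  # encoder gives the start point
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        recon = G(z)
        rec_loss = F.mse_loss(recon, image)  # pixel-level reconstruction
        reg_loss = F.mse_loss(E(recon), z)   # keep the code in-domain
        loss = rec_loss + lam * reg_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```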
Spatial Steerability of GANs via Self-Supervision from Discriminator
Generative models have made huge progress in photorealistic image synthesis in
recent years. To enable humans to steer the image generation process and
customize the output, many works explore the interpretable dimensions of the
latent space in GANs. Existing methods edit attributes of the output image,
such as orientation or color scheme, by varying the latent code along certain
directions. However, these methods usually require additional human annotations
for each pretrained model, and they mostly focus on editing global attributes.
In this work, we propose a self-supervised approach to improve the spatial
steerability of GANs without searching for steerable directions in the latent
space or requiring extra annotations. Specifically, we encode randomly sampled
Gaussian heatmaps into the intermediate layers of generative models as a
spatial inductive bias. While training the GAN model from scratch, these
heatmaps are aligned with the emerging attention of the GAN's discriminator
in a self-supervised manner. During inference,
human users can intuitively interact with the spatial heatmaps to edit the
output image, such as varying the scene layout or moving objects in the scene.
Extensive experiments show that the proposed method not only enables spatial
editing over human faces, animal faces, outdoor scenes, and complicated indoor
scenes, but also brings improvement in synthesis quality.
Comment: This manuscript is a journal extension of our previous conference work (arXiv:2112.00718), submitted to TPAMI.
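
To make the heatmap encoding concrete, here is a small sketch of sampling the random Gaussian heatmaps that get injected into the generator's intermediate layers; the shapes and the sigma value are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def random_gaussian_heatmaps(batch, n_maps, size, sigma=0.1):
    """Sample random 2D Gaussian heatmaps as spatial inductive bias.

    Centers are drawn uniformly over the image plane; sigma controls
    each blob's spatial extent.
    """
    ys = torch.linspace(0.0, 1.0, size).view(1, 1, size, 1)
    xs = torch.linspace(0.0, 1.0, size).view(1, 1, 1, size)
    centers = torch.rand(batch, n_maps, 2)        # (cy, cx) in [0, 1]
    cy = centers[..., 0].view(batch, n_maps, 1, 1)
    cx = centers[..., 1].view(batch, n_maps, 1, 1)
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2          # squared distance to each center
    return torch.exp(-d2 / (2.0 * sigma ** 2))    # (batch, n_maps, size, size)
```

At inference time, moving a heatmap's center plays the role of the interactive edit, e.g. dragging an object to a new position in the scene.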
Improving GANs with A Dynamic Discriminator
The discriminator plays a vital role in training generative adversarial networks
(GANs) by distinguishing real from synthesized samples. While the real data
distribution remains the same, the synthesis distribution keeps varying because
of the evolving generator, which in turn changes the binary classification
task posed to the discriminator. We argue that a discriminator
with an on-the-fly adjustment on its capacity can better accommodate such a
time-varying task. A comprehensive empirical study confirms that the proposed
training strategy, termed DynamicD, improves the synthesis performance
without incurring any additional computation cost or training objectives. Two
capacity adjusting schemes are developed for training GANs under different data
regimes: i) given a sufficient amount of training data, the discriminator
benefits from a progressively increased learning capacity, and ii) when the
training data is limited, gradually decreasing the layer width mitigates the
over-fitting issue of the discriminator. Experiments on both 2D and 3D-aware
image synthesis tasks conducted on a range of datasets substantiate the
generalizability of our DynamicD as well as its substantial improvement over
the baselines. Furthermore, DynamicD is synergistic with other
discriminator-improving approaches (including data augmentation, regularizers,
and pre-training), and brings continuous performance gain when combined for
learning GANs.
Comment: To appear in NeurIPS 2022.
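
The capacity knob can be pictured as slicing a discriminator's channels by a width ratio that a schedule then raises (ample data) or lowers (limited data) over training; the toy module below illustrates this idea, while DynamicD's actual architecture and schedules differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicWidthDiscriminator(nn.Module):
    """Toy discriminator with an adjustable effective width (sketch)."""

    def __init__(self, max_channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(3, max_channels, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(max_channels, max_channels, 3, stride=2, padding=1)
        self.head = nn.Linear(max_channels, 1)
        self.max_channels = max_channels

    def forward(self, x, width_ratio=1.0):
        c = max(1, int(self.max_channels * width_ratio))
        # Keep only the first c filters (and input channels) of each layer.
        h = F.conv2d(x, self.conv1.weight[:c], self.conv1.bias[:c],
                     stride=2, padding=1).relu()
        h = F.conv2d(h, self.conv2.weight[:c, :c], self.conv2.bias[:c],
                     stride=2, padding=1).relu()
        h = h.mean(dim=(2, 3))  # global average pooling -> (batch, c)
        return h @ self.head.weight[:, :c].t() + self.head.bias

# e.g., run at half capacity partway through a decreasing schedule:
D = DynamicWidthDiscriminator()
scores = D(torch.randn(4, 3, 64, 64), width_ratio=0.5)
```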