On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks
In this paper, we show that Generative Adversarial Networks (GANs) suffer
from catastrophic forgetting even when they are trained to approximate a single
target distribution. We show that GAN training is a continual learning problem
in which the sequence of changing model distributions is the sequence of tasks
to the discriminator. The level of mismatch between tasks in the sequence
determines the level of forgetting. Catastrophic forgetting is interrelated with
mode collapse and can make the training of GANs non-convergent. We investigate
the landscape of the discriminator's output in different variants of GANs and
find that when a GAN converges to a good equilibrium, real training datapoints
are wide local maxima of the discriminator. We empirically show the
relationship between the sharpness of local maxima and mode collapse and
generalization in GANs. We show how catastrophic forgetting prevents the
discriminator from making real datapoints local maxima, and thus causes
non-convergence. Finally, we study methods for preventing catastrophic
forgetting in GANs.
Comment: This is an extended version of our paper in the ICML'18 Workshop on
Theoretical Foundation and Applications of Deep Generative Models. Accepted
to IJCNN 202
Self-Supervised GAN to Counter Forgetting
GANs involve training two networks in an adversarial game, where each
network's task depends on its adversary. Recently, several works have framed
GAN training as an online or continual learning problem. We focus on the
discriminator, which must perform classification under an (adversarially)
shifting data distribution. When trained on sequential tasks, neural networks
exhibit forgetting. For GANs, discriminator forgetting leads to training
instability. To counter forgetting, we encourage the discriminator to maintain
useful representations by adding self-supervision. Conditional GANs have a
similar effect using labels. However, our self-supervised GAN does not require
labels, and closes the performance gap between conditional and unconditional
models. We show that, in doing so, the self-supervised discriminator learns
better representations than regular GANs.
Comment: NeurIPS'18 Continual Learning workshop
Autoencoding Generative Adversarial Networks
In the years since Goodfellow et al. introduced Generative Adversarial
Networks (GANs), there has been an explosion in the breadth and quality of
generative model applications. Despite this work, GANs still have a long way to
go before they see mainstream adoption, owing largely to their infamous
training instability. Here I propose the Autoencoding Generative Adversarial
Network (AEGAN), a four-network model which learns a bijective mapping between
a specified latent space and a given sample space by applying an adversarial
loss and a reconstruction loss to both the generated images and the generated
latent vectors. The AEGAN technique offers several improvements to typical GAN
training, including training stabilization, mode-collapse prevention, and
direct interpolation between real samples. The effectiveness of
the technique is illustrated using an anime face dataset.
Comment: 7 pages, 4 figures
Blind Image Deconvolution using Deep Generative Priors
This paper proposes a novel approach to regularize the ill-posed and
non-linear blind image deconvolution (blind deblurring) using deep
generative networks as priors. We employ two separate generative models --- one
trained to produce sharp images and the other trained to generate blur
kernels from lower-dimensional parameters. To deblur, we propose an alternating
gradient descent scheme operating in the latent lower-dimensional space of each
of the pretrained generative models. Our experiments show promising deblurring
results on images even under large blurs and heavy noise. To address the
shortcomings of generative models such as mode collapse, we augment our
generative priors with classical image priors and report improved performance
on complex image datasets. The deblurring performance depends on how well the
range of the generator spans the image class. Interestingly, our experiments
show that even an untrained structured (convolutional) generative network acts
as an image prior in the image deblurring context, allowing us to extend our
results to more diverse natural image datasets.
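
The alternating gradient descent scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two pretrained deep generators are replaced here by small fixed linear maps (the hypothetical matrices `A` for the sharp signal and `B` for the blur kernel), the signals are 1-D, and the step size and iteration count are arbitrary choices. Only the structure is meant to carry over: the observation is fit by taking gradient steps alternately in the latent code of the image model (`z`) and the latent code of the kernel model (`w`).

```python
def conv_full(k, x):
    """Full 1-D convolution of kernel k with signal x."""
    y = [0.0] * (len(k) + len(x) - 1)
    for i, ki in enumerate(k):
        for j, xj in enumerate(x):
            y[i + j] += ki * xj
    return y

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def mat_t_vec(M, v):
    """Transposed product M^T v (chain rule through a linear generator)."""
    return [sum(M[r][c] * v[r] for r in range(len(M))) for c in range(len(M[0]))]

def loss_and_grads(y, A, B, z, w):
    """Loss 0.5 * ||conv(B w, A z) - y||^2 and its gradients w.r.t. z and w."""
    x = matvec(A, z)                      # stand-in "image generator" output
    k = matvec(B, w)                      # stand-in "kernel generator" output
    r = [p - q for p, q in zip(conv_full(k, x), y)]
    loss = 0.5 * sum(ri * ri for ri in r)
    grad_x = [sum(r[i + j] * k[i] for i in range(len(k))) for j in range(len(x))]
    grad_k = [sum(r[i + j] * x[j] for j in range(len(x))) for i in range(len(k))]
    return loss, mat_t_vec(A, grad_x), mat_t_vec(B, grad_k)

def alternating_descent(y, A, B, z, w, step=0.05, iters=200):
    """Alternate one gradient step on z (kernel fixed) and one on w (image fixed)."""
    for _ in range(iters):
        _, grad_z, _ = loss_and_grads(y, A, B, z, w)
        z = [zi - step * g for zi, g in zip(z, grad_z)]
        _, _, grad_w = loss_and_grads(y, A, B, z, w)
        w = [wi - step * g for wi, g in zip(w, grad_w)]
    return z, w

# Toy problem (all values are illustrative assumptions).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]   # latent (2) -> signal (4)
B = [[1.0, 0.0], [0.0, 1.0]]                             # latent (2) -> kernel (2)
y_obs = conv_full(matvec(B, [0.7, 0.3]), matvec(A, [1.0, 0.5]))
z0, w0 = [0.5, 0.5], [0.5, 0.5]

loss_before, _, _ = loss_and_grads(y_obs, A, B, z0, w0)
z_hat, w_hat = alternating_descent(y_obs, A, B, z0, w0)
loss_after, _, _ = loss_and_grads(y_obs, A, B, z_hat, w_hat)
print(loss_before, loss_after)
```

Because the problem is bilinear, the recovered pair is only determined up to a scale exchanged between kernel and signal, so the sketch compares the data-fit loss before and after rather than the latent codes themselves.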
Generative Adversarial Network Training is a Continual Learning Problem
Generative Adversarial Networks (GANs) have proven to be a powerful framework
for learning to draw samples from complex distributions. However, GANs are also
notoriously difficult to train, with mode collapse and oscillations being common
problems. We hypothesize that this is at least in part due to the evolution of
the generator distribution and the catastrophic forgetting tendency of neural
networks, which leads to the discriminator losing the ability to remember
synthesized samples from previous instantiations of the generator. Recognizing
this, our contributions are twofold. First, we show that GAN training makes for
a more interesting and realistic benchmark for continual learning methods
evaluation than some of the more canonical datasets. Second, we propose
leveraging continual learning techniques to augment the discriminator,
preserving its ability to recognize previous generator samples. We show that
the resulting methods add only a light amount of computation, involve minimal
changes to the model, and result in better overall performance on the examined
image and text generation tasks.
Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates
Background: Catastrophic forgetting is the notorious vulnerability of neural
networks to changes in the data distribution during learning. This
phenomenon has long been considered a major obstacle for using learning agents
in realistic continual learning settings. A large body of continual learning
research assumes that task boundaries are known during training. However, only
a few works consider scenarios in which task boundaries are unknown or not well
defined -- task agnostic scenarios. The optimal Bayesian solution for this
requires an intractable online Bayes update to the weights posterior.
Contributions: We aim to approximate the online Bayes update as accurately as
possible. To do so, we derive novel fixed-point equations for the online
variational Bayes optimization problem, for multivariate Gaussian parametric
distributions. By iterating the posterior through these fixed-point equations,
we obtain an algorithm (FOO-VB) for continual learning which can handle
non-stationary data distributions using a fixed architecture and without using
external memory (i.e. without access to previous data). We demonstrate that our
method (FOO-VB) outperforms existing methods in task agnostic scenarios. A
PyTorch implementation of FOO-VB will be available online.
Comment: The arXiv paper "Task Agnostic Continual Learning Using Online
Variational Bayes" is a preliminary pre-print of this paper. The main
differences between the versions are: 1. We develop a new algorithmic framework
(FOO-VB). 2. We add multivariate Gaussian and matrix variate Gaussian
versions of the algorithm. 3. We demonstrate the new algorithm's performance in
task agnostic scenarios.
Sample weighting as an explanation for mode collapse in generative adversarial networks
Generative adversarial networks were introduced with a logistic MiniMax cost
formulation, which normally fails to train due to saturation, and a
Non-Saturating reformulation. While addressing the saturation problem, NS-GAN
also inverts the generator's sample weighting, implicitly shifting emphasis
from higher-scoring to lower-scoring samples when updating parameters. We
present both theory and empirical results suggesting that this makes NS-GAN
prone to mode dropping. We design MM-nsat, which preserves MM-GAN sample
weighting while avoiding saturation by rescaling the MM-GAN minibatch gradient
such that its magnitude approximates NS-GAN's gradient magnitude. MM-nsat has
qualitatively different training dynamics, and on MNIST and CIFAR-10 it is
stronger in terms of mode coverage, stability and FID. While the empirical
results for MM-nsat are promising and favorable also in comparison with the
LS-GAN and Hinge-GAN formulations, our main contribution is to show how and why
NS-GAN's sample weighting causes mode dropping and training collapse.
Comment: 41 pages, 21 figures, preprint
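
The sample-weighting inversion described in this abstract can be checked with a short sketch. Writing the discriminator output as D = sigmoid(a) for a logit a, the minimax generator loss log(1 - D) has gradient magnitude sigmoid(a) with respect to a, while the non-saturating loss -log D has gradient magnitude 1 - sigmoid(a). The function names below are hypothetical, and the batch-level rescaling in `mm_nsat_weights` is only one simple reading of "rescaling the MM-GAN minibatch gradient", taken at the level of logit gradients; the paper's exact rule may differ.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mm_weight(logit):
    """|d/da log(1 - sigmoid(a))| = sigmoid(a): larger for higher-scoring samples."""
    return sigmoid(logit)

def ns_weight(logit):
    """|d/da -log(sigmoid(a))| = 1 - sigmoid(a): larger for lower-scoring samples."""
    return 1.0 - sigmoid(logit)

def mm_nsat_weights(logits):
    """Assumed rescaling: keep MM-GAN's relative weighting across the minibatch,
    but scale it so the summed magnitude equals that of the NS-GAN weights."""
    mm = [mm_weight(a) for a in logits]
    ns_total = sum(ns_weight(a) for a in logits)
    scale = ns_total / sum(mm)
    return [scale * m for m in mm]

# Early in training the discriminator rejects most fakes (negative logits):
# MM-GAN weights are tiny (saturation), NS-GAN weights are large, and the
# rescaled weights keep MM-GAN's ordering at NS-GAN's overall magnitude.
batch_logits = [-3.0, -2.0, -1.0]
for a, w in zip(batch_logits, mm_nsat_weights(batch_logits)):
    print(a, mm_weight(a), ns_weight(a), w)
```

The two per-sample weights always sum to one, which makes the inversion explicit: a sample the discriminator scores highly gets most of the emphasis under the minimax loss and least under the non-saturating loss.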