Semi-Supervised Learning with Generative Adversarial Networks
We extend Generative Adversarial Networks (GANs) to the semi-supervised
context by forcing the discriminator network to output class labels. We train a
generative model G and a discriminator D on a dataset with inputs belonging to
one of N classes. At training time, D is made to predict which of N+1 classes
the input belongs to, where an extra class is added to correspond to the
outputs of G. We show that this method can be used to create a more
data-efficient classifier and that it allows for generating higher quality
samples than a regular GAN.
Comment: Appearing in the Data Efficient Machine Learning workshop at ICML 2016
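The (N+1)-class trick is easy to sketch in plain NumPy. The helper names below are hypothetical, and a real discriminator would produce the logits with a neural network; the point is only how the extra "generated" class folds into an ordinary cross-entropy loss.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def discriminator_targets(n_classes, real_labels, n_fake):
    # Real inputs keep their class index in [0, N); every sample drawn
    # from G is assigned the extra (N+1)-th index, N.
    fake_label = n_classes
    return np.concatenate([real_labels, np.full(n_fake, fake_label)])

def discriminator_loss(logits, targets):
    # Mean cross-entropy of the (N+1)-way prediction.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))
```

With uniform logits over N+1 = 4 classes, the loss is log 4 regardless of the targets, which makes the sketch easy to sanity-check.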
Faster Asynchronous SGD
Asynchronous distributed stochastic gradient descent methods have trouble
converging because of stale gradients. A gradient update sent to a parameter
server by a client is stale if the parameters used to calculate that gradient
have since been updated on the server. Approaches have been proposed to
circumvent this problem that quantify staleness in terms of the number of
elapsed updates. In this work, we propose a novel method that quantifies
staleness in terms of moving averages of gradient statistics. We show that this
method outperforms previous methods with respect to convergence speed and
scalability to many clients. We also discuss how an extension to this method
can be used to dramatically reduce bandwidth costs in a distributed training
context. In particular, our method allows reduction of total bandwidth usage by
a factor of 5 with little impact on cost convergence. We also describe (and
link to) a software library that we have used to simulate these algorithms
deterministically on a single machine.
Comment: 10 pages
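One plausible reading of "staleness from moving averages of gradient statistics" is sketched below with a hypothetical exponential-moving-average score: a gradient that disagrees with the running average of recently applied gradients is treated as stale, regardless of how many updates have elapsed. This is an illustrative stand-in, not the paper's exact statistic.

```python
import numpy as np

class StalenessEMA:
    """Toy staleness score based on an exponential moving average of
    recent gradients (hypothetical scheme for illustration)."""

    def __init__(self, dim, decay=0.9):
        self.decay = decay
        self.mean = np.zeros(dim)

    def update(self, grad):
        # Fold an applied gradient into the running average.
        self.mean = self.decay * self.mean + (1 - self.decay) * grad

    def score(self, grad):
        # Cosine distance to the running mean; higher means more stale.
        denom = np.linalg.norm(grad) * np.linalg.norm(self.mean)
        if denom == 0.0:
            return 0.0
        return 1.0 - float(grad @ self.mean) / denom
```

A parameter server could damp or discard incoming updates whose score exceeds a threshold, rather than keying the decision to an update counter.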
Changing Model Behavior at Test-Time Using Reinforcement Learning
Machine learning models are often used at test-time subject to constraints
and trade-offs not present at training-time. For example, a computer vision
model operating on an embedded device may need to perform real-time inference,
or a translation model operating on a cell phone may wish to bound its average
compute time in order to be power-efficient. In this work we describe a
mixture-of-experts model and show how to change its test-time resource-usage on
a per-input basis using reinforcement learning. We test our method on a small
MNIST-based example.
Comment: Submitted to ICLR 2017 Workshop Track
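The per-input resource knob can be illustrated with a toy mixture-of-experts forward pass. `moe_forward` and its top-k gating are illustrative stand-ins: the number of experts k evaluated for each input is the test-time compute budget, and a controller (e.g. an RL policy) would choose k per example.

```python
import numpy as np

def moe_forward(x, experts, gate_logits, k):
    # Evaluate only the top-k experts for this input; k is the per-input
    # compute knob that a test-time controller would set.
    order = np.argsort(gate_logits)[::-1][:k]
    w = np.exp(gate_logits[order] - gate_logits[order].max())
    w /= w.sum()  # softmax over the selected experts only
    return sum(wi * experts[i](x) for wi, i in zip(w, order))
```

Smaller k means fewer expert evaluations and less compute; the gating weights are renormalized so the output stays on the same scale.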
TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing
Machine learning models are notoriously difficult to interpret and debug.
This is particularly true of neural networks. In this work, we introduce
automated software testing techniques for neural networks that are well-suited
to discovering errors which occur only for rare inputs. Specifically, we
develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF,
random mutations of inputs to a neural network are guided by a coverage metric
toward the goal of satisfying user-specified constraints. We describe how fast
approximate nearest neighbor algorithms can provide this coverage metric. We
then discuss the application of CGF to the following goals: finding numerical
errors in trained neural networks, generating disagreements between neural
networks and quantized versions of those networks, and surfacing undesirable
behavior in character level language models. Finally, we release an open source
library called TensorFuzz that implements the described techniques.
Comment: Preprint - work in progress
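The CGF loop described above can be sketched as follows. All function names are placeholders, and the brute-force nearest-neighbor search stands in for the fast approximate nearest-neighbor algorithms the paper uses for the coverage metric.

```python
import numpy as np

def fuzz(seed, mutate, coverage_fn, objective, steps=100, radius=0.5):
    # Corpus of "interesting" inputs plus their coverage vectors.
    corpus = [seed]
    coverages = [np.asarray(coverage_fn(seed), dtype=float)]
    rng = np.random.default_rng(0)
    for _ in range(steps):
        parent = corpus[rng.integers(len(corpus))]
        child = mutate(parent, rng)
        if objective(child):
            return child, corpus  # input satisfying the constraint found
        cov = np.asarray(coverage_fn(child), dtype=float)
        # Brute-force nearest neighbor over past coverage vectors; a real
        # implementation would use fast approximate nearest neighbors.
        nearest = min(np.linalg.norm(cov - c) for c in coverages)
        if nearest > radius:  # new coverage reached: keep the mutant
            corpus.append(child)
            coverages.append(cov)
    return None, corpus
```

Here `coverage_fn` would map an input to some activation vector of the network under test, and `objective` encodes the user-specified constraint (e.g. "a NaN appears in the output").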
Self-Attention Generative Adversarial Networks
In this paper, we propose the Self-Attention Generative Adversarial Network
(SAGAN) which allows attention-driven, long-range dependency modeling for image
generation tasks. Traditional convolutional GANs generate high-resolution
details as a function of only spatially local points in lower-resolution
feature maps. In SAGAN, details can be generated using cues from all feature
locations. Moreover, the discriminator can check that highly detailed features
in distant portions of the image are consistent with each other. Furthermore,
recent work has shown that generator conditioning affects GAN performance.
Leveraging this insight, we apply spectral normalization to the GAN generator
and find that this improves training dynamics. The proposed SAGAN achieves
state-of-the-art results, boosting the best published Inception score from 36.8
to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the
challenging ImageNet dataset. Visualization of the attention layers shows that
the generator leverages neighborhoods that correspond to object shapes rather
than local regions of fixed shape.
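A minimal sketch of self-attention over flattened feature-map locations is below. The helper names are hypothetical; SAGAN's actual layer uses learned 1x1 convolutions for the projections and, per the paper's line of work, a learnable residual scale that starts at zero so the network leans on local convolutional features before phasing attention in.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv, gamma=0.0):
    # x: (locations, channels). Every output location is a mixture of
    # values from *all* locations, so distant cues can shape local detail.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    # gamma-scaled residual connection around the attention output.
    return x + gamma * (attn @ v)
```

With gamma = 0 the layer is the identity, which is the safe starting point; as gamma grows, long-range dependencies increasingly influence each location.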
Discriminator Rejection Sampling
We propose a rejection sampling scheme using the discriminator of a GAN to
approximately correct errors in the GAN generator distribution. We show that
under quite strict assumptions, this will allow us to recover the data
distribution exactly. We then examine where those strict assumptions break down
and design a practical algorithm - called Discriminator Rejection Sampling
(DRS) - that can be used on real data-sets. Finally, we demonstrate the
efficacy of DRS on a mixture of Gaussians and on the SAGAN model,
state-of-the-art in the image generation task at the time of developing this
work. On ImageNet, we train an improved baseline that increases the Inception
Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65
to 14.79. We then use DRS to further improve on this baseline, improving the
Inception Score to 76.08 and the FID to 13.75.
Comment: Published as a conference paper at ICLR 2019
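The idealized scheme, before the practical corrections the abstract alludes to, can be sketched as follows: if the optimal discriminator's logit satisfies p_data(x)/p_g(x) = exp(logit(x)), then accepting each generated sample with probability exp(logit - logit_max) reweights G's output toward the data distribution. The names are illustrative, and a practical algorithm must estimate logit_max and temper the acceptance probability to keep the acceptance rate workable.

```python
import numpy as np

def drs_accept_prob(logits, logit_max):
    # Idealized acceptance probability under the strict assumption that
    # the discriminator logit equals log(p_data / p_g).
    return np.exp(logits - logit_max)

def rejection_sample(samples, logits, rng):
    # Keep each sample independently with its acceptance probability.
    p = drs_accept_prob(logits, logits.max())
    keep = rng.random(len(samples)) < p
    return samples[keep]
```

A sample whose logit equals the maximum is always kept; one whose density ratio is half as large is kept half the time.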
Skill Rating for Generative Models
We explore a new way to evaluate generative models using insights from
evaluation of competitive games between human players. We show experimentally
that tournaments between generators and discriminators provide an effective way
to evaluate generative models. We introduce two methods for summarizing
tournament outcomes: tournament win rate and skill rating. Evaluations are
useful in different contexts, including monitoring the progress of a single
model as it learns during the training process, and comparing the capabilities
of two different fully trained models. We show that a tournament consisting of
a single model playing against past and future versions of itself produces a
useful measure of training progress. A tournament containing multiple separate
models (using different seeds, hyperparameters, and architectures) provides a
useful relative comparison between different trained GANs. Tournament-based
rating methods are conceptually distinct from numerous previous categories of
approaches to evaluation of generative models, and have complementary
advantages and disadvantages.
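Any standard rating system can turn tournament outcomes into a skill rating; a classic Elo update is sketched here as a simple stand-in (the exact rating system used in the paper may differ).

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    # score_a is 1.0 if player A won the match (e.g. the generator's
    # sample fooled the discriminator), 0.0 if A lost.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new
```

Running such updates over rounds between generator checkpoints and discriminator checkpoints yields a rating per checkpoint, which is the kind of training-progress curve the abstract describes.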
Consistency Regularization for Generative Adversarial Networks
Generative Adversarial Networks (GANs) are known to be difficult to train,
despite considerable research effort. Several regularization techniques for
stabilizing training have been proposed, but they introduce non-trivial
computational overheads and interact poorly with existing techniques like
spectral normalization. In this work, we propose a simple, effective training
stabilizer based on the notion of consistency regularization---a popular
technique in the semi-supervised learning literature. In particular, we augment
data passing into the GAN discriminator and penalize the sensitivity of the
discriminator to these augmentations. We conduct a series of experiments to
demonstrate that consistency regularization works effectively with spectral
normalization and various GAN architectures, loss functions and optimizer
settings. Our method achieves the best FID scores for unconditional image
generation compared to other regularization methods on CIFAR-10 and CelebA.
Moreover, our consistency regularized GAN (CR-GAN) improves state-of-the-art
FID scores for conditional generation from 14.73 to 11.48 on CIFAR-10 and from
8.73 to 6.66 on ImageNet-2012.
Comment: ICLR 2020
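The penalty itself is nearly a one-liner. A NumPy sketch with hypothetical names follows, where `d_out` stands for some output (or intermediate feature) of the discriminator and `augment` is any semantics-preserving transformation such as a small shift or flip.

```python
import numpy as np

def consistency_penalty(d_out, x, augment, lam=10.0):
    # Penalize the discriminator for responding differently to an image
    # and a semantics-preserving augmentation of that same image.
    return lam * np.mean((d_out(x) - d_out(augment(x))) ** 2)
```

This term is simply added to the usual discriminator loss; lam is an illustrative weight, and because the penalty needs no gradient penalties or extra networks, it composes cleanly with spectral normalization.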
Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
We introduce a new local sparse attention layer that preserves
two-dimensional geometry and locality. We show that by just replacing the dense
attention layer of SAGAN with our construction, we obtain very significant FID,
Inception score and pure visual improvements. FID score is improved on
ImageNet, keeping all other parameters the same. The
sparse attention patterns that we propose for our new layer are designed using
a novel information theoretic criterion that uses information flow graphs. We
also present a novel way to invert Generative Adversarial Networks with
attention. Our method extracts from the attention layer of the discriminator a
saliency map, which we use to construct a new loss function for the inversion.
This allows us to visualize the newly introduced attention heads and show that
they indeed capture interesting aspects of two-dimensional geometry of real
images.
Comment: Added TFRC, tensorflow-gan acknowledgements. Changed "Ablation Study" to "Ablation Studies".
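To make "preserving two-dimensional geometry and locality" concrete, here is a naive locality mask over flattened positions. The paper's actual patterns are designed with an information-theoretic criterion over information flow graphs, so this Chebyshev-ball mask is only an illustration of the kind of structure a 2D-aware sparse attention layer respects.

```python
import numpy as np

def local_2d_mask(h, w, radius=1):
    # mask[a, b] is True when flattened position a may attend to b,
    # i.e. when their 2D grid coordinates are within a Chebyshev radius.
    n = h * w
    mask = np.zeros((n, n), dtype=bool)
    for a in range(n):
        ia, ja = divmod(a, w)
        for b in range(n):
            ib, jb = divmod(b, w)
            if max(abs(ia - ib), abs(ja - jb)) <= radius:
                mask[a, b] = True
    return mask
```

A dense attention layer would set every entry True; masking attention logits with a pattern like this keeps each position's receptive field a genuine 2D neighborhood rather than an arbitrary set of flattened indices.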
Is Generator Conditioning Causally Related to GAN Performance?
Recent work (Pennington et al., 2017) suggests that controlling the entire
distribution of Jacobian singular values is an important design consideration
in deep learning. Motivated by this, we study the distribution of singular
values of the Jacobian of the generator in Generative Adversarial Networks
(GANs). We find that this Jacobian generally becomes ill-conditioned at the
beginning of training. Moreover, we find that the average (with z from p(z))
conditioning of the generator is highly predictive of two other ad-hoc metrics
for measuring the 'quality' of trained GANs: the Inception Score and the
Frechet Inception Distance (FID). We test the hypothesis that this relationship
is causal by proposing a 'regularization' technique (called Jacobian Clamping)
that softly penalizes the condition number of the generator Jacobian. Jacobian
Clamping improves the mean Inception Score and the mean FID for GANs trained on
several datasets. It also greatly reduces inter-run variance of the
aforementioned scores, addressing (at least partially) one of the main
criticisms of GANs.
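A finite-difference sketch of such a penalty is below, loosely following the described Jacobian Clamping: perturb z in a small random direction, measure the generator's local gain ||G(z+d) - G(z)|| / ||d||, and penalize gains outside a band. The constants and the way the perturbation is drawn here are illustrative.

```python
import numpy as np

def jacobian_clamping_penalty(g, z, eps=1e-3, lam_min=1.0, lam_max=20.0):
    # Probe the generator's local gain along a random direction of
    # norm eps (a finite-difference stand-in for a Jacobian singular value).
    rng = np.random.default_rng(0)
    d = rng.normal(size=z.shape)
    d *= eps / np.linalg.norm(d)
    q = np.linalg.norm(g(z + d) - g(z)) / eps
    # Soft penalty whenever the gain leaves the band [lam_min, lam_max].
    return max(q - lam_max, 0.0) ** 2 + max(lam_min - q, 0.0) ** 2
```

Added to the generator loss, a term like this discourages directions in z-space that G stretches or collapses extremely, which is one way to keep the Jacobian's condition number soft-bounded.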