Semi-Supervised Learning with Generative Adversarial Networks
We extend Generative Adversarial Networks (GANs) to the semi-supervised
context by forcing the discriminator network to output class labels. We train a
generative model G and a discriminator D on a dataset with inputs belonging to
one of N classes. At training time, D is made to predict which of N+1 classes
the input belongs to, where an extra class is added to correspond to the
outputs of G. We show that this method can be used to create a more
data-efficient classifier and that it allows for generating higher quality
samples than a regular GAN.
Comment: Appearing in the Data Efficient Machine Learning workshop at ICML 2016
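The (N+1)-class trick is easy to sketch in plain NumPy. The helper names below are hypothetical, and a real discriminator would produce the logits with a neural network; the point is only how the extra "generated" class folds into an ordinary cross-entropy loss.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def discriminator_targets(n_classes, real_labels, n_fake):
    # Real inputs keep their class index in [0, N); every sample drawn
    # from G is assigned the extra (N+1)-th index, N.
    fake_label = n_classes
    return np.concatenate([real_labels, np.full(n_fake, fake_label)])

def discriminator_loss(logits, targets):
    # Mean cross-entropy of the (N+1)-way prediction.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))
```

With uniform logits over N+1 = 4 classes, the loss is log 4 regardless of the targets, which makes the sketch easy to sanity-check.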
Faster Asynchronous SGD
Asynchronous distributed stochastic gradient descent methods have trouble
converging because of stale gradients. A gradient update sent to a parameter
server by a client is stale if the parameters used to calculate that gradient
have since been updated on the server. Approaches have been proposed to
circumvent this problem that quantify staleness in terms of the number of
elapsed updates. In this work, we propose a novel method that quantifies
staleness in terms of moving averages of gradient statistics. We show that this
method outperforms previous methods with respect to convergence speed and
scalability to many clients. We also discuss how an extension to this method
can be used to dramatically reduce bandwidth costs in a distributed training
context. In particular, our method allows reduction of total bandwidth usage by
a factor of 5 with little impact on cost convergence. We also describe (and
link to) a software library that we have used to simulate these algorithms
deterministically on a single machine.
Comment: 10 pages
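One plausible reading of "staleness from moving averages of gradient statistics" is sketched below with a hypothetical exponential-moving-average score: a gradient that disagrees with the running average of recently applied gradients is treated as stale, regardless of how many updates have elapsed. This is an illustrative stand-in, not the paper's exact statistic.

```python
import numpy as np

class StalenessEMA:
    """Toy staleness score based on an exponential moving average of
    recent gradients (hypothetical scheme for illustration)."""

    def __init__(self, dim, decay=0.9):
        self.decay = decay
        self.mean = np.zeros(dim)

    def update(self, grad):
        # Fold an applied gradient into the running average.
        self.mean = self.decay * self.mean + (1 - self.decay) * grad

    def score(self, grad):
        # Cosine distance to the running mean; higher means more stale.
        denom = np.linalg.norm(grad) * np.linalg.norm(self.mean)
        if denom == 0.0:
            return 0.0
        return 1.0 - float(grad @ self.mean) / denom
```

A parameter server could damp or discard incoming updates whose score exceeds a threshold, rather than keying the decision to an update counter.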
Changing Model Behavior at Test-Time Using Reinforcement Learning
Machine learning models are often used at test-time subject to constraints
and trade-offs not present at training-time. For example, a computer vision
model operating on an embedded device may need to perform real-time inference,
or a translation model operating on a cell phone may wish to bound its average
compute time in order to be power-efficient. In this work we describe a
mixture-of-experts model and show how to change its test-time resource-usage on
a per-input basis using reinforcement learning. We test our method on a small
MNIST-based example.
Comment: Submitted to ICLR 2017 Workshop Track
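The per-input resource knob can be illustrated with a toy mixture-of-experts forward pass. `moe_forward` and its top-k gating are illustrative stand-ins: the number of experts k evaluated for each input is the test-time compute budget, and a controller (e.g. an RL policy) would choose k per example.

```python
import numpy as np

def moe_forward(x, experts, gate_logits, k):
    # Evaluate only the top-k experts for this input; k is the per-input
    # compute knob that a test-time controller would set.
    order = np.argsort(gate_logits)[::-1][:k]
    w = np.exp(gate_logits[order] - gate_logits[order].max())
    w /= w.sum()  # softmax over the selected experts only
    return sum(wi * experts[i](x) for wi, i in zip(w, order))
```

Smaller k means fewer expert evaluations and less compute; the gating weights are renormalized so the output stays on the same scale.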
TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing
Machine learning models are notoriously difficult to interpret and debug.
This is particularly true of neural networks. In this work, we introduce
automated software testing techniques for neural networks that are well-suited
to discovering errors which occur only for rare inputs. Specifically, we
develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF,
random mutations of inputs to a neural network are guided by a coverage metric
toward the goal of satisfying user-specified constraints. We describe how fast
approximate nearest neighbor algorithms can provide this coverage metric. We
then discuss the application of CGF to the following goals: finding numerical
errors in trained neural networks, generating disagreements between neural
networks and quantized versions of those networks, and surfacing undesirable
behavior in character level language models. Finally, we release an open source
library called TensorFuzz that implements the described techniques.
Comment: Preprint - work in progress
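The CGF loop described above can be sketched as follows. All function names are placeholders, and the brute-force nearest-neighbor search stands in for the fast approximate nearest-neighbor algorithms the paper uses for the coverage metric.

```python
import numpy as np

def fuzz(seed, mutate, coverage_fn, objective, steps=100, radius=0.5):
    # Corpus of "interesting" inputs plus their coverage vectors.
    corpus = [seed]
    coverages = [np.asarray(coverage_fn(seed), dtype=float)]
    rng = np.random.default_rng(0)
    for _ in range(steps):
        parent = corpus[rng.integers(len(corpus))]
        child = mutate(parent, rng)
        if objective(child):
            return child, corpus  # input satisfying the constraint found
        cov = np.asarray(coverage_fn(child), dtype=float)
        # Brute-force nearest neighbor over past coverage vectors; a real
        # implementation would use fast approximate nearest neighbors.
        nearest = min(np.linalg.norm(cov - c) for c in coverages)
        if nearest > radius:  # new coverage reached: keep the mutant
            corpus.append(child)
            coverages.append(cov)
    return None, corpus
```

Here `coverage_fn` would map an input to some activation vector of the network under test, and `objective` encodes the user-specified constraint (e.g. "a NaN appears in the output").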
Self-Attention Generative Adversarial Networks
In this paper, we propose the Self-Attention Generative Adversarial Network
(SAGAN) which allows attention-driven, long-range dependency modeling for image
generation tasks. Traditional convolutional GANs generate high-resolution
details as a function of only spatially local points in lower-resolution
feature maps. In SAGAN, details can be generated using cues from all feature
locations. Moreover, the discriminator can check that highly detailed features
in distant portions of the image are consistent with each other. Furthermore,
recent work has shown that generator conditioning affects GAN performance.
Leveraging this insight, we apply spectral normalization to the GAN generator
and find that this improves training dynamics. The proposed SAGAN achieves
state-of-the-art results, boosting the best published Inception score from 36.8
to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the
challenging ImageNet dataset. Visualization of the attention layers shows that
the generator leverages neighborhoods that correspond to object shapes rather
than local regions of fixed shape.
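A minimal sketch of self-attention over flattened feature-map locations is below. The helper names are hypothetical; SAGAN's actual layer uses learned 1x1 convolutions for the projections and, per the paper's line of work, a learnable residual scale that starts at zero so the network leans on local convolutional features before phasing attention in.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv, gamma=0.0):
    # x: (locations, channels). Every output location is a mixture of
    # values from *all* locations, so distant cues can shape local detail.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    # gamma-scaled residual connection around the attention output.
    return x + gamma * (attn @ v)
```

With gamma = 0 the layer is the identity, which is the safe starting point; as gamma grows, long-range dependencies increasingly influence each location.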
Discriminator Rejection Sampling
We propose a rejection sampling scheme using the discriminator of a GAN to
approximately correct errors in the GAN generator distribution. We show that
under quite strict assumptions, this will allow us to recover the data
distribution exactly. We then examine where those strict assumptions break down
and design a practical algorithm - called Discriminator Rejection Sampling
(DRS) - that can be used on real data-sets. Finally, we demonstrate the
efficacy of DRS on a mixture of Gaussians and on the SAGAN model,
state-of-the-art in the image generation task at the time of developing this
work. On ImageNet, we train an improved baseline that increases the Inception
Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65
to 14.79. We then use DRS to further improve on this baseline, improving the
Inception Score to 76.08 and the FID to 13.75.
Comment: Published as a conference paper at ICLR 2019
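The idealized scheme, before the practical corrections the abstract alludes to, can be sketched as follows: if the optimal discriminator's logit satisfies p_data(x)/p_g(x) = exp(logit(x)), then accepting each generated sample with probability exp(logit - logit_max) reweights G's output toward the data distribution. The names are illustrative, and a practical algorithm must estimate logit_max and temper the acceptance probability to keep the acceptance rate workable.

```python
import numpy as np

def drs_accept_prob(logits, logit_max):
    # Idealized acceptance probability under the strict assumption that
    # the discriminator logit equals log(p_data / p_g).
    return np.exp(logits - logit_max)

def rejection_sample(samples, logits, rng):
    # Keep each sample independently with its acceptance probability.
    p = drs_accept_prob(logits, logits.max())
    keep = rng.random(len(samples)) < p
    return samples[keep]
```

A sample whose logit equals the maximum is always kept; one whose density ratio is half as large is kept half the time.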
Skill Rating for Generative Models
We explore a new way to evaluate generative models using insights from
evaluation of competitive games between human players. We show experimentally
that tournaments between generators and discriminators provide an effective way
to evaluate generative models. We introduce two methods for summarizing
tournament outcomes: tournament win rate and skill rating. Evaluations are
useful in different contexts, including monitoring the progress of a single
model as it learns during the training process, and comparing the capabilities
of two different fully trained models. We show that a tournament consisting of
a single model playing against past and future versions of itself produces a
useful measure of training progress. A tournament containing multiple separate
models (using different seeds, hyperparameters, and architectures) provides a
useful relative comparison between different trained GANs. Tournament-based
rating methods are conceptually distinct from numerous previous categories of
approaches to evaluation of generative models, and have complementary
advantages and disadvantages.
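Any standard rating system can turn tournament outcomes into a skill rating; a classic Elo update is sketched here as a simple stand-in (the exact rating system used in the paper may differ).

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    # score_a is 1.0 if player A won the match (e.g. the generator's
    # sample fooled the discriminator), 0.0 if A lost.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new
```

Running such updates over rounds between generator checkpoints and discriminator checkpoints yields a rating per checkpoint, which is the kind of training-progress curve the abstract describes.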
Consistency Regularization for Generative Adversarial Networks
Generative Adversarial Networks (GANs) are known to be difficult to train,
despite considerable research effort. Several regularization techniques for
stabilizing training have been proposed, but they introduce non-trivial
computational overheads and interact poorly with existing techniques like
spectral normalization. In this work, we propose a simple, effective training
stabilizer based on the notion of consistency regularization---a popular
technique in the semi-supervised learning literature. In particular, we augment
data passing into the GAN discriminator and penalize the sensitivity of the
discriminator to these augmentations. We conduct a series of experiments to
demonstrate that consistency regularization works effectively with spectral
normalization and various GAN architectures, loss functions and optimizer
settings. Our method achieves the best FID scores for unconditional image
generation compared to other regularization methods on CIFAR-10 and CelebA.
Moreover, our consistency regularized GAN (CR-GAN) improves state-of-the-art
FID scores for conditional generation from 14.73 to 11.48 on CIFAR-10 and from
8.73 to 6.66 on ImageNet-2012.
Comment: ICLR 2020
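The penalty itself is nearly a one-liner. A NumPy sketch with hypothetical names follows, where `d_out` stands for some output (or intermediate feature) of the discriminator and `augment` is any semantics-preserving transformation such as a small shift or flip.

```python
import numpy as np

def consistency_penalty(d_out, x, augment, lam=10.0):
    # Penalize the discriminator for responding differently to an image
    # and a semantics-preserving augmentation of that same image.
    return lam * np.mean((d_out(x) - d_out(augment(x))) ** 2)
```

This term is simply added to the usual discriminator loss; lam is an illustrative weight, and because the penalty needs no gradient penalties or extra networks, it composes cleanly with spectral normalization.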
Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
We introduce a new local sparse attention layer that preserves
two-dimensional geometry and locality. We show that by just replacing the dense
attention layer of SAGAN with our construction, we obtain very significant FID,
Inception score and pure visual improvements. FID score is improved on
ImageNet, keeping all other parameters the same. The
sparse attention patterns that we propose for our new layer are designed using
a novel information theoretic criterion that uses information flow graphs. We
also present a novel way to invert Generative Adversarial Networks with
attention. Our method extracts from the attention layer of the discriminator a
saliency map, which we use to construct a new loss function for the inversion.
This allows us to visualize the newly introduced attention heads and show that
they indeed capture interesting aspects of two-dimensional geometry of real
images.
Comment: Added TFRC, tensorflow-gan acknowledgements. Changed "Ablation Study" to "Ablation Studies".
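To make "preserving two-dimensional geometry and locality" concrete, here is a naive locality mask over flattened positions. The paper's actual patterns are designed with an information-theoretic criterion over information flow graphs, so this Chebyshev-ball mask is only an illustration of the kind of structure a 2D-aware sparse attention layer respects.

```python
import numpy as np

def local_2d_mask(h, w, radius=1):
    # mask[a, b] is True when flattened position a may attend to b,
    # i.e. when their 2D grid coordinates are within a Chebyshev radius.
    n = h * w
    mask = np.zeros((n, n), dtype=bool)
    for a in range(n):
        ia, ja = divmod(a, w)
        for b in range(n):
            ib, jb = divmod(b, w)
            if max(abs(ia - ib), abs(ja - jb)) <= radius:
                mask[a, b] = True
    return mask
```

A dense attention layer would set every entry True; masking attention logits with a pattern like this keeps each position's receptive field a genuine 2D neighborhood rather than an arbitrary set of flattened indices.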
Is Generator Conditioning Causally Related to GAN Performance?
Recent work (Pennington et al., 2017) suggests that controlling the entire
distribution of Jacobian singular values is an important design consideration
in deep learning. Motivated by this, we study the distribution of singular
values of the Jacobian of the generator in Generative Adversarial Networks
(GANs). We find that this Jacobian generally becomes ill-conditioned at the
beginning of training. Moreover, we find that the average (with z from p(z))
conditioning of the generator is highly predictive of two other ad-hoc metrics
for measuring the 'quality' of trained GANs: the Inception Score and the
Frechet Inception Distance (FID). We test the hypothesis that this relationship
is causal by proposing a 'regularization' technique (called Jacobian Clamping)
that softly penalizes the condition number of the generator Jacobian. Jacobian
Clamping improves the mean Inception Score and the mean FID for GANs trained on
several datasets. It also greatly reduces inter-run variance of the
aforementioned scores, addressing (at least partially) one of the main
criticisms of GANs.
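A finite-difference sketch of such a penalty is below, loosely following the described Jacobian Clamping: perturb z in a small random direction, measure the generator's local gain ||G(z+d) - G(z)|| / ||d||, and penalize gains outside a band. The constants and the way the perturbation is drawn here are illustrative.

```python
import numpy as np

def jacobian_clamping_penalty(g, z, eps=1e-3, lam_min=1.0, lam_max=20.0):
    # Probe the generator's local gain along a random direction of
    # norm eps (a finite-difference stand-in for a Jacobian singular value).
    rng = np.random.default_rng(0)
    d = rng.normal(size=z.shape)
    d *= eps / np.linalg.norm(d)
    q = np.linalg.norm(g(z + d) - g(z)) / eps
    # Soft penalty whenever the gain leaves the band [lam_min, lam_max].
    return max(q - lam_max, 0.0) ** 2 + max(lam_min - q, 0.0) ** 2
```

Added to the generator loss, a term like this discourages directions in z-space that G stretches or collapses extremely, which is one way to keep the Jacobian's condition number soft-bounded.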