Regularization Methods for Generative Adversarial Networks: An Overview of Recent Studies
Despite its short history, Generative Adversarial Network (GAN) has been
extensively studied and used for various tasks, including its original purpose,
i.e., synthetic sample generation. However, applying GAN to different data
types with diverse neural network architectures has been hindered by its
limitation in training, where the model easily diverges. This notoriously
unstable training of GANs is well known and has been addressed in numerous
studies; consequently, many regularization methods have been proposed in
recent years to make GAN training stable. This paper reviews
the regularization methods that have been recently introduced, most of which
have been published in the last three years. Specifically, we focus on general
methods that can be commonly used regardless of neural network architectures.
To explore the latest research trends in the regularization for GANs, the
methods are classified into several groups by their operation principles, and
the differences between the methods are analyzed. Furthermore, to provide
practical knowledge of using these methods, we investigate popular methods that
have been frequently employed in state-of-the-art GANs. In addition, we discuss
the limitations of existing methods and propose future research directions.
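One regularization method of the kind this survey covers is spectral normalization, which divides each weight matrix by an estimate of its largest singular value to constrain the discriminator's Lipschitz constant. A minimal NumPy sketch (names and the iteration count are illustrative; real implementations keep a running estimate across training steps):

```python
import numpy as np

def spectral_normalize(W, n_iters=50):
    """Divide W by an estimate of its largest singular value,
    obtained via power iteration (illustrative sketch)."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v          # estimated spectral norm of W
    return W / sigma

W = np.random.default_rng(1).normal(size=(4, 3))
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))   # close to 1.0 after normalization
```

In practice a single power-iteration step per training update suffices, since the weights change slowly between steps.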
Training Faster by Separating Modes of Variation in Batch-normalized Models
Batch Normalization (BN) is essential to effectively train state-of-the-art
deep Convolutional Neural Networks (CNNs). It normalizes inputs to the layers
during training using the statistics of each mini-batch. In this work, we study
BN from the viewpoint of Fisher kernels. We show that assuming samples within a
mini-batch are from the same probability density function, then BN is identical
to the Fisher vector of a Gaussian distribution. That means BN can be explained
in terms of kernels that naturally emerge from the probability density function
of the underlying data distribution. However, given the rectifying
non-linearities employed in CNN architectures, the distributions of inputs to
the layers exhibit heavy-tailed and asymmetric characteristics. Therefore, we propose
approximating the underlying data distribution not with one Gaussian density,
but with a mixture of them. Deriving the Fisher vector for a Gaussian Mixture
Model (GMM) reveals that BN can be improved by independently normalizing with
respect to
the statistics of disentangled sub-populations. We refer to our proposed soft
piecewise version of BN as Mixture Normalization (MN). Through an extensive set
of experiments on CIFAR-10 and CIFAR-100, we show that MN not only effectively
accelerates the training of image classifiers and Generative Adversarial
Networks, but also yields higher-quality models.
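The soft piecewise idea can be sketched as follows: given soft assignments of each sample to K Gaussian components, activations are normalized with each component's weighted statistics and the results blended by the responsibilities. A NumPy toy version (responsibilities are taken as given here; in the paper they come from a fitted GMM):

```python
import numpy as np

def mixture_normalize(x, resp, eps=1e-5):
    """x: (N, D) activations; resp: (N, K) soft component assignments.
    Normalize with per-component weighted statistics, blend by responsibility."""
    out = np.zeros_like(x)
    for k in range(resp.shape[1]):
        w = resp[:, k:k + 1]                      # (N, 1) responsibilities
        mu = (w * x).sum(0) / w.sum()             # weighted component mean
        var = (w * (x - mu) ** 2).sum(0) / w.sum()
        out += w * (x - mu) / np.sqrt(var + eps)
    return out

# two well-separated sub-populations, hard (0/1) responsibilities
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, (64, 2)), rng.normal(3, 1, (64, 2))])
resp = np.zeros((128, 2)); resp[:64, 0] = 1; resp[64:, 1] = 1
y = mixture_normalize(x, resp)
```

With hard responsibilities this reduces to normalizing each sub-population independently, which is exactly the disentangling the abstract describes.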
On the Effects of Batch and Weight Normalization in Generative Adversarial Networks
Generative adversarial networks (GANs) are highly effective unsupervised
learning frameworks that can generate very sharp data, even for data such as
images with complex, highly multimodal distributions. However GANs are known to
be very hard to train, suffering from problems such as mode collapse and
disturbing visual artifacts. Batch normalization (BN) techniques have been
introduced to address these training difficulties. Though BN accelerates training in the
beginning, our experiments show that the use of BN can be unstable and
negatively impact the quality of the trained model. The evaluation of BN and
numerous other recent schemes for improving GAN training is hindered by the
lack of an effective objective quality measure for GAN models. To address these
issues, we first introduce a weight normalization (WN) approach for GAN
training that significantly improves the stability, efficiency and the quality
of the generated samples. To allow a methodical evaluation, we introduce
squared Euclidean reconstruction error on a test set as a new objective
measure, to assess training performance in terms of speed, stability, and
quality of generated samples. Our experiments with a standard DCGAN
architecture on commonly used datasets (CelebA, LSUN bedroom, and CIFAR-10)
indicate that training using WN is generally superior to BN for GANs, achieving
10% lower mean squared loss for reconstruction and significantly better
qualitative results than BN. We further demonstrate the stability of WN on a
21-layer ResNet trained with the CelebA data set. The code for this paper is
available at https://github.com/stormraiser/gan-weightnorm-resnet
Comment: v3 rejected by NIPS 2017, updated and re-submitted to CVPR 2018. v4:
added experiments with ResNet and link to new code.
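Weight normalization itself is a simple reparameterization, w = g · v / ‖v‖, which decouples the length of each weight vector from its direction. A NumPy sketch (per-output-unit rows; names are illustrative):

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: w = g * v / ||v||, applied per output row.
    g controls the length of each weight vector, v its direction."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / norms

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))   # unnormalized directions, one row per unit
g = np.ones(4)                # learned per-unit scales
w = weight_norm(v, g)
print(np.linalg.norm(w, axis=1))   # each row has norm g = 1
```

Because the norm of each weight vector is fixed by g regardless of v, gradient updates to v change only the direction, which is where the reported stability benefit comes from.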
Multi-Generator Generative Adversarial Nets
We propose a new approach to train the Generative Adversarial Nets (GANs)
with a mixture of generators to overcome the mode collapsing problem. The main
intuition is to employ multiple generators, instead of using a single one as in
the original GAN. The idea is simple, yet proven to be extremely effective at
covering diverse data modes, easily overcoming the mode collapse and delivering
state-of-the-art results. A minimax formulation is established among a
classifier, a discriminator, and a set of generators, in a spirit similar to the
original GAN. Generators create samples that are intended to come from the same
distribution as the training data, whilst the discriminator determines whether
samples are true data or generated by generators, and the classifier specifies
which generator a sample comes from. The distinguishing feature is that
internal samples are created from multiple generators, and then one of them
will be randomly selected as final output similar to the mechanism of a
probabilistic mixture model. We term our method Mixture GAN (MGAN). We develop
theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon
divergence (JSD) between the mixture of generators' distributions and the
empirical data distribution is minimal, whilst the JSD among generators'
distributions is maximal, hence effectively avoiding the mode collapse. By
utilizing parameter sharing, our proposed model adds minimal computational cost
to the standard GAN, and thus can also efficiently scale to large-scale
datasets. We conduct extensive experiments on synthetic 2D data and natural
image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior
performance of our MGAN in achieving state-of-the-art Inception scores over
the latest baselines, generating diverse, appealing, and recognizable objects at
different resolutions, with individual generators specializing in capturing
different types of objects.
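The sampling mechanism described above — draw one generator at random, emit its output, like a probabilistic mixture model — can be sketched in a few lines of NumPy (the toy "generators" below are illustrative stand-ins, each covering a different 1-D mode):

```python
import numpy as np

def sample_mixture(generators, z, rng):
    """MGAN-style sampling: pick one generator uniformly at random
    and return its output together with its index (for the classifier)."""
    k = rng.integers(len(generators))
    return generators[k](z), k

# toy 'generators': each covers one mode of a multimodal 1-D distribution
gens = [lambda z, m=m: m + 0.1 * z for m in (-2.0, 0.0, 2.0)]
rng = np.random.default_rng(0)
samples = [sample_mixture(gens, rng.normal(), rng)[0] for _ in range(300)]
```

The returned index k is what the classifier in MGAN is trained to predict; pushing that prediction to be easy is what drives the generators apart and keeps all modes covered.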
Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
Batch normalization (BN) has become a de facto standard for training deep
convolutional networks. However, BN accounts for a significant fraction of
training run-time and is difficult to accelerate, since it is a
memory-bandwidth bounded operation. Such a drawback of BN motivates us to
explore recently proposed weight normalization algorithms (WN algorithms), i.e.
weight normalization, normalization propagation and weight normalization with
translated ReLU. These algorithms do not slow down training iterations and were
experimentally shown to outperform BN on relatively small networks and
datasets. However, it is not clear if these algorithms could replace BN in
practical, large-scale applications. We answer this question by providing a
detailed comparison of BN and WN algorithms using ResNet-50 network trained on
ImageNet. We found that although WN achieves better training accuracy, the
final test accuracy is significantly lower than that of BN.
This result demonstrates the surprising strength of the BN regularization
effect which we were unable to compensate for using standard regularization
techniques like dropout and weight decay. We also found that training of deep
networks with WN algorithms is significantly less stable compared to BN,
limiting their practical applicability.
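The memory-bandwidth-bound operation the abstract refers to is the per-feature reduction over the mini-batch that BN performs on every forward pass. A minimal NumPy sketch of that forward pass (training mode, with the usual scale and shift):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass: standardize each feature with
    the current mini-batch statistics, then apply scale and shift."""
    mu = x.mean(axis=0)          # per-feature batch mean (a full reduction)
    var = x.var(axis=0)          # per-feature batch variance (another one)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=(32, 4))       # mini-batch of 32, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

The two reductions over the batch dimension read the whole activation tensor from memory while doing little arithmetic, which is why BN is hard to accelerate; weight normalization touches only the (much smaller) weight tensor.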
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Despite recent progress in generative image modeling, successfully generating
high-resolution, diverse samples from complex datasets such as ImageNet remains
an elusive goal. To this end, we train Generative Adversarial Networks at the
largest scale yet attempted, and study the instabilities specific to such
scale. We find that applying orthogonal regularization to the generator renders
it amenable to a simple "truncation trick," allowing fine control over the
trade-off between sample fidelity and variety by reducing the variance of the
Generator's input. Our modifications lead to models which set the new state of
the art in class-conditional image synthesis. When trained on ImageNet at
128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of
166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous
best IS of 52.52 and FID of 18.6.
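The "truncation trick" mentioned above can be sketched directly: latent entries whose magnitude exceeds a threshold are resampled, shrinking the effective sampling region of z and trading variety for fidelity. A NumPy version (names are illustrative):

```python
import numpy as np

def truncated_noise(shape, threshold, rng):
    """Truncation trick: sample z ~ N(0, I) and resample any entry whose
    magnitude exceeds `threshold`, until all entries fall inside it."""
    z = rng.normal(size=shape)
    while True:
        mask = np.abs(z) > threshold
        if not mask.any():
            return z
        z[mask] = rng.normal(size=mask.sum())

rng = np.random.default_rng(0)
z = truncated_noise((8, 16), threshold=0.5, rng=rng)   # 8 latents, dim 16
```

Lowering the threshold reduces the variance of the generator's input, which is what gives the fine-grained fidelity/variety control; it only works well when the generator has been made amenable to it, e.g. via the orthogonal regularization the abstract describes.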
Twin-GAN -- Unpaired Cross-Domain Image Translation with Weight-Sharing GANs
We present a framework for translating unlabeled images from one domain into
analog images in another domain. We employ a progressively growing
skip-connected encoder-generator structure and train it with a GAN loss for
realistic output, a cycle consistency loss for maintaining same-domain
translation identity, and a semantic consistency loss that encourages the
network to keep the input semantic features in the output. We apply our
framework on the task of translating face images, and show that it is capable
of learning semantic mappings for face images with no supervised one-to-one
image mapping.
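Of the three losses listed above, the cycle consistency term is the easiest to state concretely: translate to the other domain and back, then penalize the reconstruction error. A NumPy sketch with toy 1-D "translators" standing in for the two networks (all names illustrative):

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 cycle loss: map x to the other domain with G, map back with F,
    and penalize the mean absolute reconstruction error."""
    return np.abs(F(G(x)) - x).mean()

# toy invertible 'translators' on 1-D data (a perfect inverse pair)
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0
x = np.linspace(-1, 1, 5)
print(cycle_consistency_loss(x, G, F))   # ~0 for a perfect inverse pair
```

In the actual framework G and F are the two encoder-generator paths, and this term is summed with the GAN loss and the semantic consistency loss.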
Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders
We present a generative autoencoder that provides fast encoding, faithful
reconstructions (e.g., retaining the identity of a face), sharp
generated/reconstructed samples in high resolutions, and a well-structured
latent space that supports semantic manipulation of the inputs. There are no
current autoencoder or GAN models that satisfactorily achieve all of these. We
build on the progressively growing autoencoder model PIONEER, for which we
completely alter the training dynamics based on a careful analysis of recently
introduced normalization schemes. We show significantly improved visual and
quantitative results for face identity conservation in CelebAHQ. Our model
achieves state-of-the-art disentanglement of latent space, both quantitatively
and via realistic image attribute manipulations. On the LSUN Bedrooms dataset,
we improve the disentanglement performance of the vanilla PIONEER, despite
having a simpler model. Overall, our results indicate that the PIONEER networks
provide a way towards photorealistic face manipulation.
Comment: WACV 202
Conditional Generative Refinement Adversarial Networks for Unbalanced Medical Image Semantic Segmentation
We propose a new generative adversarial architecture to mitigate the imbalanced
data problem in medical image semantic segmentation, where the majority of
pixels belong to a healthy region and few belong to a lesion or other
non-healthy region. A model trained with imbalanced data tends to be biased
toward healthy data, which is undesirable in clinical applications, and the
outputs predicted by such networks have high precision but low sensitivity. We
propose a new conditional
generative refinement network with three components: a generative, a
discriminative, and a refinement network, to mitigate the unbalanced data
problem through ensemble learning. The generative network learns to segment at
the pixel level by receiving feedback from the discriminative network according
to the true positive and true negative maps. The refinement network, in turn,
learns to predict the false positive and false negative masks produced by the
generative network, which is of significant value, especially in medical
applications. The final semantic segmentation masks are then composed from the
outputs of the three networks. The proposed architecture achieves
state-of-the-art results on LiTS-2017 for liver lesion segmentation, and two
microscopic cell segmentation datasets MDA231, PhC-HeLa. We have achieved
competitive results on BraTS-2017 for brain tumour segmentation.
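One plausible reading of how the three outputs compose (the abstract does not give the exact rule, so this is an assumption): remove the predicted false positives from the generative mask and add back the predicted false negatives. As boolean NumPy operations:

```python
import numpy as np

def compose_masks(gen_mask, fp_mask, fn_mask):
    """Compose a final segmentation from the three networks' outputs:
    drop predicted false positives, restore predicted false negatives.
    (An illustrative reading of the ensemble, not the paper's exact rule.)"""
    return (gen_mask & ~fp_mask) | fn_mask

gen = np.array([[1, 1, 0, 0]], dtype=bool)  # generative network's mask
fp  = np.array([[0, 1, 0, 0]], dtype=bool)  # pixel wrongly marked as lesion
fn  = np.array([[0, 0, 1, 0]], dtype=bool)  # lesion pixel that was missed
final = compose_masks(gen, fp, fn)          # -> [[True, False, True, False]]
```

The appeal of such a composition for imbalanced data is that the refinement network is trained specifically on the rare error pixels, directly targeting the low sensitivity of the base model.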
Retinal Vessel Segmentation under Extreme Low Annotation: A Generative Adversarial Network Approach
Contemporary deep learning based medical image segmentation algorithms
require hours of annotation labor by domain experts. These data hungry deep
models perform sub-optimally in the presence of limited amount of labeled data.
In this paper, we present a data efficient learning framework using the recent
concept of Generative Adversarial Networks; this allows a deep neural network
to perform significantly better than its fully supervised counterpart in the
low-annotation regime. The proposed method extends our previous work
with the addition of a new unsupervised adversarial loss and a structured
prediction based architecture. To the best of our knowledge, this work is the
first demonstration of an adversarial framework based structured prediction
model for medical image segmentation. Though generic, we apply our method for
segmentation of blood vessels in retinal fundus images. We experiment with an
extremely low annotation budget (0.8 - 1.6% of the contemporary annotation size). On
DRIVE and STARE datasets, the proposed method outperforms our previous method
and other fully supervised benchmark models by significant margins, especially
with a very low number of annotated examples. In addition, our systematic
ablation studies suggest some key recipes for successfully training GAN based
semi-supervised algorithms with an encoder-decoder style network architecture.
Comment: * First 3 authors contributed equally.