Generalization Error in Deep Learning
Deep learning models have lately shown great performance in various fields
such as computer vision, speech recognition, speech translation, and natural
language processing. However, despite their state-of-the-art performance, the
source of their generalization ability remains generally unclear.
Thus, an important question is what makes deep neural networks able to
generalize well from the training set to new data. In this article, we provide
an overview of the existing theory and bounds for the characterization of the
generalization error of deep neural networks, combining both classical and more
recent theoretical and empirical results.
Generalization Error Bounds of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
Empirical studies show that gradient-based methods can learn deep neural
networks (DNNs) with very good generalization performance in the
over-parameterization regime, where DNNs can easily fit a random labeling of
the training data. Very recently, a line of work explains in theory that with
over-parameterization and proper random initialization, gradient-based methods
can find the global minima of the training loss for DNNs. However, existing
generalization error bounds are unable to explain the good generalization
performance of over-parameterized DNNs. The major limitation of most existing
generalization bounds is that they are based on uniform convergence and are
independent of the training algorithm. In this work, we derive an
algorithm-dependent generalization error bound for deep ReLU networks, and show
that under certain assumptions on the data distribution, gradient descent (GD)
with proper random initialization is able to train a sufficiently
over-parameterized DNN to achieve arbitrarily small generalization error. Our
work sheds light on explaining the good generalization performance of
over-parameterized deep neural networks.
Comment: 27 pages. This version simplifies the proof and improves the
presentation in Version 3. In AAAI 202
Generalization and Equilibrium in Generative Adversarial Nets (GANs)
We show that training of generative adversarial networks (GANs) may not have
good generalization properties; e.g., training may appear successful but the
trained distribution may be far from the target distribution in standard metrics.
However, generalization does occur for a weaker metric called neural net
distance. It is also shown that an approximate pure equilibrium exists in the
discriminator/generator game for a special class of generators with natural
training objectives when generator capacity and training set sizes are
moderate.
The existence of this equilibrium inspires the MIX+GAN protocol, which can be
combined with any existing GAN training method and is empirically shown to
improve some of them.
Comment: This is an updated version of an ICML'17 paper with the same title.
The main difference is that in the ICML'17 version the pure equilibrium
result was only proved for Wasserstein GAN. In the current version the result
applies to most reasonable training objectives. In particular, Theorem 4.3
now applies to both original GAN and Wasserstein GA