Stronger generalization bounds for deep nets via a compression approach
Deep nets generalize well despite having more parameters than the number of
training samples. Recent works try to give an explanation using PAC-Bayes and
margin-based analyses, but do not as yet yield sample complexity bounds
better than naive parameter counting. The current paper shows generalization
bounds that are orders of magnitude better in practice. These rely upon new
succinct reparametrizations of the trained net: a compression that is
explicit and efficient. These yield generalization bounds via a simple
compression-based framework introduced here. Our results also provide some
theoretical justification for the widespread empirical success in compressing deep
nets. The analysis of the correctness of our compression relies upon some newly
identified "noise stability" properties of trained deep nets, which are also
experimentally verified. The study of these properties and the resulting
generalization bounds is also extended to convolutional nets, which had eluded
earlier attempts at proving generalization.
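As an illustrative sketch only (not the paper's actual compression scheme), the Python snippet below compresses a weight matrix with a truncated SVD and counts the parameters of the compressed description; in compression-based frameworks of this kind, the generalization bound scales with the size of the compressed description rather than the raw parameter count. The layer shape, the `energy_kept` threshold, and the helper `low_rank_compress` are hypothetical choices for the example.

```python
import numpy as np

def low_rank_compress(W, energy_kept=0.90):
    """Truncated-SVD compression: keep the smallest number of singular
    values whose squared sum captures `energy_kept` of the total
    spectral energy of the weight matrix W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, energy_kept)) + 1
    W_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    # Numbers needed to store the rank-r factorization (U_r, s_r, V_r).
    compressed_params = r * (W.shape[0] + W.shape[1] + 1)
    return W_hat, compressed_params

# Toy layer that is approximately low rank, as trained layers often are.
rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 1000)) / 50.0
W_hat, k = low_rank_compress(W)
print("raw parameter count:       ", W.size)
print("compressed parameter count:", k)
print("relative approximation err:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```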
Generalization Error in Deep Learning
Deep learning models have lately shown great performance in various fields
such as computer vision, speech recognition, speech translation, and natural
language processing. However, alongside their state-of-the-art performance, the
source of their generalization ability remains generally unclear.
Thus, an important question is what makes deep neural networks able to
generalize well from the training set to new data. In this article, we provide
an overview of the existing theory and bounds for the characterization of the
generalization error of deep neural networks, combining both classical and more
recent theoretical and empirical results.
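For orientation, many of the classical bounds such a survey covers instantiate a uniform-convergence template of the following form; the constants differ between results, so this is a generic sketch rather than any particular theorem from the article.

```latex
% Generic uniform-convergence template: with probability at least 1 - \delta
% over an i.i.d. training sample of size n, simultaneously for all h in the
% hypothesis class \mathcal{H},
\[
  L(h) \;\le\; \widehat{L}_n(h) \;+\; 2\,\mathfrak{R}_n(\mathcal{H})
        \;+\; O\!\left(\sqrt{\tfrac{\log(1/\delta)}{n}}\right),
\]
% where L(h) is the population risk, \widehat{L}_n(h) the empirical risk,
% and \mathfrak{R}_n(\mathcal{H}) the (empirical) Rademacher complexity.
```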
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Recent works have cast some light on the mystery of why deep nets fit any
data and generalize despite being very overparametrized. This paper analyzes
training and generalization for a simple 2-layer ReLU net with random
initialization, and provides the following improvements over recent works:
(i) Using a tighter characterization of training speed than recent papers, an
explanation for why training a neural net with random labels leads to slower
training, as originally observed in [Zhang et al. ICLR'17].
(ii) Generalization bound independent of network size, using a data-dependent
complexity measure. Our measure distinguishes clearly between random labels and
true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent
papers require the sample complexity to increase (slowly) with the network size,
while our sample complexity is completely independent of the network size.
(iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets
trained via gradient descent.
The key idea is to track the dynamics of training and generalization via
properties of a related kernel.
Comment: In ICML 2019
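A minimal sketch of the flavor of such a data-dependent measure, assuming unit-norm inputs and the closed-form Gram matrix associated with an infinitely wide two-layer ReLU net; the exact definitions, scaling, and constants are those of the paper, not this snippet. The ridge term and the toy data below are arbitrary choices for the example.

```python
import numpy as np

def relu_gram(X):
    """Gram matrix of the kernel associated with an infinitely wide
    two-layer ReLU net, for rows of X with unit L2 norm:
    H[i, j] = <x_i, x_j> * (pi - arccos(<x_i, x_j>)) / (2 * pi)."""
    G = np.clip(X @ X.T, -1.0, 1.0)          # pairwise inner products
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

def complexity_measure(X, y):
    """Data-dependent quantity of the form sqrt(2 y^T H^{-1} y / n):
    small when the labels align with the kernel's top eigendirections,
    large for random labels."""
    n = len(y)
    H = relu_gram(X) + 1e-8 * np.eye(n)      # small ridge for numerical stability
    return np.sqrt(2.0 * y @ np.linalg.solve(H, y) / n)

# Toy comparison: structured labels vs. random labels on the same inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y_true = np.sign(X[:, 0])                    # labels correlated with the data
y_rand = rng.choice([-1.0, 1.0], size=200)   # random labels
print("structured labels:", complexity_measure(X, y_true))
print("random labels:    ", complexity_measure(X, y_rand))
```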