Stronger generalization bounds for deep nets via a compression approach
Deep nets generalize well despite having more parameters than the number of
training samples. Recent works try to give an explanation using PAC-Bayes and
margin-based analyses, but do not as yet yield sample complexity bounds
better than naive parameter counting. The current paper shows generalization
bounds that are orders of magnitude better in practice. These rely upon new
succinct reparametrizations of the trained net: a compression that is
explicit and efficient. These yield generalization bounds via a simple
compression-based framework introduced here. Our results also provide some
theoretical justification for the widespread empirical success in compressing deep
nets. The analysis of the correctness of our compression relies upon some newly
identified "noise stability" properties of trained deep nets, which are also
experimentally verified. The study of these properties and the resulting
generalization bounds is also extended to convolutional nets, which had eluded
earlier attempts at proving generalization.
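As an illustrative sketch only (not the paper's actual compression scheme), the Python snippet below compresses a weight matrix with a truncated SVD and counts the parameters of the compressed description; in compression-based frameworks of this kind, the generalization bound scales with the size of the compressed description rather than the raw parameter count. The layer shape, the `energy_kept` threshold, and the helper `low_rank_compress` are hypothetical choices for the example.

```python
import numpy as np

def low_rank_compress(W, energy_kept=0.90):
    """Truncated-SVD compression: keep the smallest number of singular
    values whose squared sum captures `energy_kept` of the total
    spectral energy of the weight matrix W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, energy_kept)) + 1
    W_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    # Numbers needed to store the rank-r factorization (U_r, s_r, V_r).
    compressed_params = r * (W.shape[0] + W.shape[1] + 1)
    return W_hat, compressed_params

# Toy layer that is approximately low rank, as trained layers often are.
rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 1000)) / 50.0
W_hat, k = low_rank_compress(W)
print("raw parameter count:       ", W.size)
print("compressed parameter count:", k)
print("relative approximation err:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```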
Generalization Error in Deep Learning
Deep learning models have lately shown great performance in various fields
such as computer vision, speech recognition, speech translation, and natural
language processing. However, alongside their state-of-the-art performance, the
source of their generalization ability remains generally unclear.
Thus, an important question is what makes deep neural networks able to
generalize well from the training set to new data. In this article, we provide
an overview of the existing theory and bounds for the characterization of the
generalization error of deep neural networks, combining both classical and more
recent theoretical and empirical results.
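For orientation, many of the classical bounds such a survey covers instantiate a uniform-convergence template of the following form; the constants differ between results, so this is a generic sketch rather than any particular theorem from the article.

```latex
% Generic uniform-convergence template: with probability at least 1 - \delta
% over an i.i.d. training sample of size n, simultaneously for all h in the
% hypothesis class \mathcal{H},
\[
  L(h) \;\le\; \widehat{L}_n(h) \;+\; 2\,\mathfrak{R}_n(\mathcal{H})
        \;+\; O\!\left(\sqrt{\tfrac{\log(1/\delta)}{n}}\right),
\]
% where L(h) is the population risk, \widehat{L}_n(h) the empirical risk,
% and \mathfrak{R}_n(\mathcal{H}) the (empirical) Rademacher complexity.
```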
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Recent works have cast some light on the mystery of why deep nets fit any
data and generalize despite being very overparametrized. This paper analyzes
training and generalization for a simple 2-layer ReLU net with random
initialization, and provides the following improvements over recent works:
(i) Using a tighter characterization of training speed than recent papers, an
explanation for why training a neural net with random labels leads to slower
training, as originally observed in [Zhang et al. ICLR'17].
(ii) Generalization bound independent of network size, using a data-dependent
complexity measure. Our measure distinguishes clearly between random labels and
true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent
papers require the sample complexity to increase (slowly) with the network size,
while our sample complexity is completely independent of the network size.
(iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets
trained via gradient descent.
The key idea is to track the dynamics of training and generalization via
properties of a related kernel.
Comment: In ICML 2019
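A minimal sketch of the flavor of such a data-dependent measure, assuming unit-norm inputs and the closed-form Gram matrix associated with an infinitely wide two-layer ReLU net; the exact definitions, scaling, and constants are those of the paper, not this snippet. The ridge term and the toy data below are arbitrary choices for the example.

```python
import numpy as np

def relu_gram(X):
    """Gram matrix of the kernel associated with an infinitely wide
    two-layer ReLU net, for rows of X with unit L2 norm:
    H[i, j] = <x_i, x_j> * (pi - arccos(<x_i, x_j>)) / (2 * pi)."""
    G = np.clip(X @ X.T, -1.0, 1.0)          # pairwise inner products
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

def complexity_measure(X, y):
    """Data-dependent quantity of the form sqrt(2 y^T H^{-1} y / n):
    small when the labels align with the kernel's top eigendirections,
    large for random labels."""
    n = len(y)
    H = relu_gram(X) + 1e-8 * np.eye(n)      # small ridge for numerical stability
    return np.sqrt(2.0 * y @ np.linalg.solve(H, y) / n)

# Toy comparison: structured labels vs. random labels on the same inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y_true = np.sign(X[:, 0])                    # labels correlated with the data
y_rand = rng.choice([-1.0, 1.0], size=200)   # random labels
print("structured labels:", complexity_measure(X, y_true))
print("random labels:    ", complexity_measure(X, y_rand))
```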