Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Recent works have cast some light on the mystery of why deep nets fit any
data and generalize despite being heavily overparameterized. This paper analyzes
training and generalization for a simple 2-layer ReLU net with random
initialization, and provides the following improvements over prior work:
(i) Using a tighter characterization of training speed than recent papers, an
explanation for why training a neural net with random labels leads to slower
training, as originally observed in [Zhang et al. ICLR'17].
(ii) Generalization bound independent of network size, using a data-dependent
complexity measure. Our measure distinguishes clearly between random labels and
true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent
papers require the sample complexity to increase (slowly) with the network size,
while ours is completely independent of it.
(iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets
trained via gradient descent.
The key idea is to track the dynamics of training and generalization via
properties of a related kernel.

Comment: In ICML 2019.
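For concreteness, here is a minimal Python sketch (synthetic data, not the authors' released code) of the two kernel quantities the abstract alludes to: the Gram matrix H^infty of the related kernel for a 2-layer ReLU net, and the data-dependent complexity measure sqrt(2 * y^T (H^infty)^{-1} y / n) behind the size-independent bound in (ii). The closed form for H^infty and the unit-norm input assumption follow the paper's setup; everything else (data, labels, sizes) is made up for illustration.

import numpy as np

def ntk_gram_matrix(X):
    # Gram matrix H_inf of the infinite-width kernel tied to a randomly
    # initialized 2-layer ReLU net (inputs assumed unit-norm):
    #   H_inf[i, j] = <x_i, x_j> * (pi - arccos(<x_i, x_j>)) / (2 * pi)
    G = np.clip(X @ X.T, -1.0, 1.0)                # clip to keep arccos well defined
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

def complexity_measure(X, y):
    # Data-dependent quantity sqrt(2 * y^T H_inf^{-1} y / n) from (ii).
    H = ntk_gram_matrix(X)
    return np.sqrt(2.0 * (y @ np.linalg.solve(H, y)) / len(y))

# Toy illustration on synthetic data (not the paper's MNIST/CIFAR experiments).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm inputs
y_smooth = X[:, 0] * np.sqrt(X.shape[1])           # smooth (linear) target, cf. (iii)
y_random = rng.choice([-1.0, 1.0], size=len(X))    # random labels, cf. (i)
print(complexity_measure(X, y_smooth))             # expected to be small
print(complexity_measure(X, y_random))             # expected to be larger

The same matrix governs training speed in (i): gradient descent fits the label vector faster along eigendirections of H^infty with large eigenvalues, and random labels spread their mass over small-eigenvalue directions, which is why they train more slowly.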