Highly over-parameterized classifiers generalize since bad solutions are rare
We study the generalization of over-parameterized classifiers for which Empirical
Risk Minimization (ERM) attains zero training error. In these
over-parameterized settings there are many global minima with zero training
error, some of which generalize better than others. We show that under certain
conditions the fraction of "bad" global minima with a true error larger than
ε decays to zero exponentially fast with the number of training samples
n. The bound depends on the distribution of the true error over the set of
classifier functions used for the given classification problem, and does not
necessarily depend on the size or complexity (e.g. the number of parameters) of
the classifier function set. This might explain the unexpectedly good
generalization even of highly over-parameterized Neural Networks. We support
our mathematical framework with experiments on a synthetic data set and a
subset of MNIST.
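
The following is a minimal Monte Carlo sketch, not the paper's framework or experiment, that illustrates the claim: among randomly drawn classifiers that reach zero training error, the fraction whose true error exceeds ε shrinks quickly as n grows. The synthetic task (label = sign of the first coordinate), the random-hyperplane classifier family, and all parameter values are illustrative assumptions.

```python
# Illustrative simulation only; the task, classifier family, and constants
# below are assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
dim, eps = 5, 0.2
n_classifiers, n_test = 20000, 5000

def sample_data(n):
    # Synthetic two-class problem: label is the sign of the first coordinate.
    X = rng.normal(size=(n, dim))
    y = np.sign(X[:, 0])
    return X, y

# Candidate classifiers: random linear separators through the origin.
W = rng.normal(size=(n_classifiers, dim))

# Estimate each classifier's true error on a large held-out test set.
X_test, y_test = sample_data(n_test)
true_err = np.mean(np.sign(X_test @ W.T) != y_test[:, None], axis=0)

for n in [2, 5, 10, 20, 40]:
    X_train, y_train = sample_data(n)
    # A classifier "interpolates" if it has zero training error on the n points.
    interpolates = np.all(np.sign(X_train @ W.T) == y_train[:, None], axis=0)
    if interpolates.sum() == 0:
        print(f"n={n:3d}: no interpolating classifiers in the sample")
        continue
    bad_fraction = np.mean(true_err[interpolates] > eps)
    print(f"n={n:3d}: interpolators={interpolates.sum():5d}, "
          f"fraction with true error > {eps}: {bad_fraction:.3f}")
```

Running this, the fraction of interpolating classifiers with true error above ε typically drops toward zero as n increases, even though the candidate set itself is large and fixed, which is the qualitative behavior the abstract describes.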