A jamming transition from under- to over-parametrization affects loss landscape and generalization
We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes in which fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training, since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) an initial decay, (ii) an increase up to the transition point, where it displays a cusp, and (iii) a slow decay toward a constant throughout the rest of the over-parametrized regime. We thereby identify the region where the classical phenomenon of over-fitting takes place and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.

Comment: arXiv admin note: text overlap with arXiv:1809.0934
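A minimal sketch of the kind of experiment this abstract describes: train fully-connected networks of increasing width on a fixed dataset under the hinge loss, and record train and test error as the parameter count crosses from the under- to the over-parametrized regime. The synthetic task, architecture, and hyperparameters below are illustrative assumptions, not the authors' exact protocol.

```python
# Hypothetical sketch: sweep network width under the hinge loss and
# record train/test error, to probe the under-/over-parametrized
# transition. Task and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

def run(width, X, y, X_test, y_test, epochs=2000, lr=0.1):
    d = X.shape[1]
    model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        out = model(X).squeeze(-1)
        loss = torch.clamp(1.0 - y * out, min=0.0).mean()  # hinge loss
        loss.backward()
        opt.step()
    with torch.no_grad():
        train_err = ((model(X).squeeze(-1) * y) <= 0).float().mean().item()
        test_err = ((model(X_test).squeeze(-1) * y_test) <= 0).float().mean().item()
    return train_err, test_err

torch.manual_seed(0)
d, n = 20, 500
X, X_test = torch.randn(n, d), torch.randn(n, d)
teacher = torch.randn(d)                     # labels from a random linear rule
y, y_test = torch.sign(X @ teacher), torch.sign(X_test @ teacher)

for width in [2, 4, 8, 16, 32, 64, 128]:
    tr, te = run(width, X, y, X_test, y_test)
    print(f"width={width:4d}  train_err={tr:.3f}  test_err={te:.3f}")
```

In such a sweep, the transition point would show up as the smallest width at which the training error reaches zero; the abstract's three-phase picture concerns the test error on either side of that point.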
Neural networks: from the perceptron to deep nets
Artificial neural networks have been studied through the prism of statistical mechanics as disordered systems since the 1980s, starting from the simple models of Hopfield's associative memory and the single-neuron perceptron classifier. Assuming the data are generated by a teacher model, asymptotic generalisation predictions were originally derived using the replica method, and the online-learning dynamics was described in the large-system limit. In this chapter, we review the key original ideas of this literature along with their heritage in the ongoing quest to understand the efficiency of modern deep learning algorithms. One goal of current and future research is to characterize the bias of learning algorithms toward well-generalising minima in complex overparametrized loss landscapes with many solutions that perfectly interpolate the training data. Works on perceptrons, two-layer committee machines and kernel-like learning machines shed light on the benefits of overparametrization. Another goal is to understand the advantage of depth, now that models commonly feature tens or hundreds of layers. While replica computations apparently fall short of describing learning in general deep neural networks, studies of simplified linear or untrained models, as well as the derivation of scaling laws, provide the first elements of an answer.

Comment: Contribution to the book Spin Glass Theory and Far Beyond: Replica Symmetry Breaking after 40 Years; Chap. 2
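As a concrete illustration of the teacher-student setting mentioned above, here is a minimal sketch of online perceptron learning from teacher-generated labels, tracking the generalization error via the classical overlap formula eps_g = arccos(R)/pi, where R is the normalized teacher-student overlap. The learning rate and system size are illustrative choices.

```python
# Sketch: online learning of a perceptron student from a teacher
# perceptron. The generalization error of a perceptron classifier is
# eps_g = arccos(R)/pi, with R the normalized teacher-student overlap.
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                   # input dimension ("large system")
teacher = rng.standard_normal(N)
teacher /= np.linalg.norm(teacher)
student = rng.standard_normal(N)

eta = 1.0                                  # learning rate (illustrative)
for t in range(1, 20 * N + 1):
    xi = rng.standard_normal(N)            # fresh example each step (online)
    label = np.sign(teacher @ xi)
    if np.sign(student @ xi) != label:     # perceptron rule: update on mistakes
        student += eta * label * xi / np.sqrt(N)
    if t % (2 * N) == 0:
        R = (student @ teacher) / np.linalg.norm(student)
        print(f"alpha = t/N = {t/N:5.1f}   eps_g = {np.arccos(R)/np.pi:.4f}")
```

The ratio alpha = t/N of examples to dimension is the natural control parameter in the large-system limit; the replica and dynamical analyses reviewed in the chapter predict eps_g as a function of alpha in settings of this kind.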
Higher order corrections to the effective potential close to the jamming transition in the perceptron model
We analyze the perceptron model by performing a Plefka-like expansion of the free energy. This model falls in the same universality class as hard spheres near jamming, which makes it possible to obtain exact high-dimensional predictions for more complex systems. Our method allows us to define an effective potential (or TAP free energy), namely a coarse-grained functional depending on the contact forces and the effective gaps between the particles. The derivation is performed up to third order, with particular emphasis on the role of the third-order corrections to the TAP free energy. These corrections, irrelevant in a mean-field framework in the thermodynamic limit, might instead play a fundamental role when finite-size effects are considered. We also study the typical behavior of the forces and show that two kinds of corrections can occur: the first arises because the system is analyzed at a finite distance from jamming, while the second is due to finite-size effects. In our analysis, the third-order contributions vanish in the jamming limit, both for the potential and for the generalized forces, in agreement with the isostaticity argument proposed by Wyart and coworkers. Finally, we analyze the scalings emerging close to the jamming line, which define a crossover regime connecting the control parameters of the model to an effective temperature.

Comment: 14 pages, 4 figures
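For readers unfamiliar with the model, here is a minimal numerical sketch of the spherical perceptron that this analysis concerns, in the standard convention with gaps h_mu = xi^mu . x / sqrt(N) - sigma and energy H = (1/2) sum_mu h_mu^2 for the negative gaps. The sizes, the value of sigma, and the plain projected gradient descent are illustrative assumptions, not the paper's analytical machinery.

```python
# Sketch of the spherical perceptron used in jamming studies: a state
# x on the sphere |x|^2 = N, M random patterns xi^mu, gaps
# h_mu = xi^mu . x / sqrt(N) - sigma, and soft-sphere-like energy
# H = (1/2) sum over negative gaps of h_mu^2. A negative sigma puts the
# model in the non-convex regime relevant to the hard-sphere analogy.
# Contact forces are the magnitudes of the negative gaps at a minimum.
import numpy as np

rng = np.random.default_rng(1)
N, M, sigma = 200, 300, -0.5
xi = rng.standard_normal((M, N))
x = rng.standard_normal(N)
x *= np.sqrt(N) / np.linalg.norm(x)        # start on the sphere

def gaps(x):
    return xi @ x / np.sqrt(N) - sigma

lr = 0.05
for _ in range(5000):
    h = gaps(x)
    neg = h < 0                            # unsatisfied (overlapping) constraints
    grad = (h * neg) @ xi / np.sqrt(N)     # dH/dx, only negative gaps contribute
    x -= lr * grad
    x *= np.sqrt(N) / np.linalg.norm(x)    # project back onto the sphere

h = gaps(x)
forces = -h[h < 0]                         # contact forces at the minimum
print(f"energy = {0.5 * np.sum(h[h < 0]**2):.3e}, contacts = {forces.size}")
```

Jamming corresponds to the point where a zero-energy configuration (all gaps non-negative) ceases to exist; the abstract's effective potential is a coarse-grained free energy over exactly these gaps and contact forces.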