
    A jamming transition from under- to over-parametrization affects loss landscape and generalization

    We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes in which fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training, since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) an initial decay, (ii) an increase up to the transition point, where it displays a cusp, and (iii) a slow decay toward a constant for the rest of the over-parametrized regime. We thereby identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.

    Comment: arXiv admin note: text overlap with arXiv:1809.0934
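    The fitting transition and the cusp in the generalization error described above can be probed numerically. The sketch below is a minimal illustration, not the paper's protocol: it trains a one-hidden-layer network with the hinge loss on data labelled by a random linear teacher (the teacher, sizes, optimizer and number of epochs are all assumptions made here for concreteness) while scanning the hidden width, and reports whether every margin is satisfied together with the test error.

    # Minimal sketch (PyTorch): scan the hidden width of a one-hidden-layer
    # network trained with the hinge loss, record whether it fits the
    # training set, and measure its test error. All sizes and hyperparameters
    # are illustrative, not taken from the paper.
    import torch

    torch.manual_seed(0)
    n_train, n_test, d = 128, 2048, 32
    teacher = torch.randn(d)                      # assumed linear teacher generating labels
    X = torch.randn(n_train + n_test, d)
    y = torch.sign(X @ teacher)
    Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

    def run(width, epochs=3000, lr=0.05):
        model = torch.nn.Sequential(
            torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1))
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            margins = ytr * model(Xtr).squeeze(-1)
            torch.clamp(1.0 - margins, min=0.0).mean().backward()   # hinge loss
            opt.step()
        with torch.no_grad():
            fitted = bool((ytr * model(Xtr).squeeze(-1) >= 1.0).all())  # all constraints satisfied?
            test_err = (torch.sign(model(Xte).squeeze(-1)) != yte).float().mean().item()
        return fitted, test_err

    for width in (1, 2, 4, 8, 16, 32, 64, 128):
        fitted, err = run(width)
        print(f"width={width:4d}  all margins satisfied: {fitted}  test error: {err:.3f}")

    In this toy setting, small widths typically fail to satisfy all hinge constraints while large widths succeed; the width at which fitting first succeeds gives a rough numerical handle on the transition the abstract discusses.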

    Neural networks: from the perceptron to deep nets

    Artificial neural networks have been studied through the prism of statistical mechanics as disordered systems since the 1980s, starting from the simple models of Hopfield's associative memory and the single-neuron perceptron classifier. Assuming the data are generated by a teacher model, asymptotic generalisation predictions were originally derived using the replica method, and the online learning dynamics was described in the large-system limit. In this chapter, we review the key original ideas of this literature along with their heritage in the ongoing quest to understand the efficiency of modern deep learning algorithms. One goal of current and future research is to characterize the bias of learning algorithms toward well-generalising minima in complex overparametrized loss landscapes with many solutions that perfectly interpolate the training data. Work on perceptrons, two-layer committee machines and kernel-like learning machines sheds light on these benefits of overparametrization. Another goal is to understand the advantage of depth, now that models commonly feature tens or hundreds of layers. While replica computations apparently fall short of describing learning in general deep neural networks, studies of simplified linear or untrained models, as well as the derivation of scaling laws, provide the first elements of an answer.

    Comment: Contribution to the book Spin Glass Theory and Far Beyond: Replica Symmetry Breaking after 40 Years; Chap. 2
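    As a concrete reminder of the classic teacher-student setup reviewed above, the following minimal sketch (an illustration under assumed settings, not the chapter's calculation) trains a perceptron online on examples labelled by a random teacher vector and reads the generalisation error off the teacher-student overlap R via eps = arccos(R)/pi.

    # Minimal sketch (NumPy): online perceptron learning from a teacher.
    # The dimension, number of examples and the mistake-driven update rule
    # are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    N = 500                                    # input dimension
    teacher = rng.standard_normal(N)
    teacher /= np.linalg.norm(teacher)
    student = np.zeros(N)

    for t in range(1, 20 * N + 1):             # one fresh example per step (online)
        xi = rng.standard_normal(N)
        label = np.sign(teacher @ xi)
        if np.sign(student @ xi) != label:     # classic perceptron update on mistakes
            student += label * xi / np.sqrt(N)
        if t % (2 * N) == 0:                   # report at alpha = t/N = 2, 4, ...
            R = student @ teacher / (np.linalg.norm(student) + 1e-12)
            eps = np.arccos(np.clip(R, -1.0, 1.0)) / np.pi
            print(f"alpha = {t / N:5.1f}   generalisation error ~ {eps:.3f}")

    The printed error should decrease with alpha = (number of examples)/N, the scaling variable in which the statistical-mechanics predictions for the perceptron are naturally expressed.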

    Higher order corrections to the effective potential close to the jamming transition in the perceptron model

    We analyze the perceptron model by performing a Plefka-like expansion of the free energy. This model falls in the same universality class as hard spheres near jamming, allowing exact high-dimensional predictions to be obtained for more complex systems. Our method enables us to define an effective potential (or TAP free energy), namely a coarse-grained functional depending on the contact forces and the effective gaps between the particles. The derivation is performed up to third order, with particular emphasis on the role of third-order corrections to the TAP free energy. These corrections, irrelevant in a mean-field framework in the thermodynamic limit, might instead play a fundamental role when considering finite-size effects. We also study the typical behavior of the forces and show that two kinds of corrections can occur. The first arises because the system is analyzed at a finite distance from jamming, while the second is due to finite-size corrections. In our analysis, third-order contributions vanish in the jamming limit, both for the potential and for the generalized forces, in agreement with the isostaticity argument proposed by Wyart and coworkers. Finally, we analyze the scalings emerging close to the jamming line, which define a crossover regime connecting the control parameters of the model to an effective temperature.

    Comment: 14 pages, 4 figures
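    For reference, the perceptron model referred to above can be stated in the convention standard in the jamming literature (the paper's notation, e.g. the sign of $\sigma$, may differ): a configuration $x$ on the sphere $|x|^2 = N$ interacts with $M$ random patterns $\xi^\mu$ through gap variables, and only negative gaps cost energy,

    \[
        h_\mu \;=\; \frac{\xi^{\mu}\cdot x}{\sqrt{N}} - \sigma,
        \qquad
        H(x) \;=\; \sum_{\mu=1}^{M} v(h_\mu),
        \qquad
        v(h) \;=\; \tfrac{1}{2}\, h^{2}\,\theta(-h).
    \]

    The contact forces are $f_\mu = -v'(h_\mu) = -h_\mu\,\theta(-h_\mu)$, and isostaticity, the property invoked in the Wyart-and-coworkers argument cited above, means that at jamming the number of contacts (constraints with $h_\mu \to 0^-$) matches the number of degrees of freedom $N$.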