14 research outputs found

    The Random-Diluted Triangular Plaquette Model: study of phase transitions in a Kinetically Constrained Model

    We study how the thermodynamic properties of the Triangular Plaquette Model (TPM) are influenced by the addition of extra interactions. The thermodynamics of the original TPM is trivial, while its dynamics is glassy, as usual in Kinetically Constrained Models. As soon as we generalize the model to include additional interactions, a thermodynamic phase transition appears in the system. The additional interactions we consider are either short-ranged, forming a regular lattice in the plane, or long-ranged, of the small-world kind. In the case of long-range interactions we call the new model the Random-Diluted TPM. We provide arguments that the model so modified should undergo a thermodynamic phase transition, and that in the long-range case this is a glass transition of the "Random First-Order" kind. Finally, we support our conjectures by studying the finite-temperature phase diagram of the Random-Diluted TPM in the Bethe approximation. This corresponds to the exact calculation on the random regular graph, where the free energy and the configurational entropy can be computed by means of the cavity equations.

    Comment: 20 pages, 7 figures; final version to appear on PR
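As an aside not contained in the paper, the original TPM is simple enough to simulate directly. Below is a minimal single-spin-flip Metropolis sketch (lattice size, temperature, and sweep count are arbitrary illustrative choices) of a quench of the plain model with H = -Σ_{i,j} s_{i,j} s_{i+1,j} s_{i,j+1} on an L×L lattice with periodic boundaries:

```python
import numpy as np

rng = np.random.default_rng(4)
L, T, sweeps = 32, 0.5, 200
s = rng.choice([-1, 1], size=(L, L))   # Ising spins, periodic boundaries

def energy(s):
    # H = -sum over plaquettes of s(i,j) s(i+1,j) s(i,j+1)
    return -np.sum(s * np.roll(s, -1, 0) * np.roll(s, -1, 1))

E = energy(s)
for _ in range(sweeps * L * L):
    i, j = rng.integers(L), rng.integers(L)
    # spin (i,j) enters exactly three plaquettes; flipping it changes E by dE
    dE = 2 * s[i, j] * (
        s[(i + 1) % L, j] * s[i, (j + 1) % L]
        + s[(i - 1) % L, j] * s[(i - 1) % L, (j + 1) % L]
        + s[i, (j - 1) % L] * s[(i + 1) % L, (j - 1) % L]
    )
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        s[i, j] = -s[i, j]
        E += dE

print(E / L**2)  # energy per spin after the quench
```

Because the plaquette variables are effectively independent, the trivial equilibrium energy per spin is -tanh(1/T); the glassy kinetics discussed above is visible as a slow approach to that value after a quench.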

    Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

    How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as n^{-β}, where n is the number of training examples and β an exponent that depends on both data and algorithm. In this work we measure β when applying kernel methods to real datasets. For MNIST we find β ≈ 0.4 and for CIFAR10 β ≈ 0.1, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. Under a simplifying assumption -- namely that the data are sampled from a regular lattice -- we derive β analytically for translation-invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, β depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than n. Using this idea we relate the exponent β to an exponent a describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract a from real data by performing kernel PCA, leading to β ≈ 0.36 for MNIST and β ≈ 0.07 for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data.

    Comment: We added (i) the prediction of the exponent β for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks").
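As a concrete illustration (not the paper's own experiments), the measurement procedure can be sketched on a toy Teacher-Student setup: fit interpolating kernel regression at several training-set sizes n and read β off the log-log slope of the test error. The teacher function, kernel width, and sizes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2  # a low-dimensional toy input space

def laplace_kernel(X, Y, sigma=1.0):
    # K(x, y) = exp(-||x - y|| / sigma)
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-D / sigma)

def teacher(X):
    # a fixed smooth target, standing in for the Gaussian-field Teacher
    return np.sin(2 * X[:, 0]) * np.cos(X[:, 1])

X_test = rng.uniform(-1, 1, (500, d))
y_test = teacher(X_test)

ns, errs = [64, 128, 256, 512, 1024], []
for n in ns:
    X_train = rng.uniform(-1, 1, (n, d))
    K = laplace_kernel(X_train, X_train)
    # (near-)interpolating kernel regression: solve K coef = y
    coef = np.linalg.solve(K + 1e-8 * np.eye(n), teacher(X_train))
    pred = laplace_kernel(X_test, X_train) @ coef
    errs.append(np.mean((pred - y_test) ** 2))

# the learning-curve exponent is minus the log-log slope
beta = -np.polyfit(np.log(ns), np.log(errs), 1)[0]
print(f"estimated beta ~ {beta:.2f}")
```

In the paper's framework the value of such a slope is controlled by the smoothness and effective dimension of the data; here it only demonstrates the fitting procedure.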

    A jamming transition from under- to over-parametrization affects loss landscape and generalization

    We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training, since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) an initial decay, (ii) an increase up to the transition point, where it displays a cusp, and (iii) a slow decay toward a constant for the rest of the over-parametrized regime. Thereby we identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.

    Comment: arXiv admin note: text overlap with arXiv:1809.0934
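The paper's sharp result concerns the hinge loss in fully-connected networks; as a loose numerical illustration of the same fitting transition, a random-feature least-squares model fits n random labels exactly once its parameter count p exceeds n, and cannot below. All sizes and the tanh feature map here are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 30
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)   # random labels: the hardest case to fit

def train_error(p):
    # random-feature model with p parameters, fitted by least squares
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    F = np.tanh(X @ W)                          # n x p feature matrix
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return np.mean((F @ w - y) ** 2)

under = train_error(p=40)    # p < n: cannot interpolate, residual stays finite
over = train_error(p=200)    # p > n: generic features fit the data exactly
print(f"train MSE: under-parametrized {under:.3f}, over-parametrized {over:.2e}")
```

For a linear fit the threshold sits exactly at p = n; the paper's point is that an analogous sharp constraint-counting transition survives in nonlinear networks.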

    Disentangling feature and lazy training in deep neural networks

    Two distinct limits for deep learning have been derived as the network width h → ∞, depending on how the weights of the last layer scale with h. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel Θ. By contrast, in the Mean-Field limit, the dynamics can be expressed in terms of the distribution of the parameters associated with a neuron, which follows a partial differential equation. In this work we consider deep networks where the weights in the last layer scale as α h^{-1/2} at initialization. By varying α and h, we probe the crossover between the two limits. We observe the previously identified regimes of lazy training and feature training. In the lazy-training regime, the dynamics is almost linear and the NTK barely changes after initialization. The feature-training regime includes the mean-field formulation as a limiting case and is characterized by a kernel that evolves in time and learns some features. We perform numerical experiments on MNIST, Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find that (i) the two regimes are separated by a value α* that scales as h^{-1/2}; (ii) network architecture and data structure play an important role in determining which regime is better: in our tests, fully-connected networks generally perform better in the lazy-training regime, unlike convolutional networks; (iii) in both regimes, the fluctuations δF induced on the learned function by the initial conditions decay as δF ∼ 1/√h, leading to a performance that increases with h. The same improvement can also be obtained at an intermediate width by ensemble-averaging several networks. (iv) In the feature-training regime we identify a time scale t₁ ∼ √h α, such that for t ≪ t₁ the dynamics is linear.

    Comment: minor revision
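A minimal numpy sketch (not the paper's code) of the lazy/feature distinction: a two-layer network with last-layer scale α h^{-1/2} is trained by full-batch gradient descent, with the learning rate rescaled by 1/α² so that the function-space step is comparable across α. The empirical NTK then barely moves for large α (lazy) and evolves for small α (feature learning). Width, data, and step counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
h, d, n = 200, 3, 10
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W0 = rng.normal(size=(h, d))   # shared initialization for both runs
a0 = rng.normal(size=h)

def ntk(W, a, alpha):
    # empirical NTK of f(x) = alpha/sqrt(h) * sum_i a_i tanh(w_i . x) on the data
    Z = np.tanh(X @ W.T)            # n x h activations
    G = (1 - Z**2) * a              # n x h, a_i * tanh'(w_i . x)
    return alpha**2 / h * (Z @ Z.T + (G @ G.T) * (X @ X.T))

def kernel_change(alpha, steps=100, lr=0.2):
    W, a = W0.copy(), a0.copy()
    K0 = ntk(W, a, alpha)
    for _ in range(steps):
        Z = np.tanh(X @ W.T)
        f = alpha / np.sqrt(h) * Z @ a
        r = 2 * (f - y) / n                                # dLoss/df for MSE
        grad_a = alpha / np.sqrt(h) * Z.T @ r
        grad_W = alpha / np.sqrt(h) * a[:, None] * (((1 - Z**2) * r[:, None]).T @ X)
        # dividing lr by alpha**2 keeps the function-space step comparable across alpha
        W -= lr / alpha**2 * grad_W
        a -= lr / alpha**2 * grad_a
    K1 = ntk(W, a, alpha)
    return np.linalg.norm(K1 - K0) / np.linalg.norm(K0)

print(f"relative NTK change: alpha=0.1 -> {kernel_change(0.1):.3f}, "
      f"alpha=10 -> {kernel_change(10.0):.5f}")
```

The relative kernel change shrinks as α grows, which is the hallmark of the lazy regime described above.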

    Avalanches in glassy systems

    Many systems that are somehow characterized by a degree of disorder share a similar structure: the energy landscape has many sample-dependent local energy minima. When a small external perturbation is applied to the system at low temperature, it is reasonable to expect that the dynamics will lead the system from one minimum to another, thus displaying a random and jerky response. The discontinuous jumps that one observes are called avalanches, and the focus of this work is the computation of their distribution. One of the results is indeed the development of a framework that allows the computation of this distribution in infinite-dimensional systems that can be described within a replica symmetry breaking ansatz. We apply the results to one of the simplest models of structural glasses, namely dense packings of (harmonic) soft spheres, either at jamming or at larger densities, subject to a shear transformation that induces jumps both in the total energy and in the shear stress of the system. We argue that, when the shear strain is small enough, the avalanche distribution develops a power-law behavior, whose exponent can be directly related to the functional order parameter of the replica symmetry breaking solution. This exponent is also related to the distribution of contact forces (or at least of the contact forces between some of the spheres), whose asymptotic behavior is known not to depend strongly on the spatial dimension; for this reason, we compare the infinite-dimensional prediction with three-dimensional simulations of the same systems and, remarkably, we find good agreement. In the rest of the thesis we compare our results with previous works, and we also discuss some of the consequences that the avalanche distribution leads to, concerning the statistical elastic properties of dense granular media.
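To make the "random and jerky response" concrete (as a toy illustration, not the thesis' replica computation), one can quasi-statically tilt a one-dimensional Brownian energy landscape and record the jumps of the local minimum; the jump sizes play the role of avalanches. All parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = []
for _ in range(500):                      # many independent disorder samples
    V = np.cumsum(rng.normal(size=2000))  # Brownian random energy landscape
    pos, f, df = 0, 0.0, 1e-2
    while pos < len(V) - 1:
        f += df                           # quasi-static ramp of the external field
        start = pos
        # slide downhill in the tilted landscape U_i = V_i - f*i
        while pos < len(V) - 1 and V[pos + 1] - V[pos] < f:
            pos += 1
        if pos > start:
            sizes.append(pos - start)     # avalanche: discontinuous jump of the minimum

sizes = np.array(sizes)
print(f"{len(sizes)} avalanches, median {np.median(sizes):.0f}, max {sizes.max()}")
```

Histogramming the sizes on logarithmic scales shows a broad, intermittent response; the thesis goes much further and computes the actual avalanche distribution, and its power-law exponent, analytically within replica symmetry breaking.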

    Les avalanches dans les systèmes vitreux

    Many systems that are somehow characterized by a degree of disorder share a similar structure: the energy landscape has many sample-dependent local energy minima. When a small external perturbation is applied to the system at low temperature, it is reasonable to expect that the dynamics will lead the system from one minimum to another, thus displaying a random and jerky response. The discontinuous jumps that one observes are called avalanches, and the focus of this work is the computation of their distribution. One of the results is indeed the development of a framework that allows the computation of this distribution in infinite-dimensional systems that can be described within a replica symmetry breaking ansatz. We apply the results to one of the simplest models of structural glasses, namely dense packings of (harmonic) soft spheres, either at jamming or at larger densities, subject to a shear transformation that induces jumps both in the total energy and in the shear stress of the system. We argue that, when the shear strain is small enough, the avalanche distribution develops a power-law behavior, whose exponent can be directly related to the functional order parameter of the replica symmetry breaking solution. This exponent is also related to the distribution of contact forces (or at least of the contact forces between some of the spheres), whose asymptotic behavior is known not to depend strongly on the spatial dimension; for this reason, we compare the infinite-dimensional prediction with three-dimensional simulations of the same systems and, remarkably, we find good agreement. In the rest of the thesis we compare our results with previous works, and we also discuss some of the consequences that the avalanche distribution leads to, concerning the statistical elastic properties of dense granular media.

    Mean-field avalanches in jammed spheres

    23 pages, 9 figures. International audience.

    Disordered systems are characterized by the existence of many sample-dependent local energy minima, which cause a stepwise response when the system is perturbed. In this article we use an approach based on elementary probabilistic methods to compute the complete probability distribution of the jumps (static avalanches) in the response of mean-field systems described by replica symmetry breaking; we find a precise condition for having a power-law behavior in the distribution of avalanches caused by small perturbations, and we show that our predictions are in remarkable agreement both with previous results and with what is found in simulations of three-dimensional systems of soft spheres, either at jamming or at slightly higher densities.

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    We analyze numerically the training dynamics of deep neural networks (DNNs) using methods developed in the statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.

    Comment: 10 pages, 5 figures. Version accepted at ICML 201