14 research outputs found
The Random-Diluted Triangular Plaquette Model: study of phase transitions in a Kinetically Constrained Model
We study how the thermodynamic properties of the Triangular Plaquette Model
(TPM) are influenced by the addition of extra interactions. The thermodynamics
of the original TPM is trivial, while its dynamics is glassy, as usual in
Kinetically Constrained Models. As soon as we generalize the model to include
additional interactions, a thermodynamic phase transition appears in the
system. The additional interactions we consider are either short ranged,
forming a regular lattice in the plane, or long ranged of the small-world kind.
In the case of long-range interactions we call the new model Random-Diluted
TPM. We provide arguments that the model so modified should undergo a
thermodynamic phase transition, and that in the long-range case this is a glass
transition of the "Random First-Order" kind. Finally, we give support to our
conjectures studying the finite temperature phase diagram of the Random-Diluted
TPM in the Bethe approximation. This corresponds to the exact calculation on
the random regular graph, where free-energy and configurational entropy can be
computed by means of the cavity equations.
Comment: 20 pages, 7 figures; final version to appear on PR
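For reference, the plaquette interaction of the unmodified TPM is simple to write down. The sketch below (a minimal illustration of ours, not the paper's code) computes the energy in the standard Newman-Moore representation, where the model lives on a periodic square lattice and every downward triangle contributes a three-spin term:

```python
import numpy as np

def tpm_energy(s, J=1.0):
    """Energy of the Triangular Plaquette Model (Newman-Moore form):
    each plaquette contributes -J * s[i, j] * s[i+1, j] * s[i, j+1],
    with periodic boundary conditions on an L x L lattice of +/-1 spins."""
    return -J * np.sum(s * np.roll(s, -1, axis=0) * np.roll(s, -1, axis=1))

rng = np.random.default_rng(0)
L = 8
s = rng.choice([-1, 1], size=(L, L))
print(tpm_energy(s))                 # energy of a random configuration
print(tpm_energy(np.ones((L, L))))   # all-up state: every plaquette product is +1, so -J * L**2 = -64.0
```

The trivial thermodynamics of the pure model comes from the fact that this Hamiltonian can be mapped onto non-interacting plaquette variables; the paper's point is that the extra interactions destroy this triviality.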
Asymptotic learning curves of kernel methods: empirical data vs. Teacher-Student paradigm
How many training data are needed to learn a supervised task? It is often
observed that the generalization error decreases as a power law $n^{-\beta}$, where $n$ is
the number of training examples and $\beta$ an exponent that depends on both
data and algorithm. In this work we measure $\beta$ when applying kernel
methods to real datasets, for MNIST and for CIFAR10, for both regression and classification tasks, and for
Gaussian or Laplace kernels. To rationalize the existence of non-trivial
exponents that can be independent of the specific kernel used, we study the
Teacher-Student framework for kernels. In this scheme, a Teacher generates data
according to a Gaussian random field, and a Student learns them via kernel
regression. With a simplifying assumption -- namely that the data are sampled
from a regular lattice -- we derive $\beta$ analytically for translation-invariant
kernels, using previous results from the kriging literature. Provided
that the Student is not too sensitive to high frequencies, $\beta$ depends only
on the smoothness and dimension of the training data. We confirm numerically
that these predictions hold when the training points are sampled at random on a
hypersphere. Overall, the test error is found to be controlled by the magnitude
of the projection of the true function on the kernel eigenvectors whose rank is
larger than $n$. Using this idea we relate the exponent $\beta$ to an
exponent describing how the coefficients of the true function in the
eigenbasis of the kernel decay with rank. We extract this decay exponent from real data by
performing kernel PCA, leading to predictions for $\beta$ for MNIST and
for CIFAR10 in good agreement with observations. We argue
that these rather large exponents are possible due to the small effective
dimension of the data.
Comment: We added (i) the prediction of the exponent $\beta$ for real data
using kernel PCA; (ii) the generalization of our results to non-Gaussian data
from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in
Kernel Regression and Wide Neural Networks").
A jamming transition from under- to over-parametrization affects loss landscape and generalization
We argue that in fully-connected networks a phase transition delimits the
over- and under-parametrized regimes where fitting can or cannot be achieved.
Under some general conditions, we show that this transition is sharp for the
hinge loss. In the whole over-parametrized regime, poor minima of the loss are
not encountered during training since the number of constraints to satisfy is
too small to hamper minimization. Our findings support a link between this
transition and the generalization properties of the network: as we increase the
number of parameters of a given model, starting from an under-parametrized
network, we observe that the generalization error displays three phases: (i)
initial decay, (ii) increase until the transition point --- where it displays a
cusp --- and (iii) slow decay toward a constant for the rest of the
over-parametrized regime. Thereby we identify the region where the classical
phenomenon of over-fitting takes place, and the region where the model keeps
improving, in line with previous empirical observations for modern neural
networks.
Comment: arXiv admin note: text overlap with arXiv:1809.0934
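The satisfiable/unsatisfiable transition for the hinge loss is easiest to see on the simplest model with this phenomenology: a linear classifier with random constraints. The sketch below is our toy illustration, not the networks studied in the paper; below capacity the hinge loss is driven to (near) zero, well above it a finite loss remains.

```python
import numpy as np

def final_hinge_loss(P, d, steps=5000, lr=0.1, seed=0):
    """Gradient descent on the mean hinge loss max(0, 1 - y * w.x) for a
    linear model with d parameters and P random constraints
    (random points with random labels)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((P, d)) / np.sqrt(d)  # rows of roughly unit norm
    y = rng.choice([-1.0, 1.0], size=P)
    w = np.zeros(d)
    for _ in range(steps):
        active = y * (X @ w) < 1.0                # unsatisfied constraints
        w += lr * (y[active] @ X[active]) / P     # descend the hinge loss
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

d = 50
print(final_hinge_loss(P=25, d=d))    # few constraints: loss driven near zero
print(final_hinge_loss(P=500, d=d))   # many constraints: a finite loss remains
```

Sweeping P/d and locating where the final loss first becomes nonzero gives a crude version of the jamming point discussed in the abstract.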
Disentangling feature and lazy training in deep neural networks
Two distinct limits for deep learning have been derived as the network width
, depending on how the weights of the last layer scale
with . In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear
in the weights and is described by a frozen kernel . By contrast, in
the Mean-Field limit, the dynamics can be expressed in terms of the
distribution of the parameters associated with a neuron, that follows a partial
differential equation. In this work we consider deep networks where the weights
in the last layer scale as at initialization. By varying
and , we probe the crossover between the two limits. We observe the
previously identified regimes of lazy training and feature training. In the
lazy-training regime, the dynamics is almost linear and the NTK barely changes
after initialization. The feature-training regime includes the mean-field
formulation as a limiting case and is characterized by a kernel that evolves in
time, and learns some features. We perform numerical experiments on MNIST,
Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find
that (i) The two regimes are separated by an that scales as
. (ii) Network architecture and data structure play an important role
in determining which regime is better: in our tests, fully-connected networks
perform generally better in the lazy-training regime, unlike convolutional
networks. (iii) In both regimes, the fluctuations induced on the
learned function by initial conditions decay as ,
leading to a performance that increases with . The same improvement can also
be obtained at an intermediate width by ensemble-averaging several networks.
(iv) In the feature-training regime we identify a time scale
, such that for the dynamics is linear.Comment: minor revision
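The lazy/feature distinction can be seen even in a tiny network. The sketch below is our own minimal illustration, not the paper's setup: a one-hidden-layer tanh network whose output is multiplied by a scale $\alpha$, trained by gradient descent with the learning rate rescaled by $1/\alpha^2$ and with the initial function subtracted (a standard device to isolate the lazy limit). The relative motion of the hidden weights shrinks as $\alpha$ grows.

```python
import numpy as np

def weight_movement(alpha, h=200, steps=2000, lr=0.1, seed=0):
    """One-hidden-layer tanh network f(x) = alpha * (g(x) - g_init(x)), with
    g(x) = sum_j a_j * tanh(b_j * x) / sqrt(h).  Trained by gradient descent
    on the square loss with learning rate lr / alpha**2; returns the relative
    change of the hidden weights b after training."""
    rng = np.random.default_rng(seed)
    x = np.array([-1.0, -0.3, 0.4, 1.0])   # a few 1-D training points
    y = np.array([0.5, -0.2, 0.3, -0.4])
    a = rng.standard_normal(h)
    b = rng.standard_normal(h)
    b0 = b.copy()
    g_init = np.tanh(np.outer(x, b)) @ a / np.sqrt(h)  # subtract f at init
    for _ in range(steps):
        phi = np.tanh(np.outer(x, b))                  # (n, h) activations
        err = alpha * (phi @ a / np.sqrt(h) - g_init) - y
        grad_a = alpha * phi.T @ err / np.sqrt(h)
        grad_b = alpha * a * ((1.0 - phi**2).T @ (err * x)) / np.sqrt(h)
        a -= (lr / alpha**2) * grad_a
        b -= (lr / alpha**2) * grad_b
    return np.linalg.norm(b - b0) / np.linalg.norm(b0)

print(weight_movement(alpha=1.0))    # feature regime: weights move noticeably
print(weight_movement(alpha=100.0))  # lazy regime: weights barely move
```

A fuller experiment would also track the empirical NTK before and after training; here the weight displacement serves as a cheap proxy for kernel movement.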
Avalanches in glassy systems
Many systems that are somehow characterized by a degree of disorder share a similar structure: the energy landscape has many sample-dependent local energy minima. When a small external perturbation is applied to the system at low temperature, it is reasonable to expect that the dynamics will lead the system from one minimum to another, thus displaying a random and jerky response. The discontinuous jumps that one observes are called avalanches, and the focus of this work is the computation of their distribution. One of the results is indeed the development of a framework that allows the computation of this distribution in infinite-dimensional systems that can be described within a replica symmetry breaking ansatz. We apply the results to one of the simplest models of structural glasses, namely dense packings of (harmonic) soft spheres, either at jamming or at larger densities, subject to a shear transformation that induces jumps both in the total energy and in the shear stress of the system. We argue that, when the shear strain is small enough, the avalanche distribution develops a power-law behavior, whose exponent can be directly related to the functional order parameter of the replica symmetry breaking solution. This exponent is also related to the distribution of contact forces (or at least of the contact forces between some of the spheres), whose asymptotic behavior is known not to depend strongly on the spatial dimension; for this reason, we compare the infinite-dimensional prediction with three-dimensional simulations of the same systems and, remarkably, we find good agreement. In the rest of the thesis we compare our results with previous works, and we also discuss some of the consequences of the avalanche distribution for the statistical elastic properties of dense granular media.
Mean-field avalanches in jammed spheres
23 pages, 9 figures. Disordered systems are characterized by the existence of many sample-dependent local energy minima, which cause a stepwise response when the system is perturbed. In this article we use an approach based on elementary probabilistic methods to compute the complete probability distribution of the jumps (static avalanches) in the response of mean-field systems described by replica symmetry breaking; we find a precise condition for having a power-law behavior in the distribution of avalanches caused by small perturbations, and we show that our predictions are in remarkable agreement both with previous results and with what is found in simulations of three-dimensional systems of soft spheres, either at jamming or at slightly higher densities.
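Given a sample of avalanche sizes, a power-law exponent of the kind discussed here is usually estimated by maximum likelihood. The sketch below uses synthetic data, not the article's systems: sizes are drawn from a pure power law by inverse-transform sampling and the exponent is recovered with the standard continuous-power-law estimator.

```python
import numpy as np

def sample_powerlaw(n, tau, s_min, rng):
    """Draw n avalanche sizes from p(s) ~ s**(-tau) for s >= s_min (tau > 1),
    by inverse-transform sampling of the cumulative distribution."""
    u = rng.random(n)
    return s_min * u ** (-1.0 / (tau - 1.0))

def mle_exponent(s, s_min):
    """Maximum-likelihood estimate of tau for a continuous power law."""
    return 1.0 + len(s) / np.sum(np.log(s / s_min))

rng = np.random.default_rng(0)
s = sample_powerlaw(100_000, tau=1.5, s_min=1.0, rng=rng)
print(mle_exponent(s, s_min=1.0))  # close to the true exponent 1.5
```

On real avalanche data one would additionally choose the cutoff s_min carefully, since the power law only holds in the small-perturbation regime described in the abstract.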
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
We analyze numerically the training dynamics of deep neural networks (DNN) by
using methods developed in statistical physics of glassy systems. The two main
issues we address are (1) the complexity of the loss landscape and of the
dynamics within it, and (2) to what extent DNNs share similarities with glassy
systems. Our findings, obtained for different architectures and datasets,
suggest that during the training process the dynamics slows down because of an
increasingly large number of flat directions. At large times, when the loss is
approaching zero, the system diffuses at the bottom of the landscape. Despite
some similarities with the dynamics of mean-field glassy systems, in
particular, the absence of barrier crossing, we find distinctive dynamical
behaviors in the two cases, showing that the statistical properties of the
corresponding loss and energy landscapes are different. In contrast, when the
network is under-parametrized we observe a typical glassy behavior, thus
suggesting the existence of different phases depending on whether the network
is under-parametrized or over-parametrized.
Comment: 10 pages, 5 figures. Version accepted at ICML 201