14 research outputs found
The Random-Diluted Triangular Plaquette Model: study of phase transitions in a Kinetically Constrained Model
We study how the thermodynamic properties of the Triangular Plaquette Model
(TPM) are influenced by the addition of extra interactions. The thermodynamics
of the original TPM is trivial, while its dynamics is glassy, as usual in
Kinetically Constrained Models. As soon as we generalize the model to include
additional interactions, a thermodynamic phase transition appears in the
system. The additional interactions we consider are either short ranged,
forming a regular lattice in the plane, or long ranged of the small-world kind.
In the case of long-range interactions we call the new model Random-Diluted
TPM. We provide arguments that the model so modified should undergo a
thermodynamic phase transition, and that in the long-range case this is a glass
transition of the "Random First-Order" kind. Finally, we give support to our
conjectures studying the finite temperature phase diagram of the Random-Diluted
TPM in the Bethe approximation. This corresponds to the exact calculation on
the random regular graph, where free-energy and configurational entropy can be
computed by means of the cavity equations.
Comment: 20 pages, 7 figures; final version to appear on PR
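For reference, the plaquette interaction of the unmodified TPM is simple to write down. The sketch below (a minimal illustration of ours, not the paper's code) computes the energy in the standard Newman-Moore representation, where the model lives on a periodic square lattice and every downward triangle contributes a three-spin term:

```python
import numpy as np

def tpm_energy(s, J=1.0):
    """Energy of the Triangular Plaquette Model (Newman-Moore form):
    each plaquette contributes -J * s[i, j] * s[i+1, j] * s[i, j+1],
    with periodic boundary conditions on an L x L lattice of +/-1 spins."""
    return -J * np.sum(s * np.roll(s, -1, axis=0) * np.roll(s, -1, axis=1))

rng = np.random.default_rng(0)
L = 8
s = rng.choice([-1, 1], size=(L, L))
print(tpm_energy(s))                 # energy of a random configuration
print(tpm_energy(np.ones((L, L))))   # all-up state: every plaquette product is +1, so -J * L**2 = -64.0
```

The trivial thermodynamics of the pure model comes from the fact that this Hamiltonian can be mapped onto non-interacting plaquette variables; the paper's point is that the extra interactions destroy this triviality.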
Asymptotic learning curves of kernel methods: empirical data vs. Teacher-Student paradigm
How many training data are needed to learn a supervised task? It is often
observed that the generalization error decreases as a power law $n^{-\beta}$, where $n$ is
the number of training examples and $\beta$ an exponent that depends on both
data and algorithm. In this work we measure $\beta$ when applying kernel
methods to real datasets, for MNIST and for CIFAR10, for both regression and classification tasks, and for
Gaussian or Laplace kernels. To rationalize the existence of non-trivial
exponents that can be independent of the specific kernel used, we study the
Teacher-Student framework for kernels. In this scheme, a Teacher generates data
according to a Gaussian random field, and a Student learns them via kernel
regression. With a simplifying assumption -- namely that the data are sampled
from a regular lattice -- we derive $\beta$ analytically for translation-invariant
kernels, using previous results from the kriging literature. Provided
that the Student is not too sensitive to high frequencies, $\beta$ depends only
on the smoothness and dimension of the training data. We confirm numerically
that these predictions hold when the training points are sampled at random on a
hypersphere. Overall, the test error is found to be controlled by the magnitude
of the projection of the true function on the kernel eigenvectors whose rank is
larger than $n$. Using this idea we relate the exponent $\beta$ to an
exponent describing how the coefficients of the true function in the
eigenbasis of the kernel decay with rank. We extract this decay exponent from real data by
performing kernel PCA, leading to predictions for $\beta$ for MNIST and
for CIFAR10 in good agreement with observations. We argue
that these rather large exponents are possible due to the small effective
dimension of the data.
Comment: We added (i) the prediction of the exponent $\beta$ for real data
using kernel PCA; (ii) the generalization of our results to non-Gaussian data
from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in
Kernel Regression and Wide Neural Networks").
A jamming transition from under- to over-parametrization affects loss landscape and generalization
We argue that in fully-connected networks a phase transition delimits the
over- and under-parametrized regimes where fitting can or cannot be achieved.
Under some general conditions, we show that this transition is sharp for the
hinge loss. In the whole over-parametrized regime, poor minima of the loss are
not encountered during training since the number of constraints to satisfy is
too small to hamper minimization. Our findings support a link between this
transition and the generalization properties of the network: as we increase the
number of parameters of a given model, starting from an under-parametrized
network, we observe that the generalization error displays three phases: (i)
initial decay, (ii) increase until the transition point --- where it displays a
cusp --- and (iii) slow decay toward a constant for the rest of the
over-parametrized regime. Thereby we identify the region where the classical
phenomenon of over-fitting takes place, and the region where the model keeps
improving, in line with previous empirical observations for modern neural
networks.
Comment: arXiv admin note: text overlap with arXiv:1809.0934
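The satisfiable/unsatisfiable transition for the hinge loss is easiest to see on the simplest model with this phenomenology: a linear classifier with random constraints. The sketch below is our toy illustration, not the networks studied in the paper; below capacity the hinge loss is driven to (near) zero, well above it a finite loss remains.

```python
import numpy as np

def final_hinge_loss(P, d, steps=5000, lr=0.1, seed=0):
    """Gradient descent on the mean hinge loss max(0, 1 - y * w.x) for a
    linear model with d parameters and P random constraints
    (random points with random labels)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((P, d)) / np.sqrt(d)  # rows of roughly unit norm
    y = rng.choice([-1.0, 1.0], size=P)
    w = np.zeros(d)
    for _ in range(steps):
        active = y * (X @ w) < 1.0                # unsatisfied constraints
        w += lr * (y[active] @ X[active]) / P     # descend the hinge loss
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

d = 50
print(final_hinge_loss(P=25, d=d))    # few constraints: loss driven near zero
print(final_hinge_loss(P=500, d=d))   # many constraints: a finite loss remains
```

Sweeping P/d and locating where the final loss first becomes nonzero gives a crude version of the jamming point discussed in the abstract.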
Disentangling feature and lazy training in deep neural networks
Two distinct limits for deep learning have been derived as the network width
, depending on how the weights of the last layer scale
with . In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear
in the weights and is described by a frozen kernel . By contrast, in
the Mean-Field limit, the dynamics can be expressed in terms of the
distribution of the parameters associated with a neuron, that follows a partial
differential equation. In this work we consider deep networks where the weights
in the last layer scale as at initialization. By varying
and , we probe the crossover between the two limits. We observe the
previously identified regimes of lazy training and feature training. In the
lazy-training regime, the dynamics is almost linear and the NTK barely changes
after initialization. The feature-training regime includes the mean-field
formulation as a limiting case and is characterized by a kernel that evolves in
time, and learns some features. We perform numerical experiments on MNIST,
Fashion-MNIST, EMNIST and CIFAR10 and consider various architectures. We find
that (i) The two regimes are separated by an that scales as
. (ii) Network architecture and data structure play an important role
in determining which regime is better: in our tests, fully-connected networks
perform generally better in the lazy-training regime, unlike convolutional
networks. (iii) In both regimes, the fluctuations induced on the
learned function by initial conditions decay as ,
leading to a performance that increases with . The same improvement can also
be obtained at an intermediate width by ensemble-averaging several networks.
(iv) In the feature-training regime we identify a time scale
, such that for the dynamics is linear.Comment: minor revision
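The lazy/feature distinction can be seen even in a tiny network. The sketch below is our own minimal illustration, not the paper's setup: a one-hidden-layer tanh network whose output is multiplied by a scale $\alpha$, trained by gradient descent with the learning rate rescaled by $1/\alpha^2$ and with the initial function subtracted (a standard device to isolate the lazy limit). The relative motion of the hidden weights shrinks as $\alpha$ grows.

```python
import numpy as np

def weight_movement(alpha, h=200, steps=2000, lr=0.1, seed=0):
    """One-hidden-layer tanh network f(x) = alpha * (g(x) - g_init(x)), with
    g(x) = sum_j a_j * tanh(b_j * x) / sqrt(h).  Trained by gradient descent
    on the square loss with learning rate lr / alpha**2; returns the relative
    change of the hidden weights b after training."""
    rng = np.random.default_rng(seed)
    x = np.array([-1.0, -0.3, 0.4, 1.0])   # a few 1-D training points
    y = np.array([0.5, -0.2, 0.3, -0.4])
    a = rng.standard_normal(h)
    b = rng.standard_normal(h)
    b0 = b.copy()
    g_init = np.tanh(np.outer(x, b)) @ a / np.sqrt(h)  # subtract f at init
    for _ in range(steps):
        phi = np.tanh(np.outer(x, b))                  # (n, h) activations
        err = alpha * (phi @ a / np.sqrt(h) - g_init) - y
        grad_a = alpha * phi.T @ err / np.sqrt(h)
        grad_b = alpha * a * ((1.0 - phi**2).T @ (err * x)) / np.sqrt(h)
        a -= (lr / alpha**2) * grad_a
        b -= (lr / alpha**2) * grad_b
    return np.linalg.norm(b - b0) / np.linalg.norm(b0)

print(weight_movement(alpha=1.0))    # feature regime: weights move noticeably
print(weight_movement(alpha=100.0))  # lazy regime: weights barely move
```

A fuller experiment would also track the empirical NTK before and after training; here the weight displacement serves as a cheap proxy for kernel movement.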
Avalanches in glassy systems
Many systems that are somehow characterized by a degree of disorder share a similar structure: the energy landscape has many sample-dependent local energy minima. When a small external perturbation is applied to the system at low temperature, it is reasonable to expect that the dynamics will lead the system from one minimum to another, thus displaying a random and jerky response. The discontinuous jumps that one observes are called avalanches, and the focus of this work is the computation of their distribution. One of the results is indeed the development of a framework that allows the computation of this distribution in infinite-dimensional systems that can be described within a replica symmetry breaking ansatz. We apply the results to one of the simplest models of structural glasses, namely dense packings of (harmonic) soft spheres, either at jamming or at larger densities, subject to a shear transformation that induces jumps both in the total energy and in the shear stress of the system. We argue that, when the shear strain is small enough, the avalanche distribution develops a power-law behavior, whose exponent can be directly related to the functional order parameter of the replica symmetry breaking solution. This exponent is also related to the distribution of contact forces (or at least of the contact forces between some of the spheres), whose asymptotic behavior is known not to depend strongly on the spatial dimension; for this reason, we compare the infinite-dimensional prediction with three-dimensional simulations of the same systems and, remarkably, we find good agreement. In the rest of the thesis we compare our results with previous works, and we also discuss some of the consequences of the avalanche distribution for the statistical elastic properties of dense granular media.
Mean-field avalanches in jammed spheres
23 pages, 9 figures. Disordered systems are characterized by the existence of many sample-dependent local energy minima, which cause a stepwise response when the system is perturbed. In this article we use an approach based on elementary probabilistic methods to compute the complete probability distribution of the jumps (static avalanches) in the response of mean-field systems described by replica symmetry breaking; we find a precise condition for having a power-law behavior in the distribution of avalanches caused by small perturbations, and we show that our predictions are in remarkable agreement both with previous results and with what is found in simulations of three-dimensional systems of soft spheres, either at jamming or at slightly higher densities.
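Given a sample of avalanche sizes, a power-law exponent of the kind discussed here is usually estimated by maximum likelihood. The sketch below uses synthetic data, not the article's systems: sizes are drawn from a pure power law by inverse-transform sampling and the exponent is recovered with the standard continuous-power-law estimator.

```python
import numpy as np

def sample_powerlaw(n, tau, s_min, rng):
    """Draw n avalanche sizes from p(s) ~ s**(-tau) for s >= s_min (tau > 1),
    by inverse-transform sampling of the cumulative distribution."""
    u = rng.random(n)
    return s_min * u ** (-1.0 / (tau - 1.0))

def mle_exponent(s, s_min):
    """Maximum-likelihood estimate of tau for a continuous power law."""
    return 1.0 + len(s) / np.sum(np.log(s / s_min))

rng = np.random.default_rng(0)
s = sample_powerlaw(100_000, tau=1.5, s_min=1.0, rng=rng)
print(mle_exponent(s, s_min=1.0))  # close to the true exponent 1.5
```

On real avalanche data one would additionally choose the cutoff s_min carefully, since the power law only holds in the small-perturbation regime described in the abstract.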
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
We analyze numerically the training dynamics of deep neural networks (DNN) by
using methods developed in statistical physics of glassy systems. The two main
issues we address are (1) the complexity of the loss landscape and of the
dynamics within it, and (2) to what extent DNNs share similarities with glassy
systems. Our findings, obtained for different architectures and datasets,
suggest that during the training process the dynamics slows down because of an
increasingly large number of flat directions. At large times, when the loss is
approaching zero, the system diffuses at the bottom of the landscape. Despite
some similarities with the dynamics of mean-field glassy systems, in
particular, the absence of barrier crossing, we find distinctive dynamical
behaviors in the two cases, showing that the statistical properties of the
corresponding loss and energy landscapes are different. In contrast, when the
network is under-parametrized we observe a typical glassy behavior, thus
suggesting the existence of different phases depending on whether the network
is under-parametrized or over-parametrized.
Comment: 10 pages, 5 figures. Version accepted at ICML 201