75 research outputs found
Learning Interacting Theories from Data
One challenge of physics is to explain how collective properties arise from
microscopic interactions. Indeed, interactions form the building blocks of
almost all physical theories and are described by polynomial terms in the
action. The traditional approach is to derive these terms from elementary
processes and then use the resulting model to make predictions for the entire
system. But what if the underlying processes are unknown? Can we reverse the
approach and learn the microscopic action by observing the entire system? We
use invertible neural networks (INNs) to first learn the observed data
distribution. By the choice of a suitable nonlinearity for the neuronal
activation function, we are then able to compute the action from the weights of
the trained model; a diagrammatic language expresses the change of the action
from layer to layer. This process uncovers how the network hierarchically
constructs interactions via nonlinear transformations of pairwise relations. We
test this approach on simulated data sets of interacting theories. The network
consistently reproduces a broad class of unimodal distributions; outside this
class, it finds effective theories that approximate the data statistics up to
the third cumulant. We explicitly show how network depth and data quantity
jointly improve the agreement between the learned and the true model. This work
shows how to leverage the power of machine learning to transparently extract
microscopic models from data.
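The abstract describes learning the data distribution with an invertible neural network whose weights can later be read out. As an illustration only, a minimal numpy sketch of one common invertible building block, an affine coupling layer, is given below; the paper's actual architecture, activation nonlinearity, and training procedure are not specified here, so every name and choice in this snippet is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """Affine coupling layer (one generic invertible transform, not
    necessarily the one used in the paper): split x into (x1, x2) and map
    x2 -> x2 * exp(s(x1)) + t(x1), leaving x1 unchanged.
    Invertible by construction; log|det J| = sum(s(x1))."""

    def __init__(self, dim, hidden=16):
        half = dim // 2
        self.W1 = rng.normal(0.0, 0.1, (half, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2 * half))

    def _net(self, x1):
        h = np.tanh(x1 @ self.W1)        # smooth nonlinearity (assumed)
        s, t = np.split(h @ self.W2, 2, axis=-1)
        return np.tanh(s), t             # bounded scale for stability

    def forward(self, x):
        x1, x2 = np.split(x, 2, axis=-1)
        s, t = self._net(x1)
        y2 = x2 * np.exp(s) + t
        return np.concatenate([x1, y2], axis=-1), s.sum(axis=-1)

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        s, t = self._net(y1)
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2], axis=-1)

layer = AffineCoupling(dim=4)
x = rng.normal(size=(5, 4))
y, logdet = layer.forward(x)
x_rec = layer.inverse(y)
print(np.allclose(x, x_rec))  # True: the map is exactly invertible
```

Because the forward map is invertible with a tractable Jacobian, the learned density (and, in the paper's setting, the action) can be expressed analytically in terms of the layer weights.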
New perspectives in statistical mechanics and high-dimensional inference
The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analyses in spin glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements and the finite-rank regime. The first has been present since the very birth of spin glasses; the second concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin glasses, such as the interpolation scheme, and from statistical physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean-field spin glass where the couplings are i.i.d. Gaussian random variables. The second amounts to establishing the information-theoretic limits for the reconstruction of a fixed low-rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with an inhomogeneous variance profile. In spin glasses this leads to multi-species models; the inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4, instead, we study the spiked Wigner model where the prior on the signal to be reconstructed is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled: here we introduce a new procedure called "decimation" and show that it is theoretically possible to perform matrix factorization through it.
Unsupervised hierarchical clustering using the learning dynamics of RBMs
Datasets in the real world are often complex and to some degree hierarchical,
with groups and sub-groups of data sharing common characteristics at different
levels of abstraction. Understanding and uncovering the hidden structure of
these datasets is an important task that has many practical applications. To
address this challenge, we present a new and general method for building
relational data trees by exploiting the learning dynamics of the Restricted
Boltzmann Machine (RBM). Our method is based on the mean-field approach,
derived from the Plefka expansion, and developed in the context of disordered
systems. It is designed to be easily interpretable. We tested our method on an artificially created hierarchical dataset and on three different real-world
datasets (images of digits, mutations in the human genome, and a homologous
family of proteins). The method is able to automatically identify the
hierarchical structure of the data. This could be useful in the study of
homologous protein sequences, where the relationships between proteins are
critical for understanding their function and evolution.
Comment: Version accepted in Physical Review
Explaining the effects of non-convergent sampling in the training of Energy-Based Models
In this paper, we quantify the impact of using non-convergent Markov chains
to train Energy-Based models (EBMs). In particular, we show analytically that
EBMs trained with non-persistent short runs to estimate the gradient can
perfectly reproduce a set of empirical statistics of the data, not at the level
of the equilibrium measure, but through a precise dynamical process. Our
results provide a first-principles explanation for the observations of recent
works proposing the strategy of using short runs starting from random initial
conditions as an efficient way to generate high-quality samples in EBMs, and
lay the groundwork for using EBMs as diffusion models. After explaining this
effect in generic EBMs, we analyze two solvable models in which the effect of
the non-convergent sampling in the trained parameters can be described in
detail. Finally, we test these predictions numerically on the Boltzmann
machine.
Comment: 13 pages, 3 figures
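The effect the abstract describes, short non-persistent chains from random initial conditions that reproduce the data statistics dynamically rather than at equilibrium, can be illustrated on a deliberately tiny toy model. The following numpy sketch (a 1-D Gaussian EBM with short-run Langevin sampling; all parameter choices are assumptions of this sketch, not details from the paper) trains the energy so that the *short-run* samples match the data, even though the trained model's equilibrium measure does not:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy EBM with energy E_theta(x) = 0.5 * theta * x^2 (a 1-D Gaussian model).
# Maximum-likelihood gradient: <dE/dtheta>_model - <dE/dtheta>_data, with the
# model average estimated by SHORT, non-convergent Langevin runs started from
# random initial conditions (no persistent chains).

data = rng.normal(0.0, 0.5, size=10000)        # target std 0.5

def short_run_samples(theta, n=5000, k=5, eps=0.05):
    """k Langevin steps from a random start -- deliberately far from mixing."""
    x = rng.normal(0.0, 1.0, size=n)
    for _ in range(k):
        x += -eps * theta * x + np.sqrt(2.0 * eps) * rng.normal(size=n)
    return x

theta, lr = 1.0, 5.0
for _ in range(300):
    xm = short_run_samples(theta)
    grad = 0.5 * np.mean(xm**2) - 0.5 * np.mean(data**2)  # dE/dtheta = x^2/2
    theta += lr * grad

# The short-run sampler reproduces the data statistics...
print(np.std(short_run_samples(theta)))   # ~0.5, matching the data
# ...while the equilibrium measure of the trained EBM does not:
print(1.0 / np.sqrt(theta))               # noticeably different from 0.5
```

The mismatch between the two printed values is the paper's point in miniature: training with non-convergent chains bakes the sampling protocol into the learned parameters, so data statistics are matched through the dynamics rather than by the equilibrium distribution.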
Dense Hebbian neural networks: a replica symmetric picture of supervised learning
We consider dense associative neural networks trained by a teacher (i.e., with supervision) and we investigate their computational capabilities
analytically, via statistical-mechanics of spin glasses, and numerically, via
Monte Carlo simulations. In particular, we obtain a phase diagram summarizing
their performance as a function of the control parameters such as quality and
quantity of the training dataset, network storage and noise, that is valid in
the limit of large network size and structureless datasets: these networks may work in an ultra-storage regime (where they can handle a huge number of patterns compared with shallow neural networks) or in an ultra-detection regime (where they can perform pattern recognition at prohibitive signal-to-noise ratios compared with shallow neural networks). Guided by this random-dataset theory as a reference framework, we also numerically test the learning, storing, and retrieval capabilities shown by these networks on structured datasets such as MNIST and Fashion-MNIST. As technical remarks, from the analytic
side, we implement large-deviation and stability analyses within Guerra's interpolation to tackle the non-Gaussian distributions involved in the post-synaptic potentials, while, on the computational side, we insert the Plefka approximation in the Monte Carlo scheme to speed up the evaluation of the synaptic tensors, overall obtaining a novel and broad approach to investigate supervised learning in neural networks, beyond the shallow limit, in general.
Comment: arXiv admin note: text overlap with arXiv:2211.1406
Recent Applications of Dynamical Mean-Field Methods
Rich out-of-equilibrium collective dynamics of strongly interacting large assemblies emerge in many areas of science. Some intriguing and not fully understood examples are the glassy arrest in atomic, molecular, or colloidal systems, flocking in natural or artificial active matter, and the organization and subsistence of ecosystems. The learning process, and ensuing amazing performance, of deep neural networks bears resemblance to some of the aforementioned examples. Quantum-mechanical extensions are also of interest. In an exact or approximate manner, the evolution of these systems can be expressed in terms of a dynamical mean-field theory which not only captures many of their peculiar effects but also has predictive power. This short review presents a summary of recent developments of this approach, with emphasis on applications to the examples mentioned above.
Comment: 39 pages, 6 figures, Annual Review of Condensed Matter Physics (to appear)
Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics
In this study, we address the challenge of using energy-based models to
produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA, or protein sequence data. Traditional training
methods encounter difficulties due to inefficient Markov chain Monte Carlo
mixing, which affects the diversity of synthetic data and increases generation
times. To address these issues, we use a novel training algorithm that exploits
non-equilibrium effects. This approach, applied to the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and
generate high-quality synthetic data in only a few sampling steps. The
effectiveness of this method is demonstrated by its successful application to
four different types of data: handwritten digits, mutations of human genomes
classified by continental origin, functionally characterized sequences of an
enzyme protein family, and homologous RNA sequences from specific taxonomies.
Comment: 15 pages
Unsupervised inference methods for protein sequence data
The abstract is in the attachment.
Balanced Training of Energy-Based Models with Adaptive Flow Sampling
Energy-based models (EBMs) are versatile density-estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant, making the model likelihood computationally intractable. Several approximate samplers and
variational inference techniques have been proposed to estimate the likelihood
gradients for training. These techniques have shown promising results in
generating samples, but little attention has been paid to the statistical
accuracy of the estimated density, such as determining the relative importance
of different classes in a dataset. In this work, we propose a new maximum
likelihood training algorithm for EBMs that uses a different type of generative
model, normalizing flows (NF), which have recently been proposed to facilitate
sampling. Our method fits an NF to an EBM during training so that an
NF-assisted sampling scheme provides an accurate gradient for the EBMs at all
times, ultimately leading to a fast sampler for generating new data.
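The idea of fitting an auxiliary generative model to the EBM during training can be caricatured in one dimension. The sketch below is an illustrative toy, not the paper's method: the "flow" is a single affine map, its reverse-KL gradient is computed analytically, and the flow samples enter the EBM's likelihood gradient through self-normalized importance weights; all of these simplifications are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D flow-assisted EBM training.
# EBM: unnormalized log-density -E_theta(x) with E_theta(x) = 0.5*theta*x^2.
# "Flow": affine map x = sigma * z, z ~ N(0,1), i.e. q(x) = N(0, sigma^2).

data = rng.normal(0.0, 0.7, size=20000)

theta, sigma, lr = 1.0, 1.0, 0.1
for _ in range(500):
    # 1) Fit the flow to the current EBM by a few steps on the reverse KL
    #    KL(q || p_theta); for this Gaussian pair the analytic gradient is
    #    d/dsigma KL = sigma*theta - 1/sigma  (optimum: sigma^2 = 1/theta).
    for _ in range(5):
        sigma -= 0.05 * (sigma * theta - 1.0 / sigma)
    # 2) Use flow samples, importance-weighted toward the EBM, to estimate
    #    the model average in the likelihood gradient of theta.
    z = rng.normal(size=2000)
    x = sigma * z
    logw = -0.5 * theta * x**2 - (-0.5 * (x / sigma) ** 2 - np.log(sigma))
    w = np.exp(logw - logw.max())
    w /= w.sum()                      # self-normalized importance weights
    model_avg = np.sum(w * 0.5 * x**2)            # <dE/dtheta>_model
    theta += lr * (model_avg - 0.5 * np.mean(data**2))

print(sigma)          # learned model std, close to the data std 0.7
```

Because the flow keeps tracking the EBM, the importance weights stay well-behaved and the gradient estimate stays accurate throughout training, which is the mechanism the abstract appeals to.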
Tradeoff of generalization error in unsupervised learning
Finding the optimal model complexity that minimizes the generalization error
(GE) is a key issue of machine learning. For the conventional supervised
learning, this task typically involves the bias-variance tradeoff: lowering the
bias by making the model more complex entails an increase in the variance.
Meanwhile, little is known about whether the same tradeoff exists for unsupervised learning. In this study, we propose that unsupervised learning
generally exhibits a two-component tradeoff of the GE, namely the model error
and the data error -- using a more complex model reduces the model error at the
cost of the data error, with the data error playing a more significant role for
a smaller training dataset. This is corroborated by training the restricted
Boltzmann machine to generate the configurations of the two-dimensional Ising
model at a given temperature and the totally asymmetric simple exclusion
process with given entry and exit rates. Our results also indicate that the
optimal model tends to be more complex when the data to be learned are more
complex.
Comment: 15 pages, 7 figures