75 research outputs found

    Learning Interacting Theories from Data

    One challenge of physics is to explain how collective properties arise from microscopic interactions. Indeed, interactions form the building blocks of almost all physical theories and are described by polynomial terms in the action. The traditional approach is to derive these terms from elementary processes and then use the resulting model to make predictions for the entire system. But what if the underlying processes are unknown? Can we reverse the approach and learn the microscopic action by observing the entire system? We use invertible neural networks (INNs) to first learn the observed data distribution. By the choice of a suitable nonlinearity for the neuronal activation function, we are then able to compute the action from the weights of the trained model; a diagrammatic language expresses the change of the action from layer to layer. This process uncovers how the network hierarchically constructs interactions via nonlinear transformations of pairwise relations. We test this approach on simulated data sets of interacting theories. The network consistently reproduces a broad class of unimodal distributions; outside this class, it finds effective theories that approximate the data statistics up to the third cumulant. We explicitly show how network depth and data quantity jointly improve the agreement between the learned and the true model. This work shows how to leverage the power of machine learning to transparently extract microscopic models from data.
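The abstract above measures agreement "up to the third cumulant". As a minimal sketch of what that statistic involves for a single variable, the snippet below estimates the first three cumulants (mean, variance, third central moment) from samples. The function name `first_three_cumulants` is hypothetical and is not taken from the paper; the paper's own estimator and its multivariate treatment may differ.

```python
def first_three_cumulants(samples):
    """Return (k1, k2, k3) for a list of scalar samples.

    k1 is the mean, k2 the population variance, and k3 the third
    central moment; for a single variable these coincide with the
    first three cumulants.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    k2 = sum(c ** 2 for c in centered) / n  # second cumulant (variance)
    k3 = sum(c ** 3 for c in centered) / n  # third cumulant (unnormalized skew)
    return mean, k2, k3


if __name__ == "__main__":
    # Compare a "true" and a "learned" sample set cumulant by cumulant.
    true_samples = [0.0, 0.0, 3.0]
    learned_samples = [0.1, -0.1, 3.0]
    print(first_three_cumulants(true_samples))
    print(first_three_cumulants(learned_samples))
```

A symmetric data set has a vanishing third cumulant, so a nonzero value is the lowest-order signature of asymmetry in the distribution.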

    New perspectives in statistical mechanics and high-dimensional inference

    The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements and the finite rank regime. The first one has been present since the early days of spin-glasses. The second one instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establishing the information theoretical limits in the reconstruction of a fixed low rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4 instead we study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically possible to perform matrix factorization through it.

    Unsupervised hierarchical clustering using the learning dynamics of RBMs

    Datasets in the real world are often complex and to some degree hierarchical, with groups and sub-groups of data sharing common characteristics at different levels of abstraction. Understanding and uncovering the hidden structure of these datasets is an important task that has many practical applications. To address this challenge, we present a new and general method for building relational data trees by exploiting the learning dynamics of the Restricted Boltzmann Machine (RBM). Our method is based on the mean-field approach, derived from the Plefka expansion, and developed in the context of disordered systems. It is designed to be easily interpretable. We tested our method on an artificially created hierarchical dataset and on three different real-world datasets (images of digits, mutations in the human genome, and a homologous family of proteins). The method is able to automatically identify the hierarchical structure of the data. This could be useful in the study of homologous protein sequences, where the relationships between proteins are critical for understanding their function and evolution. Comment: Version accepted in Physical Review

    Explaining the effects of non-convergent sampling in the training of Energy-Based Models

    In this paper, we quantify the impact of using non-convergent Markov chains to train Energy-Based models (EBMs). In particular, we show analytically that EBMs trained with non-persistent short runs to estimate the gradient can perfectly reproduce a set of empirical statistics of the data, not at the level of the equilibrium measure, but through a precise dynamical process. Our results provide a first-principles explanation for the observations of recent works proposing the strategy of using short runs starting from random initial conditions as an efficient way to generate high-quality samples in EBMs, and lay the groundwork for using EBMs as diffusion models. After explaining this effect in generic EBMs, we analyze two solvable models in which the effect of the non-convergent sampling in the trained parameters can be described in detail. Finally, we test these predictions numerically on the Boltzmann machine. Comment: 13 pages, 3 figures

    Dense Hebbian neural networks: a replica symmetric picture of supervised learning

    We consider dense, associative neural networks trained by a teacher (i.e., with supervision) and we investigate their computational capabilities analytically, via the statistical mechanics of spin glasses, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of control parameters such as the quality and quantity of the training dataset, network storage and noise, which is valid in the limit of large network size and structureless datasets: these networks may work in an ultra-storage regime (where they can handle a huge number of patterns, compared with shallow neural networks) or in an ultra-detection regime (where they can perform pattern recognition at prohibitive signal-to-noise ratios, compared with shallow neural networks). Guided by the random theory as a reference framework, we also numerically test the learning, storing and retrieval capabilities shown by these networks on structured datasets such as MNIST and Fashion MNIST. As technical remarks, on the analytic side, we implement large deviations and stability analysis within Guerra's interpolation to tackle the non-Gaussian distributions involved in the post-synaptic potentials while, on the computational side, we insert the Plefka approximation in the Monte Carlo scheme to speed up the evaluation of the synaptic tensors, overall obtaining a novel and broad approach to investigate supervised learning in neural networks beyond the shallow limit. Comment: arXiv admin note: text overlap with arXiv:2211.1406

    Recent Applications of Dynamical Mean-Field Methods

    Rich out-of-equilibrium collective dynamics of strongly interacting large assemblies emerge in many areas of science. Some intriguing and not fully understood examples are the glassy arrest in atomic, molecular or colloidal systems, flocking in natural or artificial active matter, and the organization and subsistence of ecosystems. The learning process, and ensuing amazing performance, of deep neural networks bears resemblance to some of the aforementioned examples. Quantum mechanical extensions are also of interest. In an exact or approximate manner, the evolution of these systems can be expressed in terms of a dynamical mean-field theory which not only captures many of their peculiar effects but also has predictive power. This short review presents a summary of recent developments of this approach with emphasis on applications to the examples mentioned above. Comment: 39p, 6 figs, Annual Review of Condensed Matter Physics (to appear)

    Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics

    In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA or protein sequence data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects. This approach, applied to the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and generate high-quality synthetic data in only a few sampling steps. The effectiveness of this method is demonstrated by its successful application to four different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, and homologous RNA sequences from specific taxonomies. Comment: 15 pages

    Unsupervised inference methods for protein sequence data

    The abstract is in the attachment

    Balanced Training of Energy-Based Models with Adaptive Flow Sampling

    Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data.
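To make the "unnormalized log density" and the missing normalization constant concrete, here is a minimal sketch on a toy model that is not taken from the paper: an open chain of ±1 spins with a hypothetical nearest-neighbour energy and an assumed coupling of 0.5. For a handful of spins the log normalization constant can be found by brute-force enumeration; the sum has 2^n terms, which is why this is infeasible for realistic models.

```python
import itertools
import math

COUPLING = 0.5  # assumed toy parameter, chosen only for illustration


def energy(state):
    """Hypothetical energy: nearest-neighbour couplings on an open chain."""
    return -COUPLING * sum(a * b for a, b in zip(state, state[1:]))


def unnormalized_log_density(state):
    """An EBM parameterizes log p(x) only up to an additive constant."""
    return -energy(state)


def log_partition(n_spins):
    """Brute-force log normalization constant over all 2**n_spins states."""
    log_terms = [
        unnormalized_log_density(s)
        for s in itertools.product((-1, 1), repeat=n_spins)
    ]
    shift = max(log_terms)  # log-sum-exp shift for numerical stability
    return shift + math.log(sum(math.exp(t - shift) for t in log_terms))


def probability(state):
    """Normalized probability, available here only because the toy is tiny."""
    return math.exp(unnormalized_log_density(state) - log_partition(len(state)))


if __name__ == "__main__":
    print(probability((1, 1, 1, 1)), probability((1, -1, 1, -1)))
```

Each added spin doubles the number of terms in `log_partition`, so training methods of the kind the abstract describes rely on sampling rather than on enumeration.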

    Tradeoff of generalization error in unsupervised learning

    Finding the optimal model complexity that minimizes the generalization error (GE) is a key issue in machine learning. For conventional supervised learning, this task typically involves the bias-variance tradeoff: lowering the bias by making the model more complex entails an increase in the variance. Meanwhile, little is known about whether the same tradeoff exists for unsupervised learning. In this study, we propose that unsupervised learning generally exhibits a two-component tradeoff of the GE, namely the model error and the data error -- using a more complex model reduces the model error at the cost of the data error, with the data error playing a more significant role for a smaller training dataset. This is corroborated by training the restricted Boltzmann machine to generate the configurations of the two-dimensional Ising model at a given temperature and the totally asymmetric simple exclusion process with given entry and exit rates. Our results also indicate that the optimal model tends to be more complex when the data to be learned are more complex. Comment: 15 pages, 7 figures