To go deep or wide in learning?
To achieve acceptable performance for AI tasks, one can either use
sophisticated feature extraction methods as the first layer in a two-layered
supervised learning model, or learn the features directly using a deep
(multi-layered) model. While the first approach is very problem-specific, the
second approach incurs computational overheads in learning multiple layers and
fine-tuning the model. In this paper, we propose an approach called wide
learning based on arc-cosine kernels, that learns a single layer of infinite
width. We propose exact and inexact learning strategies for wide learning and
show that wide learning with a single layer outperforms both single-layer and
deep architectures of finite width on several benchmark datasets.
Comment: 9 pages, 1 figure. Accepted for publication in the Seventeenth International Conference on Artificial Intelligence and Statistics.
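The arc-cosine kernels underlying this abstract have a closed form (Cho & Saul): the degree-1 kernel corresponds to an infinitely wide single hidden layer of ReLU-like units. A minimal NumPy sketch (function name and interface are ours, not the paper's):

```python
import numpy as np

def arc_cosine_kernel(X, Y):
    """Degree-1 arc-cosine kernel.

    k(x, y) = (1/pi) * ||x|| * ||y|| * (sin(t) + (pi - t) * cos(t)),
    where t is the angle between x and y. Equivalent to the inner
    product of an infinitely wide random-ReLU feature map.
    Assumes rows of X and Y are nonzero.
    """
    nx = np.linalg.norm(X, axis=1, keepdims=True)          # shape (n, 1)
    ny = np.linalg.norm(Y, axis=1, keepdims=True)          # shape (m, 1)
    cos_t = np.clip((X @ Y.T) / (nx * ny.T), -1.0, 1.0)    # guard rounding
    t = np.arccos(cos_t)
    return (nx * ny.T) / np.pi * (np.sin(t) + (np.pi - t) * np.cos(t))
```

The resulting Gram matrix can be passed to any kernel method that accepts a precomputed kernel (e.g. a kernel SVM), which is one way to realize the "single layer of infinite width" idea without explicitly training features.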
Free energies of Boltzmann Machines: self-averaging, annealed and replica symmetric approximations in the thermodynamic limit
Restricted Boltzmann machines (RBMs) are among the main models for statistical
inference in machine learning, and they are widely employed in artificial
intelligence as powerful tools for (deep) learning. However, in contrast with
countless remarkable practical successes, their mathematical formalization has
been largely elusive: from a statistical-mechanics perspective these systems
display the same (random) Gibbs measure as bi-partite spin-glasses, whose
rigorous treatment is notoriously difficult. In this work, beyond providing a
brief review on RBMs from both the learning and the retrieval perspectives, we
aim to contribute to their analytical investigation, by considering two
distinct realizations of their weights (i.e., Boolean and Gaussian) and
studying the properties of their related free energies. More precisely,
focusing on an RBM characterized by digital couplings, we first extend the
Pastur-Shcherbina-Tirozzi method (originally developed for the Hopfield model)
to prove the self-averaging property for the free energy, over its quenched
expectation, in the infinite volume limit, then we explicitly calculate its
simplest approximation, namely its annealed bound. Next, focusing on an RBM
characterized by analogical weights, we extend Guerra's interpolating scheme to
obtain a control of the quenched free-energy under the assumption of replica
symmetry: we obtain self-consistency equations for the order parameters (in full
agreement with the existing literature) as well as the critical line for
ergodicity breaking, which turns out to coincide with that obtained in AGS
theory. As we discuss,
this analogy stems from the slow-noise universality. Finally, glancing beyond
replica symmetry, we analyze the fluctuations of the overlaps for an estimate
of the (slow) noise affecting the retrieval of the signal, and by a stability
analysis we recover the Aizenman-Contucci identities typical of glassy systems.
Comment: 21 pages, 1 figure.
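The annealed bound mentioned in the abstract follows from Jensen's inequality applied to the concave logarithm; a minimal sketch in standard spin-glass notation (the symbols below are ours, not the paper's): writing the pressure as the quenched average of the log partition function over the weight disorder,

\[
A_N(\beta) \;=\; \frac{1}{N}\,\mathbb{E}\ln Z_N(\beta)
\;\le\;
\frac{1}{N}\ln \mathbb{E}\,Z_N(\beta)
\;=\;
A_N^{\mathrm{ann}}(\beta),
\]

so the annealed pressure is an upper bound on the quenched one, and the annealed approximation amounts to evaluating the right-hand side explicitly, which is tractable because the average over the disorder is taken inside the logarithm.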
Boosting Monte Carlo simulations of spin glasses using autoregressive neural networks
Autoregressive neural networks are emerging as a powerful computational
tool to solve relevant problems in classical and quantum mechanics. One of
their appealing functionalities is that, after they have learned a probability
distribution from a dataset, they allow exact and efficient sampling of typical
system configurations. Here we employ a neural autoregressive distribution
estimator (NADE) to boost Markov chain Monte Carlo (MCMC) simulations of a
paradigmatic classical model of spin-glass theory, namely the two-dimensional
Edwards-Anderson Hamiltonian. We show that a NADE can be trained to accurately
mimic the Boltzmann distribution using unsupervised learning from system
configurations generated using standard MCMC algorithms. The trained NADE is
then employed as a smart proposal distribution for the Metropolis-Hastings
algorithm. This allows us to perform efficient MCMC simulations, which provide
unbiased results even if the expectation value corresponding to the probability
distribution learned by the NADE is not exact. Notably, we implement a
sequential tempering procedure, whereby a NADE trained at a higher temperature
is iteratively employed as proposal distribution in a MCMC simulation run at a
slightly lower temperature. This allows one to efficiently simulate the
spin-glass model even in the low-temperature regime, avoiding the divergent
correlation times that plague MCMC simulations driven by local-update
algorithms. Furthermore, we show that the NADE-driven simulations quickly
sample ground-state configurations, paving the way to their future utilization
to tackle binary optimization problems.
Comment: 13 pages, 14 figures.
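The mechanism that keeps the NADE-driven chain unbiased is the standard Metropolis-Hastings acceptance ratio for a global (independence) proposal: any mismatch between the learned proposal q and the target Boltzmann distribution p is corrected at the accept/reject step. A minimal sketch with the NADE replaced by generic callables (all names here are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_independence_step(state, log_p, sample_q, log_q):
    """One Metropolis-Hastings step with a global (independence) proposal.

    sample_q / log_q stand in for the trained autoregressive model.
    The acceptance ratio p(x') q(x) / (p(x) q(x')) corrects for any
    mismatch between q and the target p, so the stationary
    distribution is exactly p even if q is only approximate.
    """
    proposal = sample_q()
    log_alpha = (log_p(proposal) + log_q(state)
                 - log_p(state) - log_q(proposal))
    if np.log(rng.random()) < log_alpha:
        return proposal, True   # move accepted
    return state, False         # move rejected, keep old state
```

In the sequential-tempering setting described above, `sample_q`/`log_q` would come from a model trained on configurations at a slightly higher temperature, while `log_p` is the Boltzmann log-weight at the current, lower temperature.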