A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks
We develop a mathematically rigorous framework for multilayer neural networks
in the mean field regime. As the network's width increases, the network's
learning trajectory is shown to be well captured by a meaningful and
dynamically nonlinear limit (the \textit{mean field} limit), which is
characterized by a system of ODEs. Our framework applies to a broad range of
network architectures, learning dynamics and network initializations. Central
to the framework is the new idea of a \textit{neuronal embedding}, which
consists of a non-evolving probability space that allows one to embed neural
networks of arbitrary widths.
We demonstrate two applications of our framework. First, the framework gives
a principled way to study the simplifying effects that independent and
identically distributed initializations have on the mean field limit. Second,
we prove a global convergence guarantee for two-layer and three-layer networks.
Unlike previous works that rely on convexity, our result requires a certain
universal approximation property, which is a distinctive feature of
infinite-width neural networks. To the best of our knowledge, this is the first
time global convergence is established for neural networks of more than two
layers in the mean field regime.
Techniques of replica symmetry breaking and the storage problem of the McCulloch-Pitts neuron
In this article the framework for Parisi's spontaneous replica symmetry
breaking is reviewed, and subsequently applied to the example of the
statistical mechanical description of the storage properties of a
McCulloch-Pitts neuron. The technical details are reviewed extensively, with
regard to the wide range of systems where the method may be applied. Parisi's
partial differential equation and related differential equations are discussed,
and a Green function technique introduced for the calculation of replica
averages, the key to determining the averages of physical quantities. The
ensuing graph rules involve only tree graphs, as appropriate for a
mean-field-like model. The lowest order Ward-Takahashi identity is recovered
analytically and is shown to lead to the Goldstone modes in continuous replica
symmetry breaking phases. The need for a replica symmetry breaking theory in
the storage problem of the neuron has arisen due to the thermodynamical
instability of formerly given solutions. Variational forms for the neuron's
free energy are derived in terms of the order parameter function x(q), for
different prior distributions of synapses. Analytically in the high temperature
limit and numerically in generic cases various phases are identified, among
them one similar to the Parisi phase in the Sherrington-Kirkpatrick model.
Extensive quantities like the error per pattern change slightly with respect to
the known unstable solutions, but there is a significant difference in the
distribution of non-extensive quantities like the synaptic overlaps and the
pattern storage stability parameter. A simulation result is also reviewed and
compared to the prediction of the theory.
Comment: 103 LaTeX pages (with REVTeX 3.0), including 15 figures (ps, epsi, eepic), accepted for Physics Report
Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks
The optimization of multilayer neural networks typically leads to a solution
with zero training error, yet the landscape can exhibit spurious local minima
and the minima can be disconnected. In this paper, we shed light on this
phenomenon: we show that the combination of stochastic gradient descent (SGD)
and over-parameterization makes the landscape of multilayer neural networks
approximately connected and thus more favorable to optimization. More
specifically, we prove that SGD solutions are connected via a piecewise linear
path, and the increase in loss along this path vanishes as the number of
neurons grows large. This result is a consequence of the fact that the
parameters found by SGD are increasingly dropout stable as the network becomes
wider. We show that, if we remove part of the neurons (and suitably rescale the
remaining ones), the change in loss is independent of the total number of
neurons, and it depends only on how many neurons are left. Our results exhibit
a mild dependence on the input dimension: they are dimension-free for two-layer
networks and depend linearly on the dimension for multilayer networks. We
validate our theoretical findings with numerical experiments for different
architectures and classification tasks.
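The dropout-stability statement in the abstract above (remove part of the neurons, rescale the rest, compare the outputs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `two_layer` and `dropout_gap` are hypothetical, the network is a scalar-input two-layer model with mean-field (1/N) scaling, and the parameters are i.i.d. Gaussian draws standing in for SGD solutions, which the paper actually analyzes.

```python
import math
import random


def two_layer(x, a, w, keep=None):
    """Mean-field-scaled two-layer net restricted to the first `keep` neurons.

    Averaging over the kept neurons (instead of all of them) is the
    rescaling assumed here; the abstract only says "suitably rescale".
    """
    n = len(a) if keep is None else keep
    return sum(a[i] * math.tanh(w[i] * x) for i in range(n)) / n


def dropout_gap(n_total, n_keep, xs, seed=0):
    """Largest output change over inputs `xs` when only `n_keep` neurons remain."""
    rng = random.Random(seed)
    # Assumption: random parameters stand in for trained (SGD) parameters.
    a = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
    w = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
    return max(
        abs(two_layer(x, a, w) - two_layer(x, a, w, keep=n_keep)) for x in xs
    )


if __name__ == "__main__":
    inputs = [0.25, 0.5, 1.0, 2.0]
    for total in (1000, 4000):
        print(total, dropout_gap(total, 500, inputs))
```

The sketch measures a change in output rather than in loss, and it only shows the bookkeeping of the removal and rescaling step; it is not a reproduction of the paper's experiments.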
Mean Field Theory for Sigmoid Belief Networks
We develop a mean field theory for sigmoid belief networks based on ideas
from statistical mechanics. Our mean field theory provides a tractable
approximation to the true probability distribution in these networks; it also
yields a lower bound on the likelihood of evidence. We demonstrate the utility
of this framework on a benchmark problem in statistical pattern
recognition---the classification of handwritten digits.
Comment: See http://www.jair.org/ for any accompanying file
Neural Networks retrieving Boolean patterns in a sea of Gaussian ones
Restricted Boltzmann Machines are key tools in Machine Learning and are
described by the energy function of bipartite spin-glasses. From a statistical
mechanical perspective, they share the same Gibbs measure as Hopfield networks
for associative memory. In this equivalence, weights in the former play the role
of patterns in the latter. As Boltzmann machines usually require real weights to
be trained with gradient-descent-like methods, while Hopfield networks
typically store binary patterns in order to retrieve them, the investigation of a
mixed Hebbian network, equipped with both real (e.g., Gaussian) and discrete
(e.g., Boolean) patterns naturally arises. We prove that, in the challenging
regime of a high storage of real patterns, where retrieval is forbidden, an
extra load of Boolean patterns can still be retrieved, as long as the ratio
between the overall load and the network size does not exceed a critical
threshold, which turns out to be the same as that of the standard
Amit-Gutfreund-Sompolinsky theory. Assuming replica symmetry, we study the case
of a low load of Boolean patterns combining the stochastic stability and
Hamilton-Jacobi interpolating techniques. The result can be extended to the
high load by a non-rigorous but standard replica computation argument.
Comment: 16 pages, 1 figure
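For readers unfamiliar with Hebbian retrieval, the following minimal sketch shows the standard low-load case with Boolean (±1) patterns only. It does not implement the mixed Gaussian/Boolean network studied in the abstract above. The function name `hebbian_retrieve`, the network size, the number of flipped spins, and the zero-temperature synchronous update rule are all assumptions chosen for illustration.

```python
import random


def hebbian_retrieve(n=200, p=3, flips=20, steps=5, seed=0):
    """Store p random Boolean patterns with the Hebb rule, then try to
    recover pattern 0 from a copy with `flips` spins inverted.

    Returns the final overlap with pattern 0 (1.0 means perfect retrieval).
    """
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]

    # Corrupted cue: pattern 0 with a random subset of spins flipped.
    state = patterns[0][:]
    for i in rng.sample(range(n), flips):
        state[i] = -state[i]

    for _ in range(steps):
        # Overlap of the current state with every stored pattern.
        overlaps = [
            sum(xi[i] * state[i] for i in range(n)) / n for xi in patterns
        ]
        # Hebbian local field h_i = sum_mu xi_i^mu * m_mu
        # (self-coupling is kept for simplicity), zero-temperature update.
        state = [
            1 if sum(xi[i] * m for xi, m in zip(patterns, overlaps)) >= 0 else -1
            for i in range(n)
        ]

    return sum(patterns[0][i] * state[i] for i in range(n)) / n


if __name__ == "__main__":
    print(hebbian_retrieve())
```

Writing the field in terms of pattern overlaps avoids building the full n-by-n coupling matrix, which keeps the sketch short. Since the dynamics are deterministic, there is no thermal noise and no replica-symmetric analysis here.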