Deep Residual Learning via Large Sample Mean-Field Stochastic Optimization
We study a class of stochastic optimization problems of the mean-field type
arising in the optimal training of a deep residual neural network. We consider
the sampling problem arising from a continuous layer idealization, and
establish the existence of optimal relaxed controls when the training set has
finite size. The core of our paper is to prove the Gamma-convergence of the
sequence of sampled objective functionals, i.e., show that as the size of the
training set grows large, the minimizer of the sampled relaxed problem
converges to that of the limiting optimization problem. We connect the limit of
the large sampled objective functional to the unique solution, in the
trajectory sense, of a nonlinear Fokker-Planck-Kolmogorov (FPK) equation in a
random environment. We construct an example to show that, under mild
assumptions, the optimal network weights can be numerically computed by solving
a second-order differential equation with Neumann boundary conditions in the
sense of distributions.
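For intuition on the continuous-layer idealization: a residual block x_{l+1} = x_l + f(x_l, θ_l)/L is exactly the explicit Euler step of the ODE dX_t = f(X_t, θ_t) dt, the deterministic skeleton of the controlled dynamics studied here. The sketch below makes that correspondence concrete; the tanh residual map and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def residual_forward(x, weights, dt):
    """Forward pass of a deep residual network, read as an Euler
    discretization of the continuous-layer ODE dX_t = f(X_t, theta_t) dt.

    x       : input state, shape (d,)
    weights : list of (W, b) pairs, one per layer (one per time step)
    dt      : step size = 1 / number of layers
    """
    for W, b in weights:
        # residual update x_{l+1} = x_l + dt * f(x_l, theta_l)
        x = x + dt * np.tanh(W @ x + b)
    return x

# Example: 50 layers approximating the ODE on the time interval [0, 1]
rng = np.random.default_rng(0)
d, L = 4, 50
weights = [(rng.normal(size=(d, d)) / np.sqrt(d), np.zeros(d)) for _ in range(L)]
print(residual_forward(rng.normal(size=d), weights, dt=1.0 / L))
```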
Mean Field Analysis of Neural Networks: A Law of Large Numbers
Machine learning, and in particular neural network models, has
revolutionized fields such as image, text, and speech recognition. Today, many
important real-world applications in these areas are driven by neural networks.
There are also growing applications in engineering, robotics, medicine, and
finance. Despite their immense success in practice, there is limited
mathematical understanding of neural networks. This paper illustrates how
neural networks can be studied via stochastic analysis, and develops approaches
for addressing some of the technical challenges which arise. We analyze
one-layer neural networks in the asymptotic regime of simultaneously (A) large
network sizes and (B) large numbers of stochastic gradient descent training
iterations. We rigorously prove that the empirical distribution of the neural
network parameters converges to the solution of a nonlinear partial
differential equation. This result can be considered a law of large numbers for
neural networks. In addition, a consequence of our analysis is that the trained
parameters of the neural network asymptotically become independent, a property
which is commonly called "propagation of chaos".
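As background for the scaling regime, here is a hedged sketch of the standard mean-field objects; the notation is generic and not necessarily the paper's. The one-layer network carries a 1/N factor, and the law of large numbers concerns the empirical measure of its parameters:

```latex
% One-hidden-layer network with mean-field 1/N scaling, after k SGD steps,
% and the empirical measure of its N parameter pairs:
g^N_k(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} c^i_k \,\sigma\!\left(w^i_k \cdot x\right),
\qquad
\mu^N_k \;=\; \frac{1}{N}\sum_{i=1}^{N} \delta_{\left(c^i_k,\; w^i_k\right)} .
% The law of large numbers asserts that, as N and the number of SGD
% iterations grow together, \mu^N converges to a deterministic limit that
% solves a nonlinear partial differential equation.
```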
A Broad Class of Discrete-Time Hypercomplex-Valued Hopfield Neural Networks
In this paper, we address the stability of a broad class of discrete-time
hypercomplex-valued Hopfield-type neural networks. To ensure the neural
networks belonging to this class always settle down at a stationary state, we
introduce novel hypercomplex number systems referred to as real-part
associative hypercomplex number systems. Real-part associative hypercomplex
number systems generalize the well-known Cayley-Dickson algebras and real
Clifford algebras and include the systems of real numbers, complex numbers,
dual numbers, hyperbolic numbers, quaternions, tessarines, and octonions as
particular instances. Apart from the novel hypercomplex number systems, we
introduce a family of hypercomplex-valued activation functions called
$\mathcal{B}$-projection functions. Broadly speaking, a
$\mathcal{B}$-projection function projects the activation potential onto the
set of all possible states of a hypercomplex-valued neuron. Using the theory
presented in this paper, we confirm the stability analysis of several
discrete-time hypercomplex-valued Hopfield-type neural networks from the
literature. Moreover, we introduce and provide the stability analysis of a
general class of Hopfield-type neural networks on Cayley-Dickson algebras.
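As a minimal concrete instance of a projection-type activation, consider a complex-valued Hopfield network whose neurons take K states on the unit circle: the classical csign function projects the activation potential onto the nearest admissible state. The sketch below is illustrative only; complex numbers are just one member of the broad class treated in the paper, and all names here are ours.

```python
import numpy as np

def csign(u, K):
    """Projection-type activation for complex-valued neurons: maps each
    activation potential onto the nearest of K states on the unit circle."""
    k = np.round(np.angle(u) / (2 * np.pi / K)) % K
    return np.exp(2j * np.pi * k / K)

def hopfield_step(W, x, K):
    """One synchronous sweep of a complex-valued Hopfield update."""
    return csign(W @ x, K)

# Hebbian-style storage of one pattern over K = 8 phase states
K, n = 8, 5
rng = np.random.default_rng(1)
xi = np.exp(2j * np.pi * rng.integers(0, K, n) / K)   # stored pattern
W = np.outer(xi, xi.conj()) / n                        # Hebbian couplings
x = csign(xi + 0.3 * (rng.normal(size=n) + 1j * rng.normal(size=n)), K)  # noisy probe
for _ in range(5):
    x = hopfield_step(W, x, K)
print(np.allclose(x, xi))   # the network typically settles on the stored pattern
```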
Large Deviations of a Spatially-Stationary Network of Interacting Neurons
In this work we determine a process-level Large Deviation Principle (LDP) for
a model of interacting neurons indexed by the lattice $\mathbb{Z}^d$. The neurons
are subject to noise, which is modelled as a correlated martingale. The
probability law governing the noise is strictly stationary, and we are
therefore able to find an LDP for the probability laws governing the
stationary empirical measure generated by the neurons in a cube
of growing side length. We use this LDP to determine an LDP for the neural network
model. The connection weights between the neurons evolve according to a
learning rule (neuronal plasticity), and these results are adaptable to a large
variety of neural network models. This LDP is of great use in the mathematical
modelling of neural networks, because it allows a quantification of the
likelihood of the system deviating from its limit, and also a determination of
the direction in which the system is likely to deviate. The work is also of interest
because there are nontrivial correlations between the neurons even in the
asymptotic limit, thereby presenting itself as a generalisation of traditional
mean-field models.
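Schematically, a process-level LDP of this kind controls the exponential decay, in the volume of the observation cube, of the probability that the empirical measure deviates from its limit. The notation below is generic, not the paper's:

```latex
% Generic shape of an LDP for the empirical measure \hat{\mu}_n of the
% neurons in a cube V_n, with good rate function I:
\mathbb{P}\left(\hat{\mu}_n \in A\right)
\;\asymp\;
\exp\!\left(-\,|V_n|\,\inf_{\mu \in A} I(\mu)\right).
% I(\mu) quantifies how unlikely a deviation into the set A is, and the
% minimizing \mu identifies the direction in which the system is most
% likely to deviate.
```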
Fixed-time Distributed Optimization under Time-Varying Communication Topology
This paper presents a method to solve a distributed optimization problem within
a fixed time over a time-varying communication topology. Each agent in the
network can access its private objective function, while exchange of local
information is permitted between neighbors. This study investigates the first
nonlinear protocol for achieving distributed optimization over a time-varying
communication topology within a fixed time independent of the initial
conditions. For the case when the global objective function is strictly convex,
a second-order Hessian-based approach is developed for achieving fixed-time
convergence. In the special case of a strongly convex global objective function,
it is shown that the requirement to transmit Hessians can be relaxed, and an
equivalent first-order method is developed for achieving fixed-time convergence
to the global optimum. Results are further extended to the case where the
underlying team objective function, possibly non-convex, satisfies only the
Polyak-Łojasiewicz (PL) inequality, which is a relaxation of strong
convexity.
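For intuition on the fixed-time mechanism, here is a simplified single-agent sketch of the standard two-power gradient feedback; it is not the paper's distributed protocol, and all constants and names are our assumptions. The low power dominates near the optimum and the high power far from it, which is what yields convergence-time bounds independent of the initial condition.

```python
import numpy as np

def fixed_time_flow(grad, x0, c1=1.0, c2=1.0, a=0.5, b=1.5, dt=1e-3, T=5.0):
    """Euler simulation of the two-power 'fixed-time' gradient flow
        xdot = -(c1 * ||g||^(a-1) + c2 * ||g||^(b-1)) * g,  g = grad(x),
    with 0 < a < 1 < b."""
    x = np.asarray(x0, dtype=float)
    for _ in range(int(T / dt)):
        g = grad(x)
        n = np.linalg.norm(g)
        if n < 1e-6:          # stop once the gradient is essentially zero
            break
        x = x - dt * (c1 * n ** (a - 1) + c2 * n ** (b - 1)) * g
    return x

# Strongly convex test problem f(x) = 0.5 * x^T Q x, minimized at the origin
Q = np.diag([1.0, 2.0])
print(fixed_time_flow(lambda x: Q @ x, x0=[100.0, -50.0]))
```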
Dreaming neural networks: forgetting spurious memories and reinforcing pure ones
The standard Hopfield model for associative neural networks accounts for
biological Hebbian learning and acts as the harmonic oscillator for pattern
recognition; however, its maximal storage capacity is $\alpha \approx 0.14$, far
from the theoretical bound for symmetric networks, i.e. $\alpha = 1$. Inspired
by sleeping and dreaming mechanisms in mammal brains, we propose an extension
of this model displaying the standard on-line (awake) learning mechanism (that
allows the storage of external information in terms of patterns) and an
off-line (sleep) unlearning & consolidating mechanism (that allows
spurious-pattern removal and pure-pattern reinforcement): the resulting daily
prescription is able to saturate the theoretical bound $\alpha = 1$, remaining
also extremely robust against thermal noise. Neural and synaptic features
are analyzed both analytically and numerically. In particular, beyond obtaining
a phase diagram for neural dynamics, we focus on synaptic plasticity and we
give explicit prescriptions on the temporal evolution of the synaptic matrix.
We analytically prove that our algorithm makes the Hebbian kernel converge with
high probability to the projection matrix built over the pure stored patterns.
Furthermore, we obtain a sharp and explicit estimate for the "sleep rate" in
order to ensure such a convergence. Finally, we run extensive numerical
simulations (mainly Monte Carlo sampling) to check the approximations
underlying the analytical investigations (e.g., we developed the whole theory
at the so-called replica-symmetric level, as standard in the
Amit-Gutfreund-Sompolinsky reference framework) and possible finite-size
effects, finding overall full agreement with the theory.
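A quick numerical illustration of the claimed convergence: the Hebbian kernel, the projection matrix built over the stored patterns, and a sleep-time interpolation between them of the kind used in this line of work. The interpolation formula below is an assumption on our part; the paper's exact conventions may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 200, 20                          # neurons, stored patterns
xi = rng.choice([-1.0, 1.0], size=(P, N))

# Standard Hebbian kernel (capacity ~ 0.14 N) vs. the projection matrix
# over the stored patterns, which the dreaming dynamics is proven to
# approach (capacity up to alpha = 1).
J_hebb = xi.T @ xi / N
C = xi @ xi.T / N                       # pattern correlation matrix
J_proj = xi.T @ np.linalg.inv(C) @ xi / N   # projector onto span of patterns

def dreamed_kernel(t):
    """Hypothetical 'sleep time' interpolation: Hebb at t = 0, and the
    projection matrix in the limit t -> infinity."""
    return (1 + t) * xi.T @ np.linalg.inv(np.eye(P) + t * C) @ xi / N

print(np.allclose(dreamed_kernel(0.0), J_hebb))            # True
print(np.linalg.norm(dreamed_kernel(1e6) - J_proj) < 1e-3)  # True
```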