
    Deep Residual Learning via Large Sample Mean-Field Stochastic Optimization

    We study a class of stochastic optimization problems of mean-field type arising in the optimal training of a deep residual neural network. We consider the sampling problem arising from a continuous-layer idealization and establish the existence of optimal relaxed controls when the training set is finite. The core of the paper is a proof of the Gamma-convergence of the sequence of sampled objective functionals: as the size of the training set grows, the minimizer of the sampled relaxed problem converges to that of the limiting optimization problem. We connect the limit of the large-sample objective functional to the unique solution, in the trajectory sense, of a nonlinear Fokker-Planck-Kolmogorov (FPK) equation in a random environment. We construct an example showing that, under mild assumptions, the optimal network weights can be computed numerically by solving a second-order differential equation with Neumann boundary conditions in the sense of distributions.
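
    As a rough illustration of the continuous-layer idealization and the sampled objective, the sketch below treats a residual network as a forward-Euler discretization of an ODE in depth. Each layer's vector field is an average over M parameter samples, a hypothetical empirical stand-in for a relaxed control, and the sampled objective is the empirical mean of a squared loss over N training pairs. The names `layer_field`, `resnet_forward`, and `sampled_objective`, the tanh activation, and the quadratic loss are all assumptions, not the paper's construction.

```python
import numpy as np

def layer_field(x, params):
    """Vector field of one layer: (1/M) * sum_j a_j * tanh(w_j . x + b_j).

    Averaging over M parameter samples is a hypothetical empirical
    approximation of integrating against a relaxed control (a measure).
    Shapes: x (n, d), a (M, d), w (M, d), b (M,).
    """
    a, w, b = params
    pre = x @ w.T + b                      # (n, M) pre-activations
    return np.tanh(pre) @ a / a.shape[0]   # (n, d) averaged update

def resnet_forward(x, layers, horizon=1.0):
    """Residual blocks as forward-Euler steps x <- x + h * f(x), h = horizon / L."""
    h = horizon / len(layers)
    for params in layers:
        x = x + h * layer_field(x, params)
    return x

def sampled_objective(x, y, layers, horizon=1.0):
    """Empirical mean over the N training pairs of 0.5 * ||output - target||^2."""
    out = resnet_forward(x, layers, horizon)
    return 0.5 * np.mean(np.sum((out - y) ** 2, axis=1))
```

    Because the objective is an average over the N rows of `x`, increasing N corresponds to the large-sample regime; setting every `a` to zero turns each block into the identity map.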

    Mean Field Analysis of Neural Networks: A Law of Large Numbers

    Machine learning, and in particular neural network models, have revolutionized fields such as image, text, and speech recognition. Today, many important real-world applications in these areas are driven by neural networks, and applications in engineering, robotics, medicine, and finance are growing. Despite their immense practical success, the mathematical understanding of neural networks remains limited. This paper illustrates how neural networks can be studied via stochastic analysis and develops approaches for addressing some of the technical challenges that arise. We analyze one-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously prove that the empirical distribution of the neural network parameters converges to the solution of a nonlinear partial differential equation. This result can be considered a law of large numbers for neural networks. In addition, our analysis implies that the trained parameters of the neural network asymptotically become independent, a property commonly called "propagation of chaos".
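
    The regime described above can be sketched, under assumptions, with a one-layer network in the mean-field scaling $g_N(x) = \frac{1}{N}\sum_{i=1}^N c^i \tanh(w^i x)$, trained by online SGD in which each neuron's parameters move by $\frac{\alpha}{N}(y - g_N(x))$ times the derivative of $c^i \tanh(w^i x)$, for a number of iterations proportional to $N$. The scalar input, tanh activation, uniform inputs on $[-1, 1]$, target $\tanh(2x)$, and function names are hypothetical choices; the returned arrays `c` and `w` are the particles whose empirical distribution a law of large numbers of this kind describes.

```python
import numpy as np

def train_mean_field(n_neurons, n_steps, alpha=1.0, seed=0):
    """Online SGD for g_N(x) = (1/N) * sum_i c_i * tanh(w_i * x).

    The 1/N output scaling and the alpha/N step are the assumed mean-field
    scaling; n_steps is meant to be proportional to n_neurons.
    """
    rng = np.random.default_rng(seed)
    c = rng.standard_normal(n_neurons)   # output weights, one per neuron
    w = rng.standard_normal(n_neurons)   # input weights, one per neuron
    lr = alpha / n_neurons
    for _ in range(n_steps):
        x = rng.uniform(-1.0, 1.0)       # one fresh training sample per step
        y = np.tanh(2.0 * x)             # hypothetical target function
        h = np.tanh(w * x)               # hidden activations
        err = y - np.mean(c * h)         # y - g_N(x)
        # update = (alpha / N) * err * d[c_i * tanh(w_i * x)] / d(c_i, w_i),
        # using the pre-update c in both derivatives
        grad_c = err * h
        grad_w = err * c * (1.0 - h ** 2) * x
        c = c + lr * grad_c
        w = w + lr * grad_w
    return c, w

def predict(c, w, x):
    """Evaluate g_N on an array of scalar inputs x."""
    return np.tanh(np.outer(x, w)) @ c / len(c)
```

    Training with `n_steps = 50 * n_neurons` for increasing `n_neurons` and comparing histograms of `c` or `w` is one informal way to watch the empirical distribution of the parameters settle toward a deterministic limit.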

    A Broad Class of Discrete-Time Hypercomplex-Valued Hopfield Neural Networks

    In this paper, we address the stability of a broad class of discrete-time hypercomplex-valued Hopfield-type neural networks. To ensure that the networks in this class always settle at a stationary state, we introduce novel hypercomplex number systems, referred to as real-part associative hypercomplex number systems. These systems generalize the well-known Cayley-Dickson algebras and real Clifford algebras, and include the real numbers, complex numbers, dual numbers, hyperbolic numbers, quaternions, tessarines, and octonions as particular instances. In addition to the novel number systems, we introduce a family of hypercomplex-valued activation functions called $\mathcal{B}$-projection functions. Broadly speaking, a $\mathcal{B}$-projection function projects the activation potential onto the set of all possible states of a hypercomplex-valued neuron. Using the theory presented in this paper, we confirm the stability analysis of several discrete-time hypercomplex-valued Hopfield-type neural networks from the literature. Moreover, we introduce and provide the stability analysis of a general class of Hopfield-type neural networks on Cayley-Dickson algebras.
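
    For the complex numbers, one simple instance of projecting the activation potential onto the set of possible neuron states is to pick, among the $K$-th roots of unity, the state $s$ maximizing $\mathrm{Re}(\bar{s}\, v)$, which for unit-modulus states is the state closest to the potential $v$ in the complex plane. The sketch below uses that rule in an asynchronous Hopfield network with Hermitian Hebbian weights and zero diagonal. The state set, the weight rule, and the function names are assumptions for illustration, not the paper's general hypercomplex construction.

```python
import numpy as np

def roots_of_unity(k):
    """The K possible states of a multistate complex-valued neuron (assumed)."""
    return np.exp(2j * np.pi * np.arange(k) / k)

def project(v, states):
    """Map each potential v_i to the state s maximizing Re(conj(s) * v_i)."""
    scores = np.real(np.conj(states)[None, :] * v[:, None])
    return states[np.argmax(scores, axis=1)]

def hebbian_weights(patterns):
    """W = (1/n) * sum_mu u^mu (u^mu)^H with zero diagonal; patterns is (P, n)."""
    n = patterns.shape[1]
    w = patterns.T @ np.conj(patterns) / n
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, x, states, sweeps=10):
    """Asynchronous dynamics: update one neuron at a time from its potential."""
    x = x.copy()
    for _ in range(sweeps):
        for i in range(len(x)):
            x[i] = project(np.array([w[i] @ x]), states)[0]
    return x
```

    With a single stored pattern of length 20 and $K = 4$, the pattern is a fixed point of `recall`, and rotating three of its entries by $90^\circ$ still leads back to it.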

    Large Deviations of a Spatially-Stationary Network of Interacting Neurons

    In this work we determine a process-level Large Deviation Principle (LDP) for a model of interacting neurons indexed by the lattice $\mathbb{Z}^d$. The neurons are subject to noise, modelled as a correlated martingale. The probability law governing the noise is strictly stationary, so we are able to find an LDP for the probability laws $\Pi^n$ governing the stationary empirical measure $\hat{\mu}^n$ generated by the neurons in a cube of side length $2n+1$. We use this LDP to determine an LDP for the neural network model. The connection weights between the neurons evolve according to a learning rule (neuronal plasticity), and the results are adaptable to a large variety of neural network models. This LDP is of great use in the mathematical modelling of neural networks because it quantifies the likelihood of the system deviating from its limit and determines the direction in which the system is likely to deviate. The work is also of interest because nontrivial correlations between the neurons persist even in the asymptotic limit, making it a generalisation of traditional mean-field models.
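
    To make the stationary empirical measure $\hat{\mu}^n$ more concrete, the following sketch takes $d = 1$, extends the configuration of the $2n+1$ neurons in the cube periodically, and records the configuration seen from every site. Since the full measure lives on whole configurations, the sketch only returns its marginal on a window of $m$ consecutive sites; the discrete neuron values, the window length `m`, and the function name are illustrative assumptions rather than the paper's process-level setting.

```python
from collections import Counter

def stationary_empirical_window(x, m):
    """Empirical law of the length-m window seen from each site of a ring.

    x holds the values of the 2n+1 neurons in the cube (d = 1), extended
    periodically. Each site j contributes mass 1/(2n+1) to the window
    (x[j], x[j+1], ..., x[j+m-1]) with indices taken mod 2n+1; this is the
    m-site marginal of the average of point masses at all shifts of x.
    """
    size = len(x)
    counts = Counter(
        tuple(x[(j + k) % size] for k in range(m)) for j in range(size)
    )
    return {window: c / size for window, c in counts.items()}
```

    Every cyclic shift of `x` yields the same output, so the result is shift-invariant, mirroring the stationarity of $\hat{\mu}^n$.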

    Fixed-time Distributed Optimization under Time-Varying Communication Topology

    This paper presents a method to solve a distributed optimization problem within a fixed time over a time-varying communication topology. Each agent in the network can access its private objective function, while local information may be exchanged between neighbors. This study investigates the first nonlinear protocol for achieving distributed optimization over a time-varying communication topology within a fixed time that is independent of the initial conditions. When the global objective function is strictly convex, a second-order, Hessian-based approach is developed to achieve fixed-time convergence. In the special case of a strongly convex global objective function, it is shown that the requirement to transmit Hessians can be relaxed, and an equivalent first-order method is developed that achieves fixed-time convergence to the global optimum. The results are further extended to the case where the underlying team objective function, possibly non-convex, satisfies only the Polyak-Łojasiewicz (PL) inequality, a relaxation of strong convexity. Comment: 25 pages
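
    The defining feature of fixed-time convergence, a settling-time bound independent of the initial condition, can be seen in a scalar toy problem. The sketch below is not the paper's distributed protocol: it integrates a commonly used fixed-time gradient flow $\dot{x} = -a\,\mathrm{sig}(g)^p - b\,\mathrm{sig}(g)^q$, where $g = x - c$ is the gradient of the hypothetical objective $f(x) = (x - c)^2/2$, $\mathrm{sig}(z)^r = \mathrm{sign}(z)|z|^r$, and $0 < p < 1 < q$. Along this flow $\frac{d}{dt}|g| = -a|g|^p - b|g|^q$, which gives $T \le \frac{1}{a(1-p)} + \frac{1}{b(q-1)}$. The parameter values and forward-Euler integration are assumptions.

```python
import math

def sig(z, r):
    """Signed power sign(z) * |z|**r."""
    return math.copysign(abs(z) ** r, z)

def fixed_time_flow(x0, c, a=1.0, b=1.0, p=0.5, q=1.5, dt=1e-4, t_end=4.0):
    """Forward-Euler integration of x' = -a*sig(g, p) - b*sig(g, q), g = x - c.

    g is the gradient of the hypothetical objective f(x) = (x - c)**2 / 2.
    """
    x = x0
    for _ in range(int(round(t_end / dt))):
        g = x - c
        x -= dt * (a * sig(g, p) + b * sig(g, q))
    return x

def settling_bound(a=1.0, b=1.0, p=0.5, q=1.5):
    """Upper bound 1/(a(1-p)) + 1/(b(q-1)) on the settling time."""
    return 1.0 / (a * (1.0 - p)) + 1.0 / (b * (q - 1.0))
```

    With the default parameters the bound is 4 time units; for these particular values the exact worst-case settling time is $\int_0^\infty \frac{dy}{\sqrt{y}(1+y)} = \pi$, so starting points as far apart as $-10^3$ and $10^3$ both reach the minimizer $c$ by $t = 4$.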

    Dreaming neural networks: forgetting spurious memories and reinforcing pure ones

    The standard Hopfield model for associative neural networks accounts for biological Hebbian learning and acts as the harmonic oscillator for pattern recognition; however, its maximal storage capacity is $\alpha \sim 0.14$, far from the theoretical bound for symmetric networks, i.e. $\alpha = 1$. Inspired by sleeping and dreaming mechanisms in mammalian brains, we propose an extension of this model displaying the standard on-line (awake) learning mechanism (which allows the storage of external information in terms of patterns) and an off-line (sleep) unlearning & consolidating mechanism (which allows spurious-pattern removal and pure-pattern reinforcement). The resulting daily prescription is able to saturate the theoretical bound $\alpha = 1$ while remaining extremely robust against thermal noise. Neural and synaptic features are analyzed both analytically and numerically. In particular, beyond obtaining a phase diagram for neural dynamics, we focus on synaptic plasticity and give explicit prescriptions for the temporal evolution of the synaptic matrix. We analytically prove that our algorithm makes the Hebbian kernel converge with high probability to the projection matrix built over the pure stored patterns. Furthermore, we obtain a sharp and explicit estimate for the "sleep rate" needed to ensure such convergence. Finally, we run extensive numerical simulations (mainly Monte Carlo sampling) to check the approximations underlying the analytical investigations (e.g., we developed the whole theory at the so-called replica-symmetric level, as is standard in the Amit-Gutfreund-Sompolinsky reference framework) and possible finite-size effects, finding overall full agreement with the theory. Comment: 31 pages, 12 figures
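
    The two endpoints of the synaptic evolution described above can be written down directly for $P$ binary patterns $\xi \in \{-1, 1\}^{P \times N}$: the Hebbian kernel $J_H = \xi^\top \xi / N$ and the projection matrix $J_P = \xi^\top (\xi \xi^\top)^{-1} \xi$ onto the span of the stored patterns. The sketch also uses one interpolation, $J(t) = \frac{1}{N}\,\xi^\top (1+t)(I + tC)^{-1} \xi$ with $C = \xi\xi^\top / N$, which equals $J_H$ at $t = 0$ and tends to $J_P$ as $t \to \infty$; treat this specific form, the random patterns, and the function names as assumptions for illustration rather than a verbatim transcription of the paper's algorithm.

```python
import numpy as np

def hebbian_kernel(xi):
    """J_H = xi^T xi / N for patterns xi of shape (P, N)."""
    return xi.T @ xi / xi.shape[1]

def projection_kernel(xi):
    """J_P = xi^T (xi xi^T)^{-1} xi: orthogonal projector onto the pattern span."""
    return xi.T @ np.linalg.solve(xi @ xi.T, xi)

def interpolated_kernel(xi, t):
    """J(t) = xi^T (1 + t)(I + t C)^{-1} xi / N with C = xi xi^T / N (assumed form)."""
    p, n = xi.shape
    c = xi @ xi.T / n
    return xi.T @ ((1.0 + t) * np.linalg.solve(np.eye(p) + t * c, xi)) / n
```

    Each stored pattern satisfies $J_P \xi^\mu = \xi^\mu$ exactly, whether or not the patterns are correlated, whereas for $J_H$ this holds only up to crosstalk terms $\frac{1}{N}\sum_{\nu \ne \mu} \xi^\nu (\xi^\nu \cdot \xi^\mu)$.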