Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Despite the widespread practical success of deep learning methods, our
theoretical understanding of the dynamics of learning in deep neural networks
remains quite sparse. We attempt to bridge the gap between the theory and
practice of deep learning by systematically analyzing learning dynamics for the
restricted case of deep linear neural networks. Despite the linearity of their
input-output map, such networks have nonlinear gradient descent dynamics on
weights that change with the addition of each new hidden layer. We show that
deep linear networks exhibit nonlinear learning phenomena similar to those seen
in simulations of nonlinear networks, including long plateaus followed by rapid
transitions to lower error solutions, and faster convergence from greedy
unsupervised pretraining initial conditions than from random initial
conditions. We provide an analytical description of these phenomena by finding
new exact solutions to the nonlinear dynamics of deep learning. Our theoretical
analysis also reveals the surprising finding that as the depth of a network
approaches infinity, learning speed can nevertheless remain finite: for a
special class of initial conditions on the weights, very deep networks incur
only a finite, depth-independent delay in learning time relative to shallow
networks. We show that, under certain conditions on the training data,
unsupervised pretraining can find this special class of initial conditions,
while scaled random Gaussian initializations cannot. We further exhibit a new
class of random orthogonal initial conditions on weights that, like
unsupervised pretraining, enjoys depth-independent learning times. We further
show that these initial conditions also lead to faithful propagation of
gradients even in deep nonlinear networks, as long as they operate in a special
regime known as the edge of chaos.
Comment: Submission to ICLR2014. Revised based on reviewer feedback.
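The abstract's claim about random orthogonal initial conditions can be illustrated with a minimal NumPy sketch (not the paper's code; the matrix size and depth are arbitrary choices): a product of random orthogonal matrices preserves the norm of a propagated signal exactly, while a product of scaled random Gaussian matrices lets it drift, which is why the former can support faithful signal and gradient propagation at great depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n, rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix;
    # multiplying columns by the signs of R's diagonal makes the draw uniform.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

n, depth = 64, 50
x = rng.standard_normal(n)
x /= np.linalg.norm(x)  # unit-norm input signal

# Propagate the signal through a deep stack of linear layers.
h_orth, h_gauss = x.copy(), x.copy()
for _ in range(depth):
    h_orth = random_orthogonal(n, rng) @ h_orth
    h_gauss = (rng.standard_normal((n, n)) / np.sqrt(n)) @ h_gauss

# Orthogonal layers preserve the norm (up to rounding); scaled Gaussian
# layers let it wander away from 1 as depth grows.
print(np.linalg.norm(h_orth), np.linalg.norm(h_gauss))
```

The `1/sqrt(n)` factor is the usual variance-preserving scaling for Gaussian initializations; even so, only the mean squared norm is preserved, and any individual deep product drifts.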
Network Plasticity as Bayesian Inference
General results from statistical learning theory suggest understanding not
only brain computations but also brain plasticity as probabilistic inference.
A concrete model for this, however, has been missing. We propose that inherently stochastic
features of synaptic plasticity and spine motility enable cortical networks of
neurons to carry out probabilistic inference by sampling from a posterior
distribution of network configurations. This model provides a viable
alternative to existing models that propose convergence of parameters to
maximum likelihood values. It explains how priors on weight distributions and
connection probabilities can be merged optimally with learned experience, how
cortical networks can generalize learned information so well to novel
experiences, and how they can compensate continuously for unforeseen
disturbances of the network. The resulting new theory of network plasticity
explains from a functional perspective a number of experimental data on
stochastic aspects of synaptic plasticity that previously appeared to be quite
puzzling.
Comment: 33 pages, 5 figures; the supplement is available on the author's web
page http://www.igi.tugraz.at/kappe
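The core idea of sampling from a posterior over network configurations, rather than converging to maximum likelihood values, can be sketched in a toy setting (a hypothetical single scalar weight with Gaussian prior and likelihood; Langevin dynamics stands in here for the paper's stochastic synaptic plasticity): noisy gradient ascent on the log-posterior produces samples whose mean and spread match the posterior, optimally merging the prior with learned experience.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian prior on a synaptic weight w, Gaussian likelihood from data
# ("learned experience") -> Gaussian posterior with a known closed form.
prior_mean, prior_var = 0.0, 1.0
lik_mean, lik_var = 2.0, 0.5
post_var = 1.0 / (1.0 / prior_var + 1.0 / lik_var)
post_mean = post_var * (prior_mean / prior_var + lik_mean / lik_var)

def grad_log_post(w):
    # Gradient of the log-posterior: prior term plus likelihood term.
    return -(w - prior_mean) / prior_var - (w - lik_mean) / lik_var

# Langevin dynamics: gradient ascent plus injected noise. Its stationary
# distribution is the posterior itself, not a point estimate at the mode.
eta, steps, burn_in = 0.01, 50_000, 1_000
w, samples = 0.0, []
for t in range(steps):
    w += eta * grad_log_post(w) + np.sqrt(2 * eta) * rng.standard_normal()
    if t >= burn_in:
        samples.append(w)

samples = np.asarray(samples)
# Sample mean and variance should approximate the posterior's.
print(samples.mean(), post_mean)
print(samples.var(), post_var)
```

Without the noise term this update would converge to the posterior mode, which is the "convergence of parameters" picture the paper argues against; with it, the weight keeps fluctuating and thereby represents its own uncertainty.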
Synchronization and Redundancy: Implications for Robustness of Neural Learning and Decision Making
Learning and decision making in the brain are processes critical to
survival, yet they are implemented by non-ideal biological building
blocks that can impose significant error. We explore quantitatively how the
brain might cope with this inherent source of error by taking advantage of two
ubiquitous mechanisms, redundancy and synchronization. In particular we
consider a neural process whose goal is to learn a decision function by
implementing a nonlinear gradient dynamics. The dynamics, however, are assumed
to be corrupted by perturbations modeling the error which might be incurred due
to limitations of the biology, intrinsic neuronal noise, and imperfect
measurements. We show that error, and the associated uncertainty surrounding a
learned solution, can be controlled in large part by trading off
synchronization strength among multiple redundant neural systems against the
noise amplitude. The impact of the coupling between such redundant systems is
quantified by the spectrum of the network Laplacian, and we discuss the role of
network topology in synchronization and in reducing the effect of noise. A
range of situations in which the mechanisms we model arise in brain science are
discussed, and we draw attention to experimental evidence suggesting that
cortical circuits capable of implementing the computations of interest here can
be found on several scales. Finally, simulations comparing theoretical bounds
to the relevant empirical quantities show that the theoretical estimates we
derive can be tight.
Comment: Preprint, accepted for publication in Neural Computation.
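The trade-off between synchronization strength and noise amplitude can be sketched with a toy simulation (an assumed scalar quadratic objective and a complete coupling graph, not the paper's model): redundant copies of a noisy gradient system are coupled through the graph Laplacian, and stronger coupling shrinks the disagreement among replicas around their consensus.

```python
import numpy as np

rng = np.random.default_rng(2)

# N redundant "neural systems" run noisy gradient descent on the same scalar
# objective f(w) = 0.5 * (w - w_star)**2, diffusively coupled via the
# Laplacian L of a complete graph (eigenvalues 0 and N).
N, w_star = 10, 1.0
L = N * np.eye(N) - np.ones((N, N))

def simulate(coupling, steps=20_000, eta=0.01, sigma=1.0):
    """Return the average disagreement (std across replicas) at stationarity."""
    w = np.zeros(N)
    spread = []
    for _ in range(steps):
        grad = w - w_star                       # gradient of the quadratic
        noise = sigma * rng.standard_normal(N)  # perturbed dynamics
        w += eta * (-grad - coupling * (L @ w) + noise)
        spread.append(w.std())
    return float(np.mean(spread[steps // 2:]))  # discard transient

weak, strong = simulate(0.0), simulate(5.0)
# Stronger Laplacian coupling synchronizes the replicas, shrinking the
# uncertainty around the learned solution for the same noise amplitude.
print(weak, strong)
```

The consensus average of the replicas still converges toward `w_star` in both cases; what the coupling buys, as the abstract describes, is control of the error spread, governed by the nonzero part of the Laplacian spectrum.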