Mean Field Analysis of Neural Networks: A Law of Large Numbers
Machine learning, and in particular neural network models, has
revolutionized fields such as image, text, and speech recognition. Today, many
important real-world applications in these areas are driven by neural networks.
There are also growing applications in engineering, robotics, medicine, and
finance. Despite their immense success in practice, there is limited
mathematical understanding of neural networks. This paper illustrates how
neural networks can be studied via stochastic analysis, and develops approaches
for addressing some of the technical challenges which arise. We analyze
one-layer neural networks in the asymptotic regime of simultaneously (A) large
network sizes and (B) large numbers of stochastic gradient descent training
iterations. We rigorously prove that the empirical distribution of the neural
network parameters converges to the solution of a nonlinear partial
differential equation. This result can be considered a law of large numbers for
neural networks. In addition, a consequence of our analysis is that the trained
parameters of the neural network asymptotically become independent, a property
which is commonly called "propagation of chaos".
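To make the type of limit concrete, here is a hedged LaTeX sketch of a law-of-large-numbers statement of this kind; the notation (parameters \theta^i, empirical measure \nu^N_t, drift b) is assumed for illustration and is not taken from the paper itself.

```latex
% Hedged sketch: notation is illustrative, not the paper's own.
% Empirical measure of the N parameter vectors after \lfloor Nt \rfloor SGD steps,
% and its large-N limit:
\nu^{N}_{t} \;:=\; \frac{1}{N}\sum_{i=1}^{N}\delta_{\theta^{i}_{\lfloor Nt\rfloor}},
\qquad
\nu^{N}_{t} \;\longrightarrow\; \bar{\nu}_{t} \quad (N \to \infty),
% where the limit \bar{\nu}_t solves a nonlinear evolution equation, written
% here in weak form against smooth test functions f:
\frac{d}{dt}\,\big\langle f,\bar{\nu}_{t}\big\rangle
  \;=\; \big\langle \nabla_{\theta} f(\theta)\cdot b\!\left(\theta,\bar{\nu}_{t}\right),\,\bar{\nu}_{t}\big\rangle .
```

The drift b is built from the gradient of the training loss and depends on \bar{\nu}_t itself, which is what makes the limiting equation nonlinear; "propagation of chaos" then corresponds to any fixed set of trained parameters behaving asymptotically like independent draws from \bar{\nu}_t.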
Truncated Variational EM for Semi-Supervised Neural Simpletrons
Inference and learning for probabilistic generative networks are often very
challenging and typically prevent scaling to networks as large as those used
for deep discriminative approaches. To obtain efficiently trainable,
large-scale, and well-performing generative networks for semi-supervised
learning, we here combine two recent developments: a neural network
reformulation of hierarchical Poisson mixtures (Neural Simpletrons), and a
novel truncated variational EM
approach (TV-EM). TV-EM provides theoretical guarantees for learning in
generative networks, and its application to Neural Simpletrons results in
particularly compact, yet approximately optimal, modifications of learning
equations. When applied to standard benchmarks, we empirically find that
learning converges in fewer EM iterations, that the complexity per EM
iteration is reduced, and that final likelihood values are higher on average.
For classification on data sets with few labels, these learning improvements
result in consistently lower error rates compared to learning without
truncation. Experiments on the MNIST data set allow for comparison to
standard and state-of-the-art models in the semi-supervised setting. Further
experiments on the NIST SD19 data set show the scalability of the approach
when large amounts of additional unlabeled data are available.
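To illustrate the truncation idea itself, here is a minimal Python sketch of a truncated variational E-step for a flat Poisson mixture (not the full hierarchical Neural Simpletron, and with invented names such as truncated_e_step); it restricts each data point's variational posterior to its best-scoring components and renormalizes there, which is the mechanism TV-EM builds on, but it is not the paper's implementation.

```python
import numpy as np

def truncated_e_step(X, log_pi, log_lambda, K_trunc):
    """Truncated variational E-step for a flat Poisson mixture (illustrative sketch).

    X          : (N, D) array of non-negative integer counts
    log_pi     : (K,)   log mixing proportions
    log_lambda : (K, D) log Poisson rates per component
    K_trunc    : number of components kept per data point
    Returns an (N, K) responsibility matrix that is zero outside the
    truncated support of each data point.
    """
    lam = np.exp(log_lambda)                                   # (K, D) Poisson rates
    # Joint log-probability log p(x, k), up to the component-independent
    # term -sum_d log(x_d!) of the Poisson likelihood.
    log_joint = log_pi[None, :] + X @ log_lambda.T - lam.sum(axis=1)[None, :]   # (N, K)

    # Truncation: keep only the K_trunc highest-scoring components per data point.
    idx = np.argpartition(-log_joint, K_trunc - 1, axis=1)[:, :K_trunc]         # (N, K_trunc)
    kept = np.take_along_axis(log_joint, idx, axis=1)

    # Renormalize responsibilities on the truncated support only (log-sum-exp trick).
    kept -= kept.max(axis=1, keepdims=True)
    resp_kept = np.exp(kept)
    resp_kept /= resp_kept.sum(axis=1, keepdims=True)

    # Scatter back into a full (N, K) matrix; entries outside the support stay zero.
    resp = np.zeros_like(log_joint)
    np.put_along_axis(resp, idx, resp_kept, axis=1)
    return resp
```

Responsibilities outside the truncated set are exactly zero, so a subsequent M-step only needs to accumulate statistics over K_trunc components per data point; the per-iteration saving comes from this sparsity (the full K-component scoring above is kept only to keep the sketch short).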