Learning from Minimum Entropy Queries in a Large Committee Machine
In supervised learning, the redundancy contained in random examples can be
avoided by learning from queries. Using statistical mechanics, we study
learning from minimum entropy queries in a large tree-committee machine. The
generalization error decreases exponentially with the number of training
examples, providing a significant improvement over the algebraic decay for
random examples. The connection between entropy and generalization error in
multi-layer networks is discussed, and a computationally cheap algorithm for
constructing queries is suggested and analysed. Comment: 4 pages, REVTeX, multicol, epsf, two
postscript figures. To appear in Physical Review E (Rapid Communications).
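The paper's minimum-entropy queries are defined within its statistical-mechanics framework; as a rough illustration of the "computationally cheap" committee-based flavour of such query construction, the sketch below (all function and variable names are ours, not the paper's) picks, from a pool of random candidates, the input on which an ensemble of hypotheses consistent with the data so far disagrees most, i.e. whose predicted label has maximal vote entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

def tree_committee_output(W, x):
    """Output of a tree committee machine: K hidden units, each seeing its own
    block of the input; the machine's output is the majority vote."""
    K, n = W.shape                                   # K hidden units, n inputs per branch
    fields = np.einsum('ki,ki->k', W, x.reshape(K, n))
    return np.sign(np.sign(fields).sum() + 1e-12)

def select_query(candidates, ensemble):
    """Cheap query selection in the spirit of query-by-committee: among random
    candidate inputs, return the one on which the ensemble of hypotheses
    disagrees most (highest empirical entropy of the predicted label)."""
    best, best_score = None, -np.inf
    for x in candidates:
        votes = np.array([tree_committee_output(W, x) for W in ensemble])
        p = np.clip(0.5 * (1.0 + votes.mean()), 1e-12, 1 - 1e-12)   # fraction of +1 votes
        entropy = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
        if entropy > best_score:
            best, best_score = x, entropy
    return best, best_score

# toy usage: K = 5 branches of 20 inputs each, an ensemble of 50 hypotheses
K, n, M = 5, 20, 50
ensemble = [rng.standard_normal((K, n)) for _ in range(M)]
candidates = rng.choice([-1.0, 1.0], size=(200, K * n))
x_star, score = select_query(candidates, ensemble)
print(f"selected query with vote entropy {score:.3f} bits")
```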
Phase Transitions of Neural Networks
The cooperative behaviour of interacting neurons and synapses is studied
using models and methods from statistical physics. The competition between
training error and entropy may lead to discontinuous properties of the neural
network. This is demonstrated for a few examples: Perceptron, associative
memory, learning from examples, generalization, multilayer networks, structure
recognition, Bayesian estimate, on-line training, noise estimation and time
series generation. Comment: Plenary talk for MINERVA workshop on mesoscopics, fractals and
neural networks, Eilat, March 1997. Postscript file.
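Schematically, the competition between training error and entropy mentioned in this abstract can be phrased as Gibbs learning at a training temperature; the equations below are a generic textbook-style summary of that setup (not formulas quoted from the talk), showing how discontinuous behaviour can arise.

```latex
% Gibbs learning: posterior over couplings w at training temperature T = 1/beta,
% with E_t(w) the training error (number of misclassified examples).
\begin{align}
  P(\mathbf{w}) &= \frac{1}{Z}\, e^{-\beta E_t(\mathbf{w})},
  \qquad
  Z = \int \! d\mu(\mathbf{w})\, e^{-\beta E_t(\mathbf{w})}, \\
  f(T) &= \min_{\epsilon_t}\bigl[\,\epsilon_t - T\, s(\epsilon_t)\,\bigr],
\end{align}
% where s(eps_t) is the entropy of couplings with training error eps_t.
% When f(T) develops two competing local minima in eps_t, the equilibrium
% training error (and with it the generalization error) jumps discontinuously
% as T or the number of examples is varied: a first-order phase transition.
```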
Error correcting code using tree-like multilayer perceptron
An error correcting code using a tree-like multilayer perceptron is proposed.
An original message $\boldsymbol{s}^0$ is encoded into a codeword $\boldsymbol{y}_0$
using a tree-like committee machine (committee tree) or a tree-like parity
machine (parity tree). Based on these architectures, several schemes featuring
monotonic or non-monotonic units are introduced. The codeword $\boldsymbol{y}_0$ is
then transmitted via a Binary Asymmetric Channel (BAC) where it is corrupted by
noise. The analytical performance of these schemes is investigated using the
replica method of statistical mechanics. Under some specific conditions, some
of the proposed schemes are shown to saturate the Shannon bound at the infinite
codeword length limit. The influence of the monotonicity of the units on the
performance is also discussed. Comment: 23 pages, 3 figures. Content has been extended and revised.
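Perceptron-type codes of this family generally encode a message by feeding it, together with publicly known random input vectors, through the network; the sketch below illustrates a parity-tree encoder and a binary asymmetric channel under that generic scheme (names and the exact construction are ours; the paper's monotonic/non-monotonic variants may differ, and the hard part, decoding $\boldsymbol{s}^0$ from the noisy codeword, which the replica analysis addresses, is omitted).

```python
import numpy as np

rng = np.random.default_rng(1)

def encode_parity_tree(s, X):
    """Encode a +/-1 message s (length N = K*n) into M codeword bits.
    Hidden unit l sees its own branch s_l of the message and a random input
    vector X[mu, l]; each codeword bit is the parity (product) of the K
    hidden sign units -- a 'parity tree' acting on the message."""
    M, K, n = X.shape
    branches = s.reshape(K, n)                        # split message into K branches
    fields = np.einsum('mkn,kn->mk', X, branches)     # local fields, shape (M, K)
    return np.prod(np.sign(fields), axis=1)           # parity of hidden outputs

def binary_asymmetric_channel(y, p_flip_plus, p_flip_minus):
    """Binary asymmetric channel: +1 bits flip with prob p_flip_plus,
    -1 bits flip with prob p_flip_minus."""
    u = rng.random(y.shape)
    flip = np.where(y > 0, u < p_flip_plus, u < p_flip_minus)
    return np.where(flip, -y, y)

# toy usage: N = 60 message bits, K = 3 branches, code rate R = N / M
K, n, M = 3, 20, 300
N = K * n
s0 = rng.choice([-1.0, 1.0], size=N)                  # original message s^0
X = rng.standard_normal((M, K, n))                    # random inputs known to both ends
y0 = encode_parity_tree(s0, X)                        # codeword y_0
y_noisy = binary_asymmetric_channel(y0, 0.05, 0.15)   # corrupted by the BAC
print("fraction of corrupted codeword bits:", np.mean(y_noisy != y0))
```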
Learning and generalization theories of large committee machines
The study of the distribution of volumes associated to the internal
representations of learning examples allows us to derive the critical learning
capacity of large committee machines, to verify the stability of the solution
in the limit of a large number of hidden units, and to find a Bayesian
generalization cross-over. Comment: 14 pages, RevTeX.
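As background for readers unfamiliar with the volume-based approach, the central object in such calculations is (schematically) the Gardner volume of couplings compatible with the training set, decomposed over internal representations; the expression below is a generic statement of this setup, not a formula quoted from the paper.

```latex
% A tree committee machine with K hidden units classifies pattern xi^mu as
%   sigma(xi) = sgn( sum_l sgn(w_l . xi_l) ).
% The volume of couplings compatible with p = alpha*N labelled examples is
\begin{equation}
  V \;=\; \int \! d\mu(\mathbf{w})\,
  \prod_{\mu=1}^{p}
  \theta\!\Bigl( \sigma^{\mu}\,
  \operatorname{sgn}\!\Bigl[\textstyle\sum_{l=1}^{K}
  \operatorname{sgn}\bigl(\mathbf{w}_{l}\!\cdot\!\boldsymbol{\xi}^{\mu}_{l}\bigr)\Bigr]\Bigr)
  \;=\; \sum_{\{\boldsymbol{\tau}^{\mu}\}} V\bigl(\{\boldsymbol{\tau}^{\mu}\}\bigr),
\end{equation}
% where the sum runs over the internal representations tau^mu (the K hidden
% outputs on each example) consistent with the labels sigma^mu, and V({tau})
% is the volume of couplings realizing those representations. The critical
% capacity is the largest alpha at which the typical volume stays non-vanishing.
```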
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good minimizers
without getting stuck in local critical points, and such minimizers are often
effective at avoiding overfitting. How these two
features can be kept under control in nonlinear devices composed of millions of
tunable connections is a profound and far-reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessian and their generalization performance on real data. Comment: 37 pages (16 main text), 10 figures (7 main text).
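The paper's entropy-driven greedy and message-passing algorithms are not reproduced here; as a rough operational illustration of what distinguishes a wide flat minimum, the sketch below (a toy perceptron on random patterns, all names ours) probes how quickly the training error of a found minimizer grows when the weights are perturbed by increasing amounts. A minimizer lying in a wide flat region keeps the error low even for sizeable perturbations.

```python
import numpy as np

rng = np.random.default_rng(2)

def train_perceptron(X, y, epochs=200):
    """Plain perceptron learning on random +/-1 patterns (a stand-in for the
    non-convex one- and two-layer models studied in the paper)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            if np.sign(x @ w + 1e-12) != t:
                w += t * x
    return w / np.linalg.norm(w)

def error_count(w, X, y):
    return int(np.sum(np.sign(X @ w + 1e-12) != y))

def local_flatness_profile(w, X, y, radii, n_samples=200):
    """Crude flatness probe: mean training error of random perturbations of w
    at increasing relative radius r."""
    profile = []
    for r in radii:
        errs = []
        for _ in range(n_samples):
            d = rng.standard_normal(w.shape)
            d *= r / np.linalg.norm(d)        # perturbation of relative size r
            wp = w + d
            wp /= np.linalg.norm(wp)
            errs.append(error_count(wp, X, y))
        profile.append(float(np.mean(errs)))
    return profile

# toy usage: N = 200 weights, P = 100 random patterns (alpha = 0.5)
N, P = 200, 100
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)
w = train_perceptron(X, y)
radii = [0.1, 0.3, 0.6]
for r, e in zip(radii, local_flatness_profile(w, X, y, radii)):
    print(f"perturbation radius {r:.1f}: mean training errors {e:.1f} / {P}")
```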