Phase transitions in soft-committee machines
Equilibrium statistical physics is applied to layered neural networks with
differentiable activation functions. A first analysis of off-line learning in
soft-committee machines with a finite number (K) of hidden units learning a
perfectly matching rule is performed. Our results are exact in the limit of
high training temperatures. For K=2 we find a second order phase transition
from unspecialized to specialized student configurations at a critical size P
of the training set, whereas for K > 2 the transition is first order. Monte
Carlo simulations indicate that our results remain qualitatively valid at
moderately low temperatures. The limit K to infinity can be taken
analytically; the transition then occurs after presenting on the order of NK
examples. However, an unspecialized metastable state persists up to P = O(NK^2).
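For concreteness, a minimal sketch of the architecture studied here, in Python. The erf activation g(h) = erf(h/sqrt(2)), the field normalization by sqrt(N), and unit hidden-to-output weights are conventional assumptions, not details taken from the abstract:

import numpy as np
from scipy.special import erf

def soft_committee(x, W):
    # K hidden units with differentiable (erf) activation; the
    # hidden-to-output weights are all fixed to +1.
    N = x.shape[-1]
    h = x @ W.T / np.sqrt(N)          # aligning fields of the hidden units
    return erf(h / np.sqrt(2)).sum(axis=-1)

rng = np.random.default_rng(0)
N, K = 100, 2
W_teacher = rng.standard_normal((K, N))   # the perfectly matching rule
W_student = rng.standard_normal((K, N))
x = rng.standard_normal((5, N))
print(soft_committee(x, W_teacher))
print(soft_committee(x, W_student))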
Learning and generalization theories of large committee machines
The study of the distribution of volumes associated with the internal
representations of learning examples allows us to derive the critical learning
capacity α_c of large committee machines, to verify the stability of the
solution in the limit of a large number of hidden units, and to find a
Bayesian generalization cross-over.
Statistical physics and practical training of soft-committee machines
Equilibrium states of large layered neural networks with differentiable
activation function and a single, linear output unit are investigated using the
replica formalism. The quenched free energy of a student network with a very
large number of hidden units learning a rule of perfectly matching complexity
is calculated analytically. The system undergoes a first order phase transition
from unspecialized to specialized student configurations at a critical size of
the training set. Computer simulations of learning by stochastic gradient
descent from a fixed training set demonstrate that the equilibrium results
quantitatively describe the plateau states that occur in practical training
procedures at sufficiently small but finite learning rates.
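A hedged sketch of the training procedure these simulations describe: stochastic gradient descent on the quadratic error over a fixed training set, for a student soft-committee machine and a teacher of matching complexity. All parameter values are illustrative, and the erf activation is an assumption:

import numpy as np
from scipy.special import erf

g = lambda h: erf(h / np.sqrt(2))                        # hidden activation
dg = lambda h: np.sqrt(2 / np.pi) * np.exp(-h * h / 2)   # its derivative

def sgd_from_fixed_set(N=50, K=3, P=2000, eta=0.05, steps=200_000, seed=1):
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((K, N))        # teacher of matching complexity
    J = 0.1 * rng.standard_normal((K, N))  # unspecialized student start
    X = rng.standard_normal((P, N))        # the fixed training set
    y = g(X @ B.T / np.sqrt(N)).sum(axis=1)
    for _ in range(steps):
        i = rng.integers(P)                # stochastic GD on one example
        h = J @ X[i] / np.sqrt(N)
        delta = g(h).sum() - y[i]          # output error on this example
        J -= eta * delta * np.outer(dg(h), X[i]) / np.sqrt(N)
    return J @ B.T / N                     # student-teacher overlap matrix

# Specialization shows up as each row of the overlap matrix developing one
# dominant entry; on the plateau all entries stay roughly equal.
print(sgd_from_fixed_set().round(2))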
Finite size scaling in neural networks
We demonstrate that the fraction of pattern sets that can be stored in
single- and hidden-layer perceptrons exhibits finite size scaling. This
feature allows us to estimate the critical storage capacity α_c from
simulations of relatively small systems. We illustrate this approach by
determining α_c, together with the finite size scaling exponent ν, for storing
Gaussian patterns in committee and parity machines with binary couplings and
up to K = 5 hidden units.
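The finite-size-scaling protocol itself is easy to sketch. The paper's committee and parity machines with binary couplings need an expensive combinatorial storability check, so the sketch below swaps in a continuous-coupling single-layer perceptron, where storability reduces to a linear feasibility problem; the protocol of measuring the storable fraction at several N and collapsing the curves is the same:

import numpy as np
from scipy.optimize import linprog

def storable(X, y):
    # A continuous-coupling perceptron can store (X, y) iff some w satisfies
    # y_i (w . x_i) >= 1 for all i -- a linear feasibility problem.
    A_ub = -y[:, None] * X
    b_ub = -np.ones(len(y))
    res = linprog(np.zeros(X.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * X.shape[1], method="highs")
    return res.status == 0

def storage_fraction(N, alpha, trials=40, seed=0):
    rng = np.random.default_rng(seed)
    P = max(1, int(alpha * N))
    return np.mean([storable(rng.standard_normal((P, N)),
                             rng.choice([-1.0, 1.0], P))
                    for _ in range(trials)])

# Plotted against (alpha - alpha_c) * N**(1/nu), the curves for different N
# should collapse; here they visibly steepen with N around the perceptron's
# known capacity alpha_c = 2.
for N in (20, 40, 80):
    print(N, [round(storage_fraction(N, a), 2) for a in (1.5, 2.0, 2.5)])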
Learning in ultrametric committee machines
The problem of learning from examples in ultrametric committee machines (UCMs) is studied within the framework of statistical mechanics. Using the replica formalism, we calculate the average generalization error in UCMs with L hidden layers and a large enough number of units. In most of the regimes studied we find that the generalization error, as a function of the number of examples presented, develops a discontinuous drop at a critical value of the load parameter. We also find that, when L > 1, teacher networks with the same number of hidden layers but different overlaps induce learning processes with the same critical points.
Correlation of internal representations in feed-forward neural networks
Feed-forward multilayer neural networks implementing random input-output
mappings develop characteristic correlations between the activity of their
hidden nodes which are important for the understanding of the storage and
generalization performance of the network. It is shown how these correlations
can be calculated from the joint probability distribution of the aligning
fields at the hidden units, for an arbitrary decoder function between the
hidden layer and output. Explicit results are given for the parity, AND, and
committee machines with an arbitrary number of hidden nodes near saturation.
Functional Optimisation of Online Algorithms in Multilayer Neural Networks
We study the online dynamics of learning in fully connected soft committee
machines in the student-teacher scenario. The locally optimal modulation
function, which determines the learning algorithm, is obtained from a
variational argument in such a manner as to maximise the average generalisation
error decay per example. Simulation results for the resulting algorithm are
presented for a few cases. The symmetric phase plateaux are found to be vastly
reduced in comparison with those found when online backpropagation algorithms
are used. A discussion of the implementation of these ideas as practical
algorithms is given.
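The setting can be sketched as follows: a generic online rule updates each student weight vector by an amount set by a modulation function of the student fields and the teacher output. Online backpropagation is one particular choice, shown below; the paper instead derives the variationally optimal choice, which is not reproduced here. Parameters and the erf activation are assumptions:

import numpy as np
from scipy.special import erf

g = lambda h: erf(h / np.sqrt(2))
dg = lambda h: np.sqrt(2 / np.pi) * np.exp(-h * h / 2)

def online_learning(modulation, N=100, K=2, eta=0.2, steps=200_000, seed=2):
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((K, N))          # teacher
    J = 1e-3 * rng.standard_normal((K, N))   # nearly symmetric student start
    for t in range(steps + 1):
        x = rng.standard_normal(N)           # online: a fresh example each step
        h = J @ x / np.sqrt(N)               # student fields
        sigma = g(B @ x / np.sqrt(N)).sum()  # teacher output
        J += (eta / np.sqrt(N)) * np.outer(modulation(h, sigma), x)
        if t % 40_000 == 0:                  # Monte Carlo generalization error
            Xt = rng.standard_normal((2000, N))
            eg = 0.5 * np.mean((g(Xt @ J.T / np.sqrt(N)).sum(1)
                                - g(Xt @ B.T / np.sqrt(N)).sum(1)) ** 2)
            print(t, round(eg, 4))           # the symmetric plateau shows up
                                             # as a long stretch of constant eg

# Backpropagation as one instance of a modulation function.
backprop = lambda h, sigma: (sigma - g(h).sum()) * dg(h)
online_learning(backprop)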
Storage capacity of ultrametric committee machines
The problem of computing the storage capacity of a feed-forward network with L hidden layers, N inputs, and K units in the first hidden layer is analyzed using techniques from statistical mechanics. We find that the storage capacity depends strongly on the network architecture, α_c ∝ (log K)^{1 - 1/2L}, and that the number of units K limits the number of possible hidden layers L through the relationship 2^{L-1} < 2 log K.
Computational capabilities of multilayer committee machines
We obtain an analytical expression for the computational complexity of multilayered committee machines with a finite number of hidden layers (L < 8), using the generalization complexity measure introduced by Franco et al. (2006), IEEE Trans. Neural Netw. 17, 578. Although our result is valid in the large-size limit and for an overlap synaptic matrix that is ultrametric, it provides a useful tool for inferring the architecture a network must have in order to reproduce an arbitrary realizable Boolean function.
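Franco's generalization complexity combines output-sensitivity terms at several Hamming distances; as a rough illustration of the first-order ingredient only (the normalization and higher-order terms of the actual measure are not reproduced here), one can measure how often a Boolean function changes its output across Hamming-distance-1 neighbours:

import numpy as np
from itertools import product

def first_order_complexity(f, n):
    # Fraction of (input, Hamming-distance-1 neighbour) pairs on which the
    # Boolean function f changes its output value.
    flips = 0
    for bits in product((-1, 1), repeat=n):
        x = np.array(bits)
        fx = f(x)
        for j in range(n):
            x[j] = -x[j]
            flips += f(x) != fx
            x[j] = -x[j]
    return flips / (2 ** n * n)

parity = lambda x: int(np.prod(x))
majority = lambda x: int(np.sign(x.sum()))   # n odd, so no ties
print(first_order_complexity(parity, 5))     # 1.0: maximally sensitive
print(first_order_complexity(majority, 5))   # smaller: a "simpler" function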
Weight Space Structure and Internal Representations: a Direct Approach to Learning and Generalization in Multilayer Neural Networks
We analytically derive the geometrical structure of the weight space in
multilayer neural networks (MLN), in terms of the volumes of couplings
associated with the internal representations of the training set. Focusing on the
parity and committee machines, we deduce their learning and generalization
capabilities both reinterpreting some known properties and finding new exact
results. The relationship between our approach and information theory as well
as the Mitchison-Durbin calculation is established. Our results are exact in
the limit of a large number of hidden units, showing that MLN are a class of
exactly solvable models with a simple interpretation of replica symmetry
breaking.