618 research outputs found

Learning with noise and regularizers in multilayer neural networks

Author: Saad David
Solla Sara A.
Publication date: 01/01/1996

We study the effect of two types of noise, data noise and model noise, in an on-line gradient-descent learning scenario for a general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units. Data is then corrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise on the evolution of order parameters and the generalization error in various phases of the learning process.
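The setting in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the erf activation, the unit hidden-to-output weights (a "soft committee machine"), the eta/N step scaling, and every name and default value below are assumptions chosen for illustration. Only additive Gaussian output noise is simulated; model noise is left out.

```python
import math
import random

def g(x):
    # Sigmoidal activation; erf(x / sqrt(2)) is an assumption, the abstract does not fix it.
    return math.erf(x / math.sqrt(2.0))

def g_prime(x):
    # Derivative of erf(x / sqrt(2)).
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def committee(weights, x):
    # Two-layer network with hidden-to-output weights fixed to 1 (assumption).
    return sum(g(dot(w, x)) for w in weights)

def generalization_error(student, teacher, rng, n_inputs, n_test=500):
    # Monte Carlo estimate of half the squared error on fresh, noise-free examples.
    total = 0.0
    for _ in range(n_test):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        total += 0.5 * (committee(student, x) - committee(teacher, x)) ** 2
    return total / n_test

def run_online_learning(n_inputs=20, k_student=2, m_teacher=2,
                        eta=0.5, noise_std=0.1, steps=4000, seed=0):
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_inputs)  # keeps hidden-unit fields of order 1
    teacher = [[rng.gauss(0.0, scale) for _ in range(n_inputs)]
               for _ in range(m_teacher)]
    student = [[rng.gauss(0.0, 0.1 * scale) for _ in range(n_inputs)]
               for _ in range(k_student)]
    initial = generalization_error(student, teacher, rng, n_inputs)
    for _ in range(steps):
        # Each example is drawn fresh and used once (on-line learning).
        x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        target = committee(teacher, x) + rng.gauss(0.0, noise_std)  # output noise
        fields = [dot(w, x) for w in student]
        error = target - sum(g(h) for h in fields)
        for w, h in zip(student, fields):
            step = (eta / n_inputs) * error * g_prime(h)
            for i in range(n_inputs):
                w[i] += step * x[i]
    final = generalization_error(student, teacher, rng, n_inputs)
    return initial, final
```

Calling `run_online_learning()` returns the estimated generalization error before and after training. Varying `noise_std` or `eta` gives a rough feel for how output noise affects the learning curve.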

Aston Publications Explorer

Noise, regularizers, and unrealizable scenarios in online learning from restricted training sets

Author: A. Krogh
A.C.C. Coolen
B. Lopez
C.M. Bishop
C.W.H. Mace
D. Saad
David Saad
H. Horner
J.A. Hertz
M. Biehl
M. Rattray
P. Sollich
S. Lee
W. Kinzel
Y. LeCun
Yuan-Sheng Xiong
Publication venue: 'American Physical Society (APS)'
Publication date: 27/06/2001

We study the dynamics of on-line learning in multilayer neural networks where training examples are sampled with repetition and where the number of examples scales with the number of network weights. The analysis is carried out using the dynamical replica method aimed at obtaining a closed set of coupled equations for a set of macroscopic variables from which both training and generalization errors can be calculated. We focus on scenarios in which training examples are corrupted by additive Gaussian output noise and regularizers are introduced to improve the network performance. The dependence of the dynamics on the noise level, with and without regularizers, is examined, as well as that of the asymptotic values obtained for both training and generalization errors. We also demonstrate the ability of the method to approximate the learning dynamics in structurally unrealizable scenarios. The theoretical results show good agreement with those obtained by computer simulations.
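A rough sketch of the restricted-training-set scenario described above follows. It makes strong simplifying assumptions that are not taken from the paper: a single erf unit stands in for the multilayer network, the regularizer is plain weight decay, and all names and default values are hypothetical. The point is only to show sampling with repetition from a fixed set of p = alpha * N noisy examples, and the separate measurement of training and generalization error.

```python
import math
import random

def g(x):
    # erf activation (assumption)
    return math.erf(x / math.sqrt(2.0))

def g_prime(x):
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def half_mse(w, examples):
    # Half the mean squared error of a single-unit student on (input, label) pairs.
    return 0.5 * sum((y - g(dot(w, x))) ** 2 for x, y in examples) / len(examples)

def train_restricted(n=20, alpha=2.0, eta=0.5, decay=0.01,
                     noise_std=0.3, steps=4000, seed=0):
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

    def draw_x():
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Fixed training set: p = alpha * n examples with noisy labels.
    train = []
    for _ in range(int(alpha * n)):
        x = draw_x()
        train.append((x, g(dot(teacher, x)) + rng.gauss(0.0, noise_std)))

    # Fresh, noise-free examples used only to estimate generalization error.
    fresh = []
    for _ in range(500):
        x = draw_x()
        fresh.append((x, g(dot(teacher, x))))

    w = [0.0] * n
    for _ in range(steps):
        x, y = rng.choice(train)  # sampling with repetition from the fixed set
        h = dot(w, x)
        delta = (y - g(h)) * g_prime(h)
        for i in range(n):
            # Gradient step plus weight decay as the regularizer.
            w[i] += (eta / n) * (delta * x[i] - decay * w[i])
    return half_mse(w, train), half_mse(w, fresh)
```

`train_restricted()` returns the pair (training error, generalization error). Comparing runs with `decay=0.0` and `decay>0` at different `noise_std` values is one way to explore the trade-off the abstract refers to.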

Crossref

Aston Publications Explorer

A practical Bayesian framework for backpropagation networks

Author: MacKay David J. C.
Publication venue: 'MIT Press - Journals'
Publication date: 01/05/1992

A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
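For orientation, item (4) in the abstract above is commonly written as follows in the evidence framework. The notation is an assumption, since the abstract gives no symbols: E_D is the data error, E_W the weight-decay term, alpha and beta their coefficients, k the number of weights, and lambda_a the eigenvalues of the Hessian of beta E_D at the most probable weights.

```latex
% Assumed notation; the abstract itself does not define these symbols.
\[
  M(\mathbf{w}) = \beta\, E_D(\mathbf{w}) + \alpha\, E_W(\mathbf{w}),
  \qquad
  E_W(\mathbf{w}) = \tfrac{1}{2} \sum_{i=1}^{k} w_i^2 ,
\]
\[
  \gamma = \sum_{a=1}^{k} \frac{\lambda_a}{\lambda_a + \alpha},
  \qquad 0 \le \gamma \le k .
\]
```

Under this reading, directions in weight space with lambda_a much larger than alpha are well determined by the data and each contribute close to 1 to gamma, while directions dominated by the weight-decay prior contribute close to 0.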

Caltech Authors

The theory of on-line learning: a statistical physics approach

Author: Saad David
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003

In this paper we review recent theoretical approaches for analysing the dynamics of on-line learning in multilayer neural networks using methods adopted from statistical physics. The analysis is based on monitoring a set of macroscopic variables from which the generalisation error can be calculated. A closed set of dynamical equations for the macroscopic variables is derived analytically and solved numerically. The theoretical framework is then employed for defining optimal learning parameters and for analysing the incorporation of second order information into the learning process using natural gradient descent and matrix-momentum based methods. We also briefly explain an extension of the original framework for analysing the case where training examples are sampled with repetition.

Aston Publications Explorer

Final report on EPSRC research grant GR/L19232

Author: Saad David
Publication venue: Aston University
Publication date: 01/01/1999

Aston Publications Explorer