Statistical Mechanics of Nonlinear On-line Learning for Ensemble Teachers
We analyze the generalization performance of a student in a model composed of
nonlinear perceptrons: a true teacher, ensemble teachers, and the student. We
calculate the generalization error of the student analytically or numerically
using statistical mechanics in the framework of on-line learning. We treat two
well-known learning rules: Hebbian learning and perceptron learning. As a
result, it is proven that the nonlinear model shows qualitatively different
behaviors from the linear model. Moreover, it is clarified that Hebbian
learning and perceptron learning show qualitatively different behaviors from
each other. In Hebbian learning, we can analytically obtain the solutions. In
this case, the generalization error monotonically decreases. The steady value
of the generalization error is independent of the learning rate. The larger the
number of teachers and the greater the variety among the ensemble teachers, the
smaller the generalization error. In perceptron learning, the solutions must be
obtained numerically. In this case, the dynamical behavior of the
generalization error is non-monotonic. The smaller the learning rate, the
larger the number of teachers, and the greater the variety among the ensemble
teachers, the smaller the minimum value of the generalization error.
Comment: 13 pages, 9 figures
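As a hedged illustration of the two rules compared above, here is a minimal on-line simulation for a single fixed teacher (not the paper's ensemble-teachers setup) with Gaussian inputs; all variable names and parameter values are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500            # input dimension
eta = 0.1          # learning rate
steps = 20 * N     # number of on-line examples

# fixed teacher with |B| = sqrt(N)
B = rng.standard_normal(N)
B *= np.sqrt(N) / np.linalg.norm(B)

J_hebb = np.zeros(N)
J_perc = np.zeros(N)

for _ in range(steps):
    x = rng.standard_normal(N)
    t = np.sign(B @ x)                       # teacher label
    # Hebbian: update on every example, regardless of the student's output
    J_hebb += (eta / np.sqrt(N)) * t * x
    # Perceptron: update only when the student misclassifies
    if np.sign(J_perc @ x) != t:
        J_perc += (eta / np.sqrt(N)) * t * x

def overlap(J):
    return (J @ B) / (np.linalg.norm(J) * np.linalg.norm(B))

# for sign outputs the generalization error is eps = arccos(R)/pi
for name, J in [("Hebbian", J_hebb), ("Perceptron", J_perc)]:
    R = overlap(J)
    print(name, "overlap", round(R, 3), "eps", round(np.arccos(R) / np.pi, 3))
```

Note that the Hebbian overlap is invariant under rescaling of J, which is the simplest reflection of the abstract's claim that the steady generalization error does not depend on the learning rate.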
Optimisation of on-line principal component analysis
Different techniques, used to optimise on-line principal component analysis,
are investigated by methods of statistical mechanics. These include local and
global optimisation of node-dependent learning-rates which are shown to be very
efficient in speeding up the learning process. They are investigated further
for gaining insight into the learning rates' time-dependence, which is then
employed for devising simple practical methods to improve training performance.
Simulations demonstrate the benefit gained from using the new methods.
Comment: 10 pages, 5 figures
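A minimal sketch of the underlying on-line PCA update (Oja's rule) with a simple annealed learning rate, extracting a single component; this stands in for, but is much simpler than, the node-dependent learning-rate schemes the paper optimises, and the data distribution and schedule are our own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
u = np.zeros(N); u[0] = 1.0          # dominant principal direction of the data

def sample():
    # covariance I + 9*u u^T: one strong component plus isotropic noise
    return 3.0 * rng.standard_normal() * u + rng.standard_normal(N)

w = rng.standard_normal(N)
w /= np.linalg.norm(w)
for t in range(1, 5001):
    eta = 1.0 / (t + 100)            # annealed (time-dependent) learning rate
    x = sample()
    y = w @ x
    w += eta * y * (x - y * w)       # Oja's on-line PCA rule

print(abs(w @ u) / np.linalg.norm(w))   # |overlap| with the leading eigenvector
```

The decaying schedule is the crudest version of the time-dependent learning rates the abstract refers to; the paper's point is that better schedules, derived from the statistical mechanics analysis, speed this convergence up considerably.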
On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -
The permutation symmetry of the hidden units in multilayer perceptrons causes
the saddle structure and plateaus of the learning dynamics in gradient learning
methods. The correlation of the weight vectors of hidden units in a teacher
network is thought to affect this saddle structure, resulting in a prolonged
learning time, but this mechanism is still unclear. In this paper, we discuss
it with regard to soft committee machines and on-line learning using
statistical mechanics. Conventional gradient descent needs more time to break
the symmetry as the correlation of the teacher weight vectors rises. On the
other hand, no plateaus occur with natural gradient descent regardless of the
correlation in the limit of a small learning rate. Analytical results support
these dynamics around the saddle point.
Comment: 7 pages, 6 figures
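The training setup the abstract refers to can be sketched as on-line gradient descent for a two-hidden-unit soft committee machine whose teacher hidden vectors have correlation c. This is a hedged toy version: we use tanh in place of the erf activation common in this literature, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, c = 100, 2, 0.5               # input dim, hidden units, teacher correlation
g = np.tanh                          # sigmoidal activation (the papers often use erf)

# teacher with correlated hidden weight vectors: B2 = c*B1 + sqrt(1-c^2)*B_perp
B1 = rng.standard_normal(N)
Bp = rng.standard_normal(N)
Bp -= (Bp @ B1) / (B1 @ B1) * B1     # orthogonalise Bp against B1
B1 *= np.sqrt(N) / np.linalg.norm(B1)
Bp *= np.sqrt(N) / np.linalg.norm(Bp)
B = np.vstack([B1, c * B1 + np.sqrt(1 - c**2) * Bp])

J = 0.01 * rng.standard_normal((K, N))   # student, small random initialisation
eta = 0.5
errs = []
for step in range(20000):
    x = rng.standard_normal(N)
    y = g(B @ x / np.sqrt(N)).sum()      # teacher output
    h = J @ x / np.sqrt(N)
    delta = g(h).sum() - y
    errs.append(0.5 * delta**2)
    # on-line (stochastic) gradient descent on the squared error
    J -= (eta / np.sqrt(N)) * delta * (1 - g(h) ** 2)[:, None] * x
```

Plotting a running average of errs against step exposes the plateau: the permutation symmetry between the student's hidden units must be broken before the error can drop further, and this takes longer the larger c is.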
Statistical Mechanics of On-Line Learning Under Concept Drift
We introduce a modeling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: in the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ); in the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e. the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics which have been used frequently for the exact description of training dynamics in stationary environments. The extensions allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient-based training of layered neural networks for regression.
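As a hedged sketch of the LVQ-under-drift setting (our own toy construction, not the paper's model): two class centers rotate slowly while an LVQ1 student with one labeled prototype per class is trained on-line and has to track them:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 20
eta = 0.2
omega = 0.002                           # angular drift speed of the class centers

w = 0.1 * rng.standard_normal((2, N))   # one LVQ prototype per class
errors = []
for t in range(5000):
    theta = omega * t
    c = np.zeros(N)
    c[0], c[1] = np.cos(theta), np.sin(theta)
    mu = np.vstack([c, -c])             # slowly rotating cluster centers (the drift)
    label = int(rng.integers(2))
    x = mu[label] + 0.3 * rng.standard_normal(N)
    d = np.linalg.norm(w - x, axis=1)
    j = int(d.argmin())                 # winning prototype
    errors.append(j != label)
    sign = 1.0 if j == label else -1.0  # LVQ1: attract if correct, repel otherwise
    w[j] += sign * eta * (x - w[j])

print(np.mean(errors[-1000:]))          # tracking error late in training
```

Because each update pulls the winning prototype a fixed fraction eta toward the current example, the prototypes follow the rotating centers with only a small lag, which is the "tracking to a non-trivial extent" the abstract reports for LVQ.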
Globally optimal on-line learning rules for multi-layer neural networks
We present a method for determining the globally optimal on-line learning rule for a soft committee machine within a statistical mechanics framework. This rule maximizes the total reduction in generalization error over the whole learning process. A simple example demonstrates that the locally optimal rule, which maximizes the rate of decrease in generalization error, may perform poorly by comparison.
On-line Learning of an Unlearnable True Teacher through Mobile Ensemble Teachers
On-line learning in a hierarchical model is studied by a method from
statistical mechanics. In our model, a simple-perceptron student learns not
from the true teacher directly but from ensemble teachers who themselves learn
from the true teacher with a perceptron learning rule. Since the true teacher
is a non-monotonic perceptron while the ensemble teachers are simple
perceptrons, the ensemble teachers circle the unlearnable true teacher at a
fixed distance in the asymptotic steady state. The generalization performance
of the student is shown to exceed that of the ensemble teachers in a transient
state, as was shown in similar ensemble-teachers models. Further, it is found
that moving the ensemble teachers even in the steady state, in contrast to
keeping them fixed, improves the performance of the student.
Comment: 18 pages, 8 figures
Optimization of the Asymptotic Property of Mutual Learning Involving an Integration Mechanism of Ensemble Learning
We propose an optimization method for mutual learning that converges to the
same state as optimal ensemble learning within the framework of on-line
learning, and we analyze its asymptotic property by the statistical mechanics
method. The proposed model consists of two learning steps: two students
independently learn from a teacher, and then the students learn from each
other through mutual learning. In mutual learning, the generalization error
improves even though the teacher takes no part in this stage. However, when
the initial overlaps (direction cosines) between the teacher and the students
differ, the student with the larger initial overlap tends to end up with a
larger generalization error than it had before the mutual learning. To
overcome this problem, our optimization method chooses the step sizes of the
two students so as to minimize the asymptotic value of the generalization
error. Consequently, the optimized mutual learning converges to a
generalization error identical to that of optimal ensemble learning. In
addition, we show the relationship between the optimum step sizes of the
mutual learning and the integration mechanism of ensemble learning.
Comment: 13 pages, 3 figures, submitted to Journal of the Physical Society of
Japan
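The two-step scheme described above can be sketched with perceptron students and equal (non-optimised) step sizes; the teacher here is a simple perceptron, and all parameter values are illustrative rather than the paper's optimal choices:

```python
import numpy as np

rng = np.random.default_rng(4)
N, eta = 200, 0.1

B = rng.standard_normal(N)                    # teacher weight vector
J = [0.1 * rng.standard_normal(N) for _ in range(2)]

def overlap(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# step 1: two students independently learn from the teacher (perceptron rule)
for _ in range(2000):
    x = rng.standard_normal(N)
    t = np.sign(B @ x)
    for k in range(2):
        if np.sign(J[k] @ x) != t:
            J[k] += eta * t * x / np.sqrt(N)

# step 2: mutual learning -- each student learns from the other's output,
# with no further access to the teacher
for _ in range(2000):
    x = rng.standard_normal(N)
    y = [np.sign(J[k] @ x) for k in range(2)]
    for k in range(2):
        if y[k] != y[1 - k]:
            J[k] += eta * y[1 - k] * x / np.sqrt(N)

print([round(overlap(J[k], B), 3) for k in range(2)])
```

In this sketch both students use the same step size; the paper's contribution is to choose the two step sizes asymmetrically so that the asymptotic generalization error matches that of optimal ensemble learning.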
The role of biases in on-line learning of two-layer networks
The influence of biases on the learning dynamics of a two-layer neural network, a normalized soft-committee machine, is studied for on-line gradient descent learning. Within a statistical mechanics framework, numerical studies show that the inclusion of adjustable biases dramatically alters the learning dynamics found previously. The symmetric phase, which has often been predominant in the original model, all but disappears for a non-degenerate bias task. The extended model furthermore exhibits a much richer dynamical behavior, e.g. attractive suboptimal symmetric phases even for realizable cases and noiseless data.
Natural gradient matrix momentum
Natural gradient learning is an efficient and principled method for improving on-line learning. In practical applications, however, there is an increased cost in estimating and inverting the Fisher information matrix. We propose to use the matrix momentum algorithm in order to carry out efficient inversion, and we study the efficacy of a single-step estimation of the Fisher information matrix. We analyse the proposed algorithm in a two-layer network, using a statistical mechanics framework which allows us to describe the learning dynamics analytically, and compare its performance with true natural gradient learning and standard gradient descent.
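For orientation, here is a baseline sketch of on-line natural gradient learning with a smoothed single-sample Fisher estimate and an explicit, damped matrix inversion, for a simple logistic-regression student. This is our own toy construction, not the paper's algorithm; the paper's point is precisely that matrix momentum can replace the explicit solve below:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
B = rng.standard_normal(N)                 # target weights of a logistic teacher
w = np.zeros(N)
lam = 1e-2                                 # damping so the inversion stays stable

F = np.eye(N)                              # smoothed single-sample Fisher estimate
for t in range(1, 4001):
    eta = 2.0 / (t + 20)                   # annealed learning rate
    x = rng.standard_normal(N)
    y = float(rng.random() < 1.0 / (1.0 + np.exp(-B @ x)))  # noisy teacher label
    p = 1.0 / (1.0 + np.exp(-w @ x))
    grad = (p - y) * x                     # on-line gradient of the logistic loss
    # single-sample Fisher estimate of the output model, smoothed over time
    F = 0.99 * F + 0.01 * p * (1.0 - p) * np.outer(x, x)
    w -= eta * np.linalg.solve(F + lam * np.eye(N), grad)   # natural gradient step

align = (w @ B) / (np.linalg.norm(w) * np.linalg.norm(B))
print(round(float(align), 3))
```

The explicit solve costs O(N^3) per step, which is what motivates approximate-inversion schemes such as matrix momentum in larger networks.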