Learning from Minimum Entropy Queries in a Large Committee Machine
In supervised learning, the redundancy contained in random examples can be
avoided by learning from queries. Using statistical mechanics, we study
learning from minimum entropy queries in a large tree-committee machine. The
generalization error decreases exponentially with the number of training
examples, providing a significant improvement over the algebraic decay for
random examples. The connection between entropy and generalization error in
multi-layer networks is discussed, and a computationally cheap algorithm for
constructing queries is suggested and analysed. Comment: 4 pages, REVTeX, multicol, epsf, two
postscript figures. To appear in Physical Review E (Rapid Communications).
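The paper's minimum-entropy queries are defined within its statistical-mechanics framework; as a rough illustration of the "computationally cheap" committee-based flavour of such query construction, the sketch below (all function and variable names are ours, not the paper's) picks, from a pool of random candidates, the input on which an ensemble of hypotheses consistent with the data so far disagrees most, i.e. whose predicted label has maximal vote entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

def tree_committee_output(W, x):
    """Output of a tree committee machine: K hidden units, each seeing its own
    block of the input; the machine's output is the majority vote."""
    K, n = W.shape                                   # K hidden units, n inputs per branch
    fields = np.einsum('ki,ki->k', W, x.reshape(K, n))
    return np.sign(np.sign(fields).sum() + 1e-12)

def select_query(candidates, ensemble):
    """Cheap query selection in the spirit of query-by-committee: among random
    candidate inputs, return the one on which the ensemble of hypotheses
    disagrees most (highest empirical entropy of the predicted label)."""
    best, best_score = None, -np.inf
    for x in candidates:
        votes = np.array([tree_committee_output(W, x) for W in ensemble])
        p = np.clip(0.5 * (1.0 + votes.mean()), 1e-12, 1 - 1e-12)   # fraction of +1 votes
        entropy = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
        if entropy > best_score:
            best, best_score = x, entropy
    return best, best_score

# toy usage: K = 5 branches of 20 inputs each, an ensemble of 50 hypotheses
K, n, M = 5, 20, 50
ensemble = [rng.standard_normal((K, n)) for _ in range(M)]
candidates = rng.choice([-1.0, 1.0], size=(200, K * n))
x_star, score = select_query(candidates, ensemble)
print(f"selected query with vote entropy {score:.3f} bits")
```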
Phase Transitions of Neural Networks
The cooperative behaviour of interacting neurons and synapses is studied
using models and methods from statistical physics. The competition between
training error and entropy may lead to discontinuous properties of the neural
network. This is demonstrated for a few examples: Perceptron, associative
memory, learning from examples, generalization, multilayer networks, structure
recognition, Bayesian estimate, on-line training, noise estimation and time
series generation. Comment: Plenary talk for MINERVA workshop on mesoscopics, fractals and
neural networks, Eilat, March 1997. Postscript file.
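Schematically, the competition between training error and entropy mentioned in this abstract can be phrased as Gibbs learning at a training temperature; the equations below are a generic textbook-style summary of that setup (not formulas quoted from the talk), showing how discontinuous behaviour can arise.

```latex
% Gibbs learning: posterior over couplings w at training temperature T = 1/beta,
% with E_t(w) the training error (number of misclassified examples).
\begin{align}
  P(\mathbf{w}) &= \frac{1}{Z}\, e^{-\beta E_t(\mathbf{w})},
  \qquad
  Z = \int \! d\mu(\mathbf{w})\, e^{-\beta E_t(\mathbf{w})}, \\
  f(T) &= \min_{\epsilon_t}\bigl[\,\epsilon_t - T\, s(\epsilon_t)\,\bigr],
\end{align}
% where s(eps_t) is the entropy of couplings with training error eps_t.
% When f(T) develops two competing local minima in eps_t, the equilibrium
% training error (and with it the generalization error) jumps discontinuously
% as T or the number of examples is varied: a first-order phase transition.
```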
Error correcting code using tree-like multilayer perceptron
An error correcting code using a tree-like multilayer perceptron is proposed.
An original message $\boldsymbol{s}^0$ is encoded into a codeword $\boldsymbol{y}_0$
using a tree-like committee machine (committee tree) or a tree-like parity
machine (parity tree). Based on these architectures, several schemes featuring
monotonic or non-monotonic units are introduced. The codeword $\boldsymbol{y}_0$ is
then transmitted via a Binary Asymmetric Channel (BAC) where it is corrupted by
noise. The analytical performance of these schemes is investigated using the
replica method of statistical mechanics. Under some specific conditions, some
of the proposed schemes are shown to saturate the Shannon bound at the infinite
codeword length limit. The influence of the monotonicity of the units on the
performance is also discussed. Comment: 23 pages, 3 figures. Content has been extended and revised.
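Perceptron-type codes of this family generally encode a message by feeding it, together with publicly known random input vectors, through the network; the sketch below illustrates a parity-tree encoder and a binary asymmetric channel under that generic scheme (names and the exact construction are ours; the paper's monotonic/non-monotonic variants may differ, and the hard part, decoding $\boldsymbol{s}^0$ from the noisy codeword, which the replica analysis addresses, is omitted).

```python
import numpy as np

rng = np.random.default_rng(1)

def encode_parity_tree(s, X):
    """Encode a +/-1 message s (length N = K*n) into M codeword bits.
    Hidden unit l sees its own branch s_l of the message and a random input
    vector X[mu, l]; each codeword bit is the parity (product) of the K
    hidden sign units -- a 'parity tree' acting on the message."""
    M, K, n = X.shape
    branches = s.reshape(K, n)                        # split message into K branches
    fields = np.einsum('mkn,kn->mk', X, branches)     # local fields, shape (M, K)
    return np.prod(np.sign(fields), axis=1)           # parity of hidden outputs

def binary_asymmetric_channel(y, p_flip_plus, p_flip_minus):
    """Binary asymmetric channel: +1 bits flip with prob p_flip_plus,
    -1 bits flip with prob p_flip_minus."""
    u = rng.random(y.shape)
    flip = np.where(y > 0, u < p_flip_plus, u < p_flip_minus)
    return np.where(flip, -y, y)

# toy usage: N = 60 message bits, K = 3 branches, code rate R = N / M
K, n, M = 3, 20, 300
N = K * n
s0 = rng.choice([-1.0, 1.0], size=N)                  # original message s^0
X = rng.standard_normal((M, K, n))                    # random inputs known to both ends
y0 = encode_parity_tree(s0, X)                        # codeword y_0
y_noisy = binary_asymmetric_channel(y0, 0.05, 0.15)   # corrupted by the BAC
print("fraction of corrupted codeword bits:", np.mean(y_noisy != y0))
```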
Learning and generalization theories of large committee machines
The study of the distribution of volumes associated to the internal
representations of learning examples allows us to derive the critical learning
capacity of large committee machines, to verify the stability of the solution
in the limit of a large number of hidden units, and to find a Bayesian
generalization cross-over. Comment: 14 pages, RevTeX.
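As background for readers unfamiliar with the volume-based approach, the central object in such calculations is (schematically) the Gardner volume of couplings compatible with the training set, decomposed over internal representations; the expression below is a generic statement of this setup, not a formula quoted from the paper.

```latex
% A tree committee machine with K hidden units classifies pattern xi^mu as
%   sigma(xi) = sgn( sum_l sgn(w_l . xi_l) ).
% The volume of couplings compatible with p = alpha*N labelled examples is
\begin{equation}
  V \;=\; \int \! d\mu(\mathbf{w})\,
  \prod_{\mu=1}^{p}
  \theta\!\Bigl( \sigma^{\mu}\,
  \operatorname{sgn}\!\Bigl[\textstyle\sum_{l=1}^{K}
  \operatorname{sgn}\bigl(\mathbf{w}_{l}\!\cdot\!\boldsymbol{\xi}^{\mu}_{l}\bigr)\Bigr]\Bigr)
  \;=\; \sum_{\{\boldsymbol{\tau}^{\mu}\}} V\bigl(\{\boldsymbol{\tau}^{\mu}\}\bigr),
\end{equation}
% where the sum runs over the internal representations tau^mu (the K hidden
% outputs on each example) consistent with the labels sigma^mu, and V({tau})
% is the volume of couplings realizing those representations. The critical
% capacity is the largest alpha at which the typical volume stays non-vanishing.
```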
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good minimizers
without getting stuck in local critical points, and such minimizers are often
effective at avoiding overfitting. How these two
features can be kept under control in nonlinear devices composed of millions of
tunable connections is a profound and far-reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessian and their generalization performance on real data. Comment: 37 pages (16 main text), 10 figures (7 main text).
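The paper's entropy-driven greedy and message-passing algorithms are not reproduced here; as a rough operational illustration of what distinguishes a wide flat minimum, the sketch below (a toy perceptron on random patterns, all names ours) probes how quickly the training error of a found minimizer grows when the weights are perturbed by increasing amounts. A minimizer lying in a wide flat region keeps the error low even for sizeable perturbations.

```python
import numpy as np

rng = np.random.default_rng(2)

def train_perceptron(X, y, epochs=200):
    """Plain perceptron learning on random +/-1 patterns (a stand-in for the
    non-convex one- and two-layer models studied in the paper)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            if np.sign(x @ w + 1e-12) != t:
                w += t * x
    return w / np.linalg.norm(w)

def error_count(w, X, y):
    return int(np.sum(np.sign(X @ w + 1e-12) != y))

def local_flatness_profile(w, X, y, radii, n_samples=200):
    """Crude flatness probe: mean training error of random perturbations of w
    at increasing relative radius r."""
    profile = []
    for r in radii:
        errs = []
        for _ in range(n_samples):
            d = rng.standard_normal(w.shape)
            d *= r / np.linalg.norm(d)        # perturbation of relative size r
            wp = w + d
            wp /= np.linalg.norm(wp)
            errs.append(error_count(wp, X, y))
        profile.append(float(np.mean(errs)))
    return profile

# toy usage: N = 200 weights, P = 100 random patterns (alpha = 0.5)
N, P = 200, 100
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)
w = train_perceptron(X, y)
radii = [0.1, 0.3, 0.6]
for r, e in zip(radii, local_flatness_profile(w, X, y, radii)):
    print(f"perturbation radius {r:.1f}: mean training errors {e:.1f} / {P}")
```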