3,036 research outputs found
On-line learning of non-monotonic rules by simple perceptron
We study the generalization ability of a simple perceptron which learns
unlearnable rules. The rules are presented by a teacher perceptron with a
non-monotonic transfer function. The student is trained in the on-line mode.
The asymptotic behaviour of the generalization error is estimated under various
conditions. Several learning strategies are proposed and improved to obtain the
theoretical lower bound of the generalization error.
Comment: LaTeX 20 pages using IOP LaTeX preprint style file, 14 figures
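The setting described here, a student perceptron trained on-line on examples labelled by a non-monotonic teacher, can be sketched as follows. This is a minimal illustration only: the reversed-wedge transfer function, window width, dimension and learning rate are all assumed choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
N, a, eta = 100, 1.5, 0.5          # dimension, wedge width, learning rate (assumed)

B = rng.standard_normal(N)
B /= np.linalg.norm(B)             # teacher direction, |B| = 1
J = rng.standard_normal(N) * 0.1   # student weights

def teacher(u):
    # non-monotonic "reversed-wedge" transfer: the sign flips outside |u| <= a,
    # so no simple perceptron can realize the rule exactly (it is unlearnable)
    return np.sign(u * (a - u) * (a + u))

def gen_error(J, n_test=2000):
    X = rng.standard_normal((n_test, N))
    return np.mean(np.sign(X @ J) != teacher(X @ B))

for t in range(5000):
    x = rng.standard_normal(N)
    y = teacher(B @ x)
    if y * (J @ x) <= 0:           # error-driven perceptron update
        J += eta * y * x / np.sqrt(N)

err = gen_error(J)                 # plateaus at a non-zero value: the rule is unlearnable
```

Because the rule is unlearnable, the generalization error saturates above zero no matter how long the student trains, which is why the paper focuses on strategies that approach the theoretical lower bound.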
Training a perceptron in a discrete weight space
On-line and batch learning of a perceptron in a discrete weight space, where
each weight can take different values, are examined analytically and
numerically. The learning algorithm is based on the training of the continuous
perceptron and prediction following the clipped weights. The learning is
described by a new set of order parameters, composed of the overlaps between
the teacher and the continuous/clipped students. Different scenarios are
examined, among them on-line learning with discrete/continuous transfer
functions and off-line Hebb learning. The generalization error of the clipped
weights decays asymptotically as exp(-Kα)/exp(-Kα²) in the case of on-line
learning with binary/continuous activation functions, respectively, where α
is the number of examples divided by N, the size of the input vector, and K
is a positive constant that decays linearly with 1/L. For finite N and L, a
perfect agreement between the discrete student and the teacher is obtained
for α ∝ √(L ln(NL)). A crossover to the generalization error ∝ 1/α,
characterizing continuous weights with binary output, is obtained for
synaptic depth L > O(√N).
Comment: 10 pages, 5 figs., submitted to PR
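The training scheme described, updating a continuous perceptron while predicting with its clipped weights, can be sketched as follows. This toy version uses binary clipping (synaptic depth L = 1) and an off-line-style Hebbian rule; the sizes and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
B = rng.choice([-1.0, 1.0], size=N)   # binary teacher weights (assumed setup)
J = np.zeros(N)                        # continuous student

def clip_weights(J):
    # project the continuous student onto the discrete (here binary) weight space
    return np.where(J >= 0, 1.0, -1.0)

for t in range(3000):
    x = rng.standard_normal(N)
    y = np.sign(B @ x)                 # teacher label
    J += y * x / np.sqrt(N)            # Hebb update of the continuous weights

Jc = clip_weights(J)                   # prediction follows the clipped weights
X_test = rng.standard_normal((2000, N))
err = np.mean(np.sign(X_test @ Jc) != np.sign(X_test @ B))
```

The continuous overlap with the teacher grows steadily, and once each weight's sign is reliably determined the clipped student's error collapses, which is the mechanism behind the exponential asymptotic decay quoted in the abstract.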
On-Line AdaTron Learning of Unlearnable Rules
We study the on-line AdaTron learning of linearly non-separable rules by a
simple perceptron. Training examples are provided by a perceptron with a
non-monotonic transfer function which reduces to the usual monotonic relation
in a certain limit. We find that, although the on-line AdaTron learning is a
powerful algorithm for the learnable rule, it does not give the best possible
generalization error for unlearnable problems. Optimization of the learning
rate is shown to greatly improve the performance of the AdaTron algorithm,
leading to the best possible generalization error for a wide range of the
parameter which controls the shape of the transfer function.
Comment: RevTeX 17 pages, 8 figures, to appear in Phys.Rev.
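The AdaTron rule referred to here updates the student only on mistakes, with a step proportional to the student's own local field. A minimal sketch in the learnable (monotonic-teacher) limit, with an assumed fixed learning rate rather than the optimized rate schedule the paper derives:

```python
import numpy as np

rng = np.random.default_rng(2)
N, eta = 200, 1.0                      # dimension and fixed learning rate (assumed)
B = rng.standard_normal(N)
B /= np.linalg.norm(B)                 # teacher direction
J = rng.standard_normal(N) * 0.1       # student weights

def adatron_step(J, x, y, eta):
    u = J @ x
    if y * u <= 0:                     # error-driven: update only on mistakes
        J = J - eta * u * x / len(x)   # step proportional to the student's field
    return J

for t in range(5000):
    x = rng.standard_normal(N)
    y = np.sign(B @ x)                 # monotonic teacher (the learnable limit)
    J = adatron_step(J, x, y, eta)

X_test = rng.standard_normal((2000, N))
err = np.mean(np.sign(X_test @ J) != np.sign(X_test @ B))
```

With eta = 1 each update exactly cancels the erroneous field on the current example; the paper's point is that for unlearnable (non-monotonic) teachers this fixed-rate version is no longer optimal and the rate must be tuned.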
HyperAdam: A Learnable Task-Adaptive Adam for Network Training
Deep neural networks are traditionally trained using human-designed
stochastic optimization algorithms, such as SGD and Adam. Recently, the
approach of learning to optimize network parameters has emerged as a promising
research topic. However, these learned black-box optimizers sometimes do not
fully exploit the experience embodied in human-designed optimizers and
therefore have limited generalization ability. In this paper, a new optimizer,
dubbed HyperAdam, is proposed that combines the idea of "learning to
optimize" with the traditional Adam optimizer. Given a network for training, its
parameter update in each iteration generated by HyperAdam is an adaptive
combination of multiple updates generated by Adam with varying decay rates. The
combination weights and decay rates in HyperAdam are adaptively learned
depending on the task. HyperAdam is modeled as a recurrent neural network with
AdamCell, WeightCell and StateCell. It is shown to achieve state-of-the-art
performance in training various networks, such as multilayer perceptrons,
CNNs and LSTMs.
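The core idea, a single update that adaptively mixes several Adam candidates with different second-moment decay rates, can be sketched without the recurrent controller. Here the combination weights are a fixed softmax stand-in for the RNN output, and the decay rates, learning rate and toy quadratic task are all assumed for illustration:

```python
import numpy as np

betas2 = [0.9, 0.99, 0.999]            # candidate second-moment decay rates (assumed)

def hyperadam_step(theta, grad, state, weights, lr=1e-2, beta1=0.9, eps=1e-8):
    # state: one (m, v) moment pair per candidate decay rate
    updates = []
    for (m, v), b2 in zip(state, betas2):
        m[:] = beta1 * m + (1 - beta1) * grad
        v[:] = b2 * v + (1 - b2) * grad ** 2
        updates.append(m / (np.sqrt(v) + eps))   # one Adam-style candidate update
    # adaptive combination; in HyperAdam these weights come from the RNN cells
    w = np.exp(weights) / np.exp(weights).sum()
    combined = sum(wi * ui for wi, ui in zip(w, updates))
    return theta - lr * combined

# toy task: minimise f(theta) = |theta|^2
theta = np.array([3.0, -2.0])
state = [(np.zeros_like(theta), np.zeros_like(theta)) for _ in betas2]
weights = np.zeros(len(betas2))        # uniform mixing, standing in for learned weights
for _ in range(500):
    theta = hyperadam_step(theta, 2 * theta, state, weights)
```

In the actual method the mixing weights and decay rates are produced per task by the recurrent network; the sketch only shows the shape of the combined update.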
A practical Bayesian framework for backpropagation networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
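For the linear-Gaussian case the "evidence" this framework relies on has a closed form, which makes the idea easy to sketch. This is a toy illustration only, not the full Laplace treatment for nonlinear networks; the data, prior precision alpha and noise precision beta are assumed values.

```python
import numpy as np

rng = np.random.default_rng(3)
# toy data: y = X @ w_true + noise
X = rng.standard_normal((50, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(50)

def log_evidence(X, y, alpha, beta):
    # log marginal likelihood of the data under a Gaussian weight prior
    # (precision alpha) and Gaussian noise (precision beta)
    n, d = X.shape
    A = alpha * np.eye(d) + beta * X.T @ X        # posterior precision
    w_map = beta * np.linalg.solve(A, X.T @ y)    # posterior mean
    E = beta / 2 * np.sum((y - X @ w_map) ** 2) + alpha / 2 * w_map @ w_map
    return (d / 2 * np.log(alpha) + n / 2 * np.log(beta) - E
            - 0.5 * np.linalg.slogdet(A)[1] - n / 2 * np.log(2 * np.pi))

def effective_params(X, alpha, beta):
    # gamma = sum_i lambda_i / (lambda_i + alpha): the number of
    # well-determined parameters, item (4) in the abstract
    lam = beta * np.linalg.eigvalsh(X.T @ X)
    return np.sum(lam / (lam + alpha))
```

Comparing the evidence across alpha values is the "objective choice of weight decay" of item (3): an absurdly strong prior that forces the weights to zero is penalized automatically.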
Adaptive Normalized Risk-Averting Training For Deep Neural Networks
This paper proposes a set of new error criteria and learning approaches,
Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex
optimization problem in training deep neural networks (DNNs). Theoretically, we
demonstrate its effectiveness on global and local convexity lower-bounded by
the standard L^p-norm error. By analyzing the gradient with respect to the
convexity index λ, we explain why learning λ adaptively by
gradient descent works. In practice, we show how this method improves training
of deep neural networks to solve visual recognition tasks on the MNIST and
CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results
comparable or superior to those reported in recent literature on the same tasks
using standard ConvNets + MSE/cross entropy. Performance on deep/shallow
multilayer perceptrons and Denoising Auto-encoders is also explored. ANRAT can
be combined with other quasi-Newton training methods, innovative network
variants, regularization techniques and other specific tricks in DNNs. Other
than unsupervised pretraining, it provides a new perspective to address the
non-convex optimization problem in DNNs.
Comment: AAAI 2016, 0.39%~0.4% error rate on MNIST with a single 32-32-256-10 ConvNet,
code available at https://github.com/cauchyturing/ANRA
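The risk-averting error family behind ANRAT can be sketched numerically: a log-mean-exp of squared residuals whose convexity index λ interpolates between plain MSE (λ → 0) and the worst-case residual (λ → ∞). This is a simplified stand-in for the paper's normalized criterion, with λ held fixed rather than learned by gradient descent:

```python
import numpy as np

def nrae(residuals, lam):
    # risk-averting error: (1/lam) * log-mean-exp(lam * e_i^2).
    # As lam -> 0 it reduces to the mean-squared error; as lam grows it
    # emphasises the worst residuals, which is what convexifies the landscape.
    e2 = residuals ** 2
    m = np.max(lam * e2)                       # log-sum-exp stabilisation
    return (m + np.log(np.mean(np.exp(lam * e2 - m)))) / lam

r = np.array([0.1, 0.2, 1.0])                  # toy residuals (assumed)
mse_like = nrae(r, 1e-6)                       # ~ mean(r**2)
worst_like = nrae(r, 100.0)                    # ~ max(r**2)
```

ANRAT's contribution is to adapt λ during training rather than fixing it; the sketch only shows the criterion being interpolated.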
Towards Accurate One-Stage Object Detection with AP-Loss
One-stage object detectors are trained by optimizing classification-loss and
localization-loss simultaneously, with the former suffering severely from the
extreme foreground-background class imbalance caused by the large number of anchors.
This paper alleviates this issue by proposing a novel framework to replace the
classification task in one-stage detectors with a ranking task, and adopting
the Average-Precision loss (AP-loss) for the ranking problem. Due to its
non-differentiability and non-convexity, the AP-loss cannot be optimized
directly. For this purpose, we develop a novel optimization algorithm, which
seamlessly combines the error-driven update scheme in perceptron learning and
backpropagation algorithm in deep networks. We verify the good convergence
properties of the proposed algorithm both theoretically and empirically. Experimental results
demonstrate notable performance improvement in state-of-the-art one-stage
detectors based on AP-loss over different kinds of classification-losses on
various benchmarks, without changing the network architectures. Code is
available at https://github.com/cccorn/AP-loss.
Comment: 13 pages, 7 figures, 4 tables, main paper + supplementary material,
accepted to CVPR 201
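The combination described, perceptron-style error-driven updates standing in for the gradient of a non-differentiable ranking objective, can be illustrated directly on ranking scores. This is a toy sketch on raw scores rather than network weights; the margin, learning rate and pairwise form are simplifications of the paper's actual AP-loss update:

```python
import numpy as np

def average_precision(scores, labels):
    # AP of positives (labels == 1) under the ranking induced by scores
    order = np.argsort(-scores)
    hits = labels[order] == 1
    cum = np.cumsum(hits)
    prec = cum / (np.arange(len(scores)) + 1)
    return prec[hits].mean()

def ap_error_driven_step(scores, labels, lr=0.1, delta=0.05):
    # For each positive i ranked below (or within margin delta of) a negative j,
    # push s_i up and s_j down. The step is "error-driven": it vanishes once
    # the ranking is satisfied, mirroring how the paper replaces the
    # non-differentiable step function with a perceptron-style update term.
    g = np.zeros_like(scores)
    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    for i in pos:
        for j in neg:
            if scores[j] - scores[i] > -delta:   # ranking violated or too close
                g[i] += 1.0
                g[j] -= 1.0
    return scores + lr * g

scores = np.array([0.1, 0.9, 0.3])     # toy detector scores (assumed)
labels = np.array([1, 0, 1])           # 1 = foreground, 0 = background
for _ in range(10):
    scores = ap_error_driven_step(scores, labels)
```

After a few updates every positive is ranked above every negative and the updates stop, so the AP reaches its maximum; in the real detector the same error signal is backpropagated into the network weights.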