3,036 research outputs found
On-line learning of non-monotonic rules by simple perceptron
We study the generalization ability of a simple perceptron which learns
unlearnable rules. The rules are presented by a teacher perceptron with a
non-monotonic transfer function. The student is trained in the on-line mode.
The asymptotic behaviour of the generalization error is estimated under various
conditions. Several learning strategies are proposed and improved to obtain the
theoretical lower bound of the generalization error.
Comment: LaTeX 20 pages using IOP LaTeX preprint style file, 14 figures
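The setting described here, a student perceptron trained on-line on examples labelled by a non-monotonic teacher, can be sketched as follows. This is a minimal illustration only: the reversed-wedge transfer function, window width, dimension and learning rate are all assumed choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
N, a, eta = 100, 1.5, 0.5          # dimension, wedge width, learning rate (assumed)

B = rng.standard_normal(N)
B /= np.linalg.norm(B)             # teacher direction, |B| = 1
J = rng.standard_normal(N) * 0.1   # student weights

def teacher(u):
    # non-monotonic "reversed-wedge" transfer: the sign flips outside |u| <= a,
    # so no simple perceptron can realize the rule exactly (it is unlearnable)
    return np.sign(u * (a - u) * (a + u))

def gen_error(J, n_test=2000):
    X = rng.standard_normal((n_test, N))
    return np.mean(np.sign(X @ J) != teacher(X @ B))

for t in range(5000):
    x = rng.standard_normal(N)
    y = teacher(B @ x)
    if y * (J @ x) <= 0:           # error-driven perceptron update
        J += eta * y * x / np.sqrt(N)

err = gen_error(J)                 # plateaus at a non-zero value: the rule is unlearnable
```

Because the rule is unlearnable, the generalization error saturates above zero no matter how long the student trains, which is why the paper focuses on strategies that approach the theoretical lower bound.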
Training a perceptron in a discrete weight space
On-line and batch learning of a perceptron in a discrete weight space, where
each weight can take different values, are examined analytically and
numerically. The learning algorithm is based on the training of the continuous
perceptron and prediction following the clipped weights. The learning is
described by a new set of order parameters, composed of the overlaps between
the teacher and the continuous/clipped students. Different scenarios are
examined, among them on-line learning with discrete/continuous transfer
functions and off-line Hebb learning. The generalization error of the clipped
weights decays asymptotically as exp(-Kα)/exp(-Kα²) in the case of on-line
learning with binary/continuous activation functions, respectively, where α
is the number of examples divided by N, the size of the input vector, and K
is a positive constant that decays linearly with 1/L. For finite N and L, a
perfect agreement between the discrete student and the teacher is obtained
for α ∝ √(L ln(NL)). A crossover to the generalization error ∝ 1/α,
characterizing continuous weights with binary output, is obtained for
synaptic depth L > O(√N).
Comment: 10 pages, 5 figs., submitted to PR
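The training scheme described, updating a continuous perceptron while predicting with its clipped weights, can be sketched as follows. This toy version uses binary clipping (synaptic depth L = 1) and an off-line-style Hebbian rule; the sizes and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
B = rng.choice([-1.0, 1.0], size=N)   # binary teacher weights (assumed setup)
J = np.zeros(N)                        # continuous student

def clip_weights(J):
    # project the continuous student onto the discrete (here binary) weight space
    return np.where(J >= 0, 1.0, -1.0)

for t in range(3000):
    x = rng.standard_normal(N)
    y = np.sign(B @ x)                 # teacher label
    J += y * x / np.sqrt(N)            # Hebb update of the continuous weights

Jc = clip_weights(J)                   # prediction follows the clipped weights
X_test = rng.standard_normal((2000, N))
err = np.mean(np.sign(X_test @ Jc) != np.sign(X_test @ B))
```

The continuous overlap with the teacher grows steadily, and once each weight's sign is reliably determined the clipped student's error collapses, which is the mechanism behind the exponential asymptotic decay quoted in the abstract.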
On-Line AdaTron Learning of Unlearnable Rules
We study the on-line AdaTron learning of linearly non-separable rules by a
simple perceptron. Training examples are provided by a perceptron with a
non-monotonic transfer function which reduces to the usual monotonic relation
in a certain limit. We find that, although the on-line AdaTron learning is a
powerful algorithm for the learnable rule, it does not give the best possible
generalization error for unlearnable problems. Optimization of the learning
rate is shown to greatly improve the performance of the AdaTron algorithm,
leading to the best possible generalization error for a wide range of the
parameter which controls the shape of the transfer function.
Comment: RevTeX 17 pages, 8 figures, to appear in Phys.Rev.
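The AdaTron rule referred to here updates the student only on mistakes, with a step proportional to the student's own local field. A minimal sketch in the learnable (monotonic-teacher) limit, with an assumed fixed learning rate rather than the optimized rate schedule the paper derives:

```python
import numpy as np

rng = np.random.default_rng(2)
N, eta = 200, 1.0                      # dimension and fixed learning rate (assumed)
B = rng.standard_normal(N)
B /= np.linalg.norm(B)                 # teacher direction
J = rng.standard_normal(N) * 0.1       # student weights

def adatron_step(J, x, y, eta):
    u = J @ x
    if y * u <= 0:                     # error-driven: update only on mistakes
        J = J - eta * u * x / len(x)   # step proportional to the student's field
    return J

for t in range(5000):
    x = rng.standard_normal(N)
    y = np.sign(B @ x)                 # monotonic teacher (the learnable limit)
    J = adatron_step(J, x, y, eta)

X_test = rng.standard_normal((2000, N))
err = np.mean(np.sign(X_test @ J) != np.sign(X_test @ B))
```

With eta = 1 each update exactly cancels the erroneous field on the current example; the paper's point is that for unlearnable (non-monotonic) teachers this fixed-rate version is no longer optimal and the rate must be tuned.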
HyperAdam: A Learnable Task-Adaptive Adam for Network Training
Deep neural networks are traditionally trained using human-designed
stochastic optimization algorithms, such as SGD and Adam. Recently, the
approach of learning to optimize network parameters has emerged as a promising
research topic. However, these learned black-box optimizers sometimes do not
fully exploit the experience embodied in human-designed optimizers and
therefore have limited generalization ability. In this paper, a new optimizer,
dubbed HyperAdam, is proposed that combines the idea of "learning to
optimize" with the traditional Adam optimizer. Given a network for training, its
parameter update in each iteration generated by HyperAdam is an adaptive
combination of multiple updates generated by Adam with varying decay rates. The
combination weights and decay rates in HyperAdam are adaptively learned
depending on the task. HyperAdam is modeled as a recurrent neural network with
AdamCell, WeightCell and StateCell. It is shown to achieve state-of-the-art
performance in training various networks, such as multilayer perceptrons,
CNNs and LSTMs.
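The core idea, a single update that adaptively mixes several Adam candidates with different second-moment decay rates, can be sketched without the recurrent controller. Here the combination weights are a fixed softmax stand-in for the RNN output, and the decay rates, learning rate and toy quadratic task are all assumed for illustration:

```python
import numpy as np

betas2 = [0.9, 0.99, 0.999]            # candidate second-moment decay rates (assumed)

def hyperadam_step(theta, grad, state, weights, lr=1e-2, beta1=0.9, eps=1e-8):
    # state: one (m, v) moment pair per candidate decay rate
    updates = []
    for (m, v), b2 in zip(state, betas2):
        m[:] = beta1 * m + (1 - beta1) * grad
        v[:] = b2 * v + (1 - b2) * grad ** 2
        updates.append(m / (np.sqrt(v) + eps))   # one Adam-style candidate update
    # adaptive combination; in HyperAdam these weights come from the RNN cells
    w = np.exp(weights) / np.exp(weights).sum()
    combined = sum(wi * ui for wi, ui in zip(w, updates))
    return theta - lr * combined

# toy task: minimise f(theta) = |theta|^2
theta = np.array([3.0, -2.0])
state = [(np.zeros_like(theta), np.zeros_like(theta)) for _ in betas2]
weights = np.zeros(len(betas2))        # uniform mixing, standing in for learned weights
for _ in range(500):
    theta = hyperadam_step(theta, 2 * theta, state, weights)
```

In the actual method the mixing weights and decay rates are produced per task by the recurrent network; the sketch only shows the shape of the combined update.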
A practical Bayesian framework for backpropagation networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
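For the linear-Gaussian case the "evidence" this framework relies on has a closed form, which makes the idea easy to sketch. This is a toy illustration only, not the full Laplace treatment for nonlinear networks; the data, prior precision alpha and noise precision beta are assumed values.

```python
import numpy as np

rng = np.random.default_rng(3)
# toy data: y = X @ w_true + noise
X = rng.standard_normal((50, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(50)

def log_evidence(X, y, alpha, beta):
    # log marginal likelihood of the data under a Gaussian weight prior
    # (precision alpha) and Gaussian noise (precision beta)
    n, d = X.shape
    A = alpha * np.eye(d) + beta * X.T @ X        # posterior precision
    w_map = beta * np.linalg.solve(A, X.T @ y)    # posterior mean
    E = beta / 2 * np.sum((y - X @ w_map) ** 2) + alpha / 2 * w_map @ w_map
    return (d / 2 * np.log(alpha) + n / 2 * np.log(beta) - E
            - 0.5 * np.linalg.slogdet(A)[1] - n / 2 * np.log(2 * np.pi))

def effective_params(X, alpha, beta):
    # gamma = sum_i lambda_i / (lambda_i + alpha): the number of
    # well-determined parameters, item (4) in the abstract
    lam = beta * np.linalg.eigvalsh(X.T @ X)
    return np.sum(lam / (lam + alpha))
```

Comparing the evidence across alpha values is the "objective choice of weight decay" of item (3): an absurdly strong prior that forces the weights to zero is penalized automatically.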
Adaptive Normalized Risk-Averting Training For Deep Neural Networks
This paper proposes a set of new error criteria and learning approaches,
Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex
optimization problem in training deep neural networks (DNNs). Theoretically, we
demonstrate its effectiveness on global and local convexity lower-bounded by
the standard L^p-norm error. By analyzing the gradient with respect to the
convexity index λ, we explain why learning λ adaptively by
gradient descent works. In practice, we show how this method improves training
of deep neural networks to solve visual recognition tasks on the MNIST and
CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results
comparable or superior to those reported in recent literature on the same tasks
using standard ConvNets + MSE/cross entropy. Performance on deep/shallow
multilayer perceptrons and Denoising Auto-encoders is also explored. ANRAT can
be combined with other quasi-Newton training methods, innovative network
variants, regularization techniques and other specific tricks in DNNs. Other
than unsupervised pretraining, it provides a new perspective to address the
non-convex optimization problem in DNNs.
Comment: AAAI 2016, 0.39%~0.4% error rate on MNIST with a single 32-32-256-10 ConvNet,
code available at https://github.com/cauchyturing/ANRA
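The risk-averting error family behind ANRAT can be sketched numerically: a log-mean-exp of squared residuals whose convexity index λ interpolates between plain MSE (λ → 0) and the worst-case residual (λ → ∞). This is a simplified stand-in for the paper's normalized criterion, with λ held fixed rather than learned by gradient descent:

```python
import numpy as np

def nrae(residuals, lam):
    # risk-averting error: (1/lam) * log-mean-exp(lam * e_i^2).
    # As lam -> 0 it reduces to the mean-squared error; as lam grows it
    # emphasises the worst residuals, which is what convexifies the landscape.
    e2 = residuals ** 2
    m = np.max(lam * e2)                       # log-sum-exp stabilisation
    return (m + np.log(np.mean(np.exp(lam * e2 - m)))) / lam

r = np.array([0.1, 0.2, 1.0])                  # toy residuals (assumed)
mse_like = nrae(r, 1e-6)                       # ~ mean(r**2)
worst_like = nrae(r, 100.0)                    # ~ max(r**2)
```

ANRAT's contribution is to adapt λ during training rather than fixing it; the sketch only shows the criterion being interpolated.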
Towards Accurate One-Stage Object Detection with AP-Loss
One-stage object detectors are trained by optimizing classification-loss and
localization-loss simultaneously, with the former suffering severely from the
extreme foreground-background class imbalance caused by the large number of anchors.
This paper alleviates this issue by proposing a novel framework to replace the
classification task in one-stage detectors with a ranking task, and adopting
the Average-Precision loss (AP-loss) for the ranking problem. Due to its
non-differentiability and non-convexity, the AP-loss cannot be optimized
directly. For this purpose, we develop a novel optimization algorithm, which
seamlessly combines the error-driven update scheme in perceptron learning and
backpropagation algorithm in deep networks. We verify the good convergence
properties of the proposed algorithm both theoretically and empirically. Experimental results
demonstrate notable performance improvement in state-of-the-art one-stage
detectors based on AP-loss over different kinds of classification-losses on
various benchmarks, without changing the network architectures. Code is
available at https://github.com/cccorn/AP-loss.
Comment: 13 pages, 7 figures, 4 tables, main paper + supplementary material,
accepted to CVPR 201
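The combination described, perceptron-style error-driven updates standing in for the gradient of a non-differentiable ranking objective, can be illustrated directly on ranking scores. This is a toy sketch on raw scores rather than network weights; the margin, learning rate and pairwise form are simplifications of the paper's actual AP-loss update:

```python
import numpy as np

def average_precision(scores, labels):
    # AP of positives (labels == 1) under the ranking induced by scores
    order = np.argsort(-scores)
    hits = labels[order] == 1
    cum = np.cumsum(hits)
    prec = cum / (np.arange(len(scores)) + 1)
    return prec[hits].mean()

def ap_error_driven_step(scores, labels, lr=0.1, delta=0.05):
    # For each positive i ranked below (or within margin delta of) a negative j,
    # push s_i up and s_j down. The step is "error-driven": it vanishes once
    # the ranking is satisfied, mirroring how the paper replaces the
    # non-differentiable step function with a perceptron-style update term.
    g = np.zeros_like(scores)
    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    for i in pos:
        for j in neg:
            if scores[j] - scores[i] > -delta:   # ranking violated or too close
                g[i] += 1.0
                g[j] -= 1.0
    return scores + lr * g

scores = np.array([0.1, 0.9, 0.3])     # toy detector scores (assumed)
labels = np.array([1, 0, 1])           # 1 = foreground, 0 = background
for _ in range(10):
    scores = ap_error_driven_step(scores, labels)
```

After a few updates every positive is ranked above every negative and the updates stop, so the AP reaches its maximum; in the real detector the same error signal is backpropagated into the network weights.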