
    Expressing linear equality constraints in feedforward neural networks

    We seek to impose linear equality constraints in feedforward neural networks. As top layer predictors are usually nonlinear, this is a difficult task if we seek to deploy standard convex optimization methods and strong duality. To overcome this, we introduce a new saddle-point Lagrangian with auxiliary predictor variables on which constraints are imposed. Elimination of the auxiliary variables leads to a dual minimization problem on the Lagrange multipliers introduced to satisfy the linear constraints. This minimization problem is combined with the standard learning problem on the weight matrices. From this theoretical line of development, we obtain the surprising interpretation of Lagrange parameters as additional, penultimate layer hidden units with fixed weights stemming from the constraints. Consequently, standard minimization approaches can be used despite the inclusion of Lagrange parameters, a very satisfying, albeit unexpected, discovery. Examples ranging from multi-label classification to constrained autoencoders are envisaged in the future.

    Learning and generalization in radial basis function networks

    The aim of supervised learning is to approximate an unknown target function by adjusting the parameters of a learning model in response to possibly noisy examples generated by the target function. The performance of the learning model at this task can be quantified by examining its generalization ability. Initially the concept of generalization is reviewed, and various methods of measuring it, such as generalization error, prediction error, PAC learning and the evidence, are discussed and the relations between them examined. Some of these relations are dependent on the architecture of the learning model. Two architectures are prevalent in practical supervised learning: the multi-layer perceptron (MLP) and the radial basis function network (RBF). While the RBF has previously been examined from a worst-case perspective, this gives little insight into the performance and phenomena that can be expected in the typical case. This thesis focusses on the properties of learning and generalization that can be expected on average in the RBF. There are two methods in use for training the RBF. The basis functions can be fixed in advance, utilising an unsupervised learning algorithm, or can adapt during the training process. For the case in which the basis functions are fixed, the typical generalization error given a data set of particular size is calculated by employing the Bayesian framework. The effects of noisy data and regularization are examined, the optimal settings of the parameters that control the learning process are calculated, and the consequences of a mismatch between the learning model and the data-generating mechanism are demonstrated. The second case, in which the basis functions are adapted, is studied utilising the on-line learning paradigm. The average evolution of generalization error is calculated in a manner which allows the phenomena of the learning process, such as the specialization of the basis functions, to be elucidated. The three most important stages of training (the symmetric phase, the symmetry-breaking phase and the convergence phase) are analyzed in detail; the convergence phase analysis allows the derivation of maximal and optimal learning rates. Noise on both the inputs and outputs of the data-generating mechanism is introduced, and the consequences examined. Regularization via weight decay is also studied, as are the effects of the learning model being poorly matched to the data generator.
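
    As a rough illustration of the first training method mentioned above (basis functions fixed in advance, only the output weights learned), the following is a minimal sketch and not the thesis's own procedure. The Gaussian basis shape, the evenly spaced centres, the width, the sine target, the on-line update and all parameter values are assumptions chosen for the example.

```python
import math
import random


def rbf_features(x, centres, width):
    """Gaussian basis activations for a scalar input x (fixed centres and width)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centres]


def predict(x, weights, centres, width):
    """Network output: weighted sum of the basis activations."""
    return sum(w * p for w, p in zip(weights, rbf_features(x, centres, width)))


def mse(weights, centres, width, target, grid):
    """Mean squared error against the target on a fixed evaluation grid."""
    return sum((target(x) - predict(x, weights, centres, width)) ** 2
               for x in grid) / len(grid)


def train_online(centres, width, target, steps=5000, lr=0.1,
                 weight_decay=1e-4, seed=0):
    """On-line gradient descent on the output weights, with weight decay."""
    rng = random.Random(seed)
    weights = [0.0] * len(centres)
    for _ in range(steps):
        x = rng.random()                      # one fresh example per step
        phi = rbf_features(x, centres, width)
        err = target(x) - sum(w * p for w, p in zip(weights, phi))
        weights = [w + lr * (err * p - weight_decay * w)
                   for w, p in zip(weights, phi)]
    return weights


def target(x):
    """Hypothetical target function used only for this example."""
    return math.sin(2.0 * math.pi * x)


centres = [i / 9.0 for i in range(10)]        # assumed: 10 centres on [0, 1]
grid = [i / 99.0 for i in range(100)]
initial_mse = mse([0.0] * len(centres), centres, 0.1, target, grid)
weights = train_online(centres, 0.1, target)
final_mse = mse(weights, centres, 0.1, target, grid)
```

    Comparing `initial_mse` with `final_mse` gives a crude empirical stand-in for the generalization error that the thesis computes analytically.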

    On-line learning of non-monotonic rules by simple perceptron

    We study the generalization ability of a simple perceptron which learns unlearnable rules. The rules are presented by a teacher perceptron with a non-monotonic transfer function. The student is trained in the on-line mode. The asymptotic behaviour of the generalization error is estimated under various conditions. Several learning strategies are proposed and improved to obtain the theoretical lower bound of the generalization error. Comment: LaTeX, 20 pages using IOP LaTeX preprint style file, 14 figures.

    Effect of time-correlation of input patterns on the convergence of on-line learning

    We studied the effects of time correlation between subsequent patterns on the convergence of on-line learning by a feedforward neural network trained with the backpropagation algorithm. By using chaotic time series as sequences of correlated patterns, we found that an unexpected scaling of the convergence time with the learning parameter emerges when time-correlated patterns accelerate the learning process. Comment: 8 pages (Revtex), 5 figures.

    Powerpropagation: A sparsity inducing weight reparameterisation

    The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well as enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient based training on model sparsity. In this work, we introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a “rich get richer” dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained in this manner exhibit similar performance, but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Powerpropagation is general, intuitive, cheap and straightforward to implement and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings. Firstly, following a recent line of work, we investigate its effect on sparse training for resource-constrained settings. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark. Secondly, we advocate the use of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all cases our reparameterisation considerably increases the efficacy of the off-the-shelf methods.
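
    The “rich get richer” dynamic can be sketched in a few lines. The abstract does not state the reparameterisation itself, so the form used here, w = θ·|θ|^(α−1) with α > 1, is an assumption, as are the function names, the learning rate and the toy gradients. Under that assumption the chain rule scales each gradient by α·|θ|^(α−1), so parameters near zero barely move.

```python
def effective_weight(theta, alpha=2.0):
    """Weight used in the forward pass (assumed form: theta * |theta|**(alpha - 1))."""
    return theta * abs(theta) ** (alpha - 1.0)


def grad_wrt_theta(theta, grad_w, alpha=2.0):
    """Chain rule: dL/dtheta = dL/dw * alpha * |theta|**(alpha - 1)."""
    return grad_w * alpha * abs(theta) ** (alpha - 1.0)


def sgd_step(thetas, grads_w, lr=0.1, alpha=2.0):
    """One plain gradient-descent step on the raw parameters theta."""
    return [t - lr * grad_wrt_theta(t, g, alpha)
            for t, g in zip(thetas, grads_w)]


# Two parameters receive the same gradient with respect to their weights,
# but the low-magnitude one moves far less.
thetas = [0.01, 1.0]
updated = sgd_step(thetas, grads_w=[1.0, 1.0])
changes = [abs(u - t) for u, t in zip(updated, thetas)]
```

    With α = 1 the mapping reduces to the ordinary parameterisation, which makes it easy to compare the two update rules side by side.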

    A mathematical analysis of the effects of Hebbian learning rules on the dynamics and structure of discrete-time random recurrent neural networks

    We present a mathematical analysis of the effects of Hebbian learning in random recurrent neural networks, with a generic Hebbian learning rule including passive forgetting and different time scales for neuronal activity and learning dynamics. Previous numerical works have reported that Hebbian learning drives the system from chaos to a steady state through a sequence of bifurcations. Here, we interpret these results mathematically and show that these effects, involving a complex coupling between neuronal dynamics and synaptic graph structure, can be analyzed using Jacobian matrices, which introduce both a structural and a dynamical point of view on the neural network evolution. Furthermore, we show that the sensitivity to a learned pattern is maximal when the largest Lyapunov exponent is close to 0. We discuss how neural networks may take advantage of this regime of high functional interest.

    The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming

    In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do "clipping" on the motion of the agent in the final time step of the trajectory. By clipping we mean that the final time step of the trajectory is to be truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum; and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms which use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include Backpropagation Through Time for Control, and methods based on Dual Heuristic Dynamic Programming. However, the clipping problem does not significantly affect methods based on Heuristic Dynamic Programming, Temporal Differences or Policy Gradient Learning algorithms. Similarly, the clipping problem does not affect fixed-length finite-horizon problems.
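
    The idea of truncating the final time step can be illustrated with a toy problem. This is a sketch under stated assumptions, not the paper's experiment: a one-dimensional agent moves at a constant positive velocity towards a terminal boundary at `goal`, and cost accrues at a constant rate per unit time. The function and parameter names are hypothetical.

```python
def rollout(x0, velocity, goal, cost_rate=1.0, dt=1.0, clip=True,
            max_steps=1000):
    """Advance the agent until x >= goal; return (final_x, total_cost).

    Assumes velocity > 0. With clip=True the last step is shortened so the
    agent stops exactly on the boundary and only pays for the time used.
    """
    x, cost = x0, 0.0
    for _ in range(max_steps):
        if x >= goal:
            break
        step = velocity * dt
        fraction = 1.0
        if clip and x + step > goal:
            fraction = (goal - x) / step   # portion of the step actually needed
            x = goal                       # stop exactly at the terminal state
        else:
            x += step
        cost += fraction * cost_rate * dt
    return x, cost


x_clip, cost_clip = rollout(0.0, 0.4, 1.0, clip=True)
x_raw, cost_raw = rollout(0.0, 0.4, 1.0, clip=False)

# A small change in velocity changes the clipped cost but not the unclipped one.
_, cost_clip_faster = rollout(0.0, 0.41, 1.0, clip=True)
_, cost_raw_faster = rollout(0.0, 0.41, 1.0, clip=False)
```

    In this toy setting the unclipped cost is piecewise constant in the velocity, so its derivative carries no learning signal, whereas the clipped cost varies smoothly. That is one way to see why derivative-based methods would be the ones affected.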

    Optimization of the Asymptotic Property of Mutual Learning Involving an Integration Mechanism of Ensemble Learning

    We propose an optimization method of mutual learning which converges into the identical state of optimum ensemble learning within the framework of on-line learning, and have analyzed its asymptotic property through the statistical mechanics method. The proposed model consists of two learning steps: two students independently learn from a teacher, and then the students learn from each other through mutual learning. In mutual learning, students learn from each other and the generalization error is improved even if the teacher has not taken part in the mutual learning. However, in the case of different initial overlaps (direction cosines) between teacher and students, a student with a larger initial overlap tends to have a larger generalization error than before the mutual learning. To overcome this problem, our proposed optimization method of mutual learning optimizes the step sizes of the two students to minimize the asymptotic property of the generalization error. Consequently, the optimized mutual learning converges to a generalization error identical to that of the optimal ensemble learning. In addition, we show the relationship between the optimum step size of the mutual learning and the integration mechanism of the ensemble learning. Comment: 13 pages, 3 figures, submitted to Journal of Physical Society of Japan.

    Ensemble learning of linear perceptron; Online learning theory

    Within the framework of on-line learning, we study the generalization error of an ensemble learning machine learning from a linear teacher perceptron. The generalization error achieved by an ensemble of linear perceptrons having homogeneous or inhomogeneous initial weight vectors is precisely calculated at the thermodynamic limit of a large number of input elements and shows rich behavior. Our main findings are as follows. For learning with homogeneous initial weight vectors, the generalization error using an infinite number of linear student perceptrons is equal to only half that of a single linear perceptron, and for a finite number K of linear perceptrons it converges to that of the infinite case as O(1/K). For learning with inhomogeneous initial weight vectors, it is advantageous to use an approach of weighted averaging over the output of the linear perceptrons, and we show the conditions under which the optimal weights are constant during the learning process. The optimal weights depend only on the correlation of the initial weight vectors. Comment: 14 pages, 3 figures, submitted to Physical Review
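
    A small simulation can make the setting concrete. It is a sketch under assumptions, not the paper's analysis: the students share the same example stream, inputs have independent unit-variance Gaussian components, the teacher is noiseless, and the ensemble output is the unweighted average of the student outputs. The sizes, learning rate and function names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gen_error(teacher, student):
    """For i.i.d. unit-variance inputs, E[(B.x - J.x)^2] / 2 = |B - J|^2 / 2."""
    return 0.5 * sum((b - j) ** 2 for b, j in zip(teacher, student))


def simulate(n=20, k=5, steps=200, lr=0.5, seed=0):
    """Train k linear students on-line; return (mean single error, ensemble error)."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0, 1) for _ in range(n)]
    students = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
    for _ in range(steps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = dot(teacher, x)                      # noiseless teacher output
        for j in students:
            err = y - dot(j, x)
            for i in range(n):
                j[i] += (lr / n) * err * x[i]    # gradient step on squared error
    # Averaging the outputs of linear students equals averaging their weights.
    mean_student = [sum(s[i] for s in students) / k for i in range(n)]
    single = sum(gen_error(teacher, s) for s in students) / k
    ensemble = gen_error(teacher, mean_student)
    return single, ensemble


single_error, ensemble_error = simulate()
```

    Because the squared error is convex, the averaged student can never do worse than the mean of the individual students. How much better it does depends on the correlation between the students' weight vectors, which is the quantity the abstract analyzes.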