    Neural Networks and the Natural Gradient

    Neural network training algorithms have always suffered from the problem of local minima. The advent of natural gradient algorithms promised to overcome this shortcoming by finding better local minima. However, these algorithms require additional training parameters and incur computational overhead. Using a new formulation of the natural gradient, an algorithm is described that uses less memory and processing time than previous algorithms while achieving comparable performance.
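
    For orientation, the standard (Amari) natural-gradient update replaces a plain gradient step with theta <- theta - lr * F^{-1} * grad, where F is the Fisher information matrix. The sketch below is a minimal illustration of that textbook update, not the memory-saving formulation described in the abstract. It assumes a 1-D Gaussian model N(mu, sigma^2), whose Fisher matrix is diagonal; the function names and toy data are hypothetical.

```python
# Natural-gradient descent on the negative log-likelihood of a 1-D Gaussian
# N(mu, sigma^2). Textbook update theta <- theta - lr * F^{-1} grad (assumed
# illustration), not the memory-saving formulation from the abstract.

def nll_grad(mu, sigma, xs):
    """Gradient of the mean negative log-likelihood w.r.t. (mu, sigma)."""
    n = len(xs)
    g_mu = sum(-(x - mu) / sigma**2 for x in xs) / n
    g_sigma = sum(1.0 / sigma - (x - mu) ** 2 / sigma**3 for x in xs) / n
    return g_mu, g_sigma


def natural_gradient_step(mu, sigma, xs, lr):
    """One natural-gradient step. The Fisher matrix of N(mu, sigma^2) in the
    (mu, sigma) parameterization is diag(1/sigma^2, 2/sigma^2), so F^{-1} is
    diag(sigma^2, sigma^2 / 2) and no matrix inversion is needed."""
    g_mu, g_sigma = nll_grad(mu, sigma, xs)
    return mu - lr * sigma**2 * g_mu, sigma - lr * (sigma**2 / 2.0) * g_sigma


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data: mean 3, population variance 2
mu, sigma = 0.0, 1.0
for _ in range(200):
    mu, sigma = natural_gradient_step(mu, sigma, xs, lr=0.5)
print(mu, sigma)  # converges to the sample mean 3.0 and sqrt(2)
```

    With lr = 1 the mean update jumps straight to the sample mean; a plain gradient step would instead divide the correction by sigma^2, which is the parameterization dependence the Fisher scaling removes.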

    Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

    Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all need for tuning, while automatically reducing learning rates over time on stationary problems and permitting learning rates to grow appropriately in non-stationary tasks. Here, we extend the idea in three directions: proper minibatch parallelization; reweighted updates for sparse or orthogonal gradients; and improved robustness on non-smooth loss functions, in the process replacing the diagonal Hessian estimation procedure, which may not always be available, with a robust finite-difference approximation. The final algorithm integrates all these components, has linear complexity, and is hyper-parameter free. Comment: Published at the First International Conference on Learning Representations (ICLR-2013). Public reviews are available at http://openreview.net/document/c14f2204-fd66-4d91-bed4-153523694041#c14f2204-fd66-4d91-bed4-15352369404
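
    The finite-difference idea can be illustrated with a small sketch: instead of computing diagonal Hessian entries, estimate curvature from the change in gradient over a small parameter step, with both gradients evaluated on the same sample. This is a simplified, assumed version, not the paper's exact estimator; the abstract calls its approximation robust, and those safeguards are omitted here. `fd_diag_curvature`, `quad_grad`, `abs_grad` and the probe step `delta` are hypothetical names.

```python
# Simplified finite-difference estimate of diagonal curvature (assumed sketch;
# the probe step `delta` and all helper names are hypothetical).

def fd_diag_curvature(grad_fn, theta, sample, delta):
    """Estimate per-coordinate curvature as |g(theta + delta) - g(theta)| / |delta|,
    evaluating both gradients on the same sample so sampling noise cancels."""
    g0 = grad_fn(theta, sample)
    g1 = grad_fn([t + d for t, d in zip(theta, delta)], sample)
    return [abs(b - a) / abs(d) if d != 0 else 0.0
            for a, b, d in zip(g0, g1, delta)]


def quad_grad(theta, sample):
    """Gradient of a separable quadratic 0.5 * sum(h_i * (theta_i - x_i)^2)."""
    h, x = sample
    return [hi * (t - xi) for hi, t, xi in zip(h, theta, x)]


def abs_grad(theta, sample):
    """Subgradient of the non-smooth loss sum(|theta_i - x_i|)."""
    return [float((t > xi) - (t < xi)) for t, xi in zip(theta, sample)]


# Smooth case: recovers the true diagonal curvature (1, 4) up to rounding.
print(fd_diag_curvature(quad_grad, [0.5, -1.0], ([1.0, 4.0], [0.0, 2.0]), [0.1, -0.2]))

# Non-smooth case: the Hessian is undefined at the kink, but the finite
# difference still returns a finite value, 2 / |delta| here.
print(fd_diag_curvature(abs_grad, [-0.05], [0.0], [0.1]))
```

    For a separable loss the estimate is exact; with coupled parameters, shifting all coordinates at once mixes off-diagonal curvature into each entry.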

    No More Pesky Learning Rates

    The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
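
    A sketch of such a rule, assuming the per-parameter form eta* = g_bar^2 / (h_bar * v_bar), where g_bar, v_bar and h_bar are running averages of the gradient, the squared gradient and the curvature, and where the averaging memory tau is adapted from the same statistics. The sketch uses a 1-D noisy quadratic with known curvature h; the test problem, the warm-up length `n0` and the function name are hypothetical, and the method's own initialization and curvature estimation are not reproduced.

```python
import random

# Adaptive learning rate eta* = g_bar^2 / (h_bar * v_bar) with adaptive memory tau,
# on a 1-D noisy quadratic with known curvature h. Test problem, warm-up length n0
# and all names are assumptions.

def vsgd_noisy_quadratic(h=2.0, noise=1.0, theta0=5.0, steps=5000, n0=10, seed=0):
    """Minimize E[0.5 * h * (theta - xi)^2] with xi ~ N(0, noise^2).
    Returns the final theta and the sequence of learning rates used."""
    rng = random.Random(seed)

    def sample_grad(theta):
        return h * (theta - rng.gauss(0.0, noise))

    # Warm-up: initialize the running averages from n0 gradients at theta0.
    warm = [sample_grad(theta0) for _ in range(n0)]
    g_bar = sum(warm) / n0                 # running mean of the gradient
    v_bar = sum(g * g for g in warm) / n0  # running mean of the squared gradient
    h_bar = h                              # running mean of the curvature (constant here)
    tau = float(n0)                        # memory size of the running averages

    theta, etas = theta0, []
    for _ in range(steps):
        g = sample_grad(theta)
        w = 1.0 / tau
        g_bar = (1 - w) * g_bar + w * g
        v_bar = (1 - w) * v_bar + w * g * g
        h_bar = (1 - w) * h_bar + w * h
        eta = g_bar * g_bar / (h_bar * v_bar)
        # Short memory when the mean gradient dominates its second moment
        # (far from the optimum); memory grows by about one sample when the
        # gradients are mostly noise.
        tau = (1 - g_bar * g_bar / v_bar) * tau + 1
        theta -= eta * g
        etas.append(eta)
    return theta, etas


theta, etas = vsgd_noisy_quadratic()
print(theta, etas[-1])
```

    Because g_bar and v_bar are averages with the same weights, g_bar^2 <= v_bar, so eta never exceeds 1/h; near the optimum the gradient mean shrinks relative to its second moment and eta decays without a hand-tuned schedule.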

    Online Natural Gradient as a Kalman Filter

    We cast Amari's natural gradient in statistical learning as a specific case of Kalman filtering. Namely, applying an extended Kalman filter to estimate a fixed unknown parameter of a probabilistic model from a series of observations is rigorously equivalent to estimating this parameter via an online stochastic natural gradient descent on the log-likelihood of the observations. In the i.i.d. case, this relation is a consequence of the "information filter" phrasing of the extended Kalman filter. In the recurrent (state space, non-i.i.d.) case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models. This exact algebraic correspondence provides relevant interpretations for natural gradient hyperparameters such as learning rates or initialization and regularization of the Fisher information matrix. Comment: 3rd version: expanded intr
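
    In the linear-Gaussian i.i.d. case the correspondence can be checked numerically. The sketch assumes observations y = x . theta + noise with known variance R and a static 2-D parameter. It runs a Kalman filter (covariance form, no process noise) next to an information-form update that adds each observation's Fisher information x x^T / R to a running matrix J and steps along J^{-1} times the log-likelihood gradient. The accumulated J with unit step is an assumed rescaling; it equals a running-average Fisher matrix started at P0^{-1} with step size 1/(t+1). All names and constants are hypothetical.

```python
import random

# Linear-Gaussian check of the Kalman / natural-gradient correspondence, assuming
# y = x . theta + noise (variance R) and a static theta. Pure-Python 2x2 helpers.

def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def kalman_step(theta, P, x, y, R):
    """Kalman update for a fixed parameter (no process noise), covariance form."""
    Px = mat_vec(P, x)
    S = dot(x, Px) + R                    # innovation variance
    K = [Px[0] / S, Px[1] / S]            # Kalman gain
    err = y - dot(x, theta)
    theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
    P = [[P[i][j] - K[i] * Px[j] for j in range(2)] for i in range(2)]
    return theta, P

def natural_gradient_update(theta, J, x, y, R):
    """Information form: accumulate this observation's Fisher information into J,
    then step along J^{-1} times the gradient of log p(y | theta)."""
    J = [[J[i][j] + x[i] * x[j] / R for j in range(2)] for i in range(2)]
    err = y - dot(x, theta)
    grad = [x[0] * err / R, x[1] * err / R]
    step = mat_vec(inv2(J), grad)
    return [theta[0] + step[0], theta[1] + step[1]], J

rng = random.Random(1)
true_theta, R = [2.0, -1.0], 0.25
theta_kf, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
theta_ng, J = [0.0, 0.0], inv2(P)        # prior precision J0 = P0^{-1}
for _ in range(50):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    y = dot(x, true_theta) + rng.gauss(0, R ** 0.5)
    theta_kf, P = kalman_step(theta_kf, P, x, y, R)
    theta_ng, J = natural_gradient_update(theta_ng, J, x, y, R)
print(theta_kf, theta_ng)  # the two estimates agree up to rounding error
```

    The agreement follows from the Sherman-Morrison identity: the updated covariance satisfies P_new x / R = P x / (x^T P x + R), so the Kalman gain equals the natural-gradient preconditioner applied to x / R.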

    Spatio-temporal learning with the online finite and infinite echo-state Gaussian processes

    Successful biological systems adapt to change. In this paper, we are principally concerned with adaptive systems that operate in environments where data arrives sequentially and is multivariate in nature, for example, sensory streams in robotic systems. We contribute two reservoir-inspired methods: 1) the online echo-state Gaussian process (OESGP) and 2) its infinite variant, the online infinite echo-state Gaussian process (OIESGP). Both algorithms are iterative fixed-budget methods that learn from noisy time series. In particular, the OESGP combines the echo-state network with Bayesian online learning for Gaussian processes. Extending this to infinite reservoirs yields the OIESGP, which uses a novel recursive kernel with automatic relevance determination that enables spatial and temporal feature weighting. When fused with stochastic natural gradient descent, the kernel hyperparameters are iteratively adapted to better model the target system. Furthermore, insights into the underlying system can be gleaned from inspection of the resulting hyperparameters. Experiments on noisy benchmark problems (one-step prediction and system identification) demonstrate that our methods yield high accuracies relative to state-of-the-art methods and to standard kernels with sliding windows, particularly on problems with irrelevant dimensions. In addition, we describe two case studies in robotic learning-by-demonstration involving the Nao humanoid robot and the Assistive Robot Transport for Youngsters (ARTY) smart wheelchair.
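
    Two of the ingredients named above can be sketched independently: a finite echo-state reservoir that maps an input stream to a state vector, and a squared-exponential kernel with automatic relevance determination (one lengthscale per state dimension). This is an assumed, simplified illustration; it contains neither the online sparse GP update of the OESGP nor the recursive kernel of the OIESGP, and the reservoir scaling, sizes and names are hypothetical.

```python
import math
import random

# Finite echo-state reservoir plus an ARD squared-exponential kernel on its states.
# Assumed illustration only: no online GP update, no recursive (infinite) kernel.

def make_reservoir(n, n_in, row_norm=0.9, seed=0):
    """Random recurrent weights rescaled so every row has absolute sum row_norm < 1.
    Since tanh is 1-Lipschitz, this infinity-norm bound is a sufficient
    (conservative) condition for the echo-state property."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    W = [[row_norm * w / sum(abs(v) for v in row) for w in row] for row in W]
    W_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n)]
    return W, W_in

def run_reservoir(W, W_in, inputs, state):
    """x_t = tanh(W x_{t-1} + W_in u_t); returns the final state."""
    for u in inputs:
        state = [math.tanh(sum(w * s for w, s in zip(W[i], state))
                           + sum(w * v for w, v in zip(W_in[i], u)))
                 for i in range(len(W))]
    return state

def ard_kernel(a, b, lengthscales, signal_var=1.0):
    """k(a, b) = s^2 * exp(-0.5 * sum_d ((a_d - b_d) / l_d)^2); a large l_d
    makes dimension d nearly irrelevant."""
    r2 = sum(((x - y) / l) ** 2 for x, y, l in zip(a, b, lengthscales))
    return signal_var * math.exp(-0.5 * r2)

W, W_in = make_reservoir(n=20, n_in=1)
rng = random.Random(42)
inputs = [[math.sin(0.3 * t) + 0.1 * rng.gauss(0, 1)] for t in range(200)]
# Echo-state property: the final state forgets the initial state.
s_zero = run_reservoir(W, W_in, inputs, [0.0] * 20)
s_ones = run_reservoir(W, W_in, inputs, [1.0] * 20)
print(max(abs(p - q) for p, q in zip(s_zero, s_ones)))  # tiny (at most 0.9**200)
print(ard_kernel(s_zero, s_ones, lengthscales=[1.0] * 20))  # very close to 1
```

    Scaling rows to an absolute sum below 1 is a conservative choice; spectral-radius scaling is the more common heuristic but requires an eigenvalue computation. A large fitted lengthscale marks a state dimension as irrelevant, the kind of hyperparameter inspection the abstract mentions.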

    Acceleration Strategies For The Backpropagation Neural Network Learning Algorithm

    The backpropagation algorithm has proven to be one of the most successful neural network learning algorithms. However, as with many gradient-based optimization methods, it converges slowly and scales poorly as tasks become larger and more complex. In this thesis, the factors that govern the learning speed of the backpropagation algorithm are investigated and mathematically analyzed in order to develop strategies to improve the performance of this neural network learning algorithm. These factors include the choice of initial weights, the choice of activation function and target values, and the two backpropagation parameters: the learning rate and the momentum factor.
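
    The learning rate and momentum factor enter the weight change as dw(t) = -lr * dE/dw + momentum * dw(t-1). A minimal sketch, assuming a single sigmoid unit (so backpropagation reduces to one layer of the chain rule), cross-entropy error, the AND task and the hyperparameters shown; none of these choices are taken from the thesis, and the function names are hypothetical.

```python
import math

# Gradient descent with a momentum term on a single sigmoid unit (the one-layer
# case of backpropagation): dw(t) = -lr * dE/dw + momentum * dw(t-1).
# Cross-entropy error, the AND task and the hyperparameters are assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_with_momentum(data, lr=0.5, momentum=0.9, epochs=2000):
    w, b = [0.0, 0.0], 0.0
    dw, db = [0.0, 0.0], 0.0            # previous weight changes
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, target in data:
            y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            delta = y - target          # dE/dz for a sigmoid output with cross-entropy
            gw = [g + delta * xi for g, xi in zip(gw, x)]
            gb += delta
        # Momentum adds a fraction of the previous step to the full-batch mean gradient step.
        dw = [momentum * d - lr * g / n for d, g in zip(dw, gw)]
        db = momentum * db - lr * gb / n
        w = [wi + d for wi, d in zip(w, dw)]
        b += db
    return w, b

AND = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
w, b = train_with_momentum(AND)
for x, target in AND:
    print(x, target, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b), 3))
```

    Setting momentum=0.0 recovers plain gradient descent, which makes it straightforward to compare convergence speed across momentum values.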