397 research outputs found
Neural Networks and the Natural Gradient
Neural network training algorithms have always suffered from the problem of local minima. The advent of natural gradient algorithms promised to overcome this shortcoming by finding better local minima. However, they require additional training parameters and computational overhead. By using a new formulation for the natural gradient, an algorithm is described that uses less memory and processing time than previous algorithms while achieving comparable performance.
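As a rough, generic illustration of the natural gradient idea itself (not the memory-saving formulation this abstract refers to), a natural gradient step preconditions the ordinary gradient with the inverse Fisher information matrix; the function name and damping term below are illustrative choices:

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    """One natural-gradient update: precondition grad by the inverse Fisher matrix.

    theta  : current parameter vector, shape (d,)
    grad   : gradient of the loss at theta, shape (d,)
    fisher : estimated Fisher information matrix, shape (d, d)
    """
    # Damping keeps the (possibly ill-conditioned) Fisher estimate invertible;
    # solving the linear system avoids forming an explicit inverse.
    precond_grad = np.linalg.solve(fisher + damping * np.eye(len(theta)), grad)
    return theta - lr * precond_grad
```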
Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients
Recent work has established an empirically successful framework for adapting
learning rates for stochastic gradient descent (SGD). This effectively removes
the need for tuning, while automatically reducing learning rates over time on
stationary problems and permitting learning rates to grow appropriately in
non-stationary tasks. Here, we extend the idea in three directions: proper
minibatch parallelization, reweighted updates for sparse or orthogonal
gradients, and improved robustness on non-smooth loss functions, in the process
replacing the diagonal Hessian estimation procedure, which may not always be
available, by a robust finite-difference approximation. The final algorithm
integrates all these components, has linear complexity, and is hyper-parameter
free.
Comment: Published at the First International Conference on Learning
Representations (ICLR 2013). Public reviews are available at
http://openreview.net/document/c14f2204-fd66-4d91-bed4-153523694041#c14f2204-fd66-4d91-bed4-15352369404
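As a hedged sketch of the finite-difference flavor of curvature estimate the abstract alludes to (the paper's actual procedure differs in its details; the function below is illustrative), a per-parameter curvature signal can be probed by differencing gradients across a small parameter perturbation:

```python
import numpy as np

def finite_difference_curvature(grad_fn, theta, eps=1e-4):
    """Probe a per-parameter curvature estimate without an explicit Hessian.

    grad_fn : callable mapping a parameter vector (shape (d,)) to a gradient
    theta   : current parameter vector, shape (d,)
    """
    g0 = grad_fn(theta)
    # Perturb every parameter by +/- eps (signs follow the gradient) and
    # re-evaluate; the secant of the two gradients approximates curvature.
    delta = eps * np.where(g0 >= 0.0, 1.0, -1.0)
    g1 = grad_fn(theta + delta)
    # abs() keeps the estimate usable as a positive scaling factor
    # even when the loss is non-convex or non-smooth.
    return np.abs(g1 - g0) / eps
```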
No More Pesky Learning Rates
The performance of stochastic gradient descent (SGD) depends critically on
how learning rates are tuned and decreased over time. We propose a method to
automatically adjust multiple learning rates so as to minimize the expected
error at any one time. The method relies on local gradient variations across
samples. In our approach, learning rates can increase as well as decrease,
making it suitable for non-stationary problems. Using a number of convex and
non-convex learning tasks, we show that the resulting algorithm matches the
performance of SGD or other adaptive approaches with their best settings
obtained through systematic search, and effectively removes the need for
learning rate tuning.
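A minimal sketch of this kind of per-parameter adaptation, assuming running estimates of the gradient mean, the squared gradient, and a curvature term are already maintained (the paper specifies the exact estimators; the names below are illustrative):

```python
import numpy as np

def adaptive_learning_rates(avg_grad, avg_sq_grad, curvature, eps=1e-12):
    """Per-parameter learning rates derived from local gradient statistics.

    avg_grad    : running mean of the gradient, E[g], shape (d,)
    avg_sq_grad : running mean of the squared gradient, E[g^2], shape (d,)
    curvature   : per-parameter curvature estimate h, shape (d,)

    The rate grows where gradients agree across samples (E[g]^2 close to
    E[g^2]) and shrinks where they are dominated by noise, so it can both
    increase and decrease over time on non-stationary problems.
    """
    return avg_grad**2 / (avg_sq_grad * curvature + eps)
```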
Online Natural Gradient as a Kalman Filter
We cast Amari's natural gradient in statistical learning as a specific case
of Kalman filtering. Namely, applying an extended Kalman filter to estimate a
fixed unknown parameter of a probabilistic model from a series of observations
is rigorously equivalent to estimating this parameter via an online stochastic
natural gradient descent on the log-likelihood of the observations.
In the i.i.d. case, this relation is a consequence of the "information
filter" phrasing of the extended Kalman filter. In the recurrent (state space,
non-i.i.d.) case, we prove that the joint Kalman filter over states and
parameters is a natural gradient on top of real-time recurrent learning (RTRL),
a classical algorithm to train recurrent models.
This exact algebraic correspondence provides relevant interpretations for
natural gradient hyperparameters such as learning rates or initialization and
regularization of the Fisher information matrix.
Comment: 3rd version: expanded introduction
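Schematically, the online natural gradient side of this correspondence can be written as follows (a summary in generic notation, with J_t an online estimate of the Fisher information matrix; the paper gives the exact learning rates and metric updates for which the equivalence is exact):

```latex
% Online stochastic natural gradient on the log-likelihood (schematic):
\theta_{t+1} = \theta_t + \eta_t \, J_t^{-1} \,
               \nabla_\theta \log p(y_t \mid \theta_t),
\qquad
J_t = (1 - \gamma_t)\, J_{t-1} + \gamma_t \,
      \mathbb{E}_{y \sim p(\cdot \mid \theta_t)}
      \!\left[ \nabla_\theta \log p(y \mid \theta_t)\,
               \nabla_\theta \log p(y \mid \theta_t)^{\top} \right].
```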
Spatio-temporal learning with the online finite and infinite echo-state Gaussian processes
Successful biological systems adapt to change. In this paper, we are principally concerned with adaptive systems that operate in environments where data arrives sequentially and is multivariate in nature, for example, sensory streams in robotic systems. We contribute two reservoir-inspired methods: 1) the online echo-state Gaussian process (OESGP) and 2) its infinite variant, the online infinite echo-state Gaussian process (OIESGP). Both algorithms are iterative fixed-budget methods that learn from noisy time series. In particular, the OESGP combines the echo-state network with Bayesian online learning for Gaussian processes. Extending this to infinite reservoirs yields the OIESGP, which uses a novel recursive kernel with automatic relevance determination that enables spatial and temporal feature weighting. When fused with stochastic natural gradient descent, the kernel hyperparameters are iteratively adapted to better model the target system. Furthermore, insights into the underlying system can be gleaned from inspection of the resulting hyperparameters. Experiments on noisy benchmark problems (one-step prediction and system identification) demonstrate that our methods yield high accuracies relative to state-of-the-art methods and standard kernels with sliding windows, particularly on problems with irrelevant dimensions. In addition, we describe two case studies in robotic learning-by-demonstration involving the Nao humanoid robot and the Assistive Robot Transport for Youngsters (ARTY) smart wheelchair.
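To illustrate only the automatic relevance determination (ARD) ingredient (the OIESGP's recursive kernel additionally carries temporal recursion, which is not shown here; the function below is a generic sketch), an ARD kernel gives every input dimension its own lengthscale so that irrelevant dimensions can be down-weighted:

```python
import numpy as np

def ard_rbf_kernel(x1, x2, log_lengthscales, log_variance=0.0):
    """Squared-exponential kernel with automatic relevance determination.

    Each input dimension has its own lengthscale; learning a large
    lengthscale for a dimension effectively switches that dimension off,
    which is how irrelevant inputs get down-weighted.
    """
    ell = np.exp(np.asarray(log_lengthscales))   # per-dimension lengthscales
    diff = (np.asarray(x1) - np.asarray(x2)) / ell
    return np.exp(log_variance) * np.exp(-0.5 * np.dot(diff, diff))
```

In an online setting, the log-lengthscales would then be adapted step by step, for example by (natural) stochastic gradient ascent on the likelihood of each incoming observation.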
Acceleration Strategies For The Backpropagation Neural Network Learning Algorithm
The backpropagation algorithm has proven to be one of the most successful neural
network learning algorithms. However, as with many gradient-based optimization
methods, it converges slowly and scales up poorly as tasks become larger and more
complex.
In this thesis, factors that govern the learning speed of the backpropagation algorithm
are investigated and mathematically analyzed in order to develop strategies to improve
the performance of this neural network learning algorithm. These factors include the
choice of initial weights, the choice of activation function and target values, and the two
backpropagation parameters: the learning rate and the momentum factor.
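A minimal sketch of the update these two parameters control, assuming a gradient already computed by backpropagation (names are illustrative):

```python
import numpy as np

def momentum_update(weights, grad, velocity, learning_rate=0.1, momentum=0.9):
    """One weight update combining the learning rate and the momentum factor.

    The learning rate scales the step taken along the current gradient;
    the momentum factor carries over a fraction of the previous update,
    smoothing the trajectory and speeding up convergence along ravines.
    """
    velocity = momentum * velocity - learning_rate * grad
    return weights + velocity, velocity
```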