1,218 research outputs found
Efficient Calculation of the Gauss-Newton Approximation of the Hessian Matrix in Neural Networks
The Levenberg-Marquardt (LM) learning algorithm is a popular algorithm for training neural networks; however, for large neural networks, it becomes prohibitively expensive in terms of running time and memory requirements. The most time-critical step of the algorithm is the calculation of the Gauss-Newton matrix, which is formed by multiplying two large Jacobian matrices together. We propose a method that uses back-propagation to reduce the time of this matrix-matrix multiplication. This reduces the overall asymptotic running time of the LM algorithm by a factor of the order of the number of output nodes in the neural network
Practical Gauss-Newton Optimisation for Deep Learning
We present an efficient block-diagonal ap- proximation to the Gauss-Newton
matrix for feedforward neural networks. Our result- ing algorithm is
competitive against state- of-the-art first order optimisation methods, with
sometimes significant improvement in optimisation performance. Unlike
first-order methods, for which hyperparameter tuning of the optimisation
parameters is often a labo- rious process, our approach can provide good
performance even when used with default set- tings. A side result of our work
is that for piecewise linear transfer functions, the net- work objective
function can have no differ- entiable local maxima, which may partially explain
why such transfer functions facilitate effective optimisation.Comment: ICML 201
Do optimization methods in deep learning applications matter?
With advances in deep learning, exponential data growth and increasing model
complexity, developing efficient optimization methods are attracting much
research attention. Several implementations favor the use of Conjugate Gradient
(CG) and Stochastic Gradient Descent (SGD) as being practical and elegant
solutions to achieve quick convergence, however, these optimization processes
also present many limitations in learning across deep learning applications.
Recent research is exploring higher-order optimization functions as better
approaches, but these present very complex computational challenges for
practical use. Comparing first and higher-order optimization functions, in this
paper, our experiments reveal that Levemberg-Marquardt (LM) significantly
supersedes optimal convergence but suffers from very large processing time
increasing the training complexity of both, classification and reinforcement
learning problems. Our experiments compare off-the-shelf optimization
functions(CG, SGD, LM and L-BFGS) in standard CIFAR, MNIST, CartPole and
FlappyBird experiments.The paper presents arguments on which optimization
functions to use and further, which functions would benefit from
parallelization efforts to improve pretraining time and learning rate
convergence
Gauss-newton Based Learning For Fully Recurrent Neural Networks
The thesis discusses a novel off-line and on-line learning approach for Fully Recurrent Neural Networks (FRNNs). The most popular algorithm for training FRNNs, the Real Time Recurrent Learning (RTRL) algorithm, employs the gradient descent technique for finding the optimum weight vectors in the recurrent neural network. Within the framework of the research presented, a new off-line and on-line variation of RTRL is presented, that is based on the Gauss-Newton method. The method itself is an approximate Newton\u27s method tailored to the specific optimization problem, (non-linear least squares), which aims to speed up the process of FRNN training. The new approach stands as a robust and effective compromise between the original gradient-based RTRL (low computational complexity, slow convergence) and Newton-based variants of RTRL (high computational complexity, fast convergence). By gathering information over time in order to form Gauss-Newton search vectors, the new learning algorithm, GN-RTRL, is capable of converging faster to a better quality solution than the original algorithm. Experimental results reflect these qualities of GN-RTRL, as well as the fact that GN-RTRL may have in practice lower computational cost in comparison, again, to the original RTRL
Training Neural Networks with Stochastic Hessian-Free Optimization
Hessian-free (HF) optimization has been successfully used for training deep
autoencoders and recurrent networks. HF uses the conjugate gradient algorithm
to construct update directions through curvature-vector products that can be
computed on the same order of time as gradients. In this paper we exploit this
property and study stochastic HF with gradient and curvature mini-batches
independent of the dataset size. We modify Martens' HF for these settings and
integrate dropout, a method for preventing co-adaptation of feature detectors,
to guard against overfitting. Stochastic Hessian-free optimization gives an
intermediary between SGD and HF that achieves competitive performance on both
classification and deep autoencoder experiments.Comment: 11 pages, ICLR 201
- …