
    Small steps and giant leaps: Minimal Newton solvers for Deep Learning

    We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which have a computational cost comparable to two standard forward passes and are easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix every iteration, either exactly or by conjugate-gradient methods, a procedure that is both costly and sensitive to noise. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and to update it once per iteration. This estimate has the same size as the gradient and is similar to the momentum variable commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known closed-form solutions (the noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers seem to struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. Code is available.
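    The update itself is compact enough to sketch. Below is a minimal, self-contained illustration of a CurveBall-style iteration on the (noise-free) Rosenbrock function mentioned in the abstract; the Hessian-vector product is written out analytically rather than obtained by forward-mode automatic differentiation, and the hyperparameters beta and rho are hand-set for this problem instead of tuned in closed form as the paper does.

    ```python
    import numpy as np

    def rosenbrock(w):
        x, y = w
        return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2

    def rosenbrock_grad(w):
        """Analytic gradient of the Rosenbrock function."""
        x, y = w
        return np.array([-2.0 * (1.0 - x) - 400.0 * x * (y - x * x),
                         200.0 * (y - x * x)])

    def rosenbrock_hvp(w, v):
        """Hessian-vector product H(w) @ v from the analytic Hessian."""
        x, y = w
        H = np.array([[2.0 - 400.0 * y + 1200.0 * x * x, -400.0 * x],
                      [-400.0 * x, 200.0]])
        return H @ v

    # Keep one momentum-like estimate z of the Newton step -H^{-1} g and
    # refresh it once per iteration with a single Hessian-vector product.
    w = np.array([-1.2, 1.0])          # classic Rosenbrock starting point
    z = np.zeros_like(w)
    beta, rho = 5e-4, 0.9              # hand-set; the paper tunes these in closed form

    for _ in range(5000):
        g = rosenbrock_grad(w)
        delta = rosenbrock_hvp(w, z) + g   # residual of the linear system H z = -g
        z = rho * z - beta * delta         # one cheap step on that system
        w = w + z                          # parameter update, as in SGD with momentum

    print("w =", w, "f(w) =", rosenbrock(w))   # w should approach the minimum (1, 1)
    ```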

    The geometry of nonlinear least squares with applications to sloppy models and optimization

    Parameter estimation by nonlinear least squares minimization is a common problem with an elegant geometric interpretation: the possible parameter values of a model induce a manifold in the space of data predictions. The minimization problem is then to find the point on the manifold closest to the data. We show that the model manifolds of a large class of models, known as sloppy models, have many universal features; they are characterized by a geometric series of widths, extrinsic curvatures, and parameter-effects curvatures. A number of common difficulties in optimizing least squares problems are due to this common structure. First, algorithms tend to run into the boundaries of the model manifold, causing parameters to diverge or become unphysical. We introduce the model graph as an extension of the model manifold to remedy this problem. We argue that appropriate priors can remove the boundaries and improve convergence rates. We show that typical fits will have many evaporated parameters. Second, bare model parameters are usually ill-suited to describing model behavior; cost contours in parameter space tend to form hierarchies of plateaus and canyons. Geometrically, we understand this inconvenient parametrization as an extremely skewed coordinate basis and show that it induces a large parameter-effects curvature on the manifold. Using coordinates based on geodesic motion, these narrow canyons are in many cases transformed into a single quadratic, isotropic basin. We interpret the modified Gauss-Newton and Levenberg-Marquardt fitting algorithms as Euler approximations to geodesic motion in these natural coordinates on the model manifold and the model graph, respectively. By adding a geodesic acceleration adjustment to these algorithms, we alleviate the difficulties caused by parameter-effects curvature, improving both efficiency and success rates at finding good fits. (Comment: 40 pages, 29 figures)
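    To make the geodesic acceleration concrete, here is a small sketch of a Levenberg-Marquardt loop with the second-order correction, applied to a classic sloppy fitting problem (a sum of two exponentials with synthetic data). The damping schedule, finite-difference step h, and acceptance threshold are illustrative choices, not the authors' exact algorithm.

    ```python
    import numpy as np

    # Synthetic sloppy test problem: fit y(t) = exp(-p1 t) + exp(-p2 t),
    # whose two rates are notoriously ill-determined from such data.
    t = np.linspace(0.0, 3.0, 20)
    data = np.exp(-1.0 * t) + np.exp(-3.0 * t)    # generated with p = (1, 3)

    def residuals(p):
        return np.exp(-p[0] * t) + np.exp(-p[1] * t) - data

    def jacobian(p):
        return np.stack([-t * np.exp(-p[0] * t), -t * np.exp(-p[1] * t)], axis=1)

    def lm_geodesic(p, lam=1.0, h=0.1, n_iter=50):
        """Levenberg-Marquardt with a geodesic-acceleration correction (sketch)."""
        for _ in range(n_iter):
            r, J = residuals(p), jacobian(p)
            A = J.T @ J + lam * np.eye(len(p))
            v = np.linalg.solve(A, -J.T @ r)      # ordinary first-order LM step
            # Second directional derivative of the residuals along v; this is the
            # curvature term that the geodesic acceleration compensates for.
            k = (residuals(p + h * v) - 2.0 * r + residuals(p - h * v)) / h ** 2
            a = np.linalg.solve(A, -J.T @ k)      # geodesic acceleration
            if np.linalg.norm(a) < 0.75 * np.linalg.norm(v):
                p = p + v + 0.5 * a               # accept second-order step
                lam = max(lam / 3.0, 1e-12)       # relax damping
            else:
                lam *= 3.0                        # curvature too strong: damp more
        return p

    print(lm_geodesic(np.array([0.3, 4.0])))      # typically recovers (1, 3)
    ```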

    Using neural networks to obtain indirect information about the state variables in an alcoholic fermentation process

    This work provides a manual design-space exploration of the structure, type, and inputs of a multilayer neural network (NN) used to obtain indirect information about the state variables in the alcoholic fermentation process. The main benefit of our application is to help experts reduce the time needed for making the relevant measurements and to increase the lifecycles of sensors in bioreactors. The novelty of this research lies in the flexibility of the developed application, the use of a large number of variables, and the comparative presentation of the results obtained with different NNs (feedback vs. feed-forward) and different learning algorithms (Back-Propagation vs. Levenberg–Marquardt). The simulation results show that the feedback neural network outperformed the feed-forward neural network. The NN configuration is relatively flexible (in the number of hidden layers and the number of nodes in each), but the number of input and output nodes depends on the fermentation process parameters. After laborious simulations, we determined that using pH and CO2 as inputs reduces the prediction errors of the NN. Thus, besides the most commonly used process parameters, such as fermentation temperature, time, initial substrate concentration, substrate concentration, and biomass concentration, adding pH and CO2 gave the optimum number of input nodes for the network. The optimal configuration in our case was obtained after 1500 iterations with an NN having one hidden layer of 12 neurons, seven input neurons, and one output neuron. If properly trained and validated, this model can be used in future research to accurately predict steady-state and dynamic alcoholic fermentation process behaviour and thereby improve process control performance.
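    For orientation, the 7-12-1 configuration described above is easy to sketch. The data below are synthetic stand-ins (the fermentation measurements are not reproduced here), scikit-learn's L-BFGS solver substitutes for the paper's Levenberg-Marquardt training, and this is the feed-forward variant rather than the feedback network the authors found superior.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    # Seven inputs standing in for: temperature, time, initial substrate,
    # substrate, biomass, pH, and CO2; one output (the estimated state variable).
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 7))
    y = X @ rng.uniform(size=7) + 0.1 * np.sin(4.0 * X[:, 5])   # arbitrary smooth target

    X_scaled = StandardScaler().fit_transform(X)

    net = MLPRegressor(hidden_layer_sizes=(12,),   # one hidden layer, 12 neurons
                       activation="tanh",
                       solver="lbfgs",             # gradient-based, not Levenberg-Marquardt
                       max_iter=1500)              # echoes the 1500 iterations above
    net.fit(X_scaled, y)
    print("training R^2:", net.score(X_scaled, y))
    ```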

    Regularized Newton Method with Global O(1/k^2) Convergence

    We present a Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by merging the ideas of cubic regularization with a certain adaptive Levenberg--Marquardt penalty. In particular, we show that the iterates given by $x^{k+1} = x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H \|\nabla f(x^k)\|}\, \mathbf{I}\bigr)^{-1} \nabla f(x^k)$, where $H > 0$ is a constant, converge globally with a $\mathcal{O}(1/k^2)$ rate. Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence. Moreover, we prove that locally our method converges superlinearly when the objective is strongly convex. To boost the method's performance, we present a line search procedure that does not need hyperparameters and is provably efficient. (Comment: 21 pages, 2 figures)
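    The update rule is simple to state in code. The sketch below applies the gradient-regularized Newton step to a small L2-regularized logistic-regression objective, a convex problem chosen here purely for illustration; the constant H, which in the analysis relates to the Hessian's Lipschitz modulus, is set by hand.

    ```python
    import numpy as np

    # Convex test objective: (1/n) sum log(1 + exp(-b_i a_i^T x)) + (mu/2) ||x||^2
    rng = np.random.default_rng(1)
    A = rng.normal(size=(100, 5))
    b = rng.choice([-1.0, 1.0], size=100)
    mu = 1e-3

    def grad(x):
        s = 1.0 / (1.0 + np.exp(b * (A @ x)))     # sigmoid(-b_i a_i^T x)
        return -A.T @ (b * s) / len(b) + mu * x

    def hess(x):
        s = 1.0 / (1.0 + np.exp(b * (A @ x)))
        return (A.T * (s * (1.0 - s))) @ A / len(b) + mu * np.eye(len(x))

    x, H = np.zeros(5), 1.0
    for _ in range(25):
        g = grad(x)
        reg = np.sqrt(H * np.linalg.norm(g))      # adaptive Levenberg-Marquardt penalty
        # x_{k+1} = x_k - (Hessian + sqrt(H ||g||) I)^{-1} g
        x = x - np.linalg.solve(hess(x) + reg * np.eye(len(x)), g)

    print("final gradient norm:", np.linalg.norm(grad(x)))   # should be near zero
    ```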