1,457 research outputs found
Small steps and giant leaps: Minimal Newton solvers for Deep Learning
We propose a fast second-order method that can be used as a drop-in
replacement for current deep learning solvers. Compared to stochastic gradient
descent (SGD), it only requires two additional forward-mode automatic
differentiation operations per iteration, which has a computational cost
comparable to two standard forward passes and is easy to implement. Our method
addresses long-standing issues with current second-order solvers, which invert
an approximate Hessian matrix every iteration exactly or by conjugate-gradient
methods, a procedure that is both costly and sensitive to noise. Instead, we
propose to keep a single estimate of the gradient projected by the inverse
Hessian matrix, and update it once per iteration. This estimate has the same
size and is similar to the momentum variable that is commonly used in SGD. No
estimate of the Hessian is maintained. We first validate our method, called
CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock
function and degenerate 2-layer linear networks), where current deep learning
solvers seem to struggle. We then train several large models on CIFAR and
ImageNet, including ResNet and VGG-f networks, where we demonstrate faster
convergence with no hyperparameter tuning. Code is available
The geometry of nonlinear least squares with applications to sloppy models and optimization
Parameter estimation by nonlinear least squares minimization is a common
problem with an elegant geometric interpretation: the possible parameter values
of a model induce a manifold in the space of data predictions. The minimization
problem is then to find the point on the manifold closest to the data. We show
that the model manifolds of a large class of models, known as sloppy models,
have many universal features; they are characterized by a geometric series of
widths, extrinsic curvatures, and parameter-effects curvatures. A number of
common difficulties in optimizing least squares problems are due to this common
structure. First, algorithms tend to run into the boundaries of the model
manifold, causing parameters to diverge or become unphysical. We introduce the
model graph as an extension of the model manifold to remedy this problem. We
argue that appropriate priors can remove the boundaries and improve convergence
rates. We show that typical fits will have many evaporated parameters. Second,
bare model parameters are usually ill-suited to describing model behavior; cost
contours in parameter space tend to form hierarchies of plateaus and canyons.
Geometrically, we understand this inconvenient parametrization as an extremely
skewed coordinate basis and show that it induces a large parameter-effects
curvature on the manifold. Using coordinates based on geodesic motion, these
narrow canyons are transformed in many cases into a single quadratic, isotropic
basin. We interpret the modified Gauss-Newton and Levenberg-Marquardt fitting
algorithms as an Euler approximation to geodesic motion in these natural
coordinates on the model manifold and the model graph respectively. By adding a
geodesic acceleration adjustment to these algorithms, we alleviate the
difficulties from parameter-effects curvature, improving both efficiency and
success rates at finding good fits. Comment: 40 pages, 29 figures
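For orientation, the standard Levenberg-Marquardt step that the abstract above builds on solves (J^T J + lambda I) delta = -J^T r, where r is the residual vector and J its Jacobian. The following is a minimal sketch of that plain damped step only; it does not include the geodesic acceleration correction the abstract describes. The two-exponential model, the function names, and the factor-of-10 damping schedule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def model(theta, t):
    # Hypothetical two-exponential model, chosen only for illustration.
    return np.exp(-theta[0] * t) + np.exp(-theta[1] * t)

def residuals(theta, t, y):
    return model(theta, t) - y

def jacobian(theta, t):
    # Analytic Jacobian of the residuals with respect to theta.
    return np.stack([-t * np.exp(-theta[0] * t),
                     -t * np.exp(-theta[1] * t)], axis=1)

def cost(theta, t, y):
    return 0.5 * np.sum(residuals(theta, t, y) ** 2)

def lm_step(theta, t, y, lam):
    # Solve (J^T J + lam * I) delta = -J^T r for the damped step.
    r = residuals(theta, t, y)
    J = jacobian(theta, t)
    A = J.T @ J + lam * np.eye(theta.size)
    return np.linalg.solve(A, -J.T @ r)

def fit(theta0, t, y, lam=1e-2, iters=50):
    # Accept a step only if it lowers the cost; otherwise raise the damping.
    theta = np.asarray(theta0, dtype=float)
    current = cost(theta, t, y)
    for _ in range(iters):
        candidate = theta + lm_step(theta, t, y, lam)
        new_cost = cost(candidate, t, y)
        if new_cost < current:
            theta, current = candidate, new_cost
            lam /= 10.0
        else:
            lam *= 10.0
    return theta
```

Large damping turns the step into a short gradient-descent move, while small damping recovers the Gauss-Newton step.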
Using neural networks to obtain indirect information about the state variables in an alcoholic fermentation process
This work provides a manual design space exploration regarding the structure, type, and inputs of a multilayer neural network (NN) to obtain indirect information about the state variables in the alcoholic fermentation process. The main benefit of our application is to help experts reduce the time needed for making the relevant measurements and to increase the lifecycles of sensors in bioreactors. The novelty of this research is the flexibility of the developed application, the use of a great number of variables, and the comparative presentation of the results obtained with different NNs (feedback vs. feed-forward) and different learning algorithms (Back-Propagation vs. Levenberg–Marquardt). The simulation results show that the feedback neural network outperformed the feed-forward neural network. The NN configuration is relatively flexible (with hidden layers and a number of nodes on each of them), but the number of input and output nodes depends on the fermentation process parameters. After laborious simulations, we determined that using pH and CO2 as inputs reduces the prediction errors of the NN. Thus, besides the most commonly used process parameters like fermentation temperature, time, the initial concentration of the substrate, the substrate concentration, and the biomass concentration, by adding pH and CO2, we obtained the optimum number of input nodes for the network. The optimal configuration in our case was obtained after 1500 iterations using a NN with one hidden layer and 12 neurons on it, seven neurons on the input layer, and one neuron as the output. If properly trained and validated, this model can be used in future research to accurately predict steady-state and dynamic alcoholic fermentation process behaviour and thereby improve process control performance
Regularized Newton Method with Global Convergence
We present a Newton-type method that converges fast from any initialization
and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by
merging the ideas of cubic regularization with a certain adaptive
Levenberg--Marquardt penalty. In particular, we show that the iterates given by
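$x^{k+1} = x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|}\,\mathbf{I}\bigr)^{-1} \nabla f(x^k)$, where $H > 0$ is a constant, converge
globally with an $\mathcal{O}(1/k^2)$ rate. Our method is the first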
variant of Newton's method that has both cheap iterations and provably fast
global convergence. Moreover, we prove that locally our method converges
superlinearly when the objective is strongly convex. To boost the method's
performance, we present a line search procedure that does not need
hyperparameters and is provably efficient. Comment: 21 pages, 2 figures
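A minimal sketch of a Newton step with an adaptive Levenberg-Marquardt penalty, as described in the abstract above. It assumes the update x_{k+1} = x_k - (hess(x_k) + sqrt(H * ||grad(x_k)||) * I)^{-1} grad(x_k) with a fixed constant H > 0. The log-cosh test objective, the value H = 1, and all function names are illustrative assumptions; the line search mentioned in the abstract is not included.

```python
import numpy as np

def regularized_newton(grad, hess, x0, H, iters=50):
    # Newton iteration with penalty lam = sqrt(H * ||grad||) added to the Hessian.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        lam = np.sqrt(H * np.linalg.norm(g))
        x = x - np.linalg.solve(hess(x) + lam * np.eye(x.size), g)
    return x

# Hypothetical convex test objective: f(x) = sum(log(cosh(x - c))), minimized at x = c.
c = np.array([1.0, -2.0])

def grad_f(x):
    return np.tanh(x - c)

def hess_f(x):
    return np.diag(1.0 - np.tanh(x - c) ** 2)

x_star = regularized_newton(grad_f, hess_f, x0=[5.0, -4.0], H=1.0)
```

Far from the minimizer the Hessian of this objective is nearly zero, so an unregularized Newton step would be very long; the penalty keeps the step bounded there and vanishes as the gradient shrinks.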