528 research outputs found
Training (Overparametrized) Neural Networks in Near-Linear Time
The slow convergence rate and pathological curvature issues of first-order
gradient methods for training deep neural networks initiated an ongoing effort
for developing faster second-order optimization algorithms beyond SGD, without
compromising the generalization error. Despite their remarkable convergence
rate (independent of the training batch size $n$), second-order algorithms
incur a daunting slowdown in the cost per iteration (inverting the Hessian
matrix of the loss function), which renders them impractical. Very recently,
this computational overhead was mitigated by the works of [ZMG19, CGH+19],
yielding an $O(mn^2)$-time second-order algorithm for training two-layer
overparametrized neural networks of polynomial width $m$.
We show how to speed up the algorithm of [CGH+19], achieving an
$\widetilde{O}(mn)$-time backpropagation algorithm for training (mildly
overparametrized) ReLU networks, which is near-linear in the dimension ($mn$)
of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to
reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem, and
then use a Fast-JL type dimension reduction to precondition the
underlying Gram matrix in time independent of $m$, allowing us to find a
sufficiently good approximate solution via first-order
conjugate gradient. Our result provides a proof-of-concept that advanced
machinery from randomized linear algebra -- which led to recent breakthroughs
in convex optimization (ERM, LPs, Regression) -- can be
carried over to the realm of deep learning as well.
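
The sketch-and-precondition recipe behind this result is easy to prototype. The
following is a minimal illustrative sketch, not the authors' algorithm: a dense
Gaussian projection stands in for the Fast-JL transform, the function name and
parameters are our own, and plain conjugate gradient on the right-preconditioned
normal equations plays the role of the paper's first-order solver.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def sketch_precondition_lstsq(A, b, sketch_size=None, tol=1e-10, max_iter=100):
    """Solve min_x ||A x - b||_2 by sketch-and-precondition plus CG.

    Illustrative only: a Gaussian sketch stands in for the Fast-JL
    transform, and all names here are ours, not the paper's.
    """
    n, d = A.shape                        # tall problem: n rows >> d columns
    s = sketch_size or 4 * d              # sketch size O(d), independent of n
    S = np.random.randn(s, n) / np.sqrt(s)   # Gaussian sketch (slower to apply than Fast-JL)
    _, R = qr(S @ A, mode='economic')     # QR of the small s x d sketched matrix
    # Right preconditioning: solve (A R^{-1})^T (A R^{-1}) y = (A R^{-1})^T b by CG.
    apply = lambda y: A @ solve_triangular(R, y)
    apply_t = lambda z: solve_triangular(R, A.T @ z, trans='T')
    y = np.zeros(d)
    r = apply_t(b)                        # initial normal-equations residual (y = 0)
    p, rs = r.copy(), r @ r
    for _ in range(max_iter):             # well-preconditioned system => few iterations
        Ap = apply_t(apply(p))
        alpha = rs / (p @ Ap)
        y += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return solve_triangular(R, y)         # undo preconditioning: x = R^{-1} y
```

Because the sketched QR factor $R$ makes $A R^{-1}$ nearly orthonormal, the
conjugate-gradient phase converges in a number of iterations that does not
depend on the conditioning of $A$, which is the effect the abstract describes.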
SCORE: Approximating Curvature Information under Self-Concordant Regularization
In this paper, we propose the SCORE (self-concordant regularization)
framework for unconstrained minimization problems, which incorporates
second-order information in the Newton-decrement framework for convex
optimization. We propose the generalized Gauss-Newton with Self-Concordant
Regularization (GGN-SCORE) algorithm that updates the minimization variables
each time it receives a new input batch. The proposed algorithm exploits the
structure of the second-order information in the Hessian matrix, thereby
reducing computational overhead. GGN-SCORE demonstrates how we may speed up
convergence while also improving model generalization for problems that involve
regularized minimization under the SCORE framework. Numerical experiments show
the efficiency of our method and its fast convergence, which compare favorably
against baseline first-order and quasi-Newton methods. Additional experiments
involving non-convex (overparameterized) neural network training problems show
similar convergence behaviour, thereby highlighting the promise of the proposed
algorithm for non-convex optimization.
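
As a concrete reference point for the generalized Gauss-Newton machinery, here
is a hedged, minimal sketch of one GGN step with a quadratic damping term. It
is not the GGN-SCORE update itself; `Q`, `lam`, and the function name are our
own illustrative choices. The structural point is the one the abstract alludes
to: for a mini-batch, the curvature term J^T Q J has rank at most the batch
size, so the matrix inversion (Woodbury) lemma trades an m x m solve for a
small b x b one.

```python
import numpy as np

def ggn_step(J, Q, grad, lam, lr=1.0):
    """One generalized Gauss-Newton step with Tikhonov-style damping lam.

    Illustrative sketch, not the authors' GGN-SCORE update. J is the
    b x m Jacobian of the batch outputs w.r.t. the m parameters, Q the
    b x b output-space curvature, and grad the full gradient.
    """
    b, m = J.shape
    # Woodbury: (lam I + J^T Q J)^{-1} g = (g - J^T (lam Q^{-1} + J J^T)^{-1} J g) / lam
    small = lam * np.linalg.inv(Q) + J @ J.T   # only b x b, cheap for small batches
    step = (grad - J.T @ np.linalg.solve(small, J @ grad)) / lam
    return -lr * step                          # damped Gauss-Newton update direction
```

With a batch size b far smaller than the parameter count m, the dominant costs
are the O(bm) Jacobian products and an O(b^3) solve, rather than an O(m^3)
inversion of the full curvature matrix.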
- …