Despite their popularity in the field of continuous optimisation,
second-order quasi-Newton methods are challenging to apply in machine learning,
as the Hessian matrix is intractably large. This computational burden is
exacerbated by the need to address non-convexity, for instance by modifying the
Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an
optimisation algorithm which addresses both of these concerns: to our
knowledge, it is the first efficiently scalable optimisation algorithm to
asymptotically use the exact (eigenvalue-modified) inverse Hessian. Our method
frames the problem as a series which takes the principal square root and
inverse of the squared Hessian, then uses the result to precondition a gradient
vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this
infinite series provides a new optimisation algorithm which is scalable and
comparable to other first- and second-order optimisation methods in both
runtime and optimisation performance. We demonstrate this in a variety of
settings, including a ResNet-18 trained on CIFAR-10.

Comment: 36 pages, 10 figures, 5 tables. Submitted to TMLR. First two authors' order randomised.
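
The abstract does not spell out the series itself. Purely as an illustration of the kind of computation it describes, the sketch below approximates the saddle-free preconditioned direction |H|^{-1} g = (H^2)^{-1/2} g with a truncated binomial series evaluated only through Hessian-vector products. The toy loss, the particular binomial expansion, and the spectral bound `scale` are assumptions made for this sketch, not the paper's actual series, model, or hyperparameters.

```python
import jax
import jax.numpy as jnp

def loss(w, X, y):
    # A small non-quadratic loss so the Hessian is not trivially constant.
    preds = X @ w
    return jnp.mean((preds - y) ** 2) + 0.01 * jnp.sum(jnp.tanh(w) ** 2)

def hvp(w, v, X, y):
    # Hessian-vector product via forward-over-reverse autodiff (no explicit Hessian).
    return jax.jvp(lambda p: jax.grad(loss)(p, X, y), (w,), (v,))[1]

def saddle_free_direction(w, g, X, y, scale, num_terms=20):
    """Approximate |H|^{-1} g = (H^2)^{-1/2} g with a truncated binomial series.

    Uses (1 - x)^{-1/2} = sum_k [C(2k, k) / 4^k] x^k with x = I - H^2 / scale,
    so each term costs two Hessian-vector products. For convergence, `scale`
    must upper-bound the largest eigenvalue of H^2 and H must be non-singular.
    """
    term = g                      # holds (I - H^2/scale)^k g
    result = jnp.zeros_like(g)
    coeff = 1.0                   # C(0, 0) / 4^0
    for k in range(num_terms):
        result = result + coeff * term
        term = term - hvp(w, hvp(w, term, X, y), X, y) / scale
        coeff = coeff * (2 * k + 1) / (2 * k + 2)   # -> C(2k+2, k+1) / 4^(k+1)
    return result / jnp.sqrt(scale)

# Toy usage: a single damped, preconditioned gradient step.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (64, 8))
y = X @ jnp.arange(1.0, 9.0)
w = jnp.zeros(8)
g = jax.grad(loss)(w, X, y)
w = w - 0.5 * saddle_free_direction(w, g, X, y, scale=50.0)
```

The truncation depth `num_terms` plays the role of the series cut-off described in the abstract: more terms approach the exact eigenvalue-modified inverse Hessian at the cost of additional Hessian-vector products per step.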