Despite their popularity in the field of continuous optimisation,
second-order quasi-Newton methods are challenging to apply in machine learning,
as the Hessian matrix is intractably large. This computational burden is
exacerbated by the need to address non-convexity, for instance by modifying the
Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an
optimisation algorithm which addresses both of these concerns: to our
knowledge, it is the first efficiently scalable optimisation algorithm to
asymptotically use the exact (eigenvalue-modified) inverse Hessian. Our method
frames the problem as a series which takes the principal square root and
inverse of the squared Hessian, then uses the result to precondition a gradient
vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this
infinite series provides a new optimisation algorithm which is scalable and
comparable to other first- and second-order optimisation methods in both
runtime and optimisation performance. We demonstrate this in a variety of
settings, including a ResNet-18 trained on CIFAR-10.

Comment: 36 pages, 10 figures, 5 tables. Submitted to TMLR. First two authors' order randomised.
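
The abstract does not spell out the series itself. Purely as an illustration of the kind of computation it describes, the sketch below approximates the saddle-free preconditioned direction |H|^{-1} g = (H^2)^{-1/2} g with a truncated binomial series evaluated only through Hessian-vector products. The toy loss, the particular binomial expansion, and the spectral bound `scale` are assumptions made for this sketch, not the paper's actual series, model, or hyperparameters.

```python
import jax
import jax.numpy as jnp

def loss(w, X, y):
    # A small non-quadratic loss so the Hessian is not trivially constant.
    preds = X @ w
    return jnp.mean((preds - y) ** 2) + 0.01 * jnp.sum(jnp.tanh(w) ** 2)

def hvp(w, v, X, y):
    # Hessian-vector product via forward-over-reverse autodiff (no explicit Hessian).
    return jax.jvp(lambda p: jax.grad(loss)(p, X, y), (w,), (v,))[1]

def saddle_free_direction(w, g, X, y, scale, num_terms=20):
    """Approximate |H|^{-1} g = (H^2)^{-1/2} g with a truncated binomial series.

    Uses (1 - x)^{-1/2} = sum_k [C(2k, k) / 4^k] x^k with x = I - H^2 / scale,
    so each term costs two Hessian-vector products. For convergence, `scale`
    must upper-bound the largest eigenvalue of H^2 and H must be non-singular.
    """
    term = g                      # holds (I - H^2/scale)^k g
    result = jnp.zeros_like(g)
    coeff = 1.0                   # C(0, 0) / 4^0
    for k in range(num_terms):
        result = result + coeff * term
        term = term - hvp(w, hvp(w, term, X, y), X, y) / scale
        coeff = coeff * (2 * k + 1) / (2 * k + 2)   # -> C(2k+2, k+1) / 4^(k+1)
    return result / jnp.sqrt(scale)

# Toy usage: a single damped, preconditioned gradient step.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (64, 8))
y = X @ jnp.arange(1.0, 9.0)
w = jnp.zeros(8)
g = jax.grad(loss)(w, X, y)
w = w - 0.5 * saddle_free_direction(w, g, X, y, scale=50.0)
```

The truncation depth `num_terms` plays the role of the series cut-off described in the abstract: more terms approach the exact eigenvalue-modified inverse Hessian at the cost of additional Hessian-vector products per step.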