3 research outputs found

    Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method

    Full text link
    Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood. We introduce and study a second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem. SQGN provides excellent accuracy without the need for experimenting with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training process. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.
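    The paper's SQGN method itself is not reproduced here, but the Gauss-Newton idea it builds on can be sketched in a few lines. The following minimal NumPy sketch applies damped Gauss-Newton steps to mini-batches of a toy nonlinear least-squares problem; the model, function names, and hyper-parameters are illustrative assumptions, not the authors' TensorFlow implementation.

        import numpy as np

        def residuals(w, x, y):
            # residuals of the toy model f(x; w) = w[0] * exp(w[1] * x)
            return w[0] * np.exp(w[1] * x) - y

        def jacobian(w, x):
            # Jacobian of the residual vector with respect to the parameters w
            e = np.exp(w[1] * x)
            return np.column_stack([e, w[0] * x * e])

        def gauss_newton_step(w, x, y, damping=1e-3):
            # one damped Gauss-Newton update: solve (J^T J + damping*I) dw = J^T r
            r = residuals(w, x, y)
            J = jacobian(w, x)
            H = J.T @ J + damping * np.eye(w.size)   # Gauss-Newton approximation of the Hessian
            return w - np.linalg.solve(H, J.T @ r)

        rng = np.random.default_rng(0)
        x = np.linspace(0.0, 1.0, 200)
        y = 2.0 * np.exp(1.5 * x) + 0.01 * rng.standard_normal(x.size)

        w = np.array([1.0, 1.0])
        for _ in range(50):
            idx = rng.choice(x.size, size=32, replace=False)   # mini-batch gives the "stochastic" flavor
            w = gauss_newton_step(w, x[idx], y[idx])
        print("estimated parameters:", w)   # should approach (2.0, 1.5)

    A quasi-Newton variant as in the paper would replace the explicit solve with a limited-memory approximation of the Gauss-Newton matrix; that part is omitted here.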

    Fast Approximation of the Gauss-Newton Hessian Matrix for the Multilayer Perceptron

    Full text link
    We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton Hessian (GNH) matrix for the fully-connected feed-forward neural network. The algorithm has a precomputation step and a sampling step. While it generally requires O(Nn) work to compute an entry (and the entire column) in the GNH matrix for a neural network with N parameters and n data points, our fast sampling algorithm reduces the cost to O(n + d/ε^2) work, where d is the output dimension of the network and ε is a prescribed accuracy (independent of N). One application of our algorithm is constructing the hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. It generally requires O(N^2) memory and O(N^3) work to store and factorize the GNH matrix, respectively. The H-matrix approximation requires only an O(N r_o) memory footprint and O(N r_o^2) work to be factorized, where r_o ≪ N is the maximum rank of off-diagonal blocks in the GNH matrix. We demonstrate the performance of our fast algorithm and the H-matrix approximation on classification and autoencoder neural networks.
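    As a point of reference for the quantities above, the dense GNH of a small network can be formed explicitly. The sketch below does this for a one-hidden-layer network with scalar output and squared loss, where the loss Hessian with respect to the output is the identity, so GNH = J^T J with J the n x N Jacobian of the outputs with respect to the parameters. The network and names are assumptions for illustration; the paper's O(n + d/ε^2) sampling scheme and H-matrix construction are not implemented here.

        import numpy as np

        def output_grad(W1, w2, x):
            # gradient of the scalar output f(x) = w2 . tanh(W1 x)
            # with respect to all parameters, flattened into one vector
            a = np.tanh(W1 @ x)                       # hidden activations
            dW1 = np.outer(w2 * (1.0 - a**2), x)      # d f / d W1
            dw2 = a                                   # d f / d w2
            return np.concatenate([dW1.ravel(), dw2])

        def gauss_newton_hessian(W1, w2, X):
            # For the squared loss 0.5*(f(x)-y)^2 the loss Hessian w.r.t. the output is 1,
            # so the GNH is J^T J, with one output gradient per data point as a row of J.
            J = np.stack([output_grad(W1, w2, x) for x in X])   # n x N Jacobian
            return J.T @ J                                      # N x N GNH

        rng = np.random.default_rng(0)
        d, h, n = 3, 4, 10                       # input dim, hidden width, data points
        W1 = rng.standard_normal((h, d))
        w2 = rng.standard_normal(h)
        X = rng.standard_normal((n, d))

        G = gauss_newton_hessian(W1, w2, X)
        print("GNH shape:", G.shape)             # (N, N) with N = h*d + h = 16
        print("symmetric PSD:", np.allclose(G, G.T),
              np.all(np.linalg.eigvalsh(G) >= -1e-8))

    Forming each column this way costs O(Nn) work, which is exactly the baseline the paper's sampling algorithm improves upon.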

    A literature survey of matrix methods for data science

    Full text link
    Efficient numerical linear algebra is a core ingredient in many applications across almost all scientific and industrial disciplines. With this survey we want to illustrate that numerical linear algebra has played, and continues to play, a crucial role in enabling and improving data science computations, with many new developments fueled by the availability of data and computing resources. We highlight the role of various matrix factorizations and the power of changing the representation of the data, and we discuss topics such as randomized algorithms, functions of matrices, and high-dimensional problems. We briefly touch upon the role of techniques from numerical linear algebra used within deep learning.
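    One topic the survey lists, randomized algorithms for matrix factorizations, can be illustrated with the standard randomized SVD (sketching the range with a Gaussian test matrix, then taking a small dense SVD). The sketch below is generic textbook material in NumPy, not code from the survey; the function name and parameters are illustrative.

        import numpy as np

        def randomized_svd(A, rank, oversample=10, rng=None):
            # randomized SVD: sketch the range of A, then factorize the small projected matrix
            rng = np.random.default_rng(rng)
            m, n = A.shape
            Omega = rng.standard_normal((n, rank + oversample))   # Gaussian test matrix
            Q, _ = np.linalg.qr(A @ Omega)                        # orthonormal basis for the sampled range
            B = Q.T @ A                                           # small (rank+oversample) x n matrix
            Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
            return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

        rng = np.random.default_rng(0)
        A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))   # exactly rank-40 matrix
        U, s, Vt = randomized_svd(A, rank=40)
        print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))           # close to 0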