3 research outputs found
Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method
Training in supervised deep learning is computationally demanding, and the
convergence behavior is usually not fully understood. We introduce and study a
second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that
combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and
variance reduction to address these issues. SQGN provides excellent accuracy
without the need for experimenting with many hyper-parameter configurations,
which is often computationally prohibitive given the number of combinations and
the cost of each training process. We discuss the implementation of SQGN with
TensorFlow, and we compare its convergence and computational performance to
selected first-order methods using the MNIST benchmark and a large-scale
seismic tomography application from Earth science.
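The abstract stops at this high-level description; purely as a rough illustration of the Gauss-Newton building block that SQGN starts from, the following minimal NumPy sketch performs one damped Gauss-Newton step on a mini-batch least-squares loss. The helper names residual_fn and jacobian_fn, the damping term, and the toy fitting problem are illustrative assumptions, and the sketch omits the stochastic quasi-Newton Hessian approximation and the variance reduction that distinguish SQGN.

    import numpy as np

    def gauss_newton_step(theta, residual_fn, jacobian_fn, damping=1e-3):
        # One damped Gauss-Newton step for the mini-batch least-squares
        # loss 0.5 * ||r(theta)||^2.
        r = residual_fn(theta)   # residuals, shape (m,)
        J = jacobian_fn(theta)   # Jacobian of r w.r.t. theta, shape (m, n)
        H = J.T @ J + damping * np.eye(theta.size)  # Gauss-Newton Hessian
        g = J.T @ r              # gradient of the least-squares loss
        return theta - np.linalg.solve(H, g)

    # Toy usage: fit y = a*x + b to noisy data.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=32)
    y = 2.0 * x + 0.5 + 0.01 * rng.standard_normal(32)
    residuals = lambda th: th[0] * x + th[1] - y
    jacobian = lambda th: np.stack([x, np.ones_like(x)], axis=1)
    theta = np.zeros(2)
    for _ in range(5):
        theta = gauss_newton_step(theta, residuals, jacobian)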
Fast Approximation of the Gauss-Newton Hessian Matrix for the Multilayer Perceptron
We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton
Hessian (GNH) matrix for the fully-connected feed-forward neural network. The
algorithm has a precomputation step and a sampling step. While it generally
requires $O(Nn)$ work to compute an entry (and the entire column) in the GNH
matrix for a neural network with $n$ parameters and $N$ data points, our fast
sampling algorithm reduces the cost to $O(d/\epsilon^2)$ work, where $d$ is
the output dimension of the network and $\epsilon$ is a prescribed accuracy
(independent of $N$). One application of our algorithm is constructing the
hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving
linear systems and eigenvalue problems. It generally requires $O(n^2)$ memory
and $O(n^3)$ work to store and factorize the GNH matrix, respectively. The
H-matrix approximation requires only an $O(rn)$ memory footprint and
$O(r^2 n)$ work to be factorized, where $r$ is the maximum rank of
off-diagonal blocks in the GNH matrix. We demonstrate the performance of our
fast algorithm and the H-matrix approximation on classification and
autoencoder neural networks.
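As a point of reference for the $O(Nn)$ baseline above, here is a minimal sketch of naive column evaluation of the GNH, assuming a least-squares loss (so the GNH reduces to $\sum_i J_i^T J_i$) and assuming the per-sample output Jacobians $J_i$ of shape $(d, n)$ are already available. In the actual setting the two products per sample would be realized as a Jacobian-vector product followed by a vector-Jacobian product, each costing roughly one network pass, and the paper's sampling step replaces the full sum over data points with a cheap estimator.

    import numpy as np

    def gnh_column(jacobians, j):
        # Column j of the Gauss-Newton Hessian G = sum_i J_i^T J_i,
        # evaluated naively as the matrix-vector product G @ e_j.
        n = jacobians[0].shape[1]
        col = np.zeros(n)
        for Ji in jacobians:          # one (d, n) Jacobian per data point
            col += Ji.T @ Ji[:, j]    # J_i^T (J_i e_j)
        return col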
A literature survey of matrix methods for data science
Efficient numerical linear algebra is a core ingredient in many applications
across almost all scientific and industrial disciplines. With this survey we
want to illustrate that numerical linear algebra has played and is playing a
crucial role in enabling and improving data science computations with many new
developments being fueled by the availability of data and computing resources.
We highlight the role of various matrix factorizations and the power of
changing the representation of the data, and we discuss topics such as
randomized algorithms, functions of matrices, and high-dimensional problems. We
briefly touch upon the role of techniques from numerical linear algebra used
within deep learning.
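The survey entry above is descriptive rather than algorithmic; as one concrete instance of the randomized algorithms and low-rank representations it mentions, here is a minimal NumPy sketch of a randomized SVD in the style of Halko, Martinsson, and Tropp. The oversampling value and the omission of power iterations are simplifying assumptions.

    import numpy as np

    def randomized_svd(A, k, oversample=10, seed=0):
        # Sketch the range of A with a Gaussian test matrix, then
        # compute an exact SVD of the small projected problem.
        rng = np.random.default_rng(seed)
        Omega = rng.standard_normal((A.shape[1], k + oversample))
        Q, _ = np.linalg.qr(A @ Omega)  # orthonormal basis for the sampled range
        B = Q.T @ A                     # small (k + oversample) x n matrix
        U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ U_small)[:, :k], s[:k], Vt[:k, :]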