3 research outputs found

    Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method

    Full text link
    Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood. We introduce and study a second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem. SQGN provides excellent accuracy without the need for experimenting with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training process. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.
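    The paper's SQGN method itself is not reproduced here, but the Gauss-Newton idea it builds on can be sketched in a few lines. The following minimal NumPy sketch applies damped Gauss-Newton steps to mini-batches of a toy nonlinear least-squares problem; the model, function names, and hyper-parameters are illustrative assumptions, not the authors' TensorFlow implementation.

        import numpy as np

        def residuals(w, x, y):
            # residuals of the toy model f(x; w) = w[0] * exp(w[1] * x)
            return w[0] * np.exp(w[1] * x) - y

        def jacobian(w, x):
            # Jacobian of the residual vector with respect to the parameters w
            e = np.exp(w[1] * x)
            return np.column_stack([e, w[0] * x * e])

        def gauss_newton_step(w, x, y, damping=1e-3):
            # one damped Gauss-Newton update: solve (J^T J + damping*I) dw = J^T r
            r = residuals(w, x, y)
            J = jacobian(w, x)
            H = J.T @ J + damping * np.eye(w.size)   # Gauss-Newton approximation of the Hessian
            return w - np.linalg.solve(H, J.T @ r)

        rng = np.random.default_rng(0)
        x = np.linspace(0.0, 1.0, 200)
        y = 2.0 * np.exp(1.5 * x) + 0.01 * rng.standard_normal(x.size)

        w = np.array([1.0, 1.0])
        for _ in range(50):
            idx = rng.choice(x.size, size=32, replace=False)   # mini-batch gives the "stochastic" flavor
            w = gauss_newton_step(w, x[idx], y[idx])
        print("estimated parameters:", w)   # should approach (2.0, 1.5)

    A quasi-Newton variant as in the paper would replace the explicit solve with a limited-memory approximation of the Gauss-Newton matrix; that part is omitted here.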

    Fast Approximation of the Gauss-Newton Hessian Matrix for the Multilayer Perceptron

    Full text link
    We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton Hessian (GNH) matrix for the fully-connected feed-forward neural network. The algorithm has a precomputation step and a sampling step. While it generally requires O(Nn) work to compute an entry (and the entire column) in the GNH matrix for a neural network with N parameters and n data points, our fast sampling algorithm reduces the cost to O(n + d/ε^2) work, where d is the output dimension of the network and ε is a prescribed accuracy (independent of N). One application of our algorithm is constructing the hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. It generally requires O(N^2) memory and O(N^3) work to store and factorize the GNH matrix, respectively. The H-matrix approximation requires only an O(N r_o) memory footprint and O(N r_o^2) work to be factorized, where r_o ≪ N is the maximum rank of off-diagonal blocks in the GNH matrix. We demonstrate the performance of our fast algorithm and the H-matrix approximation on classification and autoencoder neural networks.
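    As a point of reference for the quantities above, the dense GNH of a small network can be formed explicitly. The sketch below does this for a one-hidden-layer network with scalar output and squared loss, where the loss Hessian with respect to the output is the identity, so GNH = J^T J with J the n x N Jacobian of the outputs with respect to the parameters. The network and names are assumptions for illustration; the paper's O(n + d/ε^2) sampling scheme and H-matrix construction are not implemented here.

        import numpy as np

        def output_grad(W1, w2, x):
            # gradient of the scalar output f(x) = w2 . tanh(W1 x)
            # with respect to all parameters, flattened into one vector
            a = np.tanh(W1 @ x)                       # hidden activations
            dW1 = np.outer(w2 * (1.0 - a**2), x)      # d f / d W1
            dw2 = a                                   # d f / d w2
            return np.concatenate([dW1.ravel(), dw2])

        def gauss_newton_hessian(W1, w2, X):
            # For the squared loss 0.5*(f(x)-y)^2 the loss Hessian w.r.t. the output is 1,
            # so the GNH is J^T J, with one output gradient per data point as a row of J.
            J = np.stack([output_grad(W1, w2, x) for x in X])   # n x N Jacobian
            return J.T @ J                                      # N x N GNH

        rng = np.random.default_rng(0)
        d, h, n = 3, 4, 10                       # input dim, hidden width, data points
        W1 = rng.standard_normal((h, d))
        w2 = rng.standard_normal(h)
        X = rng.standard_normal((n, d))

        G = gauss_newton_hessian(W1, w2, X)
        print("GNH shape:", G.shape)             # (N, N) with N = h*d + h = 16
        print("symmetric PSD:", np.allclose(G, G.T),
              np.all(np.linalg.eigvalsh(G) >= -1e-8))

    Forming each column this way costs O(Nn) work, which is exactly the baseline the paper's sampling algorithm improves upon.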

    A literature survey of matrix methods for data science

    Full text link
    Efficient numerical linear algebra is a core ingredient in many applications across almost all scientific and industrial disciplines. With this survey we want to illustrate that numerical linear algebra has played, and continues to play, a crucial role in enabling and improving data science computations, with many new developments fueled by the availability of data and computing resources. We highlight the role of various matrix factorizations and the power of changing the representation of the data, and we discuss topics such as randomized algorithms, functions of matrices, and high-dimensional problems. We briefly touch upon the role of techniques from numerical linear algebra used within deep learning.
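    One topic the survey lists, randomized algorithms for matrix factorizations, can be illustrated with the standard randomized SVD (sketching the range with a Gaussian test matrix, then taking a small dense SVD). The sketch below is generic textbook material in NumPy, not code from the survey; the function name and parameters are illustrative.

        import numpy as np

        def randomized_svd(A, rank, oversample=10, rng=None):
            # randomized SVD: sketch the range of A, then factorize the small projected matrix
            rng = np.random.default_rng(rng)
            m, n = A.shape
            Omega = rng.standard_normal((n, rank + oversample))   # Gaussian test matrix
            Q, _ = np.linalg.qr(A @ Omega)                        # orthonormal basis for the sampled range
            B = Q.T @ A                                           # small (rank+oversample) x n matrix
            Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
            return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

        rng = np.random.default_rng(0)
        A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))   # exactly rank-40 matrix
        U, s, Vt = randomized_svd(A, rank=40)
        print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))           # close to 0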