A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Second-order optimization methods have the ability to accelerate convergence
by modifying the gradient through the curvature matrix. There have been many
attempts to use second-order optimization methods for training deep neural
networks. Inspired by diagonal approximations and factored approximations such
as Kronecker-Factored Approximate Curvature (KFAC), we propose a new
approximation to the Fisher information matrix (FIM), called Trace-restricted
Kronecker-Factored Approximate Curvature (TKFAC), which preserves a trace
relationship between the exact and approximate FIM. In TKFAC, we decompose
each block of the approximate FIM as a Kronecker product of two smaller
matrices, scaled by a coefficient related to the trace. We
theoretically analyze TKFAC's approximation error and give an upper bound on
it. We also propose a new damping technique for TKFAC on convolutional neural
networks to maintain the superiority of second-order optimization methods
during training. Experiments show that our method outperforms several
state-of-the-art algorithms on a number of deep network architectures.
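
To make the trace restriction concrete, here is a minimal JAX sketch: given KFAC-style factors A and G for a layer, it rescales their Kronecker product so that the trace of the approximation matches the trace of the exact FIM block, using the identity tr(A ⊗ G) = tr(A)·tr(G). The function name, factor shapes, and toy matrices are illustrative assumptions, not the paper's implementation.

```python
import jax
import jax.numpy as jnp

def tkfac_block(F, A, G):
    """Rescale the Kronecker factorization A (x) G so its trace matches tr(F)."""
    # tr(c * (A (x) G)) = c * tr(A) * tr(G), so one scalar fixes the trace.
    c = jnp.trace(F) / (jnp.trace(A) * jnp.trace(G))
    return c * jnp.kron(A, G)

# Toy usage: random SPD matrices stand in for the KFAC factors and for an
# "exact" FIM block that sits slightly off the Kronecker-product manifold.
key = jax.random.PRNGKey(0)
kA, kG = jax.random.split(key)
A = jax.random.normal(kA, (3, 3)); A = A @ A.T + jnp.eye(3)
G = jax.random.normal(kG, (4, 4)); G = G @ G.T + jnp.eye(4)
F = jnp.kron(A, G) + 0.5 * jnp.eye(12)     # perturbed "exact" block
F_hat = tkfac_block(F, A, G)
print(jnp.trace(F), jnp.trace(F_hat))      # traces now agree
```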
Automatic Differentiable Monte Carlo: Theory and Application
Differentiable programming has emerged as a key programming paradigm
powering the rapid development of deep learning, yet its application to
important computational methods such as Monte Carlo remains largely unexplored.
Here we present the general theory enabling infinite-order automatic
differentiation on expectations computed by Monte Carlo with unnormalized
probability distributions, which we call "automatic differentiable Monte Carlo"
(ADMC). By implementing ADMC algorithms on computational graphs, one can
bring state-of-the-art machine learning frameworks and techniques to bear on
traditional Monte Carlo applications in statistics and physics. We illustrate
the versatility of ADMC with several applications: fast search of phase
transitions and accurately finding ground states of interacting many-body
models in two dimensions. ADMC paves a promising way to innovate Monte Carlo
in various aspects toward higher accuracy and efficiency, e.g., easing or
solving the sign problem of quantum many-body models.
Comment: 11.5 pages + supplemental materials, 4 figures
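
As a flavor of the idea, the following minimal JAX sketch differentiates a Monte Carlo expectation taken under an unnormalized distribution p(x; θ) ∝ exp(−θx²): samples come from a fixed proposal, self-normalized importance weights make the estimator a differentiable function of θ, and jax.grad then yields the parameter gradient automatically. The toy target, proposal, and function names are assumptions for illustration, not the authors' code.

```python
import jax
import jax.numpy as jnp

def expectation(theta, xs):
    """Self-normalized IS estimate of <x^2> under p(x; theta) ~ exp(-theta*x^2)."""
    # Unnormalized log-weights: log p(x; theta) - log q(x), q = standard normal.
    log_w = -theta * xs**2 + 0.5 * xs**2
    w = jax.nn.softmax(log_w)              # normalization handled in-batch
    return jnp.sum(w * xs**2)              # differentiable in theta end to end

key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (100_000,))    # samples from the fixed proposal q
theta = 0.8
value = expectation(theta, xs)             # should approach 1 / (2 * theta)
grad = jax.grad(expectation)(theta, xs)    # ~ -1 / (2 * theta**2), via autodiff
print(value, grad)
```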
WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
Second-order information, in the form of Hessian- or Inverse-Hessian-vector
products, is a fundamental tool for solving optimization problems. Recently,
there has been significant interest in utilizing this information in the
context of deep neural networks; however, relatively little is known about the
quality of existing approximations in this context. Our work examines this
question, identifies issues with existing approaches, and proposes a method
called WoodFisher to compute a faithful and efficient estimate of the inverse
Hessian.
Our main application is to neural network compression, where we build on the
classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher
significantly outperforms popular state-of-the-art methods for one-shot
pruning. Further, even under iterative, gradual pruning, our method yields a
gain in test accuracy over state-of-the-art approaches when pruning popular
neural networks (such as ResNet-50 and MobileNetV1) trained on standard image
classification datasets such as ImageNet ILSVRC. We examine how
our method can be extended to take into account first-order information, as
well as illustrate its ability to automatically set layer-wise pruning
thresholds and perform compression in the limited-data regime. The code is
available at https://github.com/IST-DASLab/WoodFisher.
Comment: NeurIPS 2020
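
For intuition, here is a minimal JAX sketch of the two ingredients the abstract describes: an inverse empirical Fisher built from per-sample gradients via rank-one Woodbury (Sherman-Morrison) updates, and the classic Optimal Brain Surgeon saliency that scores each weight for pruning. The shapes, damping value, and function names are toy assumptions; the paper's implementation works block-wise and at far larger scale.

```python
import jax
import jax.numpy as jnp

def woodfisher_inverse(grads, damp=1e-3):
    """grads: (N, d) per-sample gradients; returns an estimate of (F + damp*I)^-1
    for the empirical Fisher F = (1/N) * sum_i g_i g_i^T."""
    N, d = grads.shape
    Finv = jnp.eye(d) / damp                  # start from the damped identity
    for g in grads:                           # rank-one Sherman-Morrison update
        v = Finv @ g
        Finv = Finv - jnp.outer(v, v) / (N + g @ v)
    return Finv

def obs_saliency(w, Finv):
    """OBS pruning statistic: estimated loss increase for zeroing each weight."""
    return w**2 / (2.0 * jnp.diag(Finv))

# Toy usage with random gradients and weights.
grads = jax.random.normal(jax.random.PRNGKey(0), (64, 8))
w = jax.random.normal(jax.random.PRNGKey(1), (8,))
Finv = woodfisher_inverse(grads)
print(obs_saliency(w, Finv))                  # smaller score = safer to prune
```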