Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting

Authors: Barber David, Botev Aleksandar, Ritter Hippolyt
Publication date: 20/05/2018

We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. To make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.

Comment: 13 pages, 6 figures
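As a rough illustration of the quadratic penalty mentioned in the abstract, the sketch below evaluates 0.5 · vec(ΔW)ᵀ (G ⊗ A) vec(ΔW) for a single layer without ever forming the Kronecker product, and checks the result against the dense computation. The function names, the factor names `a_factor` and `g_factor`, and the random test data are assumptions made for this example; the abstract does not specify an implementation.

```python
import numpy as np


def kf_quadratic_penalty(w, w_star, a_factor, g_factor):
    """Penalty 0.5 * vec(dW)^T (G kron A) vec(dW), computed without the Kronecker product.

    Assumes w has shape (n_out, n_in), a_factor is symmetric (n_in, n_in),
    g_factor is symmetric (n_out, n_out), and vec() flattens row by row.
    """
    delta = w - w_star
    return 0.5 * np.sum(delta * (g_factor @ delta @ a_factor))


def dense_quadratic_penalty(w, w_star, a_factor, g_factor):
    """Same penalty with the full (n_out*n_in) x (n_out*n_in) curvature block."""
    delta = (w - w_star).ravel()
    curvature = np.kron(g_factor, a_factor)
    return 0.5 * delta @ curvature @ delta


def random_spd(rng, n):
    """Random symmetric positive-definite matrix, used only as stand-in curvature."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)


rng = np.random.default_rng(0)
n_out, n_in = 3, 4
a_factor = random_spd(rng, n_in)    # hypothetical input-side factor
g_factor = random_spd(rng, n_out)   # hypothetical output-side factor
w_star = rng.standard_normal((n_out, n_in))  # weights at the previous task's mode
w = w_star + 0.1 * rng.standard_normal((n_out, n_in))

penalty_kf = kf_quadratic_penalty(w, w_star, a_factor, g_factor)
penalty_dense = dense_quadratic_penalty(w, w_star, a_factor, g_factor)
```

The factored form costs two small matrix products per layer, whereas the dense form needs a matrix whose side length is the number of weights in the layer. That difference is why a Kronecker structure makes the penalty practical for larger layers.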

arXiv.org e-Print Archive

UCL Discovery

A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

Authors: Gao Kai-Xin, Huang Zheng-Hai, Liu Xiao-Lei, Wang Min, Wang Zidong, Xu Dachuan, Yu Fan
Publication date: 21/11/2020

Second-order optimization methods can accelerate convergence by modifying the gradient through the curvature matrix. There have been many attempts to use second-order optimization methods for training deep neural networks. Inspired by diagonal approximations and factored approximations such as Kronecker-Factored Approximate Curvature (KFAC), we propose a new approximation to the Fisher information matrix (FIM) called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a certain trace relationship between the exact and the approximate FIM. In TKFAC, we decompose each block of the approximate FIM as a Kronecker product of two smaller matrices, scaled by a coefficient related to the trace. We theoretically analyze TKFAC's approximation error and give an upper bound on it. We also propose a new damping technique for TKFAC on convolutional neural networks to maintain the superiority of second-order optimization methods during training. Experiments show that our method performs better than several state-of-the-art algorithms on some deep network architectures.
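The abstract does not give the TKFAC formula, so the following is only a sketch of one simple way to build a Kronecker-factored block whose trace equals that of the exact block: normalize both factors to unit trace and rescale by the trace of the exact block. The function `trace_matched_kron`, the choice of covariance factors, and the random data are assumptions for illustration and may differ from the construction in the paper.

```python
import numpy as np


def trace_matched_kron(fisher_block, left, right):
    """Return scale * (left_unit kron right_unit) with the same trace as fisher_block.

    Both factors are normalized to unit trace, so the Kronecker product has
    trace 1 and the scalar coefficient alone fixes the trace of the result.
    """
    left_unit = left / np.trace(left)
    right_unit = right / np.trace(right)
    scale = np.trace(fisher_block)
    return scale * np.kron(left_unit, right_unit)


rng = np.random.default_rng(0)
n_samples, n_out, n_in = 256, 3, 4
acts = rng.standard_normal((n_samples, n_in))    # stand-in layer inputs
grads = rng.standard_normal((n_samples, n_out))  # stand-in output gradients

# Per-sample weight gradients g a^T, flattened row by row, give the exact
# empirical Fisher block for this toy layer.
per_sample = np.einsum("bi,bj->bij", grads, acts).reshape(n_samples, -1)
fisher_block = per_sample.T @ per_sample / n_samples

act_cov = acts.T @ acts / n_samples     # input-side second moment
grad_cov = grads.T @ grads / n_samples  # output-side second moment

approx = trace_matched_kron(fisher_block, grad_cov, act_cov)
```

Because the trace of a Kronecker product is the product of the traces, the normalized factors contribute a trace of one and the approximation inherits the trace of the exact block by construction.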

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications