Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Incremental learning (IL) has received a lot of attention recently; however,
the literature lacks a precise problem definition, proper evaluation settings,
and metrics tailored specifically for the IL problem. One of the main
objectives of this work is to fill these gaps and provide a common ground
for a better understanding of IL. The main challenge for an IL algorithm is to
update the classifier whilst preserving existing knowledge. We observe that, in
addition to forgetting, a known issue when preserving knowledge, IL also
suffers from a problem we call intransigence: the inability of a model to
update its knowledge. We introduce two metrics that quantify forgetting and
intransigence, allowing us to understand, analyse, and gain better insights
into the behaviour of IL algorithms. We present RWalk, a generalization of
EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral
[Zenke2017Continual], with a theoretically grounded KL-divergence-based
perspective. We provide a thorough analysis of various IL algorithms on the
MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior
results in terms of accuracy and also provides a better trade-off between
forgetting and intransigence.
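A forgetting metric of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's exact definition: it assumes accuracies are stored in a matrix `acc[i][j]` (accuracy on task `j` after training through task `i`, with hypothetical numbers) and measures forgetting on each earlier task as the gap between the best accuracy ever reached on it and its final accuracy.

```python
# Hypothetical sketch of a forgetting metric for incremental learning.
# acc[i][j] = accuracy on task j after training through task i.
def forgetting(acc):
    k = len(acc)  # number of tasks trained so far
    gaps = []
    for j in range(k - 1):  # every task seen before the most recent one
        best = max(acc[i][j] for i in range(k - 1))  # best accuracy ever on task j
        gaps.append(best - acc[k - 1][j])            # drop at the final step
    return sum(gaps) / len(gaps)                     # average forgetting

# Toy two-task run (made-up numbers):
acc = [
    [0.95, 0.10],  # after task 1: good on task 1, untrained on task 2
    [0.80, 0.92],  # after task 2: task-1 accuracy has dropped
]
print(forgetting(acc))  # roughly 0.15
```

Intransigence would be measured in the opposite direction: how far the model's accuracy on the newest task falls short of a reference model trained with access to all data.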
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
Catastrophic forgetting is a problem faced by many machine learning models
and algorithms. When trained on one task, then trained on a second task, many
machine learning models "forget" how to perform the first task. This is widely
believed to be a serious problem for neural networks. Here, we investigate the
extent to which the catastrophic forgetting problem occurs for modern neural
networks, comparing both established and recent gradient-based training
algorithms and activation functions. We also examine the effect of the
relationship between the first task and the second task on catastrophic
forgetting. We find that training with the dropout algorithm is consistently
best: it adapts best to the new task, remembers the old task best, and has the
best tradeoff curve between these two extremes. We find that different tasks
and relationships between tasks result in very different rankings of
activation function performance. This suggests that the choice of activation
function should always be cross-validated.
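The two-task protocol this abstract studies can be illustrated with a toy example. The model, learning rates, and loss values below are all hypothetical: a single parameter is fit by gradient descent on task A, training then continues on task B, and the task-A loss is re-measured to expose the forgetting.

```python
# Toy analogue of sequential two-task training and catastrophic forgetting.
def loss(w, target):
    return (w - target) ** 2

def train(w, target, steps=100, lr=0.1):
    for _ in range(steps):
        w -= lr * 2 * (w - target)  # gradient step on (w - target)^2
    return w

w = 0.0
w = train(w, target=1.0)       # task A: optimum at w = 1
loss_a_before = loss(w, 1.0)   # near zero after task-A training
w = train(w, target=-1.0)      # task B: optimum at w = -1, pulls w away
loss_a_after = loss(w, 1.0)    # large: task A has been "forgotten"
print(loss_a_before, loss_a_after)
```

Real experiments replace the single parameter with a network and the quadratic losses with per-task training sets, but the measurement (first-task performance before and after second-task training) is the same.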