Orthogonal Gradient Descent for Continual Learning
Neural networks are achieving state-of-the-art and sometimes super-human performance on learning tasks across a variety of domains. Whenever these problems require learning in a continual or sequential manner, however, neural networks suffer from catastrophic forgetting: they forget how to solve previous tasks after being trained on a new task, despite having the capacity to solve both tasks if trained on both simultaneously. In this paper, we address this issue from a parameter-space perspective and study an approach that restricts the direction of the gradient updates to avoid forgetting previously learned data. We present the Orthogonal Gradient Descent (OGD) method, which accomplishes this goal by projecting the gradients from new tasks onto a subspace in which the neural network outputs on previous tasks do not change, while the projected gradient remains a useful direction for learning the new task. Our approach uses the high capacity of a neural network more efficiently and does not require storing previously learned data, which might raise privacy concerns. Experiments on common benchmarks demonstrate the effectiveness of the proposed OGD method.
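For concreteness, the following is a minimal NumPy sketch of the projection step the abstract describes: previous-task gradient directions are orthonormalized, and each new-task gradient is projected onto their orthogonal complement. The Gram-Schmidt routine, the flattened-gradient representation, and the function names are illustrative assumptions, not the authors' reference implementation.

    import numpy as np

    def orthonormal_basis(prev_grads, eps=1e-10):
        # Gram-Schmidt over (flattened) gradients of the network outputs
        # on previous-task data; near-zero residuals are dropped.
        basis = []
        for g in prev_grads:
            v = g.copy()
            for b in basis:
                v -= (b @ v) * b
            norm = np.linalg.norm(v)
            if norm > eps:
                basis.append(v / norm)
        return basis

    def ogd_project(grad, basis):
        # Remove the components of the new-task gradient that would, to
        # first order, change the network outputs on previous tasks.
        g = grad.copy()
        for b in basis:
            g -= (b @ g) * b
        return g

In this sketch, the projected gradient simply replaces the raw gradient in the descent update, e.g. w = w - lr * ogd_project(grad, basis), so no previous-task data needs to be stored, only the basis.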
The general framework for few-shot learning by kernel HyperNetworks
Few-shot models aim to make predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting, where each class is represented by only one example. We propose a general framework for few-shot learning via kernel HyperNetworks, a fusion of the kernel and hypernetwork paradigms. First, we introduce a classical realization of this framework, dubbed HyperShot. Compared to reference approaches that apply gradient-based adjustment of the parameters, our models switch the classification-module parameters depending on the task's embedding. In practice, we use a hypernetwork that takes aggregated information from the support data and returns classifier parameters tailored to the considered problem. Moreover, we introduce a kernel-based representation of the support examples that is delivered to the hypernetwork to create the parameters of the classification module. Consequently, we rely on the relations between the support examples' embeddings instead of the backbone model's direct feature values. Thanks to this approach, our model can adapt to highly varied tasks. While this method obtains very good results, it is limited by typical problems such as poorly quantified uncertainty due to the limited data size. We further show that incorporating Bayesian neural networks into our general framework, an approach we call BayesHyperShot, addresses this issue.
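To make the pipeline concrete, below is a minimal PyTorch sketch of a kernel-hypernetwork head: the kernel matrix of support embeddings is flattened and mapped by a small hypernetwork to the weights of a per-task linear classifier. The cosine kernel, the MLP sizes, and the linear classifier form are illustrative assumptions rather than the paper's exact architecture.

    import torch
    import torch.nn as nn

    class KernelHyperHead(nn.Module):
        # Maps the kernel matrix of support embeddings to the weights of a
        # task-specific linear classifier, in the spirit of HyperShot.
        def __init__(self, n_support, emb_dim, n_classes, hidden=256):
            super().__init__()
            self.n_classes, self.emb_dim = n_classes, emb_dim
            self.hyper = nn.Sequential(
                nn.Linear(n_support * n_support, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_classes * (emb_dim + 1)),  # weights + biases
            )

        def forward(self, support_emb, query_emb):
            # Kernel-based task representation: pairwise cosine similarities
            # between support embeddings, instead of raw feature values.
            s = nn.functional.normalize(support_emb, dim=-1)
            kernel = s @ s.t()                     # (n_support, n_support)
            params = self.hyper(kernel.flatten())
            split = self.n_classes * self.emb_dim
            w = params[:split].view(self.n_classes, self.emb_dim)
            b = params[split:]
            return query_emb @ w.t() + b           # query logits

Because the hypernetwork consumes only the kernel matrix, the classifier parameters depend on relations between support examples, which is what lets the same head adapt across very different tasks.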