6 research outputs found

    Orthogonal Gradient Descent for Continual Learning

    Neural networks are achieving state-of-the-art and sometimes super-human performance on learning tasks across a variety of domains. Whenever these problems require learning in a continual or sequential manner, however, neural networks suffer from the problem of catastrophic forgetting; they forget how to solve previous tasks after being trained on a new task, despite having sufficient capacity to solve both tasks if they were trained on both simultaneously. In this paper, we propose to address this issue from a parameter-space perspective and study an approach that restricts the direction of the gradient updates to avoid forgetting previously learned data. We present the Orthogonal Gradient Descent (OGD) method, which accomplishes this goal by projecting the gradients from new tasks onto a subspace in which the neural network's outputs on previous tasks do not change and the projected gradient is still in a useful direction for learning the new task. Our approach utilizes the high capacity of a neural network more efficiently and does not require storing the previously learned data, which might raise privacy concerns. Experiments on common benchmarks reveal the effectiveness of the proposed OGD method.
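
    The projection step described above can be sketched in a few lines of NumPy: gradients of the model's outputs on earlier tasks are orthonormalized into a basis, and each new-task loss gradient has its components along that basis removed before the parameter update. This is only a minimal sketch of the idea, not the paper's implementation; the flat-parameter interface and the `output_grad` helper are assumptions made for illustration.

```python
import numpy as np

def orthonormalize(basis, new_dir, eps=1e-10):
    """Gram-Schmidt step: remove the components of new_dir that lie along the
    stored basis vectors, normalize the remainder, and append it."""
    v = np.asarray(new_dir, dtype=float).copy()
    for b in basis:
        v -= np.dot(v, b) * b
    norm = np.linalg.norm(v)
    if norm > eps:
        basis.append(v / norm)
    return basis

def ogd_project(grad, basis):
    """Project a new-task gradient onto the subspace orthogonal to the stored
    previous-task output-gradient directions."""
    g = np.asarray(grad, dtype=float).copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return g

def extend_basis(basis, model, task_samples, output_grad, num_logits):
    """After finishing a task, store (orthonormalized) gradients of the model's
    outputs on that task's samples.  output_grad(model, x, k) is a hypothetical
    helper returning d f_k(x; w) / d w as a flat vector for logit k."""
    for x in task_samples:
        for k in range(num_logits):
            basis = orthonormalize(basis, output_grad(model, x, k))
    return basis

def ogd_step(params, loss_grad, basis, lr=1e-2):
    """One SGD step on the new task, with the loss gradient projected first so
    that outputs on previous tasks are (to first order) unchanged."""
    return params - lr * ogd_project(loss_grad, basis)
```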

    Embrace concept drift: a novel solution for online continual learning

    Continual learning is a critical area of research in machine learning that aims to enable models to learn new information without forgetting old knowledge. Online continual learning, in particular, addresses the challenges of learning from a stream of data in real-world environments where data can be unbounded and heterogeneous. Two main problems must be addressed in online continual learning: the first is catastrophic forgetting, a phenomenon in which the model forgets previously learned knowledge while learning new tasks; the second is concept drift, a situation in which the distribution of the data changes over time. These issues complicate the learning process beyond what is seen in traditional machine learning. In this thesis, we propose a general framework for online continual learning that leverages both regularization-based and memory-based methods to mitigate catastrophic forgetting and handle concept drift. Specifically, we introduce a novel concept drift detection algorithm based on the confidence values of the samples. We present a novel online continual learning paradigm that uses concept drift as a rehearsal signal to improve performance by consolidating or expanding the memory center. We also apply data condensation approaches to online continual learning in order to perform memory-efficient rehearsal. Furthermore, we evaluate the accuracy on old tasks and new tasks, comparing against many benchmark models, and present a novel evaluation metric, Stability and Plasticity Balance, to measure the balance between old and new accuracy. We evaluate our proposed approach on a new benchmark dataset framework, Continual Online Learning (COnL), which covers two scenarios of online continual learning: class-incremental learning and instance-incremental learning. In this thesis, the benchmark framework randomly selects a number of incremental classes from three different datasets: TinyImageNet, German Traffic Sign, and Landmarks. Our primary results demonstrate that concept drift can be a useful tool for memory rehearsal in the online continual learning setting. Our proposed approaches provide a promising direction for future research in online continual learning and have the potential to enable models to learn continuously from unbounded and heterogeneous data streams in real-world environments.
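
    The abstract does not spell out the drift detection rule, so the following is only a rough sketch of one way a confidence-based detector could work: monitor the model's max-softmax confidence over a sliding window and flag drift when the recent mean drops well below a running baseline. The class name, window size, and threshold are illustrative assumptions, not details taken from the thesis. A detected drift event would then serve as the rehearsal signal described above, e.g. triggering consolidation or expansion of the memory.

```python
import numpy as np
from collections import deque

class ConfidenceDriftDetector:
    """Illustrative confidence-based drift check: track the model's max-softmax
    confidence over a sliding window and flag drift when the recent mean falls
    well below a slowly adapting baseline.  Window size, drop ratio, and the
    baseline update rule are assumptions made for this sketch."""

    def __init__(self, window_size=200, drop_ratio=0.8, baseline_momentum=0.99):
        self.window = deque(maxlen=window_size)
        self.drop_ratio = drop_ratio
        self.baseline_momentum = baseline_momentum
        self.baseline = None

    def update(self, probs):
        """probs: softmax output for one incoming sample (1-D array).
        Returns True when drift is detected; the caller can then treat the
        event as a rehearsal signal (e.g., consolidate or expand the memory)."""
        self.window.append(float(np.max(probs)))
        if len(self.window) < self.window.maxlen:
            return False  # not enough evidence yet
        mean_conf = float(np.mean(list(self.window)))
        if self.baseline is None:
            self.baseline = mean_conf
            return False
        drifted = mean_conf < self.drop_ratio * self.baseline
        # Let the baseline track gradual shifts so only abrupt drops trigger drift.
        m = self.baseline_momentum
        self.baseline = m * self.baseline + (1.0 - m) * mean_conf
        return drifted
```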