149 research outputs found

    Scalable Recollections for Continual Lifelong Learning

    Full text link
    Given the recent success of Deep Learning applied to a variety of single tasks, it is natural to consider more human-realistic settings. Perhaps the most difficult of these settings is that of continual lifelong learning, where the model must learn online over a continuous stream of non-stationary data. A successful continual lifelong learning system must have three key capabilities: it must learn and adapt over time, it must not forget what it has learned, and it must be efficient in both training time and memory. Recent techniques have focused their efforts primarily on the first two capabilities while questions of efficiency remain largely unexplored. In this paper, we consider the problem of efficient and effective storage of experiences over very large time-frames. In particular we consider the case where typical experiences are O(n) bits and memories are limited to O(k) bits for k << n. We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM.Comment: AAAI 201

    Towards Continual Reinforcement Learning: A Review and Perspectives

    Full text link
    In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL has the promise to develop better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications such as those in the fields of healthcare, education, logistics, and robotics.Comment: Preprint, 52 pages, 8 figure

    On impact of mixing times in continual reinforcement learning

    Full text link
    Le temps de mélange de la chaîne de Markov induite par une politique limite ses performances dans les scénarios réels d'apprentissage continu. Pourtant, l'effet des temps de mélange sur l'apprentissage dans l'apprentissage par renforcement (RL) continu reste peu exploré. Dans cet article, nous caractérisons des problèmes qui sont d'un intérêt à long terme pour le développement de l'apprentissage continu, que nous appelons processus de décision markoviens (MDP) « extensibles » (scalable), à travers le prisme des temps de mélange. En particulier, nous établissons théoriquement que les MDP extensibles ont des temps de mélange qui varient de façon polynomiale avec la taille du problème. Nous démontrons ensuite que les temps de mélange polynomiaux présentent des difficultés importantes pour les approches existantes, qui souffrent d'un biais myope et d'estimations à base de ré-échantillonnage avec remise ensembliste (bootstrapping) périmées. Pour valider notre théorie, nous étudions la complexité des temps de mélange en fonction du nombre de tâches et de la durée des tâches pour des politiques très performantes déployées sur plusieurs jeux Atari. Notre analyse démontre à la fois que des temps de mélange polynomiaux apparaissent en pratique et que leur existence peut conduire à un comportement d'apprentissage instable, comme l'oubli catastrophique dans des contextes d'apprentissage continu.The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches, which suffer from myopic bias and stale bootstrapped estimates. To validate our theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task duration for high performing policies deployed across multiple Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings

    System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games

    Full text link
    As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires making progress at the systems-level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.Comment: The Second International Conference on AIML Systems, October 12--15, 2022, Bangalore, Indi

    Towards General AI using Continual, Active Learning in Large and Few Shot Domains

    Get PDF
    Lifelong learning a.k.a Continual Learning is an advanced machine learning paradigm in which a system learns continuously, assembling the knowledge of prior skills in the process. The system becomes more proficient at acquiring new skill using its accumulated knowledge. This type of learning is one of the hallmarks of human intelligence. However, in the prevail- ing machine learning paradigm, each task is learned in isolation: given a dataset for a task, the system tries to find a machine learning model which performs well on the given dataset. Isolated learning paradigm has led to deep neural networks achieving the state-of-the-art performance on a wide variety of individual tasks. Although isolated learning has achieved much success in a number of applications, it has wide range of struggles while learning mul- tiple tasks in sequence. When trained on a new task using the isolated network performing well on prior task, standard neural network forget most of the information related to previous task by overwriting the old parameters for learning the new task at hand, a phenomenon often referred to as “catastrophic forgetting”. In comparison, humans can learn effectively new task without forgetting the old task and we can learn the new task quickly because we have gained so much knowledge in the past, which allows us to learn the new task with little data and lesser effort. This enables us to learn more and more continually in a self-motivated manner. We can also adapt our previous knowledge to solve unfamiliar problems, an ability beyond current machine learning systems

    Class-incremental learning: survey and performance evaluation

    Full text link
    For future learning systems incremental learning is desirable, because it allows for: efficient resource usage by eliminating the need to retrain from scratch at the arrival of new data; reduced memory usage by preventing or limiting the amount of data required to be stored -- also important when privacy limitations are imposed; and learning that more closely resembles human learning. The main challenge for incremental learning is catastrophic forgetting, which refers to the precipitous drop in performance on previously learned tasks after learning a new one. Incremental learning of deep neural networks has seen explosive growth in recent years. Initial work focused on task incremental learning, where a task-ID is provided at inference time. Recently we have seen a shift towards class-incremental learning where the learner must classify at inference time between all classes seen in previous tasks without recourse to a task-ID. In this paper, we provide a complete survey of existing methods for incremental learning, and in particular we perform an extensive experimental evaluation on twelve class-incremental methods. We consider several new experimental scenarios, including a comparison of class-incremental methods on multiple large-scale datasets, investigation into small and large domain shifts, and comparison on various network architectures
    • …
    corecore