Scalable Recollections for Continual Lifelong Learning
Given the recent success of Deep Learning applied to a variety of single
tasks, it is natural to consider more human-realistic settings. Perhaps the
most difficult of these settings is that of continual lifelong learning, where
the model must learn online over a continuous stream of non-stationary data. A
successful continual lifelong learning system must have three key capabilities:
it must learn and adapt over time, it must not forget what it has learned, and
it must be efficient in both training time and memory. Recent techniques have
focused their efforts primarily on the first two capabilities while questions
of efficiency remain largely unexplored. In this paper, we consider the problem
of efficient and effective storage of experiences over very large time-frames.
In particular, we consider the case where typical experiences are O(n) bits and
memories are limited to O(k) bits for k ≪ n. We present a novel scalable
architecture and training algorithm in this challenging domain and provide an
extensive evaluation of its performance. Our results show that we can achieve
considerable gains on top of state-of-the-art methods such as GEM.
Comment: AAAI 201
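The setting above (experiences of O(n) bits, memory limited to O(k) bits per stored item) can be illustrated with a minimal sketch. It assumes a fixed-capacity buffer filled by reservoir sampling, a standard technique for keeping a uniform sample of an unbounded stream, and a hypothetical `compress` function that block-averages n values down to k. Block averaging is only a crude stand-in for the learned compression architecture the paper proposes; `ReservoirMemory`, `compress`, and all parameter names are illustrative, not the authors' code.

```python
import random
from typing import List


def compress(experience: List[float], k: int) -> List[float]:
    """Hypothetical stand-in for a learned encoder: map n values to k block means."""
    n = len(experience)
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    # Integer block boundaries; every block is non-empty because n >= k.
    bounds = [i * n // k for i in range(k + 1)]
    return [sum(experience[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


class ReservoirMemory:
    """Holds at most `capacity` compressed experiences, i.e. O(capacity * k) values
    no matter how long the stream runs. Reservoir sampling keeps every experience
    seen so far in memory with equal probability."""

    def __init__(self, capacity: int, k: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.k = k
        self.items: List[List[float]] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, experience: List[float]) -> None:
        code = compress(experience, self.k)
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(code)
        else:
            # Overwrite a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = code

    def sample(self, batch_size: int) -> List[List[float]]:
        """Draw a rehearsal batch (without replacement) to mix into training."""
        return self.rng.sample(self.items, min(batch_size, len(self.items)))


if __name__ == "__main__":
    memory = ReservoirMemory(capacity=100, k=8)
    for t in range(10_000):
        memory.add([float(t)] * 64)  # each experience has n = 64 values
    print(len(memory.items), len(memory.items[0]))  # 100 8
```

Memory stays at `capacity * k` values however long the stream runs; the open question the paper targets is how to make the compression step itself both compact and faithful enough for replay.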
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL holds the promise of developing better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role, such as those in healthcare,
education, logistics, and robotics.
Comment: Preprint, 52 pages, 8 figures
On impact of mixing times in continual reinforcement learning
The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. 
We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches, which suffer from myopic bias and stale bootstrapped estimates. To validate our theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task duration for high performing policies deployed across multiple Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings.
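The central quantity in this abstract, the mixing time of the Markov chain a policy induces (for a policy π, P_π(s, s') = Σ_a π(a|s) P(s' | s, a)), can be made concrete with a small sketch. It uses the standard ε-mixing time, the smallest t such that from every start state the t-step distribution is within total variation distance ε of the stationary distribution, and computes it by direct iteration. The example chain, a lazy random walk on a cycle of n states, is an assumption chosen for illustration rather than one of the paper's scalable MDPs; its mixing time is known to grow quadratically in n, a simple instance of polynomial scaling with problem size.

```python
from typing import List

Matrix = List[List[float]]


def lazy_cycle_walk(n: int) -> Matrix:
    """Illustrative chain: stay with prob. 1/2, move to each neighbour with prob. 1/4."""
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] += 0.5
        P[x][(x + 1) % n] += 0.25
        P[x][(x - 1) % n] += 0.25
    return P


def step(dist: List[float], P: Matrix) -> List[float]:
    """One step of the chain: returns the row vector dist @ P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


def tv_distance(p: List[float], q: List[float]) -> float:
    """Total variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def stationary(P: Matrix, iters: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Power iteration from the uniform distribution (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = step(pi, P)
        if tv_distance(nxt, pi) < tol:
            return nxt
        pi = nxt
    return pi


def mixing_time(P: Matrix, eps: float = 0.25, max_steps: int = 100_000) -> int:
    """Smallest t with max over start states x of TV(P^t(x, .), pi) <= eps."""
    n = len(P)
    pi = stationary(P)
    # rows[x] is the distribution after t steps when starting from state x.
    rows = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(max_steps + 1):
        if max(tv_distance(r, pi) for r in rows) <= eps:
            return t
        rows = [step(r, P) for r in rows]
    raise RuntimeError("chain did not mix within max_steps")


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, mixing_time(lazy_cycle_walk(n)))
```

Doubling n should roughly quadruple the printed mixing time for this chain. Direct iteration costs O(n³) per step, so it only suits small illustrative chains, not the Atari-scale policies the paper analyzes empirically.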
System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
As artificial and robotic systems are increasingly deployed and relied upon
for real-world applications, it is important that they be able to
continually learn and adapt in dynamically changing environments, becoming
Lifelong Learning Machines. Continual/lifelong learning (LL) involves
minimizing catastrophic forgetting of old tasks while maximizing a model's
capability to learn new tasks. This paper addresses the challenging lifelong
reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in
L2RL and making L2RL useful for practical applications requires more than
developing individual L2RL algorithms; it requires making progress at the
systems-level, especially research into the non-trivial problem of how to
integrate multiple L2RL algorithms into a common framework. In this paper, we
introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF),
which standardizes L2RL systems and assimilates different continual learning
components (each addressing different aspects of the lifelong learning problem)
into a unified system. As an instantiation of L2RLCF, we develop a standard API
allowing easy integration of novel lifelong learning components. We describe a
case study that demonstrates how multiple independently-developed LL components
can be integrated into a single realized system. We also introduce an
evaluation environment in order to measure the effect of combining various
system components. Our evaluation environment employs different LL scenarios
(sequences of tasks) consisting of Starcraft-2 minigames and allows for the
fair, comprehensive, and quantitative comparison of different combinations of
components within a challenging common evaluation environment.
Comment: The Second International Conference on AIML Systems, October 12-15,
2022, Bangalore, India
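The abstract does not spell out the L2RLCF API, so the following is only a hypothetical sketch of the general plugin pattern such a framework could use: each lifelong-learning component subscribes to lifecycle hooks (task start, step, task end), and a single agent loop forwards events to every registered component. `LLComponent`, `ComponentAgent`, the hook names, and the task names are all invented for illustration and are not the paper's interface.

```python
from typing import Any, Dict, Iterable, List, Tuple


class LLComponent:
    """Hypothetical base class for a lifelong-learning component.
    Subclasses override only the hooks they need; the defaults do nothing."""

    def on_task_start(self, task_id: str) -> None:
        pass

    def on_step(self, transition: Dict[str, Any]) -> None:
        pass

    def on_task_end(self, task_id: str) -> None:
        pass


class ReplayComponent(LLComponent):
    """Example component: stores transitions for later rehearsal."""

    def __init__(self) -> None:
        self.buffer: List[Dict[str, Any]] = []

    def on_step(self, transition: Dict[str, Any]) -> None:
        self.buffer.append(transition)


class TaskBoundaryLogger(LLComponent):
    """Example component: records task boundaries (e.g. to trigger consolidation)."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_task_start(self, task_id: str) -> None:
        self.events.append(f"start:{task_id}")

    def on_task_end(self, task_id: str) -> None:
        self.events.append(f"end:{task_id}")


class ComponentAgent:
    """Runs a scenario (a sequence of tasks) and forwards every lifecycle
    event to all registered components, in registration order."""

    def __init__(self, components: Iterable[LLComponent]) -> None:
        self.components = list(components)

    def _dispatch(self, hook: str, *args: Any) -> None:
        for component in self.components:
            getattr(component, hook)(*args)

    def run_scenario(self, scenario: List[Tuple[str, int]]) -> None:
        for task_id, num_steps in scenario:
            self._dispatch("on_task_start", task_id)
            for t in range(num_steps):
                # Placeholder: a real agent would act in the environment here.
                self._dispatch("on_step", {"task": task_id, "t": t})
            self._dispatch("on_task_end", task_id)


if __name__ == "__main__":
    replay, logger = ReplayComponent(), TaskBoundaryLogger()
    agent = ComponentAgent([replay, logger])
    agent.run_scenario([("minigame_a", 3), ("minigame_b", 2)])  # hypothetical task names
    print(logger.events, len(replay.buffer))
```

Because each component only sees lifecycle events, a new component can be registered without touching the agent loop, which is the kind of decoupling a standard integration API is meant to provide.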
Towards General AI using Continual, Active Learning in Large and Few Shot Domains
Lifelong learning, also known as continual learning, is an advanced machine learning paradigm in which a system learns continuously, accumulating knowledge of previously learned skills in the process. The system becomes more proficient at acquiring new skills by using its accumulated knowledge. This type of learning is one of the hallmarks of human intelligence. However, in the prevailing machine learning paradigm, each task is learned in isolation: given a dataset for a task, the system tries to find a machine learning model that performs well on that dataset. The isolated learning paradigm has led deep neural networks to achieve state-of-the-art performance on a wide variety of individual tasks. Although isolated learning has been successful in many applications, it struggles when multiple tasks must be learned in sequence. When a network that performs well on a prior task is trained on a new task, a standard neural network forgets most of the information related to the previous task, because learning the new task overwrites the old parameters, a phenomenon often referred to as "catastrophic forgetting". In contrast, humans can learn a new task effectively without forgetting old ones, and we can learn it quickly because the knowledge we have gained in the past allows us to learn with little data and less effort. This enables us to keep learning continually in a self-motivated manner. We can also adapt our previous knowledge to solve unfamiliar problems, an ability beyond current machine learning systems.
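Catastrophic forgetting, as described above, is commonly quantified from an accuracy matrix whose entry (i, j) is the accuracy on task j after training on tasks 0 through i. A minimal sketch of one widely used average forgetting measure follows: for each earlier task, the best accuracy it reached before the final task minus its accuracy after the final task, averaged over those tasks. The abstract itself defines no metric; the function names are illustrative, and the numbers in the usage example are invented, not results from any paper.

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """acc[i][j] = accuracy on task j after training on tasks 0..i (rows may be
    lower-triangular). Forgetting of task j = its best accuracy before the final
    task minus its accuracy after the final task, averaged over tasks 0..T-2."""
    T = len(acc)
    if T < 2:
        return 0.0
    final = acc[T - 1]
    drops = [max(acc[l][j] for l in range(j, T - 1)) - final[j] for j in range(T - 1)]
    return sum(drops) / len(drops)


def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks after training on the final task."""
    final = acc[-1]
    return sum(final) / len(final)


if __name__ == "__main__":
    # Invented numbers for a network naively fine-tuned on three tasks in sequence.
    acc = [
        [0.95],
        [0.40, 0.93],
        [0.20, 0.35, 0.94],
    ]
    print(round(average_forgetting(acc), 3))  # 0.665
    print(round(average_accuracy(acc), 3))    # 0.497
```

High final-task accuracy paired with large drops on earlier tasks is exactly the pattern the paragraph calls catastrophic forgetting.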
Class-incremental learning: survey and performance evaluation
For future learning systems, incremental learning is desirable because it
allows for: efficient resource usage by eliminating the need to retrain from
scratch at the arrival of new data; reduced memory usage by preventing or
limiting the amount of data required to be stored -- also important when
privacy limitations are imposed; and learning that more closely resembles human
learning. The main challenge for incremental learning is catastrophic
forgetting, which refers to the precipitous drop in performance on previously
learned tasks after learning a new one. Incremental learning of deep neural
networks has seen explosive growth in recent years. Initial work focused on
task incremental learning, where a task-ID is provided at inference time.
Recently we have seen a shift towards class-incremental learning where the
learner must classify at inference time between all classes seen in previous
tasks without recourse to a task-ID. In this paper, we provide a complete
survey of existing methods for incremental learning, and in particular we
perform an extensive experimental evaluation on twelve class-incremental
methods. We consider several new experimental scenarios, including a comparison
of class-incremental methods on multiple large-scale datasets, investigation
into small and large domain shifts, and comparison on various network
architectures.
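The distinction drawn above between task-incremental and class-incremental learning comes down to what competes at inference time. A minimal sketch, assuming a single output head with one logit per class seen so far: with a task-ID only that task's classes are compared, whereas without one every seen class competes. `predict`, `task_classes`, and the logit values are hypothetical and chosen only to show the difference.

```python
from typing import Dict, List, Optional, Sequence


def predict(
    logits: Sequence[float],
    task_classes: Dict[int, List[int]],
    task_id: Optional[int] = None,
) -> int:
    """Return the predicted class index.

    Task-incremental: task_id is given, so only that task's classes compete.
    Class-incremental: no task_id, so every class seen so far competes.
    """
    if task_id is not None:
        candidates = task_classes[task_id]
    else:
        candidates = sorted(c for classes in task_classes.values() for c in classes)
    return max(candidates, key=lambda c: logits[c])


if __name__ == "__main__":
    task_classes = {0: [0, 1], 1: [2, 3]}  # two tasks, two classes each
    # Hypothetical logits for an input of class 0 from a network whose outputs
    # are skewed toward the most recently learned task's classes.
    logits = [2.0, 0.5, 3.0, 1.0]
    print(predict(logits, task_classes, task_id=0))  # 0 (correct)
    print(predict(logits, task_classes))             # 2 (class from the wrong task)
```

The same network is right when the task-ID narrows the choice and wrong when it must discriminate across tasks, which is why the class-incremental setting is the harder one.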
- …