Expert Gate: Lifelong Learning with a Network of Experts
In this paper we introduce a model of lifelong learning, based on a Network
of Experts. New tasks/experts are learned and added to the model
sequentially, building on what was learned before. To ensure the scalability
of this process, data from previous tasks cannot be stored and hence is not
available when learning a new task. A critical issue in this context, not
addressed in the literature so far, is deciding which expert to
deploy at test time. We introduce a set of gating autoencoders that learn a
representation for the task at hand, and, at test time, automatically forward
the test sample to the relevant expert. This also brings memory efficiency as
only one expert network has to be loaded into memory at any given time.
Further, the autoencoders inherently capture the relatedness of one task to
another, based on which the most relevant prior model to be used for training a
new expert, with fine-tuning or learning-without-forgetting, can be selected. We
evaluate our method on image classification and video prediction problems.
Comment: CVPR 2017 paper
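The gating mechanism lends itself to a short sketch: one small undercomplete autoencoder is trained per task, and at test time the sample is routed to the expert whose autoencoder reconstructs it best (lowest reconstruction error). Below is a minimal PyTorch sketch assuming pre-extracted feature vectors; the class and function names, including the `load_expert` loader, are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of autoencoder-based gating, assuming feature-vector inputs.
# One gate autoencoder per task; route to the expert with lowest
# reconstruction error. Names are illustrative, not from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GateAutoencoder(nn.Module):
    """One-hidden-layer autoencoder trained on features of a single task."""
    def __init__(self, dim_in: int, dim_hidden: int = 100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU())
        self.decoder = nn.Linear(dim_hidden, dim_in)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def select_expert(feature: torch.Tensor,
                  gates: list[GateAutoencoder]) -> int:
    """Return the index of the task whose gate reconstructs `feature` best."""
    with torch.no_grad():
        errors = [F.mse_loss(gate(feature), feature).item() for gate in gates]
    return min(range(len(errors)), key=errors.__getitem__)

# Only the selected expert then needs to be loaded into memory:
# expert = load_expert(select_expert(feature, gates))  # hypothetical loader
```

The same per-gate reconstruction errors, evaluated on a new task's data, also serve as a relatedness signal for choosing which prior expert to use when training the new one.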
Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization
Artificial autonomous agents and robots interacting in complex environments
are required to continually acquire and fine-tune knowledge over sustained
periods of time. The ability to learn from continuous streams of information is
referred to as lifelong learning and represents a long-standing challenge for
neural network models due to catastrophic forgetting. Computational models of
lifelong learning typically alleviate catastrophic forgetting in experimental
scenarios with given datasets of static images and limited complexity, thereby
differing significantly from the conditions artificial agents are exposed to.
In more natural settings, sequential information may become progressively
available over time and access to previous experience may be restricted. In
this paper, we propose a dual-memory self-organizing architecture for lifelong
learning scenarios. The architecture comprises two growing recurrent networks
with the complementary tasks of learning object instances (episodic memory) and
categories (semantic memory). Both growing networks can expand in response to
novel sensory experience: the episodic memory learns fine-grained
spatiotemporal representations of object instances in an unsupervised fashion
while the semantic memory uses task-relevant signals to regulate structural
plasticity levels and develop more compact representations from episodic
experience. For the consolidation of knowledge in the absence of external
sensory input, the episodic memory periodically replays trajectories of neural
reactivations. We evaluate the proposed model on the CORe50 benchmark dataset
for continuous object recognition, showing that we significantly outperform
current methods of lifelong learning in three different incremental learning
scenarios.
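To make the grow-and-replay loop concrete, here is a heavily simplified sketch, assuming a Grow-When-Required-style prototype rule in place of the paper's recurrent self-organizing networks; the thresholds, learning rate, and all names are illustrative assumptions, not the paper's implementation.

```python
# Simplified dual-memory sketch: two growing prototype stores plus periodic
# replay. A node is added when no existing prototype matches the input
# closely enough (structural plasticity); otherwise the best match adapts.
import random
import numpy as np

class GrowingNetwork:
    """Prototype store that grows in response to novel input."""
    def __init__(self, insertion_threshold: float, lr: float = 0.1):
        self.nodes: list[np.ndarray] = []
        self.threshold = insertion_threshold
        self.lr = lr

    def learn(self, x: np.ndarray) -> None:
        if not self.nodes:
            self.nodes.append(x.copy())
            return
        bmu = min(self.nodes, key=lambda w: float(np.linalg.norm(x - w)))
        if np.linalg.norm(x - bmu) > self.threshold:
            self.nodes.append(x.copy())    # novel input: grow a new node
        else:
            bmu += self.lr * (x - bmu)     # familiar input: adapt the match

# Episodic memory grows readily (fine-grained instances); semantic memory
# grows conservatively (compact categories) via a stricter threshold.
episodic = GrowingNetwork(insertion_threshold=0.3)
semantic = GrowingNetwork(insertion_threshold=0.9)
replay_buffer: list[np.ndarray] = []       # stored activation trajectories

def observe(x: np.ndarray) -> None:
    episodic.learn(x)
    semantic.learn(x)
    replay_buffer.append(x)

def consolidate(n_replays: int = 10) -> None:
    """Periodic replay in the absence of external sensory input."""
    for x in random.sample(replay_buffer, min(n_replays, len(replay_buffer))):
        semantic.learn(x)
```

The different insertion thresholds mimic the regulated plasticity levels described above: the episodic store expands into fine-grained instance representations, while the semantic store stays compact and is consolidated through replay.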
Lifetime policy reuse and the importance of task capacity
A long-standing challenge in artificial intelligence is lifelong learning. In
lifelong learning, many tasks are presented in sequence and learners must
efficiently transfer knowledge between tasks while avoiding catastrophic
forgetting over long lifetimes. On these problems, policy reuse and other
multi-policy reinforcement learning techniques can learn many tasks. However,
they can generate many temporary or permanent policies, resulting in memory
issues. Consequently, there is a need for lifetime-scalable methods that
continually refine a policy library of a pre-defined size. This paper presents
a first approach to lifetime-scalable policy reuse. To pre-select the number of
policies, a notion of task capacity is proposed: the maximal number of tasks
that a policy can accurately solve. To evaluate lifetime policy reuse using this
method, two state-of-the-art single-actor base-learners are compared: 1) a
value-based reinforcement learner, Deep Q-Network (DQN) or Deep Recurrent
Q-Network (DRQN); and 2) an actor-critic reinforcement learner, Proximal Policy
Optimisation (PPO) with or without a Long Short-Term Memory (LSTM) layer. By selecting
the number of policies based on task capacity, D(R)QN achieves near-optimal
performance with 6 policies in a 27-task MDP domain and 9 policies in an
18-task POMDP domain; with fewer policies, catastrophic forgetting and negative
transfer are observed. Due to slow, monotonic improvement, PPO requires fewer
policies, 1 policy for the 27-task domain and 4 policies for the 18-task
domain, but it learns the tasks with lower accuracy than D(R)QN. These findings
validate lifetime-scalable policy reuse and suggest using D(R)QN for larger and
PPO for smaller library sizes.
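The scalability argument can be made concrete with a small sketch: the library size is pre-selected from the task capacity, and every incoming task is mapped onto one of the K policies, which is then refined further. The capacity-based sizing rule, the round-robin assignment, and all names below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of fixed-size policy reuse with capacity-based sizing.
import math

def library_size(n_tasks: int, task_capacity: int) -> int:
    """Pre-select the number of policies from the per-policy task capacity."""
    return math.ceil(n_tasks / task_capacity)

class Policy:
    """Stand-in for one RL base-learner (e.g. D(R)QN or PPO)."""
    def train_on(self, task) -> None:
        ...  # base-learner update on the current task (not shown)

class PolicyLibrary:
    def __init__(self, k: int):
        self.policies = [Policy() for _ in range(k)]
        self.assignment: dict[int, int] = {}   # task id -> policy index

    def policy_for(self, task_id: int) -> Policy:
        # Round-robin assignment for illustration; a real system would
        # assign by task similarity or observed per-policy performance.
        if task_id not in self.assignment:
            self.assignment[task_id] = task_id % len(self.policies)
        return self.policies[self.assignment[task_id]]

# E.g. an assumed capacity of 5 tasks per policy in the 27-task domain
# yields ceil(27 / 5) = 6 policies, the D(R)QN library size quoted above.
library = PolicyLibrary(library_size(27, 5))
```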