A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning
The aim of multi-task reinforcement learning is two-fold: (1) efficiently
learn by training against multiple tasks and (2) quickly adapt, using limited
samples, to a variety of new tasks. In this work, the tasks correspond to
reward functions for environments with the same (or similar) dynamical models.
We propose to learn a dynamical model during the training process and use this
model to perform sample-efficient adaptation to new tasks at test time. We use
significantly fewer samples by performing policy optimization only in a
"virtual" environment whose transitions are given by our learned dynamical
model. Our algorithm sequentially trains against several tasks. Upon
encountering a new task, we first warm-up a policy on our learned dynamical
model, which requires no new samples from the environment. We then adapt the
dynamical model with samples from this policy in the real environment. We
evaluate our approach on several continuous control benchmarks and demonstrate
its efficacy over MAML, a state-of-the-art meta-learning algorithm, on these
tasks.
Comment: 13 pages, 3 figures
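To make the train/adapt loop above concrete, here is a minimal sketch under simplifying assumptions: a linear dynamics model fit by least squares and random-shooting policy optimization, with all names hypothetical (the paper's actual components are neural). The property the sketch preserves is that warm-up consumes no environment samples; only the short refit rollouts do.

```python
import numpy as np

rng = np.random.default_rng(0)
ds, da = 3, 2                                   # state / action dims
A_true = 0.9 * np.eye(ds)
B_true = 0.1 * rng.standard_normal((ds, da))

def real_step(s, a):
    """Stand-in for the true (unknown) dynamics shared across tasks."""
    return A_true @ s + B_true @ a + 0.01 * rng.standard_normal(ds)

class LinearModel:
    """Learned dynamics s' ~ W [s; a], fit by least squares."""
    def __init__(self):
        self.W = np.zeros((ds, ds + da))
    def fit(self, S, A, S_next):
        X = np.hstack([S, A])
        self.W = np.linalg.lstsq(X, S_next, rcond=None)[0].T
    def step(self, s, a):
        return self.W @ np.concatenate([s, a])

def warm_up(model, reward_fn, n_candidates=256, horizon=10):
    """Optimize a plan purely inside the learned model (the "virtual"
    environment) by random shooting: zero real-environment samples."""
    best, best_ret = None, -np.inf
    for _ in range(n_candidates):
        plan = rng.uniform(-1, 1, size=(horizon, da))
        s, ret = np.zeros(ds), 0.0
        for a in plan:
            s = model.step(s, a)
            ret += reward_fn(s, a)
        if ret > best_ret:
            best, best_ret = plan, ret
    return best

# A new task is just a new reward function over the shared dynamics.
task_reward = lambda s, a: -np.sum(s ** 2)      # e.g. "drive the state to zero"
model = LinearModel()
for _ in range(3):                              # adaptation rounds at test time
    plan = warm_up(model, task_reward)          # no new samples used here
    s, S, A, S_next = np.zeros(ds), [], [], []
    for a in plan:                              # short real rollout with the plan
        s2 = real_step(s, a)
        S.append(s); A.append(a); S_next.append(s2)
        s = s2
    model.fit(np.array(S), np.array(A), np.array(S_next))   # refit the model
```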
Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments
We are interested in learning models of non-stationary environments, which
can be framed as a multi-task learning problem. Model-free reinforcement
learning algorithms can achieve good asymptotic performance in multi-task
learning at the cost of extensive sampling, because they learn each task from
scratch. While model-based approaches are among the most data-efficient
learning algorithms, they still struggle with complex tasks and model
uncertainties. Meta-reinforcement learning addresses the efficiency and
generalization challenges of multi-task learning by quickly leveraging a
meta-learned prior policy for a new task. In this paper, we propose a
meta-reinforcement learning approach to learn the dynamic model of a
non-stationary environment to be used for meta-policy optimization later. Due
to the sample efficiency of model-based learning methods, we are able to
simultaneously train both the meta-model of the non-stationary environment and
the meta-policy until dynamic model convergence. Then, the meta-learned dynamic
model of the environment will generate simulated data for meta-policy
optimization. Our experiments demonstrate that the proposed method can
meta-learn a policy in a non-stationary environment with the data efficiency
of model-based learning approaches while achieving the high asymptotic
performance of model-free meta-reinforcement learning.
Comment: 8 pages, 4 figures
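As a rough illustration of the meta-model idea, the sketch below meta-trains a dynamics-model initialization that adapts to each newly drawn task in a few gradient steps. It uses a first-order (Reptile-style) outer update on toy linear dynamics; that choice and all names are illustrative assumptions, not the paper's exact meta-learning procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
ds = 4

def sample_task():
    """A task is an unknown linear dynamics matrix; non-stationarity is
    modeled as a fresh perturbation around a shared base."""
    return 0.9 * np.eye(ds) + 0.05 * rng.standard_normal((ds, ds))

def inner_adapt(W, A_true, steps=5, lr=0.1, batch=32):
    """Few gradient steps on one task's transition data (inner loop)."""
    for _ in range(steps):
        S = rng.standard_normal((batch, ds))
        S_next = S @ A_true.T
        grad = 2 * (S @ W.T - S_next).T @ S / batch   # d/dW ||S W^T - S'||^2
        W = W - lr * grad
    return W

meta_W = np.zeros((ds, ds))
for _ in range(200):                       # meta-training over a task stream
    task = sample_task()
    adapted = inner_adapt(meta_W.copy(), task)
    meta_W += 0.1 * (adapted - meta_W)     # Reptile outer step toward adapted weights
# meta_W now adapts to a new task's dynamics in ~5 steps rather than from scratch.
```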
Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning
Learning policies for complex tasks that require multiple different skills is
a major challenge in reinforcement learning (RL). It is also a requirement for
its deployment in real-world scenarios. This paper proposes a novel framework
for efficient multi-task reinforcement learning. Our framework trains agents to
employ hierarchical policies that decide when to use a previously learned
policy and when to learn a new skill. This enables agents to continually
acquire new skills during different stages of training. Each learned task
corresponds to a human language description. Because the agent can only access
previously learned skills through these descriptions, it can always
provide a human-interpretable description of its choices. In order to help the
agent learn the complex temporal dependencies necessary for the hierarchical
policy, we provide it with a stochastic temporal grammar that modulates when to
rely on previously learned skills and when to execute new skills. We validate
our approach on Minecraft games designed to explicitly test the ability to
reuse previously learned skills while simultaneously learning new skills.
Comment: 14 pages, 6 figures
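A toy rendering of the decision structure described above, with hypothetical names throughout: at every step the hierarchical policy either invokes a previously learned skill, addressed by its human language description (which is what keeps the choice interpretable), or defers to the skill currently being learned; a stand-in "stochastic temporal grammar" biases that switch.

```python
import random

# Previously learned skills, keyed by their human language descriptions.
learned_skills = {
    "find wood": lambda obs: "move_to_tree",
    "craft planks": lambda obs: "use_crafting_table",
}

def stochastic_temporal_grammar(prev_choice):
    """Toy stand-in: after reusing a skill, prefer to keep relying on the
    skill library; otherwise try the new skill more often."""
    return 0.8 if prev_choice in learned_skills else 0.4

def hierarchical_policy(obs, prev_choice, new_skill_policy):
    p_reuse = stochastic_temporal_grammar(prev_choice)
    if learned_skills and random.random() < p_reuse:
        desc = random.choice(list(learned_skills))   # interpretable: a description
        return desc, learned_skills[desc](obs)
    return "new skill", new_skill_policy(obs)

choice, action = hierarchical_policy(obs={}, prev_choice=None,
                                     new_skill_policy=lambda obs: "explore")
print(f"agent chose '{choice}' -> action {action}")
```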
Visual Reinforcement Learning with Imagined Goals
For an autonomous agent to fulfill a wide range of user-specified goals at
test time, it must be able to learn broadly applicable and general-purpose
skill repertoires. Furthermore, to provide the requisite level of generality,
these skills must handle raw sensory input such as images. In this paper, we
propose an algorithm that acquires such general-purpose skills by combining
unsupervised representation learning and reinforcement learning of
goal-conditioned policies. Since the particular goals that might be required at
test-time are not known in advance, the agent performs a self-supervised
"practice" phase where it imagines goals and attempts to achieve them. We learn
a visual representation with three distinct purposes: sampling goals for
self-supervised practice, providing a structured transformation of raw sensory
inputs, and computing a reward signal for goal reaching. We also propose a
retroactive goal relabeling scheme to further improve the sample-efficiency of
our method. Our off-policy algorithm is efficient enough to learn policies that
operate on raw image observations and goals for a real-world robotic system,
and substantially outperforms prior techniques.
Comment: 15 pages, NeurIPS 2018
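The retroactive relabeling scheme can be sketched in isolation (a HER-style rendering under assumed names; the reward here mimics the abstract's goal-reaching signal as a negative distance in the learned latent space): each transition is stored several extra times with its originally imagined goal replaced by latent states actually reached later in the same trajectory, so off-policy learning sees many successful examples.

```python
import numpy as np

def relabel(trajectory, k=4, rng=np.random.default_rng(0)):
    """trajectory: list of (z, a, z_next, goal) tuples in the latent space."""
    replay = []
    for t, (z, a, z_next, g) in enumerate(trajectory):
        # Store the transition with its original imagined goal...
        replay.append((z, a, z_next, g, -np.linalg.norm(z_next - g)))
        for _ in range(k):
            # ...and k more times, relabeled with a future achieved latent state.
            fut = rng.integers(t, len(trajectory))
            g2 = trajectory[fut][2]
            replay.append((z, a, z_next, g2, -np.linalg.norm(z_next - g2)))
    return replay
```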
Continuous Deep Q-Learning with Model-based Acceleration
Model-free reinforcement learning has been successfully applied to a range of
challenging problems, and has recently been extended to handle large neural
network policies and value functions. However, the sample complexity of
model-free algorithms, particularly when using high-dimensional function
approximators, tends to limit their applicability to physical systems. In this
paper, we explore algorithms and representations to reduce the sample
complexity of deep reinforcement learning for continuous control tasks. We
propose two complementary techniques for improving the efficiency of such
algorithms. First, we derive a continuous variant of the Q-learning algorithm,
which we call normalized advantage functions (NAF), as an alternative to the
more commonly used policy gradient and actor-critic methods. The NAF representation
allows us to apply Q-learning with experience replay to continuous tasks, and
substantially improves performance on a set of simulated robotic control tasks.
To further improve the efficiency of our approach, we explore the use of
learned models for accelerating model-free reinforcement learning. We show that
iteratively refitted local linear models are especially effective for this, and
demonstrate substantially faster learning on domains where such models are
applicable.
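The construction behind NAF can be written down compactly: Q(s, a) = V(s) + A(s, a) with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) and P(s) = L(s) L(s)^T positive definite, so argmax_a Q(s, a) = mu(s) is available in closed form, which is what lets Q-learning with experience replay run on continuous actions. The sketch below uses placeholder linear maps where the paper uses neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)
ds, da = 4, 2
Wv = rng.standard_normal(ds)                         # toy linear "networks"
Wmu = rng.standard_normal((da, ds))
Wl = rng.standard_normal((da * (da + 1) // 2, ds))

def naf_q(s, a):
    V = Wv @ s                                       # state value V(s)
    mu = Wmu @ s                                     # greedy action mu(s)
    L = np.zeros((da, da))
    L[np.tril_indices(da)] = Wl @ s                  # lower-triangular factor
    L[np.diag_indices(da)] = np.exp(L[np.diag_indices(da)])  # keep P > 0
    P = L @ L.T
    d = a - mu
    A = -0.5 * d @ P @ d                             # advantage, maximal (0) at a = mu
    return V + A, mu

q_value, greedy_action = naf_q(rng.standard_normal(ds), rng.standard_normal(da))
```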
The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors
Though deep reinforcement learning has led to breakthroughs in many difficult
domains, these successes have required an ever-increasing number of samples. As
state-of-the-art reinforcement learning (RL) systems require an exponentially
increasing number of samples, their development is restricted to a continually
shrinking segment of the AI community. Likewise, many of these systems cannot
be applied to real-world problems, where environment samples are expensive.
Resolution of these limitations requires new, sample-efficient methods. To
facilitate research in this direction, we introduce the MineRL Competition on
Sample Efficient Reinforcement Learning using Human Priors.
The primary goal of the competition is to foster the development of
algorithms which can efficiently leverage human demonstrations to drastically
reduce the number of samples needed to solve complex, hierarchical, and sparse
environments. To that end, we introduce: (1) the Minecraft ObtainDiamond task,
a sequential decision making environment requiring long-term planning,
hierarchical control, and efficient exploration methods; and (2) the MineRL-v0
dataset, a large-scale collection of over 60 million state-action pairs of
human demonstrations that can be resimulated into embodied trajectories with
arbitrary modifications to game state and visuals.
Participants will compete to develop systems which solve the ObtainDiamond
task with a limited number of samples from the environment simulator, Malmo.
The competition is structured into two rounds in which competitors are provided
several paired versions of the dataset and environment with different game
textures. At the end of each round, competitors will submit containerized
versions of their learning algorithms and they will then be trained/evaluated
from scratch on a hold-out dataset-environment pair for a total of 4 days on a
prespecified hardware platform.
Comment: accepted at NeurIPS 2019, 28 pages
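For orientation, interacting with the competition environment and dataset looked roughly like the following with the minerl Python package released for the 2019 competition; exact function signatures, the iterator's tuple layout, and the data path are assumptions that may not match every package version.

```python
# Sketch only: based on the minerl package circa the 2019 competition;
# signatures may differ across versions.
import gym
import minerl

# Environment samples come from the Malmo-backed simulator.
env = gym.make("MineRLObtainDiamond-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())

# Human demonstrations from the MineRL-v0 dataset (data_dir is illustrative).
data = minerl.data.make("MineRLObtainDiamond-v0", data_dir="data/")
for state, action, reward, next_state, done in data.batch_iter(
        batch_size=32, seq_len=32, num_epochs=1):
    pass  # e.g. behavioral-cloning pre-training on (state, action) pairs
```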
Multiobjective Reinforcement Learning for Reconfigurable Adaptive Optimal Control of Manufacturing Processes
In industrial applications of adaptive optimal control, multiple conflicting
objectives often have to be considered. The weights (relative importance) of
the objectives are frequently unknown during controller design and can change
with changing production conditions and requirements. In this work, a novel
model-free multiobjective reinforcement learning approach for adaptive optimal
control of manufacturing processes is proposed. The approach enables
sample-efficient learning in sequences of control configurations, each given
by a particular set of objective weights.
Comment: Conference preprint, 978-1-5386-5925-0/18/$31.00 © 2018 IEEE
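One generic way to realize "sample-efficient learning in sequences of control configurations" is to learn a vector-valued Q-function over the objectives and scalarize it with the current weight vector, so that a change of weights reuses the learned values rather than restarting. The sketch below shows that scalarization pattern under assumed names; it is not necessarily the paper's exact formulation.

```python
import numpy as np

n_states, n_actions, n_objectives = 10, 4, 2
Q = np.zeros((n_states, n_actions, n_objectives))    # one value per objective

def act(s, w, eps=0.1, rng=np.random.default_rng(0)):
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s] @ w))                  # scalarize with weights w

def update(s, a, r_vec, s2, w, alpha=0.1, gamma=0.95):
    a2 = int(np.argmax(Q[s2] @ w))                   # greedy under current weights
    Q[s, a] += alpha * (r_vec + gamma * Q[s2, a2] - Q[s, a])   # vector Bellman step

# When production requirements shift, only w changes, e.g. from quality-heavy
# to energy-heavy, and the learned vector-valued Q is reused as-is.
w = np.array([0.7, 0.3])
```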
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then examine representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
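The second category above, count-based exploration via hashing, is concrete enough to sketch: states are hashed to short binary codes SimHash-style, visit counts are kept per code, and the task reward is augmented with a bonus that decays as a code's count grows. Constants and names here are illustrative.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
D, K = 64, 16                        # state dim, hash code length in bits
A_proj = rng.standard_normal((K, D)) # fixed random projection for SimHash
counts = Counter()

def exploration_bonus(state, beta=0.1):
    code = tuple((A_proj @ state > 0).astype(int))   # K-bit SimHash code
    counts[code] += 1
    return beta / np.sqrt(counts[code])              # bonus ~ 1 / sqrt(n(code))

# Augment the environment reward before the RL update:
r_augmented = 1.0 + exploration_bonus(rng.standard_normal(D))
```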
Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management
Deep reinforcement learning (RL) methods have significant potential for
dialogue policy optimisation. However, they suffer from a poor performance in
the early stages of learning. This is especially problematic for on-line
learning with real users. Two approaches are introduced to tackle this problem.
Firstly, to speed up the learning process, two sample-efficient neural network
algorithms are presented: trust region actor-critic with experience replay
(TRACER) and episodic natural actor-critic with experience replay (eNACER).
For TRACER, the trust region helps to control the learning step size and avoid
catastrophic model changes. For eNACER, the natural gradient identifies the
steepest ascent direction in policy space to speed up the convergence. Both
models employ off-policy learning with experience replay to improve
sample-efficiency. Secondly, to mitigate the cold start issue, a corpus of
demonstration data is utilised to pre-train the models prior to on-line
reinforcement learning. Combining these two approaches, we demonstrate a
practical approach to learning deep RL-based dialogue policies and demonstrate
their effectiveness in a task-oriented information-seeking domain.
Comment: Accepted as a long paper in SigDial 2017
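The two-stage recipe, supervised pre-training on a demonstration corpus followed by off-policy updates from an experience replay buffer, can be sketched generically. This is not TRACER or eNACER themselves; the softmax policy, the truncated importance weights, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_belief, n_actions = 8, 5
W = np.zeros((n_actions, d_belief))          # linear softmax dialogue policy

def pi(b):
    z = W @ b
    e = np.exp(z - z.max())
    return e / e.sum()

# Stage 1: supervised pre-training on demonstrated (belief, action) pairs.
demos = [(rng.standard_normal(d_belief), int(rng.integers(n_actions)))
         for _ in range(500)]
for b, a in demos:
    p = pi(b)
    grad = -p[:, None] * b[None, :]          # d log pi(a|b) / dW ...
    grad[a] += b                             # ... = (e_a - p) b^T
    W += 0.05 * grad

# Stage 2: off-policy policy-gradient updates from a replay buffer, with
# truncated importance weights correcting for stale behavior probabilities.
replay = [(rng.standard_normal(d_belief), int(rng.integers(n_actions)),
           float(rng.random()), 1.0 / n_actions)   # (belief, action, return, mu(a|b))
          for _ in range(200)]
for b, a, G, mu in replay:
    p = pi(b)
    rho = min(p[a] / mu, 5.0)                # truncated importance weight
    grad = -p[:, None] * b[None, :]
    grad[a] += b
    W += 0.01 * rho * G * grad
```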
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone, raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the title: "Is multiagent deep reinforcement learning the answer or the question? A brief survey"