Multitask Reinforcement Learning on the Distribution of MDPs
Abstract: In this paper we address a new problem in reinforcement learning: an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over the lifetime as well as the conventional return in each task. To realize this, it must be endowed with the ability to retain its past learning experiences and utilize them to improve future learning performance. We formalize this problem by introducing an environmental class, BV-MDPs, defined with a distribution over MDPs. As an approach to exploiting past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables. The mean can be used to initialize the table when a new task is presented, while the deviation can be viewed as measuring the reliability of the mean, and we utilize it in calculating priorities for simulated backups. We conduct experiments in computer simulation to evaluate the effectiveness of the approach.
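The value-statistics idea lends itself to a short sketch. Below is a minimal, hypothetical Python illustration (not the paper's code), assuming a tabular Q-learning setting: a running mean and deviation of Q-tables are maintained across finished tasks, the mean seeds a new task's table, and the deviation ranks state-action pairs for simulated backups. All class and function names are assumptions.

```python
import numpy as np

class ValueTableStats:
    """Running mean/deviation of converged Q-tables across tasks (Welford's method)."""
    def __init__(self, n_states, n_actions):
        self.count = 0
        self.mean = np.zeros((n_states, n_actions))
        self.m2 = np.zeros((n_states, n_actions))  # running sum of squared deviations

    def update(self, q_table):
        """Fold in the converged Q-table of a finished task."""
        self.count += 1
        delta = q_table - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (q_table - self.mean)

    def deviation(self):
        if self.count < 2:
            # Too few tasks: treat every entry as equally unreliable.
            return np.ones_like(self.mean)
        return np.sqrt(self.m2 / (self.count - 1))

def init_new_task(stats):
    """Initialize a new task's Q-table from the cross-task mean."""
    return stats.mean.copy()

def backup_priority(stats, s, a):
    """Higher deviation -> less reliable mean -> higher priority for simulated backups."""
    return stats.deviation()[s, a]
```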
Bayesian Multitask Inverse Reinforcement Learning
We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may represent one expert trying to solve a different task, or different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers.
Comment: Corrected version. 13 pages, 8 figures.
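To make the "relatedness through a structured prior" idea concrete, here is a hedged toy sketch in Python. It is not the paper's model, which elicits preferences via priors on policy optimality; instead it ties hypothetical per-task reward weights together with a shared Gaussian prior and shows how a MAP estimate shrinks each task's estimate toward the pool. The generative story, noise model, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_features = 4, 5

# Assumed generative story: shared mean mu, per-task reward
# weights w_k ~ N(mu, tau^2 I), so tasks are related but not identical.
mu = rng.normal(size=n_features)
tau = 0.3
true_w = mu + tau * rng.normal(size=(n_tasks, n_features))

# Pretend each task's demonstrations yield a noisy estimate of its weights.
sigma = 0.1
obs = true_w + sigma * rng.normal(size=true_w.shape)

# MAP estimate under the shared prior: each task's estimate is pulled
# toward the pooled mean -- the "relatedness" bias the prior encodes.
pooled = obs.mean(axis=0)
sigma2, tau2 = sigma**2, tau**2
w_map = (obs / sigma2 + pooled / tau2) / (1 / sigma2 + 1 / tau2)
print(w_map.round(2))
```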
Classifying Options for Deep Reinforcement Learning
In this paper we combine one method for hierarchical reinforcement learning, the options framework, with deep Q-networks (DQNs) through the use of different "option heads" on the policy network and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.
Comment: IJCAI 2016 Workshop on Deep Reinforcement Learning: Frontiers and Challenges.
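The described architecture can be sketched directly. Below is a minimal PyTorch illustration, under the assumption of a shared torso, one Q-value head per option, and a supervisory head that scores which option to follow; layer sizes and names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class OptionHeadDQN(nn.Module):
    def __init__(self, obs_dim, n_actions, n_options, hidden=128):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One Q-head per option: each maps shared features to action values.
        self.option_heads = nn.ModuleList(
            nn.Linear(hidden, n_actions) for _ in range(n_options)
        )
        # Supervisory network: scores the options given the shared features.
        self.supervisor = nn.Linear(hidden, n_options)

    def forward(self, obs):
        h = self.torso(obs)
        q_per_option = torch.stack([head(h) for head in self.option_heads], dim=1)
        option_scores = self.supervisor(h)
        return q_per_option, option_scores

# Greedy control: the supervisor picks an option, its head picks the action.
net = OptionHeadDQN(obs_dim=8, n_actions=4, n_options=3)
q, scores = net(torch.randn(2, 8))        # q: (batch, options, actions)
opt = scores.argmax(dim=1)                # chosen option per sample
act = q[torch.arange(2), opt].argmax(dim=1)
```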
- …