Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the utilization of natural language to drive
transfer for reinforcement learning (RL). Despite the widespread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute improvement
over prior models in terms of average and initial rewards, respectively.
Comment: JAIR 201
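The factorized state representation the abstract mentions can be illustrated with a toy sketch: embed each entity's textual description and concatenate it with that entity's state features, so that a new domain can be represented from its descriptions alone. All names, vocabulary, and dimensions below are illustrative placeholders, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and word embeddings (assumptions, not the paper's).
VOCAB = {"enemy": 0, "friendly": 1, "moves": 2, "randomly": 3, "toward": 4, "agent": 5}
EMBED = rng.standard_normal((len(VOCAB), 8))


def embed_description(text):
    """Mean-pool word embeddings of an entity's textual description."""
    ids = [VOCAB[w] for w in text.split() if w in VOCAB]
    return EMBED[ids].mean(axis=0)


def factorized_state(entity_positions, descriptions):
    """Build a factorized state: each entity contributes its raw features
    (here, a 2-D position) concatenated with its grounded text vector."""
    return np.concatenate([
        np.concatenate([pos, embed_description(desc)])
        for pos, desc in zip(entity_positions, descriptions)
    ])
```

A policy network consuming this vector can then, in principle, generalize to unseen entities whose descriptions reuse familiar words, which is the intuition behind bootstrapping from text.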
Towards Task-Prioritized Policy Composition
Combining learned policies in a prioritized, ordered manner is desirable
because it allows for modular design and facilitates data reuse through
knowledge transfer. In control theory, prioritized composition is realized by
null-space control, where low-priority control actions are projected into the
null-space of high-priority control actions. Such a method is currently
unavailable for Reinforcement Learning. We propose a novel, task-prioritized
composition framework for Reinforcement Learning, which involves a novel
concept: The indifferent-space of Reinforcement Learning policies. Our
framework has the potential to facilitate knowledge transfer and modular design
while greatly increasing data efficiency and data reuse for Reinforcement
Learning agents. Further, our approach can ensure high-priority constraint
satisfaction, which makes it promising for learning in safety-critical domains
like robotics. Unlike null-space control, our approach allows learning globally
optimal policies for the compound task by online learning in the
indifferent-space of higher-level policies after initial compound policy
construction.
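The null-space composition that this abstract takes as its control-theoretic baseline can be sketched in a few lines: a low-priority action is projected into the null space of the high-priority task's Jacobian before being added, so it cannot disturb the high-priority task. The Jacobian and action vectors below are illustrative stand-ins, not quantities from the paper.

```python
import numpy as np


def nullspace_projector(J):
    """Projector onto the null space of task Jacobian J: N = I - J^+ J."""
    return np.eye(J.shape[1]) - np.linalg.pinv(J) @ J


def compose(u_high, u_low, J_high):
    """Prioritized composition from null-space control: the low-priority
    action only acts in directions the high-priority task is blind to,
    i.e. u = u_high + N(J_high) @ u_low."""
    return u_high + nullspace_projector(J_high) @ u_low
```

For example, if the high-priority task constrains only the first coordinate (J_high = [[1, 0, 0]]), the projector zeroes the low-priority action's first component while passing the rest through, so J_high @ u is unchanged by the low-priority contribution.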