Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learned skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multi-goal RL formulations,
analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learned and unrolled
forward during look-ahead search. Policy search benefits from temporal
abstraction during exploration, yet itself operates over low-level primitive
actions; the resulting policies therefore do not suffer from the suboptimality
and inflexibility caused by coarse skill chaining. We show that the proposed
exploration strategy results in effective learning of complex manipulation
policies faster than current state-of-the-art RL methods, and converges to
better policies than methods that use options or parameterized skills as
building blocks of the policy itself, as opposed to guiding exploration.

Comment: This is a pre-print of our paper which is accepted at AAAI 201
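To make the look-ahead mechanism concrete, here is a minimal sketch. It assumes each skill exposes a learned coarse-dynamics model mapping a state to the predicted state after a full skill execution; the names `skill_models` and `goal_distance`, and the exhaustive enumeration over skill sequences, are illustrative assumptions, not the paper's implementation:

```python
"""Sketch: look-ahead exploration over learned coarse skill dynamics."""
import itertools
import numpy as np

def goal_distance(state, goal):
    # Hypothetical metric; the paper's scoring of states may differ.
    return float(np.linalg.norm(np.asarray(state) - np.asarray(goal)))

def lookahead_plan(state, goal, skill_models, depth=3):
    """Enumerate skill sequences up to `depth`, unroll each skill's
    coarse dynamics forward, and return the sequence whose predicted
    terminal state lands closest to the goal."""
    best_seq, best_dist = None, np.inf
    skills = list(skill_models)  # skill name -> coarse model s -> s'
    for d in range(1, depth + 1):
        for seq in itertools.product(skills, repeat=d):
            s = state
            for k in seq:
                s = skill_models[k](s)  # one coarse skill transition
            dist = goal_distance(s, goal)
            if dist < best_dist:
                best_seq, best_dist = seq, dist
    return best_seq
```

A real implementation would prune this exponential search (e.g., beam search) and would use the plan only to bias exploration, since the final policy still acts over low-level primitive actions.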
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potential impact.
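As an illustration of the second category, here is a minimal sketch of count-based exploration via hashing, in the spirit of SimHash-style random sign projections; the class name, `code_bits`, and the bonus scale `beta` are assumptions, not the surveyed method's exact details:

```python
"""Sketch: count-based exploration bonus via state hashing."""
from collections import defaultdict
import numpy as np

class HashingBonus:
    def __init__(self, state_dim, code_bits=32, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection whose signs give a locality-sensitive code:
        # nearby states tend to share the same hash code.
        self.A = rng.standard_normal((code_bits, state_dim))
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        code = tuple((self.A @ np.asarray(state) > 0).astype(int))
        self.counts[code] += 1
        # Rarely visited codes receive a larger exploration bonus,
        # which is added to the environment reward during training.
        return self.beta / np.sqrt(self.counts[code])
```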
MaMiC: Macro and Micro Curriculum for Robotic Reinforcement Learning
Shaping in humans and animals has been shown to be a powerful tool for
learning complex tasks as compared to learning in a randomized fashion. It
makes the problem less complex and enables one to solve the easier sub-task at
hand first. Generating a curriculum for such guided learning involves
subjecting the agent to easier goals first, and then gradually increasing their
difficulty. This paper takes a similar direction and proposes a dual curriculum
scheme for solving robotic manipulation tasks with sparse rewards, called
MaMiC. It includes a macro curriculum scheme, which divides the task into
multiple sub-tasks, followed by a micro curriculum scheme, which enables the
agent to learn between such discovered sub-tasks. We show how combining macro
and micro curriculum strategies helps in overcoming major exploratory
constraints in robot manipulation tasks without having to engineer any complex
rewards. We also illustrate the meaning of the individual curricula and how
they can be used independently based on the task. The performance of such a
dual curriculum scheme is analyzed on the Fetch environments.

Comment: To appear in the Proceedings of the 18th International Conference on
Autonomous Agents and Multiagent Systems (AAMAS 2019). (Extended Abstract) …
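A minimal sketch of how such a dual macro/micro curriculum might be wired up; the linear sub-goal interpolation and the `progress` schedule are illustrative assumptions, not MaMiC's actual sub-task discovery procedure:

```python
"""Sketch: dual (macro + micro) goal curriculum for sparse-reward tasks."""
import numpy as np

def macro_curriculum(start, final_goal, num_subtasks):
    """Macro level: split the task into a chain of sub-goals."""
    return [start + (final_goal - start) * (i + 1) / num_subtasks
            for i in range(num_subtasks)]

def micro_goal(achieved, subgoal, progress):
    """Micro level: command goals that drift from what the agent can
    already reach toward the next sub-goal as `progress` in [0, 1]
    grows, keeping the sparse-reward task solvable at every stage."""
    return achieved + progress * (subgoal - achieved)

# Hypothetical usage with 3-D Fetch-style goal positions:
start, goal = np.zeros(3), np.array([0.4, 0.3, 0.2])
subgoals = macro_curriculum(start, goal, num_subtasks=4)
g = micro_goal(achieved=start, subgoal=subgoals[0], progress=0.5)
```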