Benchmarking Deep Reinforcement Learning for Continuous Control
Recently, researchers have made significant progress combining the advances
in deep learning for learning feature representations with reinforcement
learning. Some notable examples include training agents to play Atari games
based on raw pixel data and to acquire advanced manipulation skills using raw
sensory inputs. However, it has been difficult to quantify progress in the
domain of continuous control due to the lack of a commonly adopted benchmark.
In this work, we present a benchmark suite of continuous control tasks,
including classic tasks like cart-pole swing-up, tasks with very high state and
action dimensionality such as 3D humanoid locomotion, tasks with partial
observations, and tasks with hierarchical structure. We report novel findings
based on the systematic evaluation of a range of implemented reinforcement
learning algorithms. Both the benchmark and reference implementations are
released at https://github.com/rllab/rllab in order to facilitate experimental
reproducibility and to encourage adoption by other researchers.Comment: 14 pages, ICML 201
Self-Imitation Advantage Learning
Self-imitation learning is a Reinforcement Learning (RL) method that
encourages actions whose returns were higher than expected, which helps in hard
exploration and sparse reward problems. It was shown to improve the performance
of on-policy actor-critic methods in several discrete control tasks.
Nevertheless, applying self-imitation to the mostly action-value-based
off-policy RL methods is not straightforward. We propose SAIL, a novel
generalization of self-imitation learning for off-policy RL, based on a
modification of the Bellman optimality operator that we connect to Advantage
Learning. Crucially, our method mitigates the problem of stale returns by
choosing the most optimistic return estimate between the observed return and
the current action-value for self-imitation. We demonstrate the empirical
effectiveness of SAIL on the Arcade Learning Environment, with a focus on hard
exploration games.
Comment: AAMAS 202
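As described in the abstract, SAIL modifies an Advantage Learning target by replacing the current action-value in the action-gap term with the more optimistic of the observed (Monte Carlo) return and that action-value. A minimal sketch of this idea, with hypothetical argument names (the actual loss, network parameterization, and replay details are in the paper):

```python
def sail_target(reward, gamma, q_next_max, q_sa, v_s, observed_return, alpha=0.9):
    """Hedged sketch of a SAIL-style bootstrap target.

    Advantage Learning (Baird, 1995) augments the standard Q-learning target
    with an action-gap term alpha * (Q(s, a) - V(s)). Self-imitation, as
    summarized in the abstract, swaps Q(s, a) in that term for the most
    optimistic of the observed return and the current action-value,
    mitigating stale returns.
    """
    # Optimistic return estimate: max of observed return G and current Q(s, a).
    optimistic = max(observed_return, q_sa)
    # Q-learning target plus the advantage-learning correction.
    return reward + gamma * q_next_max + alpha * (optimistic - v_s)
```

When the observed return is no better than the current estimate, the target reduces to plain Advantage Learning, so the bonus only rewards actions whose returns exceeded expectations.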
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, ordered by increasing use of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less frequently encountered states. The third category exploits hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration informed by prior knowledge is a promising research direction and suggest topics of potential impact.
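The count-based scheme summarized above can be sketched in a few lines: hash each state into a bucket, count visits per bucket, and grant a bonus that decays with the count. This is a minimal illustration with made-up parameter names; the actual method (e.g. Tang et al.'s "#Exploration") uses a proper hash such as SimHash and adds the bonus to the environment reward:

```python
from collections import defaultdict

def make_hash_bonus(beta=0.01, n_buckets=1024):
    """Hedged sketch of count-based exploration via state hashing.

    Returns a function that, given a state, increments the visit count of the
    state's hash bucket and returns a bonus proportional to 1/sqrt(count), so
    rarely visited states receive larger exploration bonuses.
    """
    counts = defaultdict(int)

    def bonus(state):
        # Stand-in for the paper's hash function (e.g. SimHash over features).
        code = hash(tuple(state)) % n_buckets
        counts[code] += 1
        return beta / (counts[code] ** 0.5)

    return bonus
```

Repeated visits to the same (hash-equivalent) state shrink the bonus: with `beta=1.0`, the first visit yields 1.0 and the second 1/sqrt(2), steering the agent toward less-encountered states.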