Exploring Restart Distributions
We consider the generic approach of using an experience memory to help
exploration by adapting a restart distribution. That is, given the capacity to
reset the state to one corresponding to the agent's past observations, we
promote faster state-space coverage by restarting the agent from a more
diverse set of initial states, as well as by allowing it to restart in states
associated with significant past experiences. This approach
is compatible with both on-policy and off-policy methods. However, a caveat is
that altering the distribution of initial states could change the optimal
policies when searching within a restricted class of policies. To reduce this
unsought learning bias, we evaluate our approach in deep reinforcement learning
which benefits from the high representational capacity of deep neural networks.
We instantiate three variants of our approach, each inspired by an idea in the
context of experience replay. Using these variants, we show that performance
gains can be achieved, especially in hard exploration problems.
Comment: RLDM 201
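The restart mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the `p_memory` mixing probability, and the priority-weighted sampling are assumptions standing in for the paper's three replay-inspired variants.

```python
import random

class RestartDistribution:
    """Sketch of adapting a restart distribution from an experience
    memory: with probability p_memory an episode restarts from a stored
    past state, otherwise from the environment's default start states."""

    def __init__(self, default_starts, capacity=1000, p_memory=0.5, seed=0):
        self.default_starts = list(default_starts)
        self.memory = []            # (priority, state) pairs from past episodes
        self.capacity = capacity
        self.p_memory = p_memory
        self.rng = random.Random(seed)

    def observe(self, state, priority=1.0):
        # priority could encode significance (e.g. TD error), echoing
        # prioritized experience replay; this choice is an assumption
        self.memory.append((priority, state))
        if len(self.memory) > self.capacity:
            self.memory.pop(0)

    def sample_restart(self):
        """Draw the initial state for the next episode."""
        if self.memory and self.rng.random() < self.p_memory:
            priorities, states = zip(*self.memory)
            # priority-weighted choice favors significant past states
            return self.rng.choices(states, weights=priorities, k=1)[0]
        return self.rng.choice(self.default_starts)
```

Because the memory only reweights where episodes begin, the scheme plugs into both on-policy and off-policy learners, which is the compatibility the abstract notes.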
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing usage of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents playing StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potentially high impact.
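The count-based-via-hashing idea in the second category can be sketched as below. This is a simplified SimHash-style illustration under stated assumptions: the random-projection hash, the `beta` coefficient, and the `beta / sqrt(n)` bonus form follow the common count-based formulation rather than any specific implementation from the thesis.

```python
import math
import numpy as np

def simhash(state, projection):
    """Map a continuous state vector to a binary code by the signs of
    random projections, so that nearby states share a code."""
    return tuple(bool(x) for x in (projection @ state) > 0)

class HashCountBonus:
    """Sketch of count-based exploration via hashing: states hashing to
    the same code share a visit count n, and the reward bonus
    beta / sqrt(n) is larger for rarely visited codes."""

    def __init__(self, state_dim, code_bits=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.projection = rng.standard_normal((code_bits, state_dim))
        self.counts = {}
        self.beta = beta

    def bonus(self, state):
        # increment the visit count of this state's hash code,
        # then return the exploration bonus for it
        code = simhash(np.asarray(state, dtype=float), self.projection)
        self.counts[code] = self.counts.get(code, 0) + 1
        return self.beta / math.sqrt(self.counts[code])
```

Adding this bonus to the environment reward steers the agent toward less-encountered regions of the state space; repeated visits to the same code shrink the bonus toward zero.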