VIME: Variational Information Maximizing Exploration
Scalable and effective exploration remains a key challenge in reinforcement
learning (RL). While there are methods with optimality guarantees in the
setting of discrete state and action spaces, these methods cannot be applied in
high-dimensional deep RL scenarios. As such, most contemporary RL relies on
simple heuristics such as epsilon-greedy exploration or adding Gaussian noise
to the controls. This paper introduces Variational Information Maximizing
Exploration (VIME), an exploration strategy based on maximization of
information gain about the agent's belief of environment dynamics. We propose a
practical implementation, using variational inference in Bayesian neural
networks which efficiently handles continuous state and action spaces. VIME
modifies the MDP reward function, and can be applied with several different
underlying RL algorithms. We demonstrate that VIME achieves significantly
better performance compared to heuristic exploration methods across a variety
of continuous control tasks and algorithms, including tasks with very sparse
rewards.
Comment: Published in Advances in Neural Information Processing Systems 29 (NIPS), pages 1109-111
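As a rough illustration of the reward modification VIME describes, the Python sketch below treats the dynamics model's variational posterior as a diagonal Gaussian over network weights and adds an intrinsic bonus proportional to the KL divergence between the posterior after and before observing a transition. The function names, the eta scale, and the diagonal-Gaussian form are illustrative assumptions, not the paper's exact implementation; mu_new and sig_new would come from one variational update of the Bayesian neural network on the observed (s, a, s') triple.

```python
import numpy as np

def gaussian_kl(mu_q, sig_q, mu_p, sig_p):
    """KL(q || p) between diagonal Gaussians with means mu_* and stddevs sig_*."""
    return np.sum(
        np.log(sig_p / sig_q)
        + (sig_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sig_p**2)
        - 0.5
    )

def augmented_reward(r_ext, mu_old, sig_old, mu_new, sig_new, eta=0.1):
    # Intrinsic bonus: information gained about the dynamics model,
    # measured as the KL between the posterior after and before the transition.
    info_gain = gaussian_kl(mu_new, sig_new, mu_old, sig_old)
    return r_ext + eta * info_gain
```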
REinforcement learning based Adaptive samPling: REAPing Rewards by Exploring Protein Conformational Landscapes
One of the key limitations of Molecular Dynamics simulations is the
computational intractability of sampling protein conformational landscapes
associated with either large system size or long timescales. To overcome this
bottleneck, we present the REinforcement learning based Adaptive samPling
(REAP) algorithm that aims to efficiently sample conformational space by
learning the relative importance of each reaction coordinate as it samples the
landscape. To achieve this, the algorithm uses concepts from the field of
reinforcement learning, a subset of machine learning, which rewards sampling
along important degrees of freedom and disregards others that do not facilitate
exploration or exploitation. We demonstrate the effectiveness of REAP by
comparing the sampling to long continuous MD simulations and least-counts
adaptive sampling on two model landscapes (L-shaped and circular), and
realistic systems such as alanine dipeptide and Src kinase. In all four
systems, the REAP algorithm consistently demonstrates its ability to explore
conformational space faster than the other two methods, as measured by the
expected value of the landscape discovered within a given amount of simulation time. The key
advantage of REAP is on-the-fly estimation of the importance of collective
variables, which makes it particularly useful for systems with limited
structural information.
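To make the idea concrete, here is a minimal Python sketch of a REAP-style scoring and weight update: candidate seed structures drawn from the least-populated clusters are rewarded by a weighted, standardized deviation along each collective variable, and the weights are then nudged greedily to maximize that reward. The function names and the greedy perturbation step are simplifications assumed for illustration; they stand in for the paper's constrained weight-optimization step.

```python
import numpy as np

def reap_reward(frames, weights, least_count_idx):
    """Score candidate seeds: a weighted sum of how far each structure sits
    from the mean along every collective variable, in units of standard
    deviation, so under-explored directions with high weight score highest."""
    mu = frames.mean(axis=0)
    sigma = frames.std(axis=0) + 1e-8
    z = np.abs((frames[least_count_idx] - mu) / sigma)  # standardized deviation
    return z @ weights  # one scalar reward per candidate seed

def update_weights(frames, weights, least_count_idx, delta=0.02):
    """Greedy update: try nudging each weight by +/- delta, keep whichever
    perturbation maximizes the summed reward, renormalizing so the weights
    remain a probability vector."""
    best_w = weights
    best_r = reap_reward(frames, weights, least_count_idx).sum()
    for i in range(len(weights)):
        for step in (-delta, delta):
            w = weights.copy()
            w[i] = np.clip(w[i] + step, 0.0, 1.0)
            w /= w.sum()
            r = reap_reward(frames, w, least_count_idx).sum()
            if r > best_r:
                best_w, best_r = w, r
    return best_w
```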
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing usage of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses entropy regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-frequently encountered states (see the sketch below). The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
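The count-based-via-hashing idea mentioned above can be sketched in a few lines of Python: project the state through a fixed random matrix, keep the sign pattern as a SimHash-style code, count visits per code, and pay an exploration bonus that decays with the visit count. The class and parameter names here (HashingBonus, k, beta) are illustrative assumptions, not identifiers from the thesis.

```python
import numpy as np
from collections import defaultdict

class HashingBonus:
    """Count-based exploration bonus via locality-sensitive hashing:
    nearby states tend to share a hash code, so counts generalize
    across a continuous state space."""

    def __init__(self, state_dim, k=16, beta=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))  # fixed random projection
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        code = tuple((self.A @ state) > 0)  # k-bit sign pattern as the hash
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])  # decays with visits
```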