1,809 research outputs found

    Exploring Restart Distributions

    Get PDF
    We consider the generic approach of using an experience memory to help exploration by adapting a restart distribution. That is, given the capacity to reset the state with those corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage via restarting the agent from a more diverse set of initial states, as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unsought learning bias, we evaluate our approach in deep reinforcement learning which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems.Comment: RLDM 201

    Combining Experience Replay with Exploration by Random Network Distillation

    Full text link
    Our work is a simple extension of the paper "Exploration by Random Network Distillation". More in detail, we show how to efficiently combine Intrinsic Rewards with Experience Replay in order to achieve more efficient and robust exploration (with respect to PPO/RND) and consequently better results in terms of agent performances and sample efficiency. We are able to do it by using a new technique named Prioritized Oversampled Experience Replay (POER), that has been built upon the definition of what is the important experience useful to replay. Finally, we evaluate our technique on the famous Atari game Montezuma's Revenge and some other hard exploration Atari games.Comment: 8 pages, 6 figures, accepted as full-paper at IEEE Conference on Games (CoG) 201

    Trajectory-Based Off-Policy Deep Reinforcement Learning

    Full text link
    Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.Comment: Includes appendix. Accepted for ICML 201
    corecore