
    Attention-based experience replay in deep Q-learning

    Using neural networks as function approximators in temporal-difference reinforcement learning has proved very effective in dealing with high-dimensional input state spaces, especially in more recent developments such as Deep Q-learning. These approaches share a mechanism, called experience replay, that stores previous experiences in a memory buffer and samples them uniformly for re-learning, thus improving the efficiency of the learning process. To further increase learning performance, techniques such as prioritized experience and prioritized sampling have been introduced to preferentially store and replay, respectively, the transitions with larger TD error. In this paper, we present a concept called Attention-Based Experience REplay (ABERE), concerned with selectively focusing the replay buffer on specific types of experiences, thereby modeling the behavioral characteristics of the learning agent in single- and multi-agent environments. We further explore how different behavioral characteristics influence the performance of agents faced with a dynamic environment that can become more hostile or benevolent by changing the relative probability of receiving positive or negative reinforcement.
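For concreteness, the sketch below contrasts a standard uniform replay buffer with a hypothetical "selective focus" variant that biases sampling toward one type of experience (here, positive-reward transitions). It illustrates the general idea of focusing replay on specific experience types; the class names, the predicate, and the focus_prob parameter are illustrative assumptions, not the ABERE mechanism from the paper.

```python
import random
from collections import deque, namedtuple

# Assumed transition layout for illustration.
Transition = namedtuple("Transition", "state action reward next_state done")

class ReplayBuffer:
    """Standard experience replay: store transitions, sample uniformly."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling: every stored transition is equally likely.
        return random.sample(self.buffer, batch_size)

class SelectiveReplayBuffer(ReplayBuffer):
    """Hypothetical 'selective focus' variant: with probability focus_prob,
    sample only from transitions matching a predicate (e.g. positive reward).
    This biases replay toward a chosen experience type; it is not the ABERE
    algorithm itself."""
    def __init__(self, capacity=100_000, focus_prob=0.5,
                 predicate=lambda t: t.reward > 0):
        super().__init__(capacity)
        self.focus_prob = focus_prob
        self.predicate = predicate

    def sample(self, batch_size):
        focused = [t for t in self.buffer if self.predicate(t)]
        if focused and random.random() < self.focus_prob:
            # Sample (with replacement) only from the focused subset.
            return random.choices(focused, k=batch_size)
        return random.sample(self.buffer, batch_size)
```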

    Attention Loss Adjusted Prioritized Experience Replay

    Prioritized Experience Replay (PER) is a technique in deep reinforcement learning that selects experience samples carrying more information in order to improve the training rate of the neural network. However, the non-uniform sampling used in PER inevitably shifts the state-action distribution and introduces estimation error into the Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved self-attention network with a double-sampling mechanism to fit the hyperparameter that regulates the importance-sampling weights, eliminating the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function-based, policy-gradient-based, and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies confirm the advantage and efficiency of the proposed training framework.
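As background for the estimation error that ALAP targets, here is a minimal sketch of proportional prioritized replay with the standard importance-sampling correction (in the style of Schaul et al.'s PER). The alpha/beta values and buffer layout are assumptions; ALAP's self-attention network and double-sampling mechanism are not reproduced here.

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional PER sketch: non-uniform sampling by |TD error|,
    corrected by importance-sampling (IS) weights."""
    def __init__(self, capacity=100_000, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # IS weights compensate for the skewed sampling distribution;
        # normalizing by the max keeps updates numerically stable.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```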

    Overcoming Exploration in Reinforcement Learning with Demonstrations

    Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward becomes exponentially more difficult as task horizon or action dimensionality grows, putting many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control, such as stacking blocks with a robot arm. Our method, which builds on Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
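A rough sketch of how demonstrations can be folded into an actor-critic update, assuming a DDPG-style actor and critic and a separate buffer of demonstration transitions: the usual RL objective is combined with a behavior-cloning term on the demonstration actions. The function name, batch layout, and fixed bc_weight are assumptions for illustration; the paper's exact loss (including any filtering of the cloning term) is not reproduced.

```python
import torch

def actor_loss_with_bc(actor, critic, batch, demo_batch, bc_weight=1.0):
    """Combine a DDPG-style actor objective with a behavior-cloning (BC)
    term computed on demonstration transitions. `actor` and `critic` are
    assumed to be torch modules; batches are dicts of tensors."""
    states = batch["states"]
    demo_states, demo_actions = demo_batch["states"], demo_batch["actions"]

    # Standard DDPG actor objective: maximize Q(s, pi(s)).
    rl_loss = -critic(states, actor(states)).mean()

    # Behavior cloning: pull the policy toward the demonstrator's actions.
    bc_loss = ((actor(demo_states) - demo_actions) ** 2).mean()

    return rl_loss + bc_weight * bc_loss
```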