Attention-based experience replay in deep Q-learning
Using neural networks as function approximators in temporal-difference reinforcement learning has proved very effective for dealing with high-dimensional input state spaces, especially in more recent developments such as Deep Q-learning. These approaches share a mechanism, called experience replay, that stores previous experiences in a memory buffer and uniformly samples from it for re-learning, thus improving the efficiency of the learning process. To increase learning performance, techniques such as prioritized experience and prioritized sampling have been introduced to preferentially store and replay, respectively, the transitions with larger TD error. In this paper, we present a concept, called Attention-Based Experience REplay (ABERE), concerned with selectively focusing the replay buffer on specific types of experiences, thereby modeling the behavioral characteristics of the learning agent in single- and multi-agent environments. We further explore how different behavioral characteristics influence the performance of agents faced with a dynamic environment that can become more hostile or benevolent by changing the relative probability of receiving positive or negative reinforcement.
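The abstract does not spell out the attention mechanism itself. As a minimal sketch, assume "attention" amounts to per-type relative sampling weights over the buffer, here keyed to the sign of the reward; `AttentionReplayBuffer` and its parameters are hypothetical names for illustration, not the paper's implementation.

```python
import random
from collections import deque

class AttentionReplayBuffer:
    """Replay buffer that biases sampling toward chosen experience types.

    `attention` maps an experience "type" (here: reward sign) to a relative
    sampling weight; uniform replay is the special case of equal weights.
    (Illustrative sketch only; the paper's actual mechanism may differ.)
    """

    def __init__(self, capacity=100_000, attention=None):
        self.buffer = deque(maxlen=capacity)
        # e.g. {"positive": 2.0, "negative": 1.0} focuses on rewarding transitions
        self.attention = attention or {"positive": 1.0, "negative": 1.0}

    def _type(self, transition):
        _, _, reward, _, _ = transition
        return "positive" if reward > 0 else "negative"

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # weight each stored transition by the attention assigned to its type
        weights = [self.attention[self._type(t)] for t in self.buffer]
        return random.choices(self.buffer, weights=weights, k=batch_size)
```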
Attention Loss Adjusted Prioritized Experience Replay
Prioritized Experience Replay (PER) is a technique in deep reinforcement learning that selects the most informative experience samples in order to improve the training rate of the neural network. However, the non-uniform sampling used in PER inevitably shifts the state-action distribution and introduces estimation error in the Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved self-attention network with a double-sampling mechanism to fit the hyperparameter that regulates the importance-sampling weights, eliminating the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function based, policy-gradient based, and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the advantage and efficiency of the proposed training framework.
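The ALAP specifics (the improved self-attention network and the double-sampling mechanism) are in the paper itself. The sketch below shows only the standard proportional-PER machinery whose importance-sampling weights, w_i = (N * P(i))^(-beta), ALAP's fitted hyperparameter adjusts; class and parameter names are illustrative.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional PER: P(i) is proportional to |TD error|^alpha, with
    importance-sampling weights w_i = (N * P(i))^(-beta) correcting the
    distribution shift that non-uniform sampling introduces."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities, self.pos = [], np.zeros(capacity), 0

    def add(self, transition):
        # new transitions get the current max priority so they are replayed soon
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # importance-sampling weights, normalized so the largest weight is 1
        w = (len(self.data) * probs[idx]) ** (-self.beta)
        w /= w.max()
        return [self.data[i] for i in idx], idx, w

    def update_priorities(self, idx, td_errors):
        # refresh priorities after a learning step using the new TD errors
        self.priorities[idx] = np.abs(td_errors) + self.eps
```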
Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward becomes exponentially more difficult as task horizon or action dimensionality increases. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control, such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
Comment: 8 pages, ICRA 2018
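The abstract does not detail the update rule; published descriptions of this line of work combine the DDPG actor objective with a behavior-cloning term on demonstration transitions, gated by a Q-filter. Below is a minimal PyTorch-style sketch under that assumption; `actor`, `critic`, `bc_weight`, and the tensor shapes are illustrative, not the authors' code.

```python
import torch

def actor_loss(actor, critic, states, demo_states, demo_actions, bc_weight=1.0):
    """DDPG actor loss plus a behavior-cloning term on demonstrations.

    A Q-filter applies the cloning loss only where the critic judges the
    demonstrator's action better than the current policy's, so a suboptimal
    demonstrator can still be outperformed. (Sketch under assumed shapes:
    critic(s, a) returns (batch, 1); actions are (batch, act_dim).)
    """
    # standard DDPG policy-gradient term: maximize Q(s, pi(s))
    ddpg_loss = -critic(states, actor(states)).mean()

    pi_actions = actor(demo_states)
    q_demo = critic(demo_states, demo_actions).squeeze(-1)
    q_pi = critic(demo_states, pi_actions).squeeze(-1)
    # Q-filter: clone only where Q(s, a_demo) > Q(s, pi(s))
    mask = (q_demo > q_pi).float().detach()
    bc_loss = (mask * (pi_actions - demo_actions).pow(2).sum(dim=1)).mean()

    return ddpg_loss + bc_weight * bc_loss
```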