29 research outputs found

Continual Reinforcement Learning in 3D Non-stationary Environments

Author: Culurciello Eugenio
Desai Karan
Lomonaco Vincenzo
Maltoni Davide
Publication date: 21/04/2020

High-dimensional, constantly changing environments are a hard challenge for current reinforcement learning techniques. Artificial agents are nowadays often trained off-line in static, controlled conditions in simulation, so that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real-world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. Then, we introduce an end-to-end model-free continual reinforcement learning strategy that shows competitive results with respect to four different baselines and does not require access to additional supervised signals, previously encountered environmental conditions or observations.

Comment: Accepted in the CLVision Workshop at CVPR2020: 13 pages, 4 figures, 5 tables

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A Dual Memory Structure for Efficient Use of Replay Memory in Deep Reinforcement Learning

Author: brockman
caterini
dhariwal
lillicrap
lin
mnih
schaul
sutton
wang
Publication date: 15/07/2019

In this paper, we propose a dual memory structure for reinforcement learning algorithms with replay memory. The dual memory consists of a main memory that stores various data and a cache memory that manages the data and trains the reinforcement learning agent efficiently. Experimental results show that the dual memory structure achieves higher training and test scores than the conventional single memory structure in three selected environments of OpenAI Gym. This implies that the dual memory structure enables better and more efficient training than the single memory structure.

Comment: 4 pages, 5 figures

arXiv.org e-Print Archive

Crossref
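
The dual memory idea in the abstract above (a large main memory plus a small cache memory that feeds training) can be pictured with a minimal Python sketch. Everything specific here is an assumption for illustration and not the paper's method: the class name `DualMemory`, the FIFO eviction in the main memory, and the uniform refresh of the cache are all hypothetical choices, since the abstract does not say how the cache selects or manages its data.

```python
import random
from collections import deque


class DualMemory:
    """Illustrative two-tier replay buffer (hypothetical, not the paper's design)."""

    def __init__(self, main_capacity, cache_capacity, seed=None):
        if cache_capacity > main_capacity:
            raise ValueError("cache_capacity must not exceed main_capacity")
        # Main memory: large FIFO store of raw transitions (assumed eviction rule).
        self.main = deque(maxlen=main_capacity)
        # Cache memory: small working set that training batches are drawn from.
        self.cache = []
        self.cache_capacity = cache_capacity
        self.rng = random.Random(seed)

    def store(self, transition):
        # New experience always enters the main memory; the oldest entry is
        # dropped automatically once the deque is full.
        self.main.append(transition)

    def refresh_cache(self):
        # Assumed policy: refill the cache with a uniform sample (without
        # replacement) from the main memory.
        k = min(self.cache_capacity, len(self.main))
        self.cache = self.rng.sample(list(self.main), k)

    def sample_batch(self, batch_size):
        # Training batches come from the cache only.
        if not self.cache:
            raise RuntimeError("cache is empty; call refresh_cache() first")
        k = min(batch_size, len(self.cache))
        return self.rng.sample(self.cache, k)


# Usage: fill the main memory, refresh the cache, then draw a training batch.
memory = DualMemory(main_capacity=1000, cache_capacity=64, seed=0)
for step in range(200):
    memory.store((step, 0, 0.0, step + 1, False))  # (s, a, r, s', done) placeholder
memory.refresh_cache()
batch = memory.sample_batch(32)
```

The split keeps the sampling cost tied to the small cache while the main memory retains a longer history; how often to refresh the cache is left open in this sketch.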

Attention Loss Adjusted Prioritized Experience Replay

Author: Chen Zhuoying
Li Huiping
Wang Rizhong
Publication date: 12/09/2023

Prioritized Experience Replay (PER) is a deep reinforcement learning technique that selects the more informative experience samples to speed up neural network training. However, the non-uniform sampling used in PER inevitably shifts the state-action space distribution and introduces estimation error in the Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved self-attention network with a double-sampling mechanism to fit the hyperparameter that regulates the importance sampling weights, in order to eliminate the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function-based, policy-gradient-based and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the advantage and efficiency of the proposed training framework.

arXiv.org e-Print Archive
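
For context on the importance sampling weights mentioned in the abstract above, the following is a minimal sketch of standard proportional PER sampling, not of ALAP itself. It assumes the usual formulation with sampling probability P(i) = p_i^alpha / sum_k p_k^alpha and weight w_i = (N * P(i))^(-beta). The function name `per_sample` and the default values are illustrative, and reading beta as the hyperparameter that ALAP fits is an assumption, since the abstract does not name it. Here beta is simply a fixed argument.

```python
import random


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sample indices in proportion to priority and return importance weights.

    Priorities are assumed to be strictly positive.
    """
    rng = rng or random.Random()
    n = len(priorities)

    # P(i) = p_i^alpha / sum_k p_k^alpha
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]

    # Non-uniform sampling with replacement: high-priority items appear more often.
    indices = rng.choices(range(n), weights=probs, k=batch_size)

    # w_i = (N * P(i))^(-beta) corrects the bias introduced by non-uniform sampling.
    weights = [(n * probs[i]) ** (-beta) for i in indices]

    # Normalize by the largest weight in the batch so that weights only scale
    # updates downward (a common implementation choice).
    max_w = max(weights)
    weights = [w / max_w for w in weights]
    return indices, weights


# Usage: four stored transitions with increasing priority.
indices, weights = per_sample([1.0, 2.0, 3.0, 4.0], batch_size=8, rng=random.Random(0))
```

With beta = 0 every weight equals 1 and no correction is applied; with beta = 1 the weights fully compensate for the non-uniform sampling probabilities.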