Continual Reinforcement Learning in 3D Non-stationary Environments
High-dimensional, always-changing environments constitute a hard challenge for
current reinforcement learning techniques. Artificial agents are nowadays
often trained offline in very static and controlled conditions in simulation,
such that training observations can be thought of as sampled i.i.d. from the
entire observation space. However, in real-world settings, the environment is
often non-stationary and subject to unpredictable, frequent changes. In this
paper we propose and openly release CRLMaze, a new benchmark for learning
continually through reinforcement in a complex 3D non-stationary task based on
ViZDoom and subject to several environmental changes. We then introduce an
end-to-end, model-free continual reinforcement learning strategy that shows
competitive results against four different baselines without requiring access
to additional supervised signals or to previously encountered environmental
conditions or observations.
Comment: Accepted at the CLVision Workshop at CVPR 2020; 13 pages, 4 figures, 5 tables
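To make the non-stationarity concrete, below is a minimal Python sketch of a Gym-style wrapper that applies unpredictable environmental changes at episode boundaries. The `NonStationaryWrapper` class and its `mutations` hook are illustrative assumptions for exposition only, not CRLMaze's actual API.

```python
import random
import gym  # illustrative dependency; CRLMaze itself builds on ViZDoom


class NonStationaryWrapper(gym.Wrapper):
    """Illustrative wrapper: perturbs the environment at random episode
    boundaries, so training observations are no longer i.i.d."""

    def __init__(self, env, change_prob=0.1, mutations=None):
        super().__init__(env)
        self.change_prob = change_prob    # chance of a change per episode
        self.mutations = mutations or []  # callables that alter the env

    def reset(self, **kwargs):
        # With some probability, apply a random environmental change
        # (e.g. new textures, lighting, or layout) before the episode.
        if self.mutations and random.random() < self.change_prob:
            random.choice(self.mutations)(self.env)
        return self.env.reset(**kwargs)
```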
A Dual Memory Structure for Efficient Use of Replay Memory in Deep Reinforcement Learning
In this paper, we propose a dual memory structure for reinforcement learning
algorithms with replay memory. The dual memory consists of a main memory that
stores various data and a cache memory that manages the data and trains the
reinforcement learning agent efficiently. Experimental results show that the
dual memory structure achieves higher training and test scores than the
conventional single memory structure in three selected environments of OpenAI
Gym. This implies that the dual memory structure enables better and more
efficient training than the single memory structure.
Comment: 4 pages, 5 figures
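As a rough illustration of the idea, the sketch below pairs a large main buffer with a small cache that is periodically refilled from it and used for training batches. The class and method names, and the random refill rule, are assumptions for exposition, not the paper's implementation.

```python
import random
from collections import deque


class DualReplayMemory:
    """Illustrative dual memory: a large main memory holds all collected
    transitions, while a small cache holds the working set for training."""

    def __init__(self, main_capacity=100_000, cache_capacity=2_000):
        self.main = deque(maxlen=main_capacity)  # long-term storage
        self.cache = []                          # working set for training
        self.cache_capacity = cache_capacity

    def store(self, transition):
        # New experience always goes to the main memory first.
        self.main.append(transition)

    def refresh_cache(self):
        # Periodically repopulate the cache from the main memory; the
        # selection rule is the part a real design would tune carefully.
        k = min(self.cache_capacity, len(self.main))
        self.cache = random.sample(self.main, k)

    def sample(self, batch_size):
        # Training batches are drawn from the smaller, cheaper cache.
        return random.sample(self.cache, min(batch_size, len(self.cache)))
```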
Attention Loss Adjusted Prioritized Experience Replay
Prioritized Experience Replay (PER) is a technique in deep reinforcement
learning that selects experience samples carrying more information in order
to speed up the training of the neural network. However, the non-uniform
sampling used in PER inevitably shifts the state-action distribution and
introduces estimation error into the Q-value function. In this paper, an
Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is
proposed, which integrates an improved self-attention network with a
double-sampling mechanism to fit the hyperparameter that regulates the
importance-sampling weights, thereby eliminating the estimation error caused
by PER. To verify the effectiveness and generality of the algorithm, ALAP is
tested with value-function-based, policy-gradient-based, and multi-agent
reinforcement learning algorithms in OpenAI Gym, and comparison studies
confirm the advantage and efficiency of the proposed training framework.
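For context, standard PER (Schaul et al.) corrects its non-uniform sampling with importance-sampling weights w_i = (N * P(i))^(-beta). The sketch below shows that standard correction; the fixed beta is annotated as the hyperparameter that ALAP would instead regulate adaptively, and the ALAP-specific attention and double-sampling components are not reproduced here.

```python
import numpy as np


def per_is_weights(priorities, batch_idx, beta=0.4, alpha=0.6):
    """Standard PER importance-sampling correction.

    priorities: TD-error-based priorities for the whole buffer
    batch_idx:  indices of the sampled transitions
    beta:       fixed IS exponent -- the hyperparameter that ALAP
                would fit adaptively instead of keeping constant
    """
    probs = priorities ** alpha
    probs = probs / probs.sum()                  # P(i): sampling distribution
    n = len(priorities)
    weights = (n * probs[batch_idx]) ** (-beta)  # w_i = (N * P(i))^(-beta)
    return weights / weights.max()               # normalize for stability


# Example: sample a batch proportional to priority, then weight its loss.
priorities = np.abs(np.random.randn(1000)) + 1e-6
probs = priorities ** 0.6
probs /= probs.sum()
batch_idx = np.random.choice(len(priorities), size=32, p=probs)
w = per_is_weights(priorities, batch_idx)  # multiply per-sample TD losses by w
```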