208 research outputs found

Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays

Author: A Peyrache
AK Lee
AS Gupta
AW Moore
DJ Foster
G Girardeau
G Lavilléon De
H Eichenbaum
J O’Keefe
J Peng
JL McClelland
LH Lin
M Khamassi
MA Wilson
R Sutton
RA Jacobs
Richard S. Sutton
V Paz-Villagrán
Z Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/08/2018

During sleep and awake rest, the hippocampus replays sequences of place cells that were activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna reinforcement learning algorithms use off-line replays to improve learning. Under a limited replay budget, a prioritized sweeping approach, which requires a model of the transitions to the predecessors, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple-expert algorithm able to cope with multiple predecessors. The resulting architecture improves the learning of simulated agents faced with a navigation task. We predict that, in animals, learning the world model should occur during rest periods, and that the corresponding replays should be shuffled.
Comment: Living Machines 2018 (Paris, France)
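The prioritized sweeping idea mentioned in this abstract can be illustrated with a minimal tabular sketch. This is not the neural network architecture of the paper: it assumes a deterministic world, a lookup-table model, and hypothetical names and constants (`learn`, `replay`, `GAMMA`, `ALPHA`, `THETA`, `ACTIONS`) chosen only for illustration.

```python
import heapq
from collections import defaultdict

GAMMA = 0.9    # discount factor (assumed value)
ALPHA = 1.0    # learning rate; 1.0 is acceptable in a deterministic toy world
THETA = 1e-6   # smallest priority worth queueing (assumed value)
ACTIONS = (0, 1)


def td_error(Q, s, a, r, s2):
    """Temporal-difference error of the transition (s, a) -> (r, s2)."""
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    return r + GAMMA * best_next - Q[(s, a)]


def replay(Q, model, predecessors, queue, budget):
    """Spend at most `budget` off-line updates, highest priority first."""
    used = 0
    while queue and used < budget:
        _, (s, a) = heapq.heappop(queue)
        r, s2 = model[(s, a)]
        Q[(s, a)] += ALPHA * td_error(Q, s, a, r, s2)
        used += 1
        # The value of s changed, so every known predecessor of s may now
        # have a non-zero error: queue each one with its new priority.
        for ps, pa in predecessors[s]:
            pr, _ = model[(ps, pa)]
            priority = abs(td_error(Q, ps, pa, pr, s))
            if priority > THETA:
                heapq.heappush(queue, (-priority, (ps, pa)))
    return used


def learn(experiences, budget=20):
    """Interleave real experiences with budgeted prioritized replays."""
    Q = defaultdict(float)
    model = {}                       # (s, a) -> (r, s2), deterministic
    predecessors = defaultdict(set)  # s2 -> {(s, a), ...}
    queue = []                       # min-heap on negated priority
    for s, a, r, s2 in experiences:
        model[(s, a)] = (r, s2)
        predecessors[s2].add((s, a))
        priority = abs(td_error(Q, s, a, r, s2))
        if priority > THETA:
            heapq.heappush(queue, (-priority, (s, a)))
        replay(Q, model, predecessors, queue, budget)
    return Q


# Toy corridor 0 -> 1 -> 2 -> 3, with a reward of 1 on reaching state 3.
Q = learn([(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 1, 1.0, 3)])
```

In this toy corridor, a single rewarded transition is enough for the replay loop to propagate value backwards through the stored predecessors, which is the property that makes the approach attractive when the replay budget is small. The queue may hold duplicate entries for the same state-action pair; a fuller implementation would deduplicate them.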

arXiv.org e-Print Archive

Crossref

Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

Author: Gao Jianfeng
Li Xiujun
Liu Jingjing
Wu Yuexin
Yang Yiming
Publication date: 19/11/2018

Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model or, implicitly, on the pre-specified ratio of real to simulated experiences used for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning for improving sample efficiency, by encouraging the world model to generate simulated experiences in regions of the state-action space that the agent has not (fully) explored. Our results show that, by combining the switcher and active learning, the new framework, named Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvement over DDQ and Q-learning baselines in both simulation and human evaluations.
Comment: 8 pages, 9 figures, AAAI 201
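The role of a switcher in Dyna-Q can be sketched in tabular form. This is a simplified illustration, not the Switch-DDQ method. It assumes a lookup-table world model and a hypothetical rule (`use_simulated`) that allows simulated updates only while the model's recent prediction error stays below a threshold. All names and constants here are assumptions, and real experiences are always used.

```python
import random
from collections import defaultdict

GAMMA, ALPHA = 0.9, 0.5   # assumed discount factor and learning rate
ACTIONS = (0, 1)


def q_update(Q, s, a, r, s2):
    """One Q-learning update from a (real or simulated) experience."""
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


class WorldModel:
    """Lookup-table model that tracks how often its predictions were wrong."""

    def __init__(self):
        self.table = {}    # (s, a) -> (r, s2)
        self.errors = []   # 1.0 for each wrong prediction, 0.0 otherwise

    def observe(self, s, a, r, s2):
        if (s, a) in self.table:
            wrong = self.table[(s, a)] != (r, s2)
            self.errors.append(1.0 if wrong else 0.0)
        self.table[(s, a)] = (r, s2)

    def recent_error(self, window=10):
        recent = self.errors[-window:]
        # With no evidence yet, treat the model as untrustworthy.
        return sum(recent) / len(recent) if recent else 1.0

    def sample(self, rng):
        s, a = rng.choice(sorted(self.table))
        r, s2 = self.table[(s, a)]
        return s, a, r, s2


def use_simulated(model, max_error=0.2):
    """Hypothetical switcher: trust the model only if it has been accurate."""
    return bool(model.table) and model.recent_error() <= max_error


def train_step(Q, model, real_exp, rng, planning_steps=5):
    """Learn from one real experience, then plan only if the switcher allows."""
    model.observe(*real_exp)
    q_update(Q, *real_exp)
    n_simulated = 0
    if use_simulated(model):
        for _ in range(planning_steps):
            q_update(Q, *model.sample(rng))
            n_simulated += 1
    return n_simulated


rng = random.Random(0)
Q = defaultdict(float)
model = WorldModel()
first = train_step(Q, model, (0, 1, 1.0, 1), rng)    # model still unverified
second = train_step(Q, model, (0, 1, 1.0, 1), rng)   # prediction confirmed
```

The design choice this illustrates is that the ratio of real to simulated experience is decided from evidence about model quality rather than fixed in advance.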

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications