728,455 research outputs found
Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays
During sleep and awake rest, the hippocampus replays sequences of place cells
that have been activated during prior experiences. These have been interpreted
as a memory consolidation process, but recent results suggest a possible
interpretation in terms of reinforcement learning. The Dyna reinforcement
learning algorithms use off-line replays to improve learning. Under limited
replay budget, a prioritized sweeping approach, which requires a model of the
transitions to the predecessors, can be used to improve performance. We
investigate whether such algorithms can explain the experimentally observed
replays. We propose a neural network version of prioritized sweeping
Q-learning, for which we developed a growing multiple expert algorithm, able to
cope with multiple predecessors. The resulting architecture is able to improve
the learning of simulated agents confronted to a navigation task. We predict
that, in animals, learning the world model should occur during rest periods,
and that the corresponding replays should be shuffled.Comment: Living Machines 2018 (Paris, France
Regularising Non-linear Models Using Feature Side-information
Very often features come with their own vectorial descriptions which provide
detailed information about their properties. We refer to these vectorial
descriptions as feature side-information. In the standard learning scenario,
input is represented as a vector of features and the feature side-information
is most often ignored or used only for feature selection prior to model
fitting. We believe that feature side-information which carries information
about features intrinsic property will help improve model prediction if used in
a proper way during learning process. In this paper, we propose a framework
that allows for the incorporation of the feature side-information during the
learning of very general model families to improve the prediction performance.
We control the structures of the learned models so that they reflect features
similarities as these are defined on the basis of the side-information. We
perform experiments on a number of benchmark datasets which show significant
predictive performance gains, over a number of baselines, as a result of the
exploitation of the side-information.Comment: 11 page with appendi
- …