Action selection in modular reinforcement learning
Modular reinforcement learning is an approach to resolving the curse of dimensionality in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize the concepts of module class and module instance in the decomposition step. Under our decomposition framework, we train each module efficiently using the SARSA(lambda) algorithm. We then design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For the last two algorithms, we propose a method to compute module weights efficiently, using the standard deviation of each module's Q-values. We show that the Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.
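Purely as an illustration of the weighting heuristic this abstract describes, here is a minimal tabular sketch in Python. All function names are my own, and the paper's exact formulas may differ: each module's weight is taken as the normalized standard deviation of its Q-values in the current state (a module whose action values are nearly flat is indifferent, so its opinion counts for less), and the weights drive either a combined greedy choice or a weighted vote.

```python
import numpy as np

def module_weights(q_tables, state):
    """Weight each module by the standard deviation of its Q-values
    in the current state; modules with sharply differing action
    values care more about the choice than modules with flat values."""
    stds = np.array([np.std(q[state]) for q in q_tables])
    total = stds.sum()
    return stds / total if total > 0 else np.ones(len(q_tables)) / len(q_tables)

def module_combination(q_tables, state):
    """Module Combination: pick the action maximizing the weighted
    sum of the modules' Q-values."""
    w = module_weights(q_tables, state)
    combined = sum(wi * q[state] for wi, q in zip(w, q_tables))
    return int(np.argmax(combined))

def module_voting(q_tables, state):
    """Module Voting: each module casts a weighted vote for its own
    greedy action; the action with the most vote mass wins."""
    w = module_weights(q_tables, state)
    votes = np.zeros(q_tables[0].shape[1])
    for wi, q in zip(w, q_tables):
        votes[int(np.argmax(q[state]))] += wi
    return int(np.argmax(votes))
```

Here each entry of `q_tables` is one module's table of shape (n_states, n_actions); Module Selection, the third heuristic, would instead defer entirely to the single highest-weighted module.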
Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning
Temporal difference (TD) methods constitute a class of methods for learning
predictions in multi-step prediction problems, parameterized by a recency
factor lambda. Currently the most important application of these methods is to
temporal credit assignment in reinforcement learning. Well known reinforcement
learning algorithms, such as AHC or Q-learning, may be viewed as instances of
TD learning. This paper examines the issues of the efficient and general
implementation of TD(lambda) for arbitrary lambda, for use with reinforcement
learning algorithms optimizing the discounted sum of rewards. The traditional
approach, based on eligibility traces, is argued to suffer from both
inefficiency and lack of generality. The TTD (Truncated Temporal Differences)
procedure is proposed as an alternative that only approximates TD(lambda) but
requires very little computation per action and can be used with arbitrary
function representation methods. The idea from which it is derived is fairly
simple and not new, but has apparently remained unexplored so far.
Encouraging experimental results are presented, suggesting that using lambda
> 0 with the TTD procedure yields a significant learning speedup at
essentially the same cost as conventional TD(0) learning.
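The core idea is to truncate the lambda-return after a fixed horizon of m steps. A minimal prediction-only sketch follows, assuming a tabular value function V and a buffer of the m most recent transitions; this buffer-plus-backward-pass formulation is an illustrative reconstruction, not the paper's exact pseudocode.

```python
def ttd_update(V, buffer, gamma, lam, alpha):
    """One TTD(lambda, m) update, applied once `buffer` holds the m
    most recent transitions (state, reward, next_state), oldest first.
    V maps states to value estimates (e.g., a dict or array)."""
    # Seed the recursion with the current estimate beyond the horizon.
    z = V[buffer[-1][2]]
    # Backward pass over the buffer:
    #   z_k = r_k + gamma * ((1 - lam) * V(s_{k+1}) + lam * z_{k+1})
    for _, reward, next_state in reversed(buffer):
        z = reward + gamma * ((1 - lam) * V[next_state] + lam * z)
    # Move the oldest state's estimate toward its truncated lambda-return.
    s0 = buffer[0][0]
    V[s0] += alpha * (z - V[s0])
```

The per-action cost is a single length-m pass over the buffer, independent of the size of the state space, whereas classical eligibility traces require touching a trace for every state (or every parameter) on each step.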
Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of neoHebbian Three-Factor Learning Rules
Most elementary behaviors such as moving the arm to grasp an object or
walking into the next room to explore a museum evolve on the time scale of
seconds; in contrast, neuronal action potentials occur on the time scale of a
few milliseconds. Learning rules of the brain must therefore bridge the gap
between these two different time scales.
Modern theories of synaptic plasticity have postulated that the co-activation
of pre- and postsynaptic neurons sets a flag at the synapse, called an
eligibility trace, that leads to a weight change only if an additional factor
is present while the flag is set. This third factor, signaling reward,
punishment, surprise, or novelty, could be implemented by the phasic activity
of neuromodulators or specific neuronal inputs signaling special events. While
the theoretical framework has been developed over the last decades,
experimental evidence in support of eligibility traces on the time scale of
seconds has been collected only during the last few years.
Here we review, in the context of three-factor rules of synaptic plasticity,
four key experiments that support the role of synaptic eligibility traces in
combination with a third factor as a biological implementation of neoHebbian
three-factor learning rules.
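A schematic single-synapse simulation of the rule described above might look like the following sketch, in which Hebbian co-activation sets a decaying eligibility trace e and the weight w changes only while a third factor arrives before e has decayed. Variable names, time constants, and the exact update form are illustrative assumptions, not a specific published model.

```python
def simulate_three_factor(pre, post, modulator, tau_e=2.0, eta=0.1, dt=0.1):
    """Simulate one synapse under a schematic three-factor rule.
    pre, post: per-step binary spike indicators; modulator: third
    factor (e.g., phasic reward, punishment, surprise, or novelty)."""
    w, e = 0.5, 0.0
    for p, q, m in zip(pre, post, modulator):
        # Co-activation of pre- and postsynaptic neurons sets or
        # refreshes the eligibility flag; otherwise it decays with
        # time constant tau_e (on the order of seconds).
        e += dt * (-e / tau_e) + (1.0 if p and q else 0.0)
        # The weight changes only when the third factor is present
        # while the flag is still set.
        w += eta * m * e
    return w
```

The seconds-scale decay of e is exactly what lets a millisecond-scale spike coincidence be credited by a reward that arrives much later, bridging the two time scales the review emphasizes.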
Retrospective model-based inference guides model-free credit assignment
An extensive reinforcement learning literature shows that organisms assign credit efficiently, even under conditions of state uncertainty. However, little is known about credit assignment when state uncertainty is subsequently resolved. Here, we address this problem within the framework of an interaction between model-free (MF) and model-based (MB) control systems. We present, and support experimentally, a theory of MB retrospective inference. Within this framework, a MB system resolves uncertainty that prevailed when actions were taken, thereby guiding MF credit assignment. Using a task with initial uncertainty about which lotteries had been chosen, we found that when participants’ momentary uncertainty about which lottery had generated an outcome was resolved by subsequent information, they preferentially assigned credit within a MF system to the lottery they retrospectively inferred was responsible for the outcome. These findings extend our knowledge of the range of MB functions and the scope of system interactions.
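The paper reports behavioral evidence rather than an algorithm, but the mechanism it describes can be caricatured in a few lines. In this toy sketch (the function name and weighting scheme are my assumptions), a standard MF delta-rule update is apportioned across lotteries according to a retrospectively inferred MB posterior over which lottery generated the outcome.

```python
def retrospective_mf_update(q, outcome, posterior, alpha=0.2):
    """Assign MF credit in proportion to the MB-inferred probability
    that each lottery generated the observed outcome.
    q: per-lottery MF value estimates (mutable sequence of floats).
    posterior: per-lottery responsibility, resolved retrospectively."""
    for i, p in enumerate(posterior):
        q[i] += alpha * p * (outcome - q[i])
    return q
```

When later information fully resolves the uncertainty, the posterior collapses to one lottery and this reduces to ordinary MF credit assignment to the inferred cause.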