Autonomous Reinforcement of Behavioral Sequences in Neural Dynamics
We introduce a dynamic neural algorithm called Dynamic Neural (DN) SARSA(λ) for learning a behavioral sequence from delayed reward. DN-SARSA(λ) combines Dynamic Field Theory models of behavioral sequence representation, classical reinforcement learning, and a computational neuroscience model of working memory, called Item and Order working memory, which serves as an eligibility trace. DN-SARSA(λ) is implemented on both a simulated and a real robot that must learn a specific rewarding sequence of elementary behaviors through exploration. Results show that DN-SARSA(λ) performs at the level of discrete SARSA(λ), validating the feasibility of general reinforcement learning without compromising neural dynamics.

Comment: Sohrob Kazerounian and Matthew Luciw are joint first authors.
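As context for the comparison above, the discrete SARSA(λ) baseline can be sketched in a few lines of tabular Python. The environment interface (env.reset, env.step) and the hyperparameters are illustrative assumptions, not details from the paper:

```python
# Minimal tabular SARSA(lambda) with accumulating eligibility traces:
# the discrete baseline DN-SARSA(lambda) is compared against.
# env.reset()/env.step() and all hyperparameters are assumptions.
import numpy as np

def epsilon_greedy(Q, s, epsilon):
    if np.random.rand() < epsilon:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[s]))

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)                  # eligibility traces
        s = env.reset()
        a = epsilon_greedy(Q, s, epsilon)
        done = False
        while not done:
            s2, r, done = env.step(a)         # assumed interface
            a2 = epsilon_greedy(Q, s2, epsilon)
            delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
            E[s, a] += 1.0                    # mark the visited pair
            Q += alpha * delta * E            # credit all eligible pairs
            E *= gamma * lam                  # decay traces
            s, a = s2, a2
    return Q
```

The eligibility trace E plays the role the paper assigns to Item and Order working memory: it keeps recently visited state-action pairs eligible for credit when a delayed reward finally arrives.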
Beyond dichotomies in reinforcement learning
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis in a single framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, onto putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.
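The dichotomy at issue is concrete when written as update rules. Below is a hedged, minimal contrast between a model-free cached-value update and a model-based learn-then-plan scheme; all names, shapes, and parameters are illustrative assumptions rather than any specific algorithm from the literature discussed above:

```python
# Illustrative contrast between the two "systems", in update-rule form.
import numpy as np

# Model-free: cache action values directly from sampled rewards.
def mf_update(Q, s, a, r, s2, alpha=0.1, gamma=0.95):
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

# Model-based: accumulate transition/reward statistics, then plan.
def mb_update(T_counts, R_sum, N, s, a, r, s2):
    T_counts[s, a, s2] += 1
    R_sum[s, a] += r
    N[s, a] += 1

def mb_values(T_counts, R_sum, N, gamma=0.95, sweeps=50):
    T = T_counts / np.maximum(T_counts.sum(-1, keepdims=True), 1)
    R = R_sum / np.maximum(N, 1)
    Q = np.zeros_like(R)
    for _ in range(sweeps):            # value iteration over the model
        Q = R + gamma * T @ Q.max(-1)
    return Q
```

The model-free learner is cheap but slow to adjust when the world changes; the model-based learner re-plans instantly from its model at the cost of the planning sweep, which is one way of stating the trade-off the dichotomy was meant to capture.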
Reward feedback stimuli elicit high-beta EEG oscillations in human dorsolateral prefrontal cortex
Reward-related feedback stimuli have been observed to elicit a burst of power in the beta frequency range over frontal areas of the human scalp. Recent discussions have suggested possible neural sources for this activity, but there is a paucity of empirical evidence on the question. Here we recorded EEG from participants while they navigated a virtual T-maze to find monetary rewards. Consistent with previous studies, we found that the reward feedback stimuli elicited an increase in beta power (20-30 Hz) over a right-frontal area of the scalp. Source analysis indicated that this signal was produced in the right dorsolateral prefrontal cortex (DLPFC). These findings align with previous observations of reward-related beta oscillations in the DLPFC in non-human primates. We speculate that increased power in the beta frequency range following reward receipt reflects the activation of task-related neural assemblies that encode the stimulus-response mapping in working memory.
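For readers unfamiliar with the measure, the reported effect is an increase in spectral power within the 20-30 Hz band in feedback-locked epochs. A minimal sketch of how such band power can be estimated with Welch's method follows; the sampling rate, epoch format, and parameters are assumptions, not the authors' analysis pipeline:

```python
# Hedged sketch: average 20-30 Hz power in one channel of a
# feedback-locked EEG epoch, via Welch's method (scipy).
import numpy as np
from scipy.signal import welch

def beta_power(epoch, fs=500.0, band=(20.0, 30.0)):
    """epoch: 1-D array, one channel of a single feedback-locked trial."""
    freqs, psd = welch(epoch, fs=fs, nperseg=min(len(epoch), 256))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()   # mean power density in the beta band
```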
The influence of dopamine on prediction, action and learning
In this thesis I explore functions of the neuromodulator dopamine in the context of autonomous learning and behaviour. I first investigate dopaminergic influence within a simulated agent-based model, demonstrating how modulation of synaptic plasticity can enable reward-mediated learning that is both adaptive and self-limiting. I describe how this mechanism is driven by the dynamics of agent-environment interaction and consequently suggest roles for both complex spontaneous neuronal activity and specific neuroanatomy in the expression of early, exploratory behaviour. I then show how the observed response of dopamine neurons in the mammalian basal ganglia may also be modelled by similar processes involving dopaminergic neuromodulation and cortical spike-pattern representation within an architecture of counteracting excitatory and inhibitory neural pathways, reflecting gross mammalian neuroanatomy. Significantly, I demonstrate how combined modulation of synaptic plasticity and neuronal excitability enables specific (timely) spike-patterns to be recognised and selectively responded to by efferent neural populations, therefore providing a novel spike-timing based implementation of the hypothetical 'serial-compound' representation suggested by temporal difference learning. I subsequently discuss more recent work, focused upon modelling those complex spike-patterns observed in cortex. Here, I describe neural features likely to contribute to the expression of such activity and subsequently present novel simulation software allowing for interactive exploration of these factors, in a more comprehensive neural model that implements both dynamical synapses and dopaminergic neuromodulation. I conclude by describing how the work presented ultimately suggests an integrated theory of autonomous learning, in which direct coupling of agent and environment supports a predictive coding mechanism, bootstrapped in early development by a more fundamental process of trial-and-error learning.
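The 'serial-compound' representation referred to above has a standard form in temporal difference models: the cue is unrolled into a chain of time-tagged features, and the TD error over that chain reproduces the phasic, dopamine-like prediction-error signal. A minimal sketch, with trial structure and parameters chosen purely for illustration:

```python
# TD learning over a 'serial-compound' stimulus: one feature per time
# step since cue onset; the TD error stands in for the dopamine signal.
# Trial timing and hyperparameters are illustrative assumptions.
import numpy as np

T = 20                      # time steps per trial
cue_t, reward_t = 2, 15     # cue onset and reward delivery
gamma, alpha = 0.98, 0.1
w = np.zeros(T)             # one weight per serial-compound feature

def features(t):
    x = np.zeros(T)
    if t >= cue_t:
        x[t - cue_t] = 1.0  # time-tagged cue feature
    return x

for trial in range(200):
    for t in range(T - 1):
        r = 1.0 if t == reward_t else 0.0
        v, v_next = w @ features(t), w @ features(t + 1)
        delta = r + gamma * v_next - v   # TD error ~ phasic dopamine
        w += alpha * delta * features(t)
```

After training, delta migrates from reward delivery back to cue onset, the signature the thesis reimplements with spike timing rather than abstract features.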
Continuous-time on-policy neural reinforcement learning of working memory tasks
As living organisms, one of our primary characteristics is the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current cognitive robotic systems are not able to respond rapidly and efficiently in the real world: the challenge is to learn to recognize both what is important and when to act. Reinforcement Learning (RL) is typically used to solve complex tasks: to learn the how. To respond quickly, that is, to learn the when, the environment has to be sampled often enough. To decide what counts as “enough”, a programmer has to fix a step size as the representation of time, choosing between a fine-grained representation (many state transitions; difficult to learn with RL) and a coarse temporal resolution (easier to learn with RL but lacking precise timing). Here, we derive a continuous-time version of on-policy SARSA learning in a working-memory neural network model, AuGMEnT. Using a neural working memory network resolves the what problem; our when solution is built on the notion that, in the real world, instantaneous actions of duration dt are actually impossible. We demonstrate how action duration can be decoupled from the internal time steps of the neural RL model using an action selection system. The resulting CT-AuGMEnT successfully learns to react to the events of a continuous-time task, without any pre-imposed specifications about the duration of the events or the delays between them.
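The core continuous-time move can be illustrated compactly: replace the fixed per-step discount γ with γ^dt, so the SARSA temporal-difference error becomes approximately invariant to the integration step. The sketch below shows only this time-handling idea and is not the paper's CT-AuGMEnT network; names and parameters are assumptions:

```python
# Hedged sketch of a continuous-time SARSA TD error: discounting scales
# with the transition duration dt, so halving the simulation step does
# not change what is being learned. Not the paper's full model.
def ct_sarsa_delta(q_sa, q_s2a2, reward_rate, dt, gamma=0.9):
    """TD error for a transition lasting dt time units.

    reward_rate: reward accrued per unit time over the interval."""
    discount = gamma ** dt            # continuous-time discount factor
    return reward_rate * dt + discount * q_s2a2 - q_sa
```

With this form, choosing a finer dt only changes how often the update is applied, not the target it converges to, which is one way to escape the step-size dilemma described above.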
An introduction to reinforcement learning for neuroscience
Reinforcement learning has a rich history in neuroscience, from early work on dopamine as a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to recent work suggesting that dopamine could implement a form of 'distributional reinforcement learning' popularized in deep learning (Dabney et al., 2020). Throughout this literature, there has been a tight link between theoretical advances in reinforcement learning and neuroscientific experiments and findings. As a result, the theories describing our experimental data have become increasingly complex and difficult to navigate. In this review, we cover the basic theory underlying classical work in reinforcement learning and build up to an introductory overview of methods used in modern deep reinforcement learning that have found applications in systems neuroscience. We start with an overview of the reinforcement learning problem and classical temporal difference algorithms, followed by a discussion of 'model-free' and 'model-based' reinforcement learning, together with methods such as DYNA and successor representations that fall in between these two categories. Throughout these sections, we highlight the close parallels between the machine learning methods and related work in both experimental and theoretical neuroscience. We then provide an introduction to deep reinforcement learning with examples of how these methods have been used to model different learning phenomena in the systems neuroscience literature, such as meta-reinforcement learning (Wang et al., 2018) and distributional reinforcement learning (Dabney et al., 2020). Code that implements the methods discussed in this work and generates the figures is also provided.

Comment: Code available at: https://colab.research.google.com/drive/1kWOz2Uxn0cf2c4YizqIXQKWyxeYd6wvL?usp=sharin
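Of the "in between" methods mentioned, the successor representation is compact enough to sketch directly: learn a matrix M of expected discounted future state occupancies by TD, then read out values as V = M r. A minimal tabular version, with the interface and parameters as assumptions (the review's own code is at the link above):

```python
# Minimal tabular successor representation (SR): TD-learn expected
# discounted future state occupancies M, then V = M @ r.
# Interface and hyperparameters are illustrative assumptions.
import numpy as np

def sr_td_update(M, s, s2, alpha=0.1, gamma=0.95):
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    # TD update on the occupancy predictions for state s
    M[s] += alpha * (onehot + gamma * M[s2] - M[s])

def values(M, r):
    return M @ r   # re-evaluate instantly when rewards change
```

The SR is model-free in how it is learned but model-based in what it affords: when the reward vector r changes, values update in one matrix product without relearning M.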