Reinforcement learning or active inference?
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimise their free energy. Such agents learn the causal structure of the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value, or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof of concept may be important because the free-energy formulation furnishes a unified account of both action and perception, and may speak to a reappraisal of the role of dopamine in the brain.
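The mountain-car benchmark invoked above has a standard formulation whose dynamics are easy to reproduce. Below is a minimal sketch of that environment in Python, using the conventional constants from the reinforcement-learning literature; the free-energy agent itself is well beyond a short example, and the function name and action encoding (-1, 0, +1) are our own choices.

```python
import math

def mountain_car_step(pos, vel, action):
    """One step of the classic mountain-car dynamics.

    action is the throttle direction: -1 (left), 0 (coast), +1 (right).
    The goal is to reach pos >= 0.5 from the valley near pos = -0.5.
    """
    vel += 0.001 * action - 0.0025 * math.cos(3.0 * pos)
    vel = max(-0.07, min(0.07, vel))          # clip velocity
    pos = max(-1.2, min(0.5, pos + vel))      # clip position
    if pos == -1.2:                           # inelastic left wall
        vel = 0.0
    return pos, vel
```

The cart is deliberately underpowered: from rest at the valley bottom, constant full throttle cannot climb the hill directly, so the controller must first rock away from the goal to gain momentum. That is what makes the problem a useful benchmark for dynamic programming and, here, for active inference.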
A Novel Task for the Investigation of Action Acquisition
We present a behavioural task designed for the investigation of how novel instrumental actions are discovered and learnt. The task consists of free movement with a manipulandum, during which the full range of possible movements can be explored by the participant and recorded. A subset of these movements, the ‘target’, is set to trigger a reinforcing signal. The task is to discover which movements of the manipulandum evoke the reinforcement signal. Targets can be defined in spatial, temporal, or kinematic terms, can be a combination of these aspects, or can represent the concatenation of actions into a larger gesture. The task allows the study of how the specific elements of behaviour which cause the reinforcing signal are identified, refined, and stored by the participant. The task provides a paradigm where the exploratory motive drives learning, and as such we view it as in the tradition of Thorndike [1]. Most importantly, it allows for repeated measures: once a novel action is acquired, the criterion for triggering reinforcement can be changed, requiring a new action to be discovered. Here, we present data from both humans and rats as subjects, showing that our task is easily scalable in difficulty, adaptable across species, and produces a rich set of behavioural measures offering new and valuable insight into the action learning process.
Intermittent control models of human standing: similarities and differences
Two architectures of intermittent control are compared and contrasted in the context of the single inverted pendulum model often used to describe standing in humans. The architectures are similar insofar as they use periods of open-loop control, punctuated by switching events when crossing a switching surface, to keep the system state trajectories close to trajectories leading to equilibrium. The architectures differ in two significant ways. Firstly, in one case the open-loop control trajectory is generated by a system-matched hold, and in the other case the open-loop control signal is zero. Secondly, prediction is used in one case but not the other. The former difference is examined in this paper. The zero-control alternative leads to periodic oscillations associated with limit cycles, whereas the system-matched alternative gives trajectories (including homoclinic orbits) which contain the equilibrium point and do not exhibit oscillatory behaviour. Despite this difference, it is further shown that the two behaviours can appear similar when either the system is perturbed by additive noise or the system-matched trajectory generation is perturbed. The purpose of the research is to arrive at a common approach for understanding the theoretical properties of the two alternatives, with the twin aims of choosing which provides the best explanation of current experimental data (which may not, by itself, distinguish between the two alternatives) and suggesting future experiments to distinguish between them.
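The zero-control variant described above can be illustrated with a toy event-triggered simulation. The sketch below assumes a linearised pendulum (angle'' = a·angle + b·u), illustrative gains, and a simple angle threshold as the switching surface; it is not either published architecture, but it reproduces the qualitative point that zero control between switching events yields a bounded oscillation around the threshold rather than convergence to equilibrium.

```python
import numpy as np

def simulate_intermittent(a=1.0, b=1.0, k=(-3.0, -2.0), thresh=0.05,
                          x0=(0.02, 0.0), dt=0.001, steps=5000):
    """Event-triggered control of a linearised inverted pendulum.

    State x = (angle, angular velocity).  Feedback u = k @ x is applied
    only while |angle| exceeds the switching threshold; between events
    the control is zero (the 'zero control' open-loop variant)."""
    x = np.array(x0, dtype=float)
    trace = []
    for _ in range(steps):
        if abs(x[0]) > thresh:
            u = k[0] * x[0] + k[1] * x[1]   # feedback outside the band
        else:
            u = 0.0                          # zero-control phase
        # Euler integration of angle and angular velocity
        x = x + dt * np.array([x[1], a * x[0] + b * u])
        trace.append(x[0])
    return np.array(trace)
```

Inside the threshold band the upright equilibrium is unstable, so the angle drifts out until feedback fires; the result is a sustained limit-cycle-like oscillation near the switching surface, as the abstract describes for the zero-control architecture.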
Reconstructing the three-dimensional GABAergic microcircuit of the striatum
A system's wiring constrains its dynamics, yet modelling of neural structures often overlooks the specific networks formed by their neurons. We developed an approach for constructing anatomically realistic networks and reconstructed the GABAergic microcircuit formed by the medium spiny neurons (MSNs) and fast-spiking interneurons (FSIs) of the adult rat striatum. We grew dendrite and axon models for these neurons and extracted probabilities for the presence of these neurites as a function of distance from the soma. From these, we found the probabilities of intersection between the neurites of two neurons given their inter-somatic distance, and used these to construct three-dimensional striatal networks. The MSN dendrite models predicted that half of all dendritic spines lie within 100 μm of the soma. The constructed networks predict distributions of gap junctions between FSI dendrites, synaptic contacts between MSNs, and synaptic inputs from FSIs to MSNs that are consistent with current estimates. The models predict that to achieve this, FSIs should make up at most 1% of the striatal population. They also show that the striatum is sparsely connected: FSI-MSN and MSN-MSN contacts respectively form 7% and 1.7% of all possible connections. The models predict two striking network properties: the dominant GABAergic input to an MSN arises from neurons with somas at the edge of its dendritic field; and FSIs are interconnected on two different spatial scales, locally by gap junctions and distally by synapses. We show that both properties influence striatal dynamics: the most potent inhibition of an MSN arises from a region of striatum at the edge of its dendritic field, and the combination of local gap-junction and distal synaptic networks between FSIs sets a robust input-output regime for the MSN population. Our models thus intimately link striatal micro-anatomy to its dynamics, providing a biologically grounded platform for further study.
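The construction step described above, drawing connections with a probability that depends only on inter-somatic distance, can be sketched generically. The kernel below is an assumed exponential fall-off standing in for the paper's fitted neurite-intersection probabilities, and all sizes, rates, and names are illustrative.

```python
import numpy as np

def build_network(n, size_um, p_connect, seed=0):
    """Place n somas uniformly in a cube of side size_um (micrometres)
    and draw directed connections with probability p_connect(d), where
    d is the inter-somatic distance."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, size_um, size=(n, 3))
    # pairwise inter-somatic distances, shape (n, n)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    conn = rng.random((n, n)) < p_connect(d)
    np.fill_diagonal(conn, False)        # no self-connections
    return pos, conn

# Illustrative distance kernel (an assumption, not the fitted
# intersection probabilities from the paper): exponential fall-off.
p = lambda d: 0.1 * np.exp(-d / 100.0)
pos, conn = build_network(200, 500.0, p)
```

Sparse connectivity of the kind the abstract reports (a few percent of possible contacts) falls out of any kernel that decays on a scale smaller than the tissue volume; the paper's specific predictions come from its anatomically derived kernels, not from this toy one.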
Infants in Control: Rapid Anticipation of Action Outcomes in a Gaze-Contingent Paradigm
Infants' poor motor abilities limit their interaction with their environment and render studying infant cognition notoriously difficult. Exceptions are eye movements, which reach high accuracy early but generally do not allow manipulation of the physical environment. In this study, real-time eye tracking is used to put 6- and 8-month-old infants in direct control of their visual surroundings, in order to study the fundamental problem of the discovery of agency, i.e., the ability to infer that certain sensory events are caused by one's own actions. We demonstrate that infants quickly learn to perform eye movements to trigger the appearance of new stimuli and that they anticipate the consequences of their actions in as few as 3 trials. Our findings show that infants can rapidly discover new ways of controlling their environment. We suggest that gaze-contingent paradigms offer effective new ways of studying many aspects of infant learning and cognition in an interactive fashion, and provide new opportunities for behavioral training and treatment in infants.
Assessment of MMP-9, TIMP-1, and COX-2 in normal tissue and in advanced symptomatic and asymptomatic carotid plaques
Background: Mature carotid plaques are complex structures, and their histological classification is challenging. The carotid plaques of asymptomatic and symptomatic patients can exhibit identical histological components.
Objectives: To investigate whether matrix metalloproteinase 9 (MMP-9), tissue inhibitor of MMP (TIMP-1), and cyclooxygenase-2 (COX-2) have different expression levels in advanced symptomatic carotid plaques, asymptomatic carotid plaques, and normal tissue.
Methods: Thirty patients admitted for carotid endarterectomy were selected. Each patient was assigned preoperatively to one of two groups: group I consisted of symptomatic patients (n = 16, 12 males, mean age 66.7 ± 6.8 years), and group II consisted of asymptomatic patients (n = 14, 8 males, mean age 67.6 ± 6.81 years). Nine normal carotid arteries were used as controls. Tissue specimens were analysed for fibromuscular, lipid, and calcium content. The expression of MMP-9, TIMP-1, and COX-2 in each plaque was quantified.
Results: Fifty-eight percent of all carotid plaques were classified as Type VI according to the American Heart Association Committee on Vascular Lesions. The control carotid arteries were all classified as Type III. The median percentage of fibromuscular tissue was significantly greater in group II than in group I (p < 0.05). The median percentage of lipid tissue tended to be greater in group I than in group II (p = 0.057). The percentages of calcification were similar between the two groups. MMP-9 protein expression levels were significantly higher in group II and in the control group than in group I (p < 0.001). TIMP-1 expression levels were higher in the control group and in group II than in group I, with a significant difference between the control group and group I (p = 0.010). COX-2 expression levels did not differ among the groups. There was no statistical correlation between MMP-9, COX-2, and TIMP-1 levels and fibrous tissue.
Conclusions: MMP-9 and TIMP-1 are present in all stages of atherosclerotic plaque progression, from normal tissue to advanced lesions. When sections of a plaque are analysed without preselection, MMP-9 concentration is higher in normal tissue and asymptomatic surgical specimens than in symptomatic specimens, and TIMP-1 concentration is higher in normal tissue than in symptomatic specimens.
Optogenetic Mimicry of the Transient Activation of Dopamine Neurons by Natural Reward Is Sufficient for Operant Reinforcement
Activation of dopamine receptors in forebrain regions, for minutes or longer, is known to be sufficient for positive reinforcement of stimuli and actions. However, the firing rate of dopamine neurons is increased for only about 200 milliseconds following natural reward events that are better than expected, a response that has been described as a “reward prediction error” (RPE). Although RPE drives reinforcement learning (RL) in computational models, it has not been possible to test directly whether the transient dopamine signal actually drives RL. Here we performed optical stimulation of genetically targeted ventral tegmental area (VTA) dopamine neurons expressing Channelrhodopsin-2 (ChR2) in mice. We mimicked the transient activation of dopamine neurons that occurs in response to natural reward by applying a 200 ms light pulse in the VTA. When a single light pulse followed each self-initiated nose poke, it was sufficient in itself to cause operant reinforcement. Furthermore, when optical stimulation was delivered in separate sessions according to a predetermined pattern, it increased locomotion and contralateral rotations, behaviors that are known to result from activation of dopamine neurons. All three of the optically induced operant and locomotor behaviors were tightly correlated with the number of VTA dopamine neurons that expressed ChR2, providing additional evidence that the behavioral responses were caused by activation of dopamine neurons. These results provide strong evidence that the transient activation of dopamine neurons provides a functional reward signal that drives learning, in support of RL theories of dopamine function.
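The RPE at the heart of this argument is the temporal-difference error of RL theory: the difference between received and predicted reward. A minimal sketch, assuming a simple tabular value function (the state names, learning rate, and discount are illustrative):

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update; delta is the reward prediction
    error that the phasic dopamine response is proposed to encode."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

V = {}
delta1 = td_update(V, "cue", 1.0, "end")  # surprising reward: big RPE
delta2 = td_update(V, "cue", 1.0, "end")  # prediction improves: smaller RPE
```

As the value estimate converges on the delivered reward, delta shrinks towards zero, which mirrors the observation that dopamine neurons stop responding to fully predicted rewards.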
A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface
Operant learning requires that reinforcement signals interact with action representations at a suitable neural interface. Much evidence suggests that this occurs when phasic dopamine, acting as a reinforcement prediction error, gates plasticity at cortico-striatal synapses, and thereby changes the future likelihood of selecting the action(s) coded by striatal neurons. But this hypothesis faces serious challenges. First, cortico-striatal plasticity is inexplicably complex, depending on spike timing, dopamine level, and dopamine receptor type. Second, there is a credit assignment problem—action selection signals occur long before the consequent dopamine reinforcement signal. Third, the two types of striatal output neuron have apparently opposite effects on action selection. Whether these factors rule out the interface hypothesis and how they interact to produce reinforcement learning is unknown. We present a computational framework that addresses these challenges. We first predict the expected activity changes over an operant task for both types of action-coding striatal neuron, and show they co-operate to promote action selection in learning and compete to promote action suppression in extinction. Separately, we derive a complete model of dopamine and spike-timing dependent cortico-striatal plasticity from in vitro data. We then show this model produces the predicted activity changes necessary for learning and extinction in an operant task, a remarkable convergence of a bottom-up data-driven plasticity model with the top-down behavioural requirements of learning theory. Moreover, we show the complex dependencies of cortico-striatal plasticity are not only sufficient but necessary for learning and extinction. Validating the model, we show it can account for behavioural data describing extinction, renewal, and reacquisition, and replicate in vitro experimental data on cortico-striatal plasticity. 
By bridging the levels between the single synapse and behaviour, our model shows how the striatum acts as the action-reinforcement interface.
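The interface described above couples spike-timing-dependent plasticity to dopamine. A textbook three-factor rule, sketched below, conveys the basic interaction; it is not the paper's fitted cortico-striatal model, and the amplitudes and time constant are assumed values.

```python
import math

def three_factor_dw(dt_ms, dopamine, a_plus=0.01, a_minus=0.012,
                    tau_ms=20.0):
    """Weight change for one pre/post spike pair under a generic
    dopamine-gated STDP rule: a timing-dependent eligibility term is
    multiplied by a signed dopamine factor.

    dt_ms = t_post - t_pre; dopamine > 0 reinforces the eligibility,
    dopamine < 0 (a pause below baseline) reverses its sign.
    """
    if dt_ms > 0:   # pre before post: potentiating eligibility
        elig = a_plus * math.exp(-dt_ms / tau_ms)
    else:           # post before (or with) pre: depressing eligibility
        elig = -a_minus * math.exp(dt_ms / tau_ms)
    return dopamine * elig
```

Multiplying the eligibility by dopamine also illustrates the credit-assignment point made above: the timing term can be computed at the synapse when the spikes occur, and the later dopamine signal then converts the stored eligibility into an actual weight change.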