Reinforcement Learning Describes the Computational and Neural Processes Underlying Flexible Learning of Values and Attentional Selection
Attention and learning are cognitive control processes that are closely related. This thesis investigates this inter-relatedness by using computational models to describe the mechanisms that are shared between these processes. Computational models describe the transformation of stimuli to observable variables (behaviour) and contain the latent mechanisms that affect this transformation. Here, I captured these mechanisms with the reinforcement learning (RL) framework applied in two different task contexts and three different projects to show 1) how attentional selection of stimuli involves the learning of values for stimuli, 2) how the learning of stimulus values is influenced by previously learned rules, and 3) how explorations of value-related mechanisms in the brain benefit from using intracranial EEG to investigate the strength of oscillatory activity in ventromedial prefrontal cortex.
In the first project, the RL framework is applied to a feature-based attention task that required macaques to learn the value of stimulus features while ignoring non-relevant information. By comparing different RL schemes, I found that trial-by-trial covert attentional selections were best predicted by a model that only represents expected values for the task-relevant feature dimension.
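The winning model can be illustrated with a minimal sketch. Everything below (the function name, the learning rate of 0.2, the example features) is illustrative rather than the thesis's actual implementation: expected values are maintained only for features of the task-relevant dimension and updated by a Rescorla-Wagner-style prediction error.

```python
def update_feature_values(values, chosen_features, reward, alpha=0.2):
    """Rescorla-Wagner update restricted to the task-relevant dimension.

    values: dict mapping feature -> expected value
    chosen_features: features (in the relevant dimension) of the chosen stimulus
    reward: 1.0 (rewarded) or 0.0 (unrewarded)
    alpha: learning rate (illustrative value)
    """
    for f in chosen_features:
        pe = reward - values[f]      # prediction error for this feature
        values[f] += alpha * pe
    return values

# Example: learn that 'red' predicts reward; an irrelevant-dimension
# feature such as location simply never enters the value dictionary.
values = {'red': 0.5, 'green': 0.5}
for _ in range(20):
    values = update_feature_values(values, ['red'], reward=1.0)
# values['red'] approaches 1.0; values['green'] is untouched
```

Restricting the value representation to one dimension is what distinguishes this scheme from a full model that tracks values for every feature of every dimension.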
In the second project, I explore mechanisms of stimulus-feature value learning in humans in order to understand the influence of learned rules on the flexible, ongoing learning of expected values. I test the hypothesis that naive subjects will show enhanced learning of feature-specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type. I found that two-thirds of subjects (n=22/32) exhibited behaviour that was best fit by a 'flexible-rule-selection' model.
Low-frequency oscillatory activity in frontal cortex has been associated with cognitive control and integrative brain functions; however, the relationship between expected values for stimuli and band-limited, rhythmic neural activity in the human brain is largely unknown. In the third project, I used intracranial electrocorticography (ECoG) in a proof-of-principle study to reveal spectral power signatures in vmPFC related to the expected values of stimuli predicted by an RL model for a single human subject.
Feature-specific prediction errors and surprise across macaque fronto-striatal circuits
To adjust expectations efficiently, prediction errors need to be associated with the precise features that gave rise to the unexpected outcome, but this credit assignment may be problematic if stimuli differ on multiple dimensions and it is ambiguous which feature dimension caused the outcome. Here, we report a potential solution: neurons in four recorded areas of the anterior fronto-striatal networks encode prediction errors that are specific to feature values of different dimensions of attended multidimensional stimuli. The most ubiquitous prediction error occurred for the reward-relevant dimension. Feature-specific prediction error signals a) emerge on average shortly after non-specific prediction error signals, b) arise earliest in the anterior cingulate cortex and later in dorsolateral prefrontal cortex, caudate and ventral striatum, and c) contribute to feature-based stimulus selection after learning. Thus, a widely distributed feature-specific eligibility trace may be used to update synaptic weights for improved feature-based attention.

This work was supported by grant MOP 102482 from the Canadian Institutes of Health Research (T.W.) and the Natural Sciences and Engineering Research Council of Canada (T.W.), as well as by the Brain in Action CREATE-IRTG program (M.O. and T.W.), and by grant LPDS 2012-08 from the Deutsche Akademie der Naturforscher Leopoldina (S.W.). Imaging data provided by the Duke Center for In Vivo Microscopy, an NIH Biomedical Technology Resource (NIH P41EB015897, 1S10OD010683-01). The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of this manuscript. The authors would like to thank Hongying Wang for technical support.

Oemisch, M.; Westendorff, S.; Azimi, M.; Hassani, SA.; Ardid-Ramírez, JS.; Tiesinga, P.; Womelsdorf, T. (2019). Feature-specific prediction errors and surprise across macaque fronto-striatal circuits. Nature Communications. 10:1-15.
https://doi.org/10.1038/s41467-018-08184-9
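The abstract's distinction between non-specific and feature-specific prediction errors can be sketched as a simple decomposition. This is illustrative only, not the paper's actual model; in particular, forming the stimulus expectation by averaging feature values is an assumption made here for concreteness.

```python
def prediction_errors(feature_values, stimulus_features, reward):
    """Decompose an outcome into a non-specific (outcome-level) prediction
    error and feature-specific prediction errors for each attended feature.

    feature_values: dict feature -> learned value
    stimulus_features: features of the attended multidimensional stimulus
    reward: obtained outcome (0.0 or 1.0)
    """
    # Non-specific PE: outcome relative to the overall stimulus expectation
    # (here taken as the mean of its feature values -- an assumption).
    expected = sum(feature_values[f] for f in stimulus_features) / len(stimulus_features)
    global_pe = reward - expected
    # Feature-specific PEs: credit assigned against each feature's own value.
    feature_pes = {f: reward - feature_values[f] for f in stimulus_features}
    return global_pe, feature_pes

# A red stimulus at the left location; colour is the reward-relevant dimension,
# so the 'red' feature carries most of the learned value.
g_pe, f_pes = prediction_errors({'red': 0.9, 'left': 0.1}, ['red', 'left'], reward=1.0)
```

In this toy case the reward-relevant feature ('red') yields a small specific error while the irrelevant feature ('left') yields a large one, which is the kind of per-feature credit signal an eligibility trace could use.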
A computational psychiatry approach identifies how alpha-2A noradrenergic agonist Guanfacine affects feature-based reinforcement learning in the macaque
Noradrenaline is believed to support cognitive flexibility through the alpha 2A noradrenergic receptor (a2A-NAR) acting in prefrontal cortex. Enhanced flexibility has been inferred from improved working memory with the a2A-NA agonist Guanfacine. But it has been unclear whether Guanfacine improves specific attention and learning mechanisms beyond working memory, and whether the drug effects can be formalized computationally to allow single-subject predictions. We tested and confirmed these suggestions in a case study with a healthy nonhuman primate performing a feature-based reversal learning task, evaluating performance with Bayesian and reinforcement learning models. In an initial dose-testing phase we found a Guanfacine dose that increased performance accuracy, decreased distractibility and improved learning. In a second experimental phase using only that dose, we examined the faster feature-based reversal learning with Guanfacine using single-subject computational modeling. Parameter estimation suggested that improved learning is not accounted for by varying a single reinforcement learning mechanism, but by changing the set of parameter values to higher learning rates and stronger suppression of non-chosen over chosen feature information. These findings provide an important starting point for developing nonhuman primate models to discern the synaptic mechanisms of attention and learning functions within the context of a computational neuropsychiatry framework.

This research was supported by grants from the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Ontario Ministry of Economic Development and Innovation (MEDI). We thank Dr. Hongying Wang for invaluable help with drug administration and animal care.

Hassani, SA.; Oemisch, M.; Balcarras, M.; Westendorff, S.; Ardid-Ramírez, JS.; Van Der Meer, MA.; Tiesinga, P.... (2017).
A computational psychiatry approach identifies how alpha-2A noradrenergic agonist Guanfacine affects feature-based reinforcement learning in the macaque. Scientific Reports. 7:1-19. https://doi.org/10.1038/srep40606
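The parameter pattern the abstract describes (higher learning rates together with stronger suppression of non-chosen relative to chosen feature information) is commonly formalized as a value update with a decay term. The sketch below is a minimal illustration with made-up parameter values, not the paper's estimates.

```python
def rl_update(values, chosen_features, reward, alpha=0.3, decay=0.5):
    """Update chosen-feature values by a prediction error scaled with
    learning rate alpha, and decay non-chosen feature values toward zero.
    A larger decay suppresses non-chosen over chosen feature information.
    (Parameter values are illustrative; the study estimates them from data.)
    """
    updated = {}
    for feature, value in values.items():
        if feature in chosen_features:
            updated[feature] = value + alpha * (reward - value)
        else:
            updated[feature] = (1.0 - decay) * value
    return updated

values = {'red': 0.5, 'green': 0.5}
values = rl_update(values, ['red'], reward=1.0)  # red -> 0.65, green -> 0.25
```

On this formulation, a drug effect that jointly raises `alpha` and `decay` speeds reversal learning without changing any single mechanism in isolation, which matches the abstract's conclusion.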
Neurocognitive Mechanisms of Learning and Decision-Making in Adolescent-OCD: A Computational Approach
Early-onset obsessive-compulsive disorder (OCD) is substantially less researched than adult-OCD, leaving the neurocognitive profile of child-OCD poorly characterised. Research into this area is pivotal, as population studies report that youths with OCD struggle significantly in academic settings. In the General Introduction of this thesis, I reviewed the existing literature and found that, strikingly, young patients do not show impairment on features that are considered both hallmarks of adult OCD and tightly linked to disorder symptomatology, such as response inhibition and cognitive flexibility. Among the characteristics thought to be present in children and adolescents with OCD are abnormal decision-making under uncertainty and impaired learning, and I decided to focus on these features as they may be driving poor academic attainment in young people with the disorder. In addition, I sought to investigate other cognitive processes that are not well researched in adolescent-OCD but are robustly altered in adult OCD, such as goal-directed/model-based reasoning, meta-cognition, and feedback sensitivity. I aimed to delineate these various processes using a battery of suitably complex cognitive tasks. Moreover, I highlighted that the majority of past studies fail to find differences between young patients and controls because the behavioural signatures are too subtle to be uncovered by standard statistical analyses. Hence, I employed computational modelling of cognitive task data to disentangle the latent decision-making processes displayed by adolescents with OCD.
In Chapter 2, I modelled data from the Wisconsin Card Sorting task, a frequently used paradigm of cognitive flexibility, and confirmed that youths with OCD show equivalent performance on the task
to controls. Only patients on serotonergic medication showed increased response latencies and a tendency to make unique errors (choosing a deck associated with no rule present on the test card).
Next, in Chapter 3, I sought to understand instrumental and Pavlovian learning, and whether adolescents with OCD show increased punishment sensitivity on a novel aversive Pavlovian-to-Instrumental Transfer paradigm. Once again, patient performance was equivalent to that of controls. Hence, the remaining chapters were dedicated to probing behaviour on probabilistic paradigms.
In Chapter 4, I formally investigated model-based and model-free learning using a well-validated two step decision-making task, and fit a reinforcement learning drift diffusion model to both choice and
reaction time data. Patients showed increased exploration on the task as well as faster and more erratic decisions compared to controls. Nonetheless, model-based learning was equivalent between
groups. In the penultimate chapter, I demonstrate on a predictive-inference task that patients with OCD update their choices more frequently compared to controls independent of prediction error
magnitude. Finally, in Chapter 6, I administered a probabilistic reversal learning paradigm to a large sample of 50 adolescent patients and 53 matched controls. Standard analyses revealed a significant
reversal learning deficit in patients with OCD, wherein they displayed more errors and a lower propensity to repeat choices following positive feedback during the post-reversal phase. Crucially, computational modelling revealed striking group differences where adolescents with OCD displayed elevated reward learning and lower punishment learning, increased exploration, and decreased
perseveration compared to controls. In the General Discussion, I emphasise that atypical learning and decision-making in adolescent-OCD are more pronounced on probabilistic tasks, where task environments are more volatile. Results are partly discussed in the context of the uncertainty model of OCD, where subjective feelings of doubt experienced by patients drive compulsive behaviours
such as checking and certainty-seeking in daily life, alongside excessive exploration on probabilistic tasks. I also consider various explanations for cognitive distinctions between adult- and adolescent-OCD. More general implications of the findings are discussed for understanding OCD in the context of adolescent development and for treatment/support strategies. WELLCOME TRUST (104631/Z/14/Z)
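The winning model in that final chapter combines several separable mechanisms: asymmetric reward and punishment learning rates, an exploration parameter, and a perseveration term. A minimal simulation sketch of such a model follows; the parameter names and default values are illustrative, not the thesis's fitted implementation.

```python
import numpy as np

def simulate_prl_model(outcomes, alpha_rew=0.4, alpha_pun=0.1,
                       beta=3.0, kappa=0.2, seed=0):
    """Q-learning with separate reward/punishment learning rates,
    softmax exploration (beta) and a choice-perseveration bonus (kappa).
    `outcomes[t, a]` is the feedback (+1/-1) for action a on trial t."""
    rng = np.random.default_rng(seed)
    n_trials, n_actions = outcomes.shape
    q = np.zeros(n_actions)
    choices = np.empty(n_trials, dtype=int)
    last = -1
    for t in range(n_trials):
        logits = beta * q
        if last >= 0:
            logits[last] += kappa          # perseveration: bias to repeat
        p = np.exp(logits - logits.max())  # stable softmax
        p /= p.sum()
        a = rng.choice(n_actions, p=p)
        r = outcomes[t, a]
        # asymmetric update: gains and losses learned at different rates
        alpha = alpha_rew if r > 0 else alpha_pun
        q[a] += alpha * (r - q[a])
        choices[t] = last = a
    return choices, q
```

In such a model, "elevated reward learning and lower punishment learning" corresponds to a larger gap between the two learning rates, and "decreased perseveration" to a smaller kappa, which is how computational modelling can separate effects that look identical in raw error counts.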
Neurochemical modulation of affective and behavioural control: Models and applications for psychiatry
Impairments in emotional reactivity and behavioural flexibility are pervasive across disparate psychiatric conditions as traditionally defined. Here, I provide new evidence on how these processes are altered by neuromodulators in humans, with a primary focus on serotonin (5-HT; 5-hydroxytryptamine). Emotional reactions prepare the body for action. Some emotion is primitive, implicit, and critical for surviving threats, yet can inappropriately persist in times of safety. Other emotions are more complex, self-conscious and important in maintaining harmonious interpersonal relationships. At the same time, learned behaviours that are adaptive in the first instance may become irrelevant or even disadvantageous as circumstances change. In Chapters 3 through 6, I report on experiments in healthy human volunteers that employed the dietary technique acute tryptophan depletion (ATD). ATD temporarily lowers serotonin synthesis and release by depleting its biosynthetic precursor tryptophan. Chapter 3 is a study of self-reported social emotion. ATD enhanced emotion in response to social injustice non-specifically; however, consideration of personality traits revealed that highly empathic participants reported more guilt under ATD, whereas individuals high in trait psychopathy demonstrated more annoyance. Chapter 4, in contrast, considers evolutionarily ancient automatic emotional reactions to threats. This was assayed instead by an objective measure, the skin conductance response (SCR). Here, ATD conversely attenuated the retention of Pavlovian conditioned emotional memory to threat. Traits again influenced this response: individuals more intolerant of uncertainty displayed the greatest attenuation of emotional reactions. Chapter 5 both extends the studies on emotion and bridges to the remaining empirical work by investigating reversal learning, an index of cognitive flexibility, in two experiments.
Individuals again underwent Pavlovian (stimulus-outcome) threat conditioning, whereby one stimulus predicted threat, and another was safe. These contingencies then swapped (reversed). In a separate experiment, participants underwent instrumental (stimulus-response-outcome) conditioning on a deterministic schedule (the correct option was always correct), followed by reversal of the contingencies. ATD impaired both Pavlovian and instrumental reversal learning. Chapters 6 through 8 instead examine instrumental reversal learning that was probabilistic (the correct option was correct most but not all of the time), rather than deterministic. Chapter 6 expands on previous ATD studies of probabilistic reversal learning (PRL) in the literature, which had not found effects on choice behaviour. Despite nearly tripling the sample size, behaviour here assessed by conventional methods was unaffected, replicating previously published null results. Applying reinforcement learning (RL) models, however, revealed ATD elevated a basic perseverative tendency, referred to as "stimulus stickiness"; behaviour was more stimulus-bound and insensitive to the outcome of actions, consistent with the deterministic instrumental reversal impairment following ATD. Chapters 7 and 8 apply RL models as well, to existing datasets on PRL for comparison. Chapter 7 shows that healthy volunteers under lysergic acid diethylamide (LSD), which acts both at serotonin but also dopamine receptors, showed enhanced learning from positive feedback in particular, which was related to perseveration. Chapter 8 applies computational methods to PRL in clinical populations.
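"Stimulus stickiness" can be made concrete with a likelihood function for a two-armed bandit: stickiness adds a fixed bonus to the logit of whichever option was chosen last, independent of its outcome. The sketch below is a generic illustration of that idea, not the specific model fitted in the thesis; the parameter names are illustrative.

```python
import numpy as np

def sticky_softmax_nll(params, choices, rewards):
    """Negative log-likelihood of a two-armed-bandit choice sequence under
    Q-learning with softmax choice plus a stimulus-stickiness bonus."""
    alpha, beta, stickiness = params
    q = np.zeros(2)
    prev = -1
    nll = 0.0
    for a, r in zip(choices, rewards):
        logits = beta * q                 # value-driven preference
        if prev >= 0:
            logits[prev] += stickiness    # outcome-independent repeat bias
        logp = logits - np.logaddexp(logits[0], logits[1])
        nll -= logp[a]
        q[a] += alpha * (r - q[a])        # standard delta-rule value update
        prev = a
    return nll
```

Fitting then amounts to minimising this function over (alpha, beta, stickiness) per participant; a positive fitted stickiness captures the outcome-insensitive perseveration that conventional behavioural measures missed in the ATD studies described above.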
RL modelling revealed a computational signature that dissociated PRL in stimulant use disorder (SUD) and obsessive-compulsive disorder (OCD): Individuals with SUD showed heightened stimulus stickiness, as occurred following ATD in healthy volunteers, whereas the OCD group (under serotonergic medication) demonstrated lower stimulus stickiness than healthy controls. Dopaminergic agents remediated a reward learning deficit in SUD, among other measures. The general discussion considers these various findings in terms of theories of central serotonin function, in relation to the animal literature, and their relevance to mental disorder. These results, collectively, advance knowledge of neurochemical and computational mechanisms underlying psychiatric conditions trans-diagnostically, with implications for revised psychiatric classifications in line with the Research Domain Criteria (RDoC). Gates Cambridge Trust; Wellcome Trust
Habits without values
Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning (RL) mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally
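The core of the value-free proposal can be sketched in a few lines (a generic illustration, not the paper's exact equations): habit strength is pushed toward whatever action was just taken, with no outcome term anywhere in the update.

```python
import numpy as np

def update_habit(habit, action, step=0.05):
    """Value-free habit update: strengthen the action just taken and decay
    the alternatives. Note there is no reward/outcome term at all."""
    target = np.zeros_like(habit)
    target[action] = 1.0
    return habit + step * (target - habit)

# Repetition alone builds habit strength, so behaviour can persist even if
# the outcome is later devalued (cf. insensitivity to outcome devaluation).
h = np.zeros(2)
for _ in range(100):
    h = update_habit(h, action=0)
```

Because the update depends only on action frequency, devaluing the outcome leaves the habit term untouched, which is exactly the behavioural signature the model is built to reproduce.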
The neuro-computational role of uncertainty in anxiety
Anxiety disorders are the most common mental health disorders and account for a large number of years lost to disability. The work in this thesis is oriented towards understanding anxiety using a computational approach, focusing on uncertainty estimation as a key process. Chapter 1 introduces the role of uncertainty within anxiety and motivates the subsequent experimental chapters. Chapter 2 is a review of the computational role of the amygdala in humans, a key area for uncertainty computation. Chapter 3 is an experimental chapter which aimed to address gaps in the literature highlighted in the preceding chapters, namely the link between sensory uncertainty processing and anxiety and the role of the amygdala in this process. This chapter focuses on the development of a novel computational hierarchical Bayesian model to quantify sensory uncertainty and its application to neuroimaging data, with intolerance of uncertainty relating to greater neural activation in the insula but not the amygdala. Chapter 4 targets the computational mechanisms underlying the negative self-bias observed in subclinical social anxiety. Again, this chapter focuses on the development of novel computational belief-update models which explicitly model uncertainty. Here, we see that a reduced trait self-positivity underpins this negative social evaluation process. The final experimental chapter, Chapter 5, investigates the link between different computational mechanisms, such as uncertainty, and a range of mood and anxiety symptomatology. This study revealed cognitive, social and somatic computational profiles that share a threat-bias mechanism but have distinct negative self-bias and aversive learning signatures. Contrary to expectations, none of the uncertainty measures showed any associations with anxiety symptom subtypes.
Finally, Chapter 6 brings together the work in this thesis and, alongside its limitations, discusses how these experiments contribute to our understanding of anxiety and the role of uncertainty across the anxiety spectrum
Functional organisation of behavioural inhibitory control mechanisms in cortico-basal ganglia circuitry: implications for stimulant use disorder.
The neural and psychological mechanisms of inhibitory control processes were investigated, focusing on the cortico-basal ganglia circuits in rats and humans. These included behavioural flexibility, "waiting" and "stopping" impulsivity, and involved a serial spatial reversal learning task in rodents and, in humans, premature responses in the Monetary Incentive Delay (MID) task and the stop-signal reaction time task. Chapter 2 and Chapter 3 focus on individual differences in behavioural flexibility in rats while Chapter 4, Chapter 5 and Chapter 6 consider how inhibitory control mechanisms are affected by the psychostimulant drug cocaine in both rats and humans.
As reported in Chapter 2, systemic modulation of monoaminergic transmission by monoamine oxidase A (MAO-A) inhibitors enhanced reversal learning performance, selectively by decreasing the lose-shift probability, thereby implicating a role for dopamine, serotonin and noradrenaline in facilitating learning from negative feedback. Resting-state functional magnetic resonance imaging (fMRI) revealed enhanced functional connectivity of the orbitofrontal and motor cortices as a correlate of flexible reversal learning performance, consistent with elevated levels of monoamines in these regions (Chapter 3). Having clarified the mechanisms underlying behavioural flexibility in rats, Chapter 4 reports that escalation of intravenous cocaine self-administration induces behavioural inflexibility in rats even after a relatively short period of cocaine intake. Computational models, including reinforcement learning and Bayesian learner models, revealed a lack of exploitation of the learned response-outcome relationships in cocaine-exposed rats.
Chapter 5 focused on impulse control in human volunteers, identifying the striatal and cingulo-opercular networks as substrates of impulsive, premature responding in healthy
volunteers, stimulant-dependent individuals and their unaffected siblings. Loss of impulse control was elicited by different incentives for drug-free participants as opposed to drug users. Drug cues elicited striatal activation and increased premature responses in the stimulant-dependent group compared with the control group. In contrast, the ventral striatum was linked to incentive-specific activation to reward anticipation. Task-based fMRI demonstrated that interactions between the dorsal striatum and cingulo-opercular "cold cognition" networks underlie failures of impulse control in the control, at-risk and stimulant-dependent groups. However, whereas the cingulo-opercular networks were associated with premature responding in all groups, the reward system was activated specifically by the drug incentive cues in the stimulant group, and by monetary incentive cues in the drug-free groups.
Chapter 6 presents evidence that corticostriatal functional and effective connectivity in an overlapping network, which includes the anterior cingulate and inferior frontal cortices as well as motor cortex, the subthalamic nucleus and dorsal striatum, is critical to the stopping form of impulse control in both control and cocaine-dependent individuals. No stopping-efficiency impairments were observed in the cocaine-dependent group. Nevertheless, lower structural corticostriatal connectivity measured using diffusion MRI was associated with response execution impairments in cocaine-dependent participants performing a stop-signal reaction time task. Further, response execution was rescued by the selective noradrenaline reuptake inhibitor atomoxetine, which also increased corticostriatal effective connectivity.
Finally, the increased impulsivity and behavioural inflexibility seen in stimulant use disorder in Chapter 5 and Chapter 4, respectively, were not observed in the endophenotype at risk for developing stimulant abuse but were rather a consequence of stimulant abuse. These results further clarify the monoaminergic substrates of behavioural flexibility and specify the neural and computational impairments in inhibitory control induced by stimulant dependence. Pinsent Darwin Studentship from the Dept of Physiology, Development and Neuroscience
Decisions, decisions, decisions: the development and plasticity of reinforcement learning, social and temporal decision making in children
Human decision-making is the flexible way people respond to their environment, take actions, and plan toward long-term goals. It is commonly thought that humans rely on distinct decision-making systems, which are either more habitual and reflexive or deliberate and calculated. How we make decisions can provide insight into our social functioning, mental health and underlying psychopathology, and ability to consider the consequences of our actions. Notably, the ability to make appropriate, habitual or deliberate decisions depending on the context, here referred to as metacontrol, remains underexplored in developmental samples. This thesis aims to investigate the development of different decision-making mechanisms in middle childhood (ages 5-13) and to illuminate the potential neurocognitive mechanisms underlying value-based decision-making. Using a novel sequential decision-making task, the first experimental chapter presents robust markers of model-based decision-making in childhood (N = 85), which reflects the ability to plan through a sequential task structure, contrary to previous developmental studies. Using the same paradigm, in a new sample via both behavioral (N = 69) and MRI-based measures (N = 44), the second experimental chapter explores the neurocognitive mechanisms that may underlie model-based decision-making and its metacontrol in childhood and links individual differences in inhibition and cortical thickness to metacontrol. The third experimental chapter explores the potential plasticity of social and intertemporal decision-making in a longitudinal executive function training paradigm (N = 205) and initial relationships with executive functions. Finally, I critically discuss the results presented in this thesis and their implications and outline directions for future research in the neurocognitive underpinnings of decision-making during development
The brain as a generative model: information-theoretic surprise in learning and action
Our environment is rich with statistical regularities, such as a sudden cold gust of wind indicating a potential change in weather. A combination of theoretical work and empirical evidence suggests that humans embed this information in an internal representation of the world. This generative model is used to perform probabilistic inference, which may be approximated through surprise minimization. This process rests on current beliefs enabling predictions, with expectation violation amounting to surprise. Through repeated interaction with the world, beliefs become more accurate and grow more certain over time. Perception and learning may be accounted for by minimizing surprise of current observations, while action is proposed to minimize expected surprise of future events. This framework thus shows promise as a common formulation for different brain functions.
The work presented here adopts information-theoretic quantities of surprise to investigate both perceptual learning and action. We recorded electroencephalography (EEG) from participants in a somatosensory roving-stimulus paradigm and performed trial-by-trial modeling of cortical dynamics. Bayesian model selection suggests that early processing in somatosensory cortices encodes confidence-corrected surprise and subsequently Bayesian surprise, indicating that the somatosensory system signals the surprise of observations and updates a probabilistic model that learns transition probabilities. We also extended this framework to include audition and vision in a multi-modal roving-stimulus study. Next, we studied action by investigating sensitivity to expected Bayesian surprise. Interestingly, this quantity is also known as information gain and arises as an incentive to reduce uncertainty in the active inference framework, which can correspond to surprise minimization. Comparing active inference to a classical reinforcement learning model on the two-step decision-making task, we provided initial evidence that active inference better accounts for human model-based behaviour. This appeared to relate to participants' sensitivity to expected Bayesian surprise and contributed to explaining exploration behaviour not accounted for by the reinforcement learning model. Overall, our findings provide evidence for information-theoretic surprise as a model for perceptual learning signals while also guiding human action.
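The surprise quantities named above have standard information-theoretic definitions. As a rough illustration (a generic conjugate-learner sketch, not the models fitted in the study): for a Dirichlet-categorical learner of stimulus probabilities, Shannon surprise is the negative log predictive probability of the observation, and Bayesian surprise is the KL divergence between posterior and prior beliefs.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(a_post, a_prior):
    """KL( Dir(a_post) || Dir(a_prior) ), in nats."""
    return (gammaln(a_post.sum()) - gammaln(a_post).sum()
            - gammaln(a_prior.sum()) + gammaln(a_prior).sum()
            + ((a_post - a_prior)
               * (digamma(a_post) - digamma(a_post.sum()))).sum())

def observe(alpha, x):
    """One observation for a Dirichlet-categorical learner.
    Returns (Shannon surprise, Bayesian surprise, updated counts)."""
    shannon = -np.log(alpha[x] / alpha.sum())  # -log predictive prob of x
    alpha_new = alpha.copy()
    alpha_new[x] += 1.0                        # conjugate posterior update
    bayesian = dirichlet_kl(alpha_new, alpha)  # belief shift caused by x
    return shannon, bayesian, alpha_new
```

An improbable stimulus yields both a larger Shannon surprise and a larger belief update; confidence-corrected surprise additionally, roughly speaking, scales the surprise signal by how committed (confident) the prior beliefs are, which is what lets the two signals dissociate in neural data.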