6,783 research outputs found

    Model-free and model-based reward prediction errors in EEG

    Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign an expected value to candidate actions. Model-free learning is ignorant of the world’s structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has previously been assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors challenges the long-held assumption that model-free and model-based learning are dissociated in the brain.
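
    To make the contrast concrete, here is a minimal Python sketch of the two learning signals, assuming a simple one-step task: a model-free learner updates a cached action value from its reward prediction error, while a model-based learner takes its prediction error against an expectation derived from a learned outcome model. The task, learning rate, and reward probabilities are illustrative assumptions, not the study's design.

```python
import numpy as np

# Minimal sketch of model-free vs. model-based reward prediction errors
# in a one-step task with two actions. All parameter values are
# illustrative, not taken from the study above.

rng = np.random.default_rng(0)
alpha = 0.1                          # learning rate (assumed)
true_reward_prob = [0.8, 0.2]        # P(reward | action), unknown to the agent

q_mf = np.zeros(2)                   # model-free cached action values
outcome_counts = np.ones((2, 2))     # model-based: (action, outcome) counts

for trial in range(1000):
    action = int(rng.integers(2))
    reward = float(rng.random() < true_reward_prob[action])

    # Model-free: RPE against the cached value, then update the cache.
    rpe_mf = reward - q_mf[action]
    q_mf[action] += alpha * rpe_mf

    # Model-based: the expected value comes from the learned outcome model,
    # so the RPE is taken against a model-derived expectation.
    expected_mb = outcome_counts[action, 1] / outcome_counts[action].sum()
    rpe_mb = reward - expected_mb
    outcome_counts[action, int(reward)] += 1

print(np.round(q_mf, 2))  # both estimates approach the true reward probabilities
```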

    Value and prediction error in medial frontal cortex: integrating the single-unit and systems levels of analysis

    The role of the anterior cingulate cortex (ACC) in cognition has been extensively investigated with several techniques, including single-unit recordings in rodents and monkeys and EEG and fMRI in humans. This has generated a rich set of data and points of view. Important theoretical functions proposed for the ACC include value estimation, error detection, error-likelihood estimation, conflict monitoring, and estimation of reward volatility; a unified view is still lacking, however. Here we propose that online value estimation could be the key function underlying these diverse data. This proposal is instantiated in the reward value and prediction model (RVPM). The model contains units coding for the value of cues (stimuli or actions) and units coding for the differences between such values and the actual reward (prediction errors). We exposed the model to typical experimental paradigms from single-unit, EEG, and fMRI research to compare its overall behavior with the data from these studies. The model reproduced the ACC behavior reported in previous single-unit, EEG, and fMRI studies of reward processing, error processing, conflict monitoring, error-likelihood estimation, and volatility estimation, unifying interpretations of the role the ACC performs in these aspects of cognition.
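
    The model's central computation, online value estimation, can be sketched in a few lines: one unit tracks a cue's value while two units signal positive and negative discrepancies between that value and the delivered reward. The rectified split and all constants below are illustrative assumptions, not the published RVPM equations.

```python
import numpy as np

# Sketch of the core RVPM-style quantities described above: a value unit for
# a single cue and rectified units for better- and worse-than-expected
# outcomes. The learning rate and reward rate are illustrative assumptions.

rng = np.random.default_rng(1)
alpha = 0.2
value = 0.0                              # value unit for one cue

for trial in range(300):
    reward = float(rng.random() < 0.7)   # cue pays off on 70% of trials
    pe_pos = max(reward - value, 0.0)    # "better than expected" unit
    pe_neg = max(value - reward, 0.0)    # "worse than expected" unit
    value += alpha * (pe_pos - pe_neg)   # online value estimation

print(round(value, 2))  # converges near the 0.7 reward rate
```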

    Feedback information and the reward positivity

    The reward positivity is a component of the event-related brain potential (ERP) sensitive to neural mechanisms of reward processing. Multiple studies have demonstrated that reward positivity amplitude indexes a reward prediction error signal that is fundamental to theories of reinforcement learning. However, whether this ERP component is also sensitive to richer forms of performance information important for supervised learning is less clear. To investigate this question, we recorded the electroencephalogram from participants engaged in a time estimation task in which the type of error information conveyed by feedback stimuli was systematically varied across conditions. Consistent with our predictions, we found that reward positivity amplitude decreased as the information content of the feedback increased, and that reward positivity amplitude was unrelated to trial-to-trial behavioral adjustments in task performance. By contrast, a series of exploratory analyses revealed frontal-central and posterior ERP components immediately following the reward positivity that did relate to these processes. Taken in the context of the wider literature, these results suggest that the reward positivity is produced by a neural mechanism that motivates task performance, whereas the later ERP components apply the feedback information according to principles of supervised learning.
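
    The contrast drawn here between reinforcement and supervised learning can be illustrated in a toy time-estimation setting: binary reward feedback forces trial-and-error search over candidate intervals, whereas feedback carrying the signed error supports direct correction. The discretization, learning rates, and target below are hypothetical, not the study's task parameters.

```python
import numpy as np

# Toy contrast between the two feedback regimes discussed above.
rng = np.random.default_rng(2)
target = 1.0                                 # target interval (assumed)

# Reinforcement feedback: only "rewarded or not" is conveyed, so the
# learner must search candidate intervals by trial and error.
candidates = np.linspace(0.5, 1.5, 11)
q = np.zeros(len(candidates))
for trial in range(2000):
    a = int(rng.integers(len(q))) if rng.random() < 0.2 else int(np.argmax(q))
    reward = float(abs(candidates[a] - target) < 0.1)   # binary feedback
    q[a] += 0.1 * (reward - q[a])

# Supervised feedback: the signed error itself is conveyed, so the
# estimate can be corrected directly and converges within a few trials.
est = 0.5
for trial in range(20):
    est += 0.5 * (target - est)

print(candidates[int(np.argmax(q))], round(est, 2))
```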

    When the outcome is different than expected: subjective expectancy shapes reward prediction error at the FRN level

    Converging evidence in human electrophysiology suggests that evaluative feedback provided during performance monitoring (PM) elicits two distinct, successive ERP components: the feedback-related negativity (FRN) and the P3b. Whereas the FRN has previously been linked to the reward prediction error (RPE), the P3b has been conceived as reflecting motivational or attentional processes following the early processing of the RPE, including action value updating. However, it remains unclear whether these two consecutive neurophysiological effects depend on the direction of the unexpectedness (better- or worse-than-expected outcomes; signed RPE) or only on the degree of unexpectedness irrespective of direction (unsigned RPE). To address this question, we devised an experiment in which we manipulated the objective reward probability and the subjective reward expectancy (via instructions) in a factorial within-subject design and explored amplitude changes of the FRN and the P3b. A 64-channel EEG was recorded while 32 participants performed a speeded go/no-go task in which evaluative feedback based on the reward probability either violated expectancy (thereby creating an RPE) or did not. This violation corresponded either to better- or worse-than-expected events. Results showed that the FRN was larger when an RPE occurred than when it did not, irrespective of the direction of this violation. Interestingly, in these two conditions, action value was updated for positive feedback selectively, as shown by the P3b amplitude. These results support a two-stage model of PM in which the unsigned RPE is first rapidly detected (FRN level) before the positive feedback's value is updated selectively (P3b effect).
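
    The signed/unsigned distinction at issue reduces to two lines of arithmetic; the numbers below are hypothetical.

```python
# One trial's worth of the two candidate quantities contrasted above.
expected, reward = 0.8, 0.0      # illustrative: a strongly expected win fails

signed_rpe = reward - expected   # direction matters: -0.8, worse than expected
unsigned_rpe = abs(signed_rpe)   # degree of surprise only: 0.8

# On the two-stage account supported here, the FRN tracks unsigned_rpe,
# while the later P3b stage updates value selectively after positive feedback.
print(signed_rpe, unsigned_rpe)
```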

    Predictive learning, prediction errors, and attention: evidence from event-related potentials and eye tracking

    Prediction error (“surprise”) affects the rate of learning: we learn more rapidly about cues for which we initially make incorrect predictions than about cues for which our initial predictions are correct. The current studies employ electrophysiological measures to reveal early attentional differentiation of events that differ in their previous involvement in errors of predictive judgment. Error-related events attract more attention, as evidenced by features of event-related scalp potentials previously implicated in selective visual attention (selection negativity, augmented anterior N1). The earliest differences detected occurred around 120 ms after stimulus onset, and distributed source localization (LORETA) indicated that the inferior temporal regions were one source of these earliest differences. In addition, stimuli associated with the production of prediction errors showed longer dwell times in an eye-tracking procedure. Our data support the view that early attentional processes play a role in human associative learning.
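
    The link between prediction error and attention described here is often formalized with a Pearce-Hall-style associability rule, in which surprise on recent trials raises the learning rate for a cue. A minimal sketch follows; the update rule and constants are generic textbook assumptions, not a model fitted in these studies.

```python
import numpy as np

# Pearce-Hall-style sketch of the idea above: cues that recently produced
# larger prediction errors gain associability (attract more attention) and
# are therefore learned about faster. All constants are illustrative.

rng = np.random.default_rng(3)
gamma, kappa = 0.3, 0.5          # associability decay and salience (assumed)
value, assoc = 0.0, 1.0          # cue value and its associability

for trial in range(300):
    reward = float(rng.random() < 0.6)
    delta = reward - value                            # prediction error
    value += kappa * assoc * delta                    # surprise-weighted update
    assoc = gamma * abs(delta) + (1 - gamma) * assoc  # attention tracks |error|

print(round(value, 2), round(assoc, 2))
```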

    Spatiotemporal neural characterization of prediction error valence and surprise during reward learning in humans

    Reward learning depends on accurate reward associations with potential choices. These associations can be attained with reinforcement learning mechanisms that use a reward prediction error (RPE) signal (the difference between actual and expected rewards) to update future reward expectations. Despite an extensive body of literature on the influence of the RPE on learning, little has been done to investigate the potentially separate contributions of RPE valence (positive or negative) and surprise (absolute degree of deviation from expectations). Here, we coupled single-trial electroencephalography with simultaneously acquired fMRI during a probabilistic reversal-learning task to offer evidence of temporally overlapping but largely distinct spatial representations of RPE valence and surprise. Electrophysiological variability in RPE valence correlated with activity in regions of the human reward network promoting approach or avoidance learning. Electrophysiological variability in RPE surprise correlated primarily with activity in regions of the human attentional network controlling the speed of learning. Crucially, despite the largely separate spatial extent of these representations, our EEG-informed fMRI approach uniquely revealed a linear superposition of the two RPE components in a smaller network encompassing visuo-mnemonic and reward areas. Activity in this network was further predictive of stimulus value updating, indicating a comparable contribution of both signals to reward learning.
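
    As a rough sketch of how the two components fall out of a standard model in a reversal-learning task like this one, the following splits each trial's RPE into its sign (valence) and its absolute magnitude (surprise); the task structure and parameters are assumed for illustration.

```python
import numpy as np

# Probabilistic reversal-learning loop that separates each trial's RPE into
# the two components examined above: valence (sign) and surprise (absolute
# magnitude). Task structure and parameters are illustrative assumptions.

rng = np.random.default_rng(4)
alpha = 0.15
q = np.zeros(2)
p_reward = np.array([0.8, 0.2])

valence, surprise = [], []
for trial in range(400):
    if trial == 200:
        p_reward = p_reward[::-1]                # contingencies reverse
    greedy = int(q[1] > q[0])
    action = greedy if rng.random() > 0.1 else 1 - greedy  # ε-greedy choice
    reward = float(rng.random() < p_reward[action])
    delta = reward - q[action]
    q[action] += alpha * delta
    valence.append(np.sign(delta))               # approach/avoidance component
    surprise.append(abs(delta))                  # learning-speed component

print(round(float(np.mean(surprise[180:200])), 2),
      round(float(np.mean(surprise[200:220])), 2))  # surprise jumps at reversal
```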

    The influence of the noradrenergic system on optimal control of neural plasticity

    Decision making under uncertainty is challenging for any autonomous agent. The challenge increases when the environment’s stochastic properties change over time, i.e., when the environment is volatile. In order to adapt efficiently to volatile environments, agents must rely primarily on recent outcomes to change their decision strategies quickly; in other words, they need to increase their knowledge plasticity. In stable environments, by contrast, knowledge stability must be preferred so as to preserve useful information against noise. Here we propose that, in the mammalian brain, the locus coeruleus (LC) is one of the nuclei involved in volatility estimation and in the subsequent control of neural plasticity. During a reinforcement learning task, LC activation, measured by means of pupil diameter, coded for both environmental volatility and learning rate. We hypothesize that the LC could be responsible, through noradrenergic modulation, for adaptations that optimize decision making in volatile environments. We also propose a computational model of the interaction between the anterior cingulate cortex (ACC) and the LC for volatility estimation.
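
    The core proposal, a volatility estimate gating the learning rate, can be sketched with a deliberately simple surprise-tracking proxy; this stand-in estimator and its constants are assumptions for illustration, not the authors' ACC-LC model.

```python
import numpy as np

# Sketch of volatility-gated plasticity as described above: a running
# average of unsigned prediction errors serves as a crude volatility
# estimate and scales the learning rate, keeping knowledge plastic in
# volatile blocks and stable otherwise. All constants are illustrative.

rng = np.random.default_rng(5)
value, vol = 0.0, 0.1
p = 0.8
for trial in range(600):
    if trial > 0 and trial % 150 == 0:
        p = 1 - p                              # volatile world: p flips
    reward = float(rng.random() < p)
    delta = reward - value
    vol = 0.95 * vol + 0.05 * abs(delta)       # surprise-tracking proxy
    lr = min(1.0, 0.05 + vol)                  # more volatility -> faster learning
    value += lr * delta

print(round(value, 2), round(vol, 2))
```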

    Prefrontal signals precede striatal signals for biased credit assignment in motivational learning biases

    Actions are biased by the outcomes they can produce: humans are more likely to act under reward prospect, but hold back under punishment prospect. Such motivational biases derive not only from biased response selection, but also from biased learning: humans tend to attribute rewards to their own actions, but are reluctant to attribute punishments to having held back. The neural origin of these biases is unclear. Specifically, it remains open whether motivational biases arise primarily from the architecture of subcortical regions or also reflect cortical influences, the latter typically being associated with increased behavioral flexibility and control beyond stereotyped behaviors. Simultaneous EEG-fMRI allowed us to track which regions encoded biased prediction errors, and in which order. Biased prediction errors occurred in cortical regions (dorsal anterior and posterior cingulate cortices) before subcortical regions (striatum). These results highlight that biased learning is not a mere feature of the basal ganglia, but arises through prefrontal cortical contributions, revealing motivational biases to be a potentially flexible, sophisticated mechanism.
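
    The learning bias described here, crediting rewards to action more readily than punishments to inaction, can be written as valence- and action-dependent learning rates. The asymmetry factors below are illustrative assumptions, not fitted values.

```python
import numpy as np

# Sketch of biased credit assignment as described above: prediction errors
# are weighted by whether the agent acted and by outcome valence.

rng = np.random.default_rng(6)
q = {"go": 0.0, "nogo": 0.0}
base_lr = 0.1

def biased_lr(action, delta):
    # Rewards following action are over-credited; punishments following
    # withheld action are under-credited. The 1.5/0.5 factors are assumed.
    if action == "go" and delta > 0:
        return base_lr * 1.5
    if action == "nogo" and delta < 0:
        return base_lr * 0.5
    return base_lr

for trial in range(2000):
    action = "go" if rng.random() < 0.5 else "nogo"
    outcome = 1.0 if rng.random() < 0.5 else -1.0   # symmetric outcomes
    delta = outcome - q[action]
    q[action] += biased_lr(action, delta) * delta

# Despite symmetric outcomes, both values settle above their true mean of 0:
# "go" via inflated reward credit, "nogo" via muted punishment credit.
print({k: round(v, 2) for k, v in q.items()})
```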

    Dissociable feedback valence effects on frontal midline theta during reward gain versus threat avoidance learning

    While frontal midline theta (FMθ) has been associated with threat processing, with cognitive control in the context of anxiety, and with reinforcement learning, most reinforcement learning studies on FMθ have used reward rather than threat-related stimuli as reinforcers. Accordingly, the role of FMθ in threat-related reinforcement learning is largely unknown. Here, n = 23 human participants underwent one reward-based and one punishment-based reversal learning task, which differed only in the kind of reinforcer that feedback was tied to (monetary gain vs. loud noise burst, respectively). In addition to single-trial EEG, we assessed single-trial feedback expectations based on both a reinforcement learning computational model and trial-by-trial subjective feedback expectation ratings. While participants' performance and feedback expectations were comparable between the reward and punishment tasks, FMθ was more reliably amplified by negative relative to positive feedback in the reward task than in the punishment task. Regressions with feedback valence, computationally derived expectations, and self-reported expectations as predictors and FMθ as criterion further revealed that trial-by-trial variations in FMθ specifically relate to reward-related feedback valence and not to threat-related feedback or to violated expectations/prediction errors. These findings suggest that FMθ, as measured in reinforcement learning tasks, may be less sensitive to the processing of events with direct relevance for fear and anxiety.
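
    The regression logic reported here, trial-wise FMθ regressed on feedback valence plus model-derived and self-reported expectations, can be sketched with ordinary least squares on simulated data; every value below is synthetic.

```python
import numpy as np

# Sketch of the single-trial regression described above: FMθ regressed on
# feedback valence, a model-derived expectation, and a self-reported
# expectation. All data are simulated for illustration.

rng = np.random.default_rng(7)
n = 400
valence = rng.choice([-1.0, 1.0], size=n)        # negative vs. positive feedback
model_exp = rng.random(n)                        # RL-model feedback expectation
rated_exp = model_exp + rng.normal(0, 0.2, n)    # noisy self-report of the same

# Simulate theta that, as reported above, tracks valence but not expectancy.
theta = -0.8 * valence + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), valence, model_exp, rated_exp])
betas, *_ = np.linalg.lstsq(X, theta, rcond=None)
print(np.round(betas, 2))  # intercept, valence, model-based, self-report weights
```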