
    Best fit parameters for the intrinsically enhanced model in G12.

    Computational modeling scripts used to produce this figure are available at https://osf.io/sfnc9/. (TIF)

    Confusion matrices illustrating model recovery across data sets.

    The upper row in each subplot shows model frequencies; the lower row shows exceedance probabilities. Model name abbreviations: IE = intrinsically enhanced, HAC = hybrid actor-critic, RL = unbiased, WSLS = win-stay/lose-shift. Data and analysis scripts underlying this figure are available at https://osf.io/sfnc9/. (TIF)

    Best fit parameters for the intrinsically enhanced model in B21.

    Note that αu could only be fit in experiments with counterfactual feedback, and ωtest could only be fit in experiments with counterfactual feedback at testing; in the remaining experiments, these parameters were left at the initial prior. Abbreviations: the first letter in each triplet indicates whether feedback was partial (P) or complete (C) during learning; the second letter indicates whether feedback in the test phase was partial (P), complete (C), or not provided (N); the third letter indicates whether the experimental design was interleaved (I) or blocked (B). Error bars indicate the SEM. (B) Model responsibilities overall and across experimental conditions. Computational modeling scripts used to produce this figure are available at https://osf.io/sfnc9/. (TIF)
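
    For readers parsing the triplet condition labels, here is a minimal, purely illustrative decoder of the naming scheme described above; it is not part of the analysis scripts at https://osf.io/sfnc9/.

    # Illustrative decoder for the triplet condition labels (e.g., "CPI");
    # the mappings follow the abbreviation key in the caption above.
    LEARNING_FEEDBACK = {"P": "partial", "C": "complete"}
    TEST_FEEDBACK = {"P": "partial", "C": "complete", "N": "not provided"}
    DESIGN = {"I": "interleaved", "B": "blocked"}

    def decode_condition(label: str) -> dict:
        learning, test, design = label.upper()
        return {
            "learning_feedback": LEARNING_FEEDBACK[learning],
            "test_feedback": TEST_FEEDBACK[test],
            "design": DESIGN[design],
        }

    print(decode_condition("CPI"))
    # {'learning_feedback': 'complete', 'test_feedback': 'partial', 'design': 'interleaved'}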

    Control study (M22B) description.

    When observing the outcome of a choice, people are sensitive to the choice’s context, such that the experienced value of an option depends on the alternatives: getting $1 when the possibilities were 0 or 1 feels much better than when the possibilities were 1 or 10. Context-sensitive valuation has been documented within reinforcement learning (RL) tasks, in which values are learned from experience through trial and error. Range adaptation, wherein options are rescaled according to the range of values yielded by available options, has been proposed to account for this phenomenon. However, we propose that other mechanisms—reflecting a different theoretical viewpoint—may also explain this phenomenon. Specifically, we theorize that internally defined goals play a crucial role in shaping the subjective value attributed to any given option. Motivated by this theory, we develop a new “intrinsically enhanced” RL model, which combines extrinsically provided rewards with internally generated signals of goal achievement as a teaching signal. Across 7 different studies (including previously published data sets as well as a novel, preregistered experiment with replication and control studies), we show that the intrinsically enhanced model can explain context-sensitive valuation as well as, or better than, range adaptation. Our findings indicate a more prominent role of intrinsic, goal-dependent rewards than previously recognized within formal models of human RL. By integrating internally generated signals of reward, standard RL theories should better account for human behavior, including context-sensitive valuation and beyond.
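
    To make the core idea concrete, here is a minimal sketch of the kind of update rule the abstract describes: the teaching signal combines the extrinsic reward with an internally generated, binary goal-achievement signal. The function name, parameter values, and the specific mixing rule (a weighted average governed by ω) are illustrative assumptions, not the published model equations.

    import numpy as np

    def intrinsically_enhanced_update(q, option, r_extrinsic, goal_achieved,
                                      alpha=0.3, omega=0.5):
        """One value update where the teaching signal mixes the extrinsic
        reward with a binary goal-achievement signal (weighted by omega).
        Names, parameters, and the mixing rule are illustrative assumptions."""
        teaching_signal = (1 - omega) * r_extrinsic + omega * float(goal_achieved)
        q[option] += alpha * (teaching_signal - q[option])
        return q

    # Two options yield the same extrinsic reward (0.5), but only choosing the
    # first counts as achieving the current goal; it acquires the higher value.
    q = np.zeros(2)
    q = intrinsically_enhanced_update(q, option=0, r_extrinsic=0.5, goal_achieved=True)
    q = intrinsically_enhanced_update(q, option=1, r_extrinsic=0.5, goal_achieved=False)
    print(q)  # [0.225, 0.075]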

    Best fit parameters for the intrinsically enhanced model in the control study for M22 (M22B).

    Computational modeling scripts used to produce this figure are available at https://osf.io/sfnc9/. (TIF)

    Learning phase performance in M22.

    Participants learned to discriminate the correct option across bandit combinations in the learning phase. Data and analysis scripts underlying this figure are available at https://osf.io/sfnc9/. (TIF)

    Replication study (M22R) description.


    Summary information for each of the data sets used for data analysis and/or modeling.


    Predictions made by the unbiased, intrinsically enhanced, and range^z models.

    (A) The unbiased model correctly predicts lower choice rates for high-value and mid-value options in wide than in narrow contexts (upper gray box), but incorrectly predicts similar choice rates for the option with value 50 regardless of context (lower gray box). (B) The intrinsically enhanced model captures all behavioral patterns found in participants’ data [13]. It correctly predicts lower choice rates for high-value and mid-value options in wide than in narrow contexts (upper purple box) and correctly predicts higher choice rates for the option with value 50 in the trinary narrow context than in the trinary wide context (lower purple box). It also predicts that choice rates for mid-value options will be closer to those of low-value options than to those of high-value options (lower purple box). (C) The range^z model correctly predicts higher choice rates for the option with value 50 in the trinary narrow context than in the trinary wide context (lower teal box), but incorrectly predicts similar choice rates for high-value options in the narrow and wide trinary contexts (upper teal box). Simulation scripts used to produce this figure are available at https://osf.io/sfnc9/.

    Fig 6. Task structure and experimental design.

    Left: Task structure. Participants viewed the available options, indicated their choice with a mouse click, and then viewed each option’s outcome, with their chosen option highlighted. Right: Experimental design. Both context 1 (top row) and context 2 (bottom row) contained 3 options, each having a mean value of 14, 50, or 86. The contexts differed in the frequency with which different combinations of within-context stimuli were presented during the learning phase (gray shaded area). In particular, while option M1 (EV = 50) was presented 20 times with option L1 (EV = 14), option M2 (EV = 50) was presented 20 times with option H2 (EV = 86). Intuitively, this made M1 an intrinsically rewarding outcome more frequently than M2. The 2 contexts were otherwise matched.
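
    As a rough illustration of why this frequency manipulation matters, below is a toy simulation of the two pairings described above, assuming a teaching signal that adds an intrinsic bonus when the chosen option is the better of the presented pair. Outcomes are simplified to their mean values, and the parameter values, goal definition, and mixing rule are assumptions for illustration rather than the fitted model.

    import numpy as np

    rng = np.random.default_rng(0)

    def learn_pairing(pair, alpha=0.3, omega=0.5, beta=0.05, n_reps=20):
        """Simulate repeated binary choices between two options, updating the
        chosen option's value with a teaching signal that adds an intrinsic
        bonus when the chosen option is the better of the pair. All settings
        here are illustrative assumptions, not the fitted model."""
        (name_a, ev_a), (name_b, ev_b) = pair
        q = {name_a: 0.0, name_b: 0.0}
        for _ in range(n_reps):
            # softmax choice between the two presented options
            p_a = 1.0 / (1.0 + np.exp(-beta * (q[name_a] - q[name_b])))
            if rng.random() < p_a:
                name, ev, goal = name_a, ev_a, ev_a >= ev_b
            else:
                name, ev, goal = name_b, ev_b, ev_b >= ev_a
            teaching = (1 - omega) * ev + omega * 100.0 * float(goal)
            q[name] += alpha * (teaching - q[name])
        return q

    # M1 is paired with the lower-value L1, so choosing it usually achieves the
    # goal; M2 is paired with the higher-value H2, so choosing it rarely does.
    q1 = learn_pairing((("M1", 50), ("L1", 14)))
    q2 = learn_pairing((("M2", 50), ("H2", 86)))
    print(q1["M1"], q2["M2"])  # M1 acquires a higher learned value than M2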