
    TD-based Markov prediction with perfect shaping.

    (A) The ideal shaping function ϕ (blue circles) is 1 from acquisition of the food (at s1) until the reward arrives (red cross at sT). (B) Evolution of the value under TD learning for the case T = 10. Upper plot: average over 1,000 simulations; lower plot: a single simulation. (C) Evolution of the TD prediction error δt over the same trials. Upper plot: average over 1,000 simulations; lower plots: a single simulation showing δ0 for a transition to s = s1 (above) or to s = s* (below). Here, α = 0.1. TD, temporal difference.
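
    For concreteness, the following minimal Python sketch (not the authors' code) runs TD(0) with a potential-based shaping bonus on the chain described above; γ = 1, the trial count, and the state indexing are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, ALPHA, P_FOOD = 10, 0.1, 0.3

# States: 0 = initial cue, 1..T = food chain, T + 1 = s* (no food).
# Ideal shaping potential: 1 from s1 until the reward arrives at sT.
phi = np.zeros(T + 2)
phi[1:T] = 1.0
V = np.zeros(T + 2)

for trial in range(200):
    s = 0
    s_next = 1 if rng.random() < P_FOOD else T + 1   # food with prob 0.3
    while True:
        r = 1.0 if s_next == T else 0.0              # digestive reward at sT
        terminal = s_next in (T, T + 1)
        v_next = 0.0 if terminal else V[s_next]
        # TD error including the shaping bonus phi(s') - phi(s)
        delta = r + phi[s_next] - phi[s] + v_next - V[s]
        V[s] += ALPHA * delta
        if terminal:
            break
        s, s_next = s_next, s_next + 1
```

    With an ideal potential, the learned values converge towards V(s) − ϕ(s), so the persistent prediction error moves to the stochastic transition into s1, as in panel C.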

    TD-based Markov prediction.

    (A) A simple Markov prediction problem in which a tasty morsel is provided at t = 1 (s = s1) with probability p = 0.3, leading to a digestive reward of rT = 1 at time T. (B) Evolution of the value under TD learning for the case T = 10. Upper plot: average over 1,000 simulations (here, and in later figures, we label state si by just its index i); lower plot: a single simulation. (C) Evolution of the TD prediction error δt over the same trials. Upper plot: average over 1,000 simulations; lower plots: a single simulation showing δ0 for a transition to s = s1 (above) or to s = s* (below). Here, α = 0.1. TD, temporal difference.
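
    A matching sketch of plain TD(0) on this chain (again illustrative; γ = 1 and the trial count are assumptions), logging the first-step prediction error δ0 separately by destination, as in panel C:

```python
import numpy as np

rng = np.random.default_rng(1)
T, ALPHA, P_FOOD = 10, 0.1, 0.3

V = np.zeros(T + 2)          # states: 0 = cue, 1..T = chain, T + 1 = s*
d0_s1, d0_star = [], []      # first-step TD errors, split by destination

for trial in range(200):
    s = 0
    s_next = 1 if rng.random() < P_FOOD else T + 1
    while True:
        r = 1.0 if s_next == T else 0.0
        terminal = s_next in (T, T + 1)
        v_next = 0.0 if terminal else V[s_next]
        delta = r + v_next - V[s]
        V[s] += ALPHA * delta
        if s == 0:           # log delta_0 by where the first step went
            (d0_s1 if s_next == 1 else d0_star).append(delta)
        if terminal:
            break
        s, s_next = s_next, s_next + 1
```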

    TD-based Markov prediction with a partial shaping function.

    (A) A suboptimal shaping function ϕ that decreases linearly from 1 to 0 after acquisition of the food (at s1), with the reward spread over five time steps (red crosses; note the extension of the state space to T = 15). (B) Evolution of the value under TD learning for this case. Upper plot: average over 1,000 simulations; lower plot: a single simulation. (C) Evolution of the TD prediction error δt over the same trials. Upper plot: average over 1,000 simulations; lower plots: a single simulation showing δ0 for a transition to s = s1 (above) or to s = s* (below). Here, α = 0.1. TD, temporal difference.
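
    Relative to the sketches above, only the shaping function and reward schedule change; a hypothetical encoding, assuming the unit reward is split evenly over the five final steps:

```python
import numpy as np

T = 15
phi = np.zeros(T + 2)
phi[1:T] = np.linspace(1.0, 0.0, T - 1)  # linear decay from 1 to 0 after s1
r = np.zeros(T + 2)
r[T - 4:T + 1] = 1.0 / 5                 # reward spread over five time steps
```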

    Simulation of flavour–nutrient conditioning.

    The lines show the evolution of the learned value of three different flavours with orthogonalized intrinsic sweetness and nutritive values. The red flavour is not sweet but is highly nutritious, and so lacks shaping (as in Fig 1). The green flavour is very sweet (with a shaping function reflecting this) but is not nutritious. The blue flavour is somewhat sweet and somewhat nutritious, and is also associated with a perfect shaping function (as in Fig 2). Here, transitions are deterministic and α = 0.4.
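
    A sketch of this simulation under stated assumptions: the same deterministic T = 10 chain as above with γ = 1, shaping height set by sweetness, and terminal reward set by nutritive value; the specific sweetness and nutrition numbers are illustrative.

```python
import numpy as np

T, ALPHA = 10, 0.4
flavours = {              # (sweetness -> shaping height, nutrition -> reward)
    "red":   (0.0, 1.0),  # not sweet, highly nutritious: no shaping
    "green": (1.0, 0.0),  # very sweet, not nutritious
    "blue":  (0.5, 0.5),  # somewhat sweet, somewhat nutritious
}

for name, (sweet, nutr) in flavours.items():
    phi = np.zeros(T + 1)
    phi[1:T] = sweet      # shaping from s1 until the outcome at sT
    V = np.zeros(T + 1)
    for trial in range(100):
        for s in range(T):                 # deterministic walk s -> s + 1
            r = nutr if s + 1 == T else 0.0
            v_next = 0.0 if s + 1 == T else V[s + 1]
            V[s] += ALPHA * (r + phi[s + 1] - phi[s] + v_next - V[s])
    print(name, round(V[0], 3))            # learned value of the flavour cue
```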

    Behavioural measures vs questionnaire scores.


    Accuracy for outcome prediction: Difficulty only vs difficulty and skill models.

    Colours correspond to different wd values in the skill-and-difficulty models; note that wd = 0 (purple) is equivalent to a model with skill only, and black is used for the model with difficulty only. Top: overall accuracy; each dot represents one subject. Bottom: accuracy per difficulty level (mean ± s.e.m. across subjects); difficulty was z-scored for each subject and discretised into 10 quantiles.
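
    The preprocessing behind the bottom panel can be reproduced in a few lines of pandas; the data layout and column names below are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({                         # hypothetical data layout
    "subject": np.repeat(np.arange(20), 50),
    "difficulty": rng.normal(size=1000),
    "correct": rng.integers(0, 2, size=1000),
})

# z-score difficulty within each subject, then bin into 10 quantiles
df["difficulty_z"] = df.groupby("subject")["difficulty"].transform(
    lambda x: (x - x.mean()) / x.std()
)
df["decile"] = df.groupby("subject")["difficulty_z"].transform(
    lambda x: pd.qcut(x, 10, labels=False)
)

# accuracy per difficulty level: mean and s.e.m. across subjects
per_subject = df.groupby(["subject", "decile"])["correct"].mean().reset_index()
summary = per_subject.groupby("decile")["correct"].agg(["mean", "sem"])
```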

    Skill estimates and attributions overview.

    A: Evolution of skill estimates across trials, mean ± s.e.m. across participants. B: Evolution of skill estimates for individual participants, chosen to illustrate variability. C: Attribution proportions, mean ± s.e.m. across participants, overall and conditioned on outcomes. D: Time series of attributions for individual participants, chosen to illustrate variability; these are not the same participants as in B.

    Staircase procedure.


    Two conceptions of a cued lever press.

    (A) A latency τ with which to press the lever is selected in an initial cued state ('1'), leading to completion of the press τ seconds later ('2'). (B) A latency τ with which to press the lever is selected in an initial cued state ('1'), leading to a state of preparedness to press τ seconds later ('2'). Completion of the press ('3') occurs only after a subsequent interval τ_post. After a further inter-trial interval τ_I, the process begins anew.

    Constant and variable hazard functions.

    (A) Two different gamma densities of the time T at which the critic receives notification of an impending lever press. (B) Corresponding hazard functions $h(\hat{t}) = \lim_{\Delta\hat{t} \to 0} P(T \le \hat{t} + \Delta\hat{t} \mid T > \hat{t}) / \Delta\hat{t}$. Note that the hazard function is constant in the $G(1,1)$ case, but increases with time in the $G(2,1)$ case.
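
    Numerically, each hazard curve is the density divided by the survival function, $h(t) = f(t)/(1 - F(t))$; a minimal scipy sketch (time grid and unit scale assumed):

```python
import numpy as np
from scipy.stats import gamma

t = np.linspace(0.01, 5.0, 200)
for shape in (1.0, 2.0):                  # G(1, 1) and G(2, 1), unit scale
    f = gamma.pdf(t, a=shape, scale=1.0)  # density
    S = gamma.sf(t, a=shape, scale=1.0)   # survival function P(T > t)
    h = f / S                             # hazard function
    # shape = 1 (exponential): h is constant at 1
    # shape = 2: h = t / (1 + t), which increases with time
```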