TD-based Markov prediction with perfect shaping.

Abstract

(A) The ideal shaping function ϕ (blue circles) is 1 from acquisition of the food (at s1) until the reward arrives (red cross at sT). (B) Evolution of the value when TD learning is applied to the case T = 10. Upper plot: average over 1,000 simulations; lower plot: single simulation. (C) Evolution of the TD prediction error δt over the same trials. Upper plot: average over 1,000 simulations; lower plots: single simulation showing δ0 for a transition to s = s1 (above) or to s = s* (below). Here, α = 0.1. TD, temporal difference.
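
The setting in the caption can be illustrated with a minimal sketch, which is not the authors' simulation code. It assumes an undiscounted chain s1 ... sT with a unit reward on arrival at sT, a start state s0 that moves to s1 with a hypothetical probability p_food and otherwise to an unrewarded terminal state s*, and potential-based shaping in which ϕ(s') - ϕ(s) is added to the reward. The function name, state indexing, p_food, n_trials, and seed are all illustrative assumptions; only T = 10 and α = 0.1 come from the caption.

```python
import random


def run_td_with_shaping(T=10, alpha=0.1, p_food=0.5, n_trials=200, seed=0):
    """TD(0) with a potential-based shaping term on a simple chain.

    Assumed state indexing: 0 = s0, 1..T = s1..sT, T + 1 = s*.
    sT and s* are treated as terminal, so their values stay at 0.
    """
    rng = random.Random(seed)
    V = [0.0] * (T + 2)

    # Shaping function: 1 from s1 up to (not including) sT, 0 elsewhere.
    phi = [0.0] * (T + 2)
    for s in range(1, T):
        phi[s] = 1.0

    delta0_history = []  # (got_food, delta_0) for every trial
    for _ in range(n_trials):
        got_food = rng.random() < p_food
        nxt = 1 if got_food else T + 1

        # First transition, s0 -> s1 or s0 -> s*; no primary reward here.
        delta0 = (phi[nxt] - phi[0]) + V[nxt] - V[0]
        V[0] += alpha * delta0
        delta0_history.append((got_food, delta0))

        if got_food:
            # Walk down the chain s1 -> s2 -> ... -> sT.
            for s in range(1, T):
                s_next = s + 1
                r = 1.0 if s_next == T else 0.0  # reward arrives at sT
                delta = r + (phi[s_next] - phi[s]) + V[s_next] - V[s]
                V[s] += alpha * delta
    return V, delta0_history


if __name__ == "__main__":
    V, history = run_td_with_shaping()
    print("V(s0) after learning:", round(V[0], 3))
    print("V(s1..s_{T-1}):", V[1:10])
```

Under these assumptions the shaping term cancels the reward at sT exactly, so the prediction errors inside the chain are zero and the learned values for s1 to sT-1 never move away from their initial value of 0. All learning is concentrated at s0, where δ0 is positive on a transition to s1 and non-positive on a transition to s*.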
