TD-based Markov prediction.

(A) Simple Markov prediction problem with a tasty morsel provided at t = 1 (s = s_1) with probability p = 0.3, which leads to a digestive reward of r_T = 1 at time T. (B) Evolution of the value for the application of TD learning to the case T = 10. Upper plot: average over 1,000 simulations (here, and in later figures, we label state s_i by just its index i); lower plot: single simulation showing . (C) Evolution of the TD prediction error δ_t over the same trials. Upper plot: average over 1,000 simulations; lower plots: single simulation showing δ_0 for a transition to s = s_1 (above) or to s = s* (below). Here, α = 0.1. TD, temporal difference.
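
The task in panel (A) can be simulated with tabular TD(0) in a few lines. The sketch below is illustrative, not the authors' code, and rests on assumptions the caption does not spell out: no discounting; a trial starts in s_0 and moves to s_1 with probability p or to the no-morsel state s* otherwise; s_1 then steps deterministically to s_T; the reward r_T = 1 arrives on the transition into s_T; s_T and s* are terminal with value fixed at 0; and δ_t = r_{t+1} + V(s_{t+1}) − V(s_t). The names `STAR`, `run_trial` and `simulate` are hypothetical.

```python
import random

STAR = "*"  # hypothetical label for the no-morsel state s*

def run_trial(V, T, p, alpha, rng):
    """Run one trial of undiscounted tabular TD(0); return (s_t, s_{t+1}, delta_t) per step."""
    steps = []
    s = 0
    while s != T and s != STAR:          # s_T and s* are terminal (assumed), V stays 0 there
        if s == 0:
            s_next = 1 if rng.random() < p else STAR   # morsel at t = 1 with probability p
        else:
            s_next = s + 1                             # assumed deterministic chain s_1 -> ... -> s_T
        r = 1.0 if s_next == T else 0.0                # digestive reward r_T = 1 on reaching s_T
        delta = r + V[s_next] - V[s]                   # TD prediction error delta_t
        V[s] += alpha * delta                          # TD(0) value update with learning rate alpha
        steps.append((s, s_next, delta))
        s = s_next
    return steps

def simulate(T=10, p=0.3, alpha=0.1, n_trials=2000, seed=0):
    """Return learned values and a per-trial record of (got_morsel, delta_0, V(s_0) after the trial)."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in range(T + 1)}
    V[STAR] = 0.0
    record = []
    for _ in range(n_trials):
        steps = run_trial(V, T, p, alpha, rng)
        _, first_next, delta0 = steps[0]
        record.append((first_next == 1, delta0, V[0]))
    return V, record

V, record = simulate()
print(V[0], V[1])
```

Under these assumptions V(s_i) approaches 1 for 1 ≤ i < T, while V(s_0) keeps fluctuating around p = 0.3 because each trial's outcome is random. As a result, δ_0 settles near 1 − p after a transition to s_1 and near −p after a transition to s*, which matches the two cases distinguished in the lower plots of (C).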
