Temporal difference (TD) methods constitute a class of methods for learning
predictions in multi-step prediction problems, parameterized by a recency
factor lambda. Currently the most important application of these methods is to
temporal credit assignment in reinforcement learning. Well known reinforcement
learning algorithms, such as AHC or Q-learning, may be viewed as instances of
TD learning. This paper examines the issues of the efficient and general
implementation of TD(lambda) for arbitrary lambda, for use with reinforcement
learning algorithms optimizing the discounted sum of rewards. The traditional
approach, based on eligibility traces, is argued to suffer from both
inefficiency and lack of generality. The TTD (Truncated Temporal Differences)
procedure is proposed as an alternative, that indeed only approximates
TD(lambda), but requires very little computation per action and can be used
with arbitrary function representation methods. The idea from which it is
derived is fairly simple and not new, but probably unexplored so far.
Encouraging experimental results are presented, suggesting that using lambda
> 0 with the TTD procedure allows one to obtain a significant learning
speedup at essentially the same cost as usual TD(0) learning.Comment: See http://www.jair.org/ for any accompanying file