
    Temporal Difference Learning in Complex Domains

    PhD. This thesis adapts and improves on the methods of TD(λ) (Sutton 1988) that were successfully used for backgammon (Tesauro 1994) and applies them to other complex games that are less amenable to simple pattern-matching approaches. The games investigated are chess and shogi, both of which (unlike backgammon) require significant amounts of computational effort to be expended on search in order to achieve expert play. The improved methods are also tested in a non-game domain. In the chess domain, the adapted TD(λ) method is shown to successfully learn the relative values of the pieces, and matches using these learnt piece values indicate that they perform at least as well as piece values widely quoted in elementary chess books. The adapted TD(λ) method is also shown to work well in shogi, considered by many researchers to be the next challenge for computer game-playing, and for which there is no standardised set of piece values. An original method to automatically set and adjust the major control parameters used by TD(λ) is presented. The main performance advantage comes from the learning rate adjustment, which is based on a new concept called temporal coherence. Experiments in both chess and a random-walk domain show that the temporal coherence algorithm produces faster learning and more stable values than either human-chosen parameters or an earlier method for learning rate adjustment. The methods presented in this thesis allow programs to learn with as little input of external knowledge as possible, exploring the domain on their own rather than by being taught. Further experiments show that the method is capable of handling many hundreds of weights, and that it is not necessary to perform deep searches during the learning phase in order to learn effective weight
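To make the random-walk setting mentioned in the abstract concrete, here is a minimal sketch of plain tabular TD(λ) with accumulating eligibility traces on a small bounded random walk. It is not the thesis's adapted method and does not implement temporal coherence: the learning rate `alpha` and trace decay `lam` are fixed by hand, and the function name, the five-state layout, the rewards, and all parameter values are assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(episodes=3000, alpha=0.02, lam=0.8, n_states=5, seed=0):
    """Estimate state values on a bounded random walk with tabular TD(lambda).

    Assumed setup (not from the thesis): states 0..n_states-1, start in the
    middle, step left or right with equal probability, reward 1 for leaving
    on the right, reward 0 for leaving on the left, no discounting.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # neutral initial guess
    for _ in range(episodes):
        traces = [0.0] * n_states  # eligibility traces, reset each episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:  # left terminal
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:  # right terminal
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # Temporal difference between successive predictions.
            td_error = reward + next_value - values[state]
            traces[state] += 1.0  # accumulating trace for the visited state
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= lam  # decay credit for older states
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    estimates = td_lambda_random_walk()
    print([round(v, 2) for v in estimates])
```

For this walk the true value of state i is (i + 1) / 6, so the estimates can be checked against a known answer. The hand-chosen, fixed `alpha` is exactly the kind of control parameter the abstract says the thesis sets and adjusts automatically.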

    Heuristic search under time and cost bounds

    Intelligence is difficult to formally define, but one of its hallmarks is the ability to find a solution to a novel problem. Therefore it makes good sense that heuristic search is a foundational topic in artificial intelligence. In this context, search refers to the process of finding a solution to the problem by considering a large, possibly infinite, set of potential plans of action. Heuristic refers to a rule of thumb or a guiding, if not always accurate, principle. Heuristic search describes a family of techniques which consider members of the set of potential plans of action in turn, as determined by the heuristic, until a suitable solution to the problem is discovered. This work is concerned primarily with suboptimal heuristic search algorithms. These algorithms are not inherently flawed, but they are suboptimal in the sense that the plans that they return may be more expensive than a least cost, or optimal, plan for the problem. While suboptimal heuristic search algorithms may not return least cost solutions to the problem, they are often far faster than their optimal counterparts, making them more attractive for many applications. The thesis of this dissertation is that the performance of suboptimal search algorithms can be improved by taking advantage of information that, while widely available, has been overlooked. In particular, we will see how estimates of the length of a plan, estimates of plan cost that do not err on the side of caution, and measurements of the accuracy of our estimators can be used to improve the performance of suboptimal heuristic search algorithms.
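As an illustration of the trade-off described above, the following sketch uses weighted A*, a standard suboptimal heuristic search, on a small grid. It is not one of the dissertation's algorithms; the function name, the grid encoding (`#` for walls), the unit step costs, and the Manhattan-distance heuristic are all assumptions made for this example. Multiplying the heuristic by `weight` makes the cost estimate stop erring on the side of caution, which tends to reduce search effort at the price of possibly longer plans.

```python
import heapq


def weighted_astar(grid, start, goal, weight=1.0):
    """Search a 4-connected grid; return (plan_cost, nodes_expanded).

    With weight == 1 this is ordinary A*. With weight > 1 the returned cost
    may exceed the optimum, but by no more than a factor of `weight`,
    because the Manhattan heuristic never overestimates on this grid.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance to the goal (assumed heuristic).
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    frontier = [(weight * h(start), 0, start)]  # entries are (f, g, cell)
    expanded = 0
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best_g[cell]:
            continue  # stale entry; a cheaper path to this cell was found
        if cell == goal:
            return g, expanded
        expanded += 1
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1  # unit step cost
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    f = ng + weight * h((nr, nc))
                    heapq.heappush(frontier, (f, ng, (nr, nc)))
    return None, expanded  # goal unreachable


if __name__ == "__main__":
    walls = [
        ".....",
        ".###.",
        "...#.",
        ".#.#.",
        ".#...",
    ]
    for w in (1.0, 3.0):
        print(w, weighted_astar(walls, (0, 0), (4, 4), weight=w))
```

Comparing the two runs shows how plan cost and the number of expanded nodes change as the weight grows.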