and Temporal Difference Learning Algorithms in a Computerized Chess Program

Abstract

Computers have developed to the point where searching through a large set of data to find an optimum value can be done in a matter of seconds. However, there are still many domains (primarily in the realm of game theory) that are too complex to search through with brute force in a reasonable amount of time, so heuristic searches have been developed to reduce the run time of such searches. That being said, some domains require very complex heuristics in order to be effective. The purpose of this study was to see if a computer could improve (or learn) its heuristic as it ran more searches. The domain used was the game of chess, which has a very high complexity. The heuristic, or evaluation function, of a chess program needs to be able to accurately quantify the strength of a player's position for any instance of the board. Creating such an evaluation function by hand would be very difficult because so many factors go into determining the strength of a position: the relative value of pieces, the importance of controlling the center, the ability to attack the enemy's stronger pieces, and other considerations that chess masters spend entire lives trying to understand. This study looked to see if it was possible for a computer program to learn an effective evaluation function by playing many games, analyzing the results, and modifying its evaluation function accordingly. The process by which the program improved its evaluation function is called Temporal Difference learning.
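As a rough illustration of the kind of update the abstract describes (not the paper's actual implementation), the sketch below applies a simple TD(0)-style rule to a linear evaluation function over board features: after each game, every position's value is nudged toward the value of the position that followed it, with the final position targeting the game result. The feature count, learning rate, and names such as evaluate and td_update are illustrative assumptions.

```python
import numpy as np

NUM_FEATURES = 4   # e.g. material balance, center control, mobility, king safety (assumed)
ALPHA = 0.01       # learning rate (assumed value)

weights = np.zeros(NUM_FEATURES)

def evaluate(features: np.ndarray) -> float:
    """Score a position as a weighted sum of its features."""
    return float(np.dot(weights, features))

def td_update(game_positions: list, final_result: float) -> None:
    """Adjust weights after a game.

    game_positions: feature vectors for the positions reached, in order.
    final_result: +1 for a win, 0 for a draw, -1 for a loss.
    """
    global weights
    for t in range(len(game_positions)):
        current = game_positions[t]
        # Each position's target is the evaluation of the next position;
        # the last position's target is the actual game outcome.
        if t + 1 < len(game_positions):
            target = evaluate(game_positions[t + 1])
        else:
            target = final_result
        td_error = target - evaluate(current)
        # For a linear evaluator, the gradient with respect to the
        # weights is just the feature vector itself.
        weights += ALPHA * td_error * current
```

In this sketch the update is applied online after each game; more elaborate schemes such as TD(λ) weight earlier positions by an eligibility trace, but the core idea of shrinking the difference between successive evaluations is the same.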
