Temporal Difference Learning in Complex Domains
PhD. This thesis adapts and improves on the methods of TD(λ) (Sutton 1988) that were
successfully used for backgammon (Tesauro 1994) and applies them to other complex
games that are less amenable to simple pattern-matching approaches. The games
investigated are chess and shogi, both of which (unlike backgammon) require
significant amounts of computational effort to be expended on search in order to
achieve expert play. The improved methods are also tested in a non-game domain.
In the chess domain, the adapted TD(λ) method is shown to successfully learn the
relative values of the pieces, and matches using these learnt piece values indicate that
they perform at least as well as piece values widely quoted in elementary chess books.
The adapted TD(λ) method is also shown to work well in shogi, considered by many
researchers to be the next challenge for computer game-playing, and for which there
is no standardised set of piece values.
An original method to automatically set and adjust the major control parameters used
by TD(λ) is presented. The main performance advantage comes from the learning
rate adjustment, which is based on a new concept called temporal coherence.
Experiments in both chess and a random-walk domain show that the temporal
coherence algorithm produces both faster learning and more stable values than both
human-chosen parameters and an earlier method for learning rate adjustment.
The methods presented in this thesis allow programs to learn with as little input of
external knowledge as possible, exploring the domain on their own rather than by
being taught. Further experiments show that the method is capable of handling many
hundreds of weights, and that it is not necessary to perform deep searches during the
learning phase in order to learn effective weights.
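For readers unfamiliar with the baseline, the following is a minimal sketch of textbook TD(λ) with accumulating eligibility traces on a five-state random walk. It is not the adapted method or the temporal coherence algorithm from the thesis; the function name, the fixed learning rate, and all parameter values are assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(episodes=5000, n_states=5, alpha=0.02, lam=0.8, seed=0):
    """Estimate state values of a symmetric random walk with tabular TD(lambda).

    Illustrative assumptions: episodes start in the middle state, stepping off
    the left end gives reward 0, stepping off the right end gives reward 1,
    and there is no discounting.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset per episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # Temporal difference between successive predictions.
            td_error = reward + next_value - values[state]
            # Decay every trace by lambda, then mark the state just visited.
            traces = [lam * e for e in traces]
            traces[state] += 1.0
            # Every recently visited state shares in the update.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

Under these assumptions the true values are 1/6, 2/6, ..., 5/6, so the estimates should rise from left to right. With a fixed learning rate they keep fluctuating around those values, which is the kind of instability that learning-rate adjustment is meant to reduce.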
Temporal Difference Learning in Complex Domains
Submitted to the University of London for the Degree of Doctor of Philosophy in Computer Science
Advances in decision-theoretic AI : limited rationality and abstract search
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. Includes bibliographical references (p. 153-165). By Michael Patrick Frank. M.S.
A Mathematical Analysis of the Game of Santorini
Santorini is a two-player combinatorial board game. Santorini bears resemblance to the graph theory game of Geography, a game of moving and deleting vertices on a graph. We explore Santorini with game theory, complexity theory, and artificial intelligence. We present David Lichtenstein’s proof that Geography is PSPACE-hard and adapt the proof for generalized forms of Santorini. Last, we discuss the development of an AI built for a software implementation of Santorini and present a number of improvements to that AI.
The effect of simulation bias on action selection in Monte Carlo Tree Search
A dissertation submitted to the Faculty of Science, University of the Witwatersrand,
in fulfilment of the requirements for the degree of Master of Science. August 2016.
Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread
attention in recent years. It combines a traditional tree-search approach with Monte Carlo
simulations, using the outcome of these simulations (also known as playouts or rollouts) to evaluate
states in a look-ahead tree. That MCTS does not require an evaluation function makes it particularly
well-suited to the game of Go — seen by many to be chess’s successor as a grand challenge of
artificial intelligence — with MCTS-based agents recently able to achieve expert-level play on
19×19 boards. Furthermore, its domain-independent nature also makes it a focus in a variety of
other fields, such as Bayesian reinforcement learning and general game-playing.
Despite the vast amount of research into MCTS, the dynamics of the algorithm are not
yet fully understood. In particular, the effect of using knowledge-heavy or biased simulations in
MCTS remains unknown, with interesting results indicating that better-informed rollouts do
not necessarily result in stronger agents. This research provides support for the notion that MCTS
is well-suited to a class of domains possessing a smoothness property. In these domains, biased
rollouts are more likely to produce strong agents. Conversely, any error due to incorrect bias
is compounded in non-smooth domains, and in particular for low-variance simulations. This is
demonstrated empirically in a number of single-agent domains.
Search and planning under incomplete information : a study using Bridge card play
This thesis investigates problem-solving in domains featuring incomplete information and multiple agents with opposing goals. In particular, we describe Finesse, a system that forms plans for the problem of declarer play in the game of Bridge. We begin by examining the problem of search. We formalise a best defence model of incomplete information games in which equilibrium point strategies can be identified, and identify two specific problems that can affect algorithms in such domains. In Bridge, we show that the best defence model corresponds to the typical model analysed in expert texts, and examine search algorithms which overcome the problems we have identified. Next, we look at how planning algorithms can be made to cope with the difficulties of such domains. This calls for the development of new techniques for representing uncertainty and actions with disjunctive effects, for coping with an opposition, and for reasoning about compound actions. We tackle these problems with a..
A hybridisation technique for game playing using the upper confidence for trees algorithm with artificial neural networks
In the domain of strategic game playing, the use of statistical techniques such as the Upper Confidence for Trees (UCT) algorithm has become the norm as they offer many benefits over classical algorithms. These benefits include requiring no game-specific strategic knowledge and time-scalable performance. UCT does not incorporate any strategic information specific to the game considered, but instead uses repeated sampling to effectively brute-force search through the game tree or search space. The lack of game-specific knowledge in UCT is thus both a benefit and a strategic disadvantage. Pattern recognition techniques, specifically Neural Networks (NN), were identified as a means of addressing the lack of game-specific knowledge in UCT. Through a novel hybridisation technique which combines UCT and trained NNs for pruning, the UCT-NN algorithm was derived. The NN component of UCT-NN was trained using a UCT self-play scheme to generate game-specific knowledge without the need to construct and manage game databases for training purposes. The UCT-NN algorithm is outlined for pruning in the game of Go-Moku as a candidate case-study for this research. The UCT-NN algorithm contained three major parameters which emerged from the UCT algorithm, the use of NNs and the pruning schemes considered. Suitable methods for finding candidate values for these three parameters were outlined and applied to the game of Go-Moku on a 5 by 5 board. An empirical investigation of the playing performance of UCT-NN was conducted in comparison to UCT through three benchmarks. The benchmarks comprise a common randomly moving opponent, a common UCTmax player which is given a large amount of playing time, and a pair-wise tournament between UCT-NN and UCT. The results of the performance evaluation for 5 by 5 Go-Moku were promising, which prompted an evaluation of a larger 9 by 9 Go-Moku board.
The results of both evaluations indicate that the time allocated to the UCT-NN algorithm directly affects its performance when compared to UCT. The UCT-NN algorithm generally performs better than UCT in games with very limited time constraints in all benchmarks considered except when playing against a randomly moving player in 9 by 9 Go-Moku. In real-time and near-real-time Go-Moku games, UCT-NN provides statistically significant improvements compared to UCT. The findings of this research contribute to the realisation of applying game-specific knowledge to the UCT algorithm.
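As a rough illustration of the sampling rule at the heart of UCT, the sketch below applies the standard UCB1 formula to choose which child of a node to explore next. It is not the UCT-NN implementation from the dissertation: the function name, the `(visits, total_reward)` layout, and the `pruned` argument (a hypothetical stand-in for moves excluded by a trained network) are all assumptions.

```python
import math


def ucb1_select(child_stats, exploration=math.sqrt(2), pruned=frozenset()):
    """Return the index of the child to explore next under UCB1.

    child_stats: list of (visits, total_reward) pairs, one per child.
    pruned: indices to skip; a hypothetical stand-in for NN-based pruning.
    """
    candidates = [i for i in range(len(child_stats)) if i not in pruned]
    if not candidates:
        raise ValueError("every child was pruned")
    total_visits = sum(child_stats[i][0] for i in candidates)
    best_index, best_score = candidates[0], float("-inf")
    for i in candidates:
        visits, total_reward = child_stats[i]
        if visits == 0:
            return i  # always try an unvisited child first
        mean_reward = total_reward / visits
        # The bonus shrinks as a child is sampled more often.
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        if mean_reward + bonus > best_score:
            best_index, best_score = i, mean_reward + bonus
    return best_index


if __name__ == "__main__":
    stats = [(100, 90.0), (100, 10.0), (0, 0.0)]
    print(ucb1_select(stats))              # the unvisited child
    print(ucb1_select(stats, pruned={2}))  # the best of the remaining children
```

Pruning here simply removes children from consideration before the rule is applied, which concentrates a limited sampling budget on fewer moves.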
PSO-based coevolutionary Game Learning
Games have been investigated as computationally complex problems since the inception of artificial intelligence in the 1950s. Originally, search-based techniques were applied to create a competent (and sometimes even expert) game player. The search-based techniques, such as game trees, made use of human-defined knowledge to evaluate the current game state and recommend the best move to make next. Recent research has shown that neural networks can be evolved as game state evaluators, thereby removing the human intelligence factor completely. This study builds on the initial research that made use of evolutionary programming to evolve neural networks in the game learning domain. Particle Swarm Optimisation (PSO) is applied inside a coevolutionary training environment to evolve the weights of the neural network. The training technique is applied to both the zero sum and non-zero sum game domains, with specific application to Tic-Tac-Toe, Checkers and the Iterated Prisoner’s Dilemma (IPD). The influence of the various PSO parameters on playing performance is experimentally examined, and the overall performance of three different neighbourhood information sharing structures compared. A new coevolutionary scoring scheme and particle dispersement operator are defined, inspired by Formula One Grand Prix racing. Finally, the PSO is applied in three novel ways to evolve strategies for the IPD – the first application of its kind in the PSO field. The PSO-based coevolutionary learning technique described and examined in this study shows promise in evolving intelligent evaluators for the aforementioned games, and further study will be conducted to analyse its scalability to larger search spaces and games of varying complexity.
Dissertation (MSc)--University of Pretoria, 2005. Computer Science. Unrestricted.
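To make the PSO parameters mentioned above concrete, here is a minimal sketch of a global-best PSO minimising a toy function. It does not reproduce the coevolutionary training scheme, the scoring scheme, or the neighbourhood structures studied in the dissertation; the function name, the parameter values, and the sphere objective are assumptions for illustration only.

```python
import random


def pso_minimise(objective, dim, n_particles=20, iterations=200,
                 inertia=0.729, cognitive=1.494, social=1.494,
                 bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO: every particle is attracted to the swarm's best point."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_best_val.__getitem__)
    global_best, global_best_val = personal_best[g][:], personal_best_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia keeps the old direction; the other two terms pull
                # towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i], personal_best_val[i] = positions[i][:], value
                if value < global_best_val:
                    global_best, global_best_val = positions[i][:], value
    return global_best, global_best_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_val = pso_minimise(sphere, dim=2)
    print(best, best_val)
```

In a game-learning setting the position vector would hold the neural network weights and the objective would come from game outcomes rather than a fixed function. Other neighbourhood structures differ in which best-known position each particle is attracted to.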