Temporal Difference Learning in Complex Domains
PhD
This thesis adapts and improves on the methods of TD(λ) (Sutton 1988) that were
successfully used for backgammon (Tesauro 1994) and applies them to other complex
games that are less amenable to simple pattern-matching approaches. The games
investigated are chess and shogi, both of which (unlike backgammon) require
significant amounts of computational effort to be expended on search in order to
achieve expert play. The improved methods are also tested in a non-game domain.
In the chess domain, the adapted TD(λ) method is shown to successfully learn the
relative values of the pieces, and matches using these learnt piece values indicate that
they perform at least as well as piece values widely quoted in elementary chess books.
The adapted TD(λ) method is also shown to work well in shogi, considered by many
researchers to be the next challenge for computer game-playing, and for which there
is no standardised set of piece values.
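The update behind such a system can be sketched as a minimal linear TD(λ) step with an accumulating eligibility trace; the feature vector `phi` and the hyperparameter values below are illustrative, not the thesis's own:

```python
def td_lambda_step(w, trace, phi, reward, phi_next, alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) update for a linear evaluation v(s) = sum_i w[i] * phi[i](s)."""
    v = sum(wi * fi for wi, fi in zip(w, phi))
    v_next = sum(wi * fi for wi, fi in zip(w, phi_next))
    delta = reward + gamma * v_next - v                        # temporal-difference error
    trace = [gamma * lam * e + f for e, f in zip(trace, phi)]  # accumulate eligibility
    w = [wi + alpha * delta * e for wi, e in zip(w, trace)]    # credit earlier states
    return w, trace
```

In a chess setting, `phi` would hold material-count features and `w` the piece values being learnt; the trace spreads each position's prediction error back over the features active earlier in the game.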
An original method to automatically set and adjust the major control parameters used
by TD(λ) is presented. The main performance advantage comes from the learning
rate adjustment, which is based on a new concept called temporal coherence.
Experiments in both chess and a random-walk domain show that the temporal
coherence algorithm produces faster learning and more stable values than both
human-chosen parameters and an earlier method for learning-rate adjustment.
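The abstract does not spell out the adjustment rule. One published formulation of temporal coherence (assumed here, not quoted from the thesis) gives each weight a learning rate equal to the magnitude of its accumulated signed update divided by its accumulated absolute update:

```python
def tc_update(net, total, delta):
    """Fold one raw weight update 'delta' into the per-weight accumulators."""
    return net + delta, total + abs(delta)

def tc_rate(net, total):
    """Temporal-coherence learning rate for one weight: |net change| / total change.
    Stays near 1 while updates agree in sign; decays toward 0 once they oscillate,
    which is what stabilises the learnt values."""
    return 1.0 if total == 0.0 else abs(net) / total
```

A weight whose updates all push the same way keeps a full-size rate, while a weight whose updates cancel out has its rate driven toward zero automatically, removing the need to hand-tune a decay schedule.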
The methods presented in this thesis allow programs to learn with as little input of
external knowledge as possible, exploring the domain on their own rather than by
being taught. Further experiments show that the method is capable of handling many
hundreds of weights, and that it is not necessary to perform deep searches during the
learning phase in order to learn effective weights.
Improved Reinforcement Learning with Curriculum
Humans tend to learn complex abstract concepts faster if examples are
presented in a structured manner. For instance, when learning how to play a
board game, usually one of the first concepts learned is how the game ends,
i.e. the actions that lead to a terminal state (win, lose or draw). The
advantage of learning end-games first is that once the actions which lead to a
terminal state are understood, it becomes possible to incrementally learn the
consequences of actions that are further away from a terminal state; we call
this an end-game-first curriculum. Currently the state-of-the-art machine
learning player for general board games, AlphaZero by Google DeepMind, does not
employ a structured training curriculum; instead it learns from the entire game
at all times. By employing an end-game-first training curriculum to train an
AlphaZero inspired player, we empirically show that the rate of learning of an
artificial player can be improved during the early stages of training when
compared to a player not using a training curriculum.
Comment: Draft prior to submission to IEEE Trans on Games. Changed paper slightly.
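The end-game-first idea can be sketched as a sampling schedule over training positions. The `Position` record and its known distance to a terminal state are hypothetical; a real system would derive the distance from self-play games:

```python
import random
from dataclasses import dataclass

@dataclass
class Position:
    state: str
    moves_to_end: int  # distance to a terminal state (assumed to be known)

def sample_start_position(positions, progress, rng=random):
    """End-game-first sampling: early in training (progress near 0) draw only
    positions close to a terminal state; widen the window as progress -> 1."""
    max_depth = max(p.moves_to_end for p in positions)
    cutoff = max(1, round(progress * max_depth))  # allowed distance from the end
    candidates = [p for p in positions if p.moves_to_end <= cutoff]
    return rng.choice(candidates)
```

Because the value of a terminal state is given by the rules rather than estimated, learning targets near the end of the game are exact, and the widening window lets those accurate values propagate backwards through the game tree.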
Temporal Difference Learning in Complex Domains
Submitted to the University of London for the Degree of Doctor of Philosophy in Computer Science.
Learning to Search in Reinforcement Learning
In this thesis, we investigate the use of search based algorithms with deep neural
networks to tackle a wide range of problems ranging from board games to video
games and beyond. Drawing inspiration from AlphaGo, the first computer program
to achieve superhuman performance in the game of Go, we developed a new algorithm, AlphaZero. AlphaZero is a general reinforcement learning algorithm that
combines deep neural networks with a Monte Carlo Tree search for planning and
learning. Starting completely from scratch, without any prior human knowledge
beyond the basic rules of the game, AlphaZero managed to achieve superhuman
performance in Go, chess and shogi. Subsequently, building upon the success of AlphaZero, we investigated ways to extend our methods to problems in which the rules
are not known or cannot be hand-coded. This line of work led to the development
of MuZero, a model-based reinforcement learning agent that builds a deterministic
internal model of the world and uses it to construct plans in its imagination. We
applied our method to Go, chess, shogi and the classic Atari suite of video-games,
achieving superhuman performance. MuZero is the first RL algorithm to master
a variety of both canonical challenges for high performance planning and visually complex problems using the same principles. Finally, we describe Stochastic
MuZero, a general agent that extends the applicability of MuZero to highly stochastic environments. We show that our method achieves superhuman performance in
stochastic domains such as backgammon and the classic game of 2048 while matching the performance of MuZero in deterministic ones like Go.
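The tree-search side of these agents rests on the PUCT action-selection rule, which trades off the backed-up value of an action against a network-prior-weighted exploration bonus. A minimal sketch (the statistics layout and the `c_puct` value are illustrative):

```python
import math

def puct_select(stats, c_puct=1.5):
    """AlphaZero-style PUCT selection at one tree node.  'stats' maps each
    action to (q, n, prior): mean backed-up value, visit count, network prior."""
    total = sum(n for _, n, _ in stats.values())
    def score(item):
        q, n, p = item[1]
        # exploitation term q plus prior-weighted exploration bonus that
        # shrinks as the action accumulates visits
        return q + c_puct * p * math.sqrt(total) / (1 + n)
    return max(stats.items(), key=score)[0]
```

Unvisited actions with a high prior are tried early; as visit counts grow, the bonus fades and selection is dominated by the values the search has actually backed up.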
Playing Cassino with Reinforcement Learning
Reinforcement learning algorithms have been used to create game-playing agents for various games—mostly, deterministic games such as chess, shogi, and Go. This study used Deep-Q reinforcement learning to create an agent that plays a non-deterministic card game, Cassino. This agent’s performance was compared against the performance of a Cassino mobile app. Results showed that the trained models did not perform well and had trouble training around build actions, which are important in Cassino. Future research could experiment with other reinforcement learning algorithms to see if they are better at training around build actions.
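The regression target at the core of Deep-Q learning can be sketched as follows; this is the generic bootstrapped target, since the abstract gives no details of the study's own network or hyperparameters:

```python
def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Bootstrapped Deep-Q target: r + gamma * max_a' Q(s', a').
    At a terminal state there is nothing left to bootstrap from,
    so the target is the reward alone."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a non-deterministic game like Cassino, the next state s' also depends on chance (the deal) and on opponent moves, so the same action can yield very different targets across samples, one plausible source of the unstable training around build actions the study reports.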