Expert iteration
In this thesis, we study how reinforcement learning algorithms can tackle classical board games without recourse to human knowledge. Specifically, we develop a framework and algorithms which learn to play the board game Hex starting from random play. We first describe Expert Iteration (ExIt), a novel reinforcement learning framework which extends Modified Policy Iteration. ExIt explicitly decomposes the reinforcement learning problem into two parts: planning and generalisation. A planning algorithm explores possible move sequences starting from a particular position to find good strategies from that position, while a parametric function approximator is trained to predict those plans, generalising to states not yet seen. Subsequently, planning is improved by using the approximated policy to guide search, increasing the strength of new plans. This decomposition allows ExIt to combine the benefits of both planning methods and function approximation methods. We demonstrate the effectiveness of the ExIt paradigm by implementing ExIt with two different planning algorithms. First, we develop a version based on Monte Carlo Tree Search (MCTS), a search algorithm which has been successful both in specific games, such as Go, Hex and Havannah, and in general game playing competitions. We then develop a new planning algorithm, Policy Gradient Search (PGS), which uses a model-free reinforcement learning algorithm for online planning. Unlike MCTS, PGS does not require an explicit search tree. Instead PGS uses function approximation within a single search, allowing it to be applied to problems with larger branching factors. Both MCTS-ExIt and PGS-ExIt defeated MoHex 2.0 - the most recent Hex Olympiad winner to be open sourced - in 9 × 9 Hex. More importantly, whereas MoHex makes use of many Hex-specific improvements and knowledge, all our programs were trained tabula rasa using general reinforcement learning methods. 
This bodes well for ExIt’s applicability both to other games and to real-world decision-making problems.
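The planning/generalisation loop described in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the thesis's implementation: the game is single-pile Nim rather than Hex, the "expert" is a one-step lookahead that scores successor positions by exactly evaluating the apprentice policy (standing in for MCTS or PGS), and the "apprentice" is a lookup table rather than a neural network, so it does not generalise to unseen states.

```python
MAX_TAKE = 3  # toy Nim: remove 1..3 stones; whoever takes the last stone wins


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


def win_prob(stones, policy, cache):
    """Probability that the player to move wins if both sides follow `policy`."""
    if stones == 0:
        return 0.0  # the previous player took the last stone, so the mover lost
    if stones not in cache:
        cache[stones] = sum(
            p * (1.0 - win_prob(stones - a, policy, cache))
            for a, p in policy[stones].items()
        )
    return cache[stones]


def expert_move(stones, policy, cache):
    """Planner (hypothetical stand-in for MCTS): one-step lookahead that
    evaluates each successor position with the current apprentice policy."""
    return max(
        legal_moves(stones),
        key=lambda a: 1.0 - win_prob(stones - a, policy, cache),
    )


def expert_iteration(n_stones=12, iterations=15, step=0.5):
    # The apprentice starts as uniformly random play.
    policy = {
        s: {a: 1.0 / len(legal_moves(s)) for a in legal_moves(s)}
        for s in range(1, n_stones + 1)
    }
    for _ in range(iterations):
        cache = {}
        # Planning step: the expert proposes a move for every state.
        targets = {s: expert_move(s, policy, cache) for s in policy}
        # Imitation step: move the apprentice towards the expert's choices.
        for s, best in targets.items():
            for a in policy[s]:
                target = 1.0 if a == best else 0.0
                policy[s][a] += step * (target - policy[s][a])
    return policy


if __name__ == "__main__":
    learned = expert_iteration()
    for s, dist in learned.items():
        print(s, max(dist, key=dist.get))
```

Because the expert is guided by the apprentice and the apprentice is trained on the expert's moves, each pass can strengthen the next. In this toy game the greedy apprentice move at every winning position becomes "leave a multiple of four stones".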
Assessing the Potential of Classical Q-learning in General Game Playing
After the recent groundbreaking results of AlphaGo and AlphaZero, we have
seen strong interest in deep reinforcement learning and artificial general
intelligence (AGI) in game playing. However, deep learning is
resource-intensive and the theory is not yet well developed. For small games,
simple classical table-based Q-learning might still be the algorithm of choice.
General Game Playing (GGP) provides a good testbed for reinforcement learning
to research AGI. Q-learning is one of the canonical reinforcement learning
methods, and has been used by (Banerjee & Stone, IJCAI 2007) in GGP. In this
paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe,
Connect Four, Hex; source code: https://github.com/wh1992v/ggp-rl), to
allow comparison to Banerjee et al. We find that Q-learning converges to a
high win rate in GGP. For the ε-greedy strategy, we propose a first
enhancement, the dynamic ε algorithm. In addition, inspired by (Gelly &
Silver, ICML 2007), we use online search (Monte Carlo Search) to
enhance offline learning, and propose QM-learning for GGP. Both enhancements
improve the performance of classical Q-learning. In this work, GGP allows us to
show that classical table-based Q-learning, if augmented by appropriate
enhancements, can perform well in small games. Comment: arXiv admin note: substantial text overlap with arXiv:1802.0594
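As a rough illustration of the table-based Q-learning with an ε-greedy strategy mentioned in the abstract above, the following minimal sketch shows the standard single-agent update rule together with an exploration rate that changes over training. The function names, the linear decay in `dynamic_epsilon`, and the default hyperparameters are assumptions for illustration; the abstract does not specify the paper's actual schedule, and the Monte Carlo Search component of QM-learning is not shown.

```python
import random
from collections import defaultdict


def dynamic_epsilon(episode, total_episodes, start=1.0, end=0.05):
    """Hypothetical schedule: interpolate linearly from `start` to `end`."""
    frac = min(episode / total_episodes, 1.0)
    return (1.0 - frac) * start + frac * end


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular update: Q(s,a) += alpha * (TD target - Q(s,a))."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


if __name__ == "__main__":
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    q_update(q, "s0", "left", 1.0, "terminal", [])
    print(q[("s0", "left")])
    print(epsilon_greedy(q, "s0", ["left", "right"], epsilon=0.0))
```

Starting with a high ε and lowering it lets the agent explore widely early on and exploit its table later, which is one common reading of a "dynamic" exploration rate.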
Finding Competitive Network Architectures Within a Day Using UCT
The design of neural network architectures for a new data set is a laborious
task which requires human deep learning expertise. In order to make deep
learning available for a broader audience, automated methods for finding a
neural network architecture are vital. Recently proposed methods can already
achieve human-expert-level performance. However, these methods have run times
of months or even years of GPU computing time, ignoring hardware constraints as
faced by many researchers and companies. We propose the use of Monte Carlo
planning in combination with two different UCT (upper confidence bound applied
to trees) derivations to search for network architectures. We adapt the UCT
algorithm to the needs of network architecture search by proposing two ways of
sharing information between different branches of the search tree. In an
empirical study we demonstrate that this method finds
competitive networks for MNIST, SVHN and CIFAR-10 in just a single GPU day.
Extending the search time to five GPU days, we outperform human
architectures and our competitors which consider the same types of layers.
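For readers unfamiliar with UCT, the selection rule it applies at each tree node is the UCB1 score: the child's mean reward plus an exploration bonus that shrinks as the child is visited more often. The sketch below shows only this generic rule. The `(visits, total_reward)` representation and the exploration constant `c` are assumptions for illustration, and the paper's two information-sharing variants are not reproduced.

```python
import math


def ucb1_score(child_visits, child_reward, parent_visits, c=1.4):
    """Mean reward plus exploration bonus; unvisited children come first."""
    if child_visits == 0:
        return math.inf
    mean = child_reward / child_visits
    bonus = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean + bonus


def uct_select(children, c=1.4):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (visits, total_reward) pairs (hypothetical layout).
    """
    parent_visits = sum(visits for visits, _ in children)
    return max(
        range(len(children)),
        key=lambda i: ucb1_score(children[i][0], children[i][1],
                                 parent_visits, c),
    )


if __name__ == "__main__":
    # A rarely visited child can outrank a well-explored, better-scoring one.
    print(uct_select([(100, 60.0), (1, 0.0)]))
    # With c = 0 the rule is purely greedy on the mean reward.
    print(uct_select([(10, 6.0), (10, 4.0)], c=0.0))
```

In an architecture-search setting, each child would correspond to one design decision (for example, the next layer to add) and the reward to a validation score, though that mapping is an interpretation rather than something the abstract spells out.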
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
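Thompson sampling, which the abstract above combines with a tree-based model, can be shown in its simplest form on a Bernoulli bandit: sample one plausible value per action from the posterior, act greedily with respect to the sample, then update the posterior. This is a deliberately simplified stand-in. The Beta-Bernoulli model, the function name, and the parameters are assumptions for illustration and do not reflect the paper's cover tree model or its approximate dynamic programming step.

```python
import random


def thompson_bandit(true_probs, steps=2000, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [1] * n_arms    # Beta(1, 1) prior: one pseudo-success per arm
    losses = [1] * n_arms  # and one pseudo-failure per arm
    pulls = [0] * n_arms
    for _ in range(steps):
        # Draw one sample of each arm's success rate from its posterior.
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update of the chosen arm's Beta posterior.
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_bandit([0.2, 0.8]))
```

Exploration arises naturally here: arms with uncertain posteriors occasionally produce high samples and get tried, while arms that are confidently poor are rarely selected.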
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.
New Ideas for Brain Modelling
This paper describes some biologically-inspired processes that could be used
to build the sort of networks that we associate with the human brain. New to
this paper, a 'refined' neuron will be proposed. This is a group of neurons
that by joining together can produce a more analogue system, but with the same
level of control and reliability that a binary neuron would have. With this new
structure, it will be possible to think of an essentially binary system in
terms of a more variable set of values. The paper also shows how recent
research associated with the new model can be combined with established
theories to produce a more complete picture. The propositions are largely in
line with conventional thinking, but possibly with one or two more radical
suggestions. An earlier cognitive model can be filled in with more specific
details, based on the new research results, where the components appear to fit
together almost seamlessly. The intention of the research has been to describe
plausible 'mechanical' processes that can produce the appropriate brain
structures and mechanisms, but that could be used without the magical
'intelligence' part that is still not fully understood. There are also some
important updates from an earlier version of this paper.