11 research outputs found
Biasing MCTS with Features for General Games
This paper proposes using a linear function approximator, rather than a deep
neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for
general games. This is unlikely to match the potential raw playing strength of
DNNs, but has advantages in terms of generality, interpretability and resources
(time and hardware) required for training. Features describing local patterns
are used as inputs. The features are formulated in such a way that they are
easily interpretable and applicable to a wide range of general games, and might
encode simple local strategies. We gradually create new features during the
same self-play training process used to learn feature weights. We evaluate the
playing strength of an MCTS player biased by learnt features against a standard
upper confidence bounds for trees (UCT) player in multiple different board
games, and demonstrate significantly improved playing strength in the majority
of them after a small number of self-play training games.
Comment: Accepted at IEEE CEC 2019, Special Session on Games. Copyright of the final version held by IEEE.
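The selection-phase biasing described above can be sketched as follows. The PUCT-style formula, the softmax prior, and all names here are illustrative assumptions, since the abstract does not state the exact rule used in the paper:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def biased_selection(children, weights, c=2.5):
    """Select a child node using a PUCT-style rule whose prior comes from
    a linear function of local-pattern features.

    `children` is a list of dicts with 'features' (the move's feature
    vector), 'visits', and 'value_sum'. Illustrative sketch only; the
    paper's exact selection formula may differ."""
    priors = softmax([sum(w * x for w, x in zip(weights, ch["features"]))
                      for ch in children])
    total_visits = sum(ch["visits"] for ch in children)

    def score(ch, prior):
        # Exploitation term: mean value of the move so far.
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        # Exploration term, weighted by the learnt linear prior.
        return q + c * prior * math.sqrt(total_visits + 1) / (1 + ch["visits"])

    return max(zip(children, priors), key=lambda cp: score(*cp))[0]
```

A move whose features the learnt weights favour is explored first even before it has been visited, which is how the linear approximator biases the search.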
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
In recent years, state-of-the-art game-playing agents often involve policies
that are trained in self-playing processes where Monte Carlo tree search (MCTS)
algorithms and trained policies iteratively improve each other. The strongest
results have been obtained when policies are trained to mimic the search
behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design,
includes an element of exploration, policies trained in this manner are also
likely to exhibit a similar extent of exploration. In this paper, we are
interested in learning policies for a project with future goals including the
extraction of interpretable strategies, rather than state-of-the-art
game-playing performance. For these goals, we argue that such an extent of
exploration is undesirable, and we propose a novel objective function for
training policies that are not exploratory. We derive a policy gradient
expression for maximising this objective function, which can be estimated using
MCTS value estimates, rather than MCTS visit counts. We empirically evaluate
various properties of the resulting policies in a variety of board games.
Comment: Accepted at the IEEE Conference on Games (CoG) 2019.
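The proposed objective, maximising the expected MCTS value under the policy rather than matching exploratory visit counts, admits a simple gradient for a softmax policy with linear scores. A minimal sketch, assuming per-action feature vectors and MCTS value estimates v(a); the paper's exact parameterisation may differ:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(features, values, theta, lr=0.1):
    """One gradient-ascent step on J(theta) = sum_a pi_theta(a) * v(a),
    where v(a) are MCTS value estimates rather than visit counts.

    For a softmax policy over linear scores theta . x_a, the gradient is
    sum_a pi(a) * v(a) * (x_a - xbar), with xbar the expected feature
    vector under pi. The parameterisation here is illustrative."""
    logits = [sum(t * x for t, x in zip(theta, f)) for f in features]
    pi = softmax(logits)
    dim = len(theta)
    # Expected feature vector under the current policy.
    xbar = [sum(p * f[i] for p, f in zip(pi, features)) for i in range(dim)]
    grad = [sum(pi[a] * values[a] * (features[a][i] - xbar[i])
                for a in range(len(pi)))
            for i in range(dim)]
    return [t + lr * g for t, g in zip(theta, grad)]
```

Repeated steps concentrate probability mass on high-value actions, which is the non-exploratory behaviour argued for in the abstract.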
Distributed Nested Rollout Policy for SameGame
Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search heuristic for puzzles and other optimization problems. It achieves state-of-the-art performance on several games, including SameGame. In this paper, we design several parallel and distributed NRPA-based search techniques, and we provide a number of experimental insights about their execution. Finally, we use our best implementation to discover 15 better scores for 20 standard SameGame boards.
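Sequential NRPA itself (the baseline the parallel and distributed variants build on) can be sketched on a toy optimisation problem. The toy domain, iteration count, and learning rate ALPHA below are illustrative assumptions; only the recursive structure and the softmax policy adaptation follow the standard NRPA scheme:

```python
import math
import random

MOVES = ["a", "b", "c"]
REWARD = {"a": 1.0, "b": 0.0, "c": 0.5}   # toy rewards; best sequence is all "a"
LENGTH = 5
ITERATIONS = 10
ALPHA = 1.0

def rollout(policy):
    """Sample a move sequence from the softmax of the policy weights."""
    seq, score = [], 0.0
    for t in range(LENGTH):
        logits = [policy.get((t, m), 0.0) for m in MOVES]
        mx = max(logits)
        probs = [math.exp(l - mx) for l in logits]
        r = random.random() * sum(probs)
        move = MOVES[-1]
        for m, p in zip(MOVES, probs):
            r -= p
            if r <= 0:
                move = m
                break
        seq.append(move)
        score += REWARD[move]
    return score, seq

def adapt(policy, seq):
    """Shift policy weights towards the best sequence found so far."""
    for t, move in enumerate(seq):
        logits = [policy.get((t, m), 0.0) for m in MOVES]
        mx = max(logits)
        probs = [math.exp(l - mx) for l in logits]
        z = sum(probs)
        for m, p in zip(MOVES, probs):
            policy[(t, m)] = policy.get((t, m), 0.0) - ALPHA * p / z
        policy[(t, move)] = policy.get((t, move), 0.0) + ALPHA

def nrpa(level, policy):
    if level == 0:
        return rollout(policy)
    best_score, best_seq = float("-inf"), []
    for _ in range(ITERATIONS):
        score, seq = nrpa(level - 1, dict(policy))  # recurse on a copy
        if score >= best_score:
            best_score, best_seq = score, seq
        adapt(policy, best_seq)
    return best_score, best_seq
```

The distributed variants in the paper parallelise this nesting; the recursion above is the unit of work they distribute.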
The Science of Networks: Urban Movement Design, Analytics, and Navigation
The science of networks, a relatively young field of research that emerged in its current form and definition at the beginning of the 21st century (as a distinctive, officially approved and accepted scientific discipline (Barabási, 2016)), is a very powerful area given the range of subjects to which it contributes and is applied. This science is key for complex systems analysis or analytics (when referring to the (big) data science framework, which now largely defines its methods and resources (Batty, 2019)), based on the claim that networks encode the interactions between a system's components (Barabási, 2016) and thus provide insights into the ways complex systems behave, or control the behaviour of artificially created systems (emphasis added). The area represented here through its analytical methods and forms (network graphs and related operations) is the urban transportation system: the Grand Paris rail system, including all categories with their existing lines and extensions currently under construction, planned, or under consideration in the long term. The network has been created as a background topological environment and geometry for various research operations and generative design tasks. Some of these, such as urban movement path generation, or the network's incremental growth and reconfiguration as a system together with the geometry of possible moves (legal actions), will be presented in more detail. The network can be considered both an abstract and a real-world environment and situation, open to research on both gaming strategies for any constructed scenario and designed spatial situation (academic gaming, operational gaming, and heuristic gaming) and problem-solving strategies related to identified real-world design issues.
Thus, the main question posed for the presented graph addresses the ways in which it can be operationalised and the methods through which this can be achieved, with special regard to AI.
Invited Conference Contribution - Poster Section / Paper
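One basic way to operationalise such a network graph for movement-path generation is weighted shortest-path search over an adjacency structure. A minimal sketch; the station names and travel times below are invented placeholders, not actual Grand Paris data:

```python
import heapq

# Toy weighted graph (travel times in minutes); names and weights are
# illustrative only, not real network data.
network = {
    "Saint-Denis Pleyel": {"La Defense": 9, "Pont de Sevres": 20},
    "La Defense": {"Saint-Denis Pleyel": 9, "Pont de Sevres": 12},
    "Pont de Sevres": {"La Defense": 12, "Orly": 25},
    "Orly": {"Pont de Sevres": 25},
}

def shortest_path(graph, src, dst):
    """Dijkstra over edge weights: returns (total cost, node sequence)."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []
```

The same adjacency structure can also serve as the "geometry of possible moves" for the gaming scenarios mentioned above, with edges as legal actions.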
Extracting tactics learned from self-play in general games
Local, spatial state-action features can be used to effectively train linear policies from self-play in a wide variety of board games. Such policies can play games directly, or be used to bias tree search agents. However, the resulting feature sets can be large, with a significant amount of overlap and redundancy between features. This is a problem for two reasons. Firstly, large feature sets can be computationally expensive, which reduces the playing strength of agents based on them. Secondly, redundancies and correlations between features impair the ability of humans to analyse, interpret, or understand the tactics learned by the policies. We look towards decision trees for their ability to perform feature selection and serve as interpretable models. Previous work on distilling policies into decision trees uses states as inputs, and distributions over the complete action space as outputs. In contrast, we propose and evaluate a variety of decision tree types, which take state-action pairs as inputs and provide various types of outputs on a per-action basis. An empirical evaluation over 43 different board games is presented, and two of those games are used as case studies where we attempt to interpret the discovered features.
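The state-action decision-tree idea can be sketched with a minimal greedy tree learner over binary local-pattern features. The training criterion (misclassification count) and the 0/1 feature encoding are simplifying assumptions, not the paper's exact algorithm:

```python
def train_tree(samples, depth=0, max_depth=3):
    """Greedy binary decision tree over state-action feature vectors.

    `samples` is a list of (features, label) pairs, where `features` is a
    0/1 vector describing one state-action pair and `label` is whether the
    policy prefers that move. Leaves are labels; internal nodes are
    (feature_index, subtree_if_0, subtree_if_1) tuples."""
    labels = [y for _, y in samples]
    majority = max(set(labels), key=labels.count)
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    n_feats = len(samples[0][0])

    def miscount(i):
        # Misclassification count if we split on feature i.
        groups = {0: [], 1: []}
        for x, y in samples:
            groups[x[i]].append(y)
        err = 0
        for ys in groups.values():
            if ys:
                maj = max(set(ys), key=ys.count)
                err += sum(1 for y in ys if y != maj)
        return err

    best = min(range(n_feats), key=miscount)
    left = [(x, y) for x, y in samples if x[best] == 0]
    right = [(x, y) for x, y in samples if x[best] == 1]
    if not left or not right:
        return majority
    return (best, train_tree(left, depth + 1, max_depth),
                  train_tree(right, depth + 1, max_depth))

def predict(tree, x):
    """Evaluate one state-action feature vector against the tree."""
    while isinstance(tree, tuple):
        i, lo, hi = tree
        tree = hi if x[i] else lo
    return tree
```

Because each split names a single feature, the path from root to leaf reads as a conjunction of local patterns, which is what makes this representation amenable to the kind of interpretation the paper pursues.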