11 research outputs found

    Biasing MCTS with Features for General Games

    This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in terms of generality, interpretability and resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable and applicable to a wide range of general games, and might encode simple local strategies. We gradually create new features during the same self-play training process used to learn feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.
    Comment: Accepted at IEEE CEC 2019, Special Session on Games. Copyright of final version held by IEEE.
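    The abstract above describes biasing MCTS with learnt weights over local pattern features. The sketch below illustrates one common way such a bias can be applied: a softmax over linear feature scores yields a prior per legal move, which then weights the exploration term of a PUCT-style selection rule. The function names, the toy feature vectors and weights, and the exact selection formula are assumptions made for illustration; the paper's actual biasing scheme may differ.

```python
import math

def move_priors(feature_vectors, weights):
    """Softmax over linear feature scores: P(move) proportional to exp(w . phi(s, move)).

    feature_vectors: one binary vector per legal move, each entry indicating
    whether a given local pattern matches around that move.
    weights: the learnt weight vector over those features.
    """
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in feature_vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exps)
    return [e / z for e in exps]

def puct_value(q, prior, parent_visits, child_visits, c=1.5):
    """PUCT-style selection value: mean backed-up value plus a prior-weighted
    exploration bonus; MCTS descends to the child maximising this value."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

# Toy usage: three legal moves, four features, invented weights.
priors = move_priors(
    feature_vectors=[[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1]],
    weights=[0.8, -0.3, 0.1, 0.4],
)
print(priors)  # the first move receives the largest prior here
```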

    Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates

    In recent years, state-of-the-art game-playing agents have often involved policies trained in self-play processes where Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are also likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project with future goals including the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable, and we propose a novel objective function for training policies that are not exploratory. We derive a policy gradient expression for maximising this objective function, which can be estimated using MCTS value estimates, rather than MCTS visit counts. We empirically evaluate various properties of the resulting policies in a variety of board games.
    Comment: Accepted at the IEEE Conference on Games (CoG) 2019.
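    The abstract above proposes training non-exploratory policies with a policy gradient estimated from MCTS value estimates rather than visit counts. The sketch below is a minimal, assumed rendering of that idea for a softmax-linear policy: it performs gradient ascent on J(theta) = sum_a pi(a|s) * Q_mcts(s, a) for a single state, using per-action MCTS value estimates. The parameterisation, objective, function names, and toy numbers are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def policy_gradient_step(theta, phi, q_mcts, lr=0.1):
    """One gradient-ascent step on J(theta) = sum_a pi(a|s) * Q_mcts(s, a)
    for a softmax-linear policy pi(a|s) proportional to exp(theta . phi(s, a)).

    phi:    (num_actions, num_features) matrix, one feature row per action
    q_mcts: (num_actions,) MCTS value estimates for the actions in this state
    """
    pi = softmax(phi @ theta)
    # d pi_a / d theta = pi_a * (phi_a - E_{b ~ pi}[phi_b])
    baseline = pi @ phi
    grad = ((q_mcts * pi)[:, None] * (phi - baseline)).sum(axis=0)
    return theta + lr * grad

# Toy usage with invented numbers: 3 actions, 4 features.
theta = np.zeros(4)
phi = np.array([[1., 0., 0., 1.],
                [0., 1., 0., 0.],
                [0., 0., 1., 1.]])
q_mcts = np.array([0.6, -0.2, 0.1])
for _ in range(100):
    theta = policy_gradient_step(theta, phi, q_mcts)
print(softmax(phi @ theta))  # probability mass shifts towards the highest-valued action
```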

    Distributed Nested Rollout Policy for Same Game

    Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search heuristic for puzzles and other optimization problems. It achieves state-of-the-art performance on several games, including SameGame. In this paper, we design several parallel and distributed NRPA-based search techniques, and we provide a number of experimental insights about their execution. Finally, we use our best implementation to discover 15 better scores for the 20 standard SameGame boards.
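    For readers unfamiliar with NRPA, the sketch below shows the sequential base algorithm (nested levels, softmax rollouts, and an Adapt step that shifts policy weight towards the best sequence found so far) on an invented toy objective. It is neither SameGame nor one of the parallel or distributed variants designed in the paper; the constants N_STEPS, N_CHOICES, N_ITER and the toy score function are placeholders.

```python
import math
import random

N_STEPS, N_CHOICES, N_ITER = 6, 5, 20   # toy problem size and iterations per level

def legal_moves(seq):
    return list(range(N_CHOICES))

def score(seq):
    # Placeholder objective standing in for a SameGame score: reward adjacent repeats.
    return sum(1 for a, b in zip(seq, seq[1:]) if a == b)

def rollout(policy):
    """Play one full sequence, sampling each move with softmax over policy weights."""
    seq = []
    while len(seq) < N_STEPS:
        moves = legal_moves(seq)
        weights = [math.exp(policy.get((len(seq), m), 0.0)) for m in moves]
        seq.append(random.choices(moves, weights=weights)[0])
    return score(seq), seq

def adapt(policy, seq, alpha=1.0):
    """Shift policy weight towards the moves of the best sequence found so far."""
    new = dict(policy)
    for step, best in enumerate(seq):
        moves = legal_moves(seq[:step])
        z = sum(math.exp(policy.get((step, m), 0.0)) for m in moves)
        for m in moves:
            new[(step, m)] = new.get((step, m), 0.0) - alpha * math.exp(policy.get((step, m), 0.0)) / z
        new[(step, best)] += alpha
    return new

def nrpa(level, policy):
    """Each level runs N_ITER searches of the level below on a copy of the
    policy and adapts its own policy towards the best sequence returned."""
    if level == 0:
        return rollout(policy)
    best_score, best_seq = -math.inf, None
    for _ in range(N_ITER):
        s, seq = nrpa(level - 1, dict(policy))
        if s > best_score:
            best_score, best_seq = s, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

print(nrpa(level=2, policy={}))   # e.g. (5, [3, 3, 3, 3, 3, 3])
```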

    The Science of Networks: Urban Movement Design, Analytics, and Navigation

    The science of networks, a relatively young field of research that took on its current form and definition at the beginning of the 21st century as a distinctive, officially recognised scientific discipline (Barabási, 2016), is a powerful area of study given the range of subjects to which it contributes and is applied. This science is key to the analysis, or analytics, of complex systems (when referring to the (big) data science framework, which now largely defines its methods and resources (Batty, 2019)), based on the claim that networks encode the interactions between a system's components (Barabási, 2016) and thus provide insights into how complex systems behave, or allow the behaviour of artificially created systems to be controlled (emphasis added). The area represented here through its analytical methods and forms (network graphs and related operations) is the urban transportation system: the Grand Paris rail system, including all categories with their existing lines and extensions currently under construction, planned, or under consideration in the long term. The network has been created as a background topological environment and geometry for various research operations and generative design tasks. Some of these, such as urban movement path generation or the network's incremental growth and reconfiguration as a system, together with the geometry of possible moves (legal actions), will be presented in more detail. The network can be considered both an abstract and a real-world environment and situation, amenable to research into both gaming strategies for any constructed scenario and designed spatial situation (academic gaming, operational gaming, and heuristic gaming) and problem-solving strategies related to identified real-world design issues. Thus, the main question posed for the presented graph concerns the ways in which it can be operationalised and the methods through which this can be achieved, with special regard to AI.
    Invited Conference Contribution - Poster Section / Paper.
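    The abstract above treats the Grand Paris rail system as a graph that serves as an environment for path generation and related operations. Purely as an illustration of that representation, the sketch below builds a toy graph and answers a movement query with a shortest-path search; the station names, edge weights, and the use of the networkx library are assumptions, not data or tooling from the project.

```python
import networkx as nx

# A toy fragment of a rail network as a weighted graph. The station names and
# travel times below are invented placeholders, not actual Grand Paris data.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 3), ("B", "C", 4), ("C", "D", 2),
    ("A", "E", 6), ("E", "D", 5), ("B", "E", 2),
])

# The "legal actions" available at a node are its incident edges.
print(sorted(G.neighbors("B")))          # ['A', 'C', 'E']

# Urban movement path generation posed as a shortest-path query on the graph.
path = nx.shortest_path(G, source="A", target="D", weight="weight")
print(path)                              # ['A', 'B', 'C', 'D']
```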

    Extracting tactics learned from self-play in general games

    Local, spatial state-action features can be used to effectively train linear policies from self-play in a wide variety of board games. Such policies can play games directly, or be used to bias tree search agents. However, the resulting feature sets can be large, with a significant amount of overlap and redundancy between features. This is a problem for two reasons. Firstly, large feature sets can be computationally expensive, which reduces the playing strength of agents based on them. Secondly, redundancies and correlations between features impair the ability of humans to analyse, interpret, or understand the tactics learned by the policies. We look towards decision trees for their ability to perform feature selection and to serve as interpretable models. Previous work on distilling policies into decision trees uses states as inputs, and distributions over the complete action space as outputs. In contrast, we propose and evaluate a variety of decision tree types, which take state-action pairs as inputs and provide various types of outputs on a per-action basis. An empirical evaluation over 43 different board games is presented, and two of those games are used as case studies where we attempt to interpret the discovered features.
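    The abstract above distils feature-based policies into decision trees that take state-action pairs as inputs and produce per-action outputs. The sketch below shows the general shape of such a model: a shallow regression tree fitted on placeholder state-action feature vectors and then used to rank candidate actions. The synthetic data, target definition, tree depth, and scikit-learn usage are assumptions; the paper evaluates several tree variants rather than this single one.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Placeholder training set: each row is a binary feature vector phi(s, a) for one
# state-action pair; the regression target stands in for the score a learnt
# policy assigns to that action (the actual per-action output types differ
# between the tree variants evaluated in the paper).
X = rng.integers(0, 2, size=(500, 40))
y = 1.5 * X[:, 3] - 0.8 * X[:, 17] + 0.1 * rng.standard_normal(500)

tree = DecisionTreeRegressor(max_depth=3)   # shallow tree => readable rules
tree.fit(X, y)

# The fitted tree performs implicit feature selection and can be read directly
# as if/then rules over local pattern features.
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(40)]))

def rank_actions(state_action_features):
    """Rank candidate actions in one state by the tree's per-action prediction."""
    scores = tree.predict(np.asarray(state_action_features))
    return np.argsort(-scores)
```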