
    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
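    As a concrete anchor for the four phases the survey structures its discussion around (selection, expansion, simulation, backpropagation), the sketch below shows a minimal UCT loop. It is illustrative only: the game object and its legal_moves, apply, is_terminal and random_playout methods are assumed stand-ins for any domain, and the single-reward backup ignores the per-player bookkeeping an adversarial game would need.

    import math

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = [], 0, 0.0

        def ucb1(self, c=math.sqrt(2)):
            # UCB1: mean reward plus an exploration bonus that shrinks with visits.
            if self.visits == 0:
                return float("inf")
            return (self.value / self.visits
                    + c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def mcts(root_state, iterations, game):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend by UCB1 while the node is fully expanded.
            while node.children and len(node.children) == len(game.legal_moves(node.state)):
                node = max(node.children, key=Node.ucb1)
            # 2. Expansion: add one untried child, unless the state is terminal.
            if not game.is_terminal(node.state):
                move = game.legal_moves(node.state)[len(node.children)]
                node.children.append(Node(game.apply(node.state, move), parent=node))
                node = node.children[-1]
            # 3. Simulation: random playout from the new node to a terminal state.
            reward = game.random_playout(node.state)
            # 4. Backpropagation: update statistics along the path to the root.
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent
        return max(root.children, key=lambda n: n.visits)  # recommend most-visited move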

    Fast Approximate Max-n Monte Carlo Tree Search for Ms Pac-Man

    We present an application of Monte Carlo tree search (MCTS) to the game of Ms Pac-Man. Contrary to most applications of MCTS to date, Ms Pac-Man requires almost real-time decision making and does not have a natural end state. We approached the problem by performing Monte Carlo tree searches on a five-player max-n tree representation of the game with limited tree search depth. We performed a number of experiments using both the MCTS game agents (for Ms Pac-Man and the ghosts) and agents used in previous work (for the ghosts). Performance-wise, our approach achieves excellent scores, outperforming previous non-MCTS approaches to the game by up to two orders of magnitude.
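    The max-n representation the authors use differs from two-player minimax in that values are vectors with one score per player, and the player to move at each node maximizes only its own entry. A hedged, depth-limited sketch of that backup rule (the game interface and its evaluate heuristic are assumptions, not the paper's code):

    def maxn(state, depth, game):
        # Returns a tuple of per-player scores for the given state.
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)        # heuristic score vector, one per player
        player = game.to_move(state)           # index of the player acting at this node
        best = None
        for move in game.legal_moves(state):
            scores = maxn(game.apply(state, move), depth - 1, game)
            if best is None or scores[player] > best[player]:
                best = scores                  # the mover keeps the vector that is
        return best                            # best for its own component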

    Development of a selection policy for General Game Playing based on the multi-armed bandit problem

    Throughout the history of Artificial Intelligence, various intelligent agents capable of playing board games have been developed, among the most notable being DeepBlue for chess and AlphaGo for the game of Go. However, these agents each focus on a single game, so the natural next step is to develop agents capable of playing more than one game. This is the idea pursued by the field of General Game Playing, which aims to develop fully autonomous agents that can play any board game without human intervention and without prior knowledge. Most such agents are based on the Monte Carlo Tree Search method, which uses Monte Carlo simulations to estimate promising moves and consists of four steps: Selection, Expansion, Simulation and Backpropagation. Efforts to improve the performance of Monte Carlo Tree Search tend to focus on the simulation step; however, the Selection step also affects that performance. The selection step controls how the tree associated with the board game is traversed, a traversal guided by a Selection Policy, of which Upper Confidence Bound is the most popular. This research thesis presents two selection policies, UCBα1 and UCBα2, which are based on Upper Confidence Bound but are designed specifically for application in General Game Playing. Unlike Upper Confidence Bound, UCBα1 and UCBα2 exploit the structure of the tree associated with board games to determine, for a parent node, how much the child nodes representing the available moves should be explored before committing to exploiting a child node with a promising move. Upper Confidence Bound controls the exploration of child nodes by means of an exploration constant; however, this constant remains fixed throughout the tree regardless of the number of child nodes. UCBα1 and UCBα2 use a function that depends on the…
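    For reference, the Upper Confidence Bound rule that the thesis starts from selects, at a parent node p, the child i maximizing (standard UCB1; the notation here is mine, not the thesis's):

        \mathrm{UCB1}(i) = \bar{X}_i + C\,\sqrt{\frac{\ln N_p}{n_i}}

    where \bar{X}_i is the mean reward of child i, n_i its visit count, N_p the visit count of the parent, and C the exploration constant that, as the abstract notes, stays fixed throughout the tree; it is this constant that UCBα1 and UCBα2 replace with a node-dependent function.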

    Shallow decision-making analysis in General Video Game Playing

    The General Video Game AI competitions have been the testing ground for several game-playing techniques, such as evolutionary computation, tree search algorithms, and hyper-heuristic-based or knowledge-based algorithms. So far the metrics used to evaluate the performance of agents have been win ratio, game score and length of games. In this paper we provide a wider set of metrics and a comparison method for evaluating and comparing agents. The metrics and the comparison method give shallow introspection into the agent's decision-making process and can be applied to any agent regardless of its algorithmic nature. In this work, the metrics and the comparison method are used to measure the impact of the terms that compose the tree policy of an MCTS-based agent, comparing it with several baseline agents. The results clearly show how promising such a general approach is and how it can be useful for understanding the behaviour of an AI agent; in particular, the comparison with baseline agents can help in understanding the shape of the agent's decision landscape. The presented metrics and comparison method represent a step toward more descriptive ways of logging and analysing agents' behaviours.

    Self-adaptive MCTS for General Video Game Playing

    Monte-Carlo Tree Search (MCTS) has shown particular success in General Game Playing (GGP) and General Video Game Playing (GVGP), and many enhancements and variants have been developed. Recently, an on-line adaptive parameter tuning mechanism for MCTS agents has been proposed that almost achieves the same performance as off-line tuning in GGP. In this paper we apply the same approach to GVGP using the popular General Video Game AI (GVGAI) framework, in which the time allowed to make a decision is only 40 ms. We design three Self-Adaptive MCTS (SA-MCTS) agents that optimize on-line the parameters of a standard non-self-adaptive MCTS agent of GVGAI. The three agents select the parameter values using Naïve Monte-Carlo, an evolutionary algorithm and an N-Tuple Bandit Evolutionary Algorithm, respectively, and are tested on 20 single-player games of GVGAI. The SA-MCTS agents achieve more robust results on the tested games. With the same time setting, they perform similarly to the baseline standard MCTS agent in the games for which the baseline agent performs well, and significantly improve the win rate in the games for which the baseline agent performs poorly. As validation, we also test the performance of non-self-adaptive MCTS instances that use the most sampled parameter settings during the on-line tuning of each of the three SA-MCTS agents for each game. Results show that these parameter settings improve the win rate on the games Wait for Breakfast and Escape by 4 times and 150 times, respectively.
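    The shared structure of these SA-MCTS agents can be sketched abstractly: before each search iteration a tuner proposes a parameter setting, the iteration runs with it, and the resulting playout reward is fed back to the tuner. The ε-greedy allocator below is a deliberately simple stand-in for the paper's Naïve Monte-Carlo, evolutionary and N-tuple bandit allocators, and run_mcts_iteration is an assumed hook:

    import random
    import time

    def sa_mcts(root, budget_ms, settings, run_mcts_iteration, epsilon=0.1):
        # stats[s] = [visits, total reward] for each candidate parameter setting s.
        stats = {s: [0, 0.0] for s in settings}
        deadline = time.time() + budget_ms / 1000.0
        while time.time() < deadline:
            untried = [s for s in settings if stats[s][0] == 0]
            if untried or random.random() < epsilon:
                s = random.choice(untried or settings)   # explore a setting
            else:
                s = max(settings, key=lambda s: stats[s][1] / stats[s][0])  # exploit
            reward = run_mcts_iteration(root, s)         # one MCTS iteration using s
            stats[s][0] += 1
            stats[s][1] += reward
        return max(settings, key=lambda s: stats[s][0])  # most-sampled setting

    Returning the most-sampled setting mirrors the validation step described above, where non-self-adaptive MCTS instances are re-run with the settings sampled most often during on-line tuning.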

    Recent Advances in General Game Playing

    The goal of General Game Playing (GGP) has been to develop computer programs that can perform well across various game types. It is natural for human game players to transfer knowledge from games they already know how to play to other, similar games. GGP research attempts to design systems that work well across different game types, including unknown new games. In this review, we present a survey of recent advances (2011 to 2014) in GGP for both traditional games and video games. It is notable that research on GGP has been expanding into modern video games. Monte-Carlo Tree Search and its enhancements have been the most influential techniques in GGP for both research domains. Additionally, international competitions have become important events that promote and accelerate GGP research. Recently, a video GGP competition was launched. In this survey, we review recent progress in the most challenging research areas of Artificial Intelligence (AI) related to universal game playing.

    Contributions to Monte Carlo Search

    This research is motivated by improving decision making under uncertainty, in particular for games and symbolic regression. The present dissertation gathers research contributions in the field of Monte Carlo Search (MCS). These contributions centre on the selection, simulation and recommendation policies. Moreover, we develop a methodology to automatically generate an MCS algorithm for a given problem. For the selection policy: in most of the bandit literature it is assumed that there is no structure or similarity between arms, so each arm is treated as independent of the others. In several instances, however, arms can be closely related. We show, both theoretically and empirically, that a significant improvement over the state-of-the-art selection policies is possible. For the contribution on the simulation policy, we focus on the symbolic regression problem and consider how to consistently generate different expressions by changing the probability of drawing each symbol. We formalize the situation as an optimization problem and try different approaches. We show a clear improvement in the sampling process for any expression length, and further test the best approach by embedding it into an MCS algorithm, where it still shows an improvement. For the contribution on the recommendation policy, we study the most common recommendation policies in combination with selection policies; a good recommendation policy is one that works well with a given selection policy. We find a trend that seems to favour a robust recommendation policy over a riskier one. We also present a contribution in which we automatically generate several MCS algorithms from a list of core components upon which most MCS algorithms are built, and compare them to generic algorithms. The results show that this often enables discovering new MCS variants that significantly outperform generic MCS algorithms.
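    The recommendation policies compared in such work are typically simple functions of the root statistics. Two standard examples, as illustrative definitions assuming each root child exposes visits and value counters:

    def robust_child(children):
        # Robust: recommend the most-visited move; stable under noisy estimates.
        return max(children, key=lambda c: c.visits)

    def max_child(children):
        # Max: recommend the highest mean reward; riskier when samples are few.
        return max(children,
                   key=lambda c: c.value / c.visits if c.visits else float("-inf"))

    The reported trend favouring the robust policy matches the intuition in these definitions: the most-visited child is far less sensitive to a short lucky streak of simulations than the child with the highest mean.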

    Artificial intelligence in co-operative games with partial observability

    This thesis investigates Artificial Intelligence in co-operative games that feature Partial Observability. Most video games feature a combination of both co-operation and Partial Observability. Co-operative games are games that feature a team of at least two agents which must achieve a shared goal of some kind. Partial Observability is the restriction of how much of an environment an agent can observe. The research performed in this thesis examines the challenge of creating Artificial Intelligence for co-operative games that feature Partial Observability. The main contributions are: a demonstration that Monte-Carlo Tree Search outperforms Genetic Algorithm based agents in solving co-operative problems without communication; the creation of a co-operative Partial Observability competition promoting Artificial Intelligence research, together with an investigation of the effect of varying Partial Observability on Artificial Intelligence; and the creation of a high-performing Monte-Carlo Tree Search agent for the game Hanabi that uses agent modelling to reason about the other players.

    Improving Hearthstone AI by Combining MCTS and Supervised Learning Algorithms

    We investigate the impact of supervised prediction models on the strength and efficiency of artificial agents that use the Monte-Carlo Tree Search (MCTS) algorithm to play the popular video game Hearthstone: Heroes of Warcraft. We give an overview of our custom implementation of MCTS that is well suited to games with partially hidden information and random effects. We also describe experiments designed to quantify the performance of our Hearthstone agent's decision making. We show that even simple neural networks can be trained and successfully used for the evaluation of game states. Moreover, we demonstrate that by providing guidance to the game-state search heuristic, it is possible to substantially improve the win rate and at the same time reduce the required computation.
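    One common way such a supervised model plugs into MCTS, in the spirit of the guidance described here though not necessarily the authors' exact integration, is to cut the random playout short and score the reached state with the learned evaluator; value_net.predict and game.features are assumed hooks:

    import random

    def evaluate_leaf(state, game, value_net, playout_depth=0):
        # Play a few random moves (possibly none), then ask the network for a
        # win-probability estimate instead of simulating to the end of the game.
        for _ in range(playout_depth):
            if game.is_terminal(state):
                return game.result(state)
            state = game.apply(state, random.choice(game.legal_moves(state)))
        return value_net.predict(game.features(state))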

    The 2016 Two-Player GVGAI Competition

    This paper showcases the setting and results of the first Two-Player General Video Game AI competition, which ran in 2016 at the IEEE World Congress on Computational Intelligence and the IEEE Conference on Computational Intelligence and Games. In this track, the challenges for general game AI agents are expanded from the single-player version to direct player interaction in both competitive and cooperative environments of various types and degrees of difficulty. The focus is on agents not only handling multiple problems, but also having to account for another intelligent entity in the game, which is expected to work towards its own goal (winning the game) and will likely interact with the first agent in a more engaging way than the environment or any non-player character would. The top competition entries are analyzed in detail and the performance of all agents is compared across the four sets of games. The results validate the competition system as an assessment of generality, and show Monte Carlo Tree Search continuing to dominate by winning the overall championship, closely followed by Rolling Horizon Evolutionary Algorithms, employed by the winner of the second leg of the contest.