7 research outputs found

    Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

    Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search with MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during the MCTS simulations. These minimax values are then used to guide future simulations. We show that using implicit minimax backups leads to stronger play performance in Kalah, Breakthrough, and Lines of Action.
    Comment: 24 pages, 7 figures, 9 tables; expanded version of a paper presented at the IEEE Conference on Computational Intelligence and Games (CIG) 2014.
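    The core idea is compact: each node keeps its Monte Carlo statistics and a separately stored heuristic minimax value, selection mixes the two, and the backup step propagates both. The Python sketch below is illustrative only; the game interface (state.heuristic()), the mixing weight ALPHA, and the node layout are assumptions, not the paper's implementation.

```python
import math
import random

# Sketch of implicit minimax backups in MCTS, assuming a hypothetical game
# interface. All stored values are from the perspective of the player to
# move at the node holding them; negations implement the negamax convention.

ALPHA = 0.4   # assumed weight mixing win rate and minimax value
UCT_C = 1.0   # assumed UCB exploration constant

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.visits = 0
        self.reward_sum = 0.0          # estimated win-rate statistics
        self.tau = state.heuristic()   # implicit minimax value

    def q(self):
        return self.reward_sum / self.visits if self.visits else 0.0

def select_child(node):
    """UCB selection over a mix of the two stored sources of information."""
    unvisited = [c for c in node.children if c.visits == 0]
    if unvisited:
        return random.choice(unvisited)
    def ucb(c):
        mixed = (1 - ALPHA) * (-c.q()) + ALPHA * (-c.tau)
        return mixed + UCT_C * math.sqrt(math.log(node.visits) / c.visits)
    return max(node.children, key=ucb)

def backup(path, reward):
    """Back up the playout reward normally and the minimax value implicitly.

    `path` runs from the root to the leaf; `reward` is from the view of the
    player to move at the leaf.
    """
    for node in reversed(path):
        node.visits += 1
        node.reward_sum += reward
        reward = -reward               # switch perspective each ply
        if node.children:              # implicit minimax (negamax) backup
            node.tau = max(-c.tau for c in node.children)
```

    Note that the playout reward still drives `reward_sum`; the heuristic only enters through `tau`, which is the separation of the two information sources the abstract describes.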

    Enhancements for Monte-Carlo Tree Search in Ms Pac-Man

    In this paper, enhancements to the Monte-Carlo Tree Search (MCTS) framework are investigated for playing Ms Pac-Man. MCTS is used to find an optimal path for an agent at each turn, determining the move to make based on randomised simulations. Ms Pac-Man is a real-time arcade game in which the protagonist has several independent goals but no conclusive terminal state: unlike games such as Chess or Go, there is no state in which the player wins the game. Furthermore, the Pac-Man agent has to compete with a range of different ghost agents, hence only limited assumptions can be made about the opponent's behaviour. In order to expand the capabilities of existing MCTS agents, five enhancements are discussed: 1) a variable-depth tree, 2) playout strategies for the ghost team and Pac-Man, 3) including long-term goals in scoring, 4) endgame tactics, and 5) a Last-Good-Reply policy for memorising rewarding moves during playouts. An average performance gain of 40,962 points over the average score of the top-scoring Pac-Man agent at CIG 2011 is achieved by employing these methods.
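    Of these enhancements, the Last-Good-Reply policy is the easiest to isolate. The sketch below shows an LGR table with forgetting; the interfaces and the reward test are assumptions for illustration, since the abstract does not specify the agent's data structures.

```python
import random

class LastGoodReply:
    """Sketch of a Last-Good-Reply playout policy with forgetting (LGRF).

    Maps an opponent's previous move to the reply that last led to a
    rewarding playout; replies from unrewarding playouts are forgotten.
    """

    def __init__(self):
        self.reply = {}  # previous move -> memorised reply

    def choose(self, prev_move, legal_moves):
        # Play the memorised reply if it is still legal; otherwise fall
        # back to a default (here: uniformly random) playout policy.
        move = self.reply.get(prev_move)
        return move if move in legal_moves else random.choice(legal_moves)

    def update(self, move_pairs, rewarding):
        # After a playout, store the (previous move, reply) pairs of a
        # rewarding playout and erase those of an unrewarding one.
        for prev_move, reply in move_pairs:
            if rewarding:
                self.reply[prev_move] = reply
            elif self.reply.get(prev_move) == reply:
                del self.reply[prev_move]
```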

    Real-Time Monte Carlo Tree Search in Ms Pac-Man


    Sequential halving for partially observable games

    This paper investigates Sequential Halving as a selection policy in four partially observable games: Go Fish, Lost Cities, Phantom Domineering, and Phantom Go. Additionally, H-MCTS is studied, which uses Sequential Halving at the root of the search tree and UCB elsewhere. Experimental results reveal that H-MCTS performs best in Go Fish, whereas its performance is on par in Lost Cities and Phantom Domineering. Sequential Halving as a flat Monte-Carlo search appears to be the stronger technique in Phantom Go.
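    Sequential Halving itself is easy to state: spread the simulation budget evenly over the surviving moves, then repeatedly keep the better half by empirical mean. In the sketch below, `sample(move)` is an assumed stand-in for one simulation from the root, a UCB-guided MCTS descent in H-MCTS or a single random playout in the flat Monte-Carlo variant.

```python
import math

def sequential_halving(moves, budget, sample):
    """Sketch of Sequential Halving at the root; returns the surviving move.

    `sample(move)` performs one simulation after `move` and returns its
    payoff (a hypothetical interface, for illustration only).
    """
    stats = {m: [0.0, 0] for m in moves}   # move -> [reward sum, visits]
    survivors = list(moves)
    rounds = max(1, math.ceil(math.log2(len(moves))))
    while len(survivors) > 1:
        # Allocate an equal share of the budget to each surviving move.
        per_move = max(1, budget // (len(survivors) * rounds))
        for m in survivors:
            for _ in range(per_move):
                stats[m][0] += sample(m)
                stats[m][1] += 1
        # Keep the better half by empirical mean reward.
        survivors.sort(key=lambda m: stats[m][0] / stats[m][1], reverse=True)
        survivors = survivors[:(len(survivors) + 1) // 2]
    return survivors[0]
```

    Unlike UCB, which balances exploration against cumulative reward, Sequential Halving is a fixed-budget pure-exploration policy, which fits the root of the tree, where only the final move recommendation matters.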