A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
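The combination the abstract describes, tree search guided by random sampling, can be illustrated with a minimal UCT-style sketch on a toy one-player game (the counting game, target sum, iteration budget, and exploration constant below are illustrative assumptions, not from the survey):

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # (running total, moves left)
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.value = 0.0        # sum of rollout rewards

ACTIONS = (1, 2)
TARGET, MOVES = 8, 5            # reach a total of 8 in 5 moves of +1/+2

def is_terminal(state):
    return state[1] == 0

def step(state, action):
    return (state[0] + action, state[1] - 1)

def reward(state):
    return 1.0 if state[0] == TARGET else 0.0

def uct_select(node, c=1.4):
    # selection: pick the child maximizing the UCB1 score
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
                   + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(state):
    # simulation: random playout to a terminal state
    while not is_terminal(state):
        state = step(state, random.choice(ACTIONS))
    return reward(state)

def mcts(root_state, iters=2000):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1) selection: descend while the node is fully expanded
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            node = uct_select(node)
        # 2) expansion: add one untried child
        if not is_terminal(node.state):
            a = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[a] = Node(step(node.state, a), node)
            node = node.children[a]
        # 3) simulation
        r = rollout(node.state)
        # 4) backpropagation
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # recommend the most-visited root action
    return max(root.children, key=lambda a: root.children[a].visits)

random.seed(0)
print(mcts((0, MOVES)))
```

The four phases (selection, expansion, simulation, backpropagation) are the core loop the survey structures its taxonomy around; real applications replace the random rollout with domain-informed playout policies.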
Solving Games with Functional Regret Estimation
We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation to the regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
Comment: AAAI Conference on Artificial Intelligence 201
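As a rough illustration of the no-regret machinery this abstract builds on, here is a tabular regret-matching sketch in self-play on rock-paper-scissors; the paper's contribution replaces the tabular regret sums below with a learned function approximator, which this sketch does not attempt:

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b] = payoff to the player choosing a against b
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy(regrets):
    # regret matching: play actions in proportion to positive regret
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iters=20000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iters):
        strats = [strategy(regrets[p]) for p in range(2)]
        acts = [rng.choices(range(ACTIONS), weights=strats[p])[0]
                for p in range(2)]
        for p in range(2):
            me, opp = acts[p], acts[1 - p]
            u = PAYOFF[me][opp]
            for a in range(ACTIONS):
                # regret for not having played a instead of me
                regrets[p][a] += PAYOFF[a][opp] - u
                strat_sum[p][a] += strats[p][a]
    total = sum(strat_sum[0])
    return [s / total for s in strat_sum[0]]  # average strategy of player 0

avg = train()
print(avg)  # close to the uniform Nash equilibrium [1/3, 1/3, 1/3]
```

In self-play the *average* strategy of a regret minimizer converges to equilibrium, which is why the paper's regret bound yields a Nash-convergence corollary when the estimated regrets track the true ones.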
Helping AI to Play Hearthstone: AAIA'17 Data Mining Challenge
This paper summarizes the AAIA'17 Data Mining Challenge: Helping AI to Play Hearthstone, which was held between March 23 and May 15, 2017 at the Knowledge Pit platform. We briefly describe the scope and background of this competition in the context of a more general project related to the development of an AI engine for video games, called Grail. We also discuss the outcomes of this challenge and demonstrate how predictive models for the assessment of a player's winning chances can be utilized in the construction of an intelligent agent for playing Hearthstone. Finally, we show a few selected machine learning approaches for modeling state and action values in Hearthstone. We provide evaluation of a few promising solutions that may be used to create more advanced types of agents, especially in conjunction with Monte Carlo Tree Search algorithms.
Comment: Federated Conference on Computer Science and Information Systems (FedCSIS 2017), Prague, Czech Republic
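A minimal sketch of the general idea of driving an agent with a win-chance predictor: a greedy policy that scores each successor state with the model and plays the best-rated action. The numeric state, the `apply_action` transition, and the logistic `win_prob` stand-in are hypothetical placeholders, not the challenge's actual models:

```python
import math

def greedy_action(state, actions, apply_action, win_prob):
    # choose the action whose successor state the model rates highest
    return max(actions, key=lambda a: win_prob(apply_action(state, a)))

# toy stand-ins: state = a scalar board-advantage score;
# win_prob = a logistic link in place of a trained classifier
win_prob = lambda s: 1.0 / (1.0 + math.exp(-s))
apply_action = lambda s, a: s + a

print(greedy_action(0, [-2, 1, 3], apply_action, win_prob))  # → 3
```

The same evaluator can instead score leaf states inside an MCTS, which is the combination the paper points to for more advanced agents.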
Kernel Estimation and Model Combination in a Bandit Problem with Covariates
The multi-armed bandit problem is an important optimization game that requires an exploration-exploitation tradeoff to achieve optimal total reward. Motivated by industrial applications such as online advertising and clinical research, we consider a setting where the rewards of bandit machines are associated with covariates, and accurate estimation of the corresponding mean reward functions plays an important role in the performance of allocation rules. Under a flexible problem setup, we establish asymptotic strong consistency and perform a finite-time regret analysis for a sequential randomized allocation strategy based on kernel estimation. In addition, since many nonparametric and parametric methods in supervised learning may be applied to estimating the mean reward functions but guidance on how to choose among them is generally unavailable, we propose a model-combining allocation strategy for adaptive performance. Simulations and a real data evaluation are conducted to illustrate the performance of the proposed allocation strategy.
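A minimal sketch of covariate-dependent allocation in the spirit of this abstract: epsilon-greedy arm selection using Nadaraya-Watson kernel estimates of each arm's mean reward function. The two toy reward functions, the Gaussian kernel bandwidth, the noise level, and the epsilon-greedy rule itself are illustrative assumptions, not the paper's allocation strategy:

```python
import math, random

def nw_estimate(xs, ys, x, h=0.2):
    # Nadaraya-Watson kernel regression: weighted average of observed
    # rewards ys at covariates xs, weighted by a Gaussian kernel around x
    if not xs:
        return 0.0
    w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def run(rounds=2000, eps=0.1, seed=0):
    rng = random.Random(seed)
    means = [lambda x: x, lambda x: 1.0 - x]  # assumed true mean rewards
    data = [([], []), ([], [])]               # (covariates, rewards) per arm
    for _ in range(rounds):
        x = rng.random()                      # observe the covariate
        if rng.random() < eps:
            arm = rng.randrange(2)            # forced exploration
        else:
            # exploit: pull the arm with the higher estimated mean at x
            arm = max((0, 1), key=lambda a: nw_estimate(*data[a], x))
        r = means[arm](x) + rng.gauss(0.0, 0.1)
        data[arm][0].append(x)
        data[arm][1].append(r)
    return data

data = run()
# after learning, arm 0 should look better at large x and arm 1 at small x
print(nw_estimate(*data[0], 0.9) > nw_estimate(*data[1], 0.9))
```

The model-combining idea in the abstract would replace the single kernel estimator with a weighted mixture of several candidate regression methods, reweighted by their observed predictive performance.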