A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
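To make the combination of tree search and random sampling concrete, here is a minimal sketch of the UCT variant of MCTS. It is not taken from the survey: the toy game (a single-pile Nim where each player removes 1 to 3 stones and whoever takes the last stone wins), the `Node` class, the exploration constant `c=1.4`, and the iteration count are all assumptions chosen for illustration.

```python
import math
import random


class Node:
    """One search-tree node for the assumed toy game (single-pile Nim)."""

    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones    # stones left on the pile
        self.player = player    # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move        # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node


def uct_select(node, c=1.4):
    """Pick the child maximizing the UCB1 score (exploitation + exploration)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, player, rng):
    """Play uniformly random moves to the end; return the winning player."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player       # this player took the last stone
        player = 1 - player


def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried child, if any.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        if node.stones == 0:
            winner = 1 - node.player   # the player who just moved has won
        else:
            winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: credit each node from its parent's viewpoint.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    # Recommend the most-visited root move.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print("suggested move from 5 stones:", mcts(5))
```

Returning the most-visited child rather than the one with the highest win rate is a common design choice, since visit counts are less noisy than win rates for rarely sampled moves.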
Price of Competition and Dueling Games
We study competition in a general framework introduced by Immorlica et al.
and answer their main open question. Immorlica et al. considered classic
optimization problems in terms of competition and introduced a general class of
games called dueling games. They model this competition as a zero-sum game,
where two players are competing for a user's satisfaction. In their main and
most natural game, the ranking duel, a user requests a webpage by submitting a
query and players output an ordering over all possible webpages based on the
submitted query. The user tends to choose the ordering which displays her
requested webpage at a higher rank. The goal of each player is to maximize the
probability that her ordering beats that of her opponent and gets the user's
attention. Immorlica et al. show this game directs both players to provide
suboptimal search results. However, they leave the following as their main open
question: "does competition between algorithms improve or degrade expected
performance?" In this paper, we resolve this question for the ranking duel and
a more general class of dueling games.
More precisely, we study the quality of orderings in a competition between
two players. This game is a zero-sum game, and thus any Nash equilibrium of the
game can be described by minimax strategies. Let the value of the user for an
ordering be a function of the position of her requested item in the
corresponding ordering, and the social welfare for an ordering be the expected
value of the corresponding ordering for the user. We propose the price of
competition which is the ratio of the social welfare for the worst minimax
strategy to the social welfare obtained by a social planner. We use this
criterion for analyzing the quality of orderings in the ranking duel. We prove
the quality of minimax results is surprisingly close to that of the optimum
solution.
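The verbal definition of the price of competition given in this abstract can be written as a single ratio. The notation below is an assumption introduced for illustration and does not come from the paper: $\mathcal{M}$ denotes the set of minimax strategies, $\mathrm{SW}(\cdot)$ the social welfare of an ordering or strategy, $v$ the user's value as a function of position, and $\mathrm{pos}_{\pi}(q)$ the position of the item requested by query $q$ in ordering $\pi$.

```latex
\[
  \mathrm{SW}(\pi) \;=\; \mathbb{E}_{q}\!\left[\, v\big(\mathrm{pos}_{\pi}(q)\big) \right],
  \qquad
  \mathrm{PoC} \;=\;
  \frac{\displaystyle \min_{\sigma \in \mathcal{M}} \mathrm{SW}(\sigma)}
       {\displaystyle \max_{\pi} \mathrm{SW}(\pi)} .
\]
```

Under this reading the ratio is at most 1, and a value close to 1 corresponds to the abstract's claim that minimax results are close to the optimum.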
On Monte-Carlo tree search for deterministic games with alternate moves and complete information
We consider a deterministic game with alternate moves and complete
information, whose outcome is always the victory of one of the two
opponents. We assume that this game is the realization of a random model
enjoying some independence properties. We consider algorithms in the spirit of
Monte-Carlo Tree Search, to best estimate the minimax value of a given
position: it consists in simulating, successively, well-chosen matches,
starting from this position. We build an algorithm, which is optimal, step by
step, in some sense: once the first n matches are simulated, the algorithm
decides from the statistics furnished by the first n matches (and the a
priori we have on the game) how to simulate the (n+1)-th match in such a way
that the increase of information concerning the minimax value of the position
under study is maximal. This algorithm is remarkably quick. We prove that our
step by step optimal algorithm is not globally optimal and that it always
converges in a finite number of steps, even if the a priori we have on the game
is completely irrelevant. We finally test our algorithm, against MCTS, on
Pearl's game and, with a very simple and universal a priori, on the games
Connect Four and some variants. The numerical results are rather disappointing.
We however exhibit some situations in which our algorithm seems efficient.
Analysis of Dialogical Argumentation via Finite State Machines
Dialogical argumentation is an important cognitive activity by which agents
exchange arguments and counterarguments as part of some process such as
discussion, debate, persuasion and negotiation. Whilst numerous formal systems
have been proposed, there is a lack of frameworks for implementing and
evaluating these proposals. First-order executable logic has been proposed as a
general framework for specifying and analysing dialogical argumentation. In
this paper, we investigate how we can implement systems for dialogical
argumentation using propositional executable logic. Our approach is to present
and evaluate an algorithm that generates a finite state machine that reflects a
propositional executable logic specification for a dialogical argumentation
together with an initial state. We also consider how the finite state machines
can be analysed, with the minimax strategy being used as an illustration of the
kinds of empirical analysis that can be undertaken. Comment: 10 pages
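As a rough illustration of analysing a finite state machine with the minimax strategy, the sketch below runs minimax over a tiny acyclic transition table. Everything in it is hypothetical: the state names, the `TRANSITIONS` and `UTILITY` tables, and the convention that the two agents alternate moves starting with the maximizing agent. It does not reproduce the paper's generation algorithm or its executable logic.

```python
# Hypothetical dialogue FSM: each state maps to its successor states.
# States with no successors are terminal dialogue outcomes.
TRANSITIONS = {
    "start": ["claim_a", "claim_b"],
    "claim_a": ["counter_a1", "counter_a2"],
    "claim_b": ["counter_b1"],
    "counter_a1": [],
    "counter_a2": [],
    "counter_b1": [],
}

# Hypothetical payoff to the first agent at each terminal state.
UTILITY = {"counter_a1": 3, "counter_a2": 5, "counter_b1": 4}


def minimax(state, maximizing=True):
    """Return the minimax value of `state` for the first (maximizing) agent."""
    successors = TRANSITIONS[state]
    if not successors:
        return UTILITY[state]
    # The agents alternate, so the next level is evaluated for the other agent.
    values = [minimax(s, not maximizing) for s in successors]
    return max(values) if maximizing else min(values)


def best_move(state):
    """Return the successor the maximizing agent should choose at `state`."""
    return max(TRANSITIONS[state], key=lambda s: minimax(s, maximizing=False))


if __name__ == "__main__":
    print(best_move("start"), minimax("start"))
```

The recursion assumes the machine has no cycles; a machine with loops would need a depth bound or memoization with cycle detection.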
Free Energy and the Generalized Optimality Equations for Sequential Decision Making
The free energy functional has recently been proposed as a variational
principle for bounded rational decision-making, since it instantiates a natural
trade-off between utility gains and information processing costs that can be
axiomatically derived. Here we apply the free energy principle to general
decision trees that include both adversarial and stochastic environments. We
derive generalized sequential optimality equations that not only include the
Bellman optimality equations as a limit case, but also lead to well-known
decision-rules such as Expectimax, Minimax and Expectiminimax. We show how
these decision-rules can be derived from a single free energy principle that
assigns a resource parameter to each node in the decision tree. These resource
parameters express a concrete computational cost that can be measured as the
number of samples that are needed from the distribution that belongs to each
node. The free energy principle therefore provides the normative basis for
generalized optimality equations that account for both adversarial and
stochastic environments. Comment: 10 pages, 2 figures
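One way to see how a single resource parameter can recover several decision rules is the log-sum-exp certainty equivalent $\frac{1}{\beta}\log\sum_x p(x)\,e^{\beta U(x)}$, a standard form in the free energy literature. The sketch below assumes this form for a single node; the function name `certainty_equivalent`, the parameter name `beta`, and the example numbers are illustrative and are not taken from the abstract. As `beta` grows large and positive the value approaches the maximum utility, as it goes to zero it approaches the expectation, and as it grows large and negative it approaches the minimum.

```python
import math


def certainty_equivalent(utilities, probs, beta):
    """Soft aggregate of `utilities` under prior `probs` with resource `beta`.

    Assumed form: (1/beta) * log(sum_x p(x) * exp(beta * U(x))).
    beta -> +inf behaves like max, beta -> 0 like the expectation,
    beta -> -inf like min (over outcomes with nonzero probability).
    """
    if beta == 0:
        return sum(p * u for u, p in zip(utilities, probs))
    # Shift by the largest exponent for numerical stability (log-sum-exp).
    shift = max(beta * u for u in utilities)
    total = sum(p * math.exp(beta * u - shift) for u, p in zip(utilities, probs))
    return (shift + math.log(total)) / beta


if __name__ == "__main__":
    utilities = [1.0, 2.0, 4.0]
    probs = [0.5, 0.25, 0.25]
    for beta in (-200.0, 0.0, 200.0):
        print(beta, certainty_equivalent(utilities, probs, beta))
```

Applying such an operator at every node of a decision tree, with a separate `beta` per node, is one way to interpolate between adversarial, stochastic, and maximizing nodes in the spirit of Expectiminimax.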