13,530 research outputs found
Understanding reciprocity
This paper surveys the evolutionary game-theoretic literature on reciprocity in human interactions, dealing both with long-term relationships and with sporadic interactions. Four basic themes, repetition, commitment, assortation, and parochialism, appear repeatedly throughout the literature. Repetition can give rise to the evolution of behavior that exhibits reciprocity-like features, but a vast array of other behaviors are also stable. In sporadic interactions, reciprocity can be stable if the propensity to punish selfish actions can induce opportunists to cooperate, if reciprocators themselves behave opportunistically when they expect others to do so, or if matching is sufficiently assortative.
Keywords: Reciprocity, Evolution, Assortation, Commitment, Parochialism
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
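To make the combination of tree search and random sampling concrete, here is a minimal sketch of the four usual MCTS phases (selection, expansion, simulation, backpropagation). It is not the survey's own pseudocode: the UCT selection rule, the exploration constant 1.4, the toy game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins), and all names such as `Node`, `rollout`, and `mcts` are assumptions chosen for illustration.

```python
import math
import random


class Node:
    """One state of the toy game: `stones` remain and a player is about to move."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move  # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0  # wins for the player who moved INTO this node


def uct_select(node, c=1.4):
    """Pick the child maximizing the UCT score (assumed selection rule)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, rng):
    """Random playout; True if the player to move at `stones` ends up winning."""
    mover_is_start_player = True
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return mover_is_start_player
        mover_is_start_player = not mover_is_start_player
    return False  # no stones left: the previous player already took the last one


def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one previously untried child, if any.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random sampling from the new node.
        to_move_wins = rollout(node.stones, rng)
        # 4. Backpropagation: flip the winner's perspective at each level.
        mover_won = not to_move_wins
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1.0
            mover_won = not mover_won
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(mcts(5))
```

In this toy game, positions with a multiple of 4 stones are losing for the player to move, so from 5 stones the theoretically best move is to take 1. With enough iterations the search tends to concentrate its visits on that move, though the sketch makes no guarantee for a fixed iteration budget.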
The Update Equivalence Framework for Decision-Time Planning
The process of revising (or constructing) a policy immediately prior to
execution -- known as decision-time planning -- is key to achieving superhuman
performance in perfect-information settings like chess and Go. A recent line of
work has extended decision-time planning to more general imperfect-information
settings, leading to superhuman performance in poker. However, these methods
require considering subgames whose sizes grow quickly in the amount of
non-public information, making them unhelpful when the amount of non-public
information is large. Motivated by this issue, we introduce an alternative
framework for decision-time planning that is not based on subgames but rather
on the notion of update equivalence. In this framework, decision-time planning
algorithms simulate updates of synchronous learning algorithms. This framework
enables us to introduce a new family of principled decision-time planning
algorithms that do not rely on public information, opening the door to sound
and effective decision-time planning in settings with large amounts of
non-public information. In experiments, members of this family produce
comparable or superior results compared to state-of-the-art approaches in
Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.
Abstracting Imperfect Information Away from Two-Player Zero-Sum Games
In their seminal work, Nayyar et al. (2013) showed that imperfect information
can be abstracted away from common-payoff games by having players publicly
announce their policies as they play. This insight underpins sound solvers and
decision-time planning algorithms for common-payoff games. Unfortunately, a
naive application of the same insight to two-player zero-sum games fails
because Nash equilibria of the game with public policy announcements may not
correspond to Nash equilibria of the original game. As a consequence, existing
sound decision-time planning algorithms require complicated additional
mechanisms that have unappealing properties. The main contribution of this work
is showing that certain regularized equilibria do not possess the
aforementioned non-correspondence problem -- thus, computing them can be
treated as perfect information problems. Because these regularized equilibria
can be made arbitrarily close to Nash equilibria, our result opens the door to
a new perspective on solving two-player zero-sum games and, in particular,
yields a simplified framework for decision-time planning in two-player zero-sum
games, void of the unappealing properties that plague existing decision-time
planning approaches.