On Robustness Properties in Empirical Centroid Fictitious Play
Empirical Centroid Fictitious Play (ECFP) is a generalization of the
well-known Fictitious Play (FP) algorithm designed for implementation in
large-scale games. In ECFP, the set of players is subdivided into equivalence
classes with players in the same class possessing similar properties. Players
choose a next-stage action by tracking and responding to aggregate statistics
related to each equivalence class. This setup alleviates the difficult task of
tracking and responding to the statistical behavior of every individual player,
as is the case in traditional FP. Aside from ECFP, many useful modifications
have been proposed to classical FP, e.g., rules allowing for network-based
implementation, increased computational efficiency, and stronger forms of
learning. Such modifications tend to be of great practical value; however,
their effectiveness relies heavily on two fundamental properties of FP:
robustness to alterations in the empirical distribution step size process, and
robustness to best-response perturbations. The main contribution of the paper
is to show that similar robustness properties also hold for the ECFP algorithm.
This result serves as a first step in enabling practical modifications to ECFP,
similar to those already developed for FP.
Comment: Submitted for publication. Initial Submission: Mar. 201
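For orientation, the classical FP baseline that ECFP generalizes can be sketched as below. This is a minimal illustration and not the paper's ECFP algorithm: it assumes a two-player zero-sum matrix game, the standard 1/t empirical-frequency update, and exact best responses. The function name `fictitious_play`, the `steps` parameter, and the matching-pennies example are illustrative choices that do not come from the abstract.

```python
import numpy as np


def fictitious_play(payoff, steps=5000):
    """Classical two-player fictitious play on a zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player gets -payoff[i, j].
    Returns the empirical action frequencies of both players.
    """
    payoff = np.asarray(payoff, dtype=float)
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial actions (an assumption made for this sketch).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(steps):
        row_freq = row_counts / row_counts.sum()
        col_freq = col_counts / col_counts.sum()
        # Each player best-responds to the opponent's empirical distribution.
        row_action = np.argmax(payoff @ col_freq)   # row player maximizes
        col_action = np.argmin(row_freq @ payoff)   # column player minimizes
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


# Matching pennies: the unique Nash equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies)
```

In ECFP as the abstract describes it, each player would respond to aggregate statistics of each equivalence class rather than to every individual opponent's empirical distribution. The two robustness properties discussed concern changes to the step size of the frequency update and perturbations of the best-response step.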
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
A Unified View of Large-scale Zero-sum Equilibrium Computation
The task of computing approximate Nash equilibria in large zero-sum
extensive-form games has received a tremendous amount of attention due mainly
to the Annual Computer Poker Competition. Immediately after its inception, two
competing and seemingly different approaches emerged---one an application of
no-regret online learning, the other a sophisticated gradient method applied to
a convex-concave saddle-point formulation. Since then, both approaches have
grown in relative isolation, with advancements on one side not affecting the
other. In this paper, we rectify this by dissecting and, in a sense, unifying
the two views.
Comment: AAAI Workshop on Computer Poker and Imperfect Information
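As a point of reference, the convex-concave saddle-point formulation mentioned above is commonly written as a bilinear problem over the players' strategy sets. The abstract does not define any symbols, so x, y, A, X, and Y below are assumptions: x and y stand for the two players' (sequence-form) strategies, A for the payoff matrix, and X and Y for the convex sets of valid strategies.

```latex
\min_{x \in \mathcal{X}} \; \max_{y \in \mathcal{Y}} \; x^{\top} A \, y
```

Under this reading, gradient methods attack the saddle point directly, while no-regret learners have each player minimize regret against the other and use the averaged strategies as an approximate solution.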