Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Holdem,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
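To make the fictitious-play ingredient mentioned in the abstract above concrete, here is a minimal sketch of classical tabular fictitious play on a two-player zero-sum matrix game. This is not NFSP: there are no neural networks, no reinforcement learning, and no imperfect information. The function name `fictitious_play`, the payoff-matrix convention (entries are the row player's payoffs, the column player receives the negation), and the tie-breaking rule (lowest index wins) are assumptions made for illustration.

```python
def fictitious_play(payoff, steps):
    """Classical fictitious play on a zero-sum matrix game.

    payoff[a][b] is the row player's payoff; the column player gets -payoff[a][b].
    Each round, both players best-respond to the opponent's empirical action counts.
    Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(steps):
        # Value of each row action against the column player's past play.
        row_values = [
            sum(payoff[a][b] * col_counts[b] for b in range(n_cols))
            for a in range(n_rows)
        ]
        # Value of each column action against the row player's past play.
        col_values = [
            -sum(payoff[a][b] * row_counts[a] for a in range(n_rows))
            for b in range(n_cols)
        ]
        # Best responses; max() returns the lowest index on ties.
        a = max(range(n_rows), key=row_values.__getitem__)
        b = max(range(n_cols), key=col_values.__getitem__)
        row_counts[a] += 1
        col_counts[b] += 1
    return ([c / steps for c in row_counts], [c / steps for c in col_counts])


# Matching pennies: the unique Nash equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies, 10_000)
print(row_freq, col_freq)
```

In zero-sum games the empirical frequencies of fictitious play are known to converge to a Nash equilibrium, so both printed frequency vectors should drift towards an even mix as `steps` grows.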
On the Convergence Time of the Best Response Dynamics in Player-specific Congestion Games
We study the convergence time of the best response dynamics in
player-specific singleton congestion games. It is well known that this dynamics
can cycle, although from every state a short sequence of best responses to a
Nash equilibrium exists. Thus, the random best response dynamics, which selects
the next player to play a best response uniformly at random, terminates in a
Nash equilibrium with probability one. In this paper, we are interested in the
expected number of best responses until the random best response dynamics
terminates.
As a first step towards this goal, we consider games in which each player can
choose between only two resources. These games have a natural representation as
(multi-)graphs by identifying nodes with resources and edges with players. For
the class of games that can be represented as trees, we show that the
best-response dynamics cannot cycle and that it terminates after O(n^2) steps
where n denotes the number of resources. For the class of games represented as
cycles, we show that the best response dynamics can cycle. However, we also
show that the random best response dynamics terminates after O(n^2) steps in
expectation.
Additionally, we conjecture that in general player-specific singleton
congestion games there exists no polynomial upper bound on the expected number
of steps until the random best response dynamics terminates. We support our
conjecture by presenting a family of games for which simulations indicate a
super-polynomial convergence time.
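The random best response dynamics described in the abstract above can be simulated along the following lines. This is a sketch under stated assumptions, not the authors' experimental code: the function name `random_best_response`, the cost-function signature `cost(player, resource, load)`, and the step budget are hypothetical. It also draws the next player uniformly from those who currently have an improving move, and counts only actual strategy changes, which may differ from the counting convention used in the paper. Each player is an edge between two distinct resources, as in the graph representation.

```python
import random


def random_best_response(edges, cost, state, rng, max_steps=100_000):
    """Simulate random best response dynamics in a two-resource singleton game.

    edges[i]  -- (r0, r1), the two distinct resources available to player i
    cost      -- cost(i, r, load): player-specific cost of resource r at that load
    state[i]  -- resource initially chosen by player i
    Returns (final_state, number_of_moves).
    """
    state = list(state)
    load = {}
    for r in state:
        load[r] = load.get(r, 0) + 1

    def improving_move(i):
        # With two resources, the best response is the other resource
        # exactly when switching strictly lowers player i's cost.
        cur = state[i]
        other = edges[i][0] if edges[i][1] == cur else edges[i][1]
        if cost(i, other, load.get(other, 0) + 1) < cost(i, cur, load[cur]):
            return other
        return None

    moves = 0
    while moves < max_steps:
        unhappy = [i for i in range(len(edges)) if improving_move(i) is not None]
        if not unhappy:
            return state, moves  # no player can improve: Nash equilibrium
        i = rng.choice(unhappy)  # assumption: uniform over improving players
        target = improving_move(i)
        load[state[i]] -= 1
        load[target] = load.get(target, 0) + 1
        state[i] = target
        moves += 1
    raise RuntimeError("step budget exhausted before reaching an equilibrium")


# A path on three resources (a tree): player 0 uses {0, 1}, player 1 uses {1, 2}.
path_edges = [(0, 1), (1, 2)]
final_state, moves = random_best_response(
    path_edges, lambda i, r, load: (1 + i) * load, [1, 1], random.Random(0)
)
print(final_state, moves)
```

Running the function over many seeds and growing instance sizes gives an empirical estimate of the expected number of moves, which is the quantity the abstract bounds for trees and cycles.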
Selfish Network Creation with Non-Uniform Edge Cost
Network creation games investigate complex networks from a game-theoretic
point of view. Based on the original model by Fabrikant et al. [PODC'03] many
variants have been introduced. However, almost all versions have the drawback
that edges are treated uniformly, i.e., every edge has the same cost, and that
this common parameter heavily influences the outcomes and the analysis of these
games.
We propose and analyze simple and natural parameter-free network creation
games with non-uniform edge cost. Our models are inspired by social networks
where the cost of forming a link is proportional to the popularity of the
targeted node. Besides results on the complexity of computing a best response
and on various properties of the sequential versions, we show that the most
general version of our model has constant Price of Anarchy. To the best of our
knowledge, this is the first proof of a constant Price of Anarchy for any
network creation game.
Comment: To appear at SAGT'1
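As a rough illustration of a non-uniform edge cost, the sketch below computes one agent's cost when each bought edge is priced by the popularity of its target. Several details are assumptions rather than facts from the abstract: popularity is taken to be the target's degree in the resulting network (including the bought edge itself), the proportionality constant is 1, and the distance term is the sum of shortest-path distances to all other nodes, as in the original model by Fabrikant et al. The function name `agent_cost` and the `owned` data layout are hypothetical.

```python
from collections import deque


def agent_cost(owned, u):
    """Cost of agent u under an assumed degree-based edge price.

    owned[v] is the set of nodes that agent v buys an edge to; every node
    must appear as a key. Edges are undirected once bought.
    """
    # Build the undirected network created by all purchases.
    adj = {v: set() for v in owned}
    for v, targets in owned.items():
        for w in targets:
            adj[v].add(w)
            adj[w].add(v)

    # Edge cost: degree of each node that u buys an edge to (assumption).
    edge_cost = sum(len(adj[w]) for w in owned[u])

    # Distance cost: breadth-first search from u.
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if len(dist) < len(adj):
        return float("inf")  # some node is unreachable from u
    return edge_cost + sum(dist.values())


# A star: leaves 1, 2, 3 each buy an edge to the center 0.
star = {0: set(), 1: {0}, 2: {0}, 3: {0}}
print(agent_cost(star, 1), agent_cost(star, 0))
```

In the star, a leaf pays 3 for its edge to the degree-3 center plus distances 1 + 2 + 2, while the center buys nothing and pays only its distance cost of 3. Linking to a popular node is expensive but keeps distances short, which is the trade-off the degree-based price creates.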
Bayesian Quadratic Network Game Filters
A repeated network game where agents have quadratic utilities that depend on
information externalities -- an unknown underlying state -- as well as payoff
externalities -- the actions of all other agents in the network -- is
considered. Agents play Bayesian Nash Equilibrium strategies with respect to
their beliefs on the state of the world and the actions of all other nodes in
the network. These beliefs are refined over subsequent stages based on the
observed actions of neighboring peers. This paper introduces the Quadratic
Network Game (QNG) filter that agents can run locally to update their beliefs,
select corresponding optimal actions, and eventually learn a sufficient
statistic of the network's state. The QNG filter is demonstrated on a Cournot
market competition game and a coordination game to implement navigation of an
autonomous team.