
    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
    Comment: updated version, incorporating conference feedback
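    The structural idea behind NFSP is that each agent runs two learning processes: a reinforcement-learned best response and a supervised average policy that imitates the best-response actions, mixed via an anticipatory parameter. The following is a minimal tabular sketch of that two-policy structure on rock-paper-scissors; the hyperparameters, the epsilon-greedy exploration, and the bandit-style value update are our own illustrative choices, not the paper's deep-network setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-sum payoff matrix for the row player in rock-paper-scissors.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

n_actions = 3
eta = 0.1      # anticipatory parameter: probability of playing the best response
alpha = 0.05   # step size for the bandit-style value update (illustrative)

class NFSPAgent:
    def __init__(self):
        self.q = np.zeros(n_actions)      # RL estimate of best-response values
        self.counts = np.ones(n_actions)  # supervised average-policy statistics

    def act(self):
        if rng.random() < eta:            # play an (epsilon-greedy) best response ...
            if rng.random() < 0.1:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(self.q))
            self.counts[a] += 1           # ... and let the average policy imitate it
        else:                             # otherwise play the average policy
            a = int(rng.choice(n_actions, p=self.counts / self.counts.sum()))
        return a

    def update(self, a, reward):
        self.q[a] += alpha * (reward - self.q[a])

agents = [NFSPAgent(), NFSPAgent()]
for _ in range(100_000):
    a0, a1 = agents[0].act(), agents[1].act()
    r = PAYOFF[a0, a1]
    agents[0].update(a0, r)
    agents[1].update(a1, -r)

# The average policies should drift toward the uniform Nash equilibrium.
print(agents[0].counts / agents[0].counts.sum())
```

    In this toy the printed average policy approaches the uniform mixture (the Nash equilibrium of rock-paper-scissors), while a pure best-response learner against itself would cycle, which mirrors the divergence contrast described in the abstract.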

    On the Convergence Time of the Best Response Dynamics in Player-specific Congestion Games

    We study the convergence time of the best response dynamics in player-specific singleton congestion games. It is well known that these dynamics can cycle, although from every state there exists a short sequence of best responses leading to a Nash equilibrium. Thus, the random best response dynamics, which selects the next player to play a best response uniformly at random, terminates in a Nash equilibrium with probability one. In this paper, we are interested in the expected number of best responses until the random best response dynamics terminates. As a first step towards this goal, we consider games in which each player can choose between only two resources. These games have a natural representation as (multi-)graphs obtained by identifying nodes with resources and edges with players. For the class of games that can be represented as trees, we show that the best response dynamics cannot cycle and that it terminates after O(n^2) steps, where n denotes the number of resources. For the class of games represented as cycles, we show that the best response dynamics can cycle. However, we also show that the random best response dynamics terminates after O(n^2) steps in expectation. Additionally, we conjecture that in general player-specific singleton congestion games there exists no polynomial upper bound on the expected number of steps until the random best response dynamics terminates. We support our conjecture by presenting a family of games for which simulations indicate a super-polynomial convergence time.
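    To make the setting concrete, the sketch below simulates the random best response dynamics on the cycle representation the abstract describes: resources are nodes of a cycle, players are edges, and each player has its own nondecreasing cost function on its two resources. The random cost lists and the convention of sampling uniformly among players that currently have an improving move are our own simplifications of the dynamics studied in the paper.

```python
import random

random.seed(1)

n = 12                                          # resources, arranged in a cycle
players = [(i, (i + 1) % n) for i in range(n)]  # players as edges of the cycle

# Player-specific nondecreasing cost functions: cost[i][r][load] is the
# cost player i experiences on resource r when 'load' players use r.
cost = []
for u, v in players:
    cost.append({r: sorted(random.uniform(0, 1) for _ in range(n + 1))
                 for r in (u, v)})

strategy = [u for u, v in players]              # initial choices
load = [0] * n
for r in strategy:
    load[r] += 1

def best_response(i):
    u, v = players[i]
    cur = strategy[i]
    alt = v if cur == u else u
    stay = cost[i][cur][load[cur]]              # load[cur] already counts player i
    move = cost[i][alt][load[alt] + 1]          # moving adds player i to alt
    return alt if move < stay else cur

steps = 0
while True:
    improvers = [i for i in range(len(players)) if best_response(i) != strategy[i]]
    if not improvers:
        break                                   # a Nash equilibrium has been reached
    i = random.choice(improvers)                # pick the next mover at random
    new = best_response(i)
    load[strategy[i]] -= 1
    load[new] += 1
    strategy[i] = new
    steps += 1

print(f"Nash equilibrium reached after {steps} best responses")
```

    On cycle instances like this one, the paper's O(n^2) expected bound suggests the step count should stay modest; running the sketch with larger n gives a quick empirical feel for that growth.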

    Selfish Network Creation with Non-Uniform Edge Cost

    Network creation games investigate complex networks from a game-theoretic point of view. Based on the original model by Fabrikant et al. [PODC'03], many variants have been introduced. However, almost all versions share the drawback that edges are treated uniformly, i.e., every edge has the same cost, and this common parameter heavily influences the outcomes and the analysis of these games. We propose and analyze simple and natural parameter-free network creation games with non-uniform edge cost. Our models are inspired by social networks, where the cost of forming a link is proportional to the popularity of the targeted node. Besides results on the complexity of computing a best response and on various properties of the sequential versions, we show that the most general version of our model has a constant Price of Anarchy. To the best of our knowledge, this is the first proof of a constant Price of Anarchy for any network creation game.
    Comment: To appear at SAGT'17
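    A minimal sketch of the degree-price idea: buying an edge to node v is priced at v's popularity (here, its degree), and an agent's total cost is its edge expenditure plus the sum of its distances to all nodes. The exact cost split, the disconnection penalty, and the function names below are our own illustrative assumptions, not necessarily the paper's precise model.

```python
from collections import deque

def distances(adj, s):
    """BFS distances from s in an undirected graph given as adjacency sets."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def agent_cost(adj, degree, u, bought):
    """Edge expenditure (degree of each targeted node) plus sum of distances."""
    edge_cost = sum(degree[v] for v in bought)
    dist = distances(adj, u)
    usage_cost = sum(dist.get(v, len(adj)) for v in adj)  # big penalty if disconnected
    return edge_cost + usage_cost

# Example: on the path 0-1-2-3, agent 0 owns the edge to node 1 and
# considers also buying an edge to node 3 (priced at node 3's degree).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
degree = {v: len(adj[v]) for v in adj}

before = agent_cost(adj, degree, 0, bought={1})
adj[0].add(3)
adj[3].add(0)
after = agent_cost(adj, degree, 0, bought={1, 3})
print(before, after)  # 8 vs. 7: the cheap edge to a low-degree node pays off
```

    Even this tiny example shows why non-uniform prices change incentives: under degree pricing, links to peripheral nodes are cheap while links to hubs are expensive, which is exactly the social-network intuition the abstract invokes.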

    Bayesian Quadratic Network Game Filters

    We consider a repeated network game in which agents have quadratic utilities that depend on information externalities (an unknown underlying state) as well as payoff externalities (the actions of all other agents in the network). Agents play Bayesian Nash equilibrium strategies with respect to their beliefs on the state of the world and the actions of all other nodes in the network. These beliefs are refined over subsequent stages based on the observed actions of neighboring peers. This paper introduces the Quadratic Network Game (QNG) filter, which agents can run locally to update their beliefs, select corresponding optimal actions, and eventually learn a sufficient statistic of the network's state. The QNG filter is demonstrated on a Cournot market competition game and on a coordination game that implements navigation of an autonomous team.
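    The following toy sketch captures the flavor of such a game: agents with quadratic utilities trade off tracking the unknown state (information externality) against coordinating with their neighbors (payoff externality), and refine their state estimates from observed neighbor actions. The specific utility, the ring network, and the naive averaging update are illustrative simplifications, not the paper's QNG filter recursions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic utility (illustrative):
#   u_i(a) = -(a_i - theta)^2 - lam * (a_i - mean of neighbors' actions)^2
theta, lam, n = 1.0, 0.5, 6

ring = np.zeros((n, n), dtype=bool)      # neighbors on a ring network
for i in range(n):
    ring[i, (i - 1) % n] = ring[i, (i + 1) % n] = True

estimates = theta + 0.4 * rng.standard_normal(n)  # private noisy signals
actions = estimates.copy()                        # first-stage play

for stage in range(5):
    new_actions = np.empty(n)
    new_estimates = np.empty(n)
    for i in range(n):
        nbr_mean = actions[ring[i]].mean()
        # Myopic best response of the quadratic utility:
        #   argmax_a -(a - E_i[theta])^2 - lam * (a - nbr_mean)^2
        new_actions[i] = (estimates[i] + lam * nbr_mean) / (1 + lam)
        # Naive belief refinement: pool own estimate with neighbors'
        # observed actions, which here carry their information.
        new_estimates[i] = np.mean(np.append(actions[ring[i]], estimates[i]))
    actions, estimates = new_actions, new_estimates
    print(f"stage {stage}: estimation error {np.abs(estimates - theta).mean():.4f}")
```

    The printed estimation error shrinks over stages as information diffuses through the ring, which is the qualitative behavior the QNG filter formalizes with proper Bayesian updates.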