    On minmax theorems for multiplayer games

    We prove a generalization of von Neumann's minmax theorem to the class of separable multiplayer zero-sum games, introduced in [Bregman and Fokin 1998]. These games are polymatrix (that is, graphical games in which every edge is a two-player game between its endpoints), and every outcome has zero total sum of players' payoffs. Our generalization of the minmax theorem implies convexity of equilibria, polynomial-time tractability, and convergence of no-regret learning algorithms to Nash equilibria. Given that computing Nash equilibria in 3-player zero-sum games is already PPAD-complete, this class of games, i.e. those with pairwise separable utility functions, defines essentially the broadest class of multiplayer constant-sum games to which we can hope to push tractability results. Our result is obtained by establishing a certain game-class collapse, showing that separable constant-sum games are payoff equivalent to pairwise constant-sum polymatrix games (polymatrix games in which all edges are constant-sum games), and invoking a recent result of [Daskalakis, Papadimitriou 2009] for these games. We also explore generalizations to classes of non-constant-sum multiplayer games. A natural candidate is polymatrix games with strictly competitive games on their edges. In the two-player setting, such games are minmax solvable, and recent work has shown that they are merely affine transformations of zero-sum games [Adler, Daskalakis, Papadimitriou 2009]. Surprisingly, we show that a polymatrix game comprising strictly competitive games on its edges is PPAD-complete to solve, proving a striking difference in the complexity of networks of zero-sum and strictly competitive games. Finally, we look at the role of coordination in networked interactions, studying the complexity of polymatrix games with a mixture of coordination and zero-sum games.
We show that finding a pure Nash equilibrium in coordination-only polymatrix games is PLS-complete; hence, computing a mixed Nash equilibrium is in PLS ∩ PPAD, but it remains open whether the problem is in P. If, on the other hand, coordination and zero-sum games are combined, we show that the problem becomes PPAD-complete, establishing that coordination and zero-sum games achieve the full generality of PPAD.
National Science Foundation (U.S.) (CAREER Award CCF-0953960); Alfred P. Sloan Foundation (Fellowship
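The distinction between "zero-sum overall" and "constant-sum on every edge" can be made concrete with a small sketch. The representation below is an assumption made for illustration, not taken from the paper: a game is a dictionary mapping a directed pair `(u, v)` to the matrix of payoffs that player `u` receives from its interaction with `v`. The function names and both example games are hypothetical.

```python
from itertools import product

def player_payoffs(edges, profile):
    """Payoff of every player: the sum of its payoffs over incident edges."""
    payoffs = [0.0] * len(profile)
    for (u, v), matrix in edges.items():
        payoffs[u] += matrix[profile[u]][profile[v]]
    return payoffs

def is_zero_sum(edges, num_actions):
    """True if the players' payoffs sum to zero in every pure outcome."""
    profiles = product(*(range(k) for k in num_actions))
    return all(abs(sum(player_payoffs(edges, p))) < 1e-9 for p in profiles)

def is_pairwise_constant_sum(edges):
    """True if, on every edge, the two endpoints' payoffs sum to a constant."""
    for (u, v), m_uv in edges.items():
        m_vu = edges[(v, u)]  # assumes both directions of each edge are stored
        sums = {m_uv[a][b] + m_vu[b][a]
                for a in range(len(m_uv)) for b in range(len(m_vu))}
        if len(sums) > 1:
            return False
    return True

PENNIES = [[1, -1], [-1, 1]]      # payoff of the "matching" player
NEG_PENNIES = [[-1, 1], [1, -1]]  # payoff of the opponent (the matrix is symmetric)
ZERO = [[0, 0], [0, 0]]

# Hypothetical example 1: a triangle with matching pennies on every edge.
triangle = {
    (0, 1): PENNIES, (1, 0): NEG_PENNIES,
    (1, 2): PENNIES, (2, 1): NEG_PENNIES,
    (2, 0): PENNIES, (0, 2): NEG_PENNIES,
}

# Hypothetical example 2: player 0 earns its own action index on edge (0, 1),
# and player 2 loses the same amount on edge (2, 0).
separable = {
    (0, 1): [[0, 0], [1, 1]], (1, 0): ZERO,
    (2, 0): [[0, -1], [0, -1]], (0, 2): ZERO,
}

print(is_zero_sum(triangle, [2, 2, 2]), is_pairwise_constant_sum(triangle))
print(is_zero_sum(separable, [2, 2, 2]), is_pairwise_constant_sum(separable))
```

The second game is zero-sum in every outcome even though edge (0, 1) is not constant-sum. Moving player 0's gain onto edge (0, 2) would give every player the same payoffs with all edges zero-sum, which is the kind of payoff equivalence the abstract describes.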

    Smoothed Efficient Algorithms and Reductions for Network Coordination Games

    Worst-case hardness results for most equilibrium computation problems have raised the need for beyond-worst-case analysis. To this end, we study the smoothed complexity of finding pure Nash equilibria in Network Coordination Games, a PLS-complete problem in the worst case. This is a potential game where the sequential-better-response algorithm is known to converge to a pure NE, albeit in exponential time. First, we prove polynomial (resp. quasi-polynomial) smoothed complexity when the underlying game graph is a complete (resp. arbitrary) graph, and every player has constantly many strategies. We note that the complete graph case is reminiscent of perturbing all parameters, a common assumption in most known smoothed analysis results. Second, we define a notion of smoothness-preserving reduction among search problems, and obtain reductions from 2-strategy network coordination games to local-max-cut, and from $k$-strategy games (with arbitrary $k$) to local-max-cut up to two flips. The former together with the recent result of [BCC18] gives an alternate $O(n^8)$-time smoothed algorithm for the 2-strategy case. This notion of reduction allows for the extension of smoothed efficient algorithms from one problem to another. For the first set of results, we develop techniques to bound the probability that an (adversarial) better-response sequence makes slow improvements on the potential. Our approach combines and generalizes the local-max-cut approaches of [ER14,ABPW17] to handle the multi-strategy case: it requires a careful definition of the matrix which captures the increase in potential, a tighter union bound on adversarial sequences, and balancing it with good enough rank bounds. We believe that the approach and notions developed herein could be of interest in addressing the smoothed complexity of other potential and/or congestion games.
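The potential-function argument behind sequential better response can be sketched as follows. This is a minimal illustration, not the paper's smoothed model: payoffs are random integers (an assumption that keeps the arithmetic exact), the graph is complete, and all function names are hypothetical. Both endpoints of an edge receive the same payoff, so the sum of edge payoffs is a potential that rises by exactly the deviating player's gain.

```python
import random
from itertools import combinations

def random_coordination_game(num_players, num_actions, rng):
    """One integer payoff matrix per edge of the complete graph, shared by both endpoints."""
    return {
        (u, v): [[rng.randint(-10, 10) for _ in range(num_actions)]
                 for _ in range(num_actions)]
        for u, v in combinations(range(num_players), 2)
    }

def potential(edges, profile):
    """Sum of the edge payoffs; each edge is counted once."""
    return sum(m[profile[u]][profile[v]] for (u, v), m in edges.items())

def payoff(edges, profile, player):
    """A player's payoff: the sum over its incident edges."""
    return sum(m[profile[u]][profile[v]]
               for (u, v), m in edges.items() if player in (u, v))

def improving_move(edges, profile, num_actions):
    """Return (player, action) for some strictly improving deviation, or None at a pure NE."""
    for player in range(len(profile)):
        current = payoff(edges, profile, player)
        for action in range(num_actions):
            trial = profile[:player] + [action] + profile[player + 1:]
            if payoff(edges, trial, player) > current:
                return player, action
    return None

def better_response(edges, profile, num_actions):
    """Apply improving moves one at a time, recording the potential after each."""
    profile = list(profile)
    potentials = [potential(edges, profile)]
    while (move := improving_move(edges, profile, num_actions)) is not None:
        player, action = move
        profile[player] = action
        potentials.append(potential(edges, profile))
    return profile, potentials

rng = random.Random(0)
game = random_coordination_game(num_players=5, num_actions=3, rng=rng)
final_profile, potentials = better_response(game, [0] * 5, num_actions=3)
print(final_profile, potentials)
```

Because the potential strictly increases and there are finitely many profiles, the loop terminates at a pure Nash equilibrium. The number of steps it takes is the quantity that the smoothed analysis bounds.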

    Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence

    Learning agents that can not only take tests but also innovate are becoming a hot topic in AI. One of the most promising paths towards this vision is multi-agent learning, where agents act as the environment for each other, and improving each agent means proposing new problems for others. However, existing evaluation platforms are either not compatible with multi-agent settings, or limited to a specific game. That is, there is not yet a general evaluation platform for research on multi-agent intelligence. To this end, we introduce Arena, a general evaluation platform for multi-agent intelligence with 35 games of diverse logics and representations. Furthermore, multi-agent intelligence is still at the stage where many problems remain unexplored. Therefore, we provide a building toolkit for researchers to easily invent and build novel multi-agent problems from the provided game set based on a GUI-configurable social tree and five basic multi-agent reward schemes. Finally, we provide Python implementations of five state-of-the-art deep multi-agent reinforcement learning baselines. Along with the baseline implementations, we release a set of 100 best agents/teams that we can train with different training schemes for each game, as the base for evaluating agents with population performance. As such, the research community can perform comparisons under a stable and uniform standard. All the implementations and accompanying tutorials have been open-sourced for the community at https://sites.google.com/view/arena-unity/

    Approximating Nash Equilibria in Tree Polymatrix Games

    We develop a quasi-polynomial time Las Vegas algorithm for approximating Nash equilibria in polymatrix games over trees, under a mild renormalizing assumption. Our result, in particular, leads to an expected polynomial-time algorithm for computing approximate Nash equilibria of tree polymatrix games in which the number of actions per player is a fixed constant. Further, for trees with constant degree, the running time of the algorithm matches the best known upper bound for approximating Nash equilibria in bimatrix games (Lipton, Markakis, and Mehta 2003). Notably, this work closely complements the hardness result of Rubinstein (2015), which establishes the inapproximability of Nash equilibria in polymatrix games over constant-degree bipartite graphs with two actions per player.

    Computing Approximate Nash Equilibria in Polymatrix Games

    In an $\epsilon$-Nash equilibrium, a player can gain at most $\epsilon$ by unilaterally changing his behaviour. For two-player (bimatrix) games with payoffs in $[0,1]$, the best-known $\epsilon$ achievable in polynomial time is 0.3393. In general, for $n$-player games an $\epsilon$-Nash equilibrium can be computed in polynomial time for an $\epsilon$ that is an increasing function of $n$ but does not depend on the number of strategies of the players. For three-player and four-player games the corresponding values of $\epsilon$ are 0.6022 and 0.7153, respectively. Polymatrix games are a restriction of general $n$-player games where a player's payoff is the sum of payoffs from a number of bimatrix games. There exists a very small but constant $\epsilon$ such that computing an $\epsilon$-Nash equilibrium of a polymatrix game is PPAD-hard. Our main result is that a $(0.5+\delta)$-Nash equilibrium of an $n$-player polymatrix game can be computed in time polynomial in the input size and $\frac{1}{\delta}$. Inspired by the algorithm of Tsaknakis and Spirakis, our algorithm uses gradient descent on the maximum regret of the players. We also show that this algorithm can be applied to efficiently find a $(0.5+\delta)$-Nash equilibrium in a two-player Bayesian game.
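The quantity being minimized, the maximum regret over players, can be evaluated directly for a given mixed-strategy profile. The sketch below only computes that objective; it does not implement the gradient-descent algorithm. The game representation (a dictionary from a directed pair `(u, v)` to the matrix of payoffs that `u` receives from `v`) and all names are assumptions for illustration. The example uses matching pennies with payoffs in $[-1,1]$ rather than the normalized range $[0,1]$.

```python
def action_values(edges, strategies, player):
    """Expected payoff of each pure action of `player` against the others' mixed strategies."""
    values = [0.0] * len(strategies[player])
    for (u, v), matrix in edges.items():
        if u != player:
            continue
        for a in range(len(values)):
            values[a] += sum(matrix[a][b] * strategies[v][b]
                             for b in range(len(strategies[v])))
    return values

def regret(edges, strategies, player):
    """Gain available to `player` from switching to its best pure action."""
    values = action_values(edges, strategies, player)
    expected = sum(p * val for p, val in zip(strategies[player], values))
    return max(values) - expected

def nash_epsilon(edges, strategies):
    """Smallest epsilon for which the profile is an epsilon-Nash equilibrium."""
    return max(regret(edges, strategies, i) for i in range(len(strategies)))

# Hypothetical example: a single edge carrying matching pennies.
pennies = {
    (0, 1): [[1, -1], [-1, 1]],   # player 0 wants to match
    (1, 0): [[-1, 1], [1, -1]],   # player 1 wants to mismatch
}

print(nash_epsilon(pennies, [[0.5, 0.5], [0.5, 0.5]]))    # exact equilibrium
print(nash_epsilon(pennies, [[0.75, 0.25], [0.5, 0.5]]))  # player 1 can exploit the bias
print(nash_epsilon(pennies, [[1.0, 0.0], [1.0, 0.0]]))    # pure profile
```

A descent method in the spirit of the abstract would repeatedly adjust `strategies` to reduce the value returned by `nash_epsilon`, stopping once it falls below the target.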

    Cycles in adversarial regularized learning

    Regularized learning is a fundamental technique in online optimization, machine learning and many other fields of computer science. A natural question that arises in these settings is how regularized learning algorithms behave when pitted against each other. We study a natural formulation of this problem by coupling regularized learning dynamics in zero-sum games. We show that the system's behavior is Poincaré recurrent, implying that almost every trajectory revisits any (arbitrarily small) neighborhood of its starting point infinitely often. This cycling behavior is robust to the agents' choice of regularization mechanism (each agent could be using a different regularizer), to positive-affine transformations of the agents' utilities, and it also persists in the case of networked competition, i.e., for zero-sum polymatrix games.
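The failure to converge can be observed numerically in a simple case. The sketch below swaps in discrete-time multiplicative weights, which is entropic regularization with a fixed step size, as a stand-in for the continuous-time dynamics; the game (matching pennies), the step size, and the starting point are assumptions chosen for illustration. It tracks the KL divergence from the uniform equilibrium to the current strategies. For replicator dynamics in continuous time that divergence is conserved in a zero-sum game with an interior equilibrium, while in this discretization it never decreases.

```python
import math

# Row player's payoffs in matching pennies; the column player receives the negative.
A = [[1.0, -1.0], [-1.0, 1.0]]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def mwu_step(x, y, eta):
    """One simultaneous multiplicative-weights update for both players."""
    ux = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
    uy = [-sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
    new_x = normalize([x[i] * math.exp(eta * ux[i]) for i in range(2)])
    new_y = normalize([y[j] * math.exp(eta * uy[j]) for j in range(2)])
    return new_x, new_y

def kl_from_equilibrium(x, y):
    """Sum over both players of KL(uniform || current strategy)."""
    return sum(0.5 * math.log(0.5 / p) for p in x + y)

def simulate(x, y, eta, steps):
    divergences = [kl_from_equilibrium(x, y)]
    for _ in range(steps):
        x, y = mwu_step(x, y, eta)
        divergences.append(kl_from_equilibrium(x, y))
    return x, y, divergences

x, y, divergences = simulate([0.6, 0.4], [0.5, 0.5], eta=0.05, steps=500)
print(divergences[0], divergences[-1])
```

Since the divergence from equilibrium does not shrink, the strategies keep orbiting the equilibrium instead of settling on it, which is consistent with the cycling behavior described in the abstract.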