Search CORE

3,815 research outputs found

Matrix Multiplicative Weights Updates in Quantum Zero-Sum Games: Conservation Laws & Recurrence

Author: Jain Rahul
Piliouras Georgios
Sim Ryann
Publication venue
Publication date: 26/04/2023
Field of study

Recent advances in quantum computing and in particular, the introduction of quantum GANs, have led to increased interest in quantum zero-sum game theory, extending the scope of learning algorithms for classical games into the quantum realm. In this paper, we focus on learning in quantum zero-sum games under Matrix Multiplicative Weights Update (a generalization of the multiplicative weights update method) and its continuous analogue, Quantum Replicator Dynamics. When each player selects their state according to quantum replicator dynamics, we show that the system exhibits conservation laws in a quantum-information theoretic sense. Moreover, we show that the system exhibits Poincare recurrence, meaning that almost all orbits return arbitrarily close to their initial conditions infinitely often. Our analysis generalizes previous results in the case of classical games.Comment: NeurIPS 202

arXiv.org e-Print Archive

Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

Author: Ganesh Sumitra
Savani Rahul
Spooner Thomas
Vadori Nelson
Publication venue
Publication date: 14/10/2021
Field of study

Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method - OMWU and MWU - display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash Equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: \textit{the game signature}. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case, has local convergence guarantees for zero-sum bimatrix games, and show that it enjoys competitive performance on both zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt

arXiv.org e-Print Archive

University of Liverpool Repository

The Replicator Dynamic, Chain Components and the Response Graph

Author: Biggar Oliver
Shames Iman
Publication venue
Publication date: 30/09/2022
Field of study

In this paper we examine the relationship between the flow of the replicator dynamic, the continuum limit of Multiplicative Weights Update, and a game's response graph. We settle an open problem establishing that under the replicator, sink chain components -- a topological notion of long-run outcome of a dynamical system -- always exist and are approximated by the sink connected components of the game's response graph. More specifically, each sink chain component contains a sink connected component of the response graph, as well as all mixed strategy profiles whose support consists of pure profiles in the same connected component, a set we call the content of the connected component. As a corollary, all profiles are chain recurrent in games with strongly connected response graphs. In any two-player game sharing a response graph with a zero-sum game, the sink chain component is unique. In two-player zero-sum and potential games the sink chain components and sink connected components are in a one-to-one correspondence, and we conjecture that this holds in all games.Comment: 24 pages, 2 figure

arXiv.org e-Print Archive

Achieving Better Regret against Strategic Adversaries

Author: Dinh Le Cong
Nguyen Tri-Dung
Tran-Thanh Long
Zemkoho Alain
Publication venue
Publication date: 13/02/2023
Field of study

We study online learning problems in which the learner has extra knowledge about the adversary's behaviour, i.e., in game-theoretic settings where opponents typically follow some no-external regret learning algorithms. Under this assumption, we propose two new online learning algorithms, Accurate Follow the Regularized Leader (AFTRL) and Prod-Best Response (Prod-BR), that intensively exploit this extra knowledge while maintaining the no-regret property in the worst-case scenario of having inaccurate extra information. Specifically, AFTRL achieves

O(1)

external regret or

O(1)

\emph{forward regret} against no-external regret adversary in comparison with

O(\sqrt{T})

\emph{dynamic regret} of Prod-BR. To the best of our knowledge, our algorithm is the first to consider forward regret that achieves

O(1)

regret against strategic adversaries. When playing zero-sum games with Accurate Multiplicative Weights Update (AMWU), a special case of AFTRL, we achieve \emph{last round convergence} to the Nash Equilibrium. We also provide numerical experiments to further support our theoretical results. In particular, we demonstrate that our methods achieve significantly better regret bounds and rate of last round convergence, compared to the state of the art (e.g., Multiplicative Weights Update (MWU) and its optimistic counterpart, OMWU)

arXiv.org e-Print Archive

On Learning Algorithms for Nash Equilibria

Author: Daskalakis Constantinos
Frongillo Rafael
Papadimitriou Christos H.
Pierrakos George
Valiant Gregory
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Third International Symposium, SAGT 2010, Athens, Greece, October 18-20, 2010. ProceedingsCan learning algorithms find a Nash equilibrium? This is a natural question for several reasons. Learning algorithms resemble the behavior of players in many naturally arising games, and thus results on the convergence or non-convergence properties of such dynamics may inform our understanding of the applicability of Nash equilibria as a plausible solution concept in some settings. A second reason for asking this question is in the hope of being able to prove an impossibility result, not dependent on complexity assumptions, for computing Nash equilibria via a restricted class of reasonable algorithms. In this work, we begin to answer this question by considering the dynamics of the standard multiplicative weights update learning algorithms (which are known to converge to a Nash equilibrium for zero-sum games). We revisit a 3×3 game defined by Shapley [10] in the 1950s in order to establish that fictitious play does not converge in general games. For this simple game, we show via a potential function argument that in a variety of settings the multiplicative updates algorithm impressively fails to find the unique Nash equilibrium, in that the cumulative distributions of players produced by learning dynamics actually drift away from the equilibrium

CiteSeerX

DSpace@MIT

Crossref

The Computational Power of Optimization in Online Learning

Author: Agarwal A.
Agarwal A.
Dani V.
Dud´ık M.
Gofer E.
Hazan E.
Kakade S.
McMahan H. B.
Shalev-Shwartz S.
Zinkevich M.
Publication venue
Publication date: 27/01/2016
Field of study

We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to

N

experts in total

\widetilde{O}(\sqrt{N})

computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is

\widetilde{\Theta}(N)

. These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle---i.e., an efficient empirical risk minimizer---allows to learn a finite hypothesis class of size

N

in time

O(\log{N})

. We also study the implications of our results to learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best-response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is

\widetilde{\Theta}(\sqrt{N})

, yielding again a quadratic improvement upon the oracle-free setting, where

\widetilde{\Theta}(N)

is known to be tight

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref