Matrix Multiplicative Weights Updates in Quantum Zero-Sum Games: Conservation Laws & Recurrence
Recent advances in quantum computing and in particular, the introduction of
quantum GANs, have led to increased interest in quantum zero-sum game theory,
extending the scope of learning algorithms for classical games into the quantum
realm. In this paper, we focus on learning in quantum zero-sum games under
Matrix Multiplicative Weights Update (a generalization of the multiplicative
weights update method) and its continuous analogue, Quantum Replicator
Dynamics. When each player selects their state according to quantum replicator
dynamics, we show that the system exhibits conservation laws in a
quantum-information theoretic sense. Moreover, we show that the system exhibits
Poincaré recurrence, meaning that almost all orbits return arbitrarily close to
their initial conditions infinitely often. Our analysis generalizes previous
results in the case of classical games.
Comment: NeurIPS 202
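The Matrix Multiplicative Weights Update at the heart of these dynamics keeps a density matrix proportional to the matrix exponential of the accumulated Hermitian payoff observations. Below is a minimal sketch of the rule in its online form; the single-qubit Pauli-Z gain stream is a purely illustrative assumption, not an example from the paper:

```python
import numpy as np

def expm_hermitian(H):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

def mmwu(gains, eta=0.1):
    """Matrix Multiplicative Weights Update: after observing Hermitian gain
    matrices G_1..G_T, the iterate is the density matrix proportional to
    exp(eta * (G_1 + ... + G_T)), normalized to unit trace."""
    cum = np.zeros_like(gains[0], dtype=complex)
    for G in gains:
        cum = cum + G
    rho = expm_hermitian(eta * cum)
    return rho / np.trace(rho).real

# Toy stream: repeated Pauli-Z gains reward the |0> state, so the
# density matrix concentrates on it.
Z = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=complex)
rho = mmwu([Z] * 50, eta=0.1)
```

In the small-step limit this update recovers the continuous-time quantum replicator dynamics that the paper analyzes.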
Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures
Cheung and Piliouras (2020) recently showed that two variants of the
Multiplicative Weights Update method - OMWU and MWU - display opposite
convergence properties depending on whether the game is zero-sum or
cooperative. Inspired by this work and the recent literature on learning to
optimize for single functions, we introduce a new framework for learning
last-iterate convergence to Nash Equilibria in games, where the update rule's
coefficients (learning rates) along a trajectory are learnt by a reinforcement
learning policy that is conditioned on the nature of the game: \textit{the game
signature}. We construct the latter using a new decomposition of two-player
games into eight components corresponding to commutative projection operators,
generalizing and unifying recent game concepts studied in the literature. We
compare the performance of various update rules when their coefficients are
learnt, and show that the RL policy is able to exploit the game signature
across a wide range of game types. In doing so, we introduce CMWU, a new
algorithm that extends consensus optimization to the constrained case and has
local convergence guarantees for zero-sum bimatrix games, and we show that it
enjoys competitive performance both on zero-sum games with constant
coefficients and across a spectrum of games when its coefficients are learnt.
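The opposite convergence behaviours of MWU and its optimistic variant OMWU that motivate this line of work can be reproduced in a few lines. The sketch below runs both on matching pennies (the game, step size, and perturbed start are illustrative assumptions, not the paper's setup) and compares how far each last iterate sits from the uniform equilibrium:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run(A, steps=3000, eta=0.1, optimistic=False):
    """(O)MWU on the zero-sum game max_x min_y x^T A y.
    Returns the mean distance of the row player's iterate from the
    uniform Nash equilibrium over the final 200 steps."""
    gx = np.array([0.5, 0.0])          # small perturbation off the equilibrium
    gy = np.zeros(2)
    px, py = np.zeros(2), np.zeros(2)  # previous payoffs, used by OMWU only
    dists = []
    for _ in range(steps):
        x = softmax(eta * (gx + px if optimistic else gx))
        y = softmax(eta * (gy + py if optimistic else gy))
        px, py = A @ y, -A.T @ x       # current payoff vectors
        gx, gy = gx + px, gy + py
        dists.append(abs(x[0] - 0.5))
    return float(np.mean(dists[-200:]))

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies; unique NE is uniform
d_mwu = run(A, optimistic=False)    # last iterate spirals away from the NE
d_omwu = run(A, optimistic=True)    # last iterate converges to the NE
```

MWU's last iterate spirals out toward the simplex boundary, while the optimistic correction (adding the previous payoff once more) pulls the iterate back to the equilibrium.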
The Replicator Dynamic, Chain Components and the Response Graph
In this paper we examine the relationship between the flow of the replicator
dynamic, the continuum limit of Multiplicative Weights Update, and a game's
response graph. We settle an open problem establishing that under the
replicator, sink chain components -- a topological notion of long-run outcome
of a dynamical system -- always exist and are approximated by the sink
connected components of the game's response graph. More specifically, each sink
chain component contains a sink connected component of the response graph, as
well as all mixed strategy profiles whose support consists of pure profiles in
the same connected component, a set we call the content of the connected
component. As a corollary, all profiles are chain recurrent in games with
strongly connected response graphs. In any two-player game sharing a response
graph with a zero-sum game, the sink chain component is unique. In two-player
zero-sum and potential games the sink chain components and sink connected
components are in a one-to-one correspondence, and we conjecture that this
holds in all games.
Comment: 24 pages, 2 figures
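As a concrete illustration of the replicator flowing into a sink of the response graph, here is a forward-Euler simulation of the two-population replicator dynamic on a small coordination game; the game and step size are illustrative assumptions of this sketch:

```python
import numpy as np

def replicator_step(x, y, A, B, dt=0.01):
    """One forward-Euler step of the two-population replicator dynamic
    for a bimatrix game (A: row payoffs, B: column payoffs)."""
    ux, uy = A @ y, B.T @ x
    x = x + dt * x * (ux - x @ ux)
    y = y + dt * y * (uy - y @ uy)
    return x / x.sum(), y / y.sum()   # renormalize against Euler drift

# 2x2 coordination game: the pure profiles (0,0) and (1,1) are sinks of
# the response graph; from this start the flow settles on (0,0).
A = B = np.array([[2.0, 0.0], [0.0, 1.0]])
x, y = np.array([0.6, 0.4]), np.array([0.6, 0.4])
for _ in range(5000):
    x, y = replicator_step(x, y, A, B)
```

Starting inside the basin of the higher-payoff profile, both populations converge to the pure profile (0,0), a sink connected component of the game's response graph.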
Achieving Better Regret against Strategic Adversaries
We study online learning problems in which the learner has extra knowledge
about the adversary's behaviour, i.e., in game-theoretic settings where
opponents typically follow some no-external regret learning algorithms. Under
this assumption, we propose two new online learning algorithms, Accurate Follow
the Regularized Leader (AFTRL) and Prod-Best Response (Prod-BR), that
intensively exploit this extra knowledge while maintaining the no-regret
property in the worst-case scenario of having inaccurate extra information.
Specifically, AFTRL achieves external regret (or \emph{forward
regret}) guarantees against a no-external-regret adversary, in comparison with
the \emph{dynamic regret} guarantee of Prod-BR. To the best of our knowledge,
our algorithm is the first to consider forward regret and to achieve regret
guarantees against strategic adversaries. When playing zero-sum games with Accurate Multiplicative
Weights Update (AMWU), a special case of AFTRL, we achieve \emph{last round
convergence} to the Nash Equilibrium. We also provide numerical experiments to
further support our theoretical results. In particular, we demonstrate that our
methods achieve significantly better regret bounds and rate of last round
convergence, compared to the state of the art (e.g., Multiplicative Weights
Update (MWU) and its optimistic counterpart, OMWU).
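The core idea of exploiting a known opponent, best-responding each round to the opponent's current mixed strategy, can be sketched as follows. This is a simplified illustration of that idea against an MWU opponent in matching pennies, not the paper's Prod-BR or AFTRL; the game and step size are assumptions of the sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies, game value 0
eta, T = 0.1, 1000
gy = np.zeros(2)        # the opponent's cumulative gains
total = 0.0
for t in range(T):
    y = softmax(eta * gy)          # opponent plays MWU, assumed known
    i = int(np.argmax(A @ y))      # learner best-responds to y
    total += (A @ y)[i]
    gy += -A.T @ np.eye(2)[i]      # opponent observes its own gain vector
avg_payoff = total / T
```

Best-responding to the opponent's current mixed strategy secures at least the game value (here 0) every round, and strictly more whenever the MWU opponent is off the equilibrium.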
On Learning Algorithms for Nash Equilibria
Third International Symposium, SAGT 2010, Athens, Greece, October 18-20, 2010. Proceedings.
Can learning algorithms find a Nash equilibrium? This is a natural question for several reasons. Learning algorithms resemble the behavior of players in many naturally arising games, and thus results on the convergence or non-convergence properties of such dynamics may inform our understanding of the applicability of Nash equilibria as a plausible solution concept in some settings. A second reason for asking this question is in the hope of being able to prove an impossibility result, not dependent on complexity assumptions, for computing Nash equilibria via a restricted class of reasonable algorithms. In this work, we begin to answer this question by considering the dynamics of the standard multiplicative weights update learning algorithms (which are known to converge to a Nash equilibrium for zero-sum games). We revisit a 3×3 game defined by Shapley [10] in the 1950s in order to establish that fictitious play does not converge in general games. For this simple game, we show via a potential function argument that in a variety of settings the multiplicative updates algorithm impressively fails to find the unique Nash equilibrium, in that the cumulative distributions of players produced by learning dynamics actually drift away from the equilibrium.
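The failure mode described above is easy to observe numerically. The sketch below runs MWU on a commonly used presentation of Shapley's game (the exact payoff matrices, step size, and starting perturbation are assumptions of this sketch), starting near the unique equilibrium, and tracks how far the row player's strategy drifts from uniform:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One common presentation of Shapley's game: a win earns 1, everything
# else 0; the unique Nash equilibrium is uniform play by both players.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
B = A.T

eta, T = 0.1, 4000
gx = np.array([0.1, 0.0, 0.0])   # start just off the equilibrium
gy = np.zeros(3)
dists = []
for t in range(T):
    x, y = softmax(eta * gx), softmax(eta * gy)
    gx += A @ y                  # row player's MWU update
    gy += B.T @ x                # column player's MWU update
    dists.append(float(np.abs(x - 1.0 / 3.0).sum()))
max_dist = max(dists)            # the iterate drifts far from the equilibrium
```

Rather than converging, the iterates chase each other cyclically and spiral away from the interior equilibrium toward the simplex boundary.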
The Computational Power of Optimization in Online Learning
We consider the fundamental problem of prediction with expert advice where
the experts are "optimizable": there is a black-box optimization oracle that
can be used to compute, in constant time, the leading expert in retrospect at
any point in time. In this setting, we give a novel online algorithm that
attains vanishing regret with respect to $N$ experts in total
$\widetilde{O}(\sqrt{N})$ computation time. We also give a lower bound showing
that this running time cannot be improved (up to log factors) in the oracle
model, thereby exhibiting a quadratic speedup as compared to the standard,
oracle-free setting where the required time for vanishing regret is
$\widetilde{\Theta}(N)$. These results demonstrate an exponential gap between
the power of optimization in online learning and its power in statistical
learning: in the latter, an optimization oracle---i.e., an efficient empirical
risk minimizer---allows to learn a finite hypothesis class of size $N$ in time
$O(\log N)$. We also study the implications of our results to learning in
repeated zero-sum games, in a setting where the players have access to oracles
that compute, in constant time, their best-response to any mixed strategy of
their opponent. We show that the runtime required for approximating the minimax
value of the game in this setting is $\widetilde{\Theta}(\sqrt{N})$, yielding
again a quadratic improvement upon the oracle-free setting, where
$\widetilde{\Theta}(N)$ is known to be tight.
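For flavour, here is a classical oracle-based online learner, Follow the Perturbed Leader, which consults a best-expert-in-hindsight oracle once per round. This illustrates the oracle model only; it is not the paper's algorithm, and the loss stream, perturbation scale, and oracle implementation are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def leader_oracle(cum_loss):
    """Stand-in for the black-box optimization oracle: the best expert in
    hindsight. (Implemented as a linear scan here; the oracle model
    charges constant time for this call.)"""
    return int(np.argmin(cum_loss))

def fpl_average_regret(losses, scale=1.0):
    """Follow the Perturbed Leader: each round, query the oracle on the
    cumulative losses minus a fresh exponential perturbation."""
    T, N = losses.shape
    cum = np.zeros(N)
    total = 0.0
    for t in range(T):
        i = leader_oracle(cum - rng.exponential(scale, size=N))
        total += losses[t, i]
        cum += losses[t]
    best = losses.sum(axis=0).min()
    return (total - best) / T

# Toy loss stream: expert 0 always incurs loss 0.1, the rest at least 0.3.
T, N = 2000, 10
losses = np.empty((T, N))
losses[:, 0] = 0.1
losses[:, 1:] = rng.uniform(0.3, 1.0, size=(T, N - 1))
avg_regret = fpl_average_regret(losses)
```

On this easy stream the perturbed leader locks onto the best expert after a handful of rounds, so the average regret vanishes.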