Scaling up Mean Field Games with Online Mirror Descent
We address scaling up equilibrium computation in Mean Field Games (MFGs)
using Online Mirror Descent (OMD). We show that continuous-time OMD provably
converges to a Nash equilibrium under a natural and well-motivated set of
monotonicity assumptions. This theoretical result nicely extends to
multi-population games and to settings involving common noise. A thorough
experimental investigation on various single and multi-population MFGs shows
that OMD outperforms traditional algorithms such as Fictitious Play (FP). We
empirically show that OMD scales up and converges significantly faster than FP
by solving, for the first time to our knowledge, examples of MFGs with hundreds
of billions of states. This study establishes the state of the art for learning in
large-scale multi-agent and multi-population games.
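A minimal sketch of the kind of Online Mirror Descent update used for mean field games, on a toy congestion-style game of my own devising (the game, step size, and exploitability check are illustrative assumptions, not the paper's benchmarks): a cumulative-payoff vector is maintained and mapped to a policy through a softmax mirror map.

```python
import numpy as np

def softmax(y):
    z = np.exp(y - y.max())
    return z / z.sum()

# Toy one-shot mean field game: the reward of action a is -mu[a], where mu is
# the fraction of the population playing a (a monotone, congestion-type game).
n_actions, lr, n_iters = 5, 0.5, 500
rng = np.random.default_rng(0)
y = rng.normal(size=n_actions)      # cumulative payoffs (dual variable)
pi = softmax(y)                     # current policy of the representative agent

for _ in range(n_iters):
    mu = pi                         # mean field induced when everyone plays pi
    reward = -mu                    # crowded actions pay less
    y += lr * reward                # OMD / dual-averaging step
    pi = softmax(y)                 # mirror back to the simplex

# Exploitability: gain of a best response over the current policy (zero at a
# Nash equilibrium); in this toy game the equilibrium is the uniform policy.
exploitability = (-pi).max() - float(pi @ (-pi))
print(pi, exploitability)
```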
Equilibrium tracking and convergence in dynamic games
In this paper, we examine the equilibrium tracking and convergence properties of no-regret learning algorithms in continuous games that evolve over time. Specifically, we focus on learning via "mirror descent", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets. In this general context, we show that the induced sequence of play stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone), and converges to it if the game stabilizes to a strictly monotone limit. Our results apply to both gradient- and payoff-based feedback, i.e., the "bandit" case where players only observe the payoffs of their chosen actions.
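As a concrete illustration of the step described above (my own minimal sketch, not the authors' algorithm): one mirror descent step moves along the individual payoff gradient and then "mirrors" the result back onto the action set. With the simplex as action set and the entropic mirror map, this is the familiar exponentiated-gradient update; the quadratic penalty below is an added assumption that makes the toy game strongly monotone, as the tracking results require.

```python
import numpy as np

def entropic_mirror_step(x, payoff_grad, step):
    """One mirror descent step on the simplex (exponentiated gradient ascent)."""
    y = x * np.exp(step * payoff_grad)   # gradient step, taken in the dual space
    return y / y.sum()                   # mirror back onto the action set

# Two players on the simplex; the quadratic penalty lam makes the game
# strongly monotone, so the iterates converge instead of cycling.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))              # bilinear payoff coupling
lam = 1.0
x1, x2 = np.ones(3) / 3, np.ones(3) / 3
for t in range(1, 5001):
    g1 = A @ x2 - lam * x1               # player 1's payoff gradient
    g2 = -A.T @ x1 - lam * x2            # player 2's payoff gradient
    step = 1.0 / np.sqrt(t)              # vanishing step size
    x1 = entropic_mirror_step(x1, g1, step)
    x2 = entropic_mirror_step(x2, g2, step)
print(x1, x2)
```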
Stochastic mirror descent dynamics and their convergence in monotone variational inequalities
We examine a class of stochastic mirror descent dynamics in the context of
monotone variational inequalities (including Nash equilibrium and saddle-point
problems). The dynamics under study are formulated as a stochastic differential
equation driven by a (single-valued) monotone operator and perturbed by a
Brownian motion. The system's controllable parameters are two variable weight
sequences that respectively pre- and post-multiply the driver of the process.
By carefully tuning these parameters, we obtain global convergence in the
ergodic sense, and we estimate the average rate of convergence of the process.
We also establish a large deviations principle showing that individual
trajectories exhibit exponential concentration around this average.
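To make the setup concrete, here is a rough Euler-Maruyama sketch of dynamics of this general shape: a single-valued monotone driver perturbed by Brownian noise, with a decaying weight sequence damping the drift, and the trajectory monitored through its weighted time average (the "ergodic" iterate). The operator, weight schedule, and noise level are illustrative assumptions of mine, not the schedules tuned in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.array([[2.0, 1.0], [-1.0, 2.0]])

def v(x):
    """Monotone driver: the symmetric part of M is positive definite, v(x*)=0 at x*=0."""
    return M @ x

dt, T = 1e-3, 100_000
x = np.array([3.0, -2.0])
avg_num, avg_den = np.zeros(2), 0.0

for k in range(1, T + 1):
    t = k * dt
    eta = 1.0 / np.sqrt(1.0 + t)           # decaying weight on the drift
    sigma = 0.5 / np.sqrt(1.0 + t)         # decaying noise intensity (assumption)
    dW = rng.normal(scale=np.sqrt(dt), size=2)
    x = x - eta * v(x) * dt + sigma * dW   # Euler-Maruyama step of the SDE
    avg_num += eta * x * dt                # weighted time average of the path
    avg_den += eta * dt

print(x, avg_num / avg_den)                # last iterate vs. ergodic average
```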