Search CORE

2 research outputs found

Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information

Authors: Dinh Le Cong; Nguyen Tri-Dung; Tran-Thanh Long; Zemkoho Alain B.
Publication date: 25/03/2020

This paper considers repeated games in which one player has more information about the game than the other players. In particular, we investigate repeated two-player zero-sum games where only the column player knows the payoff matrix A of the game. Suppose that while repeatedly playing this game, the row player chooses her strategy at each round by using a no-regret algorithm to minimize her (pseudo) regret. We develop a no-instant-regret algorithm for the column player to exhibit last round convergence to a minimax equilibrium. We show that our algorithm is efficient against a large set of popular no-regret algorithms of the row player, including the multiplicative weight update algorithm, the online mirror descent method/follow-the-regularized-leader, the linear multiplicative weight update algorithm, and the optimistic multiplicative weight update
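To make the setting concrete, the following is a minimal sketch of plain multiplicative weight update (MWU) for the row player, one of the no-regret algorithms the abstract names. It is not the paper's no-instant-regret algorithm for the column player. The column player here is assumed to simply best-respond each round, the entries of A are read as the row player's losses, and the rock-paper-scissors matrix, the step size, and all function names are illustrative choices. Under these assumptions the time-averaged row strategy approaches a minimax strategy, while the last iterate (also returned) is not guaranteed to converge, which is the gap the paper addresses.

```python
import math

# Rock-paper-scissors, entries are the ROW player's losses (illustrative game).
RPS = [[0, 1, -1],
       [-1, 0, 1],
       [1, -1, 0]]


def worst_case_loss(A, x):
    """Largest expected loss of mixed row strategy x over all pure columns."""
    n, m = len(A), len(A[0])
    return max(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))


def mwu_vs_best_response(A, T):
    """Row player runs MWU; column player best-responds every round.

    Returns (time-averaged row strategy, last-round row strategy).
    """
    n, m = len(A), len(A[0])
    lo = min(min(row) for row in A)
    hi = max(max(row) for row in A)
    width = (hi - lo) or 1.0  # range of the losses, used to scale the step size
    # Standard Hedge step size for losses in an interval of this width.
    eta = math.sqrt(8.0 * math.log(n) / (width ** 2 * T))

    x = [1.0 / n] * n   # start from the uniform strategy
    avg = [0.0] * n
    for _ in range(T):
        # Assumed opponent: the column that maximizes the row player's loss.
        col_vals = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        j = max(range(m), key=col_vals.__getitem__)

        for i in range(n):
            avg[i] += x[i] / T

        # Multiplicative update on the observed loss column, then renormalize.
        w = [x[i] * math.exp(-eta * A[i][j]) for i in range(n)]
        total = sum(w)
        x = [wi / total for wi in w]
    return avg, x


avg, last = mwu_vs_best_response(RPS, 2000)
print("average strategy:", avg)
print("worst-case loss of the average:", worst_case_loss(RPS, avg))
print("worst-case loss of the last iterate:", worst_case_loss(RPS, last))
```

Since the value of rock-paper-scissors is 0, the worst-case loss of the averaged strategy doubles as its distance from optimal play; it is bounded by the row player's regret divided by T.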

arXiv.org e-Print Archive

Online Double Oracle

Authors: Ammar Haitham Bou; Dinh Le Cong; Mguni David Henry; Nieves Nicolas Perez; Slumbers Oliver; Tian Zheng; Wang Jun; Yang Yaodong
Publication date: 04/06/2021

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods from game theory. Our method, Online Double Oracle (ODO), is provably convergent to a Nash equilibrium (NE). Most importantly, unlike normal DO methods, ODO is rational in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$, where $k$ is not the total number of pure strategies, but rather the size of the effective strategy set, which is linearly dependent on the support size of the NE. On tens of different real-world games, ODO outperforms DO, PSRO methods, and no-regret algorithms such as Multiplicative Weight Update by a significant margin, both in terms of convergence rate to a NE and average payoff against strategic adversaries.
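As a short worked step (not part of the abstract), dividing the quoted regret bound by the number of rounds shows why it counts as a no-regret guarantee. Here R_T is a symbol introduced only for this illustration to denote the cumulative regret after T rounds, and k is assumed not to grow with T:

```latex
\frac{R_T}{T}
  \;=\; \mathcal{O}\!\left(\frac{\sqrt{T k \log k}}{T}\right)
  \;=\; \mathcal{O}\!\left(\sqrt{\frac{k \log k}{T}}\right)
  \;\longrightarrow\; 0
  \quad \text{as } T \to \infty .
```

The average per-round regret therefore vanishes at rate 1/\sqrt{T}, with a constant governed by k rather than by the total number of pure strategies.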

arXiv.org e-Print Archive