Stable-Predictive Optimistic Counterfactual Regret Minimization
The CFR framework has been a powerful tool for solving large-scale
extensive-form games in practice. However, the theoretical rate at which past
CFR-based algorithms converge to the Nash equilibrium is on the order of
$O(T^{-1/2})$, where $T$ is the number of iterations. In contrast, first-order
methods can be used to achieve a $O(T^{-1})$ dependence on iterations, yet
these methods have been less successful in practice. In this work we present
the first CFR variant that breaks the square-root dependence on iterations. By
combining and extending recent advances on predictive and stable regret
minimizers for the matrix-game setting we show that it is possible to leverage
"optimistic" regret minimizers to achieve a convergence rate
within CFR. This is achieved by introducing a new notion of
stable-predictivity, and by setting the stability of each counterfactual regret
minimizer relative to its location in the decision tree. Experiments show that
this method is faster than the original CFR algorithm, although not as fast as
newer variants, in spite of their worst-case $O(T^{-1/2})$ dependence on
iterations.
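To make "optimistic" regret minimization concrete, here is a minimal sketch of a predictive (optimistic) Hedge learner in self-play on a zero-sum matrix game. This is the matrix-game building block the paper extends, not its stable-predictive CFR variant; the function name, the step size `eta`, and the choice of the last observed loss as the prediction are illustrative assumptions.

```python
import numpy as np

def optimistic_hedge_matrix_game(A, T=1000, eta=0.1):
    """Self-play with optimistic (predictive) Hedge on a zero-sum matrix game.

    Sketch only: a generic optimistic regret minimizer, not the paper's
    stable-predictive CFR variant.
    """
    m, n = A.shape
    Lx, Ly = np.zeros(m), np.zeros(n)        # cumulative observed losses
    px, py = np.zeros(m), np.zeros(n)        # predictions of the next loss
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        # "optimism": act on the cumulative loss plus the predicted next loss
        sx, sy = Lx + px, Ly + py
        x = np.exp(-eta * (sx - sx.min())); x /= x.sum()
        y = np.exp(-eta * (sy - sy.min())); y /= y.sum()
        lx, ly = A @ y, -A.T @ x             # row minimizes x^T A y, column maximizes
        Lx += lx; Ly += ly
        px, py = lx, ly                      # predict that the last loss repeats
        avg_x += x; avg_y += y
    return avg_x / T, avg_y / T              # averages approach a Nash equilibrium
```

On matching pennies, `optimistic_hedge_matrix_game(np.array([[1., -1.], [-1., 1.]]))` should return average strategies near (1/2, 1/2) for both players.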
Combining No-regret and Q-learning
Counterfactual Regret Minimization (CFR) has found success in settings like
poker which have both terminal states and perfect recall. We seek to understand
how to relax these requirements. As a first step, we introduce a simple
algorithm, local no-regret learning (LONR), which uses a Q-learning-like update
rule to allow learning without terminal states or perfect recall. We prove its
convergence for the basic case of MDPs (and limited extensions of them) and
present empirical results showing that it achieves last-iterate convergence in
a number of settings, most notably NoSDE games, a class of Markov games
specifically designed to be challenging to learn, where no prior algorithm is
known to achieve convergence to a stationary equilibrium even on average.
Comment: Presented as a conference paper at AAMAS 2020.
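The abstract does not spell out LONR's update rule, so the following is only a hedged sketch of the idea on a known tabular MDP: a Q-learning-like expected backup under the current policy, paired with a regret-matching learner at each state. The function name `lonr_sketch`, the synchronous all-states update, and the specific regret-matching choice are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def lonr_sketch(P, R, gamma=0.9, T=5000):
    """Hedged sketch in the spirit of LONR on a known tabular MDP.

    P[s, a, s2]: transition probabilities; R[s, a]: expected rewards.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    regrets = np.zeros((S, A))
    pi = np.full((S, A), 1.0 / A)            # start from the uniform policy
    for _ in range(T):
        V = (pi * Q).sum(axis=1)             # state values under the current policy
        Q = R + gamma * (P @ V)              # Q-learning-like expected backup
        # local no-regret step: accumulate per-state action regrets ...
        regrets += Q - (pi * Q).sum(axis=1, keepdims=True)
        # ... then regret matching: play in proportion to positive regret
        pos = np.maximum(regrets, 0.0)
        sums = pos.sum(axis=1, keepdims=True)
        pi = np.where(sums > 0, pos / np.clip(sums, 1e-12, None), 1.0 / A)
    return pi, Q
```

Note that no terminal states are required: the discounted backup keeps the update well-defined on continuing MDPs, which is the relaxation the paper targets.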
Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions
We study the performance of optimistic regret-minimization algorithms for
both minimizing regret in, and computing Nash equilibria of, zero-sum
extensive-form games. In order to apply these algorithms to extensive-form
games, a distance-generating function is needed. We study the use of the
dilated entropy and dilated Euclidean distance functions. For the dilated
Euclidean distance function we prove the first explicit bounds on the
strong-convexity parameter for general treeplexes. Furthermore, we show that
the use of dilated distance-generating functions enables us to decompose the
mirror descent algorithm, and its optimistic variant, into local mirror descent
algorithms at each information set. This decomposition mirrors the structure of
the counterfactual regret minimization framework, and enables important
techniques in practice, such as distributed updates and pruning of cold parts
of the game tree. Our algorithms provably converge at a rate of $O(T^{-1})$,
which is superior to prior counterfactual regret minimization algorithms. We
experimentally compare to the popular algorithm CFR+, which has a theoretical
convergence rate of $O(T^{-1/2})$, but is known to often converge at a rate of
$O(T^{-1})$, or better, in practice. We give an example matrix game where CFR+
experimentally converges at a relatively slow rate of $O(T^{-1/2})$, whereas
our optimistic methods converge faster than $O(T^{-1})$. We go on to show that our
fast rate also holds in the Kuhn poker game, which is an extensive-form game.
For games with deeper game trees, however, we find that CFR+ is still faster.
Finally, we show that when the goal is minimizing regret, rather than computing
a Nash equilibrium, our optimistic methods can outperform CFR+, even in deep
game trees.
Comment: Extended NeurIPS 2019 paper.
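As a rough illustration of the decomposition, here is a sketch of optimistic online mirror descent with the entropy distance-generating function on a single simplex; under a dilated distance-generating function, one such local learner runs at each information set. The class name and the omission of the treeplex-level weights that tie the local steps together are simplifying assumptions.

```python
import numpy as np

class LocalOptimisticOMD:
    """Optimistic online mirror descent with the entropy DGF on one simplex.

    Sketch of the local step at a single information set; the treeplex-level
    weighting of the dilated DGF is omitted.
    """
    def __init__(self, n, eta=0.1):
        self.eta = eta
        self.z = np.full(n, 1.0 / n)         # secondary ("proximal") iterate

    def next_strategy(self, prediction):
        # optimistic half-step: move from z using the predicted next loss
        x = self.z * np.exp(-self.eta * prediction)
        return x / x.sum()

    def observe_loss(self, loss):
        # proximal step: fold the actually observed loss into z
        self.z = self.z * np.exp(-self.eta * loss)
        self.z /= self.z.sum()
```

Feeding the previous loss as `prediction` gives the optimistic variant; passing zeros recovers plain local mirror descent at each information set.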