3 research outputs found

    Stable-Predictive Optimistic Counterfactual Regret Minimization

    The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice. However, the theoretical rate at which past CFR-based algorithms converge to a Nash equilibrium is on the order of O(T^{-1/2}), where T is the number of iterations. In contrast, first-order methods can be used to achieve an O(T^{-1}) dependence on iterations, yet these methods have been less successful in practice. In this work we present the first CFR variant that breaks the square-root dependence on iterations. By combining and extending recent advances on predictive and stable regret minimizers for the matrix-game setting, we show that it is possible to leverage "optimistic" regret minimizers to achieve an O(T^{-3/4}) convergence rate within CFR. This is achieved by introducing a new notion of stable-predictivity, and by setting the stability of each counterfactual regret minimizer relative to its location in the decision tree. Experiments show that this method is faster than the original CFR algorithm, although not as fast as newer variants, in spite of their worst-case O(T^{-1/2}) dependence on iterations.
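    As a rough illustration of the "predictive" ingredient only (not the paper's exact stable-predictive minimizer, whose definition and per-infoset stability weights are given in the paper), the sketch below picks a strategy proportional to the positive part of the accumulated regrets plus a prediction of the next instantaneous regret. The function name and the choice to reuse the last regret vector as the prediction are assumptions made for the example.

import numpy as np

def predictive_regret_matching(regrets, prediction):
    # Strategy proportional to the positive part of (accumulated regrets + prediction).
    shifted = np.maximum(regrets + prediction, 0.0)
    total = shifted.sum()
    if total <= 0.0:
        # No positive entries: fall back to the uniform strategy.
        return np.full(len(regrets), 1.0 / len(regrets))
    return shifted / total

# Toy usage: 3 actions, prediction = last observed instantaneous regret.
R = np.array([1.5, -0.2, 0.7])   # accumulated (counterfactual) regrets
m = np.array([0.3, 0.1, -0.4])   # predicted next instantaneous regret
print(predictive_regret_matching(R, m))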

    Combining No-regret and Q-learning

    Counterfactual Regret Minimization (CFR) has found success in settings, like poker, which have both terminal states and perfect recall. We seek to understand how to relax these requirements. As a first step, we introduce a simple algorithm, local no-regret learning (LONR), which uses a Q-learning-like update rule to allow learning without terminal states or perfect recall. We prove its convergence for the basic case of MDPs (and limited extensions of them) and present empirical results showing that it achieves last-iterate convergence in a number of settings, most notably NoSDE games, a class of Markov games specifically designed to be challenging to learn, where no prior algorithm is known to achieve convergence to a stationary equilibrium even on average.
    Comment: Presented as conference paper at AAMAS 202
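    To make the "Q-learning-like update rule" concrete, here is a minimal sketch of the general flavor: a value backup in which the usual max over next-state actions is replaced by the expected value under a no-regret (here, regret-matching) policy kept at each state. All names, the regret-matching choice, and the regret bookkeeping are assumptions for illustration; the paper's actual LONR update may differ.

import numpy as np

def regret_matching_policy(regrets):
    # Distribution proportional to positive regrets; uniform if none are positive.
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def local_no_regret_backup(Q, regrets, s, a, reward, s_next, alpha=0.1, gamma=0.95):
    # Q-learning-like target, but the max over next actions is replaced by the
    # expected value under the next state's current no-regret policy.
    pi_next = regret_matching_policy(regrets[s_next])
    target = reward + gamma * np.dot(pi_next, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    # Local regret bookkeeping at s: each action's value minus the value of the
    # current policy at s (an illustrative choice, not necessarily the paper's).
    pi_s = regret_matching_policy(regrets[s])
    regrets[s] += Q[s] - np.dot(pi_s, Q[s])

# Toy usage on a 2-state, 2-action problem.
Q = np.zeros((2, 2))
regrets = np.zeros((2, 2))
local_no_regret_backup(Q, regrets, s=0, a=1, reward=1.0, s_next=1)
print(Q, regrets)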

    Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions

    We study the performance of optimistic regret-minimization algorithms for both minimizing regret in, and computing Nash equilibria of, zero-sum extensive-form games. In order to apply these algorithms to extensive-form games, a distance-generating function is needed. We study the use of the dilated entropy and dilated Euclidean distance functions. For the dilated Euclidean distance function we prove the first explicit bounds on the strong-convexity parameter for general treeplexes. Furthermore, we show that the use of dilated distance-generating functions enables us to decompose the mirror descent algorithm, and its optimistic variant, into local mirror descent algorithms at each information set. This decomposition mirrors the structure of the counterfactual regret minimization framework, and enables important techniques in practice, such as distributed updates and pruning of cold parts of the game tree. Our algorithms provably converge at a rate of T^{-1}, which is superior to prior counterfactual regret minimization algorithms. We experimentally compare to the popular algorithm CFR+, which has a theoretical convergence rate of T^{-0.5} but is known to often converge at a rate of T^{-1}, or better, in practice. We give an example matrix game where CFR+ experimentally converges at a relatively slow rate of T^{-0.74}, whereas our optimistic methods converge faster than T^{-1}. We go on to show that our fast rate also holds in the Kuhn poker game, which is an extensive-form game. For games with deeper game trees, however, we find that CFR+ is still faster. Finally, we show that when the goal is minimizing regret, rather than computing a Nash equilibrium, our optimistic methods can outperform CFR+, even in deep game trees.
    Comment: Extended NeurIPS 2019 paper
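    The local decomposition can be illustrated with its simplex building block: optimistic online mirror descent with the entropy distance-generating function, using the previous loss vector as the prediction of the next one. The dilated construction applies a weighted copy of this local update at every information set; the weights and the treeplex plumbing are omitted here, and the function name and step size below are assumptions for the sketch.

import numpy as np

def optimistic_omd_entropy(losses, eta=0.1):
    # Optimistic online mirror descent on the probability simplex with the
    # (negative) entropy distance-generating function.  The prediction of the
    # next loss is simply the previous loss ("optimism").
    dim = len(losses[0])
    y = np.full(dim, 1.0 / dim)      # secondary iterate
    prediction = np.zeros(dim)       # no information before the first round
    for loss in losses:
        # Predictive half-step: play against the predicted loss.
        x = y * np.exp(-eta * prediction)
        x /= x.sum()
        yield x
        # Correction half-step: fold in the loss that actually arrived.
        y = y * np.exp(-eta * loss)
        y /= y.sum()
        prediction = loss

# Toy usage: 3 actions, 5 rounds of random losses.
rng = np.random.default_rng(0)
losses = rng.random((5, 3))
for t, x in enumerate(optimistic_omd_entropy(losses), start=1):
    print(t, np.round(x, 3))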