Efficient Last-iterate Convergence Algorithms in Solving Games
No-regret algorithms are popular for learning Nash equilibrium (NE) in
two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs).
Many recent works study no-regret algorithms with last-iterate convergence.
Among them, the two most famous algorithms are Optimistic Gradient Descent
Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA
has high per-iteration complexity. OMWU exhibits a lower per-iteration
complexity but poorer empirical performance, and its convergence holds only
when the NE is unique. Recent works propose a Reward Transformation (RT) framework
for MWU, which removes the uniqueness condition and achieves competitive
performance with OMWU. Unfortunately, RT-based algorithms perform worse than
OGDA given the same number of iterations, and their convergence guarantee is
based on the continuous-time feedback assumption, which does not hold in most
scenarios. To address these issues, we provide a closer analysis of the RT
framework, which holds for both continuous and discrete-time feedback. We
demonstrate that the essence of the RT framework is to transform the problem of
learning NE in the original game into a series of strongly convex-concave
optimization problems (SCCPs). We show that the bottleneck of RT-based
algorithms is the speed of solving the SCCPs. To improve their empirical
performance, we design a novel transformation method so that the SCCPs can be
solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical
performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys
last-iterate convergence under the discrete-time feedback setting. Using the
counterfactual regret decomposition framework, we propose Reward Transformation
CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our
algorithms significantly outperform existing last-iterate convergence
algorithms and RM+ (CFR+).
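For concreteness, here is a minimal sketch of plain Regret Matching+ self-play on a random zero-sum matrix game. It illustrates only the RM+ building block named in the abstract; the reward transformation that gives RTRM+ its last-iterate guarantee is not reproduced, and the payoff matrix, iteration count, and exploitability check are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def rm_plus_step(cum_regret, strategy, utility):
    """One Regret Matching+ update: accumulate instantaneous regrets,
    clip at zero, and play proportionally to the clipped regrets."""
    inst_regret = utility - strategy @ utility          # per-action regret
    cum_regret = np.maximum(cum_regret + inst_regret, 0.0)
    total = cum_regret.sum()
    if total > 0.0:
        return cum_regret, cum_regret / total
    return cum_regret, np.full_like(strategy, 1.0 / strategy.size)

# Self-play on a hypothetical 3x3 zero-sum game (row player maximizes x^T A y).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = np.full(3, 1 / 3); Rx = np.zeros(3)
y = np.full(3, 1 / 3); Ry = np.zeros(3)
for _ in range(10_000):
    ux, uy = A @ y, -(A.T @ x)    # simultaneous utility feedback
    Rx, x = rm_plus_step(Rx, x, ux)
    Ry, y = rm_plus_step(Ry, y, uy)
# Exploitability of the final (last-iterate) strategy profile.
print("exploitability:", (A @ y).max() - (A.T @ x).min())
```

Note that plain RM+ only guarantees convergence of the averaged strategies, and its last iterate can cycle; the abstract's point is precisely that solving the transformed SCCPs with RM+ upgrades this to last-iterate convergence.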
Generalized Bandit Regret Minimizer Framework in Imperfect Information Extensive-Form Game
Regret minimization methods are a powerful tool for learning approximate Nash
equilibrium (NE) in two-player zero-sum imperfect information extensive-form
games (IIEGs). We consider the problem in the interactive bandit-feedback
setting, where we do not know the dynamics of the IIEG. In general, only the
interactive trajectory and the value of the reached terminal node are
revealed. To learn NE, the regret minimizer is required to estimate the
full-feedback loss gradient from this bandit feedback and to minimize the regret. In
this paper, we propose a generalized framework for this learning setting. It
presents a theoretical framework for the design and modular analysis of
bandit regret minimization methods. We demonstrate that the most recent bandit
regret minimization methods can be analyzed as particular cases of our
framework. Following this framework, we describe a novel method, SIX-OMD, to
learn an approximate NE. It is model-free and significantly improves on the
best existing convergence rate. Moreover, SIX-OMD is computationally
efficient, as it needs to perform the current-strategy and average-strategy
updates only along the sampled trajectory.

Comment: The proofs in this paper contain many errors, especially for SIX-OMD;
the regret bound of this algorithm cannot be correct, since it is lower than
the lowest theoretical regret bound obtained by information theory.
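To make the estimation step concrete, below is a minimal sketch of the standard importance-sampling loss estimator that bandit regret minimizers of this kind typically build on, shown for a single decision point rather than the full sequence-form game. The action count, loss vector, and sample size are illustrative assumptions, and this is not SIX-OMD's own estimator.

```python
import numpy as np

def importance_sampled_loss(strategy, observe_loss, rng):
    """Unbiased estimate of the full loss vector from one sampled action:
    E[l_hat[a]] = strategy[a] * (loss[a] / strategy[a]) = loss[a]."""
    a = rng.choice(strategy.size, p=strategy)   # one interaction/trajectory
    l_hat = np.zeros(strategy.size)
    l_hat[a] = observe_loss(a) / strategy[a]    # importance weighting
    return l_hat

# Hypothetical sanity check: the estimator averages to the true loss vector.
rng = np.random.default_rng(1)
true_loss = np.array([0.2, 0.8, 0.5, 0.1])
strategy = np.full(4, 0.25)
samples = [importance_sampled_loss(strategy, lambda a: true_loss[a], rng)
           for _ in range(100_000)]
print(np.mean(samples, axis=0))   # approaches true_loss
```

The price of unbiasedness is variance that scales with 1/strategy[a], which is the quantity convergence analyses of such bandit methods must control.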
An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games
One of the most popular families of methods for learning Nash equilibrium (NE) in large-scale imperfect information extensive-form games (IIEFGs) is the neural variants of counterfactual regret minimization (CFR). CFR is a special case of Follow-The-Regularized-Leader (FTRL). At each iteration, the neural variants of CFR update the agent's strategy via the estimated counterfactual regrets. They then use neural networks to approximate the new strategy, which incurs an approximation error. These approximation errors accumulate, since the counterfactual regrets at iteration t are estimated using the agent's past approximated strategies. This accumulated approximation error causes poor performance. To address it, we propose a novel FTRL algorithm called FTRL-ORW, which does not use the agent's past strategies to pick the next iteration's strategy. More importantly, FTRL-ORW can update its strategy via trajectories sampled from the game, which makes it suitable for solving large-scale IIEFGs, since sampling multiple actions for each information set is too expensive in such games. However, it remains unclear which algorithm to use to compute the next iteration's strategy for FTRL-ORW when only such sampled trajectories are revealed at iteration t. To address this problem and scale FTRL-ORW to large-scale games, we provide a model-free method called Deep FTRL-ORW, which computes the next iteration's strategy using Maximum Entropy Deep Reinforcement Learning. Experimental results on two-player zero-sum IIEFGs show that Deep FTRL-ORW significantly outperforms existing model-free neural methods and OS-MCCFR.
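The abstract does not spell out FTRL-ORW's update rule, so as an illustrative baseline here is the generic entropy-regularized FTRL update that connects CFR-style methods to FTRL and to the maximum-entropy objective mentioned above; the learning rate and reward vector are assumptions, not values from the paper.

```python
import numpy as np

def ftrl_entropy_strategy(cum_reward, eta):
    """FTRL with a negative-entropy regularizer: the maximizer over the
    simplex of <x, cum_reward> + (1/eta) * H(x), where H is Shannon
    entropy, has the closed form softmax(eta * cum_reward)."""
    z = eta * cum_reward
    z = z - z.max()              # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical usage: cumulative rewards of three actions after some iterations.
print(ftrl_entropy_strategy(np.array([1.0, 2.5, 0.3]), eta=0.5))
```

The closed form follows from the Lagrangian stationarity condition cum_reward - (1/eta)(log x + 1) = lambda, which forces x proportional to exp(eta * cum_reward).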