Search CORE

1,393 research outputs found

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

Author: Anagnostides Ioannis
Farina Gabriele
Panageas Ioannis
Sandholm Tuomas
Publication venue
Publication date: 18/10/2023
Field of study

Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games parameterized on natural variation measures of the sequence of games, subsuming known results for static games. Furthermore, we establish improved second-order variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also apply to time-varying general-sum multi-player games via a bilinear formulation of correlated equilibria, which has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior papers. Finally, we leverage our framework to also provide new insights on dynamic regret guarantees in static games.Comment: To appear at NeurIPS 2023; V3 incorporates reviewers' feedback and minor correction

arXiv.org e-Print Archive

Multiagent Reinforcement Learning with Regret Matching for Robot Soccer

Author: Jiachen Ma
Qiang Liu
Wei Xie
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013
Field of study

This paper proposes a novel multiagent reinforcement learning (MARL) algorithm Nash- learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash- learning. It is critical that choosing a suitable strategy for action selection to harmonize the relation between exploration and exploitation to enhance the ability of online learning for Nash- learning. In Markov Game the joint action of agents adopting regret matching algorithm can converge to a group of points of no-regret that can be viewed as coarse correlated equilibrium which includes Nash equilibrium in essence. It is can be inferred that regret matching can guide exploration of the state-action space so that the rate of convergence of Nash- learning algorithm can be increased. Simulation results on robot soccer validate that compared to original Nash- learning algorithm, the use of regret matching during the learning phase of Nash- learning has excellent ability of online learning and results in significant performance in terms of scores, average reward and policy convergence

Crossref

Directory of Open Access Journals

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

Author: Bowling Michael
Burch Neil
Kadlec Rudolf
Lanctot Marc
Moravcik Matej
Schmid Martin
Publication venue
Publication date: 09/09/2018
Field of study

Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order of magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows for the first time CFR+ to be used with sampling, increasing the speedup to two orders of magnitude

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Parameterisation of Algorithms for Distributed Constraint Optimisation via Potential Games

Author: Chapman Archie
Jennings N.R
Rogers Alex
Publication venue
Publication date: 01/05/2008
Field of study

This paper introduces a parameterisation of learning algorithms for distributed constraint optimisation problems (DCOPs). This parameterisation encompasses many algorithms developed in both the computer science and game theory literatures. It is built on our insight that when formulated as noncooperative games, DCOPs form a subset of the class of potential games. This result allows us to prove convergence properties of algorithms developed in the computer science literature using game theoretic methods. Furthermore, our parameterisation can assist system designers by making the pros and cons of, and the synergies between, the various DCOP algorithm components clear

CiteSeerX

Southampton (e-Prints Soton)

Spiral - Imperial College Digital Repository

An Optimal Online Method of Selecting Source Policies for Reinforcement Learning

Author: Li Siyuan
Zhang Chongjie
Publication venue
Publication date: 24/09/2017
Field of study

Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees of the optimal selection process and convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparing to the state-of-the-art transfer learning method

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications