
    Convergent learning algorithms for unknown reward games

    In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and must be estimated over time from noisy observations. These games arise in many real-world applications whenever rewards for actions cannot be prespecified and must be learned online, yet standard results in game theory do not consider such settings. For this problem, we derive a multiagent version of Q-learning to estimate the reward functions using novel forms of the ε-greedy learning policy. Using these Q-learning schemes to estimate reward functions, we then provide conditions guaranteeing the convergence of adaptive play and the better-reply processes to Nash equilibria in potential games and games with more general forms of acyclicity, and of regret matching to the set of correlated equilibria in generic games. A secondary result is that we prove the strong ergodicity of stochastic adaptive play and stochastic better-reply processes in the case of vanishing perturbations. Finally, we illustrate the efficacy of the algorithms in a set of randomly generated three-player coordination games and show the practical necessity of our results by demonstrating that violations of the derived learning-parameter conditions can cause the algorithms to fail to converge.
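    As a rough illustration of the reward-estimation step described above, the Python sketch below has two players learn the payoffs of a noisy coordination game with Q-learning while playing ε-greedy replies to each other's last action. The 3x3 payoff matrix, noise level, exploration schedule, and step sizes are assumptions made for the example and are not taken from the paper.

        import numpy as np

        # Minimal sketch: two players estimate the payoffs of a noisy 3x3 coordination
        # game with Q-learning and play epsilon-greedy replies to the opponent's last
        # action.  Payoff matrix, noise level, and schedules are illustrative.
        rng = np.random.default_rng(0)
        true_reward = np.eye(3)                     # common payoff: 1 when the players coordinate
        Q = [np.zeros((3, 3)), np.zeros((3, 3))]    # each player's reward estimates
        counts = [np.zeros((3, 3)), np.zeros((3, 3))]

        def eps_greedy(values, eps):
            # Explore uniformly with probability eps, otherwise exploit the estimate.
            if rng.random() < eps:
                return int(rng.integers(len(values)))
            return int(np.argmax(values))

        a0, a1 = 0, 0
        for t in range(1, 5001):
            eps = 1.0 / np.sqrt(t)                          # vanishing exploration
            a0, a1 = (eps_greedy(Q[0][:, a1], eps),         # player 0 replies to player 1's last action
                      eps_greedy(Q[1][a0, :], eps))         # player 1 replies to player 0's last action
            payoff = true_reward[a0, a1] + 0.1 * rng.standard_normal()   # noisy common reward
            for i in range(2):
                counts[i][a0, a1] += 1
                alpha = 1.0 / counts[i][a0, a1]             # decreasing Q-learning step size
                Q[i][a0, a1] += alpha * (payoff - Q[i][a0, a1])

        print("Player 0's estimated payoff matrix:")
        print(np.round(Q[0], 2))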

    Game-theoretical control with continuous action sets

    Motivated by the recent applications of game-theoretical learning techniques to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets, and we propose an actor-critic reinforcement learning algorithm that provably converges to equilibrium in this class of problems. The method employed is to analyse the learning process under study through a mean-field dynamical system that evolves in an infinite-dimensional function space (the space of probability distributions over the players' continuous controls). To do so, we extend the theory of finite-dimensional two-timescale stochastic approximation to an infinite-dimensional, Banach space setting, and we prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. These results combine to give a provably convergent learning algorithm in which players do not need to keep track of the controls selected by the other agents.
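    The sketch below illustrates the flavour of a two-timescale actor-critic scheme in the simplest continuous-action potential game: a two-player identical-interest game with a quadratic payoff. The Gaussian policies, the quadratic potential, and the step-size exponents are illustrative assumptions rather than the paper's infinite-dimensional construction; as in the abstract, each player updates using only its own sampled control and the observed common reward.

        import numpy as np

        # Minimal two-timescale actor-critic sketch for a two-player identical-interest
        # (hence potential) game with continuous actions.  The quadratic potential,
        # Gaussian policies, and step-size exponents are illustrative assumptions.
        rng = np.random.default_rng(1)

        def potential(a):
            # Common payoff, maximised (the Nash equilibrium) at a = (0.3, 0.7).
            return -((a[0] - 0.3) ** 2 + (a[1] - 0.7) ** 2)

        mu = np.zeros(2)       # actor parameters: mean of each player's Gaussian policy
        baseline = 0.0         # critic: running estimate of the average payoff
        sigma = 0.2            # fixed exploration noise

        for t in range(1, 50001):
            beta = 1.0 / t ** 0.6                      # fast (critic) step size
            alpha = 0.1 / t ** 0.9                     # slow (actor) step size, alpha/beta -> 0
            a = mu + sigma * rng.standard_normal(2)    # each player samples its own control
            r = potential(a)                           # common reward observed by both players
            baseline += beta * (r - baseline)          # critic update on the fast timescale
            # REINFORCE-style score-function gradient of a Gaussian policy w.r.t. its mean
            mu += alpha * (r - baseline) * (a - mu) / sigma ** 2

        print("Learned policy means (equilibrium is (0.3, 0.7)):", np.round(mu, 2))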

    Channel Selection for Network-assisted D2D Communication via No-Regret Bandit Learning with Calibrated Forecasting

    We consider the distributed channel selection problem in the context of device-to-device (D2D) communication as an underlay to a cellular network. Underlaid D2D users communicate directly by utilizing the cellular spectrum, but their decisions are not governed by any centralized controller. Selfish D2D users competing for access to the resources form a distributed system in which transmission performance depends on channel availability and quality. This information, however, is difficult to acquire. Moreover, the adverse effects of D2D users on cellular transmissions should be minimized. To overcome these limitations, we propose a network-assisted distributed channel selection approach in which D2D users are only allowed to use vacant cellular channels. This scenario is modeled as a multi-player multi-armed bandit game with side information, for which a distributed algorithmic solution is proposed. The solution combines no-regret learning with calibrated forecasting and can be applied to a broad class of multi-player stochastic learning problems in addition to the formulated channel selection problem. Analytically, it is established that this approach not only yields vanishing regret (in comparison to the globally optimal solution), but also guarantees that the empirical joint frequencies of play converge to the set of correlated equilibria.
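    As a hedged illustration of per-user no-regret learning in this setting, the sketch below lets each D2D pair choose among cellular channels with an EXP3 bandit rule, earning a reward only when its chosen channel is vacant and collision-free. The reward model, the numbers of users and channels, and the exploration parameter are assumptions for the example; the paper's actual scheme additionally relies on calibrated forecasts of the other users' behaviour, which is omitted here.

        import numpy as np

        # Minimal sketch of distributed channel selection where each D2D pair runs an
        # independent no-regret bandit rule (EXP3).  The vacancy/collision reward model
        # and all parameters are illustrative; calibrated forecasting is omitted.
        rng = np.random.default_rng(2)
        n_users, n_channels, T = 3, 5, 20000
        gamma = 0.07                                   # EXP3 exploration parameter (assumed)
        weights = np.ones((n_users, n_channels))

        for t in range(T):
            vacant = rng.random(n_channels) < 0.7      # channels currently unused by cellular users
            probs = (1 - gamma) * weights / weights.sum(axis=1, keepdims=True) + gamma / n_channels
            choices = np.array([rng.choice(n_channels, p=probs[u]) for u in range(n_users)])
            for u, c in enumerate(choices):
                alone = np.count_nonzero(choices == c) == 1   # no collision with other D2D users
                reward = 1.0 if (vacant[c] and alone) else 0.0
                # importance-weighted reward estimate and multiplicative weight update
                weights[u, c] *= np.exp(gamma / n_channels * reward / probs[u, c])
                weights[u] /= weights[u].sum()                # renormalise for numerical stability

        final_probs = (1 - gamma) * weights / weights.sum(axis=1, keepdims=True) + gamma / n_channels
        print("Per-user channel selection probabilities:")
        print(np.round(final_probs, 2))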