
    Convergent learning algorithms for unknown reward games

    In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and must be estimated over time from noisy observations. These games arise in many real-world applications whenever rewards for actions cannot be prespecified and must be learned online, yet standard results in game theory do not consider such settings. For this problem, we derive a multiagent version of Q-learning to estimate the reward functions using novel forms of the ε-greedy learning policy. Using these Q-learning schemes to estimate reward functions, we then provide conditions guaranteeing the convergence of adaptive play and the better-reply processes to Nash equilibria in potential games and games with more general forms of acyclicity, and of regret matching to the set of correlated equilibria in generic games. A secondary result is that we prove the strong ergodicity of stochastic adaptive play and stochastic better-reply processes in the case of vanishing perturbations. Finally, we illustrate the efficacy of the algorithms in a set of randomly generated three-player coordination games and show the practical necessity of our results by demonstrating that violations of the derived learning-parameter conditions can cause the algorithms to fail to converge.
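    As a rough illustration of the reward-estimation step described above, the Python sketch below has two players learn the payoffs of a noisy coordination game with Q-learning while playing ε-greedy replies to each other's last action. The 3x3 payoff matrix, noise level, exploration schedule, and step sizes are assumptions made for the example and are not taken from the paper.

        import numpy as np

        # Minimal sketch: two players estimate the payoffs of a noisy 3x3 coordination
        # game with Q-learning and play epsilon-greedy replies to the opponent's last
        # action.  Payoff matrix, noise level, and schedules are illustrative.
        rng = np.random.default_rng(0)
        true_reward = np.eye(3)                     # common payoff: 1 when the players coordinate
        Q = [np.zeros((3, 3)), np.zeros((3, 3))]    # each player's reward estimates
        counts = [np.zeros((3, 3)), np.zeros((3, 3))]

        def eps_greedy(values, eps):
            # Explore uniformly with probability eps, otherwise exploit the estimate.
            if rng.random() < eps:
                return int(rng.integers(len(values)))
            return int(np.argmax(values))

        a0, a1 = 0, 0
        for t in range(1, 5001):
            eps = 1.0 / np.sqrt(t)                          # vanishing exploration
            a0, a1 = (eps_greedy(Q[0][:, a1], eps),         # player 0 replies to player 1's last action
                      eps_greedy(Q[1][a0, :], eps))         # player 1 replies to player 0's last action
            payoff = true_reward[a0, a1] + 0.1 * rng.standard_normal()   # noisy common reward
            for i in range(2):
                counts[i][a0, a1] += 1
                alpha = 1.0 / counts[i][a0, a1]             # decreasing Q-learning step size
                Q[i][a0, a1] += alpha * (payoff - Q[i][a0, a1])

        print("Player 0's estimated payoff matrix:")
        print(np.round(Q[0], 2))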

    Game-theoretical control with continuous action sets

    Motivated by the recent applications of game-theoretical learning techniques to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets, and we propose an actor-critic reinforcement learning algorithm that provably converges to equilibrium in this class of problems. The method employed is to analyse the learning process under study through a mean-field dynamical system that evolves in an infinite-dimensional function space (the space of probability distributions over the players' continuous controls). To do so, we extend the theory of finite-dimensional two-timescale stochastic approximation to an infinite-dimensional, Banach space setting, and we prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. These results combine to give a provably convergent learning algorithm in which players do not need to keep track of the controls selected by the other agents.
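    The sketch below illustrates the flavour of a two-timescale actor-critic scheme in the simplest continuous-action potential game: a two-player identical-interest game with a quadratic payoff. The Gaussian policies, the quadratic potential, and the step-size exponents are illustrative assumptions rather than the paper's infinite-dimensional construction; as in the abstract, each player updates using only its own sampled control and the observed common reward.

        import numpy as np

        # Minimal two-timescale actor-critic sketch for a two-player identical-interest
        # (hence potential) game with continuous actions.  The quadratic potential,
        # Gaussian policies, and step-size exponents are illustrative assumptions.
        rng = np.random.default_rng(1)

        def potential(a):
            # Common payoff, maximised (the Nash equilibrium) at a = (0.3, 0.7).
            return -((a[0] - 0.3) ** 2 + (a[1] - 0.7) ** 2)

        mu = np.zeros(2)       # actor parameters: mean of each player's Gaussian policy
        baseline = 0.0         # critic: running estimate of the average payoff
        sigma = 0.2            # fixed exploration noise

        for t in range(1, 50001):
            beta = 1.0 / t ** 0.6                      # fast (critic) step size
            alpha = 0.1 / t ** 0.9                     # slow (actor) step size, alpha/beta -> 0
            a = mu + sigma * rng.standard_normal(2)    # each player samples its own control
            r = potential(a)                           # common reward observed by both players
            baseline += beta * (r - baseline)          # critic update on the fast timescale
            # REINFORCE-style score-function gradient of a Gaussian policy w.r.t. its mean
            mu += alpha * (r - baseline) * (a - mu) / sigma ** 2

        print("Learned policy means (equilibrium is (0.3, 0.7)):", np.round(mu, 2))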

    Channel Selection for Network-assisted D2D Communication via No-Regret Bandit Learning with Calibrated Forecasting

    We consider the distributed channel selection problem in the context of device-to-device (D2D) communication as an underlay to a cellular network. Underlaid D2D users communicate directly by utilizing the cellular spectrum, but their decisions are not governed by any centralized controller. Selfish D2D users competing for access to the resources form a distributed system in which transmission performance depends on channel availability and quality. This information, however, is difficult to acquire. Moreover, the adverse effects of D2D users on cellular transmissions should be minimized. To overcome these limitations, we propose a network-assisted distributed channel selection approach in which D2D users are only allowed to use vacant cellular channels. This scenario is modeled as a multi-player multi-armed bandit game with side information, for which a distributed algorithmic solution is proposed. The solution combines no-regret learning with calibrated forecasting and can be applied to a broad class of multi-player stochastic learning problems in addition to the formulated channel selection problem. Analytically, it is established that this approach not only yields vanishing regret (in comparison to the globally optimal solution), but also guarantees that the empirical joint frequencies of play converge to the set of correlated equilibria.
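    As a hedged illustration of per-user no-regret learning in this setting, the sketch below lets each D2D pair choose among cellular channels with an EXP3 bandit rule, earning a reward only when its chosen channel is vacant and collision-free. The reward model, the numbers of users and channels, and the exploration parameter are assumptions for the example; the paper's actual scheme additionally relies on calibrated forecasts of the other users' behaviour, which is omitted here.

        import numpy as np

        # Minimal sketch of distributed channel selection where each D2D pair runs an
        # independent no-regret bandit rule (EXP3).  The vacancy/collision reward model
        # and all parameters are illustrative; calibrated forecasting is omitted.
        rng = np.random.default_rng(2)
        n_users, n_channels, T = 3, 5, 20000
        gamma = 0.07                                   # EXP3 exploration parameter (assumed)
        weights = np.ones((n_users, n_channels))

        for t in range(T):
            vacant = rng.random(n_channels) < 0.7      # channels currently unused by cellular users
            probs = (1 - gamma) * weights / weights.sum(axis=1, keepdims=True) + gamma / n_channels
            choices = np.array([rng.choice(n_channels, p=probs[u]) for u in range(n_users)])
            for u, c in enumerate(choices):
                alone = np.count_nonzero(choices == c) == 1   # no collision with other D2D users
                reward = 1.0 if (vacant[c] and alone) else 0.0
                # importance-weighted reward estimate and multiplicative weight update
                weights[u, c] *= np.exp(gamma / n_channels * reward / probs[u, c])
                weights[u] /= weights[u].sum()                # renormalise for numerical stability

        final_probs = (1 - gamma) * weights / weights.sum(axis=1, keepdims=True) + gamma / n_channels
        print("Per-user channel selection probabilities:")
        print(np.round(final_probs, 2))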