
    Mixed-strategy learning with continuous action sets

    Motivated by recent applications of game-theoretical learning to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets. We propose an actor-critic reinforcement learning algorithm that adapts mixed strategies over continuous action spaces. To analyse the algorithm, we extend the theory of finite-dimensional two-timescale stochastic approximation to a Banach space setting, and prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. These results combine to give a provably convergent learning algorithm in which players do not need to keep track of the controls selected by other agents.

    Provable Sample Complexity Guarantees for Learning of Continuous-Action Graphical Games with Nonparametric Utilities

    In this paper, we study the problem of learning the exact structure of continuous-action games with non-parametric utility functions. We propose an ℓ1-regularized method which encourages sparsity of the coefficients of the Fourier transform of the recovered utilities. Our method works by accessing very few Nash equilibria and their noisy utilities. Under certain technical conditions, our method also recovers the exact structure of these utility functions, and thus, the exact structure of the game. Furthermore, our method only needs a logarithmic number of samples in terms of the number of players and runs in polynomial time. We follow the primal-dual witness framework to provide provable theoretical guarantees. Comment: arXiv admin note: text overlap with arXiv:1911.0422
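As a rough illustration of why an ℓ1 penalty encourages sparse coefficients, the sketch below runs a generic proximal-gradient (ISTA) loop on a toy least-squares problem. It is not the estimator from the paper above: the identity design matrix, the data `y`, the penalty `lam`, and the function names `soft_threshold` and `ista` are all assumptions chosen so the result is easy to check by hand.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values with |v| <= t become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(X, y, lam, step, iters=100):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by proximal gradient steps."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residual X w - y, then gradient X^T (X w - y) of the smooth term.
        residual = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(d)]
        # Gradient step followed by the l1 proximal map (soft-thresholding).
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w


# Toy data (hypothetical): identity design, two large and two small signals.
X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
y = [3.0, 0.2, -1.5, 0.05]
w = ista(X, y, lam=0.5, step=1.0)

# Indices of nonzero coefficients, i.e. the recovered "structure" in this toy.
support = [j for j, wj in enumerate(w) if wj != 0.0]
print(w, support)
```

With an identity design, the minimizer is simply the soft-thresholded data, so the two small entries are set exactly to zero and only the two large ones remain in the support. Reading off the nonzero pattern in this way is the basic idea behind structure recovery with sparsity penalties, although the paper's guarantees rest on its own conditions and analysis.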

    Multi-Agent Distributed Reinforcement Learning for Making Decentralized Offloading Decisions

    We formulate computation offloading as a decentralized decision-making problem with autonomous agents. We design an interaction mechanism that incentivizes agents to align private and system goals by balancing between competition and cooperation. The mechanism provably has Nash equilibria with optimal resource allocation in the static case. For a dynamic environment, we propose a novel multi-agent online learning algorithm that learns with partial, delayed and noisy state information, and a reward signal that greatly reduces the information agents need. Empirical results confirm that through learning, agents significantly improve both system and individual performance, e.g., 40% offloading failure rate reduction, 32% communication overhead reduction, up to 38% computation resource savings in low contention, 18% utilization increase with reduced load variation in high contention, and improvement in fairness. Results also confirm the algorithm's good convergence and generalization properties in significantly different environments.