Mixed-strategy learning with continuous action sets
Motivated by recent applications of game-theoretical learning to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets. We propose an actor-critic reinforcement learning algorithm that adapts mixed strategies over continuous action spaces. To analyse the algorithm, we extend the theory of finite-dimensional two-timescale stochastic approximation to a Banach space setting, and prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. Together, these results give a provably convergent learning algorithm in which players do not need to keep track of the controls selected by other agents.
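The two-timescale idea in the abstract above, a critic that tracks payoffs quickly while the actor adjusts the mixed strategy slowly, can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' algorithm: the continuous action set [0, 1] is replaced by a finite grid, the game is a hypothetical two-player identical-interest game (which is trivially a potential game), and the step-size exponents, `temperature`, and all function names are illustrative choices. Each player updates only from its own reward and never observes the other player's action.

```python
import math
import random


def payoff(a1: float, a2: float) -> float:
    # Hypothetical identical-interest game on [0, 1]: both players receive this
    # payoff, so it is itself an exact potential. Maximised at a1 = a2 = 0.5.
    return -(a1 - a2) ** 2 - (a1 + a2 - 1.0) ** 2


def softmax(values, temperature):
    # Logit (smoothed) best response to the current payoff estimates.
    m = max(values)
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def run(steps=20000, grid_size=11, temperature=0.05, seed=0):
    rng = random.Random(seed)
    # Assumption: the continuous action set is approximated by a uniform grid.
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    q = [[0.0] * grid_size for _ in range(2)]                 # critics
    pi = [[1.0 / grid_size] * grid_size for _ in range(2)]    # actors (mixed strategies)
    for n in range(1, steps + 1):
        alpha = 1.0 / n ** 0.6  # fast critic step size
        beta = 1.0 / n ** 0.9   # slow actor step size, beta / alpha -> 0
        # Each player samples independently from its own mixed strategy.
        idx = [rng.choices(range(grid_size), weights=pi[i])[0] for i in range(2)]
        r = payoff(grid[idx[0]], grid[idx[1]])
        for i in range(2):
            # Critic: update only the chosen action's estimate from the player's
            # own reward; the other player's action is never used.
            q[i][idx[i]] += alpha * (r - q[i][idx[i]])
            # Actor: convex step toward the smoothed best response, so pi[i]
            # remains a probability distribution.
            target = softmax(q[i], temperature)
            pi[i] = [(1 - beta) * p + beta * t for p, t in zip(pi[i], target)]
    return grid, pi


if __name__ == "__main__":
    grid, pi = run()
    for i, p in enumerate(pi):
        mode = max(range(len(p)), key=p.__getitem__)
        print(f"player {i}: most probable action {grid[mode]:.1f}")
```

Both step-size sequences have divergent sums and summable squares, and because `beta / alpha` tends to zero the critic sees an almost frozen strategy at each moment; that separation is what two-timescale stochastic approximation arguments rely on.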
Provable Sample Complexity Guarantees for Learning of Continuous-Action Graphical Games with Nonparametric Utilities
In this paper, we study the problem of learning the exact structure of
continuous-action games with non-parametric utility functions. We propose an
ℓ1-regularized method which encourages sparsity of the coefficients of
the Fourier transform of the recovered utilities. Our method works by accessing
very few Nash equilibria and their noisy utilities. Under certain technical
conditions, our method also recovers the exact structure of these utility
functions, and thus, the exact structure of the game. Furthermore, our method
only needs a logarithmic number of samples in terms of the number of players
and runs in polynomial time. We follow the primal-dual witness framework to
provide theoretical guarantees.
Comment: arXiv admin note: text overlap with arXiv:1911.0422
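The core mechanism in the abstract above, ℓ1 regularization driving many Fourier coefficients of a recovered utility to exactly zero, can be seen in a one-dimensional toy. This is a hedged sketch, not the paper's estimator: it fits a single hypothetical function of one variable (no players, no Nash-equilibrium sampling, no graph structure), uses a real cosine basis as a stand-in for the Fourier transform, and solves the ℓ1-regularized least-squares problem with iterative soft-thresholding (ISTA), a generic solver chosen here for brevity. The ground-truth coefficients, noise level, `lam`, and function names are made-up illustrative values.

```python
import math
import random


def cosine_features(x: float, num_freqs: int) -> list[float]:
    # Real Fourier (cosine) basis: cos(2*pi*j*x) for j = 0 .. num_freqs - 1.
    return [math.cos(2 * math.pi * j * x) for j in range(num_freqs)]


def soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t * |v|: shrinks toward zero, exactly zero on [-t, t].
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_ista(rows, y, lam, iters=500):
    """Minimise (1/2n) * ||y - A c||^2 + lam * ||c||_1 by ISTA."""
    n, d = len(rows), len(rows[0])
    # ||A||_F^2 / n upper-bounds the gradient's Lipschitz constant, so 1/L is a safe step.
    lipschitz = sum(a * a for row in rows for a in row) / n
    step = 1.0 / lipschitz
    c = [0.0] * d
    for _ in range(iters):
        residual = [yi - sum(a * cj for a, cj in zip(row, c)) for row, yi in zip(rows, y)]
        grad = [-sum(row[j] * r for row, r in zip(rows, residual)) / n for j in range(d)]
        c = [soft_threshold(cj - step * gj, step * lam) for cj, gj in zip(c, grad)]
    return c


def demo(n=60, num_freqs=8, noise=0.05, lam=0.1, seed=0):
    rng = random.Random(seed)
    true_c = [0.0] * num_freqs
    true_c[1], true_c[3] = 2.0, -1.0  # hypothetical sparse ground truth
    xs = [rng.random() for _ in range(n)]
    rows = [cosine_features(x, num_freqs) for x in xs]
    y = [sum(a * c for a, c in zip(row, true_c)) + rng.gauss(0.0, noise) for row in rows]
    return true_c, lasso_ista(rows, y, lam)


if __name__ == "__main__":
    true_c, est = demo()
    print("true:     ", true_c)
    print("estimated:", [round(c, 2) for c in est])
```

Soft-thresholding sets small coefficients exactly to zero, so the sparsity pattern (which frequencies are active) can be read directly off the solution; the nonzero estimates are shrunk toward zero by an amount that grows with `lam`.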
Multi-Agent Distributed Reinforcement Learning for Making Decentralized Offloading Decisions
We formulate computation offloading as a decentralized decision-making
problem with autonomous agents. We design an interaction mechanism that
incentivizes agents to align private and system goals by balancing
competition and cooperation. The mechanism provably has Nash equilibria with
optimal resource allocation in the static case. For a dynamic environment, we
propose a novel multi-agent online learning algorithm that learns with partial,
delayed and noisy state information, and a reward signal that greatly reduces
the information required. Empirical results confirm that through learning,
agents significantly improve both system and individual performance, e.g., a
40% reduction in offloading failure rate, a 32% reduction in communication
overhead, up to 38% computation resource savings under low contention, an 18%
utilization increase with reduced load variation under high contention, and
improved fairness. Results also confirm the algorithm's good convergence and
generalization properties in significantly different environments.
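The abstract above does not spell out its learning rule, so the following is only a generic stand-in for the setting it describes: independent agents choose among edge servers, see only their own noisy reward, and receive that reward several steps late. Everything here (the ε-greedy rule, the capacity-weighted congestion reward, `delay`, and the class and function names) is a hypothetical illustration, not the paper's interaction mechanism or algorithm.

```python
import random
from collections import deque


class DelayedBanditAgent:
    """Hypothetical epsilon-greedy agent whose rewards arrive late."""

    def __init__(self, num_actions, epsilon, rng):
        self.values = [0.0] * num_actions
        self.counts = [0] * num_actions
        self.epsilon = epsilon
        self.rng = rng

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        best = max(self.values)
        return self.rng.choice([a for a, v in enumerate(self.values) if v == best])

    def learn(self, action, reward):
        # Incremental sample mean of the delayed, noisy reward for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def simulate(num_agents=6, capacities=(1.0, 2.0, 3.0), delay=3, steps=2000,
             noise=0.1, seed=0):
    rng = random.Random(seed)
    agents = [DelayedBanditAgent(len(capacities), 0.1, rng) for _ in range(num_agents)]
    pending = deque()  # entries: (arrival_step, agent_index, action, reward)
    load = [0] * len(capacities)
    for t in range(steps):
        actions = [agent.act() for agent in agents]
        load = [0] * len(capacities)
        for a in actions:
            load[a] += 1
        for i, a in enumerate(actions):
            # Each agent observes only its own reward: negative load per unit
            # capacity on the server it chose, plus Gaussian noise.
            reward = -load[a] / capacities[a] + rng.gauss(0.0, noise)
            pending.append((t + delay, i, a, reward))
        # Deliver only the rewards whose delay has elapsed (FIFO by arrival step).
        while pending and pending[0][0] <= t:
            _, i, a, r = pending.popleft()
            agents[i].learn(a, r)
    return agents, load


if __name__ == "__main__":
    agents, load = simulate()
    print("final-step load per server:", load)
```

Because arrival steps are appended in nondecreasing order, a FIFO queue is enough to keep each update matched to the action that produced it, even though feedback reaches the agent `delay` steps after the decision.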