This paper studies an online game in which, at each time step, a group of
players simultaneously and selfishly minimize their own time-varying cost
functions subject to time-varying coupled constraints and local feasible set constraints.
Only local cost functions and local constraints are available to individual
players, who can share limited information with their neighbors through a fixed
and connected graph. In addition, players have no prior knowledge of future
cost functions and future local constraint functions. In this setting, a novel
decentralized online learning algorithm is devised based on mirror descent and
a primal-dual strategy. With appropriately chosen decaying stepsizes, the
proposed algorithm achieves sublinear bounds on regret and constraint violation.
Furthermore, it is shown that the sequence of play generated by the algorithm
converges to the variational generalized Nash equilibrium (GNE) of the strongly
monotone game to which the online game converges. Additionally, the
payoff-based case, i.e., the bandit feedback setting, is considered, and a new
payoff-based learning policy is devised that achieves sublinear regret and constraint violation.
Finally, the obtained theoretical results are corroborated by numerical
simulations.
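
To make the setting concrete, the following is a minimal sketch of a decentralized online primal-dual update with decaying stepsizes. It is not the paper's algorithm. It assumes the Euclidean mirror map (so the mirror step reduces to a projected gradient step), a hypothetical ring graph with mixing matrix W, hypothetical quadratic costs whose parameters a_t settle to a fixed target, and a hypothetical coupled constraint sum_i x_i <= cap split evenly across players. The name run_sketch and all numerical values are illustrative only.

```python
import numpy as np

def run_sketch(T=500, seed=0):
    """Toy decentralized online primal-dual loop (all modelling choices are assumptions)."""
    rng = np.random.default_rng(seed)
    n = 4
    # Assumed ring graph with a doubly stochastic mixing matrix.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    x = np.zeros(n)    # primal decisions, local feasible set [0, 2]
    lam = np.zeros(n)  # local estimates of the dual variable
    target = np.array([1.0, 1.5, 0.5, 1.2])
    cap = 2.0          # coupled constraint: sum_i x_i <= cap
    cum_violation = 0.0
    for t in range(1, T + 1):
        alpha = 1.0 / np.sqrt(t)  # decaying stepsize
        # Time-varying cost parameters; the perturbation vanishes as t grows.
        a_t = target + rng.normal(scale=0.1, size=n) / t
        # Partial gradient of f_i = 0.5*(x_i - a_i)^2 + 0.1*x_i*sum_j x_j w.r.t. x_i.
        grad = (x - a_t) + 0.1 * (x.sum() + x)
        lam_mix = W @ lam  # exchange dual estimates with neighbors
        # Primal step: Euclidean mirror descent = gradient step + projection.
        x_new = np.clip(x - alpha * (grad + lam_mix), 0.0, 2.0)
        # Dual step: ascent on the local constraint share, projected onto lam >= 0.
        lam = np.maximum(lam_mix + alpha * (x - cap / n), 0.0)
        x = x_new
        cum_violation += x.sum() - cap
    return x, lam, max(cum_violation, 0.0) / T

x_final, lam_final, avg_violation = run_sketch()
print(x_final, lam_final, avg_violation)
```

The pseudo-gradient of these toy costs has Jacobian 1.1 I + 0.1 11^T, which is positive definite, so the limiting game is strongly monotone in the sense the abstract refers to.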