Cooperative Online Learning: Keeping your Neighbors Updated
We study an asynchronous online learning setting with a network of agents. At
each time step, some of the agents are activated, requested to make a
prediction, and pay the corresponding loss. The loss function is then revealed
to these agents and also to their neighbors in the network. Our results
characterize how much knowing the network structure affects the regret as a
function of the model of agent activations. When activations are stochastic,
the optimal regret (up to constant factors) is shown to be of order
$\sqrt{\alpha T}$, where $T$ is the horizon and $\alpha$ is the independence
number of the network. We prove that the upper bound is achieved even when
agents have no information about the network structure. When activations are
adversarial the situation changes dramatically: if agents ignore the network
structure, an $\Omega(T)$ lower bound on the regret can be proven, showing that
learning is impossible. However, when agents can choose to ignore some of their
neighbors based on the knowledge of the network structure, we prove an
$O(\sqrt{\overline{\chi} T})$ sublinear regret bound, where
$\overline{\chi} \ge \alpha$ is the clique-covering number of the network.
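As a small illustration of the two graph parameters named in the abstract above (the independence number and the clique-covering number), the following sketch computes both by brute force for a tiny graph. The function names and the 5-cycle example are assumptions made for illustration and do not come from the paper; the search is exponential in the number of vertices and is only meant for toy graphs.

```python
from itertools import combinations


def independence_number(n, edges):
    """Size of the largest set of pairwise non-adjacent vertices (brute force)."""
    adj = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(frozenset(p) not in adj for p in combinations(subset, 2)):
                return k
    return 0


def clique_covering_number(n, edges):
    """Smallest number of cliques that together contain every vertex (brute force)."""
    adj = {frozenset(e) for e in edges}
    best = n  # n singleton cliques always form a valid cover

    def fits(v, group):
        # v can join a clique only if it is adjacent to every member
        return all(frozenset((v, u)) in adj for u in group)

    def assign(v, groups):
        nonlocal best
        if len(groups) >= best:
            return  # cannot improve on the best cover found so far
        if v == n:
            best = len(groups)
            return
        for g in groups:
            if fits(v, g):
                g.append(v)
                assign(v + 1, groups)
                g.pop()
        groups.append([v])  # open a new clique for v
        assign(v + 1, groups)
        groups.pop()

    assign(0, [])
    return best


# Hypothetical example network: a cycle on 5 agents.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
alpha = independence_number(5, cycle5)
chi_bar = clique_covering_number(5, cycle5)
print(alpha, chi_bar)
```

On the 5-cycle the independence number is 2 and the clique-covering number is 3, so the two parameters can differ even on very small networks.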
Online Game with Time-Varying Coupled Inequality Constraints
In this paper, an online game is studied in which, at each time step, a group of
players each selfishly aim to minimize their own time-varying cost function
subject to time-varying coupled constraints and local feasible set constraints.
Only local cost functions and local constraints are available to individual
players, who can share limited information with their neighbors through a fixed
and connected graph. In addition, players have no prior knowledge of future
cost functions and future local constraint functions. In this setting, a novel
decentralized online learning algorithm is devised based on mirror descent and
a primal-dual strategy. The proposed algorithm can achieve sublinearly bounded
regrets and constraint violation by appropriately choosing decaying stepsizes.
Furthermore, it is shown that the sequence of play generated by the designed
algorithm converges to the variational generalized Nash equilibrium (GNE) of a
strongly monotone game, to which the online game converges. Additionally, a
payoff-based case, i.e., a bandit feedback setting, is also considered, and a
new payoff-based learning policy is devised that achieves sublinear regrets and
constraint violation.
Finally, the obtained theoretical results are corroborated by numerical
simulations.
Comment: arXiv admin note: text overlap with arXiv:2105.0620
Distributed Online Optimization with Coupled Inequality Constraints over Unbalanced Directed Networks
This paper studies a distributed online convex optimization problem, where
agents in an unbalanced network cooperatively minimize the sum of their
time-varying local cost functions subject to a coupled inequality constraint.
To solve this problem, we propose a distributed dual subgradient tracking
algorithm, called DUST, which attempts to optimize a dual objective by means of
tracking the primal constraint violations and integrating dual subgradient and
push sum techniques. Different from most existing works, we allow the
underlying network to be unbalanced with a column stochastic mixing matrix. We
show that DUST achieves sublinear dynamic regret and constraint violations,
provided that the accumulated variation of the optimal sequence grows
sublinearly. If the standard Slater's condition is additionally imposed, DUST
acquires a smaller constraint violation bound than the alternative existing
methods applicable to unbalanced networks. Simulations on a plug-in electric
vehicle charging problem demonstrate the superior convergence of DUST.
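The abstract above mentions push-sum techniques with a column-stochastic mixing matrix. The following minimal sketch shows only the basic push-sum averaging primitive on a small unbalanced directed network; it is not the DUST algorithm, and the function name, the 3-node matrix, and the round count are assumptions chosen for illustration.

```python
def push_sum_average(values, weights, rounds=200):
    """Estimate the network-wide average at every node via push-sum.

    weights[i][j] is the share of node j's mass sent to node i, so every
    column sums to 1 (column stochastic). Rows need not sum to 1, which is
    what makes the network unbalanced.
    """
    n = len(values)
    x = list(values)   # running "numerator" mass at each node
    w = [1.0] * n      # running push-sum weight at each node
    for _ in range(rounds):
        x = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        w = [sum(weights[i][j] * w[j] for j in range(n)) for i in range(n)]
    # The ratio corrects the bias introduced by the non-doubly-stochastic mixing.
    return [xi / wi for xi, wi in zip(x, w)]


# Hypothetical 3-node directed network with self-loops:
# node 0 sends to {0, 1}, node 1 sends to {1, 2}, node 2 sends to {0, 1, 2}.
A = [
    [0.5, 0.0, 1 / 3],
    [0.5, 0.5, 1 / 3],
    [0.0, 0.5, 1 / 3],
]
estimates = push_sum_average([1.0, 2.0, 6.0], A)
print(estimates)
```

Each node's ratio approaches the true average of the initial values (here 3.0), even though the rows of the mixing matrix do not sum to one.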