A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players
We study a multiplayer stochastic multi-armed bandit problem in which players
cannot communicate, and if two or more players pull the same arm, a collision
occurs and the involved players receive zero reward. We consider the
challenging heterogeneous setting, in which different arms may have different
means for different players, and propose a new and efficient algorithm that
combines the idea of leveraging forced collisions for implicit communication
and that of performing matching eliminations. We present a finite-time analysis
of our algorithm, giving the first sublinear minimax regret bound for this
problem, and prove that if the optimal assignment of players to arms is unique,
our algorithm attains the optimal regret, solving an open question
raised at NeurIPS 2018.
Comment: AISTATS202
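The collision model and the notion of an optimal assignment described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only simulates one round of play and brute-forces the best collision-free assignment when the means are known. Bernoulli rewards and the names `play_round` and `best_assignment` are assumptions made for illustration.

```python
import itertools
import random


def play_round(means, choices, rng):
    """Simulate one round; means[p][a] is player p's mean reward on arm a.

    Assumption: rewards are Bernoulli with the given means.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            # Collision: every player involved receives zero reward.
            rewards.append(0.0)
        else:
            rewards.append(1.0 if rng.random() < means[player][arm] else 0.0)
    return rewards


def best_assignment(means):
    """Brute-force the collision-free assignment with the largest total mean.

    Only practical for small instances; requires at least as many arms as players.
    """
    n_players, n_arms = len(means), len(means[0])
    best = max(
        itertools.permutations(range(n_arms), n_players),
        key=lambda arms: sum(means[p][a] for p, a in enumerate(arms)),
    )
    return list(best), sum(means[p][a] for p, a in enumerate(best))
```

In the heterogeneous setting both players may prefer the same arm, so the best assignment is a matching rather than each player's individually best arm, which is why the two helpers are kept separate.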
Decentralized Learning in Online Queuing Systems
Motivated by packet routing in computer networks, online queuing systems are
composed of queues receiving packets at different rates. Repeatedly, they send
packets to servers, each of which processes at most one packet at a time. In
the centralized case, the number of accumulated packets remains bounded (i.e.,
the system is \textit{stable}) as long as the ratio between service rates and
arrival rates is larger than . In the decentralized case, individual
no-regret strategies ensure stability when this ratio is larger than . Yet,
myopically minimizing regret disregards the long term effects due to the
carryover of packets to further rounds. On the other hand, minimizing long term
costs leads to stable Nash equilibria as soon as the ratio exceeds
. Stability with decentralized learning strategies with a ratio
below was a major remaining question. We first argue that for ratios up to
, cooperation is required for stability of learning strategies, as selfish
minimization of policy regret, a \textit{patient} notion of regret, might
indeed still be unstable in this case. We therefore consider cooperative queues
and propose the first decentralized learning algorithm guaranteeing stability
of the system as long as the ratio of rates is larger than , thus reaching
performance comparable to centralized strategies.
Comment: NeurIPS 2021 camera read
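The notion of stability used in this abstract (a bounded number of accumulated packets) can be illustrated with a toy simulation. This sketch covers only a single queue feeding a single server, not the decentralized multi-queue system studied in the paper. The function name `simulate_queue` and the Bernoulli arrival and service model are assumptions made for illustration.

```python
import random


def simulate_queue(arrival_rate, service_rate, steps, seed=0):
    """Return the backlog after `steps` rounds of a single queue and server.

    Assumption: each round, one packet arrives with probability `arrival_rate`,
    and one waiting packet is served with probability `service_rate`.
    """
    rng = random.Random(seed)
    backlog = 0
    for _ in range(steps):
        if rng.random() < arrival_rate:
            backlog += 1
        if backlog > 0 and rng.random() < service_rate:
            backlog -= 1
    return backlog


# Service faster than arrivals: the backlog stays small.
stable = simulate_queue(arrival_rate=0.5, service_rate=0.9, steps=10_000)
# Arrivals faster than service: the backlog grows roughly linearly with time.
unstable = simulate_queue(arrival_rate=0.9, service_rate=0.5, steps=10_000)
```

Comparing `stable` and `unstable` shows the qualitative difference between a bounded backlog and one that keeps growing as packets carry over to later rounds.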