12 research outputs found

A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

Authors: Boursier Etienne, Kaufmann Emilie, Mehrabian Abbas, Perchet Vianney
Publication date: 23/05/2019

We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations. We present a finite-time analysis of our algorithm, giving the first sublinear minimax regret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm attains the optimal $O(\ln(T))$ regret, solving an open question raised at NeurIPS 2018.

Comment: AISTATS202
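The reward model described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only simulates one round of play with collisions and finds the optimal player-to-arm assignment by brute force. The names `play_round` and `best_assignment`, the Bernoulli reward assumption, and the example means are all hypothetical choices made for illustration.

```python
import random
from itertools import permutations


def play_round(means, choices, rng):
    """Simulate one round of the collision model.

    means[p][k] is the (assumed Bernoulli) mean of arm k for player p;
    choices[p] is the arm pulled by player p.
    """
    # Count how many players pulled each arm.
    pulls = {}
    for arm in choices:
        pulls[arm] = pulls.get(arm, 0) + 1

    rewards = []
    for player, arm in enumerate(choices):
        if pulls[arm] > 1:
            # Collision: every involved player receives zero reward.
            rewards.append(0)
        else:
            # No collision: draw a Bernoulli reward with the player's own mean.
            rewards.append(1 if rng.random() < means[player][arm] else 0)
    return rewards


def best_assignment(means):
    """Brute-force the collision-free assignment with the largest total mean.

    Only practical for tiny instances; used here purely for illustration.
    """
    n_players = len(means)
    n_arms = len(means[0])

    def total(assignment):
        return sum(means[p][assignment[p]] for p in range(n_players))

    best = max(permutations(range(n_arms), n_players), key=total)
    return best, total(best)


# Hypothetical heterogeneous instance: 2 players, 3 arms.
example_means = [
    [0.9, 0.5, 0.1],  # player 0
    [0.8, 0.7, 0.2],  # player 1
]
assignment, value = best_assignment(example_means)
collision_rewards = play_round(example_means, [0, 0], random.Random(0))
```

In this example both players prefer arm 0 individually, but the best joint assignment sends player 1 to arm 1, which is the kind of matching a heterogeneous algorithm has to learn without communication.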

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Decentralized Learning in Online Queuing Systems

Authors: Boursier Etienne, Perchet Vianney, Sentenac Flore
Publication date: 23/08/2021

Motivated by packet routing in computer networks, online queuing systems are composed of queues receiving packets at different rates. Repeatedly, they send packets to servers, each of which handles at most one packet at a time. In the centralized case, the number of accumulated packets remains bounded (i.e., the system is \textit{stable}) as long as the ratio between service rates and arrival rates is larger than $1$. In the decentralized case, individual no-regret strategies ensure stability when this ratio is larger than $2$. Yet, myopically minimizing regret disregards the long-term effects due to the carryover of packets to further rounds. On the other hand, minimizing long-term costs leads to stable Nash equilibria as soon as the ratio exceeds $\frac{e}{e-1}$. Stability with decentralized learning strategies with a ratio below $2$ was a major remaining question. We first argue that for ratios up to $2$, cooperation is required for stability of learning strategies, as selfish minimization of policy regret, a \textit{patient} notion of regret, might indeed still be unstable in this case. We therefore consider cooperative queues and propose the first decentralized learning algorithm guaranteeing stability of the system as long as the ratio of rates is larger than $1$, thus reaching performances comparable to centralized strategies.

Comment: NeurIPS 2021 camera read
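The three thresholds quoted in this abstract ($1$, $\frac{e}{e-1}$ and $2$) can be compared numerically with a short sketch. It assumes the ratio between service rates and arrival rates is summarized as a single scalar, which the abstract does not spell out; the function name `stability_guarantees` and the dictionary labels are hypothetical. A value of `False` means only that the abstract states no guarantee at that ratio, not that the system is provably unstable.

```python
import math

# Threshold for stable Nash equilibria under long-term cost minimization.
NASH_THRESHOLD = math.e / (math.e - 1)  # roughly 1.582


def stability_guarantees(ratio):
    """Map each setting named in the abstract to whether its stated
    stability condition holds for the given service/arrival ratio."""
    return {
        "centralized": ratio > 1,
        "cooperative decentralized learning": ratio > 1,
        "long-term cost Nash equilibria": ratio > NASH_THRESHOLD,
        "individual no-regret strategies": ratio > 2,
    }


# A ratio of 1.5 lies below both e/(e-1) and 2, so only the centralized
# and cooperative conditions are met.
guarantees_at_1_5 = stability_guarantees(1.5)
```

The gap between $1$ and $2$ is the regime the abstract highlights: selfish learners have no stated guarantee there, while the cooperative algorithm does.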

arXiv.org e-Print Archive

HAL Descartes