12 research outputs found

    A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

    We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations. We present a finite-time analysis of our algorithm, giving the first sublinear minimax regret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm attains the optimal $O(\ln(T))$ regret, solving an open question raised at NeurIPS 2018. Comment: AISTATS202

    Decentralized Learning in Online Queuing Systems

    Motivated by packet routing in computer networks, online queuing systems are composed of queues receiving packets at different rates. Repeatedly, they send packets to servers, each of which treats at most one packet at a time. In the centralized case, the number of accumulated packets remains bounded (i.e., the system is \textit{stable}) as long as the ratio between service rates and arrival rates is larger than $1$. In the decentralized case, individual no-regret strategies ensure stability when this ratio is larger than $2$. Yet, myopically minimizing regret disregards the long-term effects due to the carryover of packets to further rounds. On the other hand, minimizing long-term costs leads to stable Nash equilibria as soon as the ratio exceeds $\frac{e}{e-1}$. Stability with decentralized learning strategies with a ratio below $2$ was a major remaining question. We first argue that for ratios up to $2$, cooperation is required for stability of learning strategies, as selfish minimization of policy regret, a \textit{patient} notion of regret, might indeed still be unstable in this case. We therefore consider cooperative queues and propose the first decentralized learning algorithm guaranteeing stability of the system as long as the ratio of rates is larger than $1$, thus reaching performance comparable to centralized strategies. Comment: NeurIPS 2021 camera read