
    The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits

    We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup consisting of $N$ agents, all solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip-style communications on an arbitrary connected graph. We develop two novel algorithms in which each agent plays only from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $\Omega(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller than in the case of no collaboration. Furthermore, we show that the communication constraints have only a second-order effect on the regret of our algorithm. We then analyze this second-order term of the regret to derive bounds on the regret-communication tradeoff. Finally, we evaluate our algorithm empirically and conclude that the insights are fundamental and not artifacts of our bounds. We also prove a lower bound showing that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results thus demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents. Comment: To appear in AISTATS 2020. The first two authors contributed equally.
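
    The abstract describes agents that play from a small active arm set and gossip only arm-IDs over $\Omega(\log T)$ communication epochs. Below is a minimal sketch of that insert/eliminate gossip idea, not the authors' exact algorithm: the UCB1 arm selection, Gaussian rewards, doubling epoch schedule, and the sticky set-size parameter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb_index(mean, count, t):
    # Standard UCB1 index; an unplayed arm gets +inf so it is tried first.
    return np.inf if count == 0 else mean + np.sqrt(2 * np.log(t) / count)

def gossip_insert_eliminate(true_means, n_agents, horizon, neighbors, sticky=2):
    """Each agent plays UCB1 on a small active set; at epochs t = 2, 4, 8, ...
    (so O(log T) communications in total) it asks one random neighbor for the
    ID of that neighbor's empirically best arm, inserts it, and eliminates its
    own worst active arm."""
    K = len(true_means)
    active = [set(rng.choice(K, size=sticky + 1, replace=False)) for _ in range(n_agents)]
    counts = np.zeros((n_agents, K))
    sums = np.zeros((n_agents, K))
    next_comm = 2
    for t in range(1, horizon + 1):
        for i in range(n_agents):
            means = np.divide(sums[i], counts[i], out=np.zeros(K), where=counts[i] > 0)
            arm = max(active[i], key=lambda a: ucb_index(means[a], counts[i, a], t))
            counts[i, arm] += 1
            sums[i, arm] += rng.normal(true_means[arm], 1.0)
        if t >= next_comm:  # doubling epochs => O(log T) communication rounds
            next_comm *= 2
            best = [max(active[i], key=lambda a: sums[i, a] / max(counts[i, a], 1))
                    for i in range(n_agents)]
            for i in range(n_agents):
                j = rng.choice(neighbors[i])      # random gossip partner
                active[i].add(best[j])            # insert the recommended arm-ID
                if len(active[i]) > sticky + 1:   # eliminate the worst other arm
                    worst = min(active[i] - {best[j]},
                                key=lambda a: sums[i, a] / max(counts[i, a], 1))
                    active[i].discard(worst)
    return counts

# Example: 4 agents gossiping on a ring graph over 10 arms.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
counts = gossip_insert_eliminate(np.linspace(0.1, 0.9, 10), 4, 2000, neighbors)
```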

    Asymptotic Optimality for Decentralised Bandits


    Multi-Agent Low-Dimensional Linear Bandits

    We study a multi-agent stochastic linear bandit with side information, parameterized by an unknown vector $\theta^* \in \mathbb{R}^d$. The side information consists of a finite collection of low-dimensional subspaces, one of which contains $\theta^*$. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentralized algorithm in which agents communicate subspace indices with each other, and each agent plays a projected variant of LinUCB on the corresponding (low-dimensional) subspace. Through a combination of collaborative best-subspace identification and per-agent learning of an unknown vector in the corresponding low-dimensional subspace, we show that the per-agent regret is much smaller than in the case when agents do not communicate. By collaborating to identify the subspace containing $\theta^*$, each agent effectively solves an easier instance of the linear bandit (compared to the case of no collaboration), leading to the reduced per-agent regret. We finally complement these results with simulations.
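
    As a rough illustration of the per-agent step described above, here is a sketch of a LinUCB variant restricted to a candidate low-dimensional subspace; the subspace-identification and recommendation steps are omitted, and the class name, parameters, and confidence width alpha are assumptions rather than the paper's construction.

```python
import numpy as np

class ProjectedLinUCB:
    """LinUCB restricted to a k-dimensional subspace of R^d. `basis` is a
    d x k matrix with orthonormal columns spanning the candidate subspace
    (e.g. the subspace an agent currently believes contains theta*); the
    ridge regression and confidence bonus live in the k projected coordinates."""
    def __init__(self, basis, lam=1.0, alpha=1.0):
        self.basis = basis                       # d x k projection basis
        self.A = lam * np.eye(basis.shape[1])    # regularized Gram matrix
        self.b = np.zeros(basis.shape[1])
        self.alpha = alpha                       # confidence width

    def select(self, arms):
        theta_hat = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        def ucb(x):
            z = self.basis.T @ x                 # project the d-dim features
            return z @ theta_hat + self.alpha * np.sqrt(z @ A_inv @ z)
        return max(range(len(arms)), key=lambda i: ucb(arms[i]))

    def update(self, x, reward):
        z = self.basis.T @ x
        self.A += np.outer(z, z)
        self.b += reward * z

# Example: theta* lives in a 3-dimensional subspace of R^20.
rng = np.random.default_rng(1)
basis, _ = np.linalg.qr(rng.normal(size=(20, 3)))
theta_star = basis @ rng.normal(size=3)
agent = ProjectedLinUCB(basis)
for t in range(200):
    arms = [rng.normal(size=20) for _ in range(10)]
    chosen = agent.select(arms)
    agent.update(arms[chosen], arms[chosen] @ theta_star + 0.1 * rng.normal())
```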

    Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits

    In this paper, we study the collaborative learning model, which concerns the tradeoff between parallelism and communication overhead in multi-agent multi-armed bandits. For regret minimization in multi-armed bandits, we present the first set of tradeoffs between the number of rounds of communication among the agents and the regret of the collaborative learning process. Comment: 13 pages, 1 figure.

    Cooperative Thresholded Lasso for Sparse Linear Bandit

    We present a novel approach to the multi-agent sparse contextual linear bandit problem, in which the feature vectors have a high dimension $d$ whereas the reward function depends on only a limited set of features, precisely $s_0 \ll d$. Furthermore, learning proceeds under information-sharing constraints. The proposed method employs Lasso regression for dimension reduction, allowing each agent to independently estimate an approximate set of main dimensions and share that information with others depending on the network's structure. The information is then aggregated through a specific process and shared with all agents. Each agent then solves the problem with ridge regression, focusing solely on the extracted dimensions. We present algorithms for both a star-shaped network and a peer-to-peer network. The approaches effectively reduce communication costs while ensuring minimal cumulative regret per agent. Theoretically, we show that our proposed methods have a regret bound of order $\mathcal{O}(s_0 \log d + s_0 \sqrt{T})$ with high probability, where $T$ is the time horizon. To the best of our knowledge, this is the first algorithm that tackles row-wise distributed data in sparse linear bandits, achieving performance comparable to state-of-the-art single- and multi-agent methods. Moreover, it is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret. To validate the effectiveness of our approach, we present experimental results on both synthetic and real-world datasets.
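
    A hedged sketch of the two-stage pipeline the abstract outlines: per-agent Lasso to estimate an approximate support, aggregation of the supports (here a simple vote, standing in for the paper's aggregation process), and per-agent ridge regression on the shared dimensions. Function names, the threshold, and the voting rule are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def estimate_support(X, y, lam=0.05, thresh=1e-3):
    """One agent's step: Lasso on its own (row-wise distributed) data, then
    threshold the coefficients to get an approximate set of main dimensions."""
    coef = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return set(np.flatnonzero(np.abs(coef) > thresh))

def aggregate_supports(supports, min_votes=2):
    """Aggregation step (a stand-in for the paper's rule): keep dimensions
    reported by at least `min_votes` agents and broadcast them to everyone."""
    votes = {}
    for s in supports:
        for j in s:
            votes[j] = votes.get(j, 0) + 1
    return sorted(j for j, v in votes.items() if v >= min_votes)

def ridge_on_support(X, y, support, lam=1.0):
    """Each agent refits with ridge regression on the shared dimensions only."""
    model = Ridge(alpha=lam, fit_intercept=False).fit(X[:, support], y)
    theta = np.zeros(X.shape[1])
    theta[support] = model.coef_
    return theta

# Example: d = 100 features, s_0 = 4 of them relevant, 3 agents with own data.
rng = np.random.default_rng(2)
d, s0, n = 100, 4, 60
theta_star = np.zeros(d)
theta_star[:s0] = [1.0, -0.8, 0.6, 0.5]
datasets = []
for _ in range(3):
    X = rng.normal(size=(n, d))
    datasets.append((X, X @ theta_star + 0.1 * rng.normal(size=n)))
support = aggregate_supports([estimate_support(X, y) for X, y in datasets])
thetas = [ridge_on_support(X, y, support) for X, y in datasets]
```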

    On-Demand Communication for Asynchronous Multi-Agent Bandits

    This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a $K$-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication cost. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of the agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into natural extensions of the UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret. Comment: Accepted by AISTATS 202
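
    The following toy sketch illustrates the general on-demand idea, not the paper's exact ODC protocol: each asynchronous agent caches snapshots of its peers' per-arm statistics and refreshes a snapshot (one message exchange) only when a peer has pulled that arm far more often than the cached copy reflects. The lag_factor trigger and the snapshot bookkeeping are assumptions.

```python
import numpy as np

class Agent:
    """Toy on-demand cooperation sketch. Each agent keeps its own per-arm
    statistics plus a cached snapshot of each peer's statistics; messages are
    sent only when a snapshot is badly out of date, so communication adapts
    to heterogeneous, asynchronous pull rates."""
    def __init__(self, n_arms, n_peers):
        self.own_sums = np.zeros(n_arms)
        self.own_counts = np.zeros(n_arms)
        self.peer_sums = np.zeros((n_peers, n_arms))
        self.peer_counts = np.zeros((n_peers, n_arms))
        self.messages = 0

    def pull(self, arm, reward):
        self.own_sums[arm] += reward
        self.own_counts[arm] += 1

    def maybe_sync(self, peer_id, peer, arm, lag_factor=2.0):
        # Request an update on demand: only when the peer's count for this arm
        # is lag_factor times larger than what the cached snapshot reflects.
        if peer.own_counts[arm] >= lag_factor * max(self.peer_counts[peer_id, arm], 1):
            self.peer_sums[peer_id, arm] = peer.own_sums[arm]
            self.peer_counts[peer_id, arm] = peer.own_counts[arm]
            self.messages += 1

    def mean(self, arm):
        # Estimate combines own observations with the cached peer snapshots.
        n = self.own_counts[arm] + self.peer_counts[:, arm].sum()
        s = self.own_sums[arm] + self.peer_sums[:, arm].sum()
        return s / max(n, 1)

# Example: a fast and a slow agent facing the same 5-armed bandit.
rng = np.random.default_rng(3)
means = np.array([0.2, 0.4, 0.5, 0.7, 0.9])
fast, slow = Agent(5, 1), Agent(5, 1)
for t in range(1, 501):
    arm = rng.integers(5)
    fast.pull(arm, rng.normal(means[arm]))
    if t % 10 == 0:                     # the slow agent pulls 10x less often
        arm = rng.integers(5)
        slow.pull(arm, rng.normal(means[arm]))
        slow.maybe_sync(0, fast, arm)   # few messages, reasonably fresh stats
print(slow.messages, slow.mean(4))
```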

    Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits

    The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting consisting of $N$ agents, where each agent is learning one of $M$ stochastic multi-armed bandits to minimize their group cumulative regret. We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios. We characterize the performance of these algorithms by deriving per-agent cumulative regret and group regret upper bounds. We also prove lower bounds for the group regret in this setting, which demonstrate the near-optimal behavior of the proposed algorithms. Comment: To appear in the proceedings of ICML 202

    Tractable Optimality in Episodic Latent MABs

    We consider a multi-armed bandit problem with $M$ latent contexts, where an agent interacts with the environment for an episode of $H$ time steps. Depending on the length of the episode, the learner may not be able to estimate the latent context accurately. The resulting partial observation of the environment makes the learning task significantly more challenging. Without any additional structural assumptions, existing techniques for tackling partially observed settings imply that the decision maker can learn a near-optimal policy with $O(A)^H$ episodes, but do not promise more. In this work, we show that learning with a number of samples {\em polynomial} in $A$ is possible. We achieve this by using techniques from experiment design. Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M,H)^{\min(M,H)})$ interactions. In practice, we show that we can formulate the moment matching via maximum likelihood estimation. In our experiments, this significantly outperforms the worst-case guarantees, as well as existing practical methods. Comment: NeurIPS 202
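
    To make the role of the episode length concrete, here is a toy sketch of within-episode latent-context inference under known candidate reward means; it is not the paper's experiment-design or method-of-moments procedure. The Gaussian likelihood model, probing schedule, and parameter names are assumptions.

```python
import numpy as np

def play_episode(context_means, sigma, explore_steps, H, rng):
    """The environment samples a hidden latent context m; the agent spends a
    few steps probing arms, scores each candidate context by the (Gaussian)
    log-likelihood of the observed rewards, then commits to the best arm of
    the most likely context for the rest of the H-step episode."""
    M, A = context_means.shape
    m = rng.integers(M)                               # hidden latent context
    log_lik = np.zeros(M)
    total = 0.0
    for h in range(H):
        if h < explore_steps:
            arm = h % A                               # round-robin probing
        else:
            arm = np.argmax(context_means[np.argmax(log_lik)])
        r = rng.normal(context_means[m, arm], sigma)
        total += r
        # Log-likelihood update for each context (constants dropped, since
        # only the comparison between contexts matters).
        log_lik += -0.5 * ((r - context_means[:, arm]) / sigma) ** 2
    return total

# Example: M = 3 latent contexts over A = 4 arms. Short episodes (H = 6)
# leave little room to identify the context; longer ones (H = 30) do better.
rng = np.random.default_rng(4)
mu = np.array([[0.9, 0.1, 0.2, 0.1],
               [0.1, 0.9, 0.2, 0.1],
               [0.1, 0.2, 0.1, 0.9]])
short_avg = np.mean([play_episode(mu, 0.5, 4, 6, rng) for _ in range(200)])
long_avg = np.mean([play_episode(mu, 0.5, 4, 30, rng) for _ in range(200)])
```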