Dynamic social learning under graph constraints
We introduce a model of graph-constrained dynamic choice with reinforcement
modeled by positively α-homogeneous rewards. We show that its empirical
process, which can be written as a stochastic approximation recursion with
Markov noise, has the same probability law as a certain vertex-reinforced
random walk. We use this equivalence to show that for α > 0, the asymptotic
outcome concentrates around the optimum in a certain limiting sense when
`annealed' by letting α increase to infinity slowly.
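The dynamic lends itself to a quick simulation. Below is a minimal sketch, not
the paper's construction: a vertex-reinforced random walk on a small graph
whose transition weights are an α-homogeneous function of reward-weighted
visit counts, with an assumed schedule that anneals by letting α grow slowly.
The graph, rewards, and schedule are all illustrative.

    import random

    # Toy graph and per-vertex rewards (illustrative; vertex 3 is optimal).
    neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    reward = {0: 1.0, 1: 2.0, 2: 1.5, 3: 3.0}
    counts = {v: 1 for v in neighbors}   # visit counts, initialized to 1
    state = 0

    for t in range(1, 100_001):
        alpha = 1.0 + 0.001 * t          # assumed slow annealing schedule
        nbrs = neighbors[state]
        scores = [reward[v] * counts[v] for v in nbrs]
        m = max(scores)
        # alpha-homogeneous reinforcement; dividing by the max avoids overflow
        weights = [(s / m) ** alpha for s in scores]
        state = random.choices(nbrs, weights=weights)[0]
        counts[state] += 1

    # As alpha grows, the occupation measure tilts toward high-reward vertices.
    print(counts[3] / sum(counts.values()))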
Multi-Agent Low-Dimensional Linear Bandits
We study a multi-agent stochastic linear bandit with side information,
parameterized by an unknown vector θ* ∈ ℝ^d. The side
information consists of a finite collection of low-dimensional subspaces, one
of which contains θ*. In our setting, agents can collaborate to reduce
regret by sending recommendations across a communication graph connecting them.
We present a novel decentralized algorithm, where agents communicate subspace
indices with each other, and each agent plays a projected variant of LinUCB on
the corresponding (low-dimensional) subspace. Through a combination of
collaborative best subspace identification, and per-agent learning of an
unknown vector in the corresponding low-dimensional subspace, we show that the
per-agent regret is much smaller than the case when agents do not communicate.
By collaborating to identify the subspace containing θ*, we show that
each agent effectively solves an easier instance of the linear bandit (compared
to the case of no collaboration), thus leading to the reduced per-agent regret.
Finally, we complement these results through simulations.
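As an illustration of the per-agent step, here is a minimal sketch assuming
the correct subspace has already been identified (the collaborative
identification phase is omitted, and all dimensions and constants are
illustrative): project the d-dimensional arm features onto the k-dimensional
subspace and run a standard LinUCB update in the projected coordinates.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n_arms, T = 20, 3, 10, 2000
    B = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal basis of the subspace
    theta = B @ rng.normal(size=k)                 # unknown vector lies in the subspace
    arms = rng.normal(size=(n_arms, d))
    opt = (arms @ theta).max()

    V, b, regret = np.eye(k), np.zeros(k), 0.0
    beta = 2.0                                     # assumed constant confidence width
    X = arms @ B                                   # arm features in subspace coordinates

    for t in range(T):
        theta_hat = np.linalg.solve(V, b)
        Vinv = np.linalg.inv(V)
        width = np.sqrt(np.einsum('ij,jk,ik->i', X, Vinv, X))
        a = int(np.argmax(X @ theta_hat + beta * width))
        r = arms[a] @ theta + rng.normal(scale=0.1)  # noisy linear reward
        V += np.outer(X[a], X[a])
        b += r * X[a]
        regret += opt - arms[a] @ theta

    print(f"cumulative regret over {T} rounds: {regret:.1f}")

Because the confidence set lives in k dimensions rather than d, the regret
scales with the subspace dimension, which is the easier instance referred to
above.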
On-Demand Communication for Asynchronous Multi-Agent Bandits
This paper studies a cooperative multi-agent multi-armed stochastic bandit
problem where agents operate asynchronously -- agent pull times and rates are
unknown, irregular, and heterogeneous -- and face the same instance of a
K-armed bandit problem. Agents can share reward information to speed up the
learning process at additional communication costs. We propose ODC, an
on-demand communication protocol that tailors the communication of each pair of
agents based on their empirical pull times. ODC is efficient when the pull
times of agents are highly heterogeneous, and its communication complexity
depends on the empirical pull times of agents. ODC is a generic protocol that
can be integrated into most cooperative bandit algorithms without degrading
their performance. We then incorporate ODC into the natural extensions of UCB
and AAE algorithms and propose two communication-efficient cooperative
algorithms. Our analysis shows that both algorithms are near-optimal in regret.
Comment: Accepted by AISTATS 2023.
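To make the on-demand idea concrete, here is a toy sketch in which the trigger
rule is an assumption chosen for illustration (ship buffered observations to a
peer only once local data has doubled since the last exchange with that peer),
not the actual ODC rule; it shows how pairwise communication can adapt to
empirical pull times.

    from collections import defaultdict

    class Agent:
        def __init__(self, name):
            self.name = name
            self.pulls = 0                     # own (asynchronous) pull count
            self.last_sent = defaultdict(int)  # pulls already shared, per peer
            self.received = 0                  # peer observations received
            self.messages = 0                  # messages sent

        def pull(self):
            self.pulls += 1                    # caller drives the pull schedule

        def maybe_send(self, peer):
            # Assumed on-demand trigger: send only after local data doubles.
            if self.pulls >= 2 * max(self.last_sent[peer.name], 1):
                peer.received += self.pulls - self.last_sent[peer.name]
                self.last_sent[peer.name] = self.pulls
                self.messages += 1

    a, b = Agent("a"), Agent("b")
    for t in range(1000):
        a.pull()                               # agent a pulls every step
        if t % 50 == 0:
            b.pull()                           # agent b pulls rarely
        a.maybe_send(b)
        b.maybe_send(a)

    print(a.messages, b.messages)

Under this doubling trigger each pair exchanges only logarithmically many
messages however lopsided the pull rates are, which is the flavor of
communication complexity that depends on empirical pull times.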
The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits
We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup
consisting of N agents, solving the same MAB instance to minimize individual
cumulative regret. In our model, agents collaborate by exchanging messages
through pairwise gossip-style communications on an arbitrary connected graph.
We develop two novel algorithms, where each agent only plays from a subset of
all the arms. Agents use the communication medium to recommend only arm-IDs
(not samples), and thus update the set of arms from which they play. We
establish that, if agents communicate times through any
connected pairwise gossip mechanism, then every agent's regret is a factor of
order smaller compared to the case of no collaborations. Furthermore, we
show that the communication constraints only have a second order effect on the
regret of our algorithm. We then analyze this second order term of the regret
to derive bounds on the regret-communication tradeoffs. Finally, we empirically
evaluate our algorithm and conclude that the insights are fundamental and not
artifacts of our bounds. We also prove a lower bound showing that the regret
scaling obtained by our algorithm cannot be improved even in the absence of any
communication constraints. Our results thus demonstrate that even a minimal
level of collaboration among agents greatly reduces regret for all agents.
Comment: To appear in AISTATS 2020. The first two authors contributed equally.
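A simplified sketch of the recommend-and-update idea (active-set sizes, gossip
schedule, and elimination rule here are illustrative, not the paper's exact
algorithms): each agent runs UCB on a small active set of arm-IDs, and at
sparse gossip times inserts a randomly chosen peer's current best arm-ID while
eliminating its own empirically worst arm.

    import math, random

    K, n_agents, T = 20, 4, 5000
    means = [random.random() for _ in range(K)]   # true Bernoulli arm means

    class Agent:
        def __init__(self):
            self.active = random.sample(range(K), 3)  # small subset of arm-IDs
            self.n = [0] * K                          # pull counts
            self.s = [0.0] * K                        # reward sums

        def emp(self, a):
            return self.s[a] / max(self.n[a], 1)      # empirical mean

        def play(self, t):
            def ucb(a):
                if self.n[a] == 0:
                    return float('inf')
                return self.emp(a) + math.sqrt(2 * math.log(t + 1) / self.n[a])
            a = max(self.active, key=ucb)
            self.n[a] += 1
            self.s[a] += float(random.random() < means[a])

    agents = [Agent() for _ in range(n_agents)]
    for t in range(T):
        for ag in agents:
            ag.play(t)
        if t > 0 and t % 500 == 0:                    # sparse gossip rounds
            for ag in agents:
                peer = random.choice(agents)          # random peer (self allowed here)
                rec = max(peer.active, key=peer.emp)  # recommend an arm-ID, not samples
                if rec not in ag.active:
                    ag.active.remove(min(ag.active, key=ag.emp))
                    ag.active.append(rec)

    print([max(ag.active, key=ag.emp) for ag in agents],
          "true best:", max(range(K), key=means.__getitem__))

Since each agent plays only a constant-size subset of the K arms, exploration
of the non-optimal arms is spread across the network, which is the intuition
behind the order-N reduction in per-agent regret described above.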