We study a multi-agent stochastic linear bandit with side information,
parameterized by an unknown vector θ∗ ∈ ℝ^d. The side
information consists of a finite collection of low-dimensional subspaces, one
of which contains θ∗. In our setting, agents can collaborate to reduce
regret by sending recommendations across a communication graph connecting them.
We present a novel decentralized algorithm in which agents communicate subspace
indices with each other, and each agent plays a projected variant of LinUCB on
the corresponding (low-dimensional) subspace. By combining collaborative
identification of the best subspace with per-agent learning of the unknown
vector within that subspace, we show that the per-agent regret is much smaller
than when agents do not communicate.
We show that, by collaborating to identify the subspace containing θ∗,
each agent effectively solves an easier instance of the linear bandit (compared
to the case of no collaboration), which leads to the reduced per-agent regret.
Finally, we complement these results with simulations.
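
To make the idea of playing LinUCB on a subspace concrete, the following is a minimal single-agent sketch, not the algorithm analyzed in this work. It assumes the candidate subspace is given as a matrix U with orthonormal columns, that the action set is a fixed finite set of arms, and that the confidence width uses a constant multiplier beta; the function name projected_linucb and all of its parameters are hypothetical. The communication of subspace indices between agents and the subspace identification step are omitted.

```python
import numpy as np

def projected_linucb(U, arms, theta_star, horizon, lam=1.0, beta=1.0,
                     noise_sd=0.1, seed=0):
    """Run LinUCB in the coordinates of the subspace spanned by the columns of U.

    U          : (d, m) matrix with orthonormal columns (assumed subspace basis).
    arms       : (K, d) array of fixed actions.
    theta_star : (d,) true parameter, used only to simulate rewards.
    Returns the cumulative pseudo-regret over `horizon` rounds.
    """
    rng = np.random.default_rng(seed)
    Z = arms @ U                      # (K, m) arm features in subspace coordinates
    m = U.shape[1]
    V = lam * np.eye(m)               # regularized Gram matrix in m dimensions
    b = np.zeros(m)                   # running sum of reward-weighted features
    means = arms @ theta_star         # true expected reward of each arm
    best = means.max()
    regret = 0.0
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        w_hat = V_inv @ b             # ridge estimate of U^T theta_star
        # Per-arm exploration width sqrt(z^T V^{-1} z).
        width = np.sqrt(np.einsum("km,mn,kn->k", Z, V_inv, Z))
        k = int(np.argmax(Z @ w_hat + beta * width))
        reward = means[k] + noise_sd * rng.standard_normal()
        V += np.outer(Z[k], Z[k])
        b += reward * Z[k]
        regret += best - means[k]
    return regret

# Illustrative instance: theta_star lies in a 2-dimensional subspace of R^20.
rng = np.random.default_rng(1)
d, m, K = 20, 2, 50
U, _ = np.linalg.qr(rng.standard_normal((d, m)))    # orthonormal basis, shape (d, m)
theta_star = U @ np.array([0.8, -0.6])
arms = rng.standard_normal((K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)  # unit-norm arms

regret_sub = projected_linucb(U, arms, theta_star, horizon=500)
regret_full = projected_linucb(np.eye(d), arms, theta_star, horizon=500)
print(regret_sub, regret_full)
```

When θ∗ lies in the span of U, the expected reward of an arm x equals (Uᵀx)ᵀ(Uᵀθ∗), so the learner only has to estimate an m-dimensional vector. Passing the identity matrix as U recovers ordinary LinUCB in the full d-dimensional space, which gives a no-side-information baseline for comparison.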