We consider a stochastic linear bandit problem with multiple users, where the
relationship between users is captured by an underlying graph and user
preferences are represented as smooth signals on the graph. We introduce a
novel bandit algorithm where the smoothness prior is imposed via the
random-walk graph Laplacian, which leads to a single-user cumulative regret
scaling as $\tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$,
feature dimensionality $d$, and the scalar parameter $\Psi \in (0,1)$ that
depends on the graph connectivity. This improves on the
$\tilde{\mathcal{O}}(d \sqrt{T})$ regret of \algo{LinUCB}~\Ccite{li2010contextual},
which does not take user relationships into account. In terms of network regret
(the sum of cumulative regret over $n$ users), the proposed algorithm achieves a
regret scaling as $\tilde{\mathcal{O}}(\Psi d \sqrt{nT})$, which is a significant
improvement over the $\tilde{\mathcal{O}}(n d \sqrt{T})$ regret of the state-of-the-art
algorithm \algo{Gob.Lin}~\Ccite{cesa2013gang}. To improve scalability, we
further propose a simplified algorithm whose computational complexity is linear
in the number of users, while maintaining the same regret.
Finally, we present a finite-time analysis of the proposed algorithms and
demonstrate their advantage over state-of-the-art graph-based
bandit algorithms on both synthetic and real-world data.