Clustered Multi-Agent Linear Bandits
In this paper we address a particular instance of the multi-agent linear
stochastic bandit problem, called clustered multi-agent linear bandits. For this
setting, we propose a novel algorithm that leverages efficient collaboration
between the agents to accelerate the overall optimization. A network controller
is responsible for estimating the underlying cluster structure of the network
and for optimizing experience sharing among agents within the same group. We
provide a theoretical analysis of both the regret minimization problem and the
clustering quality. Through empirical evaluation against state-of-the-art
algorithms on both synthetic and real data, we demonstrate the effectiveness of
our approach: our algorithm significantly improves regret minimization while
recovering the true underlying cluster partitioning.
Comment: 18 pages, 8 figures
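To make the collaboration scheme concrete, here is a minimal sketch of the cluster-then-share idea in Python, assuming LinUCB-style agents that maintain ridge-regression statistics and a controller that groups agents whose estimates are close. The names (Agent, share_within_clusters) and the distance-threshold clustering are illustrative assumptions, not the paper's actual interface or clustering procedure.

    # Hedged sketch: agents keep ridge statistics; a controller clusters them
    # by their estimates and pools the statistics within each cluster.
    import numpy as np

    class Agent:
        def __init__(self, dim, lam=1.0):
            self.A = lam * np.eye(dim)   # regularized Gram matrix
            self.b = np.zeros(dim)       # reward-weighted feature sum

        def update(self, x, r):
            self.A += np.outer(x, x)
            self.b += r * x

        def estimate(self):
            return np.linalg.solve(self.A, self.b)  # ridge estimate of theta

    def share_within_clusters(agents, threshold=0.5):
        """Controller step: group agents whose estimates are within
        `threshold` of each other, then pool each group's statistics."""
        thetas = [a.estimate() for a in agents]
        clusters, assigned = [], set()
        for i in range(len(agents)):
            if i in assigned:
                continue
            group, _ = [i], assigned.add(i)
            for j in range(i + 1, len(agents)):
                if j not in assigned and np.linalg.norm(thetas[i] - thetas[j]) < threshold:
                    group.append(j)
                    assigned.add(j)
            clusters.append(group)
        for group in clusters:
            A = sum(agents[k].A for k in group)  # pooled Gram (regularizer
            b = sum(agents[k].b for k in group)  # counted once per member)
            for k in group:                      # members act on pooled data
                agents[k].A, agents[k].b = A.copy(), b.copy()
        return clusters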
Clustered Linear Contextual Bandits with Knapsacks
In this work, we study clustered contextual bandits where rewards and
resource consumption are the outcomes of cluster-specific linear models. The
arms are divided into clusters, with the cluster memberships unknown to the
algorithm. Pulling an arm in a time period yields a reward and consumes some
amount of each of multiple resources, and the algorithm terminates once the
total consumption of any resource exceeds its constraint. Maximizing the total
reward therefore requires learning not only the reward and resource-consumption
models, but also the cluster memberships. We provide an algorithm that achieves
regret sublinear in the number of time periods, without requiring access to all
of the arms. In particular, we show that it suffices to perform clustering only
once, on a randomly selected subset of the arms. To achieve this result, we
provide a sophisticated combination of techniques from the econometrics
literature and from the literature on bandits with constraints.
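As a rough illustration of the two mechanics in this abstract, the sketch below clusters a randomly selected subset of the arms once (with scikit-learn's KMeans standing in for the paper's clustering step) and stops as soon as any resource budget is exhausted. The arm-selection rule is a deliberate placeholder; the paper's algorithm instead exploits the learned cluster-specific linear models.

    # Hedged sketch: one-time clustering of a random arm subset plus the
    # knapsack-style termination rule; the policy itself is a placeholder.
    import numpy as np
    from sklearn.cluster import KMeans

    def run(arms_features, budgets, pull, horizon, n_clusters, subset_size, rng):
        """arms_features: (K, d) array; budgets: (m,) resource budgets;
        pull(arm) -> (reward, length-m consumption vector)."""
        K = len(arms_features)
        subset = rng.choice(K, size=min(subset_size, K), replace=False)
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(arms_features[subset])
        labels = km.predict(arms_features)    # extend clustering to all arms
        remaining = np.asarray(budgets, dtype=float)
        total_reward = 0.0
        for t in range(horizon):
            arm = int(rng.integers(K))        # placeholder policy; the paper
            r, c = pull(arm)                  # uses cluster-specific models
            total_reward += r
            remaining -= np.asarray(c, dtype=float)
            if np.any(remaining < 0):         # a resource constraint is hit:
                break                         # the algorithm terminates
        return total_reward, labels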
Thompson Sampling for Bandits with Clustered Arms
We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where the arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve regret and computational cost compared to standard Thompson sampling. For the stochastic multi-armed bandit we give upper bounds on the expected cumulative regret that show how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms compare favorably to previously proposed algorithms for bandits with clustered arms.
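Below is a minimal two-level Thompson sampling sketch for Bernoulli rewards with known clusters, in the spirit of the multi-level scheme described above: level one Thompson-samples a cluster from cluster-level posteriors, and level two Thompson-samples an arm inside the chosen cluster. The specific hierarchy and Beta(1, 1) priors are assumptions for illustration; the paper's scheme may differ.

    # Hedged sketch of two-level Thompson sampling over clustered arms.
    import numpy as np

    def two_level_ts(clusters, pull, horizon, seed=0):
        """clusters: list of lists of arm ids; pull(arm) -> reward in {0, 1}."""
        rng = np.random.default_rng(seed)
        # Beta(1, 1) priors at both the cluster level and the arm level.
        cS = np.ones(len(clusters)); cF = np.ones(len(clusters))
        aS = {a: 1.0 for c in clusters for a in c}
        aF = {a: 1.0 for c in clusters for a in c}
        total = 0.0
        for _ in range(horizon):
            # Level 1: Thompson-sample a cluster from cluster-level posteriors.
            k = int(np.argmax(rng.beta(cS, cF)))
            # Level 2: Thompson-sample an arm inside the chosen cluster.
            arm = max(clusters[k], key=lambda a: rng.beta(aS[a], aF[a]))
            r = pull(arm)
            total += r
            aS[arm] += r; aF[arm] += 1 - r    # update arm posterior
            cS[k] += r;   cF[k] += 1 - r      # update cluster posterior
        return total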