Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the
rewards of the arms depend linearly on an unknown parameter and the
objective is to return the arm with the largest reward. We characterize the
complexity of the problem and introduce sample allocation strategies that pull
arms to identify the best arm with a fixed confidence, while minimizing the
sample budget. In particular, we show the importance of exploiting the global
linear structure to improve the estimate of the reward of near-optimal arms. We
analyze the proposed strategies and compare their empirical performance.
Finally, as a by-product of our analysis, we point out the connection to the
G-optimality criterion used in optimal experimental design.
Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 201
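To make the experimental-design connection concrete, the sketch below evaluates the G-optimality criterion from optimal experimental design for a given allocation over arms: the largest value of x^T A^{-1} x over the arm set, where A is the allocation-weighted design matrix. The function name `g_optimality_value` and the example arms are assumptions for illustration; this is not the paper's allocation strategy, only the criterion it refers to.

```python
import numpy as np

def g_optimality_value(arms, allocation):
    """Return max_x x^T A^{-1} x, where A = sum_i allocation[i] * x_i x_i^T.

    `arms` is a (K, d) array of arm feature vectors and `allocation` is a
    length-K vector of nonnegative weights summing to 1 (hypothetical inputs).
    The design matrix must be invertible, i.e. the weighted arms span R^d.
    """
    arms = np.asarray(arms, dtype=float)
    allocation = np.asarray(allocation, dtype=float)
    # Weighted design matrix A = sum_i lambda_i x_i x_i^T.
    design = (arms * allocation[:, None]).T @ arms
    inverse = np.linalg.inv(design)
    # Quadratic form x^T A^{-1} x for every arm at once.
    widths = np.einsum("ij,jk,ik->i", arms, inverse, arms)
    return float(widths.max())

# Three orthogonal unit arms in R^3: a uniform allocation attains the value d = 3.
uniform = g_optimality_value(np.eye(3), [1 / 3, 1 / 3, 1 / 3])
# Skewing the allocation toward one arm makes the worst-case value larger.
skewed = g_optimality_value(np.eye(3), [0.5, 0.25, 0.25])
```

A smaller value means the allocation estimates the reward of every arm more uniformly well, which is why a G-optimal design is a natural baseline for sample allocation.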
Local Clustering in Contextual Multi-Armed Bandits
We study the problem of identifying user clusters in contextual multi-armed bandits (MAB).
Contextual MAB is an effective tool for many real applications, such as content
recommendation and online advertisement. In practice, user dependency plays an
essential role in the user's actions, and thus the rewards. Clustering similar
users can improve the quality of reward estimation, which in turn leads to more
effective content recommendation and targeted advertising. Unlike
traditional clustering settings, we cluster users based on the unknown bandit
parameters, which are estimated incrementally. In particular, we define the
problem of cluster detection in contextual MAB, and propose a bandit algorithm,
LOCB, embedded with a local clustering procedure. We also provide a theoretical
analysis of LOCB in terms of the correctness and efficiency of clustering
and its regret bound. Finally, we evaluate the proposed algorithm from various
aspects and show that it outperforms state-of-the-art baselines.
Comment: 12 page
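The idea of clustering users by estimated bandit parameters can be illustrated with a minimal sketch: each user's parameter is estimated by ridge regression from that user's observed contexts and rewards, and a seed user's neighborhood is the set of users whose estimates lie within a distance threshold. The names `ridge_estimate` and `neighbors_of_seed`, the regularizer `reg`, and the threshold `gamma` are assumptions for illustration; this is not the LOCB algorithm itself.

```python
import numpy as np

def ridge_estimate(contexts, rewards, reg=1.0):
    """Ridge-regression estimate of one user's linear reward parameter.

    `contexts` is an (n, d) array of observed context vectors and `rewards`
    the n matching rewards; `reg` is a hypothetical regularization weight.
    """
    contexts = np.asarray(contexts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    dim = contexts.shape[1]
    gram = reg * np.eye(dim) + contexts.T @ contexts
    return np.linalg.solve(gram, contexts.T @ rewards)

def neighbors_of_seed(estimates, seed, gamma):
    """Indices of users whose estimate is within `gamma` of the seed user's."""
    estimates = np.asarray(estimates, dtype=float)
    distances = np.linalg.norm(estimates - estimates[seed], axis=1)
    return [int(user) for user in np.flatnonzero(distances <= gamma)]

# Users 0 and 1 have similar estimated parameters; user 2 does not.
estimates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
cluster = neighbors_of_seed(estimates, seed=0, gamma=0.5)
```

In an incremental setting the estimates would be refreshed after each round, so a user's neighborhood can change as more rewards are observed.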