Trend Detection based Regret Minimization for Bandit Problems
We study a variation of the classical multi-armed bandits problem. In this
problem, the learner has to make a sequence of decisions, picking from a fixed
set of choices. In each round, she receives as feedback only the loss incurred
from the chosen action. Conventionally, this problem has been studied when
losses of the actions are drawn from an unknown distribution or when they are
adversarial. In this paper, we study this problem when the losses of the
actions also satisfy certain structural properties and, in particular, exhibit a
trend structure. When this is true, we show that using \textit{trend
detection}, we can achieve regret of order $\tilde{O}(N\sqrt{TK})$ with
respect to a switching strategy for the version of the problem where a single
action is chosen in each round and $\tilde{O}(Nm\sqrt{TK})$ when $m$ actions
are chosen each round. This guarantee is a significant improvement over the
conventional benchmark. Our approach can, as a framework, be applied in
combination with various well-known bandit algorithms, like Exp3. For both
versions of the problem, we give regret guarantees also for the
\textit{anytime} setting, i.e. when the length of the choice-sequence is not
known in advance. Finally, we pinpoint the advantages of our method by
comparing it to other well-known strategies.
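As a point of reference for the base algorithm named in the abstract above, the following is a minimal sketch of plain Exp3 with bandit (loss-only) feedback. It is not the paper's trend-detection framework; the function name `exp3`, the callback `loss_fn`, the exploration rate `gamma`, and the conversion of losses in [0, 1] to rewards via `1 - loss` are all illustrative assumptions.

```python
import math
import random


def exp3(loss_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Plain Exp3 sketch (hypothetical helper, not the paper's method).

    loss_fn(t, arm) must return a loss in [0, 1]; only the loss of the
    chosen arm is ever queried, which models bandit feedback.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        # Importance-weighted reward estimate for the chosen arm only.
        reward_estimate = (1.0 - loss) / probs[arm]
        weights[arm] *= math.exp(gamma * reward_estimate / n_arms)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls
```

For example, `exp3(lambda t, arm: 0.0 if arm == 0 else 1.0, n_arms=3, n_rounds=2000)` concentrates its plays on arm 0, the only arm with zero loss.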
Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits
We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge of the arms' means. We prove that this algorithm can attain a $O(\sqrt{TA\Upsilon_T\log(T)})$ regret in $T$ rounds on some "easy" instances, where $A$ is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.
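To make the change-point detector mentioned in the abstract above more concrete, here is a minimal sketch of a Bernoulli generalized likelihood ratio test on a stream of observations in [0, 1]. It covers only the detector, not the full GLR-klUCB algorithm. The function names, the clipping constant `eps`, and the threshold `log(3 * n * sqrt(n) / delta)` are assumptions chosen for illustration and should not be read as the paper's exact calibration.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """Binary relative entropy kl(p, q), with clipping to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_change_detected(xs, delta=0.05):
    """Return True if some split point s makes the GLR statistic exceed
    the (assumed) threshold, i.e. a change in mean is flagged."""
    n = len(xs)
    if n < 2:
        return False
    total = sum(xs)
    mean_all = total / n
    # Assumed threshold; treat it as a tunable placeholder.
    threshold = math.log(3 * n * math.sqrt(n) / delta)
    prefix = 0.0
    for s in range(1, n):
        prefix += xs[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        # Compare the two segment means against the pooled mean.
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        if stat > threshold:
            return True
    return False
```

On a stream of fifty zeros followed by fifty ones the test flags a change, while on a constant stream it does not. In a bandit setting, one such detector would typically be run per arm and the arm's statistics reset when it fires.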