Unimodal Thompson Sampling for Graph-Structured Arms
We study, to the best of our knowledge, the first Bayesian algorithm for
unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this
setting, each arm corresponds to a node of a graph and each edge provides a
relationship, unknown to the learner, between two nodes in terms of expected
reward. Furthermore, for any node of the graph there is a path leading to the
unique node providing the maximum expected reward, along which the expected
reward is monotonically increasing. Previous results on this setting describe
the behavior of frequentist MAB algorithms. In our paper, we design a Thompson
Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound
for the considered setting. We show that, as happens in a wide range of
scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In
particular, we provide a thorough experimental evaluation of the performance of
our algorithm and state-of-the-art algorithms as the properties of the graph vary.
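As a rough illustration of the idea in the abstract above, the following sketch runs Thompson Sampling over the neighborhood of the empirically best arm on a graph. It is not the paper's algorithm: the Bernoulli rewards, the Beta(1, 1) priors, the leader-neighborhood rule, and the names `thompson_step` and `run` are all assumptions made for this example.

```python
import random


def thompson_step(neighbors, successes, failures, rng):
    """Pick one arm by Thompson Sampling restricted to the leader's neighborhood."""
    n_arms = len(successes)

    def empirical_mean(arm):
        pulls = successes[arm] + failures[arm]
        return successes[arm] / pulls if pulls else 0.0

    # Leader: arm with the highest empirical mean (first index wins ties).
    leader = max(range(n_arms), key=empirical_mean)
    candidates = [leader] + list(neighbors[leader])
    # Draw one sample per candidate from its Beta posterior (assumed Beta(1, 1) prior).
    samples = {
        arm: rng.betavariate(successes[arm] + 1, failures[arm] + 1)
        for arm in candidates
    }
    return max(samples, key=samples.get)


def run(means, neighbors, horizon, seed=0):
    """Simulate Bernoulli arms and return per-arm success and failure counts."""
    rng = random.Random(seed)
    successes = [0] * len(means)
    failures = [0] * len(means)
    for _ in range(horizon):
        arm = thompson_step(neighbors, successes, failures, rng)
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

On a line graph with expected rewards increasing toward one end, for example `run([0.1, 0.3, 0.5, 0.7, 0.9], {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}, 500)`, each round only compares the current leader with its direct neighbors, which is the kind of saving a unimodal structure makes possible.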
Infinite Action Contextual Bandits with Reusable Data Exhaust
For infinite action contextual bandits, smoothed regret and reduction to
regression result in state-of-the-art online performance with computational
cost independent of the action set: unfortunately, the resulting data exhaust
does not have well-defined importance weights. This frustrates the execution of
downstream data science processes such as offline model selection. In this
paper we describe an online algorithm with an equivalent smoothed regret
guarantee, but which generates well-defined importance weights: in exchange,
the online computational cost increases, but only to order smoothness (i.e.,
still independent of the action set). This removes a key obstacle to adoption
of smoothed regret in production scenarios.
Comment: Final version after responding to reviewer
The Influence of Shape Constraints on the Thresholding Bandit Problem
We investigate the stochastic Thresholding Bandit problem (TBP) under several
shape constraints. On top of (i) the vanilla, unstructured TBP, we consider
(ii) the case where the sequence of arm means is monotonically increasing
(MTBP), (iii) the case where it is unimodal (UTBP) and (iv) the case where it
is concave (CTBP). In the TBP the aim is to output, at the end of the
sequential game, the set of arms whose means are above a given threshold. The
regret is the largest gap between a misclassified arm and the threshold. In the
fixed-budget setting, we provide problem-independent minimax rates for the
expected regret in all settings, as well as associated algorithms. We prove
separate minimax rates for the regret in (i) TBP, (ii) MTBP, (iii) UTBP and
(iv) CTBP, each expressed in terms of the number of arms and the budget. These
rates demonstrate that the dependence of the minimax regret on the number of
arms varies significantly depending on the shape constraint. This highlights
the fact that the shape constraints fundamentally modify the nature of the TBP.
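The regret notion described in this abstract, the largest gap between a misclassified arm and the threshold, can be sketched as follows. The function name `tbp_regret`, its arguments, and the choice to count an arm whose mean equals the threshold as "above" are assumptions made for illustration; such a tie contributes a gap of zero either way.

```python
def tbp_regret(means, threshold, predicted_above):
    """Return the largest |mean - threshold| over misclassified arms (0.0 if none)."""
    worst = 0.0
    for arm, mu in enumerate(means):
        truly_above = mu >= threshold  # assumed tie rule, see the note above
        if truly_above != (arm in predicted_above):
            worst = max(worst, abs(mu - threshold))
    return worst
```

For instance, with `means = [0.2, 0.4, 0.6, 0.9]` and `threshold = 0.5`, returning the set `{2}` misclassifies only arm 3, so the regret is the distance of that arm's mean from the threshold; returning `{2, 3}` gives zero regret.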