58 research outputs found
Trend Detection based Regret Minimization for Bandit Problems
We study a variation of the classical multi-armed bandits problem. In this
problem, the learner has to make a sequence of decisions, picking from a fixed
set of choices. In each round, she receives as feedback only the loss incurred
from the chosen action. Conventionally, this problem has been studied when
losses of the actions are drawn from an unknown distribution or when they are
adversarial. In this paper, we study this problem when the losses of the
actions also satisfy certain structural properties, and especially, do show a
trend structure. When this is true, we show that using \textit{trend
detection}, we can achieve regret of order with
respect to a switching strategy for the version of the problem where a single
action is chosen in each round and when actions
are chosen each round. This guarantee is a significant improvement over the
conventional benchmark. Our approach can, as a framework, be applied in
combination with various well-known bandit algorithms, like Exp3. For both
versions of the problem, we give regret guarantees also for the
\textit{anytime} setting, i.e. when the length of the choice-sequence is not
known in advance. Finally, we pinpoint the advantages of our method by
comparing it to some well-known other strategies
First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under
semi-bandit feedback, where a learner has to repeatedly pick actions from a
combinatorial decision set in order to minimize the total losses associated
with its decisions. After making each decision, the learner observes the losses
associated with its action, but not other losses. For this problem, there are
several learning algorithms that guarantee that the learner's expected regret
grows as with the number of rounds . In this
paper, we propose an algorithm that improves this scaling to
, where is the total loss of the best
action. Our algorithm is among the first to achieve such guarantees in a
partial-feedback scheme, and the first one to do so in a combinatorial setting.Comment: To appear at COLT 201
- …