519 research outputs found
First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under
semi-bandit feedback, where a learner has to repeatedly pick actions from a
combinatorial decision set in order to minimize the total losses associated
with its decisions. After making each decision, the learner observes the losses
associated with its action, but not other losses. For this problem, there are
several learning algorithms that guarantee that the learner's expected regret
grows as with the number of rounds . In this
paper, we propose an algorithm that improves this scaling to
, where is the total loss of the best
action. Our algorithm is among the first to achieve such guarantees in a
partial-feedback scheme, and the first one to do so in a combinatorial setting.Comment: To appear at COLT 201
Matroid Bandits: Fast Combinatorial Optimization with Learning
A matroid is a notion of independence in combinatorial optimization which is
closely related to computational efficiency. In particular, it is well known
that the maximum of a constrained modular function can be found greedily if and
only if the constraints are associated with a matroid. In this paper, we bring
together the ideas of bandits and matroids, and propose a new class of
combinatorial bandits, matroid bandits. The objective in these problems is to
learn how to maximize a modular function on a matroid. This function is
stochastic and initially unknown. We propose a practical algorithm for solving
our problem, Optimistic Matroid Maximization (OMM); and prove two upper bounds,
gap-dependent and gap-free, on its regret. Both bounds are sublinear in time
and at most linear in all other quantities of interest. The gap-dependent upper
bound is tight and we prove a matching lower bound on a partition matroid
bandit. Finally, we evaluate our method on three real-world problems and show
that it is practical
- …