First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under
semi-bandit feedback, where a learner has to repeatedly pick actions from a
combinatorial decision set in order to minimize the total losses associated
with its decisions. After making each decision, the learner observes the losses
associated with its action, but not other losses. For this problem, there are
several learning algorithms that guarantee that the learner's expected regret
grows as $\tilde{O}(\sqrt{T})$ with the number of rounds $T$. In this
paper, we propose an algorithm that improves this scaling to
$\tilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best
action. Our algorithm is among the first to achieve such guarantees in a
partial-feedback scheme, and the first one to do so in a combinatorial setting. Comment: To appear at COLT 201
Slow Learners are Fast
Online learning algorithms have impressive convergence properties when it
comes to risk minimization and convex games on very large problems. However,
they are inherently sequential in their design which prevents them from taking
advantage of modern multi-core architectures. In this paper we prove that
online learning with delayed updates converges well, thereby facilitating
parallel online learning. Comment: Extended version of conference paper - NIPS 200