    First-order regret bounds for combinatorial semi-bandits

    We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algorithms that guarantee that the learner's expected regret grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this paper, we propose an algorithm that improves this scaling to $\widetilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting. Comment: To appear at COLT 201

    Slow Learners are Fast

    Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design, which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning. Comment: Extended version of conference paper - NIPS 200