Regularized Contextual Bandits

Berthet, Quentin; Fontaine, Xavier; Perchet, Vianney

Authors: Quentin Berthet, Xavier Fontaine, Vianney Perchet
Publication date: 1 April 2019
Publisher: HAL CCSD

Abstract

We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the agent's policy must stay close to a baseline policy known to perform well on the task. To tackle this problem, we use a nonparametric model and propose an algorithm that splits the context space into bins and solves regularized multi-armed bandit instances on each bin, simultaneously and independently. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also consider a new relevant margin condition to get problem-independent convergence rates, yielding intermediate rates that interpolate between the aforementioned slow and fast rates.
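
The binning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the regularizer, the per-bin solver, or the bin size, so everything below is an assumption made for illustration. Contexts are assumed to lie in [0, 1]^d, the regularizer is assumed to be a KL divergence to the baseline policy (weighted by a hypothetical parameter `lam`), and each bin uses a plug-in estimate of the mean rewards with no confidence bonus. All names (`bin_index`, `kl_regularized_policy`, `BinnedRegularizedBandit`) are hypothetical.

```python
import math
import random


def bin_index(x, bins_per_dim):
    """Map a context x in [0, 1]^d to the index tuple of its hypercube bin."""
    return tuple(min(int(xi * bins_per_dim), bins_per_dim - 1) for xi in x)


def kl_regularized_policy(mean_rewards, baseline, lam):
    """Maximize <p, mean_rewards> - lam * KL(p || baseline) over the simplex.

    The maximizer has the closed form p_k proportional to
    baseline_k * exp(mean_rewards_k / lam).
    """
    m = max(mean_rewards)  # subtract the max for numerical stability
    weights = [q * math.exp((r - m) / lam) for q, r in zip(baseline, mean_rewards)]
    total = sum(weights)
    return [w / total for w in weights]


class BinnedRegularizedBandit:
    """One independent regularized bandit per bin (illustrative assumption)."""

    def __init__(self, baseline, bins_per_dim, lam, seed=0):
        self.baseline = list(baseline)
        self.n_arms = len(self.baseline)
        self.bins_per_dim = bins_per_dim
        self.lam = lam
        self.rng = random.Random(seed)
        self.stats = {}  # bin index -> (pull counts, reward sums)

    def _bin_stats(self, x):
        key = bin_index(x, self.bins_per_dim)
        if key not in self.stats:
            self.stats[key] = ([0] * self.n_arms, [0.0] * self.n_arms)
        return self.stats[key]

    def policy(self, x):
        """Regularized policy for the bin containing x (plug-in means)."""
        counts, sums = self._bin_stats(x)
        means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        return kl_regularized_policy(means, self.baseline, self.lam)

    def act(self, x):
        """Sample an arm from the current policy of x's bin."""
        return self.rng.choices(range(self.n_arms), weights=self.policy(x))[0]

    def update(self, x, arm, reward):
        """Record a reward; only the bin containing x is affected."""
        counts, sums = self._bin_stats(x)
        counts[arm] += 1
        sums[arm] += reward
```

With no observations, each bin's policy equals the baseline, and an update in one bin leaves every other bin unchanged, which reflects the "simultaneously and independently" structure described above. A larger `lam` keeps the policy closer to the baseline.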

Available Versions

Archive Ouverte en Sciences de l'Information et de la Communication

oai:HAL:hal-02457917v1

Last time updated on 26/02/2020