In the classical contextual bandits problem, in each round t, a learner
observes some context c, chooses some action a to perform, and receives
some reward r_{a,t}(c). We consider the variant of this problem where, in
addition to receiving the reward r_{a,t}(c), the learner also learns the
values of r_{a,t}(c') for all other contexts c'; i.e., the rewards that
would have been achieved by performing that action under different contexts.
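The cross-learning feedback structure can be sketched as follows; this is a minimal illustration under assumed names (K, C, means, pull are all hypothetical), not an interface from the paper. The key point is that one pull of an action returns a reward for every context, not just the realized one.

```python
import random

# Hypothetical sketch of the cross-learning feedback model: after
# choosing action `a`, the learner observes r_{a,t}(c') for EVERY
# context c', not only the context c realized this round.

K, C = 3, 4  # number of actions, number of contexts (illustrative)

# Stochastic rewards: an assumed mean for each (action, context) pair.
means = [[random.random() for _ in range(C)] for _ in range(K)]

def pull(a):
    """One round of cross-learning feedback for action a:
    returns the full vector (r_{a,t}(c'))_{c' = 0..C-1}."""
    return [means[a][c] + random.gauss(0, 0.1) for c in range(C)]

feedback = pull(0)
assert len(feedback) == C  # one observed reward per context
```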
This variant arises in several strategic settings, such as learning how to bid
in non-truthful repeated auctions (in this setting the context is the decision
maker's private valuation for each auction). We call this problem the
contextual bandits problem with cross-learning. The best algorithms for the
classical contextual bandits problem achieve Õ(√(CKT)) regret
against all stationary policies, where C is the number of contexts, K the
number of actions, and T the number of rounds. We demonstrate algorithms for
the contextual bandits problem with cross-learning that remove the dependence
on C and achieve regret O(√(KT)) (when contexts are stochastic with
known distribution), Õ(K^{1/3} T^{2/3}) (when contexts are stochastic
with unknown distribution), and Õ(√(KT)) (when contexts are
adversarial but rewards are stochastic).

Comment: 48 pages, 5 figures
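To give intuition for why cross-learning removes the dependence on C, here is a minimal sketch (not the paper's exact algorithm) for the stochastic-contexts, known-distribution case: since every pull of an action reveals its reward under all contexts, the learner can collapse each feedback vector to its expectation under the known context distribution and run ordinary K-armed UCB1 on those averages, so the number of contexts never enters the sample counts. All names (`ucb_cross_learning`, `pull`, `context_dist`) are assumptions for illustration.

```python
import math

def ucb_cross_learning(T, K, context_dist, pull):
    """UCB1-style sketch exploiting cross-learning feedback.

    pull(a) -> list of C rewards, one per context (cross-learning).
    context_dist -> known probability of each context.
    Returns the per-action pull counts after T rounds.
    """
    counts = [0] * K
    sums = [0.0] * K  # running sums of distribution-averaged rewards

    def choose(t):
        for a in range(K):           # play each arm once first
            if counts[a] == 0:
                return a
        # Standard UCB1 index on the D-averaged reward estimates.
        return max(range(K), key=lambda a: sums[a] / counts[a]
                   + math.sqrt(2 * math.log(t + 1) / counts[a]))

    for t in range(T):
        a = choose(t)
        rewards = pull(a)            # one observed reward per context
        # Known distribution: collapse the vector to its expectation.
        avg = sum(p * r for p, r in zip(context_dist, rewards))
        counts[a] += 1
        sums[a] += avg
    return counts
```

In a quick simulation with two actions whose distribution-averaged rewards differ, the better action accumulates almost all of the pulls, as expected of a UCB rule with fully shared (context-independent) counters.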