2,144 research outputs found

Contextual Bandits with Cross-learning

Author: Balseiro Santiago
Golrezaei Negin
Mahdian Mohammad
Mirrokni Vahab
Schneider Jon
Publication date: 03/01/2020

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $O(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic).

Comment: 48 pages, 5 figures
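
The cross-learning feedback described in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it is a plain UCB-style index with one assumed change, namely that every pull of an action updates that action's estimate under all contexts, so pull counts are shared across contexts. The toy first-price auction (`BIDS`, `VALUES`, `auction_rewards`) and the function name `run_cross_learning_ucb` are hypothetical and exist only for illustration.

```python
import math
import random

BIDS = [0.0, 0.25, 0.5, 0.75]   # hypothetical action set: bid levels
VALUES = [0.2, 0.6, 1.0]        # hypothetical context set: private valuations


def auction_rewards(action, rng):
    """Reward of one bid under EVERY valuation, using a single random draw.

    Assumed toy model: the learner wins if its bid exceeds a competing
    bid drawn uniformly from [0, 1), and then earns value minus bid.
    """
    won = BIDS[action] > rng.random()
    return {v: (v - BIDS[action]) if won else 0.0 for v in VALUES}


def run_cross_learning_ucb(contexts, num_actions, draw_rewards, rounds, seed=0):
    rng = random.Random(seed)
    pulls = [0] * num_actions  # one counter per action, shared by all contexts
    sums = [{c: 0.0 for c in contexts} for _ in range(num_actions)]
    total = 0.0
    for t in range(1, rounds + 1):
        c = rng.choice(contexts)  # stochastic context for this round

        def index(a):
            if pulls[a] == 0:
                return math.inf  # try every action once
            mean = sums[a][c] / pulls[a]
            return mean + math.sqrt(2 * math.log(t) / pulls[a])

        a = max(range(num_actions), key=index)
        rewards = draw_rewards(a, rng)  # cross-learning: rewards for all contexts
        total += rewards[c]             # only the realized context pays out
        pulls[a] += 1
        for other in contexts:
            sums[a][other] += rewards[other]
    return pulls, sums, total
```

Calling `run_cross_learning_ucb(VALUES, len(BIDS), auction_rewards, rounds=2000)` returns the pull counts, the per-context reward sums, and the total collected reward. Sharing one counter per action is one intuition for why the exploration bonus here does not grow with the number of contexts.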

arXiv.org e-Print Archive

DSpace@MIT

An Efficient Bandit Algorithm for Realtime Multivariate Optimization

Author: Hill Daniel N
Iyer Anand
Liu Yi
Nassif Houssam
Vishwanathan S V N
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/10/2018

Optimization is commonly employed to determine the content of web pages, such as to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what color background to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity in both the speed of experimentation and the exploitation of possible interactions between layout decisions. Here we focus on multivariate optimization of interactive web pages. We formulate an approach where the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at Amazon.com.

Comment: KDD'17 Audience Appreciation Award
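
The hill-climbing step mentioned in this abstract can be sketched as a coordinate-wise search over layout slots. This is a generic illustration, not the authors' implementation: the scoring model with main-effect and pairwise weights (`make_score`), the function name `hill_climb`, and all weights are assumptions. In a bandit loop the weights would come from whatever model the learner maintains.

```python
import random


def make_score(main, pair):
    """Build a layout score from main effects and pairwise interactions.

    main[slot][option] is a per-slot weight; pair maps
    ((slot_i, option_i), (slot_j, option_j)) with slot_i < slot_j
    to an interaction weight. Both are hypothetical inputs.
    """
    def score(layout):
        total = sum(main[s][o] for s, o in enumerate(layout))
        for i in range(len(layout)):
            for j in range(i + 1, len(layout)):
                total += pair.get(((i, layout[i]), (j, layout[j])), 0.0)
        return total
    return score


def hill_climb(num_options, score, rng, max_sweeps=100):
    """Change one slot at a time, keeping any change that raises the score."""
    layout = [rng.randrange(n) for n in num_options]  # random starting layout
    best = score(layout)
    for _ in range(max_sweeps):
        improved = False
        for slot, n in enumerate(num_options):
            for option in range(n):
                candidate = layout[:slot] + [option] + layout[slot + 1:]
                value = score(candidate)
                if value > best:
                    layout, best, improved = candidate, value, True
        if not improved:
            break  # no single-slot change helps: a local optimum
    return layout, best
```

Each sweep evaluates only the sum of the per-slot option counts rather than their product, which is the appeal of hill climbing on an exponentially large layout space. The result is a local optimum, so with strong interactions it can depend on the random starting layout.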

arXiv.org e-Print Archive

Crossref

Nonparametric Stochastic Contextual Bandits

Author: Guan Melody Y.
Jiang Heinrich
Publication date: 05/01/2018

We analyze the $K$-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of $\widetilde{O}\Big(T^{\frac{1+D}{2+D}}\Big)$, where $D$ is the context dimension, for a modified UCB algorithm that is simple to implement ($k$NN-UCB). We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing multi-armed bandit approaches for both simulated tasks and MNIST image classification.

Comment: AAAI 201
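
A nearest-neighbor UCB rule of the kind this abstract names can be sketched as follows. The index used here is an assumption for illustration and not the paper's exact definition: it takes the mean reward of an arm's $k$ nearest past contexts, adds a standard confidence term, and adds the distance to the $k$-th neighbor as a crude penalty for extrapolating. The names `knn_ucb_index` and `choose_arm` are hypothetical.

```python
import math


def knn_ucb_index(history, x, k, t):
    """Optimistic estimate of one arm's reward at context x.

    history is a list of (context, reward) pairs observed for this arm,
    with contexts given as tuples of floats. The bonus terms below are
    an assumed form chosen for illustration.
    """
    if len(history) < k:
        return math.inf  # too little data: force exploration of this arm
    nearest = sorted(history, key=lambda cr: math.dist(cr[0], x))[:k]
    mean = sum(reward for _, reward in nearest) / k
    radius = math.dist(nearest[-1][0], x)  # distance to the k-th neighbor
    return mean + math.sqrt(2 * math.log(t) / k) + radius


def choose_arm(histories, x, k, t):
    """Pick the arm with the largest index at context x in round t."""
    return max(range(len(histories)),
               key=lambda a: knn_ucb_index(histories[a], x, k, t))
```

After playing the chosen arm, the caller would append the observed `(x, reward)` pair to that arm's history. Sorting the full history on every call keeps the sketch short; a spatial index would be the usual choice for long histories.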

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications