Pure Exploration with Multiple Correct Answers

Authors: Degenne Rémy; Koolen Wouter M.
Publication date: 01/01/2019

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm that extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.

arXiv.org e-Print Archive

CWI's Institutional Repository
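For the single-answer case the abstract starts from, the two ingredients of Track-and-Stop can be sketched for Gaussian arms with unit variance (an assumption made here for illustration): the inner infimum of the lower bound for best-arm identification, whose maximum over weights is $1/T^*(\mu)$, and the D-tracking sampling rule commonly paired with it. This is a minimal sketch, not the authors' multiple-answer algorithm; the names `gaussian_bai_value` and `d_tracking` are hypothetical.

```python
import math
from typing import Sequence


def gaussian_bai_value(w: Sequence[float], mu: Sequence[float]) -> float:
    """Inner infimum of the lower bound for best-arm identification,
    assuming unit-variance Gaussian arms:
        min over b != best of  w_a * w_b / (w_a + w_b) * (mu_a - mu_b)^2 / 2.
    Maximizing this over the simplex of weights w gives 1 / T*(mu)."""
    best = max(range(len(mu)), key=lambda i: mu[i])
    value = math.inf
    for b in range(len(mu)):
        if b == best:
            continue
        wa, wb = w[best], w[b]
        if wa + wb == 0:
            return 0.0  # neither arm sampled: no evidence against this alternative
        gap = mu[best] - mu[b]
        value = min(value, wa * wb / (wa + wb) * gap * gap / 2)
    return value


def d_tracking(counts: Sequence[int], target: Sequence[float], t: int) -> int:
    """D-tracking: force-sample arms pulled fewer than sqrt(t) - K/2 times,
    otherwise pull the arm whose count lags furthest behind t * target."""
    K = len(counts)
    under = [a for a in range(K) if counts[a] < math.sqrt(t) - K / 2]
    if under:
        return min(under, key=lambda a: counts[a])
    return max(range(K), key=lambda a: t * target[a] - counts[a])
```

For two arms with gap $\Delta$, equal weights maximize the value and give $\Delta^2/8$, so $T^*(\mu) = 8/\Delta^2$.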

Best-Arm Identification in Linear Bandits

Authors: Lazaric Alessandro; Munos Rémi; Soare Marta
Publication date: 04/11/2014

We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In particular, we show the importance of exploiting the global linear structure to improve the estimate of the reward of near-optimal arms. We analyze the proposed strategies and compare their empirical performance. Finally, as a by-product of our analysis, we point out the connection to the $G$-optimality criterion used in optimal experimental design. Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 201

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server
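The $G$-optimality criterion mentioned in the abstract asks for an allocation $\lambda$ over arms $x$ that minimizes $\max_x x^\top A(\lambda)^{-1} x$, where $A(\lambda) = \sum_x \lambda_x x x^\top$. A minimal sketch, assuming the arms are given as rows of a NumPy array that span $\mathbb{R}^d$, computes such an allocation with the Fedorov-Wynn (Frank-Wolfe) update for $D$-optimal design. By the Kiefer-Wolfowitz equivalence theorem, the same allocation is $G$-optimal, with optimal value $d$. The function name is hypothetical, and this is not the paper's allocation strategy.

```python
import numpy as np


def g_optimal_design(X: np.ndarray, tol: float = 1e-2, max_iter: int = 10_000) -> np.ndarray:
    """Approximate G-optimal allocation over the rows of X (one row per arm).

    Fedorov-Wynn / Frank-Wolfe iterations for the D-optimal design; by the
    Kiefer-Wolfowitz theorem the result also minimizes max_x x^T A(lam)^{-1} x.
    Assumes the rows of X span R^d so that A(lam) stays invertible.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)  # start from the uniform allocation
    for _ in range(max_iter):
        A = X.T @ (lam[:, None] * X)                      # A(lam) = sum_x lam_x x x^T
        A_inv = np.linalg.inv(A)
        scores = np.einsum("ij,jk,ik->i", X, A_inv, X)    # x^T A^{-1} x for every arm
        j = int(np.argmax(scores))
        g = scores[j]
        if g <= d * (1 + tol):                            # max score is always >= d
            break
        step = (g - d) / (d * (g - 1))                    # exact line search on log det
        lam = (1 - step) * lam
        lam[j] += step
    return lam
```

With arms $(1,0)$, $(0,1)$ and $(0.5, 0.5)$, the weight on the third arm shrinks toward zero: pulling the two axis-aligned arms already estimates its reward through the shared parameter, a small instance of the global linear structure the abstract says should be exploited.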

An Efficient Bandit Algorithm for Realtime Multivariate Optimization

Authors: Hill Daniel N; Iyer Anand; Liu Yi; Nassif Houssam; Vishwanathan S V N
Publication venue: Association for Computing Machinery (ACM)
Publication date: 22/10/2018

Optimization is commonly employed to determine the content of web pages, such as to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what background color to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity both in the speed of experimentation and in the exploitation of possible interactions between layout decisions. Here we focus on multivariate optimization of interactive web pages. We formulate an approach where the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at Amazon.com. Comment: KDD'17 Audience Appreciation Award

arXiv.org e-Print Archive

Crossref
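One way the abstract's ingredients could fit together is sketched below: Thompson sampling over a linear model with one indicator per (slot, choice) and one per pair of slot choices (the explicit interaction terms), with greedy hill climbing replacing exhaustive search over the exponentially many layouts. The feature encoding, the independent (diagonal) Gaussian posterior, and every function name here are hypothetical assumptions for illustration, not the authors' implementation.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Layout = Tuple[int, ...]
Key = Tuple


def all_feature_keys(n_options: Sequence[int]) -> List[Key]:
    """Hypothetical encoding: main effects per (slot, choice) plus pairwise terms."""
    keys: List[Key] = [("main", i, c) for i, k in enumerate(n_options) for c in range(k)]
    for i, j in itertools.combinations(range(len(n_options)), 2):
        keys += [("pair", i, ci, j, cj)
                 for ci in range(n_options[i]) for cj in range(n_options[j])]
    return keys


def features(layout: Layout) -> Dict[Key, float]:
    """Active indicator features of one layout."""
    f: Dict[Key, float] = {("main", i, c): 1.0 for i, c in enumerate(layout)}
    for (i, ci), (j, cj) in itertools.combinations(enumerate(layout), 2):
        f[("pair", i, ci, j, cj)] = 1.0
    return f


def score(weights: Dict[Key, float], layout: Layout) -> float:
    return sum(weights.get(k, 0.0) * v for k, v in features(layout).items())


def hill_climb(score_fn: Callable[[Layout], float], n_options: Sequence[int],
               start: Layout) -> Layout:
    """Greedy coordinate ascent: change one slot at a time while the score improves."""
    current, best = tuple(start), score_fn(tuple(start))
    improved = True
    while improved:
        improved = False
        for slot, k in enumerate(n_options):
            for choice in range(k):
                cand = current[:slot] + (choice,) + current[slot + 1:]
                s = score_fn(cand)
                if s > best:
                    current, best, improved = cand, s, True
    return current


def thompson_select(mean: Dict[Key, float], std: Dict[Key, float],
                    n_options: Sequence[int], rng: random.Random,
                    restarts: int = 5) -> Layout:
    """One round: sample weights from an assumed diagonal Gaussian posterior,
    then hill-climb from several random starts and keep the best layout."""
    sampled = {k: rng.gauss(mean[k], std[k]) for k in mean}

    def sampled_score(layout: Layout) -> float:
        return score(sampled, layout)

    best_layout = None
    for _ in range(restarts):
        start = tuple(rng.randrange(k) for k in n_options)
        cand = hill_climb(sampled_score, n_options, start)
        if best_layout is None or sampled_score(cand) > sampled_score(best_layout):
            best_layout = cand
    return best_layout
```

Hill climbing only guarantees a local optimum when interaction terms are strong, which is why the sketch uses random restarts.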