Pure Exploration with Multiple Correct Answers
We determine the sample complexity of pure exploration bandit problems with
multiple good answers. We derive a lower bound using a new game equilibrium
argument. We show how continuity and convexity properties of single-answer
problems ensure that the Track-and-Stop algorithm has asymptotically optimal
sample complexity. However, that convexity is lost when going to the
multiple-answer setting. We present a new algorithm which extends
Track-and-Stop to the multiple-answer case and has asymptotic sample complexity
matching the lower bound.
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the
rewards of the arms depend linearly on an unknown parameter and the
objective is to return the arm with the largest reward. We characterize the
complexity of the problem and introduce sample allocation strategies that pull
arms to identify the best arm with a fixed confidence, while minimizing the
sample budget. In particular, we show the importance of exploiting the global
linear structure to improve the estimate of the reward of near-optimal arms. We
analyze the proposed strategies and compare their empirical performance.
Finally, as a by-product of our analysis, we point out the connection to the
G-optimality criterion used in optimal experimental design.
Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 201
An Efficient Bandit Algorithm for Realtime Multivariate Optimization
Optimization is commonly employed to determine the content of web pages, such
as to maximize conversions on landing pages or click-through rates on search
engine result pages. Often the layout of these pages can be decoupled into
several separate decisions. For example, the composition of a landing page may
involve deciding which image to show, which wording to use, what color
background to display, etc. Such optimization is a combinatorial problem over
an exponentially large decision space. Randomized experiments do not scale well
to this setting, and therefore, in practice, one is typically limited to
optimizing a single aspect of a web page at a time. This represents a missed
opportunity in both the speed of experimentation and the exploitation of
possible interactions between layout decisions.
Here we focus on multivariate optimization of interactive web pages. We
formulate an approach where the possible interactions between different
components of the page are modeled explicitly. We apply bandit methodology to
explore the layout space efficiently and use hill-climbing to select optimal
content in realtime. Our algorithm also extends to contextualization and
personalization of layout selection. Simulation results show the suitability of
our approach to large decision spaces with strong interactions between content.
We further apply our algorithm to optimize a message that promotes adoption of
an Amazon service. After only a single week of online optimization, we saw a
21% conversion increase compared to the median layout. Our technique is
currently being deployed to optimize content across several locations at
Amazon.com.
Comment: KDD'17 Audience Appreciation Awar
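
The abstract above combines an explicit model of interactions between page components with hill-climbing to pick a layout. The following is a minimal sketch of that selection step only, not the authors' algorithm: it assumes the model weights are already given (in a bandit setting they would be sampled or estimated from feedback), and the names `score`, `hill_climb`, `main`, and `pair` are hypothetical, introduced for illustration.

```python
import itertools
import random


def score(layout, main, pair):
    """Score a layout as per-slot effects plus pairwise interaction effects.

    layout: sequence of chosen option indices, one per slot.
    main:   main[slot][option] -> weight (assumed given).
    pair:   {(slot_i, option_i, slot_j, option_j): weight} with slot_i < slot_j.
    """
    total = sum(main[slot][option] for slot, option in enumerate(layout))
    for i, j in itertools.combinations(range(len(layout)), 2):
        total += pair.get((i, layout[i], j, layout[j]), 0.0)
    return total


def hill_climb(n_choices, main, pair, rng, restarts=5):
    """Coordinate-wise hill-climbing with random restarts.

    Each pass changes one slot at a time and keeps any strict improvement,
    so every returned layout is a local optimum under single-slot changes.
    """
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        layout = [rng.randrange(k) for k in n_choices]
        current = score(layout, main, pair)
        improved = True
        while improved:
            improved = False
            for slot, k in enumerate(n_choices):
                for option in range(k):
                    candidate = layout[:slot] + [option] + layout[slot + 1:]
                    s = score(candidate, main, pair)
                    if s > current:
                        layout, current, improved = candidate, s, True
        if current > best_score:
            best, best_score = tuple(layout), current
    return best, best_score


if __name__ == "__main__":
    # Three hypothetical slots: image (2 options), wording (3), color (2).
    main = [[0.1, 0.2], [0.0, 0.1, 0.05], [0.0, 0.1]]
    # One strong interaction between image 0 and wording 2.
    pair = {(0, 0, 1, 2): 0.5}
    print(hill_climb([2, 3, 2], main, pair, random.Random(0)))
```

Each hill-climbing pass evaluates only single-slot changes, so its cost grows with the sum of the option counts rather than their product. The price is that a run can stop at a local optimum when interactions are strong, which is why the sketch uses random restarts.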