    An Efficient Bandit Algorithm for Realtime Multivariate Optimization

    Optimization is commonly employed to determine the content of web pages, such as to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what color background to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity in both the speed of experimentation and the exploitation of possible interactions between layout decisions. Here we focus on multivariate optimization of interactive web pages. We formulate an approach where the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at Amazon.com. Comment: KDD'17 Audience Appreciation Award.
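    The core loop the abstract describes (model pairwise interactions between page components, then hill-climb one component at a time) can be sketched in a few lines. The following is a minimal illustration under assumptions, not the paper's implementation: the slot names, the random placeholder weights, and the `score` and `hill_climb` helpers are all hypothetical, and in the real system the weights would be posterior samples from a Bayesian model updated with live conversion feedback, Thompson-sampling style.

```python
import itertools
import random

# Hypothetical page components; real deployments would have many more.
slots = {
    "image":      ["A", "B", "C"],
    "wording":    ["short", "long"],
    "background": ["light", "dark"],
}

# Placeholder "learned" weights: main effects per (slot, value) and
# pairwise interaction terms between values in different slots.
main = {(s, v): random.gauss(0, 1) for s, vals in slots.items() for v in vals}
pair = {(k1, k2): random.gauss(0, 0.3)
        for k1, k2 in itertools.combinations(main, 2)
        if k1[0] != k2[0]}

def score(layout):
    """Estimated reward of a full layout under the interaction model."""
    chosen = [(s, layout[s]) for s in slots]
    total = sum(main[c] for c in chosen)
    total += sum(pair.get((a, b), pair.get((b, a), 0.0))
                 for a, b in itertools.combinations(chosen, 2))
    return total

def hill_climb(layout, sweeps=5):
    """Coordinate ascent: optimize one slot at a time, holding the rest fixed."""
    for _ in range(sweeps):
        for s, vals in slots.items():
            layout[s] = max(vals, key=lambda v: score({**layout, s: v}))
    return layout

start = {s: random.choice(vals) for s, vals in slots.items()}
print(hill_climb(start))
```

    Coordinate-wise hill-climbing is what makes the combinatorial space tractable: each sweep scores only the sum of the slot sizes rather than their product.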

    Restless bandit marginal productivity indices I: single-project case and optimal control of a make-to-stock M/G/1 queue

    This paper develops a framework based on convex optimization and economic ideas to formulate and solve by an index policy the problem of optimal dynamic effort allocation to a generic discrete-state restless bandit (i.e. binary-action: work/rest) project, elucidating a host of issues raised by Whittle's (1988) seminal work on the topic. Our contributions include: (i) a unifying definition of a project's marginal productivity index (MPI), characterizing optimal policies; (ii) a complete characterization of indexability (existence of the MPI) as satisfaction by the project of the law of diminishing returns (to effort); (iii) sufficient indexability conditions based on partial conservation laws (PCLs), extending previous results of the author from the finite to the countable state case; (iv) application to a semi-Markov project, including a new MPI for a mixed long-run average (LRA)/bias criterion, which exists in relevant queueing control models where the index proposed by Whittle (1988) does not; and (v) optimal MPI policies for service-controlled make-to-order (MTO) and make-to-stock (MTS) M/G/1 queues with convex backorder and stock holding cost rates, under discounted and LRA criteria.
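    To make the subsidy interpretation of the MPI concrete, here is a small numerical sketch. It is not the paper's PCL machinery: it uses random dynamics, a discounted criterion, and plain value iteration, and it presumes the project is indexable so that bisection on the passive subsidy is valid. The index of a state is then the subsidy at which working and resting in that state become equally attractive.

```python
import numpy as np

rng = np.random.default_rng(0)
S, beta = 4, 0.9                                            # states, discount
P = {a: rng.dirichlet(np.ones(S), size=S) for a in (0, 1)}  # random dynamics
r = {0: np.zeros(S), 1: rng.uniform(0, 1, S)}               # rewards per action

def q_values(lam, iters=2000):
    """Q-values of the lam-subsidy problem (resting earns lam extra)."""
    V = np.zeros(S)
    for _ in range(iters):
        q0 = r[0] + lam + beta * P[0] @ V     # passive (rest)
        q1 = r[1] + beta * P[1] @ V           # active (work)
        V = np.maximum(q0, q1)
    return q0, q1

def index(s, lo=-5.0, hi=5.0, tol=1e-4):
    """Bisection on the subsidy that makes state s indifferent."""
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        q0, q1 = q_values(lam)
        if q1[s] > q0[s]:
            lo = lam          # working still preferred: raise the subsidy
        else:
            hi = lam
    return 0.5 * (lo + hi)

print([round(index(s), 3) for s in range(S)])
```

    Under indexability, the resulting index policy for several competing projects simply works on those whose current states carry the highest indices.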

    Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

    We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. This approach is shown, through numerical experiments, to significantly outperform existing algorithms that directly deal with the continuous set of arms. Finally, the results and algorithms are extended to contextual bandits with similarities. Comment: COLT 2014.
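    The two ingredients for the continuous case, discretization plus structure-aware confidence bounds, can be illustrated with a short sketch. This is not OSLB or CKL-UCB themselves (OSLB solves a per-round optimization problem); it is a simpler Lipschitz-tightened UCB under assumed parameters, using the constraint mu(x) <= mu(y) + L|x - y| to shrink each discretized arm's upper bound.

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, T = 2.0, 50, 5000
xs = np.linspace(0.0, 1.0, K)             # discretized arms in [0, 1]
mu = 0.5 + 0.4 * np.sin(3 * xs)           # unknown Lipschitz mean reward

counts = np.ones(K)                        # one forced pull per arm
means = mu + rng.normal(0, 0.5, K)         # sample means after forced pulls
dist = np.abs(xs[:, None] - xs[None, :])   # pairwise arm distances

for t in range(K, T):
    ucb = means + np.sqrt(2 * np.log(t + 1) / counts)
    # Lipschitz tightening: arm i's bound is capped by every other arm j
    # via min_j (ucb_j + L * |x_i - x_j|).
    ucb = (ucb[None, :] + L * dist).min(axis=1)
    a = int(np.argmax(ucb))
    reward = mu[a] + rng.normal(0, 0.5)
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]

print("best arm:", xs[int(np.argmax(mu))],
      "most played:", xs[int(np.argmax(counts))])
```

    The tightening step is what exploits the structure: each pull of an arm caps the plausible means of all nearby arms, so suboptimal regions are eliminated faster than by running a vanilla UCB on the discretized arms.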

    A mathematical programming approach to stochastic and dynamic optimization problems

    Includes bibliographical references (p. 46-50). Supported by a Presidential Young Investigator Award (DDM-9158118), with matching funds from Draper Laboratory. Dimitris Bertsimas.

    The achievable region method in the optimal control of queueing systems : formulations, bounds and policies

    Cover title. Includes bibliographical references (p. 44-48). Supported in part by a Presidential Young Investigator Award (DDM-9158118), with matching funds from Draper Laboratory. Dimitris Bertsimas.
