Search CORE

11 research outputs found

Multi-armed Bandit Algorithms and Empirical Evaluation

Author: D. Luce
E. Even-Dar
H. Robbins
J.C. Gittins
J.P. Hardwick
L.P. Kaelbling
N. Cesa-Bianchi
N. Meuleau
P. Auer
P. Auer
P. Auer
P. Varaiya
R.S. Sutton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005
Field of study

Pandora's Box Problem with Order Constraints

Author: Boodaghians Shant
Fusco Federico
Lazos Philip
Leonardi Stefano
Publication venue
Publication date: 01/01/2020
Field of study

The Pandora's Box Problem, originally formalized by Weitzman in 1979, models selection from set of random, alternative options, when evaluation is costly. This includes, for example, the problem of hiring a skilled worker, where only one hire can be made, but the evaluation of each candidate is an expensive procedure. Weitzman showed that the Pandora's Box Problem admits an elegant, simple solution, where the options are considered in decreasing order of reservation value,i.e., the value that reduces to zero the expected marginal gain for opening the box. We study for the first time this problem when order - or precedence - constraints are imposed between the boxes. We show that, despite the difficulty of defining reservation values for the boxes which take into account both in-depth and in-breath exploration of the various options, greedy optimal strategies exist and can be efficiently computed for tree-like order constraints. We also prove that finding approximately optimal adaptive search strategies is NP-hard when certain matroid constraints are used to further restrict the set of boxes which may be opened, or when the order constraints are given as reachability constraints on a DAG. We complement the above result by giving approximate adaptive search strategies based on a connection between optimal adaptive strategies and non-adaptive strategies with bounded adaptivity gap for a carefully relaxed version of the problem

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Keeping Your Options Open

Author: Jean Guillaume Forand
Publication venue
Publication date
Field of study

In standard models of experimentation, the costs of project development consist of (i) the direct cost of running trials as well as (ii) the implicit opportunity cost of leaving alternative projects idle. Another natural type of experimentation cost, the cost of holding on to the option of developing a currently inactive project, has not been studied. In a (multi-armed bandit) model of experimentation in which inactive projects have explicit maintenance costs and can be irreversibly discarded, I fully characterise the optimal experimentation policy and show that the decision-maker's incentive to actively manage its options has important implications for the order of project development. In the model, an experimenter searches for a success among a number of projects by choosing both those to develop now and those to maintain for (potential) future development. In the absence of maintenance costs, the optimal experimentation policy has a 'stay-with-the-winner' property: the projects that are more likely to succeed are developed first. Maintenance costs provide incentives to bring the option value of less promising projects forward, and under the optimal experimentation policy, projects that are less likely to succeed are sometimes developed first. A project development strategy of 'going-with-the-loser' strikes a balance between the cost of discarding possibly valuable options and the cost of leaving them open.

Research Papers in Economics

Clustering in Block Markovian Multi-Armed Bandit Problems

Author: Hoogendorp Rens M.
Publication venue
Publication date: 24/07/2021
Field of study

Pure OAI Repository

A survey on the bandit problem with switching costs

Author: A. H. Land
A. Manne
A. McLennan
A. Rustichini
A. T. Ishikida
B. Jovanovic
B. Jovanovic
B. P. McCall
C. Derman
C. H. Loch
C. Lott
C. Santos
D. A. Berry
D. A. Berry
D. A. Black
D. Bergemann
D. Bertsimas
D. Bertsimas
D. Easley
D. G. Pandelis
D. Kr¨ahmer
D. R. Cox
D. T. Mortensen
F. Dusonchet
F. Karaesmen
G. Keller
G. Keller
G. M. MacDonald
G. Weiss
H. He
I. Duenyas
I. Karatzas
J. C. Gittins
J. C. Gittins
J. M. Barron
J. M. Harrison
J. Niño-Mora
J. S. Banks
J. S. Banks
K. H. Schlag
L. Benkherouf
L. Benkherouf
L. Benkherouf
M. Asawa
M. Brezzi
M. Eswaran
M. I. Reiman
M. Kolonko
M. L. Puterman
M. L. Weitzman
M. P. Van Oyen
M. Rothschild
M. Waldman
N. El Karoui
N. Kulatilaka
P. Aghion
P. Kuhn
P. S. Adler
P. Whittle
P. Whittle
P. Whittle
R. A. Miller
R. Agrawal
R. Agrawal
R. Bellman
R. Bellman
R. Cowan
R. H. Thaler
R. McDonald
R. P. Brent
R. R. Weber
R. R. Weber
S. A. Lippman
S. Berninghaus
S. Ross
S. Wilk
T. Jun
T. L. Lai
T. L. Lai
Tackseung Jun
W. E. Smith
W. H. Press
W. K. Viscusi
W. R. Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Gittins' theorem under uncertainty

Author: Cohen Samuel N.
Treetanthiploet Tanut
Publication venue
Publication date: 12/08/2020
Field of study

We study dynamic allocation problems for discrete time multi-armed bandits under uncertainty, based on the the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also run a simple numerical example which illustrates the interaction between the willingness to explore and uncertainty aversion of the agent when making decisions

arXiv.org e-Print Archive

Oxford University Research Archive

Bandit models and Blotto games

Author: Thomas C.D.
Publication venue: UCL (University College London)
Publication date: 01/01/2011
Field of study

In this thesis we present a new take on two classic problems of game theory: the "multiarmed bandit" problem of dynamic learning, and the "Colonel Blotto" game, a multidi- mensional contest. In Chapters 2-4 we treat the questions of experimentation with congestion: how do players search and learn about options when they are competing for access with other players? We consider a bandit model in which two players choose between learning about the quality of a risky option (modelled as a Poisson process with unknown arrival rate), and competing for the use of a single shared safe option that can only be used by one agent at the time. We present the equilibria of the game when switching to the safe option is irrevocable, and when it is not. We show that the equilibrium is always inefficient: it involves too little experimentation when compared to the planner solution. The striking equilibrium dynamics of the game with revocable exit are driven by a strategic option-value arising purely from competition between the players. This constitutes a new result in the bandit literature. Finally we present extensions to the model. In particular we assume that players do not observe the result of their opponent's experimentation. In Chapter 5 we turn to the n-dimensional Blotto game and allow battlefi�elds to have di�fferent values. We describe a geometrical method for constructing equilibrium distribution in the Colonel Blotto game with asymmetric battlfi�eld values. It generalises the 3-dimensional construction method �first described by Gross and Wagner (1950). The proposed method does particularly well in instances of the Colonel Blotto game in which the battlefi�eld weights satisfy some clearly defi�ned regularity conditions. The chapter also explores the parallel between these conditions and the integer partitioning problem in combinatorial optimisation

CiteSeerX

UCL Discovery