11 research outputs found

    Pandora's Box Problem with Order Constraints

    The Pandora's Box Problem, originally formalized by Weitzman in 1979, models selection from a set of random alternative options when evaluation is costly. This includes, for example, the problem of hiring a skilled worker, where only one hire can be made but the evaluation of each candidate is an expensive procedure. Weitzman showed that the Pandora's Box Problem admits an elegant, simple solution, in which the options are considered in decreasing order of reservation value, i.e., the value that reduces to zero the expected marginal gain from opening the box. We study for the first time this problem when order - or precedence - constraints are imposed between the boxes. We show that, despite the difficulty of defining reservation values for the boxes that take into account both in-depth and in-breadth exploration of the various options, optimal greedy strategies exist and can be efficiently computed for tree-like order constraints. We also prove that finding approximately optimal adaptive search strategies is NP-hard when certain matroid constraints are used to further restrict the set of boxes which may be opened, or when the order constraints are given as reachability constraints on a DAG. We complement these hardness results by giving approximate adaptive search strategies, based on a connection between optimal adaptive strategies and non-adaptive strategies with bounded adaptivity gap for a carefully relaxed version of the problem.
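    Weitzman's reservation value sigma for a box with value distribution X and opening cost c is the solution of c = E[(X - sigma)^+]. As a minimal sketch (the function name and the discrete-distribution setup are illustrative, not from the paper), sigma can be found by bisection, since the right-hand side is continuous and non-increasing in sigma:

    ```python
    def reservation_value(values, probs, cost, tol=1e-9):
        """Solve cost = E[(X - sigma)^+] for sigma by bisection,
        for a discrete distribution given by (values, probs)."""
        def expected_gain(sigma):
            return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))

        mean = sum(p * v for v, p in zip(values, probs))
        lo = mean - cost - 1.0  # E[(X - lo)^+] >= mean - lo > cost
        hi = max(values)        # E[(X - hi)^+] = 0 < cost (for cost > 0)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if expected_gain(mid) > cost:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Box worth 10 w.p. 1/2 and 0 otherwise, opening cost 1:
    # solve 0.5 * (10 - sigma) = 1, giving sigma = 8.
    sigma = reservation_value([0.0, 10.0], [0.5, 0.5], cost=1.0)
    ```

    In the unconstrained problem, Weitzman's rule opens boxes in decreasing order of sigma and stops once the best value seen exceeds every unopened box's sigma; the difficulty addressed in the paper is that with precedence constraints no such per-box value suffices on its own.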

    Keeping Your Options Open

    In standard models of experimentation, the costs of project development consist of (i) the direct cost of running trials as well as (ii) the implicit opportunity cost of leaving alternative projects idle. Another natural type of experimentation cost, the cost of holding on to the option of developing a currently inactive project, has not been studied. In a (multi-armed bandit) model of experimentation in which inactive projects have explicit maintenance costs and can be irreversibly discarded, I fully characterise the optimal experimentation policy and show that the decision-maker's incentive to actively manage its options has important implications for the order of project development. In the model, an experimenter searches for a success among a number of projects by choosing both those to develop now and those to maintain for (potential) future development. In the absence of maintenance costs, the optimal experimentation policy has a 'stay-with-the-winner' property: the projects that are more likely to succeed are developed first. Maintenance costs provide incentives to bring the option value of less promising projects forward, and under the optimal experimentation policy, projects that are less likely to succeed are sometimes developed first. A project development strategy of 'going-with-the-loser' strikes a balance between the cost of discarding possibly valuable options and the cost of leaving them open.

    Gittins' theorem under uncertainty

    We study dynamic allocation problems for discrete-time multi-armed bandits under uncertainty, based on the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also run a simple numerical example which illustrates the interaction between the agent's willingness to explore and uncertainty aversion when making decisions.
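    For orientation, in the classical linear-expectation setting the Gittins index of a Bernoulli arm with a Beta posterior can be computed by calibration: bisect on the constant rate lambda of a safe arm until the agent is indifferent between pulling and retiring. The sketch below (truncated horizon and all names are assumptions for illustration, not the paper's construction) uses backward induction over posterior states:

    ```python
    def gittins_index_bernoulli(a, b, gamma=0.9, horizon=100, tol=1e-6):
        """Approximate Gittins index of a Bernoulli arm with Beta(a, b)
        posterior, via bisection on the calibrating safe rate lambda."""

        def continue_minus_retire(lam):
            retire = lam / (1.0 - gamma)  # value of retiring forever
            # Backward induction over posterior states (a + i, b + n - i),
            # truncated after `horizon` further observations.
            V = [retire] * (horizon + 1)
            for n in range(horizon - 1, -1, -1):
                newV = []
                for i in range(n + 1):
                    ai, bi = a + i, b + (n - i)
                    p = ai / (ai + bi)  # posterior mean success probability
                    cont = p * (1.0 + gamma * V[i + 1]) + (1.0 - p) * gamma * V[i]
                    newV.append(max(retire, cont))
                V = newV
            return V[0] - retire

        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if continue_minus_retire(mid) > 0.0:
                lo = mid  # safe rate too low: pulling still strictly better
            else:
                hi = mid
        return 0.5 * (lo + hi)

    idx = gittins_index_bernoulli(1, 1)  # uniform prior, discount 0.9
    ```

    The index exceeds the myopic posterior mean (here 0.5), reflecting the value of exploration; the paper's question is what survives of this index characterisation when expectations are nonlinear.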

    Bandit models and Blotto games

    In this thesis we present a new take on two classic problems of game theory: the "multi-armed bandit" problem of dynamic learning, and the "Colonel Blotto" game, a multidimensional contest. In Chapters 2-4 we treat the question of experimentation with congestion: how do players search and learn about options when they are competing for access with other players? We consider a bandit model in which two players choose between learning about the quality of a risky option (modelled as a Poisson process with unknown arrival rate), and competing for the use of a single shared safe option that can only be used by one agent at a time. We present the equilibria of the game when switching to the safe option is irrevocable, and when it is not. We show that the equilibrium is always inefficient: it involves too little experimentation when compared to the planner's solution. The striking equilibrium dynamics of the game with revocable exit are driven by a strategic option value arising purely from competition between the players. This constitutes a new result in the bandit literature. Finally, we present extensions to the model; in particular, we assume that players do not observe the result of their opponent's experimentation. In Chapter 5 we turn to the n-dimensional Blotto game and allow battlefields to have different values. We describe a geometrical method for constructing equilibrium distributions in the Colonel Blotto game with asymmetric battlefield values. It generalises the 3-dimensional construction method first described by Gross and Wagner (1950). The proposed method does particularly well in instances of the Colonel Blotto game in which the battlefield weights satisfy some clearly defined regularity conditions. The chapter also explores the parallel between these conditions and the integer partitioning problem in combinatorial optimisation.