    Marginal productivity index policies for scheduling restless bandits with switching penalties

    Two-stage index computation for bandits with switching penalties II: switching delays

    This paper addresses the multi-armed bandit problem with switching penalties including both costs and delays, extending results of the companion paper [J. Niño-Mora. "Two-Stage Index Computation for Bandits with Switching Penalties I: Switching Costs". Conditionally accepted at INFORMS J. Comp.], which addressed the case without switching delays. Asawa and Teneketzis (1996) introduced an index for bandits with delays that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index", yet gave no algorithm for computing it. This paper presents an efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster, in at most (5/2)n^3+O(n) arithmetic operations for an n-state bandit. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching penalties in its semi-Markov restless reformulation, by deploying work-reward analysis and LP-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against a benchmark index policy across a wide range of instances.
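    The index policy built from these two indices can be sketched as follows, under the assumption (ours, not a statement of the paper) that it ranks the currently set-up project by its continuation index and every other project by its switching index, then engages the top-ranked one. The index tables are taken as given; `Project` and `select_project` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Project:
    """One bandit with precomputed index tables (hypothetical inputs)."""
    state: int
    continuation_index: Sequence[float]  # applies if the project is already set up
    switching_index: Sequence[float]     # applies if the project must be set up first


def select_project(projects: Sequence[Project], engaged: Optional[int]) -> int:
    """Return the position of the project to engage next.

    The currently engaged (set-up) project is ranked by its continuation
    index; every other project is ranked by its switching index. Ties are
    broken in favour of staying on the engaged project.
    """
    def priority(k: int):
        p = projects[k]
        table = p.continuation_index if k == engaged else p.switching_index
        return (table[p.state], k == engaged)

    return max(range(len(projects)), key=priority)


if __name__ == "__main__":
    a = Project(state=0, continuation_index=[5.0], switching_index=[3.0])
    b = Project(state=0, continuation_index=[6.0], switching_index=[4.5])
    # A is set up: its continuation index 5.0 beats B's switching index 4.5.
    print(select_project([a, b], engaged=0))     # 0
    # Nothing is set up: both are ranked by switching index, and 4.5 > 3.0.
    print(select_project([a, b], engaged=None))  # 1
```

    The example shows the hysteresis that switching penalties induce: the policy stays on A even though B has the larger continuation (Gittins) index, because B would first have to be set up.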

    Two-stage index computation for bandits with switching penalties I: switching costs

    This paper addresses the multi-armed bandit problem with switching costs. Asawa and Teneketzis (1996) introduced an index that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index". They proposed to compute both jointly as the Gittins index of a bandit with 2n states (when the original bandit has n states), which results in an eight-fold increase in the O(n^3) arithmetic operations relative to those needed to compute the continuation index alone. This paper presents a more efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster, in at most n^2+O(n) arithmetic operations. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching costs in its restless reformulation, by deploying work-reward analysis and PCL-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of instances.
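    As a point of reference for the first stage, the continuation (Gittins) index of a small discounted bandit can be computed with a classic largest-remaining-index scheme, sketched below. This is a generic textbook method costing O(n^4) operations as written, not the decoupled algorithm of the paper; the discount factor `beta`, the active transition matrix `P`, the reward vector `r`, and the name `gittins_indices` are assumptions of the sketch.

```python
import numpy as np


def gittins_indices(P, r, beta):
    """Gittins (continuation) index of every state of a discounted bandit.

    P: (n, n) transition matrix while the bandit is worked on.
    r: (n,) one-period rewards; beta: discount factor in (0, 1).
    States are ranked one at a time, from the highest index downwards.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    index = np.empty(n)
    ranked = []                      # states already assigned an index
    remaining = set(range(n))
    while remaining:
        best_state, best_value = None, -np.inf
        for i in sorted(remaining):
            C = ranked + [i]         # keep working while in C; stop on leaving C
            A = np.eye(len(C)) - beta * P[np.ix_(C, C)]
            reward = np.linalg.solve(A, r[C])           # discounted reward until exit
            work = np.linalg.solve(A, np.ones(len(C)))  # discounted time until exit
            value = reward[-1] / work[-1]               # ratio when starting from i
            if value > best_value:
                best_state, best_value = i, value
        index[best_state] = best_value
        ranked.append(best_state)
        remaining.remove(best_state)
    return index


if __name__ == "__main__":
    P = np.array([[0.0, 1.0], [0.0, 1.0]])  # state 0 moves to absorbing state 1
    r = np.array([0.0, 10.0])
    # Expected: state 1 gets index 10 (its reward), state 0 gets 5.
    print(gittins_indices(P, r, beta=0.5))
```

    Running the same routine on the 2n-state bandit of Asawa and Teneketzis is what makes the joint approach costly, since (2n)^3 = 8n^3.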

    Fast two-stage computation of an index policy for multi-armed bandits with setup delays

    We consider the multi-armed bandit problem with switching penalties that include setup delays and costs, extending the author's earlier results for the special case without switching delays. A priority index for projects with setup delays that partly characterizes optimal policies was introduced by Asawa and Teneketzis in 1996, yet without a means of computing it. We present a fast two-stage index computation method, which in a first stage computes the continuation index (which applies when the project has been set up) together with certain extra quantities, with cubic (arithmetic-operation) complexity in the number of project states, and then, in a second stage, computes the switching index (which applies when the project is not set up) with quadratic complexity. The approach is based on new methodological advances in restless bandit indexation, introduced and deployed herein and motivated by the limitations of previous results, and exploits the fact that the aforementioned index is the Whittle index of the project in its restless reformulation. A numerical study demonstrates substantial runtime speed-ups of the new two-stage index algorithm versus a general one-stage Whittle index algorithm. The study further gives evidence that, in a multi-project setting, the index policy is consistently nearly optimal.

    Dynamic priority allocation via restless bandit marginal productivity indices

    This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation, as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing such problems. Besides building on the rich literature on bandit problems, our approach draws on ideas from linear programming, economics, and multi-objective optimization. In particular, it was motivated by issues raised in the seminal work of Whittle (Restless bandits: activity allocation in a changing world. In: Gani J. (ed.) A Celebration of Applied Probability, J. Appl. Probab., vol. 25A, Applied Probability Trust, Sheffield, pp. 287-298, 1988), where he introduced the index for restless bandits that is the starting point of this work. This index, along with previously proposed indices and more recent extensions, is shown to be unified through the intuitive concept of "marginal productivity index" (MPI), which measures the marginal productivity of work on a project at each of its states. In a multi-project setting, MPI policies are economically sound, as they dynamically allocate higher priority to those projects where work appears to be currently more productive. Besides being tractable and widely applicable, such index policies, as a growing body of computational evidence indicates, typically achieve near-optimal performance and substantially outperform benchmark policies derived from conventional approaches. Comment: 7 figures
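    A minimal sketch of how such a priority policy is applied in a multi-project setting, assuming that m projects can be worked on per period and that each project's index has been precomputed for every state (`index_tables`, `states`, `m`, and `mpi_priority_policy` are hypothetical names): at each epoch, activate the m projects whose current states carry the highest indices.

```python
import heapq
from typing import Sequence


def mpi_priority_policy(index_tables: Sequence[Sequence[float]],
                        states: Sequence[int], m: int) -> list:
    """Pick the m projects whose current states have the highest index.

    index_tables[k][s] is project k's (precomputed) marginal productivity
    index in state s; states[k] is project k's current state.
    """
    current = ((index_tables[k][s], k) for k, s in enumerate(states))
    top = heapq.nlargest(m, current)   # ties are broken toward larger k
    return sorted(k for _, k in top)


if __name__ == "__main__":
    tables = [[1.0, 5.0], [3.0, 2.0], [4.0, 0.0]]  # hypothetical index tables
    states = [1, 0, 0]                             # current indices: 5.0, 3.0, 4.0
    print(mpi_priority_policy(tables, states, m=2))  # [0, 2]
```

    Because the indices are computed per project, the per-epoch decision is only a ranking, which is what keeps such policies tractable as the number of projects grows.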

    A Fast-Pivoting Algorithm for Whittle's Restless Bandit Index

    This article belongs to the Special Issue Applied Probability. The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3+O(n^2) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
    This research has been developed over a number of years, and has been funded by the Spanish Government under grants MEC MTM2004-02334 and PID2019-109196GB-I00/AEI/10.13039/501100011033. This research was also funded in part by the Comunidad de Madrid in the setting of the multi-year agreement with Universidad Carlos III de Madrid within the line of activity "Excelencia para el Profesorado Universitario", in the framework of the V Regional Plan of Scientific Research and Technological Innovation 2016-2020.
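    For small models, the Whittle index values that the fast-pivoting algorithm computes can be cross-checked with a slow, generic baseline: bisection on a subsidy for passivity, with value iteration solving each subsidized problem. The sketch assumes a discrete-time discounted two-action project (a special case of the semi-Markov setting above) that is indexable; it is not the paper's parametric-simplex algorithm, and `whittle_indices` and `_action_gap` are hypothetical names.

```python
import numpy as np


def _action_gap(P, R, beta, lam, tol=1e-11):
    """Q(active) - Q(passive) in every state when resting earns a subsidy lam.

    P[a] is the (n, n) transition matrix and R[a] the reward vector under
    action a (0 = passive, 1 = active); solved by plain value iteration.
    """
    V = np.zeros(P.shape[1])
    while True:
        q_passive = R[0] + lam + beta * (P[0] @ V)
        q_active = R[1] + beta * (P[1] @ V)
        V_new = np.maximum(q_passive, q_active)
        if np.max(np.abs(V_new - V)) < tol:
            return q_active - q_passive
        V = V_new


def whittle_indices(P, R, beta, iters=60):
    """Whittle index of each state via bisection on the passive subsidy.

    Assumes indexability: at each state the gap Q(active) - Q(passive)
    is positive below the index and non-positive above it.
    """
    n = P.shape[1]
    index = np.empty(n)
    for s in range(n):
        lo, hi = -1.0, 1.0
        while _action_gap(P, R, beta, lo)[s] < 0:  # widen until work is preferred
            lo *= 2.0
        while _action_gap(P, R, beta, hi)[s] > 0:  # widen until rest is preferred
            hi *= 2.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if _action_gap(P, R, beta, mid)[s] > 0:
                lo = mid
            else:
                hi = mid
        index[s] = 0.5 * (lo + hi)
    return index


if __name__ == "__main__":
    # Classic bandit: a passive project is frozen (identity) and earns nothing.
    P = np.stack([np.eye(2), np.array([[0.0, 1.0], [0.0, 1.0]])])
    R = np.array([[0.0, 0.0], [0.0, 10.0]])
    print(whittle_indices(P, R, beta=0.5))  # approximately [5, 10]
```

    For a classic bandit like the one in the example, the Whittle index coincides with the Gittins index, which gives a simple sanity check; each bisection step re-solves a full control problem, which is exactly the cost a dedicated pivoting algorithm avoids.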

    Decomposition methods for large scale stochastic and robust optimization problems

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 107-112).
    We propose new decomposition methods for broad families of stochastic and robust optimization problems, in order to yield tractable approaches for large-scale real-world applications. We introduce a new type of Markov decision problem, named the Generalized Restless Bandits Problem, that is a broad generalization of the restless bandit problem. For this class of stochastic optimization problems, we develop a nested policy heuristic which iteratively solves a series of sub-problems operating on smaller bandit systems. We also develop linear-optimization-based bounds for the Generalized Restless Bandit problem and demonstrate promising computational performance of the nested policy heuristic on a large-scale real-world application of search term selection for sponsored search advertising. We further study the distributionally robust optimization problem with known mean, covariance and support. These optimization models are attractive in real-world applications, as they require the model user to rely only on those statistics of uncertainty that are known with relative confidence, rather than making arbitrary assumptions about the exact dynamics of the underlying distribution of uncertainty. Because this problem is known to be NP-hard, current approaches invoke tractable but often weak relaxations for real-world applications. We develop a decomposition method for this family of problems which recursively derives sub-policies along projected dimensions of uncertainty and provides a sequence of bounds on the value of the derived policy. In developing this method, we prove that non-convex quadratic optimization in n dimensions over a box in two dimensions is efficiently solvable.
    We also show that this same decomposition method yields a promising heuristic for the MAXCUT problem. We then provide promising computational results in the context of a real-world fixed income portfolio optimization problem. The decomposition methods developed in this thesis recursively derive sub-policies on projected dimensions of the master problem. These sub-policies are optimal on relaxations which admit "tight" projections of the master problem; that is, the projection of the feasible region for the relaxation is equivalent to the projection of that of the master problem along the dimensions of the sub-policy. Additionally, these decomposition strategies provide a hierarchical solution structure that aids in solving large-scale problems.
    by Adrian Bernard Druke Becker.
    Ph.D.