
197 research outputs found

Dynamic priority allocation via restless bandit marginal productivity indices

Author: Niño-Mora José
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/04/2023

This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing problems of such a kind. Besides building on the rich literature on bandit problems, our approach draws on ideas from linear programming, economics, and multi-objective optimization. In particular, it was motivated by the need to address issues raised in the seminal work of Whittle (Restless bandits: activity allocation in a changing world. In: Gani J. (ed.) A Celebration of Applied Probability, J. Appl. Probab., vol. 25A, Applied Probability Trust, Sheffield, pp. 287-298, 1988), where he introduced the index for restless bandits that is the starting point of this work. Such an index, along with previously proposed indices and more recent extensions, is shown to be unified through the intuitive concept of "marginal productivity index" (MPI), which measures the marginal productivity of work on a project at each of its states. In a multi-project setting, MPI policies are economically sound, as they dynamically allocate higher priority to those projects where work appears to be currently more productive. Besides being tractable and widely applicable, such index policies typically achieve near-optimal performance and substantially outperform benchmark policies derived from conventional approaches, as a growing body of computational evidence indicates. Comment: 7 figures
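The priority rule described in this abstract (work on the projects whose current index is highest) can be sketched as follows. This is only an illustrative sketch, not the author's algorithm: the function name `index_priority_policy`, the lookup-table representation of the index, and the `capacity` parameter are assumptions introduced here, and the index values are taken as given rather than computed.

```python
from typing import Dict, Hashable, List, Sequence


def index_priority_policy(
    states: Sequence[Hashable],
    index_tables: Sequence[Dict[Hashable, float]],
    capacity: int,
) -> List[int]:
    """Pick which projects to work on under an index priority rule.

    states[i] is the current state of project i, index_tables[i] maps each
    state of project i to its (precomputed, hypothetical) index value, and
    capacity is the number of projects that can be worked on at once.
    """
    # Look up the current index of every project.
    current = [(index_tables[i][s], i) for i, s in enumerate(states)]
    # Highest index first; ties are broken by the lower project number.
    ranked = sorted(current, key=lambda pair: (-pair[0], pair[1]))
    # Work on the `capacity` top-ranked projects; the rest stay passive.
    return sorted(i for _, i in ranked[:capacity])


# Hypothetical index tables for three two-state projects.
tables = [{0: 0.1, 1: 0.9}, {0: 0.5, 1: 0.2}, {0: 0.7, 1: 0.3}]
active = index_priority_policy(states=[1, 0, 1], index_tables=tables, capacity=2)
```

The sketch always fills the available capacity; variants of the rule that leave a project passive when its index falls below a threshold are not covered here.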

arXiv.org e-Print Archive

Restless bandit marginal productivity indices I: single-project case and optimal control of a make-to-stock M/G/1 queue

Author: Niño Mora José
Publication date: 01/02/2004

This paper develops a framework based on convex optimization and economic ideas to formulate and solve by an index policy the problem of optimal dynamic effort allocation to a generic discrete-state restless bandit (i.e. binary-action: work/rest) project, elucidating a host of issues raised by Whittle (1988)'s seminal work on the topic. Our contributions include: (i) a unifying definition of a project's marginal productivity index (MPI), characterizing optimal policies; (ii) a complete characterization of indexability (existence of the MPI) as satisfaction by the project of the law of diminishing returns (to effort); (iii) sufficient indexability conditions based on partial conservation laws (PCLs), extending previous results of the author from the finite to the countable state case; (iv) application to a semi-Markov project, including a new MPI for a mixed long-run-average (LRA)/bias criterion, which exists in relevant queueing control models where the index proposed by Whittle (1988) does not; and (v) optimal MPI policies for service-controlled make-to-order (MTO) and make-to-stock (MTS) M/G/1 queues with convex back order and stock holding cost rates, under discounted and LRA criteria.

Universidad Carlos III de Madrid e-Archivo

Marginal Productivity Indices and Linear Programming Relaxations for Dynamic Resource Allocation in Queueing Systems

Author: Cao Jianhua
Publication date: 01/01/2008

Many problems concerning resource management in modern communication systems can be simplified to queueing models under Markovian assumptions. The computation of the optimal policy is however often hindered by the curse of dimensionality especially for models that support multiple traffic or job classes. The research focus naturally turns to computationally efficient bounds and high performance heuristics. In this thesis, we apply the indexability theory to the study of admission control of a single server queue and to the buffer sharing problem for a multi-class queueing system. Our main contributions are the following: we derive the Marginal Productivity Index (MPI) and give a sufficient indexability condition for the admission control model by viewing the buffer as the resource; we construct hierarchical Linear Programming (LP) relaxations for the buffer sharing problem and propose an MPI based heuristic with its performance evaluated by discrete event simulation. In our study, the admission control model is used as the building block for the MPI heuristic deployed for the buffer sharing problem. Our condition for indexability only requires that the reward function is concavelike. We also give the explicit non-recursive expression for the MPI calculation. We compare with the previous result of the indexability condition and the MPI for the admission control model that penalizes the rejection action. The study of hierarchical LP relaxations for the buffer sharing problem is based on the exact but intractable LP formulation of the continuous-time Markov Decision Process (MDP). The number of hierarchy levels is equal to the number of job classes. The last one in the hierarchy is exact and corresponds to the exponentially sized LP formulation of the MDP. The first order relaxation is obtained by relaxing the constraint that no buffer overflow may occur in any sample path to the constraint that the average buffer utilization does not exceed the available capacity. 
Based on the Lagrangian decomposition of the first order relaxation, we propose a heuristic policy based on the concept of MPI. Each one of the decomposed subproblems corresponds to the admission control model we described above. The link to the decomposed sub-problems is the Lagrangian multiplier for the relaxed buffer size constraint in the first order relaxation. Our simulation study indicates the near-optimal performance of the heuristic in the (randomly generated) instances investigated.
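The relaxation and decomposition described in this abstract can be written out schematically as below. The notation is an assumption made for illustration and does not come from the thesis: K job classes, buffer size B, N_k(t) the number of class-k jobs in the buffer at time t, R_k the long-run reward rate of class k, and λ the multiplier attached to the relaxed buffer constraint.

```latex
% Assumed notation (not from the abstract): K classes, buffer size B,
% N_k(t) = class-k jobs in the buffer at time t, R_k = class-k reward rate.
\begin{align*}
\text{exact constraint:}\quad
  & \sum_{k=1}^{K} N_k(t) \le B
    \quad \text{for all } t \text{ on every sample path},\\
\text{first order relaxation:}\quad
  & \sum_{k=1}^{K} \mathbb{E}_{\pi}\bigl[N_k\bigr] \le B,\\
\text{Lagrangian } (\lambda \ge 0):\quad
  & L(\lambda) = \lambda B
    + \sum_{k=1}^{K} \max_{\pi_k}
      \Bigl\{ R_k(\pi_k) - \lambda\, \mathbb{E}_{\pi_k}\bigl[N_k\bigr] \Bigr\}.
\end{align*}
```

Under these assumptions each inner maximization involves a single class and charges it a price λ per unit of buffer occupied, which is how the single-queue admission control model reappears as the building block.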

Lund University Publications

RESTLESS BANDIT MARGINAL PRODUCTIVITY INDICES I: SINGLE-PROJECT CASE AND OPTIMAL CONTROL OF A MAKE-TO-STOCK M/G/1 QUEUE

Author: José Niño-Mora

This paper develops a framework based on convex optimization and economic ideas to formulate and solve by an index policy the problem of optimal dynamic effort allocation to a generic discrete-state restless bandit (i.e. binary-action: work/rest) project, elucidating a host of issues raised by Whittle (1988)'s seminal work on the topic. Our contributions include: (i) a unifying definition of a project's marginal productivity index (MPI), characterizing optimal policies; (ii) a complete characterization of indexability (existence of the MPI) as satisfaction by the project of the law of diminishing returns (to effort); (iii) sufficient indexability conditions based on partial conservation laws (PCLs), extending previous results of the author from the finite to the countable state case; (iv) application to a semi-Markov project, including a new MPI for a mixed long-run-average (LRA)/bias criterion, which exists in relevant queueing control models where the index proposed by Whittle (1988) does not; and (v) optimal MPI policies for service-controlled make-to-order (MTO) and make-to-stock (MTS) M/G/1 queues with convex back order and stock holding cost rates, under discounted and LRA criteria.

Research Papers in Economics

Characterization and computation of restless bandit marginal productivity indices

Author: Jose Nino-Mora

The Whittle index [P. Whittle (1988). Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25A, 287-298] yields a practical scheduling rule for the versatile yet intractable multi-armed restless bandit problem, involving the optimal dynamic priority allocation to multiple stochastic projects, modeled as restless bandits, i.e., binary-action (active/passive) (semi-) Markov decision processes. A growing body of evidence shows that such a rule is nearly optimal in a wide variety of applications, which raises the need to efficiently compute the Whittle index and more general marginal productivity index (MPI) extensions in large-scale models. For such a purpose, this paper extends to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A $(2/3)n^{3}$ fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press], which yielded a fast Gittins-index algorithm. Yet the extension is not straightforward, as the MPI is only defined for the limited range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) a complete algorithmic characterization of indexability, for which two block implementations are given; and (ii) more importantly, new analytical conditions for indexability (termed LP-indexability) that leverage knowledge on the structure of optimal policies in particular models, under which the MPI is computed faster by the adaptive-greedy algorithm previously introduced by the author under the more stringent PCL-indexability conditions, for which a new fast-pivoting block implementation is given. The paper further reports on a computational study, measuring the runtime performance of the algorithms, and assessing by a simulation study the high prevalence of indexability and PCL-indexability.

Research Papers in Economics

Exponential penalty function control of loss networks

Author: Iyengar Garud, Sigman Karl
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 24/03/2005

We introduce penalty-function-based admission control policies to approximately maximize the expected reward rate in a loss network. These control policies are easy to implement and perform well both in the transient period and in steady state. A major advantage of the penalty approach is that it avoids solving the associated dynamic program. However, a disadvantage of this approach is that it requires the capacity requested by individual requests to be sufficiently small compared to total available capacity. We first solve a related deterministic linear program (LP) and then translate an optimal solution of the LP into an admission control policy for the loss network via an exponential penalty function. We show that the penalty policy is a target-tracking policy: it performs well because the optimal solution of the LP is a good target. We demonstrate that the penalty approach can be extended to track arbitrarily defined target sets. Results from preliminary simulation studies are included. Comment: Published at http://dx.doi.org/10.1214/105051604000000936 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Optimization of multiclass queueing networks : polyhedral and nonlinear characterization of achievable performance

Author: Dimitris Bertsimas, Ioannis Ch. Paschalidis, John N. Tsitsiklis
Publication venue: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems
Publication date: 01/01/1992

Includes bibliographical references (p. 48-50). Supported by the National Science Foundation, ECS-8552419. Supported by the Presidential Young Investigator Award, DDM-9158118. Supported by the Draper Laboratory and by the Leaders for Manufacturing Program at MIT. Supported by the ARO, DAAL03-92-G0309.

DSpace@MIT

Call admission control in wireless DS-CDMA systems using reinforcement learning

Author: Pitipong Chanloha
Publication venue: School of Telecommunication Engineering, Institute of Engineering, Suranaree University of Technology
Publication date: 01/01/2006

Suranaree University of Technology Intellectual Repository