33 research outputs found

    Dynamic priority allocation via restless bandit marginal productivity indices

    Full text link
    This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing problems of such a kind. Besides building on the rich literature on bandit problems, our approach draws on ideas from linear programming, economics, and multi-objective optimization. In particular, it was motivated to address issues raised in the seminal work of Whittle (Restless bandits: activity allocation in a changing world. In: Gani J. (ed.) A Celebration of Applied Probability, J. Appl. Probab., vol. 25A, Applied Probability Trust, Sheffield, pp. 287-298, 1988) where he introduced the index for restless bandits that is the starting point of this work. Such an index, along with previously proposed indices and more recent extensions, is shown to be unified through the intuitive concept of ``marginal productivity index'' (MPI), which measures the marginal productivity of work on a project at each of its states. In a multi-project setting, MPI policies are economically sound, as they dynamically allocate higher priority to those projects where work appears to be currently more productive. Besides being tractable and widely applicable, a growing body of computational evidence indicates that such index policies typically achieve a near-optimal performance and substantially outperform benchmark policies derived from conventional approaches.Comment: 7 figure

    Edge Computing in the Dark: Leveraging Contextual-Combinatorial Bandit and Coded Computing

    Full text link
    With recent advancements in edge computing capabilities, there has been a significant increase in utilizing the edge cloud for event-driven and time-sensitive computations. However, large-scale edge computing networks can suffer substantially from unpredictable and unreliable computing resources which can result in high variability of service quality. Thus, it is crucial to design efficient task scheduling policies that guarantee quality of service and the timeliness of computation queries. In this paper, we study the problem of computation offloading over unknown edge cloud networks with a sequence of timely computation jobs. Motivated by the MapReduce computation paradigm, we assume each computation job can be partitioned to smaller Map functions that are processed at the edge, and the Reduce function is computed at the user after the Map results are collected from the edge nodes. We model the service quality (success probability of returning result back to the user within deadline) of each edge device as function of context (collection of factors that affect edge devices). The user decides the computations to offload to each device with the goal of receiving a recoverable set of computation results in the given deadline. Our goal is to design an efficient edge computing policy in the dark without the knowledge of the context or computation capabilities of each device. By leveraging the \emph{coded computing} framework in order to tackle failures or stragglers in computation, we formulate this problem using contextual-combinatorial multi-armed bandits (CC-MAB), and aim to maximize the cumulative expected reward. We propose an online learning policy called \emph{online coded edge computing policy}, which provably achieves asymptotically-optimal performance in terms of regret loss compared with the optimal offline policy for the proposed CC-MAB problem

    Characterization and computation of restless bandit marginal productivity indices

    Get PDF
    The Whittle index [P. Whittle (1988). Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25A, 287-298] yields a practical scheduling rule for the versatile yet intractable multi-armed restless bandit problem, involving the optimal dynamic priority allocation to multiple stochastic projects, modeled as restless bandits, i.e., binary-action (active/passive) (semi-) Markov decision processes. A growing body of evidence shows that such a rule is nearly optimal in a wide variety of applications, which raises the need to efficiently compute the Whittle index and more general marginal productivity index (MPI) extensions in large-scale models. For such a purpose, this paper extends to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A (2/3)n3n^{3} fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press], which yielded a fast Gittins-index algorithm. Yet the extension is not straightforward, as the MPI is only defined for the limited range of socalled indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in largescale applications, presenting the following contributions: (i) a complete algorithmic characterization of indexability, for which two block implementations are given; and (ii) more importantly, new analytical conditions for indexability — termed LP-indexability — that leverage knowledge on the structure of optimal policies in particular models, under which the MPI is computed faster by the adaptive-greedy algorithm previously introduced by the author under the more stringent PCL-indexability conditions, for which a new fast-pivoting block implementation is given. The paper further reports on a computational study, measuring the runtime performance of the algorithms, and assessing by a simulation study the high prevalence of indexability and PCL-indexability.

    Generalized restless bandits and the knapsack problem for perishable inventories

    Get PDF
    In this paper we introduce the Knapsack Problem for Perishable Inventories concerning the optimal dynamic allocation of a collection of products to a limited knapsack. The motivation for designing such a problem comes from retail revenue management, where different products often have an associated lifetime during which they can only be sold, and the managers can regularly select some products to be allocated to a limited promotion space which is expected to attract more customers than the standard shelves. Another motivation comes from scheduling of requests in modern multi-server data centers so that Quality-of-Service requirements given by completion deadlines are satised. Using the Lagrangian approach we derive an optimal index policy for the Whittle relaxation of the problem in which the knapsack capacity is used only on average. Assuming a certain structure of the optimal policy for the single-inventory control, we prove indexability and derive an efficient, linear-time algorithm for computing the index values. To the best of our knowledge, our paper is the first to provide indexability analysis of a restless bandit with bi-dimensional state (lifetime and inventory level). We illustrate that these index values are numerically close to the true index values when such a structure is not present. We test two index-based heuristics for the original, non-relaxed problem: (1) a conventional index rule, which prescribes to order the products according to their current index values and promote as many products as fit in the knapsack, and (2) a recently proposed index-knapsack heuristic, which employs the index values as a proxy for the price of promotion and proposes to solve a deterministic knapsack problem to select the products. By a systematic computational study we show that the performance of both heuristics is nearly-optimal, and that the index-knapsack heuristic outperforms the conventional index rule

    A Dynamic Optimization Approach to the Scheduling Problem in Satellite and Wireless Broadcast Systems

    Get PDF
    The continuous growth in the demand for access to information andthe increasing number of users of the information delivery systemshave sparked the need for highly scalable systems with moreefficient usage of the bandwidth. One of the effective methods forefficient use of the bandwidth is to provide the information to agroup of users simultaneously via broadcast delivery. Generally,all applications that deliver the popular data packages (trafficinformation, weather, stocks, web pages) are suitable candidatesfor broadcast delivery and satellite or wireless networks withtheir inherent broadcast capability are the natural choices forimplementing such applications.In this report, we investigate one of the most important problemsin broadcast delivery i.e., the broadcast scheduling problem. Thisproblem arises in broadcast systems with a large number of datapackages and limited broadcast channels and the goal is to findthe best sequence of broadcasts in order to minimize the average waiting time of the users.We first formulate the problem as a dynamic optimization problemand investigate the properties of the optimal solution. Later, weuse the bandit problem formulation to address a version of theproblem where all packages have equal lengths. We find anasymptotically optimal index policy for that problem and comparethe results with some well-known heuristic methods
    corecore