
    Dynamic priority allocation via restless bandit marginal productivity indices

    This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation, as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing problems of this kind. Besides building on the rich literature on bandit problems, our approach draws on ideas from linear programming, economics, and multi-objective optimization. In particular, it was motivated by issues raised in the seminal work of Whittle (Restless bandits: activity allocation in a changing world. In: Gani J. (ed.) A Celebration of Applied Probability, J. Appl. Probab., vol. 25A, Applied Probability Trust, Sheffield, pp. 287-298, 1988), where he introduced the index for restless bandits that is the starting point of this work. This index, along with previously proposed indices and more recent extensions, is shown to be unified through the intuitive concept of the "marginal productivity index" (MPI), which measures the marginal productivity of work on a project at each of its states. In a multi-project setting, MPI policies are economically sound, as they dynamically allocate higher priority to those projects where work currently appears to be more productive. Besides being tractable and widely applicable, such index policies typically achieve near-optimal performance and substantially outperform benchmark policies derived from conventional approaches, as a growing body of computational evidence indicates. Comment: 7 figures

    Generalized restless bandits and the knapsack problem for perishable inventories

    In this paper we introduce the Knapsack Problem for Perishable Inventories, which concerns the optimal dynamic allocation of a collection of products to a limited knapsack. One motivation comes from retail revenue management, where products often have a limited lifetime during which they can be sold, and managers can regularly select some products to be allocated to a limited promotion space that is expected to attract more customers than the standard shelves. Another motivation comes from the scheduling of requests in modern multi-server data centers so that Quality-of-Service requirements given by completion deadlines are satisfied. Using the Lagrangian approach, we derive an optimal index policy for the Whittle relaxation of the problem, in which the knapsack capacity is used only on average. Assuming a certain structure of the optimal policy for the single-inventory control problem, we prove indexability and derive an efficient, linear-time algorithm for computing the index values. To the best of our knowledge, our paper is the first to provide an indexability analysis of a restless bandit with a bi-dimensional state (lifetime and inventory level). We illustrate that these index values are numerically close to the true index values when such a structure is not present. We test two index-based heuristics for the original, non-relaxed problem: (1) a conventional index rule, which orders the products according to their current index values and promotes as many products as fit in the knapsack, and (2) a recently proposed index-knapsack heuristic, which uses the index values as a proxy for the price of promotion and solves a deterministic knapsack problem to select the products. A systematic computational study shows that the performance of both heuristics is nearly optimal, and that the index-knapsack heuristic outperforms the conventional index rule.
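
The two heuristics described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the integer space requirements (`weights`), the choice to skip products that do not fit and keep scanning, and the exclusion of non-positive index values are all assumptions made for illustration. The index values themselves are taken as given inputs.

```python
from typing import List, Sequence


def index_rule(indices: Sequence[float], weights: Sequence[int],
               capacity: int) -> List[int]:
    """Conventional index rule (sketch): scan products in decreasing index
    order and promote each one that still fits in the remaining space.
    Assumption: products with a non-positive index are never promoted."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if indices[i] > 0 and used + weights[i] <= capacity:
            chosen.append(i)
            used += weights[i]
    return sorted(chosen)


def index_knapsack(indices: Sequence[float], weights: Sequence[int],
                   capacity: int) -> List[int]:
    """Index-knapsack heuristic (sketch): treat each index value as the
    profit of promoting that product and solve a deterministic 0/1 knapsack
    problem by dynamic programming over integer capacities."""
    n = len(indices)
    # best[i][c] = largest total index using the first i products and space c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], indices[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    # Backtrack to recover which products were selected.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)
```

With hypothetical index values `[5, 3, 3]`, space requirements `[3, 2, 2]`, and capacity `4`, the index rule promotes only product 0, whereas the knapsack selection promotes products 1 and 2 for a larger total index. This toy case shows how the two selections can differ; it says nothing about their relative performance in the stochastic problem.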

    Weakly Coupled Deep Q-Networks

    We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement learning algorithm that enhances performance in a class of structured problems called weakly coupled Markov decision processes (WCMDPs). WCMDPs consist of multiple independent subproblems connected by an action space constraint, a structural property that frequently emerges in practice. Despite this appealing structure, WCMDPs quickly become intractable as the number of subproblems grows. WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value. This guides the main DQN agent towards optimality. We show that the tabular version, weakly coupled Q-learning (WCQL), converges almost surely to the optimal action value. Numerical experiments show faster convergence compared to DQN and related techniques in settings with as many as 10 subproblems, 3^{10} total actions, and a continuous state space. Comment: To appear in proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

    Energy Efficient Scheduling for Loss Tolerant IoT Applications with Uninformed Transmitter

    In this work we investigate the energy-efficient packet scheduling problem for loss-tolerant applications. We consider a slow fading channel for a point-to-point connection with no channel state information at the transmitter side (CSIT). In the absence of CSIT, the slow fading channel has an outage probability associated with every transmit power. We formulate an optimization problem, as a function of the data loss tolerance parameters and peak power constraints, to minimize the average transmit energy for the user equipment (UE). The optimization problem is not convex, and we use a stochastic optimization technique to solve it. The numerical results quantify the effect of different system parameters on the average transmit power and show significant power savings for loss-tolerant applications. Comment: Published in ICC 201
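
To make the link between transmit power and outage probability concrete, here is a minimal Python sketch under an assumption the abstract does not state: Rayleigh fading, so that the channel power gain is exponentially distributed. The function names and the parameters `noise` and `mean_gain` are hypothetical. The sketch covers only the power-to-outage relation, not the paper's optimization problem or its solution method.

```python
import math


def rayleigh_outage_probability(power: float, rate: float,
                                noise: float = 1.0,
                                mean_gain: float = 1.0) -> float:
    """Probability that a slow Rayleigh-fading channel cannot support
    `rate` (bits/s/Hz) at the given transmit power, without CSIT.
    Outage occurs when log2(1 + gain * power / noise) < rate."""
    if power <= 0:
        return 1.0
    mean_snr = power * mean_gain / noise
    snr_threshold = 2.0 ** rate - 1.0
    return 1.0 - math.exp(-snr_threshold / mean_snr)


def min_power_for_outage(target_outage: float, rate: float,
                         noise: float = 1.0,
                         mean_gain: float = 1.0) -> float:
    """Smallest transmit power whose outage probability does not exceed
    `target_outage`, obtained by inverting the expression above.
    Requires 0 < target_outage < 1."""
    snr_threshold = 2.0 ** rate - 1.0
    return -snr_threshold * noise / (mean_gain * math.log(1.0 - target_outage))
```

Under this assumed model, relaxing the tolerated outage lowers the required power: `min_power_for_outage(0.2, 1.0)` is smaller than `min_power_for_outage(0.1, 1.0)`. This gives the intuition behind power savings for loss-tolerant traffic.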

    Marginal productivity index policies for dynamic priority allocation in restless bandit models

    This dissertation addresses three complex stochastic and dynamic resource allocation problems: (i) Admission Control and Routing with Delayed Information, (ii) Dynamic Product Promotion and the Knapsack Problem for Perishable Items, and (iii) Congestion Control in Routers with Future-Path Information. Since finding an optimal solution to these problems is computationally intractable at medium and large scale, we instead focus on designing tractable and well-performing heuristic priority rules. We model the above problems as multi-armed restless bandit problems in the framework of Markov decision processes with special structure. We employ and enrich existing results in the literature, which identified a unifying principle for designing dynamic priority index policies based on the Lagrangian relaxation and decomposition of such problems. This decomposition allows one to consider parametric-optimization subproblems and, in certain "indexable" cases, to solve them optimally via the marginal productivity (MP) index. The MP index is then used as a dynamic priority measure to define heuristic priority rules for the original intractable problems.
    For each of the problems considered we perform such a decomposition, identify indexability conditions, and obtain formulae for the MP indices or tractable algorithms for their computation. The MP indices admit the following priority interpretations in the three respective problems: (i) the undesirability of routing a job to a particular queue, (ii) the promotion necessity of a particular perishable product, and (iii) the usefulness of a particular flow transmission. Apart from the practical contribution of deriving heuristic priority rules for the three intractable problems considered, our main theoretical contributions are the following: (i) a linear-time algorithm for computing MP indices in the admission control problem with delayed information, thus matching the complexity of the best existing algorithm for the case without delays, (ii) a new type of priority index policy based on solving a (deterministic) knapsack problem, and (iii) a new extension of the existing multi-armed restless bandit model that incorporates random arrivals of restless bandits.