    In this paper we consider the problem of admission control of Bernoulli arrivals to a buffer with a geometric server, in which the controller’s actions take effect one period after the actual change in the queue length. An optimal policy in terms of marginal productivity indices (MPI) is derived for this problem under the following three performance objectives: (i) minimization of the expected total discounted sum of holding costs and rejection costs, (ii) minimization of the expected time-average sum of holding costs and rejection costs, and (iii) maximization of the expected time-average number of job completions. Combining existing theoretical and algorithmic results on restless bandit indexation with some new results yields a fast algorithm that computes the MPI for a queue with a buffer size of I using only O(I) arithmetic operations. Such MPI values can be used both to immediately obtain the optimal thresholds for the admission control problem, and to design an index policy for the routing problem (with possible admission control) in the multi-queue system. Thus, this paper further addresses the problem of designing and computing a tractable heuristic policy for dynamic job admission control and/or routing in a discrete-time Markovian model of parallel loss queues with one-period delayed state observation and/or action implementation, which comes close to optimizing an infinite-horizon problem under the above three objectives. Our approach also seems to be tractable for the analogous problems with larger delays and, more generally, for arbitrary restless bandits with delays.
    Keywords: Admission control, Routing, Parallel queues, Delayed information, Delayed action implementation, Index policy, Restless bandits, Marginal productivity index

    This dissertation addresses three complex stochastic and dynamic resource allocation problems: (i) Admission Control and Routing with Delayed Information, (ii) Dynamic Product Promotion and the Knapsack Problem for Perishable Items, and (iii) Congestion Control in Routers with Future-Path Information. Since finding an optimal solution to these problems is computationally intractable at medium and large scale, we instead focus on designing tractable and well-performing heuristic priority rules. We model the above problems as multi-armed restless bandit problems in the framework of Markov decision processes with special structure. We employ and enrich existing results in the literature, which identified a unifying principle for designing dynamic priority index policies based on the Lagrangian relaxation and decomposition of such problems. This decomposition allows one to consider parametric-optimization subproblems and, in certain “indexable” cases, to solve them optimally via the marginal productivity (MP) index. The MP index is then used as a dynamic priority measure to define heuristic priority rules for the original intractable problems.
For each of the problems considered we perform such a decomposition, identify indexability conditions, and obtain formulae for the MP indices or tractable algorithms for their computation. The MP indices admit the following priority interpretations in the three respective problems: (i) the undesirability of routing a job to a particular queue, (ii) the need to promote a particular perishable product, and (iii) the usefulness of a particular flow transmission. Apart from the practical contribution of deriving heuristic priority rules for the three intractable problems considered, our main theoretical contributions are the following: (i) a linear-time algorithm for computing MP indices in the admission control problem with delayed information, thus matching the complexity of the best existing algorithm for the case without delays, (ii) a new type of priority index policy based on solving a (deterministic) knapsack problem, and (iii) a new extension of the existing multi-armed restless bandit model that incorporates random arrivals of restless bandits.

    In this work we propose a roadmap towards the analytical understanding of Device-to-Device (D2D) communications in LTE-A networks. Various D2D solutions have been proposed, which include inband and outband D2D transmission modes, each of which exhibits different pros and cons in terms of complexity, interference, and spectral efficiency achieved. We go beyond traditional mode optimization and mode-selection schemes. Specifically, we formulate a general problem for the joint per-user mode selection, connection activation and resource scheduling of connections.
    Comment: A shorter version of this manuscript is accepted for publication in MAMA workshop collocated with Sigmetrics'1

    This paper introduces the knapsack problem for perishable items (KPPI), which concerns the optimal dynamic allocation of a limited promotion space to a collection of perishable items. Such a problem is motivated by applications in a variety of industries, where products have an associated lifetime after which they cannot be sold. The paper builds on recent developments in restless bandit indexation and gives an optimal marginal productivity index policy for the dynamic (single) product promotion problem with closed-form indices that yield structural insights. The performance of the proposed policy for KPPI is investigated in a computational study.
    Keywords: Dynamic promotion, Perishable items, Index policies, Knapsack problem, Restless bandits, Finite horizon, Marginal productivity index

    In this paper we present a generic Markov decision process model of optimal single resource allocation to a collection of stochastic dynamic competitors. The main goal is to identify sufficient conditions under which this problem is optimally solved by an index rule. The main focus is on the frozen-if-not-allocated assumption, which notably holds in problems including the multi-armed bandit problem, the tax problem, Klimov networks, job sequencing, and object search and detection. The problem is approached by a Lagrangian relaxation and decomposed into a collection of normalized parametric single-competitor subproblems, which are then optimally solved by the well-known Gittins index. We show that the problem is equivalent to solving a time sequence of its Lagrangian relaxations. We further show that our approach gives insights into sufficient conditions for optimality of index rules in restless problems (in which the frozen-if-not-allocated assumption is dropped) with a single resource; this paper is the first to prove such conditions.

    In this paper we introduce the Knapsack Problem for Perishable Inventories, which concerns the optimal dynamic allocation of a collection of products to a limited knapsack. The motivation for designing such a problem comes from retail revenue management, where different products often can be sold only during an associated lifetime, and the managers can regularly select some products to be allocated to a limited promotion space which is expected to attract more customers than the standard shelves. Another motivation comes from the scheduling of requests in modern multi-server data centers so that Quality-of-Service requirements given by completion deadlines are satisfied. Using the Lagrangian approach we derive an optimal index policy for the Whittle relaxation of the problem, in which the knapsack capacity is used only on average. Assuming a certain structure of the optimal policy for the single-inventory control, we prove indexability and derive an efficient, linear-time algorithm for computing the index values. To the best of our knowledge, our paper is the first to provide an indexability analysis of a restless bandit with a bi-dimensional state (lifetime and inventory level). We illustrate that these index values are numerically close to the true index values when such a structure is not present. We test two index-based heuristics for the original, non-relaxed problem: (1) a conventional index rule, which orders the products according to their current index values and promotes as many products as fit in the knapsack, and (2) a recently proposed index-knapsack heuristic, which employs the index values as a proxy for the price of promotion and solves a deterministic knapsack problem to select the products. By a systematic computational study we show that the performance of both heuristics is nearly optimal, and that the index-knapsack heuristic outperforms the conventional index rule.
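
The difference between the two heuristics can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the integer product sizes, the choice to skip (rather than stop at) a product that does not fit, and the use of index times size as the item value in the knapsack step.

```python
from typing import List, Sequence


def index_rule(indices: Sequence[float], sizes: Sequence[int], capacity: int) -> List[int]:
    """Greedy rule: scan products by decreasing index and promote each one that still fits.

    Assumption: a product that does not fit is skipped and the scan continues.
    """
    chosen, free = [], capacity
    for i in sorted(range(len(indices)), key=lambda k: indices[k], reverse=True):
        if sizes[i] <= free:
            chosen.append(i)
            free -= sizes[i]
    return sorted(chosen)


def index_knapsack(values: Sequence[float], sizes: Sequence[int], capacity: int) -> List[int]:
    """Select products by solving a deterministic 0/1 knapsack with dynamic programming."""
    n = len(values)
    # best[i][c] = maximum total value using the first i products and capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], sizes[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    # Walk back through the table to recover the selected products.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= sizes[i - 1]
    return sorted(chosen)


# Hypothetical instance: three products, promotion space of 4 units.
indices = [3.0, 2.5, 2.5]   # illustrative index values, not computed by any algorithm here
sizes = [3, 2, 2]
values = [nu * w for nu, w in zip(indices, sizes)]  # assumption: index acts as a per-unit price

print(index_rule(indices, sizes, 4))      # [0]
print(index_knapsack(values, sizes, 4))   # [1, 2]
```

On this toy instance the greedy rule fills the space with the single highest-index product and leaves one unit unused, while the knapsack step prefers the two smaller products that use the whole space.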

    We address the problem of developing a well-performing and implementable scheduler of users with wireless connections to the central controller, a problem which arises in areas such as mobile data networks, heterogeneous networks, or vehicular communications systems. The main feature of such systems is that the connection quality of each user is time-varying, resulting in a time-varying transmission rate corresponding to the available channel states. We assume that this evolution is Markovian, relaxing the common but unrealistic assumption of stationary channels. We first focus on the three-state channel and study the optimal policy, showing that threshold policies (which give higher priority to users with a higher transmission rate) are not necessarily optimal. For the general channel we design a scheduler which generalizes the recently proposed Potential Improvement (PI) scheduler, and propose two practical approximations of it, whose performance is analyzed and compared to existing alternative schedulers in a variety of simulation scenarios. We suggest, and give evidence, that the variant of PI which relies only on the steady-state distribution of the channel performs extremely well, and therefore should be used for practical implementation.

    We consider the Content Centric Network (CCN) interest forwarding problem as a Multi-Armed Bandit (MAB) problem with delays. We investigate the transient behaviour of the epsilon-greedy, tuned epsilon-greedy and Upper Confidence Bound (UCB) interest forwarding policies. Surprisingly, for all three policies only a very short initial exploratory phase is needed. We demonstrate that the tuned epsilon-greedy algorithm is nearly as good as the UCB algorithm, commonly reported as the best currently available algorithm. We prove the uniform logarithmic bound for the tuned epsilon-greedy algorithm in the presence of delays. In addition to its immediate application to CCN interest forwarding, the new theoretical results for the MAB problem with delays represent significant theoretical advances in the machine learning discipline.
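
As a rough illustration of a bandit policy acting on delayed feedback, the following sketch simulates an epsilon-greedy rule with a decaying exploration rate. It is not the paper's algorithm: the schedule min(1, cK/(d²t)) is the standard decaying epsilon-greedy schedule from the bandit literature and is only assumed here to stand in for the "tuned" variant, and the function name, the fixed delay, the Bernoulli rewards and the constants `c` and `d` are all hypothetical.

```python
import random
from collections import deque


def decaying_epsilon_greedy(means, horizon, delay, c=5.0, d=0.5, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms whose rewards arrive `delay` rounds late.

    Returns the number of times each arm (e.g. each forwarding interface) was chosen.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k        # how often each arm was chosen
    seen = [0] * k         # how many rewards have been observed per arm
    total = [0.0] * k      # sum of observed rewards per arm
    in_flight = deque()    # (round at which the feedback becomes visible, arm, reward)

    for t in range(1, horizon + 1):
        # Absorb only the feedback whose delay has elapsed.
        while in_flight and in_flight[0][0] <= t:
            _, arm, reward = in_flight.popleft()
            seen[arm] += 1
            total[arm] += reward

        eps = min(1.0, c * k / (d * d * t))  # assumed exploration schedule
        if rng.random() < eps:
            arm = rng.randrange(k)           # explore uniformly at random
        else:
            # Exploit the best empirical mean; unobserved arms count as 0.
            est = [total[a] / seen[a] if seen[a] else 0.0 for a in range(k)]
            arm = max(range(k), key=lambda a: est[a])

        pulls[arm] += 1
        reward = 1.0 if rng.random() < means[arm] else 0.0
        in_flight.append((t + delay, arm, reward))

    return pulls


pulls = decaying_epsilon_greedy([0.1, 0.9], horizon=2000, delay=5)
print(pulls)
```

The only change relative to the delay-free rule is the `in_flight` queue: the policy decides using the rewards that have already arrived, while newer ones are still in transit.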

    Motivated by the frequency assignment problem, we study the d-distant coloring of the vertices of an infinite plane hexagonal lattice H. Let d be a positive integer. A d-distant coloring of the lattice H is a coloring of the vertices of H such that any two vertices at distance at most d apart have different colors. The d-distant chromatic number of H, denoted χ_d(H), is the minimum number of colors needed for a d-distant coloring of H. We give the exact value of χ_d(H) for every odd d and estimates for every even d.
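
The definition of a d-distant coloring can be checked mechanically on any finite graph. The sketch below is only an illustration of the definition: the function name and the adjacency-dictionary representation are assumptions, and the example graph is a single hexagon (a 6-cycle), not the infinite lattice H.

```python
from collections import deque


def is_d_distant_coloring(adj, color, d):
    """Return True if every two distinct vertices at graph distance <= d have different colors."""
    for source in adj:
        # Breadth-first search from `source`, truncated at depth d.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == d:
                continue  # do not expand beyond distance d
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # `dist` now holds exactly the vertices within distance d of `source`.
        if any(v != source and color[v] == color[source] for v in dist):
            return False
    return True


# One hexagon: vertex i is adjacent to i - 1 and i + 1 modulo 6.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

print(is_d_distant_coloring(hexagon, [0, 1, 0, 1, 0, 1], 1))  # True
print(is_d_distant_coloring(hexagon, [0, 1, 0, 1, 0, 1], 2))  # False
print(is_d_distant_coloring(hexagon, [0, 1, 2, 0, 1, 2], 2))  # True
```

On the single hexagon, two colors suffice for d = 1, while d = 2 needs three, because vertices two steps apart may no longer share a color.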