60 research outputs found

    Asymptotically optimal index policies for an abandonment queue with convex holding cost.

    We investigate a resource allocation problem in a multi-class server with convex holding costs and user impatience under the average cost criterion. In general, the optimal policy has a complex dependency on all the input parameters and state information. Our main contribution is to derive index policies that can serve as heuristics and are shown to give good performance. Our index policy attributes to each class an index, which depends on the number of customers currently present in that class. The index values are obtained by solving a relaxed version of the optimal stochastic control problem and combining results from restless multi-armed bandits and queueing theory. They can be expressed as a function of the steady-state probabilities of a one-dimensional birth-and-death process. For linear holding cost, the index can be calculated in closed form and turns out to be independent of the arrival rates and the number of customers present. In the case of no abandonments and linear holding cost, our index coincides with the $c\mu$-rule, which is known to be optimal in this simple setting. For general convex holding cost we derive properties of the index value in limiting regimes: we consider the behavior of the index (i) as the number of customers in a class grows large, which allows us to derive the asymptotic structure of the index policies, (ii) as the abandonment rate vanishes, which allows us to retrieve an index policy proposed for the multi-class M/M/1 queue with convex holding cost and no abandonments, and (iii) as the arrival rate goes to either 0 or $\infty$, representing light-traffic and heavy-traffic regimes, respectively. We show that Whittle's index policy is asymptotically optimal in both light-traffic and heavy-traffic regimes. To obtain further insights into the index policy, we consider the fluid version of the relaxed problem and derive a closed-form expression for the fluid index. The latter is shown to coincide with the index values for the stochastic model in asymptotic regimes. For arbitrary convex holding cost the fluid index can be seen as the $Gc\mu/\theta$-rule, that is, the generalized $c\mu$-rule ($Gc\mu$-rule) extended to include abandonments. Numerical experiments show that the Whittle index policy and the fluid index policy perform very well for a broad range of parameters.
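
The abstract above expresses the index through the steady-state probabilities of a one-dimensional birth-and-death process. As a minimal sketch of that building block only (not the authors' index formula), the following computes those probabilities from the standard product-form solution. The function names, the truncation level `max_state`, and the example rates are illustrative assumptions, not taken from the paper.

```python
def stationary_distribution(birth_rate, death_rate, max_state):
    """Steady-state probabilities of a birth-and-death process on {0, ..., max_state}.

    birth_rate: constant rate of moving from n to n + 1 (assumed state-independent here).
    death_rate: callable giving the rate of moving from n to n - 1, for n >= 1.
    max_state:  truncation level (an assumption of this sketch, to keep the sum finite).
    """
    weights = [1.0]  # unnormalised weight of state 0
    for n in range(1, max_state + 1):
        rate_down = death_rate(n)
        if rate_down <= 0.0:
            raise ValueError(f"death rate must be positive in state {n}")
        # Detailed balance: pi[n - 1] * birth_rate = pi[n] * death_rate(n)
        weights.append(weights[-1] * birth_rate / rate_down)
    total = sum(weights)
    return [w / total for w in weights]


def mean_queue_length(distribution):
    """Expected number of customers under the given distribution."""
    return sum(n * p for n, p in enumerate(distribution))


# Hypothetical class: arrivals at rate 2, one customer served at rate 1.5,
# and every customer present abandons at rate 0.5.
example = stationary_distribution(2.0, lambda n: 1.5 + 0.5 * n, 200)
example_mean = mean_queue_length(example)
```

The death rate is passed as a callable so that the same routine covers both the served and the unserved dynamics of a class; which abandonment convention the paper uses (for instance, whether a customer in service can abandon) is not stated in the abstract.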

    Resource allocation with observable and unobservable environments

    This thesis studies resource allocation problems in large-scale stochastic networks. We work on problems where the availability of resources is subject to time fluctuations, a situation that one may encounter, for example, in load balancing systems or in wireless downlink scheduling systems. The time fluctuations are modelled with two types of processes: controllable processes, whose evolution depends on the action of the decision maker, and environment processes, whose evolution is exogenous. The stochastic evolution of the controllable process depends on the current state of the environment. Depending on whether the decision maker observes the state of the environment, we say that the environment is observable or unobservable. The problems are formulated as Markov decision processes (MDPs). The thesis follows three main research axes. In the first problem we study the optimal control of a multi-armed restless bandit problem (MARBP) with an unobservable environment. The objective is to characterise the optimal policy for the controllable process in spite of the fact that the environment cannot be observed. We consider the large-scale asymptotic regime in which the number of bandits and the speed of the environment both tend to infinity. In our main result we establish that a set of priority policies is asymptotically optimal. We show that, in particular, this set includes the Whittle index policy of a system whose parameters are averaged over the stationary behaviour of the environment. In the second problem, we consider an MARBP with an observable environment. The objective is to leverage information on the environment to derive an optimal policy for the controllable process. Assuming that the technical condition of indexability holds, we develop an algorithm to compute Whittle's index numerically. We then apply this result to the particular case of a queue with abandonments. We prove indexability, and we provide closed-form expressions of Whittle's index. In the third problem we consider a model of a large-scale storage system, where files are distributed across a set of nodes. Each node breaks down following a law that depends on the load it handles. Whenever a node breaks down, all the files it had are reallocated to other nodes according to a fixed allocation strategy, and the node restarts empty. We study the evolution of the load of a single node in the mean-field regime, as the numbers of nodes and files grow large. We prove the existence of the process in the mean-field regime, and the existence and uniqueness of its stationary probability measure. We further show the convergence in distribution of the load in steady state as the average number of files per node tends to infinity.

    Asymptotically optimal priority policies for indexable and non-indexable restless bandits

    We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property and a technical condition. We consider both a fixed population of bandits as well as a dynamic population where bandits can depart and arrive. As an example of a dynamic population of bandits, we analyze a multi-class M/M/S+M queue for which we show asymptotic optimality of an index policy. We combine fluid-scaling techniques with linear programming results to prove that when bandits are indexable, Whittle's index policy is included in our class of priority policies. We thereby generalize a result of Weber and Weiss (1990) about asymptotic optimality of Whittle's index policy to settings with (i) several classes of bandits, (ii) arrivals of new bandits, and (iii) multiple actions. Indexability of the bandits is not required for our results to hold. For non-indexable bandits we describe how to select priority policies from the class of asymptotically optimal policies and present numerical evidence that, outside the asymptotic regime, the performance of our proposed priority policies is nearly optimal.
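
To make the notion of a priority policy concrete, here is a minimal sketch of one decision step: given how many bandits currently sit in each state and a fixed priority ordering of the states, it activates bandits in priority order until the activation budget is exhausted. The function name, the state labels, and the budget `capacity` are hypothetical; how the paper constructs the ordering itself is not covered by this sketch.

```python
def priority_allocation(state_counts, priority_order, capacity):
    """Choose how many bandits to activate in each state under a priority policy.

    state_counts:   dict mapping a state label to the number of bandits in that state.
    priority_order: list of state labels, highest priority first.
    capacity:       maximum number of bandits that may be active at once.

    Returns a dict mapping state labels to the number of activated bandits.
    """
    remaining = capacity
    active = {}
    for state in priority_order:
        if remaining == 0:
            break
        # Activate as many bandits in this state as the remaining budget allows.
        take = min(state_counts.get(state, 0), remaining)
        if take > 0:
            active[state] = take
            remaining -= take
    return active


# Hypothetical example: 10 bandits spread over three states, budget of 4.
allocation = priority_allocation({"a": 3, "b": 2, "c": 5}, ["b", "c", "a"], 4)
```

In this example the two bandits in state `b` are activated first and the remaining budget of two goes to state `c`, so state `a` receives nothing.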

    Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement

    We study a single server queuing model with multiple classes and impatient customers. The goal is to determine a service policy that maximizes the long-run reward rate earned from serving customers, net of holding costs incurred while customers wait and penalties incurred when customers leave before receiving service. We first show that it is without loss of generality to study a pure-reward model. Since standard methods can usually only compute the optimal policy for problems with up to three customer classes, our focus is to develop a suite of heuristic approaches, with a preference for operationally simple policies with good reward characteristics. One such heuristic is the Rμθ rule, a priority policy that ranks all customer classes based on the product of reward R, service rate μ, and abandonment rate θ. We show that the Rμθ rule is asymptotically optimal as customer abandonment rates approach zero and often performs well in cases where the simpler Rμ rule performs poorly. The paper also develops an approximate policy improvement method that uses simulation and interpolation to estimate the bias function for use in a dynamic programming recursion. For systems with two or three customer classes, our numerical study indicates that the best of our simple priority policies is near optimal in most cases; when it is not, the approximate policy improvement method invariably narrows the gap substantially. For systems with five customer classes, our heuristics typically achieve within 4% of an upper bound for the optimal value, which is computed via a linear program that relies on a relaxation of the original system. The computational requirement of the approximate policy improvement method grows rapidly when the number of customer classes or the traffic intensity increases.
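
As a small illustration of the ranking described above, the sketch below orders customer classes by the product R·μ·θ and, for comparison, by the simpler product R·μ. The class names and parameter values are invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerClass:
    name: str
    reward: float            # R: reward earned per served customer
    service_rate: float      # mu
    abandonment_rate: float  # theta


def r_mu_theta_index(c):
    """Index of the R-mu-theta rule: product of reward, service rate and abandonment rate."""
    return c.reward * c.service_rate * c.abandonment_rate


def r_mu_index(c):
    """Index of the simpler R-mu rule, which ignores abandonment."""
    return c.reward * c.service_rate


def priority_order(classes, index):
    """Return class names sorted from highest to lowest index value."""
    return [c.name for c in sorted(classes, key=index, reverse=True)]


# Hypothetical parameter values, chosen so that the two rules disagree.
classes = [
    CustomerClass("A", reward=10.0, service_rate=1.0, abandonment_rate=0.1),
    CustomerClass("B", reward=4.0, service_rate=2.0, abandonment_rate=0.5),
    CustomerClass("C", reward=3.0, service_rate=1.0, abandonment_rate=1.0),
]
order_r_mu = priority_order(classes, r_mu_index)
order_r_mu_theta = priority_order(classes, r_mu_theta_index)
```

With these values the Rμ rule gives the order A, B, C, while the Rμθ rule gives B, C, A: class A is patient enough that serving it can be deferred.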

    Dynamic control of stochastic and fluid resource-sharing systems

    In this thesis we study the dynamic control of resource-sharing systems that arise in various domains, e.g. inventory management, healthcare and communication networks. We aim at efficiently allocating the available resources among competing projects according to a certain performance criterion. These types of problems have a stochastic nature and may be very complex to solve. We therefore focus on developing well-performing heuristics. In Part I, we consider the framework of Restless Bandit Problems, which is a general class of dynamic stochastic optimization problems. Relaxing the sample-path constraint in the optimization problem makes it possible to define an index-based heuristic for the original constrained model, the so-called Whittle index policy. We derive a closed-form expression for the Whittle index as a function of the steady-state probabilities for the case in which bandits (projects) evolve in a birth-and-death fashion. This expression requires several technical conditions to be verified, and in addition, it can only be computed explicitly in specific cases. In the particular case of a multi-class abandonment queue, we further prove that the Whittle index policy is asymptotically optimal in the light-traffic and heavy-traffic regimes. In Part II, we derive heuristics by approximating the stochastic resource-sharing systems with deterministic fluid models. We first formulate a fluid version of the relaxed optimization problem introduced in Part I, and we develop a fluid index policy. The fluid index can always be computed explicitly and hence overcomes the technical issues that arise when calculating the Whittle index. We apply the Whittle index and the fluid index policies to several systems, e.g. power-aware server farms, opportunistic scheduling in wireless systems, and make-to-stock problems with perishable items. We show numerically that both index policies are nearly optimal. Secondly, we study the optimal scheduling control for the fluid version of a multi-class abandonment queue. We derive the fluid optimal control when there are two classes of customers competing for a single resource. Based on the insights provided by this result we build a heuristic for the general multi-class setting. This heuristic shows near-optimal performance when applied to the original stochastic model for high workloads. In Part III, we further investigate the abandonment phenomenon in the context of a content delivery problem. We characterize an optimal grouping policy so that requests, which are impatient, are efficiently transmitted in multicast mode.
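
A fluid model of a multi-class abandonment queue can be sketched as follows, assuming the commonly used dynamics dx_k/dt = λ_k − μ_k u_k − θ_k x_k with a single server whose effort shares u_k sum to at most one. The strict priority rule used here is only a stand-in for illustration; it is not the optimal fluid control or the heuristic derived in the thesis, and all names and rates are hypothetical.

```python
def simulate_fluid(arrival, service, abandon, priority, x0, horizon, dt=0.01):
    """Euler simulation of a fluid multi-class abandonment queue under strict priority.

    arrival, service, abandon: per-class rates (lambda_k, mu_k, theta_k).
    priority: list of class indices, highest priority first.
    x0:       initial fluid levels.
    Returns the fluid levels at time `horizon`.
    """
    x = list(x0)
    steps = int(round(horizon / dt))
    for _ in range(steps):
        capacity = 1.0
        effort = [0.0] * len(x)
        for k in priority:
            if x[k] > 0.0:
                # A backlogged class takes all the capacity that is left.
                effort[k] = capacity
            else:
                # An empty class only needs enough effort to absorb its arrivals.
                effort[k] = min(capacity, arrival[k] / service[k])
            capacity -= effort[k]
        for k in range(len(x)):
            drift = arrival[k] - service[k] * effort[k] - abandon[k] * x[k]
            x[k] = max(0.0, x[k] + dt * drift)  # fluid levels cannot go negative
    return x


# Hypothetical two-class example: class 0 has priority and starts with a backlog.
final_levels = simulate_fluid(
    arrival=[0.5, 1.0], service=[1.0, 1.0], abandon=[0.5, 0.5],
    priority=[0, 1], x0=[2.0, 0.0], horizon=40.0,
)
```

In this example class 0 drains and then occupies half of the server, so class 1 settles where its arrivals balance the remaining service and abandonment, that is at (1.0 − 0.5) / 0.5 = 1.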

    Optimal Control of Parallel Queues for Managing Volunteer Convergence

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/163497/2/poms13224.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/163497/1/poms13224_am.pd

    Control of multiclass queueing systems with abandonments and adversarial customers

    This thesis considers the defensive surveillance of multiple public areas which are the open, exposed targets of adversarial attacks. We address the operational problem of identifying a real-time decision-making rule for a security team in order to minimise the damage an adversary can inflict within the public areas. We model the surveillance scenario as a multiclass queueing system with customer abandonments, wherein the operational problem translates into developing service policies for a server in order to minimise the expected damage an adversarial customer can inflict on the system. We consider three different surveillance scenarios which may occur in real-world security operations. In each scenario it is only possible to calculate optimal policies in small systems or in special cases, hence we focus on developing heuristic policies which can be computed and demonstrate their effectiveness in numerical experiments. In the random adversary scenario, the adversary attacks the system according to a probability distribution known to the server. This problem is a special case of a more general stochastic scheduling problem. We develop new results which complement the existing literature based on priority policies and an effective approximate policy improvement algorithm. We also consider the scenario of a strategic adversary who chooses where to attack. We model the interaction of the server and adversary as a two-person zero-sum game. We develop an effective heuristic based on an iterative algorithm which populates a small set of service policies to be randomised over. Finally, we consider the scenario of a strategic adversary who chooses both where and when to attack and formulate it as a robust optimisation problem. In this case, we demonstrate the optimality of the last-come first-served policy in single queue systems. In systems with multiple queues, we develop effective heuristic policies based on the last-come first-served policy which incorporate randomisation both within service policies and across service policies.

    Asymptotically optimal priority policies for indexable and nonindexable restless bandits

    We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property and a technical condition. We consider both a fixed population of bandits as well as a dynamic population where bandits can depart and arrive. As an example of a dynamic population of bandits, we analyze a multi-class M/M/S+M queue for which we show asymptotic optimality of an index policy. We combine fluid-scaling techniques with linear programming results to prove that when bandits are indexable, Whittle's index policy is included in our class of priority policies. We thereby generalize a result of Weber and Weiss (1990) about asymptotic optimality of Whittle's index policy to settings with (i) several classes of bandits, (ii) arrivals of new bandits, and (iii) multiple actions. Indexability of the bandits is not required for our results to hold. For non-indexable bandits we describe how to select priority policies from the class of asymptotically optimal policies and present numerical evidence that, outside the asymptotic regime, the performance of our proposed priority policies is nearly optimal.