104 research outputs found

    Distributed Optimisation in Wireless Sensor Networks: A Hierarchical Learning Approach

    Ph.D. (Doctor of Philosophy)

    Characterization and computation of restless bandit marginal productivity indices

    The Whittle index [P. Whittle (1988). Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25A, 287-298] yields a practical scheduling rule for the versatile yet intractable multi-armed restless bandit problem, involving the optimal dynamic priority allocation to multiple stochastic projects, modeled as restless bandits, i.e., binary-action (active/passive) (semi-)Markov decision processes. A growing body of evidence shows that such a rule is nearly optimal in a wide variety of applications, which raises the need to efficiently compute the Whittle index and more general marginal productivity index (MPI) extensions in large-scale models. For such a purpose, this paper extends to restless bandits the parametric linear programming (LP) approach deployed in [J. Niño-Mora. A $(2/3)n^{3}$ fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain, INFORMS J. Comp., in press], which yielded a fast Gittins-index algorithm. Yet the extension is not straightforward, as the MPI is only defined for the limited range of so-called indexable bandits, which motivates the quest for methods to establish indexability. This paper furnishes algorithmic and analytical tools to realize the potential of MPI policies in large-scale applications, presenting the following contributions: (i) a complete algorithmic characterization of indexability, for which two block implementations are given; and (ii) more importantly, new analytical conditions for indexability, termed LP-indexability, that leverage knowledge on the structure of optimal policies in particular models, under which the MPI is computed faster by the adaptive-greedy algorithm previously introduced by the author under the more stringent PCL-indexability conditions, for which a new fast-pivoting block implementation is given. 
The paper further reports on a computational study, measuring the runtime performance of the algorithms, and assessing by a simulation study the high prevalence of indexability and PCL-indexability.

    Control of multiclass queueing systems with abandonments and adversarial customers

    This thesis considers the defensive surveillance of multiple public areas which are the open, exposed targets of adversarial attacks. We address the operational problem of identifying a real-time decision-making rule for a security team in order to minimise the damage an adversary can inflict within the public areas. We model the surveillance scenario as a multiclass queueing system with customer abandonments, wherein the operational problem translates into developing service policies for a server in order to minimise the expected damage an adversarial customer can inflict on the system. We consider three different surveillance scenarios which may occur in real-world security operations. In each scenario it is only possible to calculate optimal policies in small systems or in special cases, hence we focus on developing heuristic policies which can be computed and demonstrate their effectiveness in numerical experiments. In the random adversary scenario, the adversary attacks the system according to a probability distribution known to the server. This problem is a special case of a more general stochastic scheduling problem. We develop new results which complement the existing literature based on priority policies and an effective approximate policy improvement algorithm. We also consider the scenario of a strategic adversary who chooses where to attack. We model the interaction of the server and adversary as a two-person zero-sum game. We develop an effective heuristic based on an iterative algorithm which populates a small set of service policies to be randomised over. Finally, we consider the scenario of a strategic adversary who chooses both where and when to attack and formulate it as a robust optimisation problem. In this case, we demonstrate the optimality of the last-come first-served policy in single queue systems. 
In systems with multiple queues, we develop effective heuristic policies based on the last-come first-served policy which incorporate randomisation both within service policies and across service policies.

    Statistical Applications to the Management of Intensive Care and Step-down Units

    This thesis proposes three contributing manuscripts related to patient flow management, server decision-making, and ventilation time in the intensive care and step-down units system. First, a Markov decision process (MDP) model with a Monte Carlo simulation was used to compare two patient flow policies: prioritizing premature step-down and prioritizing rejection of patients when the intensive care unit is congested. The optimal decisions were obtained under the two strategies. The simulation results based on these optimal decisions show that a premature step-down strategy contributes to higher congestion downstream. Counter-intuitively, premature step-down should be discouraged, and patient rejection or divergence actions should be further explored as a viable alternative for congested intensive care units (ICUs). Secondly, an investigation of the length of stay (LOS) competition between the intensive care unit (ICU) and the step-down unit (SDU), two servers in tandem without a buffer between them, was proposed using queuing games. Analysis of the competition was done under four different scenarios: (i) both servers cooperate; (ii) the servers do not cooperate and make decisions simultaneously; (iii) the servers do not cooperate but the first server, the ICU, is the leader; (iv) the servers do not cooperate and the second server, the SDU, is the leader. Finally, a numerical analysis was performed. The results show that the length of stay decisions of each server depend critically on the payoff function's form and the exogenous demand. Secondly, with a linear payoff function, the SDU is only beneficial to the system if the unit cost is greater than its unit reward at the ICU. Perhaps most importantly, the critical care pathway performs better under coordination and/or leadership at the ICU level. Finally, first-day ventilated patients' ventilation time was analyzed using survival analysis. 
The probabilistic behaviour of the ventilation time duration was analyzed and the predictors of the ventilation time duration were determined based on available first-day covariates. Data were obtained from the Critical Care Information System (CCIS) about patients admitted to the ICUs in Ontario between July 2015 and December 2016. The log-logistic accelerated failure time (AFT) model was found to best describe the association between first-day covariates and the ventilation time.

    Routing in multi-class queueing networks

    PhD Thesis. We consider the problem of routing (incorporating local scheduling) in a distributed network. Dedicated jobs arrive directly at their specified station for processing. The choice of station for generic jobs is open. Each job class has an associated holding cost rate. We aim to develop routing policies to minimise the long-run average holding cost rate. We first consider the class of static policies. Dacre, Glazebrook and Niño-Mora (1999) developed an approach to the formulation of static routing policies, in which the work at each station is scheduled optimally, using the achievable region approach. The achievable region approach attempts to solve stochastic optimisation problems by characterising the space of all possible performances and optimising the performance objective over this space. Optimal local scheduling takes the form of a priority policy. Such static routing policies distribute the generic traffic to the stations via a simple Bernoulli routing mechanism. We provide an overview of the achievements made in following this approach to static routing. In the course of this discussion we expand upon the study of Becker et al. (2000) in which they considered routing to a collection of stations specialised in processing certain job classes and we consider how the composition of the available stations affects the system performance for this particular problem. We conclude our examination of static routing policies with an investigation into a network design problem in which the number of stations available for processing remains to be determined. The second class of policies of interest is the class of dynamic policies. General DP theory asserts the existence of a deterministic, stationary and Markov optimal dynamic policy. However, a full DP solution may be unobtainable and theoretical difficulties posed by simple routing problems suggest that a closed form optimal policy may not be available. 
This motivates a requirement for good heuristic policies. We consider two approaches to the development of dynamic routing heuristics. We develop an idea proposed, in the context of simple single class systems, by Krishnan (1987) by applying a single policy improvement step to some given static policy. The resulting dynamic policy is shown to be of simple structure and easily computable. We include an investigation into the comparative performance of the dynamic policy with a number of competitor policies and of the performance of the heuristic as the number of stations in the network changes. In our second approach the generic traffic may only access processing when the station has been cleared of all (higher priority) jobs and can be considered as background work. We deploy a prescription of Whittle (1988) developed for RBPs to develop a suitable approach to station indexation. Taking an approximative approach to Whittle's proposal results in a very simple form of index policy for routing the generic traffic. We investigate the closeness to optimality of the index policy and compare the performance of both of the dynamic routing policies developed here

    Asymptotically optimal priority policies for indexable and non-indexable restless bandits

    We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property and a technical condition. We consider both a fixed population of bandits as well as a dynamic population where bandits can depart and arrive. As an example of a dynamic population of bandits, we analyze a multi-class M/M/S+M queue for which we show asymptotic optimality of an index policy. We combine fluid-scaling techniques with linear programming results to prove that when bandits are indexable, Whittle's index policy is included in our class of priority policies. We thereby generalize a result of Weber and Weiss (1990) about asymptotic optimality of Whittle's index policy to settings with (i) several classes of bandits, (ii) arrivals of new bandits, and (iii) multiple actions. Indexability of the bandits is not required for our results to hold. For non-indexable bandits we describe how to select priority policies from the class of asymptotically optimal policies and present numerical evidence that, outside the asymptotic regime, the performance of our proposed priority policies is nearly optimal.

    Resource allocation with observable and unobservable environments

    This thesis studies resource allocation problems in large-scale stochastic networks. We work on problems where the availability of resources is subject to time fluctuations, a situation that one may encounter, for example, in load balancing systems or in wireless downlink scheduling systems. The time fluctuations are modelled considering two types of processes: controllable processes, whose evolution depends on the action of the decision maker, and environment processes, whose evolution is exogenous. The stochastic evolution of the controllable process depends on the current state of the environment. Depending on whether the decision maker observes the state of the environment, we say that the environment is observable or unobservable. The mathematical formulation used is that of Markov decision processes (MDPs). The thesis follows three main research axes. In the first problem we study the optimal control of a multi-armed restless bandit problem (MARBP) with an unobservable environment. The objective is to characterise the optimal policy for the controllable process in spite of the fact that the environment cannot be observed. 
We consider the large-scale asymptotic regime in which the number of bandits and the speed of the environment both tend to infinity. In our main result we establish that a set of priority policies is asymptotically optimal. We show that, in particular, this set includes the Whittle index policy of a system whose parameters are averaged over the stationary behaviour of the environment. In the second problem, we consider an MARBP with an observable environment. The objective is to leverage information on the environment to derive an optimal policy for the controllable process. Assuming that the technical condition of indexability holds, we develop an algorithm to compute Whittle's index. We then apply this result to the particular case of a queue with abandonments. We prove indexability, and we provide closed-form expressions of Whittle's index. In the third problem we consider a model of a large-scale storage system, where there are files distributed across a set of nodes. Each node breaks down following a law that depends on the load it handles. Whenever a node breaks down, all the files it had are reallocated to other nodes. We study the evolution of the load of a single node in the mean-field regime, when the number of nodes and files grow large. We prove the existence of the process in the mean-field regime. We further show the convergence in distribution of the load in steady state as the average number of files per node tends to infinity.

    Cross-layer design of multi-hop wireless networks

    Multi-hop wireless networks are usually defined as a collection of nodes equipped with radio transmitters, which not only have the capability to communicate with each other in a multi-hop fashion, but also to route each other's data packets. The distributed nature of such networks makes them suitable for a variety of applications where there are no assumed reliable central entities, or controllers, and may significantly alleviate the scalability issues of conventional single-hop wireless networks. This Ph.D. dissertation mainly investigates two aspects of the research issues related to efficient multi-hop wireless network design, namely: (a) network protocols and (b) network management, both in cross-layer design paradigms to ensure the notion of service quality, such as quality of service (QoS) in wireless mesh networks (WMNs) for backhaul applications and quality of information (QoI) in wireless sensor networks (WSNs) for sensing tasks. Throughout this Ph.D. dissertation, different network settings are used as illustrative examples; however, the proposed algorithms, methodologies, protocols, and models are not restricted to the considered networks, but rather have wide applicability. First, this dissertation proposes a cross-layer design framework integrating a distributed proportional-fair scheduler and a QoS routing algorithm, while using WMNs as an illustrative example. The proposed approach has significant performance gain compared with other network protocols. Second, this dissertation proposes a generic admission control methodology for any packet network, wired and wireless, by modeling the network as a black box, and using a generic mathematical function and Taylor expansion to capture the admission impact. 
Third, this dissertation further enhances the previous designs by proposing a negotiation process, to bridge the applications' service quality demands and the resource management, while using WSNs as an illustrative example. This approach allows the negotiation among different service classes and WSN resource allocations to reach the optimal operational status. Finally, the guarantees of the service quality are extended to the environment of multiple, disconnected, mobile subnetworks, where the question of how to maintain communications using dynamically controlled, unmanned data ferries is investigated.