39 research outputs found

    Analysis of join-the-shortest-queue routing for web server farms

    Join the Shortest Queue (JSQ) is a popular routing policy for server farms. However, until now all analysis of JSQ has been limited to First-Come-First-Serve (FCFS) server farms, whereas it is known that web server farms are better modeled as Processor Sharing (PS) server farms. We provide the first approximate analysis of JSQ in the PS server farm model for general job-size distributions, obtaining the distribution of queue length at each queue. To do this, we approximate the queue length of each queue in the server farm by a one-dimensional Markov chain, in a novel fashion. We also discover some interesting insensitivity properties of PS server farms with JSQ routing, and discuss the near-optimality of JSQ
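
As a minimal sketch of the routing rule itself (not of the paper's Markov-chain approximation), JSQ sends each arriving job to the server that currently holds the fewest jobs. The function name `jsq_route` and the uniform random tie-breaking are assumptions made for illustration; the abstract does not specify how ties are resolved.

```python
import random


def jsq_route(queue_lengths, rng=random):
    """Return the index of a shortest queue.

    queue_lengths: number of jobs currently at each server.
    Ties are broken uniformly at random (an assumption of this sketch).
    """
    shortest = min(queue_lengths)
    candidates = [i for i, n in enumerate(queue_lengths) if n == shortest]
    return rng.choice(candidates)


# Example: the second server (index 1) holds the fewest jobs.
print(jsq_route([3, 1, 2]))
```

Under Processor Sharing, the queue length is also the number of jobs sharing a server's capacity, so the same rule applies unchanged to the PS server farm the abstract describes.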

    Routing policies for a partially observable two-server queueing system

    We consider a queueing system controlled by decisions based on partial state information. The motivation for this work stems from road traffic, in which drivers may, or may not, be subscribed to a smartphone application for dynamic route planning. Our model consists of two queues with independent exponential service times, serving two types of jobs. Arrivals occur according to a Poisson process; a fraction of the jobs (type X) is observable and controllable. At all times the number of X jobs in each queue and their individual positions are known. Upon its arrival a router decides which queue the next X job should join. Y jobs are non-observable and non-controllable. They randomly join a queue according to some static routing probability. We address the following main research questions: 1) what penetration level is needed for effective control, 2) which policy should be implemented at the router, and 3) what is the added value of having more system information (e.g., average service times)? An extensive simulation study reveals that for heavily loaded systems a low penetration level suffices and that the performance (in terms of the average sojourn time) of a simple policy that relies on little system information is close to w-JSQ (weighted join-the-shortest-queue policy) which is optimal in a fully controllable and observable system. The latter result is confirmed by the analysis of deterministic fluid models that approximate the stochastic evolution under large loads
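
The two job types can be illustrated with a rough sketch. Y jobs pick a queue by a fixed coin flip, while X jobs are routed by a weighted shortest-queue rule. The abstract does not define the weights of w-JSQ or the policies studied, so the cost `(n + 1) / mu` used below, the restriction to observed X counts, and all names (`route_job`, `p_first`, `rates`) are assumptions of this sketch rather than the paper's method.

```python
import random


def route_job(observable, x_counts, rates, p_first, rng=random):
    """Pick queue 0 or 1 for an arriving job in a two-queue system.

    observable: True for an X job (controllable), False for a Y job.
    x_counts:   observed number of X jobs in each queue.
    rates:      service rate of each queue (assumed known here).
    p_first:    static probability that a Y job joins queue 0.
    """
    if not observable:
        # Y jobs ignore the system state entirely.
        return 0 if rng.random() < p_first else 1
    # Assumed weighting: expected work seen by the job, per unit service rate.
    costs = [(n + 1) / mu for n, mu in zip(x_counts, rates)]
    return 0 if costs[0] <= costs[1] else 1


# An X job facing equal counts prefers the faster queue 0.
print(route_job(True, [1, 1], [2.0, 1.0], p_first=0.5))
```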

    Token Redundancy in Distributed JIQ


    Mean-field analysis of load balancing principles in large scale systems

    Load balancing plays a crucial role in many large scale systems. Several different load balancing principles have been proposed in the literature, such as Join-Shortest-Queue (JSQ) and its variations, or Join-Below-Threshold. We provide a high level mathematical framework to examine heterogeneous server clusters in the mean-field limit as the system load and the number of servers scale proportionally. We aim to identify both the transient mean-field limit and the stationary mean-field limit for various choices of load balancing principles, compute relevant performance measures such as the distribution and mean of the system time of jobs, and conduct a comparison from a performance point of view

    Stochastic methods for measurement-based network control

    The main task of network administrators is to ensure that their network functions properly. Whether they manage a telecommunication or a road network, they generally base their decisions on the analysis of measurement data. Inspired by such network control applications, this dissertation investigates several stochastic modelling techniques for data analysis. The focus is on two areas within the field of stochastic processes: change point detection and queueing theory. Part I deals with statistical methods for the automatic detection of change points, being changes in the probability distribution underlying a data sequence. This part starts with a review of existing change point detection methods for data sequences consisting of independent observations. The main contribution of this part is the generalisation of the classic cusum method to account for dependence within data sequences. We analyse the false alarm probability of the resulting methods using a large deviations approach. The part also discusses numerical tests of the new methods and a cyber attack detection application, in which we investigate how to detect dns tunnels. The main contribution of Part II is the application of queueing models (probabilistic models for waiting lines) to situations in which the system to be controlled can only be observed partially. We consider two types of partial information. Firstly, we develop a procedure to get insight into the performance of queueing systems between consecutive system-state measurements and apply it in a numerical study, which was motivated by capacity management in cable access networks. Secondly, inspired by dynamic road control applications, we study routing policies in a queueing system for which just part of the jobs are observable and controllable
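
For orientation, the classic CUSUM method that Part I generalises can be sketched as follows for independent observations and an upward shift in the mean. This is the textbook one-sided recursion, not the dissertation's dependence-aware variant; the parameter names `mu0` (pre-change mean), `k` (reference value) and `h` (alarm threshold) are illustrative choices.

```python
def cusum_first_alarm(xs, mu0, k, h):
    """Return the index of the first CUSUM alarm, or None if there is none.

    Recursion: s_t = max(0, s_{t-1} + (x_t - mu0 - k)); alarm when s_t > h.
    Assumes independent observations and an upward shift in the mean.
    """
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t
    return None


# Mean jumps from 0 to 2 at index 5; the statistic crosses h = 3 at index 7.
data = [0.0] * 5 + [2.0] * 5
print(cusum_first_alarm(data, mu0=0.0, k=0.5, h=3.0))  # 7
```

Raising `h` lowers the false alarm probability at the cost of a longer detection delay, which is the trade-off the abstract's large deviations analysis addresses.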

    Dynamical Modeling of Cloud Applications for Runtime Performance Management

    Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the necessary computational resources to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult due to the dynamics imposed by, e.g., resource contentions or changing arrival rate of users, and the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics. In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the system size, i.e., number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion. It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping by applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing discipline and distributions, but instead introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied considering the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations only have a minor effect in certain cases

    Scalable Load Balancing Algorithms in Networked Systems

    A fundamental challenge in large-scale networked systems, viz., data centers and cloud networks, is to distribute tasks to a pool of servers, using minimal instantaneous state information, while providing excellent delay performance. In this thesis we design and analyze load balancing algorithms that aim to achieve a highly efficient distribution of tasks, optimize server utilization, and minimize communication overhead. Comment: Ph.D. thesis