Load Balancing in the Non-Degenerate Slowdown Regime
We analyse Join-the-Shortest-Queue in a contemporary scaling regime known as
the Non-Degenerate Slowdown regime. Join-the-Shortest-Queue (JSQ) is a
classical load balancing policy for queueing systems with multiple parallel
servers. Parallel server queueing systems are regularly analysed and
dimensioned by diffusion approximations achieved in the Halfin-Whitt scaling
regime. However, when jobs must be dispatched to a server upon arrival, we
advocate the Non-Degenerate Slowdown regime (NDS) to compare different
load-balancing rules.
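As a minimal, illustrative sketch of the policy itself (the queue lengths below are made up; this is not the paper's model), JSQ sends each arrival to a queue of minimal length, breaking ties at random:

```python
import random

def jsq_dispatch(queue_lengths):
    """Join-the-Shortest-Queue: return the index of a shortest queue,
    breaking ties uniformly at random."""
    shortest = min(queue_lengths)
    candidates = [i for i, q in enumerate(queue_lengths) if q == shortest]
    return random.choice(candidates)

queues = [3, 1, 4, 1, 5]   # hypothetical per-server queue lengths
chosen = jsq_dispatch(queues)
queues[chosen] += 1        # the arriving job is dispatched irrevocably
```

Note the feature the abstract emphasises: once dispatched, a job cannot later be moved to another server.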
In this paper we identify a novel diffusion approximation and a timescale
separation that provide insight into the performance of JSQ. We calculate the
price of irrevocably dispatching jobs to servers and prove that, in the NDS
regime, it is within 15% of that achievable under rules that may manoeuvre jobs
between servers. We also compare our results for the JSQ policy with the NDS
approximations of many modern load-balancing policies, such as Idle-Queue-First
and power-of-d-choices, which act as low-information proxies for the JSQ
policy. Our analysis leads us to construct new rules that have performance
identical to JSQ but require less communication overhead than
power-of-2-choices. Comment: Revised journal submission version.
On Occupancy Based Randomized Load Balancing for Large Systems with General Distributions
Multi-server architectures are ubiquitous in today's information infrastructure, whether for supporting cloud services, web servers, or distributed storage. The performance of multi-server systems depends heavily on how load is distributed, which in turn is determined by the load balancing strategy. Since both latency and blocking are important, it is reasonable to route an incoming job to a lightly loaded server; hence a good load balancing policy should depend on the states of the servers. Since obtaining the remaining workload of every server for every arrival is very hard, it is preferable to design load balancing policies that depend on the occupancy, i.e., the number of progressing jobs, of the servers. Furthermore, if the system has a large number of servers, it is not practical to use the occupancy information of all the servers to dispatch or route an arrival, due to high communication cost. In large-scale systems that have tens of thousands of servers, policies which use the occupancy information of only a finite number of randomly selected servers to dispatch an arrival have lower implementation cost than policies which use the occupancy information of all the servers. Such policies are referred to as occupancy based randomized load balancing policies.
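A minimal sketch of such a policy, in the classical power-of-d form (the occupancies and parameters below are illustrative): an arrival probes d servers chosen uniformly at random and is routed to the probed server with the smallest occupancy:

```python
import random

def occupancy_dispatch(occupancies, d=2):
    """Occupancy based randomized dispatch: probe d servers chosen
    uniformly at random and route to the least-occupied probed server."""
    probed = random.sample(range(len(occupancies)), d)
    return min(probed, key=lambda i: occupancies[i])

occupancies = [5, 0, 2, 7, 1]          # hypothetical jobs in progress
server = occupancy_dispatch(occupancies, d=2)
occupancies[server] += 1
```

Only d servers are contacted per arrival, which is exactly the communication saving described above.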
Motivated by cloud computing systems and web-server farms, we study two types of models. In the first model, each server is an Erlang loss server, and this model is an abstraction of Infrastructure-as-a-Service (IaaS) clouds. The second model we consider is one with processor sharing servers that is an abstraction of web-server farms which serve requests in a round-robin manner with small time granularity. The performance criterion for web-servers is the response time or the latency for the request to be processed. In most prior works, the analysis of these models was restricted to the case of exponential job length distributions and in this dissertation we study the case of general job length distributions.
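For intuition on the first model (a standard fact, not specific to this dissertation): an Erlang loss server with c slots offered a erlangs of load blocks arrivals with the Erlang B probability, computable by the usual stable recursion. The parameters below are made up:

```python
def erlang_b(servers, offered_load):
    """Erlang B blocking probability via the recursion
    B(0) = 1,  B(c) = a*B(c-1) / (c + a*B(c-1))."""
    b = 1.0
    for c in range(1, servers + 1):
        b = offered_load * b / (c + offered_load * b)
    return b

blocking = erlang_b(10, 7.0)   # 10 slots, 7 erlangs offered
```

Insensitivity, discussed later in this abstract, means this blocking probability depends on the job-length distribution only through its mean.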
To analyze the impact of a load balancing policy, we need to develop models for the system's dynamics. In this dissertation, we show that one can construct useful Markovian models. For occupancy based randomized routing policies, due to complex inter-dependencies between servers, an exact analysis is mostly intractable. However, we show that the multi-server systems that have an occupancy based randomized load balancing policy are examples of weakly interacting particle systems. In these systems, servers are interacting particles whose states lie in an uncountable state space. We develop a mean-field analysis to understand a server's behavior as the number of servers becomes large. We show that under certain assumptions, as the number of servers increases, the sequence of empirical measure-valued Markov processes which model the systems' dynamics converges to a deterministic measure-valued process referred to as the mean-field limit. We observe that the mean-field equations correspond to the dynamics of the distribution of a non-linear Markov process. A consequence of having the mean-field limit is that under minor and natural assumptions on the initial states of servers, any finite set of servers can be shown to be independent of each other as the number of servers goes to infinity. Furthermore, the mean-field limit approximates each server's distribution in the transient regime when the number of servers is large.
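In the simplest instance, exponential job lengths with power-of-d sampling, the mean-field limit reduces to the classical "supermarket model" ODE system for s_k(t), the fraction of servers with at least k jobs. A forward-Euler sketch (the truncation level and parameters are illustrative):

```python
def supermarket_mean_field(lam=0.7, d=2, K=20, T=100.0, dt=0.01):
    """Integrate ds_k/dt = lam*(s_{k-1}^d - s_k^d) - (s_k - s_{k+1}),
    with s_0 = 1, truncated at level K, starting from empty queues."""
    s = [1.0] + [0.0] * K
    for _ in range(int(T / dt)):
        nxt = s[:]
        for k in range(1, K):
            nxt[k] += dt * (lam * (s[k-1]**d - s[k]**d) - (s[k] - s[k+1]))
        s = nxt
    return s

s = supermarket_mean_field()
# the fixed point is s_k = lam**((d**k - 1)/(d - 1)), so s_1 -> lam
```

The doubly exponential decay of the fixed point in k is the hallmark advantage of sampling d >= 2 servers over purely random routing.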
A salient feature of loss and processor sharing systems in the setting where their time evolution can be modeled by reversible Markov processes is that their stationary occupancy distribution is insensitive to the type of job length distribution; it depends only on the average job length but not on the type of the distribution. This property does not hold when the number of servers is finite in our context due to lack of reversibility. We show however that the fixed-point of the mean-field is insensitive to the job length distributions for all occupancy based randomized load balancing policies when the fixed-point is unique for job lengths that have exponential distributions. We also provide some deeper insights into the relationship between the mean-field and the distributions of servers and the empirical measure in the stationary regime.
Finally, we address the accuracy of mean-field approximations in the case of loss models. To do so we establish a functional central limit theorem under the assumption that the job lengths have exponential distributions. We show that a suitably scaled fluctuation of the stochastic empirical process around the mean-field converges to an Ornstein-Uhlenbeck process. Our analysis is also valid in the Halfin-Whitt regime, in which servers are critically loaded. We then exploit the functional central limit theorem to quantify the error between the actual blocking probability of the system with a large number of servers and the blocking probability obtained from the fixed-point of the mean-field. In the Halfin-Whitt regime, the error is of the order of the inverse square root of the number of servers. On the other hand, in a light-load regime, the error is smaller than the inverse square root of the number of servers.
Asymptotic Analysis of Single-Hop Stochastic Processing Networks Using the Drift Method
Today’s era of cloud computing and big data is powered by massive data centers. The
focus of my dissertation is on resource allocation problems that arise in the operation of
these large-scale data centers. Analyzing these systems exactly is usually intractable, and
a usual approach is to study them in various asymptotic regimes with heavy traffic being a
popular one. We use the drift method, which is a two-step procedure to obtain bounds that
are asymptotically tight. In the first step, one shows state-space collapse, which, intuitively,
means that one detects the bottleneck(s) of the system. In the second step, one sets to zero
the drift of a carefully chosen test function. Then, using state-space collapse, one can obtain
the desired bounds.
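A textbook illustration of the two steps (this M/M/1 example is standard and is not taken from the dissertation): for an M/M/1 queue with arrival rate $\lambda$ and service rate $\mu > \lambda$, take the test function $V(q) = q^2$ and set its steady-state drift to zero:

```latex
% Expected generator drift of V(q) = q^2 vanishes in steady state:
0 = \mathbb{E}\Bigl[\lambda\bigl((q+1)^2 - q^2\bigr)
      + \mu\,\mathbf{1}\{q > 0\}\bigl((q-1)^2 - q^2\bigr)\Bigr]
  = 2\lambda\,\mathbb{E}[q] + \lambda - 2\mu\,\mathbb{E}[q] + \mu\,\mathbb{P}(q > 0).
% With \mathbb{P}(q > 0) = \rho = \lambda/\mu, solving gives
\mathbb{E}[q] = \frac{\lambda}{\mu - \lambda} = \frac{\rho}{1 - \rho}.
```

Here the "state-space collapse" step is trivial (one queue); in multi-dimensional SPNs it is the step that justifies reducing the drift computation to the bottleneck direction.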
This dissertation focuses on exploiting the properties of the drift method and providing
conditions under which one can completely determine the asymptotic distribution of the
queue lengths. In chapter 1 we present the motivation, research background, and main
contributions.
In chapter 2 we revisit some well-known definitions and results that will be repeatedly
used in the following chapters.
In chapter 3, chapter 4, and chapter 5 we focus on load-balancing systems, also known as
supermarket checkout systems. In the load-balancing system, there are a certain number of
servers, and jobs arrive in a single stream. Upon arrival, each job joins the queue associated
with one of the servers and waits in line until the corresponding server processes it.
In chapter 3 we introduce the moment generating function (MGF) method. The MGF,
also known as the two-sided Laplace transform, is an invertible transformation of the random variable’s
distribution and, hence, it provides the same information as the cumulative distribution
function or the density (when it exists). The MGF method is a two-step procedure to
compute the MGF of the delay in stochastic processing networks (SPNs) that satisfy the
complete resource pooling (CRP) condition. Intuitively, CRP means that the SPN has a
single bottleneck in heavy traffic.
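As a reminder of the object itself (a standard fact; the parameters below are arbitrary): for an Exponential(λ) random variable the MGF is M(θ) = λ/(λ − θ) for θ < λ, which a quick Monte Carlo estimate reproduces:

```python
import math
import random

def exp_mgf(rate, theta):
    """MGF of an Exponential(rate) random variable, finite for theta < rate."""
    assert theta < rate
    return rate / (rate - theta)

random.seed(0)
rate, theta = 2.0, 0.5
samples = [random.expovariate(rate) for _ in range(200_000)]
estimate = sum(math.exp(theta * x) for x in samples) / len(samples)
# closed form: 2.0 / 1.5 = 1.3333...; the estimate should be close
```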
A popular routing algorithm is power-of-d choices, under which one selects d servers
at random and routes the new arrivals to the shortest queue among those d. The power-of-d
choices algorithm has been widely studied in load-balancing systems with homogeneous
servers. However, it is not well understood when the servers are different. In chapter 4 we
study this routing policy under heterogeneous servers. Specifically, we provide necessary
and sufficient conditions on the service rates so that the load-balancing system achieves
throughput and heavy-traffic optimality. We use the MGF method to show heavy-traffic
optimality.
In chapter 5 we study the load-balancing system in the many-server heavy-traffic regime,
which means that we analyze the limit as the number of servers and the load increase together.
Specifically, we are interested in studying how fast the number of servers can grow
with respect to the load if we want to observe the same probabilistic behavior of the delay
as a system with a fixed number of servers in heavy traffic. We show two approaches to
obtain the results: the MGF method and Stein’s method.
In chapter 6 we apply the MGF method to a generalized switch, which is one of the
most general single-hop SPNs with control on the service process. Many systems, such
as ad hoc wireless networks, input-queued switches, and parallel-server systems, can be
modeled as special cases of the generalized switch.
Most of the literature in SPNs (including the previous chapters of this thesis) focuses on
systems that satisfy the CRP condition in heavy traffic, i.e., systems that behave as single-server
queues in the limit. In chapter 7 we study systems that do not satisfy this condition
and, hence, may have multiple bottlenecks. We specify conditions under which the drift
method is sufficient to obtain the distribution function of the delay, and when it can only be
used to obtain information about its mean value. Our results are valid for both the CRP and
non-CRP cases and are immediately applicable to a variety of systems. Additionally,
we provide a mathematical proof that shows a limitation of the drift method.
Optimal Scheduling in the Multiserver-job Model under Heavy Traffic
Multiserver-job systems, where jobs require concurrent service at many
servers, occur widely in practice. Essentially all of the theoretical work on
multiserver-job systems focuses on maximizing utilization, with almost nothing
known about mean response time. In simpler settings, such as various known-size
single-server-job settings, minimizing mean response time is merely a matter of
prioritizing small jobs. However, for the multiserver-job system, prioritizing
small jobs is not enough, because we must also ensure servers are not
unnecessarily left idle. Thus, minimizing mean response time requires
prioritizing small jobs while simultaneously maximizing throughput. Our
question is how to achieve these joint objectives.
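To make "prioritizing small jobs" concrete in the simplest known-size, single-server setting (illustrative only; this is not the ServerFilling variant): SRPT always serves the job with the smallest remaining size, and with no further arrivals it reduces to shortest-job-first:

```python
import heapq

def srpt_completions(jobs):
    """Serve the job with the shortest remaining processing time first.
    With no mid-run arrivals this is shortest-job-first; returns
    (job_id, completion_time) pairs in completion order."""
    heap = [(size, job_id) for job_id, size in jobs]
    heapq.heapify(heap)
    t, finished = 0.0, []
    while heap:
        size, job_id = heapq.heappop(heap)
        t += size
        finished.append((job_id, t))
    return finished

order = srpt_completions([("a", 3.0), ("b", 1.0), ("c", 2.0)])
# "b" (the smallest job) completes first, which minimizes mean response time
```

The multiserver-job subtlety in this abstract is precisely that such an ordering alone can leave servers idle, which ServerFilling-SRPT is designed to avoid.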
We devise the ServerFilling-SRPT scheduling policy, which is the first policy
to minimize mean response time in the multiserver-job model in the heavy
traffic limit. In addition to proving this heavy-traffic result, we present
empirical evidence that ServerFilling-SRPT outperforms all existing scheduling
policies for all loads, with improvements by orders of magnitude at higher
loads.
Because ServerFilling-SRPT requires knowing job sizes, we also define the
ServerFilling-Gittins policy, which is optimal when sizes are unknown or
partially known. Comment: 32 pages, to appear in ACM SIGMETRICS 202
Scheduling for today’s computer systems: bridging theory and practice
Scheduling is a fundamental technique for improving performance in computer systems. From web servers
to routers to operating systems, how the bottleneck device is scheduled has an enormous impact on the performance of the system as a whole. Given the immense literature studying scheduling, it is easy to think that we already understand scheduling well enough. But modern computer system designs have highlighted a number of disconnects between traditional analytic results and the needs of system designers.
In particular, the idealized policies, metrics, and models used by analytic researchers do not match the policies, metrics, and scenarios that appear in real systems.
The goal of this thesis is to take a step towards modernizing the theory of scheduling in order to provide
results that apply to today’s computer systems, and thus ease the burden on system designers. To accomplish
this goal, we provide new results that help to bridge each of the disconnects mentioned above. We will move beyond the study of idealized policies by introducing a new analytic framework where the focus is on scheduling heuristics and techniques rather than individual policies. By moving beyond the study of individual policies, our results apply to the complex hybrid policies that are often used in practice. For example, our results enable designers to understand how the policies that favor small job sizes are affected by the fact that real systems only have estimates of job sizes. In addition, we move beyond the study of mean response time
and provide results characterizing the distribution of response time and the fairness of scheduling policies.
These results allow us to understand how scheduling affects QoS guarantees and whether favoring small job sizes results in large job sizes being treated unfairly. Finally, we move beyond the simplified models traditionally used in scheduling research and provide results characterizing the effectiveness of scheduling in multiserver systems and when users are interactive. These results allow us to answer questions about how to design multiserver systems and how to choose a workload generator when evaluating new scheduling designs.
Many-Server Queues with Time-Varying Arrivals, Customer Abandonment, and Non-Exponential Distributions
This thesis develops deterministic heavy-traffic fluid approximations for many-server stochastic queueing models. The queueing models, with many homogeneous servers working independently in parallel, are intended to model large-scale service systems such as call centers and health care systems. Such models also have been employed to study communication, computing and manufacturing systems. The heavy-traffic approximations yield relatively simple formulas for quantities describing system performance, such as the expected number of customers waiting in the queue. The new performance approximations are valuable because, in the generality considered, these complex systems are not amenable to exact mathematical analysis. Since the approximate performance measures can be computed quite rapidly, they usefully complement more cumbersome computer simulation. Thus these heavy-traffic approximations can be used to improve capacity planning and operational control. More specifically, the heavy-traffic approximations here are for large-scale service systems, having many servers and a high arrival rate. The main focus is on systems that have time-varying arrival rates and staffing functions.
The system is considered under the assumption that there are alternating periods of overloading and underloading, which commonly occurs when service providers are unable to adjust the staffing frequently enough to economically meet demand at all times. The models also allow the realistic features of customer abandonment and non-exponential probability distributions for the service times and the times customers are willing to wait before abandoning. These features make the overall stochastic model non-Markovian and thus very difficult to analyze directly. This thesis provides effective algorithms to compute approximate performance descriptions for these complex systems. These algorithms are based on ordinary differential equations and fixed-point equations associated with contraction operators. Simulation experiments are conducted to verify that the approximations are effective.
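In the Markovian special case (exponential service at rate mu and exponential patience at rate theta; the arrival-rate and staffing functions below are made up), the fluid content x(t) solves a one-dimensional ODE of exactly this kind, easily integrated by forward Euler:

```python
import math

def fluid_path(lam, staff, mu=1.0, theta=0.5, x0=0.0, T=24.0, dt=0.001):
    """Forward-Euler integration of the fluid ODE
    x'(t) = lam(t) - mu*min(x, s(t)) - theta*max(x - s(t), 0),
    where x is the fluid content and s(t) the staffing function."""
    x, path = x0, []
    for n in range(int(T / dt)):
        t = n * dt
        x += dt * (lam(t) - mu * min(x, staff(t)) - theta * max(x - staff(t), 0.0))
        path.append((t, x))
    return path

# sinusoidal demand against flat staffing: alternating over- and underload
path = fluid_path(lam=lambda t: 1.0 + 0.6 * math.sin(t), staff=lambda t: 1.0)
peak = max(x for _, x in path)   # exceeds the staffing level near demand peaks
```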
This thesis consists of four pieces of work, each presented in one chapter.
The first chapter (Chapter 2) develops the basic fluid approximation for a non-Markovian many-server queue with time-varying arrival rate and staffing. The second chapter (Chapter 3) extends the fluid approximation to systems with a complex network structure and Markovian routing of customers to other queues after they complete service at each queue. The extension to open networks of queues has important applications. For one example, in hospitals, patients usually move among different units such as emergency rooms, operating rooms, and intensive care units. For another example, in manufacturing systems, individual products visit different work stations one or more times. The open network fluid model has multiple queues, each of which has a time-varying arrival rate and staffing function.
The third chapter (Chapter 4) studies the large-time asymptotic dynamics of a single fluid queue. When the model parameters are constant, convergence to the steady state as time evolves is established. When the arrival rates are periodic functions, such as in service systems with daily or seasonal cycles, the existence of a periodic steady state and the convergence to that periodic steady state as time evolves are established. Conditions are provided under which this convergence is exponentially fast. The fourth chapter (Chapter 5) uses a fluid approximation to gain insight into nearly periodic behavior seen in overloaded stationary many-server queues with customer abandonment and nearly deterministic service times. Deterministic service times are of applied interest because computer-generated service times, such as automated messages, may well be deterministic, and computer-generated service is becoming more prevalent. With deterministic service times, if all the servers remain busy for a long interval of time, then the times at which customers enter service assume a periodic pattern throughout that interval. In overloaded large-scale systems, these intervals tend to persist for a long time, producing nearly periodic behavior.
To gain insight, a heavy-traffic limit theorem is established showing that the fluid model arises as the many-server heavy-traffic limit of a sequence of appropriately scaled queueing models, all having these deterministic service times. Simulation experiments confirm that the transient behavior of the limiting fluid model provides a useful description of the transient performance of the queueing system. However, unlike the asymptotic loss of memory results in the previous chapter for service times with densities, the stationary fluid model with deterministic service times does not approach steady state as time evolves independently of the initial conditions. Since the queueing model with deterministic service times approaches a proper steady state as time evolves, this model with deterministic service times provides an example where the limit interchange (limiting steady state as time evolves and heavy traffic as scale increases) is not valid.
Performance Evaluation of Transition-based Systems with Applications to Communication Networks
Since the beginning of the twenty-first century, communication systems have witnessed a revolution in terms of their hardware capabilities. This transformation has enabled modern networks to stand up to the diversity and the scale of the requirements of the applications that they support. Compared to their predecessors, which primarily consisted of a handful of homogeneous devices communicating via a single communication technology, today's networks connect myriads of systems that are intrinsically different in their functioning and purpose. In addition, many of these devices communicate via different technologies or a combination of them at a time. All these developments, coupled with the geographical disparity of the physical infrastructure, give rise to network environments that are inherently dynamic and unpredictable. To cope with heterogeneous environments and growing demands, network units have taken a leap from the paradigm of static functioning to that of adaptivity. In this thesis, we refer to adaptive network units as transition-based systems (TBSs), and the act of adapting is termed a transition. We note that TBSs not only reside in diverse environment conditions, but their need to adapt also arises from different phenomena. Such phenomena are referred to as triggers, and they can occur at different time scales. We additionally observe that the nature of a transition is dictated by the specified performance objective of the relevant TBS, and we seek to build an analytical framework that helps us derive a policy for performance optimization. As the state of the art lacks a unified approach to modelling the diverse functioning of TBSs and their varied performance objectives, we first propose a general framework based on the theory of Markov Decision Processes. This framework facilitates optimal policy derivation in TBSs in a principled manner.
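A minimal sketch of the framework's core machinery (the states, transition probabilities, and costs below are entirely hypothetical, for illustration only): value iteration on a toy two-state "stay or transition" decision problem:

```python
def value_iteration(states, actions, P, cost, gamma=0.9, tol=1e-8):
    """Generic value iteration for a discounted MDP:
    V(s) = min_a [ cost(s, a) + gamma * sum_s' P[s][a][s'] * V(s') ]."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {}
        for s in states:
            V_new[s] = min(
                cost(s, a) + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                for a in actions
            )
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

# toy TBS: remain in a congested mode (holding cost) or transition out (switching cost)
states, actions = ["low", "high"], ["stay", "switch"]
P = {
    "low":  {"stay": {"low": 0.9, "high": 0.1}, "switch": {"low": 1.0}},
    "high": {"stay": {"high": 0.8, "low": 0.2}, "switch": {"low": 0.9, "high": 0.1}},
}
def cost(s, a):
    return (2.0 if s == "high" else 0.0) + (0.5 if a == "switch" else 0.0)

V = value_iteration(states, actions, P, cost)
```

The optimal policy balances the immediate switching cost of a transition against the discounted holding cost of staying congested, which is the trade-off the MDP formulation makes explicit.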
In addition, we note the importance of bespoke analyses in specific classes of TBSs where the general formulation leads to a high-dimensional optimization problem.
Specifically, we consider performance optimization in open systems employing parallelism and in closed systems exploiting the benefits of service batching. In these examples, we resort to approximation techniques, such as a mean-field limit for the state evolution, whenever the underlying TBS deals with a large number of entities. Our formulation enables the calculation of optimal policies and provides tangible alternatives to existing frameworks for Quality of Service evaluation. Compared to the state of the art, the derived policies facilitate transitions in communication systems that yield superior performance, as shown through extensive evaluations in this thesis.
Functional limit theorems, branching processes and stochastic networks
This manuscript describes some of the work I have been doing since 2010 and the end of my PhD. As the title suggests, it contains three main parts.
1. Functional limit theorems: Chapter 2 presents two theoretical results on the weak convergence of stochastic processes: one is a sufficient condition for the tightness of a sequence of stochastic processes, and the other provides a sufficient condition for the weak convergence of a sequence of regenerative processes.
2. Branching processes: in Chapter 3, scaling limits of three particular types of branching processes are discussed: 1) Galton-Watson processes in varying environments, 2) binary and homogeneous Crump-Mode-Jagers processes, and 3) Crump-Mode-Jagers processes with short edges.
3. Stochastic networks: Chapter 4 presents three results on stochastic networks: 1) scaling limits of the M/G/1 Processor-Sharing queue length process, 2) a study of a model of a stochastic network with mobile customers, and 3) heavy-traffic delay performance of queue-based scheduling algorithms.
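For the branching-process part, a small computation illustrates the classical fact underlying many such scaling limits (the offspring law here is made up): the extinction probability of a Galton-Watson process is the smallest fixed point of the offspring probability generating function, obtainable by iterating the PGF from 0:

```python
def extinction_probability(pgf, iterations=200):
    """Extinction probability of a Galton-Watson process: the smallest
    fixed point of the offspring PGF f, found by iterating q -> f(q)
    from q = 0 (the iterates increase monotonically to the fixed point)."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return q

# offspring distribution {0: 1/4, 1: 1/4, 2: 1/2}: mean 5/4 > 1 (supercritical)
f = lambda s: 0.25 + 0.25 * s + 0.5 * s * s
q = extinction_probability(f)   # solves q = f(q); here q = 1/2
```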