Analysis of join-the-shortest-queue routing for web server farms
Join the Shortest Queue (JSQ) is a popular routing policy for server farms. However, until now all analysis of JSQ has been limited to First-Come-First-Serve (FCFS) server farms, whereas it is known that web server farms are better modeled as Processor Sharing (PS) server farms. We provide the first approximate analysis of JSQ in the PS server farm model for general job-size distributions, obtaining the distribution of queue length at each queue. To do this, we approximate the queue length of each queue in the server farm by a one-dimensional Markov chain, in a novel fashion. We also discover some interesting insensitivity properties of PS server farms with JSQ routing, and discuss the near-optimality of JSQ.
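As a minimal sketch of the two ingredients named in this abstract, the following Python shows a JSQ dispatch decision and the per-job service rate under processor sharing. The function names and the lowest-index tie-breaking rule are assumptions made for illustration; the abstract does not specify how ties are broken.

```python
from typing import Sequence


def jsq_route(queue_lengths: Sequence[int]) -> int:
    """Return the index of the server that currently holds the fewest jobs.

    Assumption: ties go to the lowest index (random tie-breaking is also common).
    """
    if not queue_lengths:
        raise ValueError("need at least one server")
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def ps_rate_per_job(n_jobs: int, server_speed: float = 1.0) -> float:
    """Under processor sharing, each of n_jobs jobs is served at speed / n_jobs."""
    return server_speed / n_jobs if n_jobs > 0 else 0.0
```

For example, `jsq_route([3, 1, 2])` selects server 1, and a unit-speed PS server holding four jobs serves each of them at rate 0.25.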
Approximate performance analysis of generalized join the shortest queue routing
In this paper we propose a highly accurate approximate performance analysis
of a heterogeneous server system with a processor sharing service discipline
and a general job-size distribution under a generalized join the shortest queue
(GJSQ) routing protocol. The GJSQ routing protocol is a natural extension of
the well-known join the shortest queue routing policy that takes into account
the non-identical service rates in addition to the number of jobs at each
server. The performance metrics that are of interest here are the equilibrium
distribution and the mean and standard deviation of the number of jobs at each
server. We show that the latter metrics are near-insensitive to the job-size
distribution using simulation experiments. By applying a single queue
approximation we model each server as a single server queue with a
state-dependent arrival process, independent of other servers in the system,
and derive the distribution of the number of jobs at the server. These
state-dependent arrival rates are intended to capture the inherent correlation
between servers in the original system and behave in a rather atypical way.
Comment: 16 pages, 5 figures -- version 2 incorporates minor textual changes
Analysis of randomized join-the-shortest-queue (JSQ) schemes in large heterogeneous processor-sharing systems
In this paper, we investigate the stability and performance
of randomized dynamic routing schemes for jobs based on
the Join-the-Shortest Queue (JSQ) criterion in a heterogeneous
system of many parallel servers. In particular, we consider servers
that use processor sharing but with different server rates, and
jobs are routed to the server with the smallest occupancy among
a finite number of randomly sampled servers. We focus on the
case of two sampled servers, often referred to as the Power-of-Two
scheme. We first show that in the heterogeneous setting, uniform
sampling of servers can shrink the stability region, and thus
such randomized dynamic schemes need not outperform static
randomized schemes in terms of mean delay. This is in contrast to
the homogeneous case of equal server speeds, where the stability
region is maximal and coincides with that of static randomized
routing. We explicitly characterize the stationary distributions
of the server occupancies and show that the tail distribution
of the server occupancy has a super-exponential behavior as in
the homogeneous case as the number of servers goes to infinity.
To overcome the stability issue, we show that it is possible to
combine the static state-independent scheme with a randomized
JSQ scheme that allows us to recover the maximal stability region
combined with the benefits of JSQ, and such a scheme is preferable
in terms of average delay. The techniques are based on a mean field
analysis where we show that the stationary distributions coincide
with those obtained under asymptotic independence of the servers
and, moreover, the stationary distributions are insensitive to the
job-size distribution.
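One way to see why uniform sampling can shrink the stability region is a simple counting argument. It is an illustration, not the paper's characterization. Assume a large system in which a fraction `gammas[j]` of servers has rate `mus[j]`, jobs arrive at rate `lam` per server, and each job samples `d` servers uniformly with replacement. A job whose `d` samples are all of type `j` must join a type-`j` server, so those servers receive at least `lam * gammas[j]**d` per server in the system while offering capacity `gammas[j] * mus[j]`. This gives the necessary condition `lam < mus[j] / gammas[j]**(d - 1)`. All names below are hypothetical.

```python
from typing import Sequence


def max_capacity(gammas: Sequence[float], mus: Sequence[float]) -> float:
    """Average service capacity per server: the largest sustainable arrival rate."""
    return sum(g * m for g, m in zip(gammas, mus))


def uniform_sampling_bound(gammas: Sequence[float], mus: Sequence[float], d: int = 2) -> float:
    """Upper bound on the per-server arrival rate under uniform power-of-d sampling.

    This is only a necessary condition from the counting argument above,
    capped by the average capacity; it is not claimed to be tight.
    """
    per_type = [m / g ** (d - 1) for g, m in zip(gammas, mus) if g > 0]
    return min(per_type + [max_capacity(gammas, mus)])
```

With 80% slow servers of rate 0.5 and 20% fast servers of rate 3.0, the average capacity is 1.0 per server, but the bound for `d = 2` is 0.5 / 0.8 = 0.625, so part of the capacity cannot be used. In the homogeneous case the bound equals the full capacity.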
On Occupancy Based Randomized Load Balancing for Large Systems with General Distributions
Multi-server architectures are ubiquitous in today's information infrastructure whether for supporting cloud services, web servers, or for distributed storage. The performance of multi-server systems is highly dependent on the load distribution. This is affected by the use of load balancing strategies. Since both latency and blocking are important features, it is most reasonable to route an incoming job to a server that is lightly loaded. Hence a good load balancing policy should be dependent on the states of servers. Since obtaining information about the remaining workload of servers for every arrival is very hard, it is preferable to design load balancing policies that depend on occupancy or the number of progressing jobs of servers. Furthermore, if the system has a large number of servers, it is not practical to use the occupancy information of all the servers to dispatch or route an arrival due to high communication cost. In large-scale systems that have tens of thousands of servers, the policies which use the occupancy information of only a finite number of randomly selected servers to dispatch an arrival result in lower implementation cost than the policies which use the occupancy information of all the servers. Such policies are referred to as occupancy based randomized load balancing policies.
Motivated by cloud computing systems and web-server farms, we study two types of models. In the first model, each server is an Erlang loss server, and this model is an abstraction of Infrastructure-as-a-Service (IaaS) clouds. The second model we consider is one with processor sharing servers that is an abstraction of web-server farms which serve requests in a round-robin manner with small time granularity. The performance criterion for web-servers is the response time or the latency for the request to be processed. In most prior works, the analysis of these models was restricted to the case of exponential job length distributions and in this dissertation we study the case of general job length distributions.
To analyze the impact of a load balancing policy, we need to develop models for the system's dynamics. In this dissertation, we show that one can construct useful Markovian models. For occupancy based randomized routing policies, due to complex inter-dependencies between servers, an exact analysis is mostly intractable. However, we show that the multi-server systems that have an occupancy based randomized load balancing policy are examples of weakly interacting particle systems. In these systems, servers are interacting particles whose states lie in an uncountable state space. We develop a mean-field analysis to understand a server's behavior as the number of servers becomes large. We show that under certain assumptions, as the number of servers increases, the sequence of empirical measure-valued Markov processes which model the systems' dynamics converges to a deterministic measure-valued process referred to as the mean-field limit. We observe that the mean-field equations correspond to the dynamics of the distribution of a non-linear Markov process. A consequence of having the mean-field limit is that under minor and natural assumptions on the initial states of servers, any finite set of servers can be shown to be independent of each other as the number of servers goes to infinity. Furthermore, the mean-field limit approximates each server's distribution in the transient regime when the number of servers is large.
A salient feature of loss and processor sharing systems in the setting where their time evolution can be modeled by reversible Markov processes is that their stationary occupancy distribution is insensitive to the type of job length distribution; it depends only on the average job length but not on the type of the distribution. This property does not hold when the number of servers is finite in our context due to lack of reversibility. We show however that the fixed-point of the mean-field is insensitive to the job length distributions for all occupancy based randomized load balancing policies when the fixed-point is unique for job lengths that have exponential distributions. We also provide some deeper insights into the relationship between the mean-field and the distributions of servers and the empirical measure in the stationary regime.
Finally, we address the accuracy of mean-field approximations in the case of loss models. To do so we establish a functional central limit theorem under the assumption that the job lengths have exponential distributions. We show that a suitably scaled fluctuation of the stochastic empirical process around the mean-field converges to an Ornstein-Uhlenbeck process. Our analysis is also valid for the Halfin-Whitt regime in which servers are critically loaded. We then exploit the functional central limit theorem to quantify the error between the actual blocking probability of the system with a large number of servers and the blocking probability obtained from the fixed-point of the mean-field. In the Halfin-Whitt regime, the error is of the order of the inverse square root of the number of servers. On the other hand, for a light load regime, the error is smaller than the inverse square root of the number of servers.
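To make the mean-field fixed point mentioned in this abstract concrete, the sketch below evaluates the classical fixed point for the homogeneous power-of-d policy with unit-rate servers, unit-mean job lengths, and per-server arrival rate `lam < 1`. In that setting the fraction of servers holding at least `k` jobs is `lam ** ((d**k - 1) / (d - 1))`. This formula is the standard exponential-case result. Applying it to processor-sharing servers with general job lengths relies on the insensitivity property described above. The function name is hypothetical.

```python
def sqd_tail(lam: float, d: int, k: int) -> float:
    """Fraction of servers with at least k jobs at the power-of-d mean-field fixed point.

    Assumes homogeneous unit-rate servers and normalized load 0 <= lam < 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must satisfy 0 <= lam < 1")
    if d < 1 or k < 0:
        raise ValueError("need d >= 1 and k >= 0")
    if d == 1:
        # Purely random routing: geometric tail, as for a single M/M/1 or M/G/1-PS queue.
        return lam ** k
    # (d**k - 1) is always divisible by (d - 1), so integer division is exact.
    return lam ** ((d ** k - 1) // (d - 1))
```

At `lam = 0.5`, sampling two servers gives `sqd_tail(0.5, 2, 2) = 0.125`, compared with `0.25` for random routing, which illustrates how quickly the tail decays once `d >= 2`.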
Towards Optimality in Parallel Scheduling
To keep pace with Moore's law, chip designers have focused on increasing the
number of cores per chip rather than single core performance. In turn, modern
jobs are often designed to run on any number of cores. However, to effectively
leverage these multi-core chips, one must address the question of how many
cores to assign to each job. Given that jobs receive sublinear speedups from
additional cores, there is an obvious tradeoff: allocating more cores to an
individual job reduces the job's runtime, but in turn decreases the efficiency
of the overall system. We ask how the system should schedule jobs across cores
so as to minimize the mean response time over a stream of incoming jobs.
To answer this question, we develop an analytical model of jobs running on a
multi-core machine. We prove that EQUI, a policy which continuously divides
cores evenly across jobs, is optimal when all jobs follow a single speedup
curve and have exponentially distributed sizes. EQUI requires jobs to change
their level of parallelization while they run. Since this is not possible for
all workloads, we consider a class of "fixed-width" policies, which choose a
single level of parallelization, k, to use for all jobs. We prove that,
surprisingly, it is possible to achieve EQUI's performance without requiring
jobs to change their levels of parallelization by using the optimal fixed level
of parallelization, k*. We also show how to analytically derive the optimal k*
as a function of the system load, the speedup curve, and the job size
distribution.
In the case where jobs may follow different speedup curves, finding a good
scheduling policy is even more challenging. We find that policies like EQUI
which performed well in the case of a single speedup function now perform
poorly. We propose a very simple policy, GREEDY*, which performs near-optimally
when compared to the numerically-derived optimal policy.
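The tradeoff behind EQUI can be illustrated with a small sketch. The speedup curve `s(k) = k ** p` with `0 < p < 1` is an assumption chosen only because it is sublinear; the abstract does not fix a particular curve, and the function names are hypothetical.

```python
def speedup(cores: float, p: float = 0.5) -> float:
    """Assumed sublinear speedup curve: a job on `cores` cores runs `cores ** p` times faster."""
    return cores ** p


def equi_total_rate(n_jobs: int, total_cores: float, p: float = 0.5) -> float:
    """Aggregate service rate when EQUI splits total_cores evenly across n_jobs jobs."""
    if n_jobs <= 0:
        return 0.0
    return n_jobs * speedup(total_cores / n_jobs, p)
```

On 16 cores with `p = 0.5`, a single job runs at rate 4, four jobs run at rate 2 each (8 in total), and sixteen jobs run at rate 1 each (16 in total). Giving one job more cores shortens its own runtime but lowers the aggregate rate of the system.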
Load Balancing in the Non-Degenerate Slowdown Regime
We analyse Join-the-Shortest-Queue in a contemporary scaling regime known as
the Non-Degenerate Slowdown regime. Join-the-Shortest-Queue (JSQ) is a
classical load balancing policy for queueing systems with multiple parallel
servers. Parallel server queueing systems are regularly analysed and
dimensioned by diffusion approximations achieved in the Halfin-Whitt scaling
regime. However, when jobs must be dispatched to a server upon arrival, we
advocate the Non-Degenerate Slowdown regime (NDS) to compare different
load-balancing rules.
In this paper we identify a novel diffusion approximation and timescale
separation that provide insights into the performance of JSQ. We calculate the
price of irrevocably dispatching jobs to servers and prove this to be within 15%
(in the NDS regime) of the rules that may manoeuvre jobs between servers. We
also compare our results for the JSQ policy with the NDS approximations of
many modern load balancing policies such as Idle-Queue-First and
Power-of-d-choices policies, which act as low-information proxies for the JSQ
policy. Our analysis leads us to construct new rules that have identical
performance to JSQ but require less communication overhead than
power-of-2-choices.
Comment: Revised journal submission version
Optimal file splitting for wireless networks with concurrent access
The fundamental limits on channel capacity form a barrier to the sustained growth in the use of wireless networks. To cope with this, multi-path communication solutions provide a promising means to improve reliability and boost Quality of Service (QoS) in areas that are covered by a multitude of wireless access networks. Today, little is known about how to effectively exploit this potential. Motivated by this, we consider N parallel communication networks, each of which is modeled as a processor sharing (PS) queue that handles two types of traffic: foreground and background. We consider a foreground traffic stream of files, each of which is split into N fragments according to a fixed splitting rule (α1, ..., αN), where ∑ αi = 1 and αi ≥ 0 is the fraction of the file that is directed to network i. Upon completion of transmission of all fragments of a file, it is re-assembled at the receiving end. The background streams use dedicated networks without being split. We study the sojourn time tail behavior of the foreground traffic. For the case of light foreground traffic and regularly varying foreground file-size distributions, we obtain a reduced-load approximation (RLA) for the sojourn times, similar to that of a single PS queue. An important implication of the RLA is that the tail-optimal splitting rule is simply to choose αi proportional to ci − ρi, where ci is the capacity of network i and ρi is the load offered to network i by the corresponding background stream. This result provides a theoretical foundation for the effectiveness of such a simple splitting rule. Extensive simulations demonstrate that this simple rule indeed performs well, not only with respect to the tail asymptotics, but also with respect to the mean sojourn times. The simulations further support our conjecture that the same splitting rule is also tail-optimal for non-light foreground traffic.
Finally, we observe near-insensitivity of the mean sojourn times with respect to the file-size distribution.
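The splitting rule stated in this abstract, αi proportional to ci − ρi, can be sketched in a few lines of Python. The function name and the input validation are illustrative assumptions; the requirement that every network has positive spare capacity is added here so that the fractions are well defined.

```python
from typing import List, Sequence


def tail_optimal_split(capacities: Sequence[float], background_loads: Sequence[float]) -> List[float]:
    """Return fractions alpha_i proportional to the spare capacity c_i - rho_i of each network."""
    if len(capacities) != len(background_loads):
        raise ValueError("capacities and background_loads must have the same length")
    spare = [c - rho for c, rho in zip(capacities, background_loads)]
    if any(s <= 0 for s in spare):
        # Assumption: each network must have positive spare capacity.
        raise ValueError("every network needs positive spare capacity")
    total = sum(spare)
    return [s / total for s in spare]
```

For two networks with capacities 10 and 5 and background loads 2 and 1, the spare capacities are 8 and 4, so two thirds of each file goes to the first network and one third to the second.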