Hyper-Scalable JSQ with Sparse Feedback
Load balancing algorithms play a vital role in enhancing performance in data
centers and cloud networks. Due to the massive size of these systems,
scalability challenges, and especially the communication overhead associated
with load balancing mechanisms, have emerged as major concerns. Motivated by
these issues, we introduce and analyze a novel class of load balancing schemes
where the various servers provide occasional queue updates to guide the load
assignment.
We show that the proposed schemes strongly outperform JSQ(d) strategies
with comparable communication overhead per job, and can achieve a vanishing
waiting time in the many-server limit with just one message per job, just like
the popular JIQ scheme. The proposed schemes are, however, particularly geared
towards the sparse feedback regime with fewer than one message per job, where
they outperform correspondingly sparsified JIQ versions.
We investigate fluid limits for synchronous updates as well as asynchronous
exponential update intervals. The fixed point of the fluid limit is identified
in the latter case, and used to derive the queue length distribution. We also
demonstrate that in the ultra-low feedback regime the mean stationary waiting
time tends to a constant in the synchronous case, but grows without bound in
the asynchronous case.
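The idea of guiding dispatch decisions with occasional queue updates can be sketched in a toy simulation. The model below is an illustrative assumption, not the paper's exact scheme: each server is approximated as an M/M/1 queue, the dispatcher keeps the last reported queue length per server, assigns each job to a server with the smallest estimate, and receives a fresh report with probability `update_prob` per job, so the average overhead is `update_prob` messages per job.

```python
import math
import random

def poisson_sample(rng, mean):
    # Knuth's method; fine for the small means used here.
    l, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def sparse_feedback_sim(n=50, num_jobs=20000, lam=0.8, update_prob=0.5, seed=1):
    """Toy dispatcher with sparse queue-length feedback (illustrative
    reconstruction, not the paper's model). Returns the time-averaged
    mean queue length per server."""
    rng = random.Random(seed)
    queues = [0] * n        # true queue lengths
    estimates = [0] * n     # dispatcher's last-reported values
    total = 0.0
    for _ in range(num_jobs):
        # Interarrival time to the system; approximate each busy server's
        # departures over it as min(queue, Poisson(dt)) -- a crude stand-in
        # for exact M/M/1 dynamics (service rate 1 per server).
        dt = rng.expovariate(n * lam)
        for i in range(n):
            if queues[i] > 0:
                queues[i] = max(queues[i] - poisson_sample(rng, dt), 0)
        # Dispatch to the smallest estimate, ties broken at random.
        m = min(estimates)
        target = rng.choice([i for i, e in enumerate(estimates) if e == m])
        queues[target] += 1
        estimates[target] += 1          # optimistic local bookkeeping
        if rng.random() < update_prob:  # occasional feedback message
            estimates[target] = queues[target]
        total += sum(queues) / n
    return total / num_jobs
```

Lowering `update_prob` moves the scheme into the sparse feedback regime discussed above; the local bookkeeping (incrementing the estimate on every dispatch) keeps the dispatcher from flooding a single server between updates.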
Optimal Hyper-Scalable Load Balancing with a Strict Queue Limit
Load balancing plays a critical role in efficiently dispatching jobs in
parallel-server systems such as cloud networks and data centers. A fundamental
challenge in the design of load balancing algorithms is to achieve an optimal
trade-off between delay performance and implementation overhead (e.g.
communication or memory usage). This trade-off has primarily been studied so
far from the angle of the amount of overhead required to achieve asymptotically
optimal performance, particularly vanishing delay in large-scale systems. In
contrast, in the present paper, we focus on an arbitrarily sparse communication
budget, possibly well below the minimum requirement for vanishing delay,
referred to as the hyper-scalable operating region. Furthermore, jobs may only
be admitted when a specific limit on the queue position of the job can be
guaranteed.
The centerpiece of our analysis is a universal upper bound for the achievable
throughput of any dispatcher-driven algorithm for a given communication budget
and queue limit. We also propose a specific hyper-scalable scheme which can
operate at any given message rate and enforce any given queue limit, while
allowing the server states to be captured via a closed product-form network, in
which servers act as customers traversing various nodes. The product-form
distribution is leveraged to prove that the bound is tight and that the
proposed hyper-scalable scheme is throughput-optimal in a many-server regime
given the communication and queue limit constraints. Extensive simulation
experiments are conducted to illustrate the results.
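A dispatcher-driven scheme with a strict queue limit can be sketched with a simple token mechanism. This is an illustrative reconstruction under assumed names, not the paper's exact algorithm: each time a server reports queue length q, the dispatcher grants itself K - q assignment tokens for that server, so no admitted job can ever land at queue position above K, and the message rate is set by how often servers report.

```python
class QueueLimitDispatcher:
    """Sketch of a token-based dispatcher enforcing queue limit K
    (hypothetical class; the paper's scheme may differ in detail)."""

    def __init__(self, K):
        self.K = K
        self.tokens = {}  # server id -> remaining guaranteed slots

    def on_update(self, server, queue_length):
        # A report with queue length q guarantees K - q safe assignments
        # until the next report from this server.
        self.tokens[server] = max(self.K - queue_length, 0)

    def dispatch(self):
        # Spend a token if one is available; otherwise the job cannot be
        # admitted under the strict queue limit.
        for server, remaining in self.tokens.items():
            if remaining > 0:
                self.tokens[server] = remaining - 1
                return server
        return None
```

The communication budget enters only through how frequently `on_update` is called, which matches the trade-off between message rate and achievable throughput studied in the abstract.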
Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers
We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of the queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy traffic, respectively. These conditions directly imply heavy-traffic delay optimality for many previous local-memory-based policies. Moreover, the results enable us to design new delay-optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework also sheds light on a recent open question on how to design optimal load balancing schemes using delayed information.
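The LED idea of dispatching purely on local, possibly outdated estimates can be sketched as follows. The update rule here (join the shortest estimated queue, optimistic increment on dispatch, periodic resync with true queue lengths) is one plausible instance of the framework, not the specific policy analyzed in the paper.

```python
import random

class LEDDispatcher:
    """Minimal Local-Estimation-Driven dispatcher sketch (names and the
    concrete update rule are illustrative assumptions)."""

    def __init__(self, num_servers, sync_every=10):
        self.estimates = [0] * num_servers  # local, possibly stale
        self.sync_every = sync_every
        self.jobs_since_sync = 0

    def dispatch(self):
        # Decision uses only local estimates: join the shortest estimated
        # queue, ties broken at random.
        m = min(self.estimates)
        choice = random.choice(
            [i for i, e in enumerate(self.estimates) if e == m])
        self.estimates[choice] += 1  # optimistic local update
        return choice

    def maybe_sync(self, true_queue_lengths):
        # Infrequent communication: refresh all estimates every
        # `sync_every` jobs.
        self.jobs_since_sync += 1
        if self.jobs_since_sync >= self.sync_every:
            self.estimates = list(true_queue_lengths)
            self.jobs_since_sync = 0
```

Because each dispatcher consults only its own estimates, multiple `LEDDispatcher` instances can run independently, which is the multiple-dispatcher setting the abstract targets.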
Improved Load Balancing in Large Scale Systems using Attained Service Time Reporting
Our interest lies in load balancing jobs in large scale systems consisting of
multiple dispatchers and FCFS servers. In the absence of any information on job
sizes, dispatchers typically use queue length information reported by the
servers to assign incoming jobs. When job sizes are highly variable, using only
queue length information is clearly suboptimal and performance can be improved
if some indication can be provided to the dispatcher about the size of an
ongoing job. In an FCFS server, measuring the attained service time of the
ongoing job is easy, and servers can therefore report this attained service
time together with the queue length when queried by a dispatcher.
In this paper we propose and analyse a variety of load balancing policies
that exploit both the queue length and attained service time to assign jobs, as
well as policies for which only the attained service time of the job in service
is used. We present a unified analysis for all these policies in a large scale
system under the usual asymptotic independence assumptions. The accuracy of the
proposed analysis is illustrated using simulation.
We present extensive numerical experiments which clearly indicate that a
significant improvement in waiting (and thus also in response) time may be
achieved by using the attained service time information on top of the queue
length of a server. Moreover, the policies which do not make use of the queue
length still provide an improved waiting time for moderately loaded systems.
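One way such a policy can use the attained service time is via the mean residual life of the job in service: for heavy-tailed job sizes, a job that has already run for a long time is expected to keep running for a long time. The sketch below assumes Pareto-distributed job sizes for concreteness (the paper's analysis is more general) and ranks servers by an estimate of the work ahead of a new arrival.

```python
def expected_residual(age, alpha=2.0, x_m=1.0):
    """Mean residual service of a Pareto(alpha, x_m) job that has already
    received `age` units of service (illustrative heavy-tailed assumption).
    For a Pareto tail, E[remaining | age a] = a / (alpha - 1) for a >= x_m."""
    return max(age, x_m) / (alpha - 1.0)

def pick_server(states, alpha=2.0, x_m=1.0):
    """states: one (queue_length, attained_service) pair per FCFS server,
    with attained_service None when the server is idle. Returns the index
    of the server with the smallest estimated work ahead of a new job."""
    mean_size = alpha * x_m / (alpha - 1.0)

    def score(state):
        q, age = state
        if q == 0:
            return 0.0
        # Residual of the job in service plus full mean sizes for the
        # jobs still waiting in the queue.
        return expected_residual(age, alpha, x_m) + (q - 1) * mean_size

    return min(range(len(states)), key=lambda i: score(states[i]))
```

With this scoring, two servers with equal queue lengths are distinguished by the age of their ongoing jobs, which is exactly the extra information the attained-service-time reports provide on top of plain queue lengths.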
Scalable Load Balancing Algorithms in Networked Systems
A fundamental challenge in large-scale networked systems, viz. data centers
and cloud networks, is to distribute tasks to a pool of servers, using minimal
instantaneous state information, while providing excellent delay performance.
In this thesis we design and analyze load balancing algorithms that aim to
achieve a highly efficient distribution of tasks, optimize server utilization,
and minimize communication overhead. (Ph.D. thesis)
Aggregate matrix-analytic techniques and their applications
The complexity of computer systems affects the complexity of modeling techniques that can be used for their performance analysis. In this dissertation, we develop a set of techniques that are based on tractable analytic models and enable efficient performance analysis of computer systems. Our approach is three-pronged: first, we propose new techniques to parameterize measurement data with Markovian-based stochastic processes that can be further used as input into queueing systems; second, we propose new methods to efficiently solve complex queueing models; and third, we use the proposed methods to evaluate the performance of clustered Web servers and propose new load balancing policies based on this analysis.

We devise two new techniques for fitting measurement data that exhibit high variability into Phase-type (PH) distributions. These techniques apply known fitting algorithms in a divide-and-conquer fashion. We evaluate the accuracy of our methods from both the statistics and the queueing systems perspective. In addition, we propose a new methodology for fitting measurement data that exhibit long-range dependence into Markovian Arrival Processes (MAPs).

We propose a new methodology, ETAQA, for the exact solution of M/G/1-type processes, GI/M/1-type processes, and their intersection, i.e., quasi-birth-death (QBD) processes. ETAQA computes an aggregate steady-state probability distribution and a set of measures of interest. ETAQA is numerically stable and computationally superior to alternative solution methods. Apart from ETAQA, we propose a new methodology for the exact solution of a class of GI/G/1-type processes based on aggregation/decomposition.

Finally, we demonstrate the applicability of the proposed techniques by evaluating load balancing policies in clustered Web servers. We address the high variability in the service process of Web servers by dedicating the servers of a cluster to requests of similar sizes and propose new, content-aware load balancing policies. Detailed analysis shows that the proposed policies achieve high user-perceived performance and, by continuously adapting their scheduling parameters to the current workload characteristics, provide good performance under conditions of transient overload.