Universality of Load Balancing Schemes on Diffusion Scale
We consider a system of $N$ parallel queues with identical exponential
service rates and a single dispatcher where tasks arrive as a Poisson process.
When a task arrives, the dispatcher always assigns it to an idle server, if
there is any, and to a server with the shortest queue among $d$ randomly
selected servers otherwise ($1 \leq d \leq N$). This load balancing scheme
subsumes the so-called Join-the-Idle Queue (JIQ) policy ($d = 1$) and the
celebrated Join-the-Shortest Queue (JSQ) policy ($d = N$) as two crucial
special cases. We develop a stochastic coupling construction to obtain the
diffusion limit of the queue process in the Halfin-Whitt heavy-traffic regime,
and establish that it does not depend on the value of $d$, implying that
assigning tasks to idle servers is sufficient for diffusion level optimality.
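The dispatching rule described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not code from the paper: the function name `dispatch`, the list-of-queue-lengths representation, and the parameter name `d` (the number of servers sampled when no server is idle) are assumptions made for the example, and ties are broken arbitrarily.

```python
import random


def dispatch(queues, d, rng=random):
    """Return the index of the server that receives the arriving task.

    queues: list of current queue lengths, one entry per server.
    d: number of servers sampled when no server is idle (1 <= d <= len(queues)).
    rng: source of randomness (hypothetical parameter, defaults to `random`).
    """
    # Step 1: if any server is idle, the task always goes to an idle server.
    idle = [i for i, q in enumerate(queues) if q == 0]
    if idle:
        return rng.choice(idle)
    # Step 2: otherwise sample d distinct servers uniformly at random
    # and join the shortest queue among the sampled ones.
    sampled = rng.sample(range(len(queues)), d)
    return min(sampled, key=lambda i: queues[i])


# With d = 1 and no idle server the task goes to a uniformly random server
# (the JIQ special case); with d = len(queues) every server is compared
# (the JSQ special case).
print(dispatch([2, 0, 3], d=1))  # server 1 is the only idle server
print(dispatch([3, 1, 2], d=3))  # all servers sampled, shortest queue is server 1
```

The same function covers both special cases named in the abstract simply by varying `d`, which is the sense in which the scheme subsumes JIQ and JSQ.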
Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System
We consider the model of a token-based joint auto-scaling and load balancing
strategy, proposed in a recent paper by Mukherjee, Dhara, Borst, and van
Leeuwaarden (SIGMETRICS '17, arXiv:1703.08373), which offers an efficient
scalable implementation and yet achieves asymptotically optimal steady-state
delay performance and energy consumption as the number of servers $N \to \infty$.
In the above work, the asymptotic results are obtained under the assumption
that the queues have fixed-size finite buffers, and therefore the fundamental
question of stability of the proposed scheme with infinite buffers was left
open. In this paper, we address this fundamental stability question. The system
stability under the usual subcritical load assumption is not automatic.
Moreover, the stability may not even hold for all $N$. The key challenge stems
from the fact that the process lacks monotonicity, which has been the powerful
primary tool for establishing stability in load balancing models. We develop a
novel method to prove that the subcritically loaded system is stable for large
enough $N$, and establish convergence of steady-state distributions to the
optimal one, as $N \to \infty$. The method goes beyond the state of the art
techniques -- it uses an induction-based idea and a "weak monotonicity"
property of the model; this technique is of independent interest and may have
broader applicability. Comment: 30 pages
Asymptotically Optimal Load Balancing Topologies
We consider a system of $N$ servers inter-connected by some underlying graph
topology $G_N$. Tasks arrive at the various servers as independent Poisson
processes of rate $\lambda$. Each incoming task is irrevocably assigned to
whichever server has the smallest number of tasks among the one where it
appears and its neighbors in $G_N$. Tasks have unit-mean exponential service
times and leave the system upon service completion.
The above model has been extensively investigated in the case $G_N$ is a
clique. Since the servers are exchangeable in that case, the queue length
process is quite tractable, and it has been proved that for any $\lambda < 1$,
the fraction of servers with two or more tasks vanishes in the limit as
$N \to \infty$. For an arbitrary graph $G_N$, the lack of exchangeability severely
complicates the analysis, and the queue length process tends to be worse than
for a clique. Accordingly, a graph $G_N$ is said to be $N$-optimal or
$\sqrt{N}$-optimal when the occupancy process on $G_N$ is equivalent to that on
a clique on an $N$-scale or $\sqrt{N}$-scale, respectively.
We prove that if $G_N$ is an Erd\H{o}s-R\'enyi random graph with average
degree $d(N)$, then it is with high probability $N$-optimal and
$\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N) / (\sqrt{N} \log(N)) \to \infty$
as $N \to \infty$, respectively. This demonstrates that optimality can
be maintained at $N$-scale and $\sqrt{N}$-scale while reducing the number of
connections by nearly a factor $N$ and $\sqrt{N} / \log(N)$ compared to a
clique, provided the topology is suitably random. It is further shown that if
$G_N$ contains $\Theta(N)$ bounded-degree nodes, then it cannot be $N$-optimal.
In addition, we establish that an arbitrary graph $G_N$ is $N$-optimal when its
minimum degree is $N - o(N)$, and may not be $N$-optimal even when its minimum
degree is $c N + o(N)$ for any $0 < c < 1/2$. Comment: A few relevant results from arXiv:1612.00723 are included for
convenience
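The neighborhood-based assignment rule in this abstract (a task joins the least-loaded server among the one where it appears and that server's graph neighbors) can be illustrated as follows. This is a hedged sketch under stated assumptions: the adjacency-dictionary representation, the function name `assign`, and the tie-breaking rule (prefer the arrival server, then neighbors in listed order) are choices made for the example and are not specified in the abstract.

```python
def assign(queues, neighbors, v):
    """Return the server that receives a task arriving at server v.

    queues: list of current queue lengths, indexed by server.
    neighbors: dict mapping each server to a list of its graph neighbors
               (hypothetical representation of the topology).
    v: index of the server where the task appears.
    """
    # Candidates are the arrival server itself plus its neighbors.
    candidates = [v] + list(neighbors[v])
    # min() returns the first minimizer, so ties favor v, then listed order
    # (an assumption; the abstract does not fix a tie-breaking rule).
    return min(candidates, key=lambda u: queues[u])


# A ring on four servers: each server sees only its two adjacent servers.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
queues = [2, 1, 0, 3]
print(assign(queues, ring, 0))  # compares servers 0, 1, 3 and picks server 1
print(assign(queues, ring, 3))  # compares servers 3, 2, 0 and picks server 2
```

When every server lists all other servers as neighbors (a clique), the rule reduces to joining the globally shortest queue, which is the benchmark the abstract compares sparser topologies against.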
Asymptotic optimality of power-of-$d$ load balancing in large-scale systems
We consider a system of $N$ identical server pools and a single dispatcher where tasks arrive as a Poisson process of rate $\lambda(N)$. Arriving tasks cannot be queued, and must immediately be assigned to one of the server pools to start execution, or discarded. The execution times are assumed to be exponentially distributed with unit mean, and do not depend on the number of other tasks receiving service. However, the experienced performance (e.g. in terms of received throughput) does degrade with an increasing number of concurrent tasks at the same server pool. The dispatcher therefore aims to evenly distribute the tasks across the various server pools. Specifically, when a task arrives, the dispatcher assigns it to the server pool with the minimum number of tasks among $d(N)$ randomly selected server pools. This assignment strategy is called the JSQ$(d(N))$ scheme, as it resembles the power-of-$d$ version of the Join-the-Shortest-Queue (JSQ) policy, and will also be referred to as such in the special case $d(N) = N$. We construct a stochastic coupling to bound the difference in the system occupancy processes between the JSQ policy and a scheme with an arbitrary value of $d(N)$. We use the coupling to derive the fluid limit in case $d(N) \to \infty$ and $\lambda(N)/N \to \lambda$ as $N \to \infty$, along with the associated fixed point. The fluid limit turns out to be insensitive to the exact growth rate of $d(N)$, and coincides with that for the JSQ policy. We further leverage the coupling to establish that the diffusion limit corresponds to that for the JSQ policy as well, as long as $d(N)/(\sqrt{N} \log(N)) \to \infty$, and characterize the common limiting diffusion process. These results indicate that the JSQ optimality can be preserved at the fluid-level and diffusion-level while reducing the overhead by nearly a factor $O(N)$ and $O(\sqrt{N}/\log(N))$, respectively.
Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers
We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of the queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy-traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework also sheds light on a recent open question on how to design optimal load balancing schemes using delayed information.
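To make the idea of dispatching from local, possibly outdated estimates concrete, the sketch below shows one possible dispatcher of this kind. It is not the LED framework itself: the abstract does not specify a decision rule or an update schedule, so the class name `LocalEstimateDispatcher`, the choice of joining the server with the smallest estimate, and the convention of incrementing the local estimate after each dispatch are all assumptions made for illustration.

```python
class LocalEstimateDispatcher:
    """One dispatcher holding its own estimates of all server queue lengths."""

    def __init__(self, num_servers):
        # Local view only; it may drift away from the true queue lengths.
        self.estimates = [0] * num_servers

    def dispatch(self):
        """Choose a server using local estimates only (no communication)."""
        # Assumed rule: join the server with the smallest estimated queue.
        target = min(range(len(self.estimates)), key=lambda i: self.estimates[i])
        # Assumed bookkeeping: count the task this dispatcher just sent.
        self.estimates[target] += 1
        return target

    def update(self, server, true_queue_length):
        """Infrequent communication: overwrite one estimate with the true value."""
        self.estimates[server] = true_queue_length


dispatcher = LocalEstimateDispatcher(num_servers=3)
print([dispatcher.dispatch() for _ in range(4)])  # spreads tasks by estimate
dispatcher.update(server=0, true_queue_length=5)  # server 0 reports a backlog
print(dispatcher.dispatch())                      # server 0 is now avoided
```

With several such dispatchers running side by side, each one sees only its own dispatches between updates, which is why the estimates are described as possibly outdated.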
Large-scale Join-Idle-Queue system with general service times
A parallel server system with $n$ identical servers is considered. The
service time distribution has a finite mean $1/\mu$, but otherwise is
arbitrary. Arriving customers are routed to one of the servers immediately
upon arrival. Join-Idle-Queue routing algorithm is studied, under which an
arriving customer is sent to an idle server, if such is available, and to a
randomly uniformly chosen server, otherwise. We consider the asymptotic regime
where $n \to \infty$ and the customer input flow rate is $\lambda n$. Under the
condition $\lambda/\mu < 1/2$, we prove that, as $n \to \infty$, the sequence of
(appropriately scaled) stationary distributions concentrates at the natural
equilibrium point, with the fraction of occupied servers being constant equal
$\lambda/\mu$. In particular, this implies that the steady-state probability of
an arriving customer waiting for service vanishes. Comment: Revision. 11 pages
Hyper-Scalable JSQ with Sparse Feedback
Load balancing algorithms play a vital role in enhancing performance in data
centers and cloud networks. Due to the massive size of these systems,
scalability challenges, and especially the communication overhead associated
with load balancing mechanisms, have emerged as major concerns. Motivated by
these issues, we introduce and analyze a novel class of load balancing schemes
where the various servers provide occasional queue updates to guide the load
assignment.
We show that the proposed schemes strongly outperform JSQ($d$) strategies
with comparable communication overhead per job, and can achieve a vanishing
waiting time in the many-server limit with just one message per job, just like
the popular JIQ scheme. The proposed schemes are particularly geared however
towards the sparse feedback regime with less than one message per job, where
they outperform corresponding sparsified JIQ versions.
We investigate fluid limits for synchronous updates as well as asynchronous
exponential update intervals. The fixed point of the fluid limit is identified
in the latter case, and used to derive the queue length distribution. We also
demonstrate that in the ultra-low feedback regime the mean stationary waiting
time tends to a constant in the synchronous case, but grows without bound in
the asynchronous case.