
2,703 research outputs found

Universality of Load Balancing Schemes on Diffusion Scale

Authors: Borst S. C.; Mukherjee D.; van Leeuwaarden J. S. H.; Whiting P. A.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2015

We consider a system of $N$ parallel queues with identical exponential service rates and a single dispatcher where tasks arrive as a Poisson process. When a task arrives, the dispatcher always assigns it to an idle server, if there is any, and to a server with the shortest queue among $d$ randomly selected servers otherwise $(1 \leq d \leq N)$. This load balancing scheme subsumes the so-called Join-the-Idle Queue (JIQ) policy $(d = 1)$ and the celebrated Join-the-Shortest Queue (JSQ) policy $(d = N)$ as two crucial special cases. We develop a stochastic coupling construction to obtain the diffusion limit of the queue process in the Halfin-Whitt heavy-traffic regime, and establish that it does not depend on the value of $d$, implying that assigning tasks to idle servers is sufficient for diffusion level optimality.
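The dispatching rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dispatch`, the uniform choice among idle servers, and sampling the $d$ servers without replacement are all assumptions made here, since the abstract does not specify them.

```python
import random


def dispatch(queues, d, rng=random):
    """Return the index of the server that receives an arriving task.

    queues: list of current queue lengths (0 means the server is idle).
    d: number of servers sampled when no server is idle (1 <= d <= N).
    rng: random source (hypothetical parameter, added for reproducibility).
    """
    n = len(queues)
    if not 1 <= d <= n:
        raise ValueError("d must satisfy 1 <= d <= N")
    # Rule 1: an idle server, if there is any, always receives the task.
    # Assumption: ties among idle servers are broken uniformly at random.
    idle = [i for i, q in enumerate(queues) if q == 0]
    if idle:
        return rng.choice(idle)
    # Rule 2: otherwise sample d servers (assumed distinct) and join the
    # shortest queue among them.
    sampled = rng.sample(range(n), d)
    return min(sampled, key=lambda i: queues[i])
```

With `d = 1` the fallback is a single random server (the JIQ special case), and with `d = len(queues)` the fallback always picks a globally shortest queue (the JSQ special case).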

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Macquarie University ResearchOnline

Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System

Authors: Mukherjee Debankur; Stolyar Alexander
Publication venue: 'Institute for Operations Research and the Management Sciences (INFORMS)'
Publication date: 01/03/2018

We consider the model of a token-based joint auto-scaling and load balancing strategy, proposed in a recent paper by Mukherjee, Dhara, Borst, and van Leeuwaarden (SIGMETRICS '17, arXiv:1703.08373), which offers an efficient scalable implementation and yet achieves asymptotically optimal steady-state delay performance and energy consumption as the number of servers $N \to \infty$. In the above work, the asymptotic results are obtained under the assumption that the queues have fixed-size finite buffers, and therefore the fundamental question of stability of the proposed scheme with infinite buffers was left open. In this paper, we address this fundamental stability question. The system stability under the usual subcritical load assumption is not automatic. Moreover, the stability may not even hold for all $N$. The key challenge stems from the fact that the process lacks monotonicity, which has been the powerful primary tool for establishing stability in load balancing models. We develop a novel method to prove that the subcritically loaded system is stable for large enough $N$, and establish convergence of steady-state distributions to the optimal one, as $N \to \infty$. The method goes beyond the state-of-the-art techniques: it uses an induction-based idea and a "weak monotonicity" property of the model; this technique is of independent interest and may have broader applicability. Comment: 30 pages

arXiv.org e-Print Archive

Pure OAI Repository

Asymptotically Optimal Load Balancing Topologies

Authors: Borst Sem C.; Mukherjee Debankur; van Leeuwaarden Johan S. H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

We consider a system of $N$ servers inter-connected by some underlying graph topology $G_N$. Tasks arrive at the various servers as independent Poisson processes of rate $\lambda$. Each incoming task is irrevocably assigned to whichever server has the smallest number of tasks among the one where it appears and its neighbors in $G_N$. Tasks have unit-mean exponential service times and leave the system upon service completion. The above model has been extensively investigated in the case $G_N$ is a clique. Since the servers are exchangeable in that case, the queue length process is quite tractable, and it has been proved that for any $\lambda < 1$, the fraction of servers with two or more tasks vanishes in the limit as $N \to \infty$. For an arbitrary graph $G_N$, the lack of exchangeability severely complicates the analysis, and the queue length process tends to be worse than for a clique. Accordingly, a graph $G_N$ is said to be $N$-optimal or $\sqrt{N}$-optimal when the occupancy process on $G_N$ is equivalent to that on a clique on an $N$-scale or $\sqrt{N}$-scale, respectively. We prove that if $G_N$ is an Erdős-Rényi random graph with average degree $d(N)$, then it is with high probability $N$-optimal and $\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N) / (\sqrt{N} \log(N)) \to \infty$ as $N \to \infty$, respectively. This demonstrates that optimality can be maintained at $N$-scale and $\sqrt{N}$-scale while reducing the number of connections by nearly a factor $N$ and $\sqrt{N} / \log(N)$ compared to a clique, provided the topology is suitably random. It is further shown that if $G_N$ contains $\Theta(N)$ bounded-degree nodes, then it cannot be $N$-optimal. In addition, we establish that an arbitrary graph $G_N$ is $N$-optimal when its minimum degree is $N - o(N)$, and may not be $N$-optimal even when its minimum degree is $c N + o(N)$ for any $0 < c < 1/2$. Comment: A few relevant results from arXiv:1612.00723 are included for convenience
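The assignment rule on a graph topology described in this abstract can be illustrated with a short sketch. The function name `assign`, the adjacency-dictionary representation, and the tie-breaking order (the arrival server first, then neighbors by index) are assumptions made here for illustration; the abstract does not specify how ties are broken.

```python
def assign(arrival_server, queues, neighbors):
    """Return the server that takes a task arriving at `arrival_server`.

    queues: list of current task counts, indexed by server.
    neighbors: dict mapping each server to the set of its neighbors in G_N.
    """
    # Candidates are the server where the task appears plus its neighbors.
    # Assumption: ties favor the arrival server, then the lowest index,
    # because min() returns the first minimal element it encounters.
    candidates = [arrival_server] + sorted(neighbors[arrival_server])
    return min(candidates, key=lambda v: queues[v])
```

For a clique every server neighbors every other one, so the rule reduces to joining a globally shortest queue.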

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository

Asymptotic optimality of power-of-$d$ load balancing in large-scale systems

Authors: Borst S.C.; Mukherjee D.; van Leeuwaarden J.S.H.; Whiting P.A.
Publication date: 01/01/2016

We consider a system of $N$ identical server pools and a single dispatcher where tasks arrive as a Poisson process of rate $\lambda(N)$. Arriving tasks cannot be queued, and must immediately be assigned to one of the server pools to start execution, or discarded. The execution times are assumed to be exponentially distributed with unit mean, and do not depend on the number of other tasks receiving service. However, the experienced performance (e.g. in terms of received throughput) does degrade with an increasing number of concurrent tasks at the same server pool. The dispatcher therefore aims to evenly distribute the tasks across the various server pools. Specifically, when a task arrives, the dispatcher assigns it to the server pool with the minimum number of tasks among $d(N)$ randomly selected server pools. This assignment strategy is called the JSQ$(d(N))$ scheme, as it resembles the power-of-$d$ version of the Join-the-Shortest-Queue (JSQ) policy, and will also be referred to as such in the special case $d(N) = N$. We construct a stochastic coupling to bound the difference in the system occupancy processes between the JSQ policy and a scheme with an arbitrary value of $d(N)$. We use the coupling to derive the fluid limit in case $d(N) \to \infty$ and $\lambda(N)/N \to \lambda$ as $N \to \infty$, along with the associated fixed point. The fluid limit turns out to be insensitive to the exact growth rate of $d(N)$, and coincides with that for the JSQ policy. We further leverage the coupling to establish that the diffusion limit corresponds to that for the JSQ policy as well, as long as $d(N)/(\sqrt{N} \log(N)) \to \infty$, and characterize the common limiting diffusion process. These results indicate that the JSQ optimality can be preserved at the fluid-level and diffusion-level while reducing the overhead by nearly a factor $O(N)$ and $O(\sqrt{N}/\log(N))$, respectively.

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Asymptotically optimal load balancing in large-scale heterogeneous systems with multiple dispatchers

Authors: Shroff Ness; Wierman Adam; Zhou Xingyu
Publication venue: 'Elsevier BV'
Publication date: 20/02/2020

We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of the queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework also sheds light on a recent open question on how to design optimal load balancing schemes using delayed information.
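The idea of dispatching from local, possibly outdated estimates can be sketched as below. This is only one hypothetical instance of the framework, not a policy taken from the paper: the class name `Dispatcher`, the choice to join the shortest estimated queue, and the choice to increment the local estimate after each dispatch are assumptions made here.

```python
class Dispatcher:
    """One dispatcher holding its own estimates of every server's queue."""

    def __init__(self, num_servers):
        # Local estimates start at zero and may drift from the true queues.
        self.estimates = [0] * num_servers

    def dispatch(self):
        """Pick a server using only the local estimates."""
        # Assumption: join the server with the smallest estimated queue,
        # breaking ties by lowest index.
        target = min(range(len(self.estimates)),
                     key=self.estimates.__getitem__)
        # Assumption: count the task just sent in the local estimate.
        self.estimates[target] += 1
        return target

    def refresh(self, server, true_length):
        """Infrequent update: overwrite one estimate with a reported value."""
        self.estimates[server] = true_length
```

Between calls to `refresh`, the dispatcher never sees tasks sent by other dispatchers or service completions, which is why the estimates can be outdated.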

arXiv.org e-Print Archive

Caltech Authors

Large-scale Join-Idle-Queue system with general service times

Authors: Foss Sergey; Stolyar Alexander
Publication date: 14/02/2017

A parallel server system with $n$ identical servers is considered. The service time distribution has a finite mean $1/\mu$, but otherwise is arbitrary. Arriving customers are routed to one of the servers immediately upon arrival. The Join-Idle-Queue routing algorithm is studied, under which an arriving customer is sent to an idle server, if such is available, and to a server chosen uniformly at random, otherwise. We consider the asymptotic regime where $n \to \infty$ and the customer input flow rate is $\lambda n$. Under the condition $\lambda/\mu < 1/2$, we prove that, as $n \to \infty$, the sequence of (appropriately scaled) stationary distributions concentrates at the natural equilibrium point, with the fraction of occupied servers being constant and equal to $\lambda/\mu$. In particular, this implies that the steady-state probability of an arriving customer waiting for service vanishes. Comment: Revision. 11 pages
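A short consistency check shows where the equilibrium value $\lambda/\mu$ comes from. It is a sketch based on Little's law applied to the customers in service, assuming the system is stationary and each server serves one customer at a time; it is not the paper's proof, and the condition $\lambda/\mu < 1/2$ is needed for the concentration result, not for this identity.

```latex
\underbrace{\lambda n}_{\text{arrival rate}}
\times
\underbrace{\frac{1}{\mu}}_{\text{mean service time}}
= \mathbb{E}[\text{number of busy servers}]
\quad\Longrightarrow\quad
\frac{\mathbb{E}[\text{number of busy servers}]}{n} = \frac{\lambda}{\mu}.
```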

arXiv.org e-Print Archive

Heriot Watt Pure

Hyper-Scalable JSQ with Sparse Feedback

Authors: Borst Sem; van der Boor Mark; van Leeuwaarden Johan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/03/2019

Load balancing algorithms play a vital role in enhancing performance in data centers and cloud networks. Due to the massive size of these systems, scalability challenges, and especially the communication overhead associated with load balancing mechanisms, have emerged as major concerns. Motivated by these issues, we introduce and analyze a novel class of load balancing schemes where the various servers provide occasional queue updates to guide the load assignment. We show that the proposed schemes strongly outperform JSQ($d$) strategies with comparable communication overhead per job, and can achieve a vanishing waiting time in the many-server limit with just one message per job, just like the popular JIQ scheme. The proposed schemes are particularly geared, however, towards the sparse feedback regime with less than one message per job, where they outperform corresponding sparsified JIQ versions. We investigate fluid limits for synchronous updates as well as asynchronous exponential update intervals. The fixed point of the fluid limit is identified in the latter case, and used to derive the queue length distribution. We also demonstrate that in the ultra-low feedback regime the mean stationary waiting time tends to a constant in the synchronous case, but grows without bound in the asynchronous case.

arXiv.org e-Print Archive

Crossref

Pure OAI Repository