Large-scale Join-Idle-Queue system with general service times
A parallel server system with n identical servers is considered. The
service time distribution has a finite mean 1/μ, but otherwise is
arbitrary. Arriving customers are routed to one of the servers immediately
upon arrival. The Join-Idle-Queue routing algorithm is studied, under which an
arriving customer is sent to an idle server, if such is available, and to a
uniformly randomly chosen server, otherwise. We consider the asymptotic regime
where n → ∞ and the customer input flow rate is λn. Under the
condition λ/μ < 1/2, we prove that, as n → ∞, the sequence of
(appropriately scaled) stationary distributions concentrates at the natural
equilibrium point, with the fraction of occupied servers being constant, equal
to λ/μ. In particular, this implies that the steady-state probability of
an arriving customer waiting for service vanishes.
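The routing rule itself is simple to state. A minimal sketch in Python (the function name `jiq_route` is ours, not from the paper):

```python
import random

def jiq_route(busy, rng=random):
    """Join-Idle-Queue routing: send the arrival to an idle server if one
    is available, otherwise to a uniformly random server.

    busy: list of booleans, busy[i] is True iff server i is occupied.
    Returns the index of the chosen server.
    """
    idle = [i for i, b in enumerate(busy) if not b]
    if idle:
        return rng.choice(idle)
    return rng.randrange(len(busy))
```

For example, `jiq_route([True, False, True])` returns 1, the only idle server; with all servers busy the choice is uniform over all indices.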
Steady-State Analysis of Load Balancing with Coxian-2 Distributed Service Times
This paper studies load balancing for many-server (N servers) systems. Each
server has a buffer of size b − 1 and can have at most one job in service and
b − 1 jobs in the buffer. The service time of a job follows the Coxian-2
distribution. We focus on the steady-state performance of load balancing
policies in the heavy-traffic regime such that the normalized load of the
system is λ = 1 − N^{-α} for 0 < α < 0.5. We identify a set of policies that
achieve asymptotic zero waiting. The set of policies includes several classical
policies such as join-the-shortest-queue (JSQ), join-the-idle-queue (JIQ),
idle-one-first (I1F) and power-of-d-choices (Po-d) with d = O(N^α log N). The
proof of the main result is based on Stein's method and state space
collapse. A key technical contribution of this paper is the iterative state
space collapse approach that leads to a simple generator approximation when
applying Stein's method.
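A Coxian-2 service time is an exponential first phase followed, with some probability, by an exponential second phase. A minimal sampler, assuming the standard parameterization (rates mu1, mu2 and continuation probability p; names are ours):

```python
import random

def coxian2_sample(mu1, mu2, p, rng=random):
    """Sample a Coxian-2 service time: an Exp(mu1) phase, followed with
    probability p by an independent Exp(mu2) phase.

    The mean is 1/mu1 + p/mu2.
    """
    t = rng.expovariate(mu1)       # phase 1, always taken
    if rng.random() < p:           # continue to phase 2 with probability p
        t += rng.expovariate(mu2)
    return t
```

With mu1 = 2, mu2 = 1, p = 0.5 the mean service time is 1/2 + 0.5/1 = 1, which a Monte Carlo average reproduces.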
Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System
We consider the model of a token-based joint auto-scaling and load balancing
strategy, proposed in a recent paper by Mukherjee, Dhara, Borst, and van
Leeuwaarden (SIGMETRICS '17, arXiv:1703.08373), which offers an efficient
scalable implementation and yet achieves asymptotically optimal steady-state
delay performance and energy consumption as the number of servers N → ∞.
In the above work, the asymptotic results are obtained under the assumption
that the queues have fixed-size finite buffers, and therefore the fundamental
question of stability of the proposed scheme with infinite buffers was left
open. In this paper, we address this fundamental stability question. The system
stability under the usual subcritical load assumption is not automatic.
Moreover, stability may not even hold for all N. The key challenge stems
from the fact that the process lacks monotonicity, which has been the powerful
primary tool for establishing stability in load balancing models. We develop a
novel method to prove that the subcritically loaded system is stable for large
enough N, and establish convergence of steady-state distributions to the
optimal one, as N → ∞. The method goes beyond state-of-the-art
techniques: it uses an induction-based idea and a "weak monotonicity"
property of the model; this technique is of independent interest and may have
broader applicability.
Performance Analysis of Load Balancing Policies with Memory
Joining the shortest or least loaded queue among d randomly selected queues
are two fundamental load balancing policies. Under both policies the dispatcher
does not maintain any information on the queue length or load of the servers.
In this paper we analyze the performance of these policies when the dispatcher
has some memory available to store the ids of some of the idle servers. We
consider methods where the dispatcher discovers idle servers as well as methods
where idle servers inform the dispatcher about their state.
We focus on large-scale systems and our analysis uses the cavity method. The
main insight provided is that the performance measures obtained via the cavity
method for a load balancing policy {\it with} memory reduce to the performance
measures for the same policy {\it without} memory provided that the arrival
rate is properly scaled. Thus, we can study the performance of load balancers
with memory in the same manner as load balancers without memory. In particular,
this entails closed-form solutions for joining the shortest or least loaded
queue among d randomly selected queues with memory in the case of exponential
job sizes. Moreover, we obtain a simple closed-form expression for the (scaled)
expected waiting time as the system tends towards instability.
We present simulation results that support our belief that the approximation
obtained by the cavity method becomes exact as the number of servers tends to
infinity.
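For context, the memoryless closed form referred to here is the classical mean-field fixed point for power-of-d with exponential job sizes: P(queue length ≥ k) = λ^((d^k − 1)/(d − 1)). The paper's insight is that the with-memory system obeys the same kind of formula at a properly scaled arrival rate; the specific scaling is paper-specific, so the sketch below (function name ours) covers only the memoryless case:

```python
def pod_tail(lam, d, k):
    """Mean-field stationary tail for power-of-d (SQ(d)) with exponential
    job sizes and normalized arrival rate lam < 1:

        P(queue length >= k) = lam ** (1 + d + d**2 + ... + d**(k-1))
                             = lam ** ((d**k - 1) / (d - 1)),   d >= 2.
    """
    exponent = (d**k - 1) // (d - 1)  # geometric sum, integer for d >= 2
    return lam ** exponent
```

For instance, with d = 2 the tail decays doubly exponentially in k: the exponents are 1, 3, 7, 15, ... rather than the 1, 2, 3, ... of random routing.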
Delay, memory, and messaging tradeoffs in distributed service systems
We consider the following distributed service model: jobs with unit mean,
exponentially distributed, and independent processing times arrive as a Poisson
process of rate λn, with 0 < λ < 1, and are immediately dispatched by a
centralized dispatcher to one of n First-In-First-Out queues associated with n
identical servers. The dispatcher is endowed with a finite memory, and with the
ability to exchange messages with the servers.
We propose and study a resource-constrained "pull-based" dispatching policy
that involves two parameters: (i) the number of memory bits available at the
dispatcher, and (ii) the average rate at which servers communicate with the
dispatcher. We establish (using a fluid limit approach) that the asymptotic, as
n → ∞, expected queueing delay is zero when either (i) the number of memory
bits grows logarithmically with n and the message rate grows superlinearly
with n, or (ii) the number of memory bits grows superlogarithmically with n
and the message rate is at least λn. Furthermore, when the number of memory
bits grows only logarithmically with n and the message rate is proportional to
n, we obtain a closed-form expression for the (now positive) asymptotic delay.
Finally, we demonstrate an interesting phase transition in the
resource-constrained regime where the asymptotic delay is non-zero. In
particular, we show that for any given α > 0 (no matter how small), if our
policy only uses a linear message rate αn, the resulting asymptotic delay is
upper bounded, uniformly over all λ < 1; this is in sharp contrast to the
delay obtained when no messages are used (α = 0), which grows as 1/(1 − λ)
when λ ↑ 1, or when the popular power-of-d-choices is used, in which case the
delay grows as log(1/(1 − λ)).
National Science Foundation (U.S.) (Grant CMMI-1234062)
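A pull-based dispatcher of this kind can be sketched as follows. The class and method names are illustrative, and the evict-oldest rule for a full memory is our simplification rather than the paper's exact policy:

```python
import random
from collections import deque

class PullDispatcher:
    """Sketch of a pull-based dispatcher: servers push their id to the
    dispatcher when they become idle; the dispatcher remembers at most
    `memory` idle ids and routes to a remembered idle server when
    possible, otherwise to a uniformly random server."""

    def __init__(self, n, memory, rng=random):
        self.n = n
        self.idle_ids = deque(maxlen=memory)  # finite memory of idle ids
        self.rng = rng

    def report_idle(self, server_id):
        # A server messages the dispatcher when its queue empties
        # (the "pull" signal); a full memory evicts the oldest id.
        if server_id not in self.idle_ids:
            self.idle_ids.append(server_id)

    def dispatch(self):
        # Route to a remembered idle server if any, else uniform random.
        if self.idle_ids:
            return self.idle_ids.popleft()
        return self.rng.randrange(self.n)
```

The two resource parameters of the abstract map directly onto this sketch: `memory` bounds the bits of dispatcher state, and the frequency of `report_idle` calls is the server-to-dispatcher message rate.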
Load Balancing in the Non-Degenerate Slowdown Regime
We analyse Join-the-Shortest-Queue in a contemporary scaling regime known as
the Non-Degenerate Slowdown regime. Join-the-Shortest-Queue (JSQ) is a
classical load balancing policy for queueing systems with multiple parallel
servers. Parallel server queueing systems are regularly analysed and
dimensioned by diffusion approximations achieved in the Halfin-Whitt scaling
regime. However, when jobs must be dispatched to a server upon arrival, we
advocate the Non-Degenerate Slowdown regime (NDS) to compare different
load-balancing rules.
In this paper we identify a novel diffusion approximation and timescale
separation that provide insights into the performance of JSQ. We calculate the
price of irrevocably dispatching jobs to servers and prove that it is within
15% (in the NDS regime) of the rules that may manoeuvre jobs between servers.
We also compare our results for the JSQ policy with the NDS approximations of
many modern load balancing policies, such as Idle-Queue-First and
power-of-d-choices, which act as low-information proxies for the JSQ policy.
Our analysis leads us to construct new rules that have identical performance
to JSQ but require less communication overhead than power-of-2-choices.
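The JSQ rule that serves as the benchmark throughout is easy to state. A minimal sketch (breaking ties by lowest index is our choice, not the paper's):

```python
def jsq_route(queue_lengths):
    """Join-the-Shortest-Queue: dispatch the arrival to a server with the
    fewest jobs in its queue (ties broken by lowest index here)."""
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
```

Power-of-d-choices applies the same rule to a random sample of d servers instead of all of them, which is what makes it a low-information proxy for JSQ.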