Optimal Divisible Load Scheduling for Resource-Sharing Network by Wu, Fei et al.
Optimal Divisible Load Scheduling for
Resource-Sharing Network
Fei Wu, Yang Cao, and Thomas Robertazzi, Fellow, IEEE
Abstract—Scheduling is an important task allowing parallel systems to perform efficiently and reliably. For modern computation
systems, divisible load is a special type of data which can be divided into arbitrary sizes and independently processed in parallel. Such
loads are commonly encountered in applications which are processing a great amount of similar data units. For a multi-task processor,
the processor’s speed may be time-varying due to the arrival and departure of other background jobs. This paper studies an optimal
divisible loads scheduling problem on a single level tree network, whose processing speeds and channel speeds are time-varying. Two
recursive algorithms are provided to solve this problem when the arrival and departure times of the background jobs are known a priori
and an iterative algorithm is provided to solve the case where such times are not known. Numerical tests and evaluations are
performed for these three algorithms under different numbers of background jobs and processors.
Index Terms—Divisible load, scheduling, single level tree network, multi-task processors, resource-sharing, virtualization, time-varying
system.
1 INTRODUCTION
1.1 Background
THE design of efficient load scheduling algorithms has long been a pivotal concern in parallel processing
applications. A parallel system refers to all classes of parallel
computers from multicore CPUs to wide area computa-
tional grids comprising distributed and heterogeneous in-
stallations owned by mutually unrelated institutions [1]. A
schedule is an assignment of tasks to processors in time.
Parallel systems cannot be fully utilized if the applications
are not properly scheduled. In modern networked systems,
scheduling becomes more crucial due to the increasing
prevalence of data-intensive computing. To deal with the
large amount of data in modern computation systems, divisible
load theory (DLT) has emerged as a potential tool.
DLT assumes that computation and communication
loads can be divided into parts of arbitrary sizes, which
can be processed independently in parallel [2]. There are
two assumptions for the loads in DLT. First is arbitrary
divisibility and second is independence of execution. Loads
may be divisible in fact or as an approximation. Such
loads are commonly encountered in applications which are
processing a great amount of similar data units, such as image
processing, signal processing, processing of massive exper-
imental data, and so on [3]. In classic DLT models, there
is usually a control processor holding all the data originally
and then one can distribute such loads to several processors.
The main problem is to decide the optimal schedule of
loads distribution to the processors to achieve the minimum
solution time. Many DLT applications allow users to model
the parallel system with linear equations or recursion, which
can be solved efficiently.
• Fei Wu, Yang Cao and T. Robertazzi are with the Department of Electrical
and Computer Engineering, Stony Brook University, Stony Brook, NY,
11794.
E-mail: {fei.wu,yang.cao,thomas.robertazzi}@stonybrook.edu
Analysis in DLT was first studied by Cheng and Rober-
tazzi in [4], which was designed originally for intelligent
sensor networks. The formal proof of the DLT optimality
principle was in [5], where a linear daisy chain network
was applied. Since then, DLT has been well established and
used in many scheduling problems. An analytic proof for
a bus network that all processors must stop computing at
the same time to obtain a minimal time solution was pro-
vided in [6]. In [7], optimal load distribution sequences for
tree networks were investigated and in [8] computing cost
was considered along with job finishing time. Closed-form
expression for the processing time in the nonblocking mode
of communication was derived in [9]. Scheduling divisible
loads in a single-level tree network was considered in [9-
14]. An optimal time-varying load scheduling for divisible
loads was studied in [15], where the computing system was
modeled as a bus-oriented network.
Most previous works assume that the channel speed
and processing speed are constant throughout the whole
processing time. It is often assumed that one processor
can only process a single job at a time, which may not be
true since in most practical computer systems one processor
can both communicate with multiple networks and process
multiple jobs. Such multi-task processors are commonly
encountered in resource-sharing systems such as virtualized
networks. In Wireless Sensor Networks (WSN) the same
piece of WSN’s physical resources can be virtualized into
logical units, which can be used by multiple users [24]. Also,
in the network-slicing technology for 5G networks, resource
sharing among slices is sometimes permitted in order to
maintain certain performance levels [25]. As a result, in
such resource sharing systems those extra connections and
jobs will take up the system resources and hence hinder the
system processing of a specific job of our interest. In other
words, the system speed may be time-varying according to
the number of those extra loads. In this paper, the extra jobs
running on a certain processor in addition to the job of our
interest are called background jobs. In the area of time-varying
scheduling studies, a data gathering problem is
discussed in [23], where only data transmission is considered
and the communication speed is time-varying.
arXiv:1902.01898v1 [cs.DC] 5 Feb 2019
1.2 Our Contribution
The first work examining time-varying DLT is in [15]
by Sohn and Robertazzi, where the loads are distributed
through a bus network. The control processor, however,
does not process data. In [15], the arrival and departure
times of background jobs are assumed to be exactly the same for
every processor, which is usually not true in practical situations.
In our paper, each processor has its own background
arrival and departure sequence, which is independent of the
others. The processor sharing rule is also updated in our
paper. Instead of the processor (channel) dividing its
computational (transmission) power evenly among all jobs as in
[15], we assume that the processor (channel) can assign
an arbitrary ratio of its computational (transmission)
power to each job, as long as the sum of these ratios does
not exceed one. Such an assumption is more realistic since
modern virtualization techniques allow users to divide the
processor’s computational (transmission) power according
to their preference when a single physical processor is
virtualized into multiple virtual processors.
Furthermore, a single level tree network with heterogeneous
channels is used instead of the bus network in
[15]. The single level tree network can model a variety of
parallel systems using the master-slave, or controller-worker,
paradigm. For instance, [16] models several
computers interconnected by an Ethernet as a single level
tree network, and in [13] a set of computing clusters
connected to a master controller via the Internet is modeled
as a single level tree network. Moreover, this paper provides
two algorithms for the stochastic analysis, which deliver
superior performance compared to the one in [15].
Also, in this paper, unlike [15], the control processor
is equipped with a front-end sub-processor, which means
that it not only transfers data to other processors but also
processes data itself.
Our objective is to determine the optimal partitions of
the full load for each processor to achieve the minimum
finishing time (makespan). Two cases are discussed in our
paper: the control processor is either a time-invariant
processor or a time-varying processor, where the former
is a special case and the latter is more general. We
first study the deterministic model, where the arrival and
departure time points for the background jobs and extra
connections are exactly known a priori. Two algorithms are
provided to solve the scheduling problem, one for each case.
Then a stochastic analysis is performed for the case where those time
points are not known a priori.
1.3 Organization
The rest of this paper is organized as follows. Section 2
first briefly introduces the classic solution of DLT schedul-
ing problem in a time-invariant single level tree network.
Then two time-varying cases are studied, respectively. The
stochastic model is studied in section 3 and section 4
provides verification and evaluation of our method via different
criteria. The conclusion appears in section 5.
The following notations are used in this paper:
$\alpha_i$: The partition of the entire divisible load that is assigned to processor $i$.
$W_i$: Inverse of the processing speed of the $i$th processor when there is only one job.
$W_i(t)$: Inverse of the time-varying processing speed of the $i$th processor applied to the divisible job of interest.
$\bar{W}_i$: Equivalent constant value of $W_i(t)$ during the processing time.
$T_{cp}$: Time to process the entire load when $W_i = 1$ for the $i$th processor.
$Z_i$: Inverse of the channel speed when the control processor is only communicating with the $i$th processor.
$Z_i(t)$: Inverse of the time-varying channel speed applied to the divisible job of interest.
$\bar{Z}_i$: Equivalent constant value of $Z_i(t)$ when the control processor is communicating with the $i$th worker processor.
$\mathrm{Exp}(\lambda)$: Negative exponential distribution with parameter $\lambda$.
$\mathrm{Unif}(a, b)$: Uniform distribution with parameters $a$ and $b$.
$T_{cm}$: Time to transmit the entire load when $Z_i = 1$.
$T_f$: The finishing time of processing the entire load.
2 DETERMINISTIC ANALYSIS
In this section, we assume that the arrival and departure
times of background jobs are exactly known, which is referred
to as the deterministic model. We study the optimal
scheduling for a time-varying single level tree system. To
this end, we first briefly introduce the classic time-invariant
problem, which will be helpful to understand the time-
varying problem. The case that the exact arrival and depar-
ture times of background jobs are not known will be studied
in the next section.
2.1 Preliminaries
Let’s consider the single level tree network in Fig. 1. Assume
that there are $N + 1$ processors in total in the whole
system. The processor $P_0$ is the control processor, where
the divisible load first arrives.
[Fig. 1: Single level tree network]
The control processor $P_0$ divides the divisible load into $N$ parts,
denoted by $\alpha_1, \alpha_2, \ldots, \alpha_N$, and assigns those $N$ parts to the
worker processors $P_1, P_2, \ldots, P_N$. In this paper we
normalize the total amount of load to be 1, which means
that $\alpha_1 + \alpha_2 + \cdots + \alpha_N = 1$. The worker processors are
numbered in the order in which they receive the loads. There are
several assumptions for the processors:
• A processor can only compute after it has finished the
communication unless it is equipped with a front-end
processor.
• The control processor can only communicate with one
worker processor at a time (sequential load distribu-
tion).
• There is no communication between the worker proces-
sors.
In this case, we assume that due to a limitation of resources,
only the control processor has a front-end processor, which
means that it can compute at the same time as it
communicates with the worker processors. According to the
notation defined in section 1, the piece of load $\alpha_i$ is
transferred to worker processor $P_i$ in time $\alpha_i Z_i T_{cm}$ and is
processed in time $\alpha_i W_i T_{cp}$. All the processors should finish
computing at the same moment to achieve the smallest
$T_f$, by the optimality principle proved in [4,13,17-20]. Our
problem is to find the load partitions $\alpha_1, \alpha_2, \ldots, \alpha_N$ that
satisfy the optimality principle.
We can draw the timing diagram according to those conditions
in Fig. 2.
[Fig. 2: Timing diagram for single level tree network]
For each time axis in the timing diagram,
communication appears above the axis while computing
appears below the axis. At $t = 0$, the control processor
starts sending partition $\alpha_1$ to worker $P_1$, which takes time $\alpha_1 Z_1 T_{cm}$.
At $t = T_1$, after receiving the load, $P_1$ starts processing and
finishes after time $\alpha_1 W_1 T_{cp}$. This procedure repeats for every
worker processor and all the processors finish computing at
the same time $t = T_f$. The linear system equations can then
be expressed as:
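$$T_f = \alpha_0 W_0 T_{cp} \tag{1a}$$
$$T_f = \sum_{k=1}^{i} \alpha_k Z_k T_{cm} + \alpha_i W_i T_{cp}, \quad i = 1, 2, \ldots, N \tag{1b}$$
$$\sum_{k=1}^{N} \alpha_k = 1 \tag{1c}$$
Since there are $N + 2$ unknowns and $N + 2$ linear equations,
the load partitions $\alpha_0, \alpha_1, \ldots, \alpha_N$ and the finishing time $T_f$
can be uniquely determined.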
2.2 Time-varying System with A Time-invariant Control
Processor
Now we consider that the processors can simultaneously
process multiple jobs, which means that in addition to the
divisible job studied in section 2.1, each processor also processes
some other jobs. We call those jobs background jobs. The
background jobs take computing power from the
processor and as a result the processor’s processing speed
for the divisible job varies with the amount of workload over time.
In this section, we only consider the worker processors
to be time-varying. The time-varying control processor will be
discussed in the next section.
Fig. 3: Processor virtualization
When a processor processes multiple jobs in parallel,
the processor is virtualized into multiple virtual processors.
In this way, each user of the system feels that it is the
exclusive user of the processor. As shown in Fig. 3, there
is a hypervisor controlling the virtualization process. The
hypervisor can assign any ratio of computation power to
any virtual processor, as long as the sum computation
power of all virtual processors does not exceed the physical
processor’s maximum computation power. The protocol for
the hypervisor to assign the physical processor’s computa-
tion power is pre-defined in the hypervisor. As a result, the
processing speed for the divisible load job of our interest is
a function of the number of jobs in the processor defined
by the hypervisor. For the case that n jobs in the processor
i, we use Whi (n) to denote the inverse of computing speed
applied to the divisible job of interest. This Whi (n) is sup-
posed to be known once n is given. If n = 1 and there is
only the divisible load job in the processor, we denote the
processing speed as Wi for simplicity. We also use Wi(t)
to represent the general time-varying inverse of computing
speed applied to the divisible job at interest and W¯i to
represent the equivalent constant value of Wi(t) during
processing for processor i. The background jobs arrive and
leave independently on different processor. The method to
define W¯i will be introduced in this section. By adapting the
time-varying processing speed to Fig. 2, the timing diagram
for this condition can be depicted in Fig. 4.
[Fig. 4: Timing diagram for single level tree network with time-varying worker processor speed]
In Fig. 4, we use steps to represent the arrival and
departure of the background jobs, and the value of $W(t)$ is
noted on the vertical axis. For example, processor $P_1$ starts
processing the data at time $T_1$, then one background job
arrives at time $T_{11}$, where a down step appears. Note that
$W$ is the inverse of the processing speed, and the processing speed
jumps from $\frac{1}{W_1}$ to $\frac{1}{W_1^h(2)}$ at this time point, thus $W_1(t)$
jumps from $W_1$ to $W_1^h(2)$. Afterwards, this background
job departs at time $T_{12}$, where an up step appears, and $W_1(t)$
jumps back from $W_1^h(2)$ to $W_1$. In this section we
assume that the time points of arrival and departure of
the background jobs are exactly known, which means that
$W_i(t)$, $i = 1, 2, \ldots, N$, are exactly known.
Theorem 1 shows how to obtain $\bar{W}_i$, $i = 1, 2, \ldots, N$.
Theorem 1. The equivalent constant value of $W_i(t)$ during the
processing of the $i$th processor equals:
$$\bar{W}_i = \frac{T_f - T_i}{\int_{T_i}^{T_f} \frac{1}{W_i(t)}\,dt}$$
where $T_i$ denotes the start time of the $i$th processor’s computation
and $T_f$ denotes the finishing time.
Proof. Since the changes in $W_i(t)$ are all steps at certain
time points where background jobs arrive and depart, let’s
assume that for the $i$th processor there are $k$ changes in $W_i(t)$
between time $T_i$ and $T_f$ and let $T_{ij}$, $j = 1, 2, \ldots, k$, denote the
$j$th change time point. For example in Fig. 4, $T_{11}$ is the first
time point of change in $W_1(t)$ after $T_1$ and $T_{12}$ is the
last time point of change in $W_1(t)$ before $T_f$. Let $W_{i(j+1)}$
denote the value of $W_i(t)$ between $T_{ij}$ and $T_{i(j+1)}$, where
$j = 1, 2, \ldots, k - 1$, let $W_{i1}$ be its value between $T_i$ and $T_{i1}$, and $W_{i(k+1)}$
its value between $T_{ik}$ and $T_f$. In the same manner, let $\alpha_{i(j+1)}$
denote the partition of the load that is processed between
$T_{ij}$ and $T_{i(j+1)}$, where $j = 1, 2, \ldots, k - 1$, let $\alpha_{i1}$ be the partition processed between
$T_i$ and $T_{i1}$, and $\alpha_{i(k+1)}$ the partition processed between $T_{ik}$ and $T_f$. Then we have the
equations:
$$T_{i1} - T_i = \alpha_{i1} W_{i1} T_{cp} \tag{2a}$$
$$T_{i(j+1)} - T_{ij} = \alpha_{i(j+1)} W_{i(j+1)} T_{cp}, \quad j = 1, 2, \ldots, k - 1 \tag{2b}$$
$$T_f - T_{ik} = \alpha_{i(k+1)} W_{i(k+1)} T_{cp} \tag{2c}$$
$$\alpha_i = \sum_{j=1}^{k+1} \alpha_{ij} \tag{2d}$$
Since by definition $T_f - T_i = \alpha_i \bar{W}_i T_{cp}$, then:
$$\bar{W}_i = \frac{T_f - T_i}{\alpha_i T_{cp}} \tag{3}$$
By substituting equation (2) into equation (3), we get:
$$\bar{W}_i = \frac{T_f - T_i}{\alpha_i T_{cp}} = \frac{T_f - T_i}{\left(\sum_{j=1}^{k+1} \alpha_{ij}\right) T_{cp}} = \frac{T_f - T_i}{\frac{T_{i1} - T_i}{W_{i1}} + \sum_{j=1}^{k-1} \frac{T_{i(j+1)} - T_{ij}}{W_{i(j+1)}} + \frac{T_f - T_{ik}}{W_{i(k+1)}}} = \frac{T_f - T_i}{\int_{T_i}^{T_f} \frac{1}{W_i(t)}\,dt}$$
This completes the proof of Theorem 1.
Remark. The inverse of $\bar{W}_i$ equals $\frac{\int_{T_i}^{T_f} \frac{1}{W_i(t)}\,dt}{T_f - T_i}$, which is the
average value of $\frac{1}{W_i(t)}$ between $T_i$ and $T_f$. Since $W_i(t)$ is defined
as the inverse of the computation speed of the $i$th processor, $\bar{W}_i$ can
also be taken as the inverse of the average computing speed, i.e.,
the inverse of the average value of $\frac{1}{W_i(t)}$.
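As an illustration of Theorem 1, the following Python sketch computes $\bar{W}_i$ for a piecewise-constant $W_i(t)$. The representation of the step function as a list of change points (`breaks`) plus a list of values (`values`), and the function names themselves, are assumptions made for this example and do not come from the paper.

```python
from bisect import bisect_right


def integrate_inverse(breaks, values, a, b):
    """Integrate 1/W(t) over [a, b] for a piecewise-constant W(t).

    breaks: sorted change points [t_1, ..., t_k] (assumed representation)
    values: k + 1 values; values[0] holds before t_1, values[j] on [t_j, t_{j+1})
    """
    total = 0.0
    left = a
    j = bisect_right(breaks, a)  # index of the value that is active at time a
    while left < b:
        right = min(breaks[j], b) if j < len(breaks) else b
        total += (right - left) / values[j]
        left = right
        j += 1
    return total


def equivalent_w(breaks, values, t_start, t_finish):
    """Equivalent constant value of W(t) on [t_start, t_finish] (Theorem 1)."""
    return (t_finish - t_start) / integrate_inverse(breaks, values, t_start, t_finish)
```

For a processor that computes from $t = 0$ to $t = 6$ with $W = 2$ on $[2, 4)$ and $W = 1$ elsewhere, `equivalent_w([2, 4], [1, 2, 1], 0, 6)` evaluates $6 / (2 + 1 + 2) = 1.2$, which lies between the two step values as expected for an inverse of an average speed.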
Based on the expression of $\bar{W}_i$, the system equations can
be written as:
$$\alpha_0 W_0 T_{cp} = T_f \tag{4a}$$
$$T_i = \sum_{k=1}^{i} \alpha_k Z_k T_{cm}, \quad i = 1, 2, \ldots, N \tag{4b}$$
$$T_f - T_i = \alpha_i \bar{W}_i T_{cp}, \quad i = 1, 2, \ldots, N \tag{4c}$$
$$\alpha_0 + \alpha_1 + \cdots + \alpha_N = 1 \tag{4d}$$
where equation (4b) represents the communication time for
each processor and equations (4a) and (4c) represent the
computation time. Equation (4d) guarantees that all the
partitions sum up to 1. From equation (4b), we can express
$\alpha_i$ as a function of $T_{i-1}$ and $T_i$ as $\alpha_i = \frac{T_i - T_{i-1}}{Z_i T_{cm}}$, with $T_0 = 0$. By
substituting this expression into equation (4c) we have:
$$T_f = T_i + \frac{T_i - T_{i-1}}{Z_i T_{cm}} \bar{W}_i T_{cp} \tag{5a}$$
$$T_f = T_i + \frac{T_i - T_{i-1}}{Z_i T_{cm}} \cdot \frac{T_f - T_i}{\int_{T_i}^{T_f} \frac{1}{W_i(t)}\,dt}\, T_{cp} \tag{5b}$$
Starting from processor 1, equation (5b) reduces to
$T_f = T_1 + \frac{T_1}{Z_1 T_{cm}} \cdot \frac{T_f - T_1}{\int_{T_1}^{T_f} \frac{1}{W_1(t)}\,dt}\, T_{cp}$, which is an equation in $T_1$
and $T_f$ only. Thus $T_1$ can be expressed as a function of $T_f$
only. By definition, $\alpha_1$ can also be expressed as a function
of $T_f$ only. This suggests that the problem can
be solved recursively. A recursive algorithm to calculate
the optimal finishing time $T_f$ and the partitions $\alpha_i$
is given as Algorithm 1.
Algorithm 1 Recursive algorithm to solve the optimal
scheduling problem in a time-varying system with a
time-invariant control processor
1. Express $\alpha_0$ as a function of $T_f$ using the equation:
$$\alpha_0 = \frac{T_f}{W_0 T_{cp}}$$
2. Express $T_1$ as a function of $T_f$ using the equation:
$$T_f = T_1 + \frac{T_1}{Z_1 T_{cm}} \cdot \frac{T_f - T_1}{\int_{T_1}^{T_f} \frac{1}{W_1(t)}\,dt}\, T_{cp}$$
Express $\alpha_1$ as a function of $T_f$ using the equation:
$$\alpha_1 = \frac{T_1}{Z_1 T_{cm}}$$
3. Express $T_2$ as a function of $T_f$ using the equation:
$$T_f = T_2 + \frac{T_2 - T_1}{Z_2 T_{cm}} \cdot \frac{T_f - T_2}{\int_{T_2}^{T_f} \frac{1}{W_2(t)}\,dt}\, T_{cp}$$
where $T_1$ is a function of $T_f$.
Express $\alpha_2$ as a function of $T_f$ using the equation:
$$\alpha_2 = \frac{T_2 - T_1}{Z_2 T_{cm}}$$
where $T_1$ and $T_2$ are functions of $T_f$.
4. Repeat the procedure until $\alpha_N$ is expressed as a function
of $T_f$. Now, every $\alpha_i$ has been expressed as a function of $T_f$.
5. Apply the normalization equation:
$$\alpha_0 + \alpha_1 + \cdots + \alpha_N = 1$$
to calculate the optimal finishing time $T_f$, as well as all the
partitions $\alpha_i$.
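The recursion of Algorithm 1 can be sketched numerically as follows. This is only one possible realization, under stated assumptions: the paper expresses each $T_i$ and $\alpha_i$ as a function of $T_f$, whereas this sketch replaces the symbolic step by nested bisection on $T_i$ and on $T_f$. It uses the fact that (4c) combined with Theorem 1 gives $\alpha_i T_{cp} = \int_{T_i}^{T_f} \frac{1}{W_i(t)}\,dt$. The step-function representation (`breaks`, `values`) and all function names are hypothetical.

```python
from bisect import bisect_right


def inv_integral(breaks, values, a, b):
    """Integral of 1/W(t) over [a, b]; W is a step function with change
    points `breaks` and len(breaks) + 1 `values` (assumed representation)."""
    total, left, j = 0.0, a, bisect_right(breaks, a)
    while left < b:
        right = min(breaks[j], b) if j < len(breaks) else b
        total += (right - left) / values[j]
        left, j = right, j + 1
    return total


def bisect_root(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) <= 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def fractions_for(tf, w0, workers, z, tcp, tcm):
    """Steps 1-4: alpha_0..alpha_N implied by a trial finishing time tf."""
    alphas = [tf / (w0 * tcp)]
    t_prev = 0.0
    for (breaks, values), zi in zip(workers, z):
        # The load sent in [T_{i-1}, T_i] must equal the load P_i can
        # process in [T_i, tf]; g is increasing in T_i, so bisection applies.
        g = lambda ti: (ti - t_prev) / (zi * tcm) - inv_integral(breaks, values, ti, tf) / tcp
        t_i = bisect_root(g, t_prev, tf)
        alphas.append((t_i - t_prev) / (zi * tcm))
        t_prev = t_i
    return alphas


def solve_schedule(w0, workers, z, tcp=1.0, tcm=1.0):
    """Step 5: find tf such that the fractions sum to one."""
    total = lambda tf: sum(fractions_for(tf, w0, workers, z, tcp, tcm)) - 1.0
    upper = w0 * tcp  # P0 alone would finish the whole load by this time
    tf = bisect_root(total, 0.0, upper)
    return tf, fractions_for(tf, w0, workers, z, tcp, tcm)
```

With a single worker and constant speeds ($W_0 = W_1 = Z_1 = 1$, $T_{cp} = T_{cm} = 1$), the sketch reproduces the time-invariant solution $\alpha_0 = 2/3$, $\alpha_1 = 1/3$, $T_f = 2/3$. Bisection was chosen here for simplicity; any root finder would serve.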
2.3 Time-varying Control Processor, Processing and
Communication Speed
In the previous section, we studied the optimal scheduling
problem for a single level tree network where the worker
processors have time-varying processing speeds due to the
arrival and departure of background jobs. In this section,
we consider the general case where background jobs
appear on the control processor as well, which makes the
processing speed of $P_0$ time-varying. Also, we assume that
there are other transmissions, such as the control processor
communicating with other networks while assigning the
loads, which slow down the communication speed for
the job of our interest. This makes the communication
speed time-varying. As in the previous subsection, a
processor is virtualized into multiple virtual processors that
share the communication power, and there is a hypervisor to
control them. In the same way as $W_i^h(n)$ and $W_i(t)$, we use $Z_i^h(n)$ and
$Z_i(t)$ to represent the time-varying inverse of the communication
speed applied to the divisible job of interest. As a result,
$Z_i(t)$ is also a step function. Again, we assume that
the time points at which links with other networks are established
and terminated are known for each processor, which means that
$Z_i(t)$ is exactly known. For simplicity we use $Z_i$ to represent
the inverse of the communication speed when there is only the
divisible load job of our interest in the control processor for
distribution.
[Fig. 5: Timing diagram for single level tree network with time-varying channel speed and computing speed]
Fig. 5 shows the timing diagram for a general
time-varying single level tree system. The channel speed
varies as well as the computing speed for each processor.
At the beginning, $P_0$ starts to transmit partition $\alpha_1$ of the
load to $P_1$ and finishes at time $T_1$. After receiving
the load, $P_1$ starts to process at time $T_1$, while
$P_0$ starts to transmit partition $\alpha_2$ to $P_2$. This procedure
repeats for every processor, and again, every processor
finishes at the same time $T_f$ in the optimal condition.
Theorem 1 still holds in this situation. Similarly,
we can find the expression for the equivalent
time-invariant value of $Z(t)$:
Theorem 2. The equivalent constant value of $Z_i(t)$ when $P_0$
communicates with $P_i$ equals:
$$\bar{Z}_i = \frac{T_i - T_{i-1}}{\int_{T_{i-1}}^{T_i} \frac{1}{Z_i(t)}\,dt}$$
where $T_i$ denotes the time at which the $i$th processor starts computing,
$i = 1, 2, \ldots, N$, and $T_0 = 0$.
The proof is similar to the proof of Theorem
1. Also, $\bar{Z}_i$ can be taken as the inverse of the average
communication speed, in the same manner as $\bar{W}_i$ in the
remark of Theorem 1.
$$\alpha_0 \bar{W}_0 T_{cp} = T_f \tag{6a}$$
$$T_i - T_{i-1} = \alpha_i \bar{Z}_i T_{cm}, \quad i = 1, 2, \ldots, N \tag{6b}$$
$$T_i = \sum_{k=1}^{i} \alpha_k \bar{Z}_k T_{cm}, \quad i = 1, 2, \ldots, N \tag{6c}$$
$$T_f - T_i = \alpha_i \bar{W}_i T_{cp}, \quad i = 1, 2, \ldots, N \tag{6d}$$
$$\alpha_0 + \alpha_1 + \cdots + \alpha_N = 1 \tag{6e}$$
Equations (6a)-(6e) are the system equations
when $P_0$ is also time-varying. Similar to equations (4a) to
(4d), equations (6a) and (6d) represent the processing part
for each processor, (6b) and (6c) represent the communication
part and (6e) is the normalization equation.
By applying Theorem 2 to equation (6b):
$$T_i - T_{i-1} = \alpha_i \frac{T_i - T_{i-1}}{\int_{T_{i-1}}^{T_i} \frac{1}{Z_i(t)}\,dt}\, T_{cm} \tag{7a}$$
$$\Longrightarrow \quad \alpha_i = \frac{1}{T_{cm}} \int_{T_{i-1}}^{T_i} \frac{1}{Z_i(t)}\,dt \tag{7b}$$
We find that $\alpha_i$ is the integral of $\frac{1}{Z_i(t)}$ from $T_{i-1}$ to $T_i$
multiplied by the constant $\frac{1}{T_{cm}}$. Since $Z_i(t)$ is assumed to be known,
by applying the same recursive method as in the last subsection,
we can express every $\alpha_i$ as a function of $T_f$ and use
the normalization equation to solve the optimal scheduling
problem. The detailed steps are given in Algorithm 2.
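A small worked instance may help to check equation (7b) against Theorem 2. The numbers are purely illustrative assumptions: $T_{cm} = 1$, $T_{i-1} = 0.2$, $T_i = 0.5$, and a channel with $Z_i(t) = 1$ on $[0.2, 0.3)$ and $Z_i(t) = 2$ on $[0.3, 0.5)$.

```latex
\begin{aligned}
\alpha_i &= \frac{1}{T_{cm}} \int_{0.2}^{0.5} \frac{1}{Z_i(t)}\,dt
          = \frac{0.1}{1} + \frac{0.2}{2} = 0.2, \\
\bar{Z}_i &= \frac{T_i - T_{i-1}}{\int_{T_{i-1}}^{T_i} \frac{1}{Z_i(t)}\,dt}
           = \frac{0.3}{0.2} = 1.5, \\
\alpha_i \bar{Z}_i T_{cm} &= 0.2 \times 1.5 \times 1 = 0.3 = T_i - T_{i-1}.
\end{aligned}
```

The last line recovers equation (6b), so the equivalent constant $\bar{Z}_i$ and the integral form (7b) describe the same transmission interval.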
3 STOCHASTIC ANALYSIS
In the previous two subsections, two recursive algorithms to
solve the optimal load fraction in the time-varying system
were studied. However, the assumption that the time points
of arrival and departure of background jobs are known a
priori may not hold for many realistic circumstances. As a
result, it is necessary to perform a more general analysis
where the time points of arrival and departure of back-
ground jobs remain unknown.
In this section, we establish a stochastic model where
the time points of arrival and departure of background jobs
are not exactly known. To model the system, we assume
Markovian statistics for the arrival and departure processes.
Similar to the nature of arriving customers, the arrivals of
background jobs are modeled as a Poisson random process
with parameter $\lambda$ and the stay time of each
background job follows a negative exponential distribution
with parameter $\mu$. In this way, the system can be
modeled as an M/M/1 queue. In [15], the average number
of customers in the M/M/1 chain is used as the average
number of background jobs in each processor. However,
this method may not be accurate given that the starting
state and processing time are not taken into consideration.
Also, [15] assumed that the system parameters $\lambda$ and $\mu$
were known, which may not be the case in practice. To deal with
these issues, we first estimate $\lambda$ and $\mu$
from previous information about the system using a
fading memory window. Then a simulation-based method
is introduced to solve the optimal scheduling problem. In
order to simplify and accelerate the computation, an iterative algorithm is
studied to achieve a much faster running time at the cost of a
negligible loss of precision.
Algorithm 2 Recursive algorithm to solve the optimal
scheduling problem in a time-varying single level tree system
1. Express $\alpha_0$ as a function of $T_f$ using the equation:
$$\alpha_0 = \frac{1}{T_{cp}} \int_{0}^{T_f} \frac{1}{W_0(t)}\,dt$$
2. Express $T_1$ as a function of $T_f$ using the equation:
$$T_f = T_1 + \frac{1}{T_{cm}} \int_{0}^{T_1} \frac{1}{Z_1(t)}\,dt \cdot \frac{T_f - T_1}{\int_{T_1}^{T_f} \frac{1}{W_1(t)}\,dt}\, T_{cp}$$
Express $\alpha_1$ as a function of $T_f$ using the equation:
$$\alpha_1 = \frac{1}{T_{cm}} \int_{0}^{T_1} \frac{1}{Z_1(t)}\,dt$$
where $T_1$ is a function of $T_f$.
3. Express $T_2$ as a function of $T_f$ using the equation:
$$T_f = T_2 + \frac{1}{T_{cm}} \int_{T_1}^{T_2} \frac{1}{Z_2(t)}\,dt \cdot \frac{T_f - T_2}{\int_{T_2}^{T_f} \frac{1}{W_2(t)}\,dt}\, T_{cp}$$
where $T_1$ is a function of $T_f$.
Express $\alpha_2$ as a function of $T_f$ using the equation:
$$\alpha_2 = \frac{1}{T_{cm}} \int_{T_1}^{T_2} \frac{1}{Z_2(t)}\,dt$$
where $T_2$ and $T_1$ are functions of $T_f$.
4. Repeat the procedure until $\alpha_N$ is expressed as a function
of $T_f$. Now, every $\alpha_i$ has been expressed as a function of $T_f$.
5. Apply the normalization equation:
$$\alpha_0 + \alpha_1 + \cdots + \alpha_N = 1$$
to calculate the optimal finishing time $T_f$, as well as all the
partitions $\alpha_i$.
In this section the discussion assumes that all the
processors are time-varying (section 2.3), but the algorithm
works for both cases in sections 2.2 and 2.3. Also, since we
assume that the other transmissions have the same effect as
the background jobs, we focus on the background
jobs (processing speed); the results also hold for the
other transmissions (communication speed). The numerical
tests show that our stochastic model outperforms the
method in [15].
3.1 System Parameter Estimation
In our system we assume that for any time-varying processor,
the arrivals of background jobs follow a Poisson
random process with parameter $\lambda$ and the stay time of each
background job follows an exponential distribution. As a
result, the arrivals and departures of background jobs form an
M/M/1 queuing model. To this end, let $x_1, x_2, x_3, \ldots, x_n$ be
the samples of the background jobs’ inter-arrival times
within the fading memory window. The fading memory
window contains the $n$ most recent samples before the divisible load
job arrives, and the samples that are closer to the end point
receive a higher weight in the estimation. As a result,
the fading memory estimation delivers a more stable
result when the parameter varies with time; otherwise it
behaves the same as the ordinary estimation. These $n$ samples
are assumed to be independent and identically distributed with
$\mathrm{Exp}(\lambda)$. To estimate the value of $\lambda$, the weighted maximum
likelihood estimation (WMLE) method is used:
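$$\mathrm{lik}(\lambda) = \prod_{i=1}^{n} \left(\lambda e^{-\lambda x_i}\right)^{\beta_i} \tag{8}$$
$$\hat{\lambda} = \arg\max_{\lambda} \log(\mathrm{lik}(\lambda)) \tag{9}$$
where $\beta_1, \beta_2, \ldots, \beta_n$ are the fading memory weights in
ascending order. By solving the WMLE, the estimate of
$\lambda$ is obtained:
$$\hat{\lambda} = \frac{\sum_{i=1}^{n} \beta_i}{\sum_{i=1}^{n} \beta_i x_i} \tag{10}$$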
For the estimation of $\mu$, let $y_1, y_2, \ldots, y_n$ be the samples of
background stay time within the fading memory window.
By applying the same method, the estimate of $\mu$ is
obtained as:
$$\hat{\mu} = \frac{\sum_{i=1}^{n} \alpha_i}{\sum_{i=1}^{n} \alpha_i y_i} \tag{11}$$
where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are the fading memory weights for
$\mu$ (not to be confused with the load partitions).
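Estimates (10) and (11) share the same form, so a single helper covers both. The sketch below is a minimal illustration; the geometric weight sequence in `geometric_weights` (and its `decay` parameter) is an assumption chosen for the example, since the text only requires the weights to be ascending.

```python
def weighted_rate_estimate(samples, weights):
    """Weighted MLE of an exponential rate: sum(w_i) / sum(w_i * x_i).

    Setting the derivative of sum(w_i * (log(rate) - rate * x_i)) to zero
    gives this closed form, matching equations (10) and (11).
    """
    if not samples or len(samples) != len(weights):
        raise ValueError("samples and weights must be non-empty and equally long")
    return sum(weights) / sum(w * x for w, x in zip(weights, samples))


def geometric_weights(n, decay=0.9):
    """Hypothetical fading-memory weights: ascending, newest sample weighted 1."""
    return [decay ** (n - 1 - i) for i in range(n)]


# Inter-arrival samples, oldest first; the newest samples dominate the estimate.
inter_arrivals = [1.0, 2.0, 3.0]
lam_hat = weighted_rate_estimate(inter_arrivals, geometric_weights(3, decay=0.5))
```

With equal weights the estimate reduces to the usual reciprocal of the sample mean, which is the "ordinary estimation" referred to in the text.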
3.2 Stochastic Model
To solve the optimal scheduling using the stochastic model,
we first introduce a simulation-based method. We take the
median of a large number of samples to approximate the
real case. Then a simplified iterative algorithm is introduced
to reduce running time.
3.2.1 Simulation-based Approach
In the case where the actual arrival and departure times of
background jobs are not known, it is impossible to make an
accurate schedule for the system since the real $W(t)$ and
$Z(t)$ cannot be obtained in advance. To this end, a proper
approximation is necessary for scheduling. Since the arrivals and
departures of background jobs are modeled as an M/M/1
queue, it is natural to gather statistical information from
the M/M/1 queue with proper system parameters. In [15],
given the system parameters $\lambda_i$ and $\mu_i$ for the $i$th processor,
the average number of background jobs $n_i$ in the M/M/1
system is estimated by $\frac{\rho_i}{1 - \rho_i}$, where $\rho_i = \frac{\lambda_i}{\mu_i}$. Then
the average inverse of the processing speed is modeled as
$\bar{W}_i = (n_i + 1) W_i$, since every job is assumed
to receive an equal share of the computing power in [15]. In this way,
the schedule can be obtained by solving equations (4) or (6).
However, in the real case the average number of background
jobs for processor $i$ during its processing time may not
simply equal the average state of the M/M/1 model,
for two reasons. First, the processor may already have
some background jobs being processed at the time when the
divisible load job of our interest arrives, which means that the
start state of the M/M/1 model is not zero. Second, the average
number of background jobs of a certain processor during its
processing time may depend on how much time it takes
to process. The divisible load job may terminate before the
M/M/1 queue reaches its equilibrium, so the average
number of background jobs may not equal the average
number in the M/M/1 queue.
To deal with this issue, instead of simply using the average
number of background jobs as an approximation, a
simulation-based method is introduced in this paper. The main
idea is to simulate a background sequence for each processor,
after which the deterministic Algorithm 1 or 2 can be applied.
By repeating this simulation a large number of times, the trial
that achieves the statistical median of the finishing time can
be taken as the final schedule. The simulation of background
jobs is based on the natural properties of the M/M/1 queue:
the time spent in one state is a random variable with distribution
$\mathrm{Exp}(\lambda + \mu)$ (except for the state with no background jobs, where it is $\mathrm{Exp}(\lambda)$ since
there can be no departure), and the probability of moving to the
next larger state is $p_{next} = \frac{\lambda}{\lambda + \mu}$. Given the starting state
$N_0$ and the system parameters $\lambda$ and $\mu$ for each processor, the
details of simulating the M/M/1 based background sequence are
described in Algorithm 3.
Algorithm 3 Algorithm to simulate the background sequence
Input: $N_0$, $\lambda$ and $\mu$
Output: Background sequence
Set $t = 0$;
Set the M/M/1 state equal to $N_0$ at $t = 0$;    ▷ The state represents the number of background jobs
while $t < T_f$ do
    if the current state equals 0 then
        Generate a random variable $\tilde{t} \sim \mathrm{Exp}(\lambda)$;
        $t = t + \tilde{t}$, move the state to 1;
    else
        Generate a random variable $\tilde{t} \sim \mathrm{Exp}(\lambda + \mu)$;
        $t = t + \tilde{t}$;
        Generate a random variable $p \sim \mathrm{Unif}(0, 1)$;
        if $p \le p_{next}$ then
            Move the M/M/1 queue to the next state;
        else
            Move the M/M/1 queue to the previous state;
        end if
    end if
end while
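A direct Python transcription of Algorithm 3 might look as follows. The function name, the `horizon` argument (standing in for $T_f$) and the returned list of `(time, state)` pairs are assumptions of this sketch. One small deviation is labeled in the code: an event that would fall beyond the horizon is discarded rather than recorded.

```python
import random


def simulate_background(n0, lam, mu, horizon, rng=None):
    """Simulate the number of background jobs on one processor up to `horizon`.

    Returns [(time, state), ...], starting with (0.0, n0); each later entry
    records the time of a change and the new number of background jobs.
    """
    rng = rng or random.Random()
    t, state = 0.0, n0
    path = [(t, state)]
    p_next = lam / (lam + mu)  # probability that the next event is an arrival
    while True:
        if state == 0:
            t += rng.expovariate(lam)       # only an arrival is possible
            new_state = 1
        else:
            t += rng.expovariate(lam + mu)  # next arrival or departure
            new_state = state + 1 if rng.random() <= p_next else state - 1
        if t >= horizon:
            break  # deviation from Algorithm 3: drop the event past the horizon
        state = new_state
        path.append((t, state))
    return path


# Example: start with two background jobs and simulate up to t = 50.
sequence = simulate_background(2, lam=1.0, mu=1.5, horizon=50.0, rng=random.Random(0))
```

Combined with a hypervisor rule such as $W_i^h(n)$, each `(time, state)` pair yields one step of $W_i(t)$, which is the input the deterministic algorithms expect.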
The system parameters $\lambda$ and $\mu$ can be estimated by
the estimation step in section 3.1. This simulation can
be done beforehand and the results stored in a table for
future use. Based on the background job sequence and
the pre-defined hypervisor function that assigns a processor’s
computation/communication power, $W(t)$ and $Z(t)$ can be
obtained. Then the recursive deterministic Algorithm 1 or
2 can be applied to obtain a schedule. By repeating this
procedure a large number of times, various schedule plans
are obtained. The trial that achieves the median of all the
finishing times is chosen as the final stochastic schedule
plan.
3.2.2 Iterative Algorithm for Simplification
One drawback of the simulation-based algorithm is that it
requires running the recursive deterministic algorithm
many times. This procedure may become quite time-consuming
as the system scale grows, since the recursive deterministic
algorithm can be quite slow when the number of processors is
large. The running time can be significantly decreased if we can
solve the linear equations (4) or (6) directly. However, solving
the linear equations (4) or (6) requires prior knowledge of
$\bar{W}$ and $\bar{Z}$ for each processor, which, by Theorems I
and II, is only available once a schedule is known.
To deal with this issue, a rough initial guess of the
schedule is made; it does not need to be accurate. This initial
guess can be obtained either from the time-invariant approach or
from the result generated by [15]. After the initial schedule is
obtained, random background sequences are generated for each
processor using Algorithm III. As in the last subsection, W (t)
and Z(t) can then be obtained. Since we already have the initial
schedule, we know the time at which each processor starts
processing. Based on Theorems I and II, $\bar{W}$ and $\bar{Z}$
can be estimated for each processor. An updated schedule is then
obtained by solving the linear equations (4) or (6). Then the
background job sequences are generated again for each processor,
and updated $\bar{W}$ and $\bar{Z}$ are obtained from the new
background job sequences. The updated $\bar{W}$ and $\bar{Z}$ are
used to update the schedule again. As in the previous subsection,
many iterations of this procedure are performed and the trial that
achieves the median of all the finishing times is chosen as the
final stochastic schedule plan. A simplified description of the
algorithm is given in Algorithm IV.
Algorithm 4 Simplified Scheduling
1. Perform an initial scheduling. The communication and
processing time for each processor can be obtained.
2. Run Algorithm III to generate random background
sequences for each processor.
3. Obtain the updated $\bar{W}_i$ and $\bar{Z}_i$ for each
processor i based on Theorems I and II.
4. Update the schedule based on the new $\bar{W}$ and $\bar{Z}$.
The updated communication and processing time for each
processor can be obtained.
5. Repeat steps 2 to 4 many times.
6. The trial that achieves the median of all the finishing times
is chosen as the final stochastic schedule plan.
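The control flow of Algorithm IV can be sketched as below. This is only a skeleton under stated assumptions: the three callables `sample_backgrounds`, `estimate_equivalents` and `solve_schedule` are hypothetical placeholders for Algorithm III, the estimates of Theorems I and II, and the solution of equations (4) or (6), none of which are reproduced here. A schedule is assumed to be a dictionary with a `"finish_time"` entry, and for an even number of trials the lower of the two middle trials is taken as the median.

```python
def simplified_scheduling(initial_schedule, sample_backgrounds,
                          estimate_equivalents, solve_schedule, n_iter):
    """Skeleton of the iterative procedure; returns the median-finish-time trial.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    schedule = initial_schedule                     # step 1: initial guess
    trials = []
    for _ in range(n_iter):                         # step 5: repeat steps 2 to 4
        backgrounds = sample_backgrounds()          # step 2: new random sequences
        # Step 3: equivalent time-invariant speeds from the previous schedule.
        w_bar, z_bar = estimate_equivalents(schedule, backgrounds)
        schedule = solve_schedule(w_bar, z_bar)     # step 4: updated schedule
        trials.append(schedule)
    # Step 6: pick the trial whose finishing time is the median.
    trials.sort(key=lambda s: s["finish_time"])
    return trials[(len(trials) - 1) // 2]
```

Note that each iteration reuses the schedule produced by the previous one when estimating the equivalent speeds, which is what distinguishes this procedure from rerunning the recursive algorithm from scratch.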
Because $\bar{W}$ and $\bar{Z}$ are each time estimated from
the schedule of the previous iteration, the overall schedule
may not be as accurate as that of the simulation-based method
introduced in the last subsection. However, the time saved by
this method becomes important when the system scale grows
large. Numerical tests show that, for systems with a large
number of processors, this simplified iterative method saves
significant time with negligible error.
4 NUMERICAL TEST AND EVALUATION
In this section we perform numerical tests for both the
deterministic and stochastic models. The first two subsections
illustrate our results for the deterministic model using
Algorithms I and II. We simulate each of the two algorithms
over 50 time units, where each time unit contains 100 time slots;
that is, each time slot is equivalent to 0.01 unit
of time. Usually the total process is finished within 50
time units. In these two subsections we use a simple way
to generate the background jobs so that the system is
easier to evaluate. A certain
number of background jobs are generated throughout the 50
time units. The arrivals and departures of background jobs
are simulated in pairs as uniformly distributed random time
points, and the departure time of a background
job must be later than its arrival time. For simplicity we
assume that the hypervisor evenly distributes the physical
processor’s computation/communication power among
the virtual processors, which means that $W_i^h(n) = nW_i$.
As a result, Wi(t) and Zi(t) can be obtained from
the pre-defined Wi and Zi. For Algorithm I, Zi(t) = Zi
and W1(t) = W1 are set to be constant since the control
processor is not time-varying. In our test, we arrange the
processors’ sequence according to their speed. That is to
say, the faster processors receive load before the
slower ones. Based on this concept, we set the inverse of
communication speed Zi = 1 + 0.1(i − 1), i = 1, 2, 3, ..., N
for processor i. Also, the parameters are set as Tcm = 1 and
Tcp = 4 throughout the whole numerical test.
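The simple background-job generator described above can be sketched as follows. Several details are assumptions made for illustration only: each arrival/departure pair is obtained by drawing two uniform time points and sorting them, and the argument n of the hypervisor function is taken to be the number of background jobs plus one (the main job), so that a processor with no background job runs at its nominal inverse speed Wi. All function and constant names are hypothetical.

```python
import random

SLOTS_PER_UNIT = 100     # each time slot is 0.01 time unit
HORIZON_UNITS = 50       # length of the simulated interval


def generate_background_jobs(num_jobs, rng):
    """Return (arrival, departure) pairs drawn uniformly on the horizon."""
    jobs = []
    for _ in range(num_jobs):
        # Two uniform time points; the earlier one is the arrival.
        arrival, departure = sorted(
            rng.uniform(0.0, HORIZON_UNITS) for _ in range(2))
        jobs.append((arrival, departure))
    return jobs


def inverse_speed_per_slot(jobs, w_nominal):
    """Inverse processing speed seen by the main job in every time slot.

    Assumes even sharing: W(t) = (active background jobs + 1) * w_nominal.
    """
    n_slots = SLOTS_PER_UNIT * HORIZON_UNITS
    w = []
    for k in range(n_slots):
        t = k / SLOTS_PER_UNIT
        active = sum(1 for a, d in jobs if a <= t < d)
        w.append((active + 1) * w_nominal)
    return w
```

For instance, `inverse_speed_per_slot(generate_background_jobs(40, random.Random(0)), 1.0)` yields one value per time slot for a processor with 40 background jobs and nominal W = 1.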
The third subsection illustrates our results for the
stochastic model. The background jobs are generated by an
M/M/1 queuing model instead of the simple method. We
also compare our result with the result in [15]. The comparison
shows that our stochastic result statistically matches the
deterministic result better.
4.1 Time-varying System with A Time-invariant Control
Processor
4.1.1 Solution and Verification
In this subsection, the control processor is time-invariant
while the worker processors are all time-varying. The link
speed is assumed to be time-invariant. Wi is set to be
equal for all processors and denoted as W . Algorithm
I is solved by starting with an initial Tf and then changing the
value of Tf gradually until the sum of all αs is
sufficiently close to 1. In this case, since α0 must be smaller
than 1 and, by equation (4a), $\alpha_0 = \frac{T_f}{W_0 T_{cp}}$,
Tf must be smaller than W0Tcp. So Tf is initialized with its upper
bound W0Tcp and is decreased by one time slot per step until
the solution is reached.
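The downward search over Tf can be sketched as follows. The callable `sum_alphas` is a hypothetical stand-in for one pass of Algorithm I (which is not reproduced here) returning the sum of all α for a trial Tf, and the sketch assumes that this sum grows with Tf. The function returns the two points that bracket a sum of 1.

```python
def search_finish_time(sum_alphas, tf_upper, slots_per_unit=100):
    """Step Tf down from tf_upper, one time slot at a time.

    Returns ((tf_below, sum_below), (tf_above, sum_above)): the first
    point whose sum of alphas is at most 1 and the point one slot
    above it (None if the very first point already qualifies).
    """
    n_slots = round(tf_upper * slots_per_unit)
    above = None
    for k in range(n_slots, 0, -1):
        tf = k / slots_per_unit        # integer slot index avoids drift
        total = sum_alphas(tf)
        if total <= 1.0:
            return (tf, total), above
        above = (tf, total)
    raise ValueError("sum of alphas never dropped to 1")
```

Either bracketing point, or their average, can then be used as the solution, depending on how much accuracy is needed relative to the slot length.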
Fig. 6 shows how the optimal Tf and all
the partitions are obtained through Algorithm I. In this case,
there are 3 worker processors and each worker processor has 40
randomly generated background jobs over the whole 50 time units.
On average there are 0.8 background jobs per
time unit for each processor. W is set to 1 and Tf is
initialized to W0Tcp = 4 in this case. Our solution lies
where the sum-of-alphas curve intersects the line on which
the sum of the alphas equals one.
Table I shows the two solution points for Fig. 6
where the sum of alphas is closest to 1. From the solutions
we can see that the load fractions are ordered as
α0 > α1 > α2 > α3. This is because the processor
with the smaller index finishes communication before
the one with the larger index; in other words, the processor with
the smaller index has more time to process its load. In general,
however, the inequality α1 > α2 > α3 does not always
hold. Since the background jobs are generated randomly
over the whole time interval, it is possible that a
processor with a larger index has fewer background jobs than
a processor with a smaller index during its processing time.
Fewer background jobs mean a higher
average processing speed. As a result, even with less time to
process the load, the processor with the larger index may
take more load due to its higher speed. In particular, processor
0 (the control processor) always takes the largest part
of the load, since it does not require communication and
always has a higher processing speed than the other processors,
because there is no background job on P0. Either of
these two points can be taken as the solution of Algorithm I;
one can also average the two points to obtain the solution.

Fig. 6: Finishing time vs the partitions of each processor by
Algorithm I

TABLE 1: Two closest solution points for Fig. 6
Tf       α0       α1       α2       α3       sum
2.0800   0.5200   0.2182   0.1583   0.1000   0.9965
2.0900   0.5225   0.2182   0.1583   0.1077   1.0067
To verify the accuracy of Algorithm I, we use Algorithm
I to solve a time-invariant case where there is no background
job and compare the result with the solution generated from
equations (1a) to (1c) using the same parameters mentioned
before. Table II shows the solution of equations (1a) to (1c),
while Table III is the closest point found by Algorithm I. One can
see that the solution by Algorithm I matches the solution of
equations (1a) to (1c) well.

TABLE 2: Solutions of equation (1)
Tf       α0       α1       α2       α3       sum
1.4070   0.3517   0.2759   0.2122   0.1602   1.0000

TABLE 3: Closest solution point by Algorithm I without
background job
Tf       α0       α1       α2       α3       sum
1.4100   0.3525   0.2755   0.2117   0.1600   0.9996
4.1.2 System Evaluation
Two criteria are used to evaluate the time-varying system
with a time-invariant control processor: finishing time and
speedup. We examine how the system performs under these two
criteria as the number of processors and the number of
background jobs change. When the number of processors is varied,
the number of background jobs is set to 40 for each
worker processor throughout the 50 time units; when the
number of background jobs is varied, the number of processors
is set to 4 (including the control processor). The
definition of speedup is introduced later in
this subsection. For each number
of background jobs or processors, we run Algorithm I
1000 times and average these trials to get a stable result.
Fig. 7a shows how the finishing time varies with an
increasing number of background jobs. The
finishing time increases as the number of background jobs
increases, which makes sense since more background jobs
mean less processing power allocated to the main job on a given
processor. Also, recall that all processors share the same inverse
processing speed W when there is no background job; a
higher W (meaning lower speed) makes the
system finish the job more slowly. This is shown in both
Fig. 7a and Fig. 7b, where a higher W gives a higher finishing
time. Fig. 7b shows the finishing time versus the number of
processors when each worker processor has 40 background jobs
in total. With more processors sharing the same amount of
work, the job finishes faster, as shown in Fig. 7b.
Since parallelism can accelerate the processing, one may
wonder how much faster the parallel system can be compared
with the sequential system. As defined by the well-known
Amdahl’s law [21,22], speedup is the ratio of sequential
processing time to parallel processing time for the
same amount of load, which is:
$$\text{Speedup} = \frac{T_{fs}}{T_{fp}} \tag{12}$$
where Tfs is the finishing time with a single processor while
Tfp is the finishing time with multiple parallel processors.
The speedup reflects how much faster the parallel system
is compared with the sequential system. By taking the
control processor as the single sequential processor, Tfs can
be obtained by:
$$T_{fs} = 1 \cdot W T_{cp} = W T_{cp} \tag{13}$$
From equations (12) and (13), the speedup is expected to
increase with the number of processors.
Increasing the number of processors means an increase of
the parallelism in the system, which results in a higher
speedup value.
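As a rough worked example of equations (12) and (13), consider the time-invariant verification case of Table II (W = 1, Tcp = 4, one control processor and three worker processors, no background jobs). This is only an illustration using values already reported above, not an additional result:

```latex
T_{fs} = W T_{cp} = 1 \times 4 = 4, \qquad
T_{fp} = 1.4070, \qquad
\text{Speedup} = \frac{T_{fs}}{T_{fp}} = \frac{4}{1.4070} \approx 2.84
```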
To verify our expectations, Fig. 8 shows how
speedup varies with the number of processors. The speedup
increases with the number of processors, which matches
our expectation. For the relationship between W and
speedup, the figure shows that a higher W
results in a higher speedup. This is because Tfs is linear
in W and is therefore more sensitive to W than Tfp is; in
other words, Tfs changes faster than Tfp when W changes.
Hence, for a given number of processors, a higher W
results in a higher speedup. Put differently, parallelism brings
a larger benefit to a slower system.

Fig. 7: For a time-varying system with a time-invariant
control processor: (a) number of background jobs vs
finishing time; (b) number of processors vs finishing time.
4.2 Time-varying System with Time-varying Control
Processor, Processing and Communication Speed
4.2.1 Solutions and Verification
In this subsection, there are background jobs at the control
processor as well. Furthermore, there are interfering
communications, so the control processor
has both time-varying processing speed and time-varying
communication speed. The number of extra connections at the
control processor is set equal to the number of background
jobs on this processor over the whole time interval. Z(t) is
generated in the same manner as W (t), as described at the
beginning of this section.

Fig. 8: For a time-varying system with a time-invariant
control processor: number of processors vs speedup.
Similar to the previous subsection, Fig. 9 and Table IV
show how to get the solution using Algorithm II. There are
one control processor and three worker processors, and each
processor has 40 background jobs. The control processor
also has 40 other incoming and outgoing network connections.
Again, all processors share the same processing speed
when there is no background job, with W = 1 for all.

Fig. 9: Finishing time vs the partitions of each processor by
Algorithm II

TABLE 4: Two closest solution points for Fig. 9
Tf       α0       α1       α2       α3       sum
4.8900   0.4244   0.2113   0.1849   0.1774   0.9980
4.9000   0.4257   0.2137   0.1866   0.1777   1.0037

The same method as in the previous subsection is applied
to verify our program. The result is shown in Table V. Again
our solution matches the solution in Table II.

TABLE 5: Closest solution point by Algorithm II without
background job
Tf       α0       α1       α2       α3       sum
1.4110   0.3528   0.2755   0.2117   0.1600   0.9999
4.2.2 System Evaluation
The same two criteria, finishing time and speedup, are used
to evaluate the time-varying system with time-varying control
processor, processing and communication speed. Again
we change the number of processors and background jobs
to see how the system performs. The number of background
jobs is set to 40 and the number of processors is set to
four (one control processor and three worker processors)
when the other one is varied. Algorithm II is also averaged
over 1000 trials for a stable result.
Fig. 10 shows how the finishing time varies with an
increasing number of processors and background jobs for
three different W values. As in the previous subsection, the
finishing time increases with the number of
background jobs but decreases with the number
of processors, for the same reasons.
In the case of speedup, the situation is more complicated since
our reference single sequential processor P0 is now also
time-varying. Equation (12) is still used to define speedup, but
equation (13) cannot be used to obtain Tfs in this case. Instead,
by taking α0 = 1 in equation (6a), Tfs can be obtained by
solving the following equation:
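$$T_{fs} = 1 \cdot \bar{W}_0(t)\, T_{cp} = \frac{T_{fs}}{\int_0^{T_{fs}} \frac{1}{W_0(t)}\,dt}\, T_{cp} \implies \frac{1}{T_{cp}} \int_0^{T_{fs}} \frac{1}{W_0(t)}\,dt = 1 \tag{14}$$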
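Equation (14) can be solved numerically by accumulating the integral slot by slot. The sketch below assumes that W0(t) is piecewise constant on time slots of length 0.01 and is supplied as a list of per-slot values; the function name and arguments are hypothetical. For a constant W0 it reduces to equation (13).

```python
def sequential_finish_time(w0_slots, t_cp, slot=0.01):
    """Smallest T with (1 / t_cp) * integral_0^T dt / W0(t) = 1.

    w0_slots[k] is the (assumed constant) value of W0(t) on slot k.
    """
    done = 0.0                           # fraction of the load processed so far
    for k, w in enumerate(w0_slots):
        rate = 1.0 / (w * t_cp)          # load fraction processed per unit time
        if done + rate * slot >= 1.0:
            # The load completes inside this slot: interpolate linearly.
            return k * slot + (1.0 - done) / rate
        done += rate * slot
    raise ValueError("load not finished within the given horizon")
```

With W0 = 1 in every slot and Tcp = 4 the function returns approximately 4, matching Tfs = W Tcp for the time-invariant control processor.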
Fig. 11 demonstrates the relationship between speedup and
the number of processors. One can see that speedup will
increase as the number of processors increases. This is
similar to Fig. 8 and also meets our expectation.
4.3 Stochastic Model
In this subsection the background jobs are generated by an
M/M/1 model. The generation is similar to the method
described in Section 3.2. The starting state for each processor is
taken to be zero for simplicity. Both the case where the control
processor is time-varying and the case where it is time-invariant
are tested. In the test, we label the result generated by the
simulation-based method as “simulation-based”, the result
provided by [15] as “before correction”, and our simplified
iterative algorithm as “iterative”. Results for 4 processors
(one control processor and three worker processors) are
shown in Fig. 12.
Fig. 10: For a time-varying system with time-varying
control processor: (a) number of background jobs vs
finishing time; (b) number of processors vs finishing time.

Fig. 11: For a time-varying system with time-varying
control processor: number of processors vs speedup.

In Fig. 12 we use box plots of the finishing time as the
criterion to compare the three methods. Details of the box
plot can be found in [26]. Briefly, the box contains
50% of the data, and its lower and upper boundary lines
are at the 25% and 75% quantiles of the data. A central line
indicates the median of the data and outliers
are plotted as dots. The median of each data set is picked
as the stochastic solution. In this case, the starting states
for all processors are set to zero. Here
λ is set to 0.1 and µ is set to 0.125. Based on these
settings, the scheduling will in general be finished before the
M/M/1 queue reaches its average state number. As a
result, the “before correction” method delivers a higher
finishing time, since it assumes an incorrectly high number of
background jobs, which results in a slower processing speed in
general. From the figure we can see that our “iterative” method
delivers a result statistically similar to the “deterministic”
result, which is lower than that of the “before correction”
method, both when the control processor is time-varying and when
it is not. In this case there are only 4 processors and the
running times of the two algorithms are quite close. The
simulation-based method turns out to be a better solution than
the simplified iterative one since it is more accurate. Another
case with more processors is shown in Fig. 13.
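To make the remark about the average state concrete, the standard steady-state mean of an M/M/1 queue can be evaluated for these parameters. This sketch assumes that the "average state number" refers to that steady-state mean; the symbols ρ and N̄ are introduced here only for illustration:

```latex
\rho = \frac{\lambda}{\mu} = \frac{0.1}{0.125} = 0.8, \qquad
\bar{N} = \frac{\rho}{1-\rho} = \frac{0.8}{0.2} = 4, \qquad
\frac{1}{\lambda} = 10 \ \text{time units between arrivals on average}
```

Starting from state zero, a queue whose arrivals are on average 10 time units apart is unlikely to hold four background jobs if the schedule finishes within a few time units, as it does in the deterministic tests above. This is consistent with the “before correction” method overestimating the background load.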
Fig. 13 shows the result for 15 processors (one
control processor and 14 worker processors). The performance
of the simplified iterative algorithm is similar to the
case with only 4 processors. However, in this case
running the simulation-based algorithm is far more
time-consuming (2213 seconds for Fig. 13a and 2418 seconds
for Fig. 13b), while the simplified iterative algorithm takes
much less time (182 seconds for Fig. 13a and 267 seconds for
Fig. 13b).
One thing to note is that once the system parameters λ
and µ are fixed, the only factor that influences the stochastic
model is the starting state of each processor. This is quite
different from the deterministic case, which depends on
the actual realization of the background jobs in each trial.

Fig. 12: Result for 4 processors: (a) time-invariant control
processor; (b) time-varying control processor, processing
and communication speed.

5 CONCLUSION
This paper studied optimal divisible load scheduling on a
time-varying single level tree network. The time-varying
processing speed and channel speed were transformed into
equivalent time-invariant ones. The deterministic analysis
was studied first, where the arrival and departure times are
known. To obtain the optimal partition for each processor,
two recursive algorithms were developed, one for the case where
the control processor is time-invariant and one for the case
where it is time-varying. For the
stochastic analysis, the arrivals and departures of background
jobs are modeled as an M/M/1 queue and two
algorithms are provided to solve the scheduling problem.
Extensive numerical tests were performed to demonstrate
the relationships between finishing time, speedup, number of
background jobs and number of processors.
Future enhancements of this research can be pursued
in the context of other network topologies, such as
multi-level tree or mesh networks. Also, the system model can be
extended to handle more complicated cases, such as a general
distribution of the arrivals and departures of background
jobs/transmissions in the stochastic analysis.

Fig. 13: Result for 15 processors: (a) time-invariant control
processor; (b) time-varying control processor, processing
and communication speed.

REFERENCES
[1] M. Drozdowski, Scheduling for Parallel Processing, Computer
Communications and Networks, Springer-Verlag London Limited, 2009.
[2] T.G. Robertazzi, Ten Reasons to Use Divisible Load Theory, IEEE
Computer, vol. 36, no. 5, 2003, pp. 63-68.
[3] V. Bharadwaj et al., Scheduling Divisible Loads in Parallel and
Distributed Systems, IEEE CS Press, 1996.
[4] Cheng, Y.C. and Robertazzi, T.G., Distributed Computation with
Communication Delays, IEEE Transactions on Aerospace and Electronic
Systems, vol. 24, no. 6, 1988, pp. 700-712.
[5] T.G. Robertazzi, Processor Equivalence for Daisy Chain Load Sharing
Processors, IEEE Transactions on Aerospace and Electronic Systems,
vol. 29, no. 4, 1993, pp. 1216-1221.
[6] Sohn, J. and Robertazzi, T.G., Optimal Load Sharing for a Divisible
Job on a Bus Network, IEEE Transactions on Aerospace & Electronic
Systems Vol. 32, No. 1, Jan. 1996, pp. 34-40.
[7] Kim, H.J., Jee, G.-I. and Lee, J.G., Optimal Load Distribution for Tree
Network Processors, IEEE Transactions on Aerospace and Electronic
Systems, Vol. 32, No. 2, April 1996, pp. 607-612.
[8] Sohn, J., Robertazzi, T.G. and Luryi, S., Optimizing Computing Costs
using Divisible Load Analysis, IEEE Transactions on Parallel and
Distributed Systems, Vol. 9, No. 3, March 1998, pp. 225-234.
[9] H.J. Kim, V. Mani, Divisible load scheduling in single-level tree net-
works: Optimal sequencing and arrangement in the nonblocking mode
of communication, Computers & Mathematics with Applications,
Volume 46, Issue 10, 2003, Pages 1611-1623.
[10] Bharadwaj, V., Ghose, D. and Mani, V., Optimal Sequencing and
Arrangement in Distributed Single-Level Tree Networks with
Communication Delays, IEEE Transactions on Parallel and Distributed
Systems, Vol. 5, No. 9, Sept. 1994, pp. 968-976.
[11] Li, X., Bharadwaj, V. and Ko, C.C., Optimal Divisible Task Schedul-
ing on Single-Level Tree Networks with Finite Size Buffers, Accepted
for publication in IEEE Transactions on Aerospace and Electronic
Systems, February 2000.
[12] Li, X., Bharadwaj, V. and Ko, C.C., Divisible Load Scheduling on
Single Level Tree Networks with Buffer Constraints, IEEE Transactions
on Aerospace and Electronic Systems. vol. 36, no. 4, Oct. 2000, pp.
1298-1308.
[13] Beaumont, O., Casanova, H., Legrand, A., Robert, Y. and Yang, Y.,
Scheduling Divisible Loads on Star and Tree Networks: Results and Open
Problems, IEEE Transactions on Parallel and Distributed Systems,
vol. 16, no. 3, March 2005, pp. 207-218.
[14] Xiaolin, L. and Veeravalli, B., A Processor-Set Partitioning and Data
Distribution Algorithm for Handling Divisible Loads from Multiple Sites
in Single-Level Tree Networks, Cluster Computing, vol. 13, 2010,
pp. 31-46.
[15] J. Sohn and T. G. Robertazzi, Optimal time-varying load sharing
for divisible loads, IEEE Transactions on Aerospace and Electronic
Systems, vol. 34, no. 3, pp. 907-923, Jul 1998.
[16] R. Agrawal and H. V. Jagadish, Partitioning techniques for
large-grained parallelism, IEEE Transactions on Computers, vol. 37,
no. 12, pp. 1627-1634, Dec 1988.
[17] S. Bataineh and T.G. Robertazzi, Bus-oriented load sharing for a
network of sensor driven processors, IEEE Transactions on Systems,
Man, and Cybernetics, 21(5):1202-1205, 1991.
[18] V. Bharadwaj, D. Ghose, V. Mani, and T.G. Robertazzi, Scheduling
Divisible Loads in Parallel and Distributed Systems, IEEE Computer
Society, Los Alamitos, CA, 1996.
[19] J. Blazewicz and M. Drozdowski, Distributed processing of divisible
jobs with communication startup costs, Discrete Applied Mathematics,
76(1-3):21-41, 1997.
[20] J. Sohn and T.G. Robertazzi, Optimal divisible job load sharing for bus
networks, IEEE Transactions on Aerospace and Electronic Systems,
32(1):34-40, 1996.
[21] G. Amdahl, Validity of the Single Processor Approach to Achieving
Large Scale Computing Capabilities, AFIPS Conference Proceedings,
30(8):483-485, 1967.
[22] M. D. Hill and M. R. Marty, Amdahl's Law in the Multicore Era, IEEE
Computer, 41(7):33-38, 2008.
[23] J. Berlińska, Scheduling data gathering with variable communication
speed, Proceedings of the First International Workshop on Dynamic
Scheduling Problems, Poznań, 2016, pp. 29-32.
[24] E. Uchiteleva, A. Shami and A. Refaey, Virtualization of Wireless
Sensor Networks Through MAC Layer Resource Scheduling, IEEE
Sensors Journal, vol. 17, no. 5, pp. 1562-1576, March 2017.
[25] J. Ordonez-Lucena, P. Ameigeiras, D. Lopez, J. J. Ramos-Munoz,
J. Lorca and J. Folgueira, Network Slicing for 5G with SDN/NFV:
Concepts, Architectures, and Challenges, IEEE Communications
Magazine, vol. 55, no. 5, pp. 80-87, May 2017.
[26] McGill, R., J. W. Tukey, and W. A. Larsen, Variations of Box Plots,
The American Statistician, Vol. 32, No. 1, 1978, pp. 12-16.
Fei Wu received the BE degree in information
and telecommunication engineering from Xi’an
Jiaotong University, Xi’an, China, in 2012, and
the MS degree in electrical engineering from
Stony Brook University, Stony Brook, New York,
in 2013. He is currently working toward the
PhD degree in electrical engineering at Stony
Brook University. His research interests include
scheduling, parallel processing, computer net-
works and virtualization.
Yang Cao received the BE degree in Electrical
Engineering and Automation from Northwestern
Polytechnical University, Xi’an, China, in June
2012. She also received the MS degree in Electrical
Engineering from Stony Brook University, Stony
Brook, New York, in December 2013. Currently
she is working toward the PhD degree in Electrical
Engineering at Stony Brook University. Her
research interests include task scheduling and
resource allocation in distributed systems, cloud
networks, data centers, etc.
Thomas G. Robertazzi received the BEE de-
gree from Cooper Union, New York, in 1977
and the PhD degree from Princeton University,
Princeton, New Jersey, in 1981. He is presently
a professor in the Department of Electrical
and Computer Engineering, Stony Brook Univer-
sity, Stony Brook, New York. He has published
extensively in the areas of parallel process-
ing scheduling, telecommunications and perfor-
mance evaluation. He has also authored, co-
authored or edited six books in the areas of
networking, performance evaluation, scheduling and network planning.
He is a fellow of the IEEE and since 2008 co-chair of the Stony Brook
University Senate Research Committee.
