Scheduling Divisible Loads to Optimize the Computation Time
and Cost
Natalia V. Shakhlevich
School of Computing, University of Leeds, Leeds LS2 9JT, U.K.
Abstract
Efficient load distribution plays an important role in grid and cloud applications. In
a typical problem, a divisible load should be split into parts and allocated to several
processors, with one processor responsible for the data transfer. Since processors have
different speed and cost characteristics, selecting the processor order for the transmission
and defining the chunk sizes affect the computation time and cost. We perform a
systematic analysis of the model, examining the properties of Pareto optimal solutions.
We demonstrate that the earlier research has a number of limitations. In particular,
it is generally assumed that the load should be distributed so that all processors have
equal completion times, while in fact there often exists a dominating schedule with
non-simultaneous finishing times of the processors. Moreover, fixing the processor sequence in
the non-decreasing order of the cost characteristic may be appropriate only for Pareto-optimal
solutions with relatively large deadlines; optimal schedules for tight deadlines
may have a different order of processors. We conclude with an efficient algorithm for
finding the time-cost tradeoff.
Keywords: scheduling, divisible load, time/cost optimization
1 Introduction
Parallel computer systems have given rise to new scheduling models that go beyond the
classical scheduling theory. While in a traditional scheduling model a task can be processed
by one machine at a time, a new feature of multiprocessor computations is the ability to
split tasks into several parts and to process them simultaneously by different processors, see,
e.g., [5, 8]. An additional feature of modern Grid computing and cloud computing systems
is the introduction of the cost factor, see, e.g. [2, 6, 10]. This study is motivated by the lack
of theoretical research in the area and some inaccuracies which can be found in the earlier
research.
We consider the network model described in [7]. There is a set P = {P1, P2, . . . , Pm}
of m processors connected via a bus type communication medium. One processor of the
set P is selected as a master processor to receive a divisible load of size τ and to divide it
into portions of size $\alpha_1\tau, \alpha_2\tau, \ldots, \alpha_m\tau$, with $\sum_{k=1}^{m} \alpha_k = 1$, which are then transmitted to slave
processors from P to perform required computations.
The processors have different computation speeds and for each processor Pk ∈ P the
inverse of the speed wk is given. This implies that the load of size αkτ allocated to processor
Pk requires computation time αkwkτ .
If P1 is selected as a master processor and the transmission sequence is P2, P3, . . ., Pm,
then P1 can start processing its own load of size α1τ at time 0 and at the same time it can
start transmitting the relevant portions of the load first to P2, then to P3, etc., until the last
portion is transmitted to Pm, see Fig. 1. If z is the time needed to transmit the whole load
of size τ , then the communication time for transmitting the portion αkτ to processor Pk is
αkz.
Figure 1: An example of a schedule with master processor P1 and transmission sequence
P2, . . . , Pm
With the selected transmission order, processor P1 completes its portion of computation
at time
T1 = α1w1τ. (1)
Processor Pk, 2 ≤ k ≤ m, receives its portion of the load at time $\sum_{i=2}^{k} \alpha_i z$ and immediately
after that it can start computation, which takes αkwkτ time. Thus processor Pk completes
its portion of the load at time
$$T_k = \sum_{i=2}^{k} \alpha_i z + \alpha_k w_k \tau.$$
The finish time T of the load is defined as the makespan of the schedule; it is equal to the
maximum completion time among all processors,
$$T = \max_{1 \le k \le m} \{T_k\}. \tag{2}$$
It is assumed in the described scenario that the master processor can perform data trans-
mission and computation simultaneously. This usually happens if the processor is equipped
with an additional front-end co-processor which takes care of all data transfer so that the
master processor can perform computation as any other processor of the network. In the
absence of a front-end co-processor, the master processor performs data transmission first
and only after that it can start computing its portion of the load. In the latter scenario,
Fig. 1 should be modified so that for processor P1 the box “α1w1τ” is moved immediately
after “αmz”, and formula (1) should be replaced by
$$T_1 = \sum_{i=2}^{m} \alpha_i z + \alpha_1 w_1 \tau. \tag{3}$$
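The completion-time formulas (1)-(3) can be illustrated with a minimal sketch. The function names and the list layout (master processor at index 0, slaves following in transmission order) are assumptions made for this illustration only; they are not part of the model description.

```python
def finish_times(alpha, w, z, tau, front_end=True):
    """Completion times T_1, ..., T_m for a fixed transmission sequence.

    Illustrative layout (an assumption): alpha[0], w[0] describe the master,
    the remaining entries describe the slaves in transmission order.
    """
    slave_times = []
    sent = 0.0  # total fraction of the load transmitted so far
    for k in range(1, len(alpha)):
        sent += alpha[k]
        # T_k = sum_{i=2..k} alpha_i * z + alpha_k * w_k * tau
        slave_times.append(sent * z + alpha[k] * w[k] * tau)
    # With a front-end co-processor the master computes from time 0, formula (1);
    # without it, the master computes only after all transmissions, formula (3).
    master_start = 0.0 if front_end else sent * z
    return [master_start + alpha[0] * w[0] * tau] + slave_times


def makespan(alpha, w, z, tau, front_end=True):
    """Finish time T of the load, formula (2)."""
    return max(finish_times(alpha, w, z, tau, front_end))
```

For instance, with loads (0.5, 0.25, 0.25), inverse speeds (2, 1, 1), z = 1 and τ = 4, the master finishes at time 4 with a front end and at time 4.5 without one.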
Processing the load in accordance with the load distribution $\alpha_1, \alpha_2, \ldots, \alpha_m$ incurs a
computation cost which depends on the processors’ costs. Following the notation from [7], we denote
the cost of using processor Pk ∈ P during one time unit by ck, so that the cost of processing
the portion αkτ of the load on processor Pk, which takes αkwkτ time, is ckαkwkτ. The overall cost of using all
processors P is therefore
$$K = \sum_{k=1}^{m} c_k \alpha_k w_k \tau.$$
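As a small illustration of the cost formula, the sketch below evaluates K for a given load distribution. The function name and argument order are hypothetical.

```python
def computation_cost(alpha, w, c, tau):
    """Overall cost K = sum_k c_k * alpha_k * w_k * tau."""
    return sum(ck * ak * wk * tau for ck, ak, wk in zip(c, alpha, w))
```

With loads (0.5, 0.25, 0.25), inverse speeds (2, 1, 1), unit costs (1, 3, 4) and τ = 4, the three processors contribute 4, 3 and 4, so K = 11.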
Thus a schedule S is given by
- the transmission sequence with the first processor of the sequence selected as a master
processor
and
- the load distribution $\alpha_1, \alpha_2, \ldots, \alpha_m$ with $\sum_{k=1}^{m} \alpha_k = 1$.
In this paper we assume that the processors are numbered so that
c1w1 ≤ c2w2 ≤ · · · ≤ cmwm. (4)
The quality of a schedule is measured in terms of two characteristics: maximum
completion time T and computation cost K. As a solution of a bicriteria problem we accept
the set of Pareto optimal points defined by the break-points of the so-called efficiency frontier.
In a pair of the associated single criterion problems,
$$\min K \quad \text{s.t.} \quad T \le \overline{T} \tag{5}$$
and
$$\min T \quad \text{s.t.} \quad K \le \overline{K},$$
one of the objectives is bounded while the other one is to be minimized. Here $\overline{T}$ and $\overline{K}$ are
threshold values of the load finish time and computation cost, respectively.
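To make the notion of Pareto optimality concrete, the following sketch filters a finite set of candidate (T, K) pairs down to the non-dominated ones. It works on discrete points only and is not the frontier construction of Section 2; the function name is hypothetical.

```python
def pareto_points(points):
    """Return the non-dominated (T, K) pairs, sorted by increasing T.

    A pair is dominated if another pair is no worse in both criteria
    and strictly better in at least one.
    """
    frontier = []
    # Sorting by T (ties broken by K) means a point survives only if its
    # cost is strictly lower than that of every point kept so far.
    for T, K in sorted(set(points)):
        if not frontier or K < frontier[-1][1]:
            frontier.append((T, K))
    return frontier
```

For the candidates (4, 10), (3, 12), (5, 10), (3, 15) and (2, 20), the pairs (5, 10) and (3, 15) are dominated and the remaining three are kept.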
2 Finding the Efficiency Frontier
In the (T,K)-space, the set of Pareto-optimal points represents a time-cost efficiency frontier.
We start with an overview of the main outcomes of [7] and then proceed with the description
of additional steps needed to find a correct efficiency frontier.
It is claimed in [7] that all break-points correspond to the schedules of a special type: the
processor sequence is the same for all break-points and it is (P1, P2, . . . , Pm); only a subset
of the several first processors have a non-zero load, while the remaining processors are idle.
Recall that processors are numbered in accordance with (4).
To represent the described schedules formally, introduce notation $(P_1^*, P_2^*, \ldots, P_k^*, -, \ldots, -)$
to indicate that processors $P_1, P_2, \ldots, P_k$ are fully loaded completing computation at
time $T$, while the remaining processors $P_{k+1}, P_{k+2}, \ldots, P_m$ are idle. Then the set of the
break-points established in [7] is of the form:
$$
\begin{array}{l}
(P_1^*, -, -, \cdots, -, -, \cdots, -, -) \\
(P_1^*, P_2^*, -, \cdots, -, -, \cdots, -, -) \\
(P_1^*, P_2^*, P_3^*, \cdots, -, -, \cdots, -, -) \\
\cdots \\
(P_1^*, P_2^*, P_3^*, \cdots, P_k^*, -, \cdots, -, -) \\
(P_1^*, P_2^*, P_3^*, \cdots, P_k^*, P_{k+1}^*, \cdots, -, -) \\
\cdots \\
(P_1^*, P_2^*, P_3^*, \cdots, P_k^*, P_{k+1}^*, \cdots, P_{m-1}^*, -) \\
(P_1^*, P_2^*, P_3^*, \cdots, P_k^*, P_{k+1}^*, \cdots, P_{m-1}^*, P_m^*)
\end{array}
$$
The graphical representation of the efficiency frontier from [7] for the case of m = 3 processors
is shown in Fig. 2. The three break-points, considered right to left, are $(P_1^*, -, -)$,
$(P_1^*, P_2^*, -)$ and $(P_1^*, P_2^*, P_3^*)$. When the transition from $(P_1^*, -, -)$ to $(P_1^*, P_2^*, -)$ is performed,
the load from P1 is re-distributed to P2 until both processors have equal completion times;
the intermediate points belonging to that segment of the efficiency frontier are denoted by
$(P_1^*, P_2, -)$, where notation $P_2$ in the schedule description indicates that processor P2 is
partly loaded. Similarly, when the transition from $(P_1^*, P_2^*, -)$ to $(P_1^*, P_2^*, P_3^*)$ is performed, the
load from P1 and P2 is re-distributed to P3 until all three processors have equal completion
times; the intermediate points belonging to that segment are denoted by $(P_1^*, P_2^*, P_3)$,
where notation $P_3$ indicates that processor P3 is partly loaded, while notation $P_1^*, P_2^*$
implies that the corresponding processors are fully loaded, completing their portions of the load
simultaneously.
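The break-points of this type can be computed directly from the equal-completion-time conditions. The sketch below assumes a master with a front-end co-processor, so that $T_1$ is given by (1); equating consecutive finish times then gives $\alpha_k (z + w_k\tau) = \alpha_{k-1} w_{k-1}\tau$ for $k \ge 2$. The function name and the 0-based list layout are illustrative assumptions.

```python
def equal_finish_breakpoints(w, c, z, tau):
    """Break-points (P1*, ..., Pk*, -, ..., -) for k = 1, ..., m.

    Returns a list of (T, K, alpha) triples; processors are assumed to be
    numbered as in (4) and P1 (index 0) is the master with a front end.
    """
    m = len(w)
    result = []
    for k in range(1, m + 1):
        # Unnormalised loads: each busy processor finishes when its
        # predecessor does, alpha_j * (z + w_j*tau) = alpha_{j-1} * w_{j-1} * tau.
        raw = [1.0]
        for j in range(1, k):
            raw.append(raw[-1] * w[j - 1] * tau / (z + w[j] * tau))
        total = sum(raw)
        alpha = [r / total for r in raw] + [0.0] * (m - k)
        T = alpha[0] * w[0] * tau  # common finish time, formula (1)
        K = sum(c[j] * alpha[j] * w[j] * tau for j in range(m))
        result.append((T, K, alpha))
    return result
```

For two processors with w = (1, 1), c = (1, 2) and z = τ = 1, the two break-points are (T, K) = (1, 1) with all load on P1, and (2/3, 4/3) with loads (2/3, 1/3).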
It appears that the efficiency frontier is more complicated than the one presented in [7]. In
particular, it also includes points with a processor order different from (P1, P2, . . . , Pm).
In fact, the efficiency frontier can be found as the set of non-dominated segments of m
curves Cℓ, ℓ = 1, . . . ,m. Each curve Cℓ consists of linear segments and corresponds to
a processor sequence with a fixed master processor Pℓ. As we prove in the appendix, in
the class of schedules with a fixed master processor Pℓ, an optimal processor sequence is
(Pℓ, P1, P2, . . . , Pℓ−1, Pℓ+1, . . . , Pm). If ℓ > 1, then the first ℓ − 1 break-points (considered in
the (T,K)-space from right to left) correspond to schedules in which the master processor
Pℓ performs only data transmission and does not perform any computation; the next break-point
involves all ℓ processors fully loaded, so that the master processor Pℓ performs both
data transmission and computation; in the remaining m − ℓ schedules, the first ℓ processors are
fully loaded together with an increasing number of additional slave processors with indices
larger than ℓ.
Formally, the break-points of the curve $C_\ell$ with a fixed master processor $P_\ell$ are of the
following form.
Processor $P_\ell$ does not perform any computation, only data transmission:
$$
\begin{array}{l}
(\overline{P}_\ell, P_1^*, -, -, \cdots, -, -, \cdots, -, -) \\
(\overline{P}_\ell, P_1^*, P_2^*, -, \cdots, -, -, \cdots, -, -) \\
(\overline{P}_\ell, P_1^*, P_2^*, P_3^*, \cdots, -, -, \cdots, -, -) \\
\cdots \\
(\overline{P}_\ell, P_1^*, P_2^*, P_3^*, \cdots, P_{\ell-1}^*, -, \cdots, -, -)
\end{array}
$$
Processor $P_\ell$ performs computation (until $T$) and data transmission:
$$
\begin{array}{l}
(P_\ell^*, P_1^*, P_2^*, P_3^*, \cdots, P_{\ell-1}^*, -, \cdots, -, -) \\
(P_\ell^*, P_1^*, P_2^*, P_3^*, \cdots, P_{\ell-1}^*, P_{\ell+1}^*, \cdots, -, -) \\
\cdots \\
(P_\ell^*, P_1^*, P_2^*, P_3^*, \cdots, P_{\ell-1}^*, P_{\ell+1}^*, \cdots, P_{m-1}^*, -) \\
(P_\ell^*, P_1^*, P_2^*, P_3^*, \cdots, P_{\ell-1}^*, P_{\ell+1}^*, \cdots, P_{m-1}^*, P_m^*)
\end{array}
$$
Figure 2: Efficiency frontier defined in [7] for the case of m = 3 processors and the associated
schedules (idle processors are omitted)
Here notation $\overline{P}_\ell$, which appears in the first ℓ − 1 schedules, indicates that processor Pℓ
performs only data transmission and no computation.
The intermediate points of the segments connecting the first ℓ−1 break-points correspond
to the re-distribution of the load to one additional slave processor, without involving the
master processor Pℓ in the computation; the previously loaded slave processors complete their
load simultaneously. The transition from the break-point $(\overline{P}_\ell, P_1^*, P_2^*, \ldots, P_{\ell-1}^*, -, \ldots, -)$ to
the ℓ-th break-point $(P_\ell^*, P_1^*, P_2^*, \ldots, P_{\ell-1}^*, -, \ldots, -)$ corresponds to the reallocation of the
load from the slave processors $P_1, P_2, \ldots, P_{\ell-1}$ to the master processor Pℓ, keeping the slave
processors completing their computation simultaneously. Finally, the intermediate points
of the segments connecting the last m − ℓ break-points correspond to the re-distribution of the load
to one additional slave processor that follows the current busy processors in the processor
sequence (Pℓ, P1, P2, . . . , Pℓ−1, Pℓ+1, . . . , Pm); all previously loaded processors complete their
load simultaneously.
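The break-points of a curve with a fixed master can be sketched in the same way. The code below assumes a master with a front-end co-processor and uses a normalisation device that is not taken from the paper: since every equal-finish condition is linear in the common finish time, loads are first computed for a unit finish time and then rescaled so that they sum to 1. The function name and the 0-based index `master` (the master is the processor with index ℓ − 1) are illustrative assumptions.

```python
def curve_breakpoints(w, c, z, tau, master):
    """Break-points of the curve with a fixed master, listed right to left.

    Processors are assumed to be numbered as in (4); the slave sequence is
    the index order with the master removed. Returns (T, K, alpha) triples.
    """
    m = len(w)
    slaves = [k for k in range(m) if k != master]

    def point(n_slaves, master_computes):
        raw = [0.0] * m  # loads for a unit common finish time
        prev = None
        for k in slaves[:n_slaves]:
            if prev is None:
                raw[k] = 1.0 / (z + w[k] * tau)  # first slave: a*(z + w*tau) = 1
            else:
                raw[k] = raw[prev] * w[prev] * tau / (z + w[k] * tau)
            prev = k
        if master_computes:
            raw[master] = 1.0 / (w[master] * tau)  # master: a*w*tau = 1
        total = sum(raw)
        alpha = [r / total for r in raw]
        T = 1.0 / total  # rescaled common finish time
        K = sum(c[k] * alpha[k] * w[k] * tau for k in range(m))
        return T, K, alpha

    points = []
    for n in range(1, master + 1):  # master only transmits
        points.append(point(n, False))
    for n in range(master, m):      # master also computes
        points.append(point(n, True))
    return points
```

For w = (1, 1), c = (1, 2), z = τ = 1 and P2 as the master (`master=1`), the sketch yields the break-points (T, K) = (2, 1) and (2/3, 5/3); the first is dominated by the point (1, 1) obtained with P1 as the master.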
Figure 3: Three curves
$C_1$ connecting $(P_1^*, -, -)$, $(P_1^*, P_2^*, -)$, $(P_1^*, P_2^*, P_3^*)$
$C_2$ connecting $(\overline{P}_2, P_1^*, -)$, $(P_2^*, P_1^*, -)$, $(P_2^*, P_1^*, P_3^*)$
$C_3$ connecting $(\overline{P}_3, P_1^*, -)$, $(\overline{P}_3, P_1^*, P_2^*)$, $(P_3^*, P_1^*, P_2^*)$
and the trade-off curve (in solid lines) consisting of non-dominated segments and their parts
An example of the three curves C1, C2, and C3 for the three-processor case is shown in
Fig. 3. The resulting efficiency frontier consisting of non-dominated solutions is represented
as solid lines. The efficiency frontier consists of the following components, listed from right
to left:
(i) the right-most segment of the curve $C_1$ that connects $(P_1^*, -, -)$ and $(P_1^*, P_2^*, -)$;
(ii) a part of the second segment of $C_1$ that connects $(P_1^*, P_2^*, -)$ and $(P_1^*, P_2^*, P_3^*)$ until its
intersection point with the first segment of $C_2$;
(iii) a part of the segment of the curve $C_2$ connecting $(\overline{P}_2, P_1^*, -)$ and $(P_2^*, P_1^*, -)$ starting
at the right end with the previously defined intersection with $C_1$;
(iv) the full segment of the curve $C_2$ connecting $(P_2^*, P_1^*, -)$ and $(P_2^*, P_1^*, P_3^*)$;
(v) a part of the last segment of the curve $C_3$ connecting $(\overline{P}_3, P_1^*, P_2^*)$ and $(P_3^*, P_1^*, P_2^*)$;
its right-most T-value corresponds to the T-value of the left end $(P_2^*, P_1^*, P_3^*)$ of the
previous segment.
Notice that the resulting efficiency frontier is not convex and not even continuous.
While it is possible to prove that some points of the curves $C_1, C_2, \ldots, C_m$ always dominate
others (for example, $(P_1^*, -, -, \ldots, -)$ always dominates $(\overline{P}_k, P_1^*, -, \ldots, -)$ for any
$1 < k \le m$), the dominance relation between other points can vary depending on the specific $c_i$-
and $w_i$-values. For example, in the three-processor case, there may be no intersection point
between curves C1 and C2, so that the whole curve C1 dominates all points of the curve C2.
We demonstrate in the full version of the paper that for each curve $C_\ell$, all its break-points
can be found in $O(m^2)$ time since each subsequent break-point can be defined from
the previous one in $O(m)$ time by re-calculating the associated $\alpha_i$-values, $1 \le i \le m$. Thus
all break-points of the curves $C_1, C_2, \ldots, C_m$ can be found in $O(m^3)$ time.
Having constructed $m(m-1)$ segments of the curves $C_1, C_2, \ldots, C_m$, the required efficiency
frontier is found as the lower boundary among the curves.
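The lower boundary can be evaluated pointwise, as in the sketch below. It is not the geometric construction of the full paper: it only returns the cheapest cost over all curves for one given deadline. The sketch assumes that each curve is supplied as a list of (T, K) break-points joined by linear segments, that a curve offers no schedule for deadlines below its left-most break-point, and that for deadlines beyond its right-most break-point the cost of that break-point still applies. All function names are hypothetical.

```python
import bisect
import math


def curve_cost_at(T, breakpoints):
    """Cost on one piecewise-linear curve for a deadline T."""
    pts = sorted(breakpoints)
    Ts = [p[0] for p in pts]
    if T < Ts[0]:
        return math.inf          # deadline too tight for this curve
    if T >= Ts[-1]:
        return pts[-1][1]        # right-most schedule remains feasible
    i = bisect.bisect_right(Ts, T)  # Ts[i-1] <= T < Ts[i]
    (T0, K0), (T1, K1) = pts[i - 1], pts[i]
    return K0 + (K1 - K0) * (T - T0) / (T1 - T0)


def frontier_cost_at(T, curves):
    """Lower boundary of all curves at deadline T."""
    return min(curve_cost_at(T, pts) for pts in curves)
```

Evaluating `frontier_cost_at` on a grid of deadlines and recording which curve attains the minimum gives a quick numerical check of where the frontier switches from one master processor to another.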
3 Conclusions
In this paper, we have performed a systematic analysis of the problem of scheduling a divisible
load on m processors in order to minimize the computation time and cost. An efficient
algorithm for solving the bicriteria version of the problem defines optimal processor sequences for
different segments of the efficiency frontier and the corresponding optimal load distribution
among the processors.
Our study demonstrates that the earlier research [7] has a number of limitations. Some
assumptions result in incorrect major conclusions. In particular, it is generally assumed in [7]
that the load should be distributed so that all processors complete their portions simultane-
ously, while as we show, there often exists a dominating schedule with non-simultaneous fin-
ishing times of the processors. Moreover, fixing the processor sequence in the non-decreasing
order of the cost/speed characteristic given by (4) may be appropriate only for Pareto-optimal
solutions with relatively large deadlines; optimal schedules for tight deadlines may have a
different order of processors with master processor Pℓ, 1 < ℓ ≤ m, moved in front of slave
processors P1, P2, . . . , Pℓ−1, Pℓ+1, . . . , Pm.
The described model with a single divisible load provides a foundation for more advanced
models which better describe various real-world scenarios. Further generalizations
include multiple divisible loads, bandwidth-dependent formulae for calculating transmission
times, multi-installment load distribution, multi-round schedules and more complex network
topologies. An attempt to generalize the results for the case of a more complex cost function
that includes data transmission costs in addition to computation costs is presented in [3].
Clearly, a study of more complex models should rely on accurate analysis of the simplified
model.
Appendix
The validity of the described algorithm follows from a number of properties of optimal
schedules. The properties are proved for a single-criterion version of the problem (5) for a
fixed makespan parameter T . Since T may take different values, the properties are correct
for all schedules of the efficiency frontier.
The first two propositions provide a justification for fixing a processor sequence in an
optimal solution; the third proposition establishes how the load should be distributed in an
optimal solution.
We assume that processors are numbered in accordance with (4). Initially we consider an
arbitrary processor sequence which can be different from the sequences listed in Section 2.
Proposition 1 ‘Swapping Two Neighbour Slave Processors’
Consider schedule S in which two neighbour slave processors Pi and Pk in the processor
sequence compute portions of load αi and αk and have finishing times Ti and Tk, respectively.
It is always possible to change the order of Pi and Pk in the processor sequence so that in a
new schedule $S'$ the loads are $\alpha_i'$ and $\alpha_k'$, processor finish times are $T_i'$ and $T_k'$, and
(a) the loads are re-distributed so that $\alpha_i' = \alpha_i - \delta$ and $\alpha_k' = \alpha_k + \delta$ for $0 \le \delta \le \alpha_i$;
(b) the load on other processors remains the same;
(c) the maximum finish time of processors Pi and Pk does not increase:
$$\max\{T_i', T_k'\} \le \max\{T_i, T_k\}.$$
Proof. Introduce notation Θ for $\max\{T_i, T_k\}$. We consider two cases depending on
whether processor Pi finishes its portion of the load earlier than Pk or not.
Case 1: in the initial schedule S, $T_i \le T_k$. This implies that
$$\alpha_i w_i \tau \le \alpha_k (z + w_k \tau). \tag{6}$$
In the initial schedule S, we denote by H the length of the time interval from the start
of $\alpha_i z$ until $\Theta = T_k$,
$$H = \alpha_i z + \alpha_k (z + w_k \tau).$$
Consider schedule $S'$ obtained from S by swapping Pk and Pi. If in $S'$ condition (c) is
satisfied, then Proposition 1 holds. Otherwise we have a schedule shown in the right-hand
side of Fig. 4, with $T_i' > \Theta$ (notice that $T_k'$ cannot exceed Θ since $\alpha_k (z + w_k \tau) < H$).
In order to achieve condition (c), we need to move part $\delta \le \alpha_i$ of the Pi-load to Pk, so that
in the resulting schedule $S'$ the load on Pi is $\alpha_i' = \alpha_i - \delta$ and the load on Pk is $\alpha_k' = \alpha_k + \delta$.
The following inequalities should be satisfied:
$$
\begin{array}{ll}
\alpha_i' \ge 0, & \text{(load allocated to } P_i \text{ does not become negative)}, \\
\alpha_k' (z + w_k \tau) \le H, & (T_k' \text{ does not exceed } \Theta), \\
\alpha_k' z + \alpha_i' (z + w_i \tau) \le H, & (T_i' \text{ does not exceed } \Theta).
\end{array}
\tag{7}
$$
Figure 4: Changing the sequence of Pi and Pk in Case 1
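It follows that
$$
\begin{array}{lcl}
\alpha_i - \delta \ge 0 & \Rightarrow & \delta \le \alpha_i, \\
(\alpha_k + \delta)(z + w_k \tau) \le \alpha_i z + \alpha_k (z + w_k \tau) & \Rightarrow & \delta \le \dfrac{\alpha_i z}{z + w_k \tau}, \\
(\alpha_k + \delta) z + (\alpha_i - \delta)(z + w_i \tau) \le \alpha_i z + \alpha_k (z + w_k \tau) & \Rightarrow & \delta \ge \alpha_i - \dfrac{\alpha_k w_k}{w_i}.
\end{array}
$$
The second and the third inequalities imply that
$$\alpha_i - \frac{\alpha_k w_k}{w_i} \le \delta \le \alpha_i - \frac{\alpha_i w_k \tau}{z + w_k \tau}, \tag{8}$$
while the first condition $\delta \le \alpha_i$ is redundant since $\frac{\alpha_i z}{z + w_k \tau} \le \alpha_i$. Notice that (8) is feasible
since
$$\frac{\alpha_k w_k}{w_i} \ge \frac{\alpha_i w_k \tau}{z + w_k \tau}$$
by (6).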
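Case 2: in the initial schedule S, $T_i > T_k$:
$$\alpha_i w_i \tau > \alpha_k (z + w_k \tau). \tag{9}$$
In the initial schedule S, we denote by G the length of the time interval from the start
of $\alpha_i z$ until $T_i$,
$$G = \alpha_i (z + w_i \tau).$$
After swapping Pk and Pi, if the load is kept unchanged, we obtain schedule $S'$ with
$T_i' > \Theta$, while $T_k' \le \Theta$. Hence we need to move part $\delta \le \alpha_i$ of the Pi-load to Pk, so that in
the resulting schedule $S'$ the load on Pi is $\alpha_i' = \alpha_i - \delta$ and the load on Pk is $\alpha_k' = \alpha_k + \delta$. We
need to guarantee that inequalities (7) with H replaced by G are satisfied. It follows
that
$$
\begin{array}{lcl}
\alpha_i - \delta \ge 0 & \Rightarrow & \delta \le \alpha_i, \\
(\alpha_k + \delta)(z + w_k \tau) \le \alpha_i (z + w_i \tau) & \Rightarrow & \delta \le \dfrac{\alpha_i (z + w_i \tau)}{z + w_k \tau} - \alpha_k, \\
(\alpha_k + \delta) z + (\alpha_i - \delta)(z + w_i \tau) \le \alpha_i (z + w_i \tau) & \Rightarrow & \delta \ge \dfrac{\alpha_k z}{w_i \tau}.
\end{array}
$$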
Figure 5: Changing the sequence of Pi and Pk in Case 2
It remains to show that condition
$$\frac{\alpha_k z}{w_i \tau} \le \delta \le \min\left\{ \alpha_i, \ \frac{\alpha_i (z + w_i \tau)}{z + w_k \tau} - \alpha_k \right\}$$
is feasible. Indeed, if the smallest value in the right-hand side is $\alpha_i$, then
$$\frac{\alpha_k z}{w_i \tau} < \alpha_i$$
due to (9). Alternatively, if $\alpha_i$ is the largest value in the r.h.s., then
$$\text{r.h.s.} - \text{l.h.s.} = \frac{\alpha_i (z + w_i \tau)}{z + w_k \tau} - \alpha_k - \frac{\alpha_k z}{w_i \tau} = \frac{\alpha_i w_i \tau - \alpha_k (z + w_k \tau)}{w_i \tau (z + w_k \tau)} (z + w_i \tau).$$
The numerator in the last expression is positive due to (9), so that a feasible δ that lies
in-between the l.h.s. and the r.h.s. does exist.
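The two cases of the proof can be checked numerically with a short sketch. The helper names are hypothetical; times are measured from the moment the transmission to the first processor of the pair starts, and the lower bound is clipped at 0 because condition (a) requires δ ≥ 0.

```python
def swap_delta_interval(a_i, a_k, w_i, w_k, z, tau):
    """Feasible range [lo, hi] of delta when P_k is moved in front of P_i."""
    if a_i * w_i * tau <= a_k * (z + w_k * tau):
        # Case 1 (T_i <= T_k): bounds from (8)
        lo = a_i - a_k * w_k / w_i
        hi = a_i * z / (z + w_k * tau)
    else:
        # Case 2 (T_i > T_k): bounds from the feasibility condition of Case 2
        lo = a_k * z / (w_i * tau)
        hi = min(a_i, a_i * (z + w_i * tau) / (z + w_k * tau) - a_k)
    return max(lo, 0.0), hi


def pair_finish(first_load, first_w, second_load, second_w, z, tau):
    """Finish times of two consecutive slaves, relative to the pair's start."""
    t_first = first_load * (z + first_w * tau)
    t_second = (first_load + second_load) * z + second_load * second_w * tau
    return t_first, t_second
```

For example, with αi = 0.5, αk = 0.1 and wi = wk = z = τ = 1 (Case 2), the sketch gives 0.1 ≤ δ ≤ 0.4; for δ = 0.4 the swapped pair finishes at times 1.0 and 0.7, which does not exceed the original Θ = 1.0.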
Proposition 2 ‘Non-decreasing Sequence of ciwi for Slave Processors’
If the master processor Pℓ is fixed, then an optimal processor sequence is (Pℓ, P1, P2, . . . ,
Pℓ−1, Pℓ+1, . . . , Pm).
Proof. Suppose a schedule contains two neighbour slave processors Pi and Pk such that Pi
precedes Pk and $c_i w_i > c_k w_k$. Then such a schedule is not optimal. Indeed, changing the
order of Pi and Pk in the processor sequence, as described in Proposition 1, leads to a schedule
$S'$ without increasing the finish time of the load and with $\alpha_i' = \alpha_i - \delta$, $\alpha_k' = \alpha_k + \delta$. Since the load of
other processors does not change, the computation cost changes from $K$ to $K'$ and
$$K' - K = (c_k w_k - c_i w_i)\, \tau \delta < 0.$$
Given a schedule, let T be its makespan, see (2). Depending on processors’ finish times,
we classify them as fully loaded, partly loaded or idle. Processor Pi is busy if Ti > 0, and it
is idle otherwise. To be precise, we call processor Pi fully loaded if Ti = T and partly
loaded if 0 < Ti < T. Notice that the master processor can be idle if it performs only data
transmission and no computation.
Proposition 3 ‘Unique Partly Loaded Processor’
Consider a class of schedules with master processor Pℓ and an optimal schedule with processor
sequence (Pℓ, P1, P2, . . . , Pℓ−1, Pℓ+1, . . . , Pm). Let k be the largest index among busy
processors, 1 ≤ k ≤ m. Then all processors with smaller indices P1, P2, . . . , Pk−1 are fully
loaded and all processors with larger indices Pk+1, Pk+2, . . . , Pm are idle.
Proof. We first show that there cannot be an idle or partly loaded slave processor, after
which there is another (partly or fully) loaded processor. Suppose in schedule S processor
Pi is idle or it is partly loaded, while Pi+1 has a non-zero load. Then the load of Pi+1 can be
re-distributed by moving part δ of that load from Pi+1 to Pi, $0 < \delta \le \alpha_{i+1}$, so that in the
resulting schedule $S'$,
$$\alpha_{i+1}' = \alpha_{i+1} - \delta, \qquad \alpha_i' = \alpha_i + \delta,$$
see Fig. 6. As a result of this transformation, the finish time of Pi increases (due to the
increase in the transmission time $\alpha_i' z > \alpha_i z$ and the increase in the computation time
$\alpha_i' w_i \tau > \alpha_i w_i \tau$) while the finish time of Pi+1 decreases (due to the decrease in the computation time
$\alpha_{i+1}' w_{i+1} \tau < \alpha_{i+1} w_{i+1} \tau$; the total transmission time does not change since
$(\alpha_i' + \alpha_{i+1}') z = ((\alpha_i + \delta) + (\alpha_{i+1} - \delta)) z = (\alpha_i + \alpha_{i+1}) z$). The finish times of the remaining processors
$P_{i+2}, \ldots, P_m$ do not change. The largest feasible value of δ either makes Pi fully loaded
or makes Pi+1 idle. Since $c_i w_i \le c_{i+1} w_{i+1}$ by (4), the cost of the resulting schedule does not
increase.
Figure 6: Re-distributing the load between two slave processors
In what follows we consider an optimal schedule in which the last slave processor with
non-zero load is Pk, all slave processors with smaller indices are fully loaded and all slave
processors with larger indices are idle. If index ℓ of the master processor Pℓ satisfies ℓ > k,
so that $c_\ell w_\ell \ge c_k w_k$, and that processor has a non-zero load, then we re-distribute the load
from Pℓ to Pk. In the resulting schedule $S'$,
$$\alpha_\ell' = \alpha_\ell - \delta, \qquad \alpha_k' = \alpha_k + \delta,$$
where $0 < \delta \le \alpha_\ell$. This transformation does not affect other processors with non-zero load,
but decreases the cost, see Fig. 7.
Figure 7: Re-distributing the load from Pℓ to Pi, i < ℓ
If in $S'$, Pℓ becomes idle, Proposition 3 is proved. Otherwise Pk becomes fully loaded and
we perform a similar transformation by moving the load from Pℓ to Pk+1, . . . , Pℓ−1 until Pℓ
becomes idle or all slave processors with indices smaller than ℓ become fully loaded.
Finally, consider the case in which index ℓ of the master processor Pℓ satisfies ℓ < k, so that
$c_\ell w_\ell \le c_k w_k$. We re-distribute the load from Pk to Pℓ so that either Pk becomes idle or Pℓ
becomes fully loaded. As a result of this transformation, the finish time of Pk decreases (due
to the decrease in the transmission time $\alpha_k' z < \alpha_k z$ and the decrease in the computation time
$\alpha_k' w_k \tau < \alpha_k w_k \tau$), and the finish time of Pℓ increases (due to the increase in the computation
time $\alpha_\ell' w_\ell \tau > \alpha_\ell w_\ell \tau$). The largest feasible value of δ either makes Pℓ fully loaded or makes
Pk idle; the cost of the resulting schedule does not increase.
It follows from Propositions 1-3 that in a class of schedules with a fixed master processor
Pℓ all optimal schedules have processor order (Pℓ, P1, P2, . . . , Pℓ−1, Pℓ+1, . . . , Pm) and for a
given makespan threshold value T , an optimal schedule can be constructed by loading in full
processors in the order P1, P2, . . . , Pk−1 until the remaining load can be processed by Pk.
Varying the T -values we conclude that all optimal schedules in that class belong to the curve
Cℓ defined in Section 2.
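The loading rule that follows from Propositions 1-3 can be sketched as a greedy procedure. The sketch is one reading of that rule under stated assumptions, not the algorithm of the full paper: the master has a front-end co-processor, processors are numbered as in (4), indices are 0-based (`master` is ℓ − 1), and processors are loaded in index order, each receiving the largest load it can finish by the deadline `T`. The function name and the tolerance are hypothetical.

```python
def greedy_schedule(w, z, tau, master, T, tol=1e-12):
    """Load distribution for deadline T with a fixed master, or None.

    Slaves receive their loads in index order; the master computes from
    time 0 and takes its share after all cheaper slaves are full.
    """
    m = len(w)
    alpha = [0.0] * m
    remaining = 1.0
    sent = 0.0  # fraction of the load already transmitted to slaves
    for k in range(m):
        if remaining <= 0.0:
            break
        if k == master:
            cap = T / (w[k] * tau)  # a * w * tau <= T
        else:
            # (sent + a) * z + a * w * tau <= T
            cap = max(0.0, (T - sent * z) / (z + w[k] * tau))
        a = min(cap, remaining)
        alpha[k] = a
        remaining -= a
        if k != master:
            sent += a
    if remaining > tol:
        return None  # the whole load cannot be finished by T
    return alpha
```

For w = (1, 1), z = τ = 1 and P1 as the master, a deadline of 1 puts the whole load on P1, a deadline of 2/3 gives loads (2/3, 1/3), and a deadline of 0.5 is infeasible.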
Acknowledgements
This research was supported by the EPSRC funded project EP/G054304/1 “Quality of Ser-
vice Provision for Grid applications via Intelligent Scheduling”.
References
[1] Beaumont, O., Legrand, A., Robert, Y.: Scheduling divisible workloads on heterogeneous
platforms, Parallel Comp. 29 (2003) 1121–1152
[2] Buyya, R., Abramson, D., Venugopal, S.: The grid economy, Proceedings of the IEEE
93 (2005) 698–714
[3] Charcranoon, S., Robertazzi, T.G., Luryi, S.: Parallel processor configuration design
with processing/transmission costs, IEEE Trans. on Computers 49 (2000) 987–991
[4] Chuprat, S., Baruah, S.: Real-time divisible load theory: incorporating computation
costs, Proceedings of the 17th IEEE International Conference on Embedded and Real-
Time Computing Systems and Applications
[5] Drozdowski, M.: Scheduling for Parallel Processing, Springer, London, 2009
[6] Kumar, S., Dutta, K., Mookerjee, V.: Maximizing business value by optimal assignment
of jobs to resources in grid computing, European J. of Oper. Res. 194 (2009) 856–872
[7] Sohn, J., Robertazzi, T.G., Luryi, S.: Optimizing computing costs using divisible
load analysis, IEEE Trans. Parallel and Distributed Systems 9 (1998) 225–234
[8] Robertazzi, T.G.: Ten reasons to use divisible load theory, IEEE Computer 36 (2003)
63–68
[9] van Hoesel, S., Wagelmans, A., Moerman, B.: Using geometric techniques to improve
dynamic programming algorithms for the economic lot-sizing problem and extensions,
European J. Oper. Res. 75 (1994) 312–331
[10] Yu, J., Buyya, R., Ramamohanarao, K.: Workflow Scheduling Algorithms for Grid
Computing. In: F. Xhafa, A. Abraham. eds, Metaheuristics for Scheduling in Distributed
Computing Environments, Springer, Berlin, Germany, 2008
