Scheduling divisible loads in bus networks with arbitrary processor release times  by Bharadwaj, V et al.
Pergamon 
Computers Math. Applic. Vol. 32, No. 7, pp. 57-77, 1996 
Copyright©1996 Elsevier Science Ltd 
Printed in Great Britain. All rights reserved 
0898-1221/96 $15.00 + 0.00 
PII: S0898-1221(96)00156-3
Scheduling Divisible Loads in Bus Networks
with Arbitrary Processor Release Times
V. BHARADWAJ, H. F. LI AND T. RADHAKRISHNAN 
Department of Computer Science, Concordia University 
1455, De Maisonneuve Blvd. W. 
Montreal, Quebec H3H 1M8, Canada
<birdie><hfli><krishnan>@cs.concordia.ca
(Received and accepted May 1996) 
Abstract--The problem of processing divisible loads in a distributed bus network architecture
with arbitrary processor available/release times is considered. The objective is to optimally distribute
the load among the processors in the system in such a way that the processing time of the entire
load is a minimum. Different cases of release time distributions are considered. First, we present the
analysis for the case in which the processors are assumed to be available from the time instant at
which the processing load arrives at the system. In this case, a multi-installment load distribution
strategy is adopted. We develop basic recursive equations for the case when the processors in the
system are heterogeneous and extend the analysis to the case of homogeneous processors. Then the
case of arbitrary processor release times is considered. Closed-form solutions for the processing time
when the processor release times are identical are derived. For a general case of arbitrary processor
release times, a heuristic algorithm is presented. All the cases are demonstrated through several
illustrative examples.
Keywords--Divisible jobs, Communication delay, Computation time.
1. INTRODUCTION 
The problem of minimizing the processing time in parallel and distributed systems has received a
great deal of interest in recent times. In particular, due to the advent of high speed multiprocessor
systems, a considerable research effort is focussed on the domain of scheduling tasks that arrive at
a computer system. In general, the scheduling problem aims at the minimization of the processing
time of the tasks under various constraints [1]. The jobs that arrive at a computer system can be
indivisible and/or divisible. Indivisible jobs have the requirement that they have to be processed
in their entirety. In other words, these jobs cannot be further subdivided. The problem of
scheduling such jobs is generally referred to as the bin-packing problem, and has been proven to be
NP-complete under several constraints [1]. The class of divisible jobs is in turn classified into
two types, namely, modularly divisible or arbitrarily divisible. In the case of modularly divisible
jobs, a given job is subdivided into a set of predetermined modules, represented by means of
task interaction graphs [2-5]. These modules are assigned to different processors in such a way
that the processing of the entire job is carried out in the shortest possible time. In the case of
arbitrarily divisible jobs, the processing job can be divided into infinitesimally small fractions and
each of these fractions can be processed in an independent manner. Essentially, these fractions
do not have any precedence relations, unlike modularly divisible jobs. A detailed discussion on
The financial assistance provided through industrially oriented research grants by NORTEL (BNR) and National 
Sciences and Engineering Research Council of Canada is gratefully acknowledged. 
58 V. BHARADWAJ et al. 
the classification of the various types of jobs based on the divisibility aspect was presented in 
the literature [6]. The area of scheduling arbitrarily divisible jobs is of recent origin and has
stimulated considerable research interest in recent times [6-23]. One of the first papers in
this area is by Cheng and Robertazzi in 1988 [7]. This research was initiated to meet the ever 
increasing demands in processing a large volume of data received in distributed intelligent sensor 
networks. In such networks, one of the main objectives is to assign optimal fractions of the total 
job among several sensors/processors such that the entire load is processed in a minimal amount 
of time. Such types of jobs can be encountered in applications such as processing of large data 
files, image processing data, and in computer vision systems [6]. This body of the literature is 
commonly referred to as divisible load theory [6]. This paper is a contribution in this domain. 
In a distributed system, since the processors are geographically separated, it takes a certain
amount of time for the load to reach the destination from the source site. This communication
delay, in general, is unpredictable in its behaviour and has four components [24]. All the earlier 
studies [6-23] in this domain of research adopted a linear modelling for the communication links 
and processors. In other words, it has been assumed that the communication delay incurred in 
sending a load from one site to another is proportional to the volume of the load that is being 
transferred. Similarly, the computation time taken by a processor in the network is proportional 
to the amount of load assigned to it. Though this is an approximate model, it has been shown [6] 
that this model gives results that are very close to a more accurate model on comparison. We 
assume a similar model for the problem addressed in this paper. A heterogeneous linear network 
of processors was considered [7] and a computational algorithm was developed to find the optimal
load fractions by assuming all the processors stop computing at the same time instant. In fact, 
this assumption has been shown to be a necessary and sufficient condition for obtaining optimal 
processing time in linear networks [8] by using the concept of processor equivalence. An analytic 
proof of this assumption in bus networks is presented [12,13]. Subsequently, it has been rigorously
shown [6] that in the case of heterogeneous single-level tree networks, this condition is true only 
in a restricted sense. A similar proof was presented for linear networks, too [6]. A closed-form
expression for the processing time in single-level tree networks has been derived [9] and an
algorithm was proposed to obtain an optimal tree configuration for a special case [10]. Closed-form
solutions for the processing time in single-level tree networks were derived and the concepts
of optimal sequencing and optimal network arrangement were introduced [11]. For homogeneous
linear networks a closed-form expression for the processing time was derived [16]. Asymptotic 
solutions for tree, bus, and linear networks have been presented [14,15]. A multi-installment load 
distribution strategy, in which the processing load is distributed in more than one installment, 
was introduced for single-level tree networks and ultimate bounds on the time performance have 
been derived [17]. Using this strategy, closed-form solutions for the total processing time of the 
load were derived for homogeneous single-level tree networks. Further, this multi-installment 
strategy was applied to linear networks [6,18] and closed-form solutions for the processing time for
homogeneous networks were presented. All these studies focus their attention on the situation
in which only one load is available for processing. This assumption was relaxed and an efficient
algorithm was proposed for multiple jobs in bus networks [19]. Very recently, the fault-tolerant
aspects of the system have been studied on bus networks [20]. A more accurate analysis has been
presented [21] for bus networks through efficient modelling of links and processors as time-varying
quantities rather than assuming them as constant parameters. Scheduling of processing loads
that have both arbitrarily divisible and indivisible components was also considered in the literature
and a heuristic algorithm was proposed [23].
All the earlier studies in the literature so far have assumed that all the processors in the network 
remain idle and are available for processing the load starting from the time instant at which the 
load arrives at the system. In a practical scenario, this need not always be true. In this paper,
we relax this assumption and consider the case in which the processors in the system have some 
release times. In fact, this aspect has more practical impact and has been studied extensively in 
Scheduling Divisible Loads 59 
the literature for indivisible jobs [1]. In this paper, we present some load distribution strategies 
and analyze their behaviour subject to the constraint that the release times of the processors are
arbitrary for processing arbitrarily divisible loads in bus networks. This is the first time in this 
area an analysis is provided for the case when the processors have arbitrary release times. 
The organization of this paper is as follows. First, we formally introduce the problem and provide
some definitions and notation used throughout this paper. Then, we apply the multi-installment
strategy for the case when the processors are assumed to be available from the time instant
at which the load arrives at the system. We develop recursive equations for heterogeneous
networks and extend the analysis to the case of homogeneous networks. These results will be
used later in our proposed algorithm presented in the Appendix. Next, we consider the case
in which the processors have arbitrary release times. We demonstrate the algorithm through
illustrative examples in the subsequent section. Finally, we discuss certain important features of
the algorithm and conclude the paper.
2. SOME PRELIMINARY REMARKS
In this section, we first describe the problem addressed in this paper, then introduce some 
important definitions and notations that are frequently used throughout this paper. 
2.1. Motivation and Definitions
In all the earlier studies in the literature, considerable attention was given to minimizing the
total processing time of the entire load by adopting different load distribution strategies. All
these strategies invariably assume that all the processors in the system remain idle and are
available from the time instant at which the load originator starts distributing the load. We refer
to this case as the idle case. However, in a practical scenario, when a load arrives, the processors
in the system may be available only from a particular instant in time. We refer to these time
instants as release times and refer to the case when the release times are nonzero as the nonidle
case. It would be of natural interest to design and analyse a load distribution strategy that
minimizes the processing time of the entire load, with additional constraints on the availability
of the processors in the system. In the case of single-level tree networks, it has been shown [17]
that by distributing the processing load in more than one installment, the processing time can
be decreased to a greater extent. In this paper, we first apply this multi-installment strategy to
bus networks for the idle case to determine the optimal processing time. As mentioned in the
Introduction, we consider a bus network architecture in which there are $m$ processors $p_1, \ldots, p_m$
connected via a bus as shown in Figure 1. The processing load is assumed to arrive at the bus
controller unit (BCU). The following notation is used throughout the paper.
$z$: The inverse of the maximum channel speed of the bus.
$w_i$: The inverse of the maximum computing speed of the $i$th processor.
$T_{cm}$: The time taken to communicate the entire load over the standard channel. For a standard
channel, $z = 1$.
$T_{cp}$: The time taken to compute the entire load by a standard processor. For a standard
processor, $w = 1$.
(i) Finish time: The finish time of a processor $p_i$ is the time interval between the instant at
which the BCU initiates the load distribution process and the instant at which $p_i$ stops computing.
This is denoted as $T_i$.
(ii) Processing time: This is defined as
$$T(m, n) = \max (T_1, T_2, \ldots, T_m), \tag{1}$$
where $n$ is the number of installments in which the load is distributed.
(iii) Optimal load distribution: This is defined as the load distribution among the processors
for which the processing time is a minimum.
(iv) Release time: The release time of a processor $p_i$ is the time instant at which the processor
is available for computing, denoted as $t_i$.
(v) Divisibility factor: This is defined as the smallest possible fraction of the load that can
be assigned to any processor, and is denoted by A > 0.
Throughout the paper, we shall assume that the number of installments is small enough
that the load fraction assigned to each processor in each installment is greater than or
equal to the divisibility factor.
Figure 1. Bus network architecture with load origination at BCU. [The processing load arrives at the bus controller unit (BCU); processors $p_m, p_{m-1}, \ldots, p_2, p_1$ are attached to the bus.]
For the idle case, the load distribution is assumed to follow the rules given below. 
1. The BCU distributes the load fractions to all the processors in installments following a 
fixed sequence. We assume that the BCU does not take part in the computational process. 
2. All the processors in the system are equipped with front-end processors which receive the
load while previous load fractions are computed.
3. A processor $p_i$ starts computing a given load fraction in any installment only after its
front-end finishes receiving the load in that installment.
4. All the processors are continuously engaged in the computation once they start computing 
the load fractions from the first installment to the last installment. 
5. All the processors stop computing at the same instant in time. 
Rules 3-5 have been used in all the earlier studies in the literature [6-19]. In the case of single-level
tree networks with identical links (bus networks), for the single-installment load distribution
strategy, it has been proved rigorously that Rule 5 is both necessary and sufficient to obtain
the optimal processing time [6,22]. It is intuitively apparent that the above optimality condition
also holds true for the multi-installment strategy adopted here. However, we do not provide a rigorous proof of the optimality
condition in this paper. It is worth mentioning here that Rules 4 and 5 are properties of
the optimal solution and not physical constraints. In all the earlier studies, this has been
implicitly assumed.
3. LOAD DISTRIBUTION STRATEGY FOR THE IDLE CASE
We now present the mathematical model adopted in this paper, describe the architecture of
the network, and develop the basic recursive equations when the processing load is distributed in
more than one installment for the idle case. Towards the end of this section, we transform these
recursive equations into a form from which the individual load fractions are easy to compute. In the next
section, we make use of these results to obtain the optimal time performance for the nonidle case.
3.1. Basic Recursive Equations 
The bus network consists of a bus controller unit and $m$ processors connected via a bus as
shown in Figure 1. The processing load is assumed to originate at the BCU. The BCU divides
the load in an optimal manner and distributes the load fractions starting from processor $p_m$ to
processor $p_1$, following the rules specified in Section 2.1. Further, all the processors in the system
are assumed to be equipped with front-ends or communication coprocessors. From Figure 1,
it may be noted that the processors are designated as $p_m, p_{m-1}, \ldots, p_1$, in the reverse order.
Also, the individual load fractions are designated in the reverse order. This notation reversal is
adopted for mathematical ease [17]. Note that the first and the second indices of the load fractions
$\alpha_{i,j}$ denote the processor and the installment, respectively. Now, following the load distribution rules
specified in the previous section, we develop the basic recursive equations.
The timing diagram for an $m$-processor system with $n$ installments is shown in Figure 2. In
the timing diagram, the computation times of the processors are shown below
the time axis while the communication of the load fractions by the BCU is shown above the time
axis. From the timing diagram, we obtain the following recursive equations:
$$\alpha_{i,1} w_i T_{cp} = \alpha_{i-1,1} \left( z T_{cm} + w_{i-1} T_{cp} \right), \quad i = 2, 3, \ldots, m, \tag{2}$$
$$\alpha_{i,j} w_i T_{cp} = \sum_{k=1}^{i-1} \alpha_{k,j} z T_{cm} + \sum_{k=i}^{m} \alpha_{k,j-1} z T_{cm}, \quad i = 1, \ldots, m, \; j = 2, \ldots, n. \tag{3}$$
The normalizing equation is given by
$$\sum_{i=1}^{m} \sum_{j=1}^{n} \alpha_{i,j} = 1. \tag{4}$$
Thus, we have $mn$ linear equations with $mn$ unknowns. These equations can be solved
recursively by using the procedure described for single-level tree networks [17]. From the timing
diagram, the processing time of the total load is given by
$$T(m, n) = z T_{cm} + \alpha_{1,1} w_1 T_{cp}, \tag{5}$$
where $\alpha_{1,1}$ is the load fraction assigned to the processor $p_1$ in the last installment, and can be
obtained by solving (2)-(4).
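One way to carry out the recursive solution of (2)-(4) is sketched below in Python. This is an illustrative sketch only, not the procedure of [17]: the function name `idle_fractions`, the argument order, and the list layout are assumptions made here. It provisionally sets the load fraction of processor 1 in the last installment to one, propagates (2) and (3), and then rescales all fractions with the normalizing equation (4).

```python
def idle_fractions(w, z, T_cm, T_cp, n):
    """Hypothetical helper: solve (2)-(4) for a heterogeneous bus network.

    w[i-1] holds w_i. The returned alpha[j-1][i-1] is alpha_{i,j} in the
    paper's reverse-order indexing (installment 1 is the last one sent).
    """
    m = len(w)
    a = [[0.0] * m for _ in range(n)]
    a[0][0] = 1.0  # provisional alpha_{1,1}; rescaled by (4) below
    # Equation (2): last installment, processors 2..m.
    for i in range(1, m):
        a[0][i] = a[0][i - 1] * (z * T_cm + w[i - 1] * T_cp) / (w[i] * T_cp)
    # Equation (3): installments 2..n, processors in increasing index.
    for j in range(1, n):
        for i in range(m):
            sent = sum(a[j][:i]) + sum(a[j - 1][i:])
            a[j][i] = sent * z * T_cm / (w[i] * T_cp)
    # Equation (4): normalize so that all fractions sum to one.
    total = sum(sum(row) for row in a)
    alpha = [[x / total for x in row] for row in a]
    # Equation (5): processing time.
    T = z * T_cm + alpha[0][0] * w[0] * T_cp
    return alpha, T


if __name__ == "__main__":
    alpha, T = idle_fractions([1.0, 2.0], z=1.0, T_cm=1.0, T_cp=1.0, n=1)
    print(alpha, T)
```

Because (3) for processor $i$ in installment $j$ only involves fractions with smaller processor index in the same installment and fractions of installment $j-1$, a single forward pass suffices and no general linear solver is needed.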
3.2. Recursive Equations for Homogeneous Networks 
The recursive equations developed above are for networks consisting of heterogeneous processors.
In this section, we shall assume that all the processor speeds are identical, i.e., $w_i = w$
for all $i = 1, \ldots, m$, and obtain the optimal load fractions in a form that is computationally
easier to evaluate. Henceforth, throughout the paper, unless specified otherwise, the network considered is of
homogeneous type. In this case, equations (2) and (3) can be rewritten as
$$\alpha_{i,1} = \alpha_{1,1} (1 + \beta)^{i-1}, \quad i = 2, \ldots, m, \tag{6}$$
and
$$\alpha_{i,j} = \beta \left( \sum_{k=1}^{i-1} \alpha_{k,j} + \sum_{k=i}^{m} \alpha_{k,j-1} \right), \quad i = 1, \ldots, m, \; j = 2, \ldots, n, \tag{7}$$
where
$$\beta = \frac{z T_{cm}}{w T_{cp}}. \tag{8}$$
Figure 2. Timing diagram: idle case.
We follow a procedure similar to that adopted in [17] to obtain a transformed system of equations from (6)
and (7) for ease of computation. Let
$$\alpha_{i,j} = \alpha_{1,1} X_k, \quad i = 1, \ldots, m, \; j = 1, \ldots, n, \tag{9}$$
where
$$k = (i - 1) + (j - 1) m. \tag{10}$$
Thus, from (6) and (7) we obtain
$$X_k = (1 + \beta)^k, \quad k = 0, 1, \ldots, m - 1, \tag{11}$$
$$X_k = \left( X_{k-1} + X_{k-2} + \cdots + X_{k-m} \right) \beta, \quad k = m, \ldots, mn - 1. \tag{12}$$
Let
$$X = \sum_{i=0}^{mn-1} X_i. \tag{13}$$
Now using the normalizing equation (4), we obtain
$$\alpha_{1,1} = \frac{1}{X}. \tag{14}$$
Thus, using (14) in (5), the expression for the processing time is given by
$$T(m, n) = z T_{cm} + \frac{1}{X} w T_{cp}. \tag{15}$$
Hence, from (11) and (12), we obtain the values of the $X_i$'s, and then using (13), we evaluate
their sum $X$. Then using (15), the optimal processing time can be obtained immediately. A
rigorous derivation to obtain the closed-form solution for the processing time using the generating
functions approach was presented in the literature [17,18]. A similar treatment can be adopted
here to derive the closed-form expression for the processing time. Once the values of the $X_i$'s
are obtained, the corresponding $\alpha_{i,j}$'s can be obtained from (9), and hence, the time instants at
which the processors start computing these optimal load fractions, denoted as $t_i^*$ for $p_i$, can be
obtained from the timing diagram as
$$t_i^* = \sum_{j=i}^{m} \alpha_{j,n} z T_{cm}. \tag{16}$$
In the next section, we propose an algorithm for the case of arbitrary release times, of which the idle-case
algorithm described above is a special case.
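The computation described by (8) and (11)-(16) can be sketched as follows. This is a minimal illustration under the homogeneous assumption $w_i = w$; the function name `homogeneous_idle` and its return layout are assumptions made for this example and do not come from the paper.

```python
def homogeneous_idle(m, n, z, w, T_cm, T_cp):
    """Hypothetical helper: evaluate (11)-(16) for a homogeneous bus network."""
    beta = z * T_cm / (w * T_cp)                  # equation (8)
    X = [(1 + beta) ** k for k in range(m)]       # equation (11)
    for k in range(m, m * n):
        X.append(beta * sum(X[k - m:k]))          # equation (12)
    total = sum(X)                                # equation (13)
    alpha_11 = 1.0 / total                        # equation (14)
    T = z * T_cm + alpha_11 * w * T_cp            # equation (15)
    # Equation (16): start time of p_i, using alpha_{j,n} = alpha_11 * X_k
    # with k = (j - 1) + (n - 1) * m from (9) and (10).
    start = []
    for i in range(1, m + 1):
        first_sent = sum(alpha_11 * X[(j - 1) + (n - 1) * m]
                         for j in range(i, m + 1))
        start.append(first_sent * z * T_cm)
    return X, T, start


if __name__ == "__main__":
    X, T, start = homogeneous_idle(m=2, n=2, z=1.0, w=1.0, T_cm=1.0, T_cp=1.0)
    print(X, T, start)
```

With $m = 2$, $n = 2$ and all parameters set to one, $\beta = 1$, so (11) and (12) give $X_0, \ldots, X_3 = 1, 2, 3, 5$, their sum is 11, and (15) gives $T(2, 2) = 1 + 1/11$.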
4. PROPOSED ALGORITHM FOR THE NONIDLE CASE
Now we describe the complete algorithm for obtaining a load distribution that minimizes the 
processing time for the nonidle case. We assume that the release times of the processors are 
known to the BCU. Without loss of generality, we assume that the load distribution sequence
runs from the processor that is available earliest to the processor that is available latest,
and let this order be $p_1, \ldots, p_m$, respectively. This assumption is made to develop a general
algorithm and does not depend on the true index of the processor. In a practical scenario, since
the BCU knows the release times of all the processors, the sequence of load distribution can be 
easily tracked. Since the release time distribution can be arbitrary, the following are the different 
cases to be studied. 
4.1. Identical Case
We consider the situation in which the release times of the processors are such that $t_i = t$,
for all $i = 1, \ldots, m$. Accordingly, $t$ can satisfy either $t \ge z T_{cm}$ or $t < z T_{cm}$.
When $t \ge z T_{cm}$, the timing diagram is as shown in Figure 3a. Since $t \ge z T_{cm}$, the entire load is
divided equally into $m$ parts, which are distributed among all the processors before
their release times. Since the processors are identical, all the load fractions are computed in a
time $w T_{cp}/m$ from time $t$. Hence, the processing time is given by
$$T(m, 1) = t + \frac{w T_{cp}}{m}. \tag{17}$$
It may be noted that this load distribution shown in Figure 3a is optimal and the following
theorem proves this claim.
THEOREM 1. In a bus network, let the release times of all the processors be identical, i.e., $t_i = t$
for all $i = 1, \ldots, m$, and let $t \ge z T_{cm}$. Then, the optimal processing time is achieved when each
processor is assigned an equal fraction of the total load before time $t$, and all the processors start
their computation from time $t$.
PROOF. Let $S$ denote the set of all feasible schedules, where a feasible schedule is
one in which all the processors receive their respective load fractions before $t$. Let us denote the
processing time of a feasible schedule $s_j \in S$ as $T(s_j)$. Let $\alpha_i(j)$ be the load fraction assigned to
processor $p_i$ in the schedule $s_j$ before $t$. Then,
$$T(s_j) \ge \max_{1 \le i \le m} \left[ t + \alpha_i(j) w T_{cp} \right], \quad s_j \in S. \tag{18}$$
Figure 3(a). Timing diagram: identical release times; $t \ge z T_{cm}$.
Figure 3(b). Timing diagram: identical release times; $t < z T_{cm}$; (22) is satisfied.
But
$$\max_{1 \le i \le m} \left[ t + \alpha_i(j) w T_{cp} \right] \ge \left[ t + \frac{w T_{cp}}{m} \right], \quad s_j \in S, \tag{19}$$
since the load fractions sum to one and the maximum of a set is greater than or equal to its average. Therefore,
$$T(s_j) \ge \left[ t + \frac{w T_{cp}}{m} \right], \quad s_j \in S. \tag{20}$$
It may be noted that the RHS of the above expression is the processing time of the schedule that
corresponds to Figure 3a, thus proving the theorem.
When $t < z T_{cm}$, we adopt the following strategy. Since all the processors are released at the
same time, we distribute the load in more than one installment. The timing diagram for this
case is as shown in Figure 3b, wherein the amount of load whose communication time equals $t$ is first
distributed among all the processors equally. This fraction of the total load sent to all the processors,
denoted as $L_1$, is given by
$$L_1 = \frac{t}{z T_{cm}}. \tag{21}$$
Figure 4. Timing diagram: identical release times; $t < z T_{cm}$; (22) is violated.
Hence, if the computation of this load fraction $(L_1/m)$ by each of the processors ends no earlier than the
total communication time, then we require at most one more installment to finish computing the
entire load, i.e., if
$$t + \frac{t}{m z T_{cm}} w T_{cp} \ge z T_{cm}, \tag{22}$$
then the total load that is to be assigned in the next installment, denoted as $L_2$, is given by
$$L_2 = (1 - L_1). \tag{23}$$
Therefore, each processor is assigned $L_2/m$ amount of load in the second installment. Hence,
when (22) is satisfied, the entire load is distributed in just two installments and the processing
time for this strategy is the same as (17). Since the processors start their computation and finish
computing the entire load at the same time, this distribution is optimal. Following the same line
of argument as in Theorem 1, it is straightforward to prove the optimality of the strategy shown
in Figure 3b. We omit the details.
However, when (22) is violated, we distribute the load in more than two installments as shown
in Figure 4. As can be seen from Figure 4, in the first installment we distribute $L_1$ amount
of the total load ($L_1$ is given by (21)). Since (22) is violated, in the second installment, we
distribute an amount of load $L_2$ whose communication time is equal to the computation time of the load assigned in the
first installment for all the processors. That is,
$$L_2 = L_1 \frac{w T_{cp}}{m z T_{cm}}. \tag{24}$$
This process is repeated until the computation time of the processors reaches or exceeds the total communication
time $z T_{cm}$. In the subsequent analysis, we derive the number of installments required to
meet this requirement. Hence, from the timing diagram shown in Figure 4, the total
amount of load that is assigned to all the processors in the $j$th installment, denoted as $L_j$, is
given by
$$L_j = L_{j-1} \frac{w T_{cp}}{m z T_{cm}}, \quad j = 2, 3, \ldots, k - 1, \tag{25}$$
and the individual load fraction assigned to each processor is given by
$$\alpha_{i,j} = \frac{L_j}{m}, \quad i = 1, \ldots, m, \; j = 2, 3, \ldots, k - 1. \tag{26}$$
From the timing diagram, we see that
$$t + \sum_{j=1}^{k-1} L_j \frac{w T_{cp}}{m} \ge z T_{cm}, \tag{27}$$
where $L_j$ is given by (25). Expressing each $L_j$, $j = 2, \ldots, k - 1$, in terms of $L_1$, we obtain
$$L_j = L_1 \gamma^{j-1}, \quad j = 2, 3, \ldots, k - 1, \tag{28}$$
where $\gamma = w T_{cp}/(m z T_{cm})$. Substituting (28) in (27), we obtain
$$t + \frac{L_1}{m} \sum_{j=1}^{k-1} \gamma^{j-1} w T_{cp} \ge z T_{cm}. \tag{29}$$
Hence, we have the following three cases depending on the value of $\gamma$.
CASE 1. $\gamma > 1$.
In this case, simplifying (29), we obtain
$$k - 1 \ge \frac{\ln \left( 1 + \dfrac{(\gamma - 1)(z T_{cm} - t)\, m}{L_1 w T_{cp}} \right)}{\ln \gamma}. \tag{30}$$
It may be noted that in (30), the expression on the RHS gives the number of installments to be
used. However, since $k$ is an integer variable, the RHS may not yield an integer value. Hence,
we choose the value of $(k - 1)$ to be
$$k - 1 = \left\lceil \frac{\ln \left( 1 + \dfrac{(\gamma - 1)(z T_{cm} - t)\, m}{L_1 w T_{cp}} \right)}{\ln \gamma} \right\rceil. \tag{31}$$
In this case, if we use the number of installments given by (31), then the computation of
the load in the $(k - 1)$th installment will end no earlier than the total communication time $z T_{cm}$.
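The simplification of (29) that leads to the bound on $k - 1$ is not spelled out in the text. The following worked steps are one way to fill it in, using only the geometric sum and the definitions of $L_1$ and $\gamma$ given above; they are offered as a reading aid rather than as the authors' own derivation.

```latex
\begin{align*}
\sum_{j=1}^{k-1} \gamma^{j-1} &= \frac{\gamma^{k-1} - 1}{\gamma - 1}
  && (\gamma \neq 1), \\
\frac{L_1 w T_{cp}}{m} \cdot \frac{\gamma^{k-1} - 1}{\gamma - 1}
  &\ge z T_{cm} - t
  && \text{(from (29))}, \\
\gamma^{k-1} &\ge 1 + \frac{(\gamma - 1)(z T_{cm} - t)\, m}{L_1 w T_{cp}}
  && (\gamma > 1), \\
k - 1 &\ge \frac{\ln\!\left( 1 + \dfrac{(\gamma - 1)(z T_{cm} - t)\, m}{L_1 w T_{cp}} \right)}{\ln \gamma}
  && (\ln \gamma > 0).
\end{align*}
```

For $\gamma < 1$ the same steps apply, but multiplying by $\gamma - 1 < 0$ and dividing by $\ln \gamma < 0$ each reverse the inequality once, so the final bound on $k - 1$ keeps the same direction.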
CASE 2. $\gamma < 1$.
Here, simplifying (29), we obtain
$$k - 1 \ge \frac{\ln \left( 1 - \dfrac{(1 - \gamma)(z T_{cm} - t)\, m}{L_1 w T_{cp}} \right)}{\ln \gamma}. \tag{32}$$
Here too, we choose the value of $(k - 1)$ to be
$$k - 1 = \left\lceil \frac{\ln \left( 1 - \dfrac{(1 - \gamma)(z T_{cm} - t)\, m}{L_1 w T_{cp}} \right)}{\ln \gamma} \right\rceil. \tag{33}$$
As in the previous case, if we use the number of installments given by (33) to distribute the load,
then the computation of the load in the $(k - 1)$th installment will end no earlier than the total
communication time $z T_{cm}$. Further, from (33), it can be seen that for $k - 1$ to exist, the following
additional condition must hold:
$$\frac{(1 - \gamma)(z T_{cm} - t)}{\gamma t} < 1, \tag{34}$$
which can be rewritten as
$$t > z T_{cm} (1 - \gamma). \tag{35}$$
Figure 5. Timing diagram for the heuristic load distribution strategy.
Hence, for $(k - 1)$ to exist in (33), $t$ must satisfy (35). If (35) is violated, then we adopt the load
distribution strategy shown in Figure 5. Here, the total load is distributed in two installments
among all the processors. Let $X$ be the fraction of the total load that can be communicated until time $t$; it is divided
equally into $m$ parts and distributed among all the processors in the first installment. However,
the computation of these load fractions by all the processors is allowed to start from time $t + \delta$
and to finish exactly at time $z T_{cm}$. Hence, in the second installment, the rest of
the load is equally divided and distributed among all the processors before the computation of
the load fractions in the first installment is completed. From the timing diagram, we obtain the
following equations:
$$t + \delta + \frac{X w T_{cp}}{m} = z T_{cm}, \tag{36}$$
$$X = \frac{t}{z T_{cm}}. \tag{37}$$
From (36) and (37), we obtain
$$\delta = z T_{cm} - t (1 + \gamma). \tag{38}$$
As mentioned earlier, in the second installment the rest of the load $(1 - X)$ is equally divided into
$m$ parts and distributed among all the processors in the system. The individual load fractions to
the processors in the second installment are given by $(1 - X)/m$, where $X$ is given by (37). The
processing time can be obtained from Figure 5 as
$$T(m, 2) = z T_{cm} (1 + \gamma) - t \gamma. \tag{39}$$
It is worth mentioning here that the strategy adopted for this case is a heuristic method. A
number of other load distribution strategies seem to be plausible to decrease the processing time.
A few of them are presented in Section 6.
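Since the processing time of the heuristic is read off the timing diagram, it may help to see it written out. The steps below are a sketch that assumes only what the text states: the first-installment computation ends at $z T_{cm}$, each processor then computes its second-installment share $(1 - X)/m$ without interruption, and $\gamma = w T_{cp}/(m z T_{cm})$.

```latex
\begin{align*}
\delta &= z T_{cm} - t - \frac{X w T_{cp}}{m}
        = z T_{cm} - t - \frac{t}{z T_{cm}} \cdot \gamma\, z T_{cm}
        = z T_{cm} - t (1 + \gamma), \\
T(m, 2) &= z T_{cm} + \frac{(1 - X)\, w T_{cp}}{m}
         = z T_{cm} + \left( 1 - \frac{t}{z T_{cm}} \right) \gamma\, z T_{cm}
         = z T_{cm} (1 + \gamma) - t \gamma .
\end{align*}
```

The delay $\delta$ is positive exactly when $t (1 + \gamma) < z T_{cm}$, that is, when condition (22) is violated, which is the situation in which this strategy is invoked.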
CASE 3. $\gamma = 1$.
Using (29), we see that
$$k - 1 \ge \frac{(z T_{cm} - t)\, m}{L_1 w T_{cp}}. \tag{40}$$
Here too, for integer values of $(k - 1)$, we choose
$$k - 1 = \left\lceil \frac{(z T_{cm} - t)\, m}{L_1 w T_{cp}} \right\rceil. \tag{41}$$
In all the above three cases, the load is distributed until the computation times of the processors
from time $t$ reach or exceed the total communication time. The rest of the load is equally divided and
distributed among all the processors. Hence, in the $k$th installment each processor receives a load
fraction given by
$$\alpha_{i,k} = \frac{1 - \sum_{j=1}^{k-1} L_j}{m}, \quad i = 1, \ldots, m. \tag{42}$$
Hence, the processing time in all the three cases (except when (35) is violated with $\gamma < 1$) can be
obtained as follows. Using (28) and (21) in (26) for $i = 1$, we obtain $\alpha_{1,j}$, for all $j = 2, 3, \ldots, k - 1$,
and from (42), we obtain $\alpha_{1,k}$. Also, depending on the value of $\gamma$, we use (31) or (33) or (41),
respectively. Hence, the processing time is given by
$$T(m, k) = t + \sum_{i=1}^{k} \alpha_{1,i} w T_{cp}. \tag{43}$$
It may be noted that for the multi-installment strategy described above, the
optimality of the load distribution can be easily proved for all the cases except for the
heuristic method proposed for the case when $\gamma < 1$ and (35) is violated, by using the same line
of argument adopted in Theorem 1. We omit the details.
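The installment sizes and the processing time (43) can be computed as in the sketch below. It is illustrative only: the function name `installment_loads` is an assumption, the sketch presumes $0 < t < z T_{cm}$ and that (35) holds when $\gamma < 1$, and floating-point rounding near an exact integer boundary could shift the ceiling by one.

```python
import math


def installment_loads(t, m, z, w, T_cm, T_cp):
    """Hypothetical helper: installment sizes L_1..L_k and T(m, k) from (43).

    Assumes identical release time t with 0 < t < z*T_cm.
    """
    comm = z * T_cm                      # total communication time
    per = w * T_cp / m                   # time to compute the whole load in parallel
    gamma = per / comm                   # gamma = w T_cp / (m z T_cm)
    L1 = t / comm                        # equation (21)
    r = (comm - t) / (L1 * per)          # (z T_cm - t) m / (L_1 w T_cp)
    if gamma > 1:
        km1 = math.ceil(math.log(1 + (gamma - 1) * r) / math.log(gamma))  # (31)
    elif gamma < 1:
        arg = 1 - (1 - gamma) * r
        if arg <= 0:
            raise ValueError("condition (35) is violated; use the heuristic")
        km1 = math.ceil(math.log(arg) / math.log(gamma))                  # (33)
    else:
        km1 = math.ceil(r)                                                # (41)
    loads = [L1 * gamma ** j for j in range(km1)]    # equation (28)
    loads.append(1 - sum(loads))                     # remainder, cf. (42)
    # Equation (43) with alpha_{1,j} = L_j / m.
    T = t + sum(L / m for L in loads) * w * T_cp
    return loads, T


if __name__ == "__main__":
    print(installment_loads(0.6, 2, 1.0, 1.0, 1.0, 1.0))
```

Because the installment sizes sum to one, (43) evaluates to $t + w T_{cp}/m$ whenever the multi-installment strategy applies, which agrees with (17).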
Summarizing, when the release times of the processors are identical, then, as a first step,
condition (22) is verified. If (22) holds, then we need at most two installments to distribute the
entire load. However, if (22) fails to hold, we find the value of $\gamma$. When $\gamma \ge 1$, or when $\gamma < 1$
and (35) is satisfied, we use the multi-installment strategy to distribute the total load in an optimal
manner. However, when $\gamma < 1$ and (35) is violated, we use the heuristic strategy proposed above.
Since (22) and (35) determine the choice of the strategy to be used, these conditions serve as the
boundaries that divide the $\gamma$-$t$ space into various regions, which are shown in Figure 6. Here,
the regions (A), (B), and (C) indicate the feasible regions for various strategies depending on $\gamma$
and $t$. In this figure, the line marked (a) is (22) with equality condition, and the line marked (b)
is (35) with equality condition. Hence, depending upon the value of the parameter $\gamma$ and release
time $t$, a suitable strategy can be chosen.
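The selection logic summarized above can be written as a short decision function. The sketch below is an assumption-laden illustration: the name `choose_strategy` and the string labels are invented for this example, and condition (22) is used in the equivalent form $t(1 + \gamma) \ge z T_{cm}$.

```python
def choose_strategy(t, m, z, w, T_cm, T_cp):
    """Hypothetical helper: pick a strategy for identical release time t.

    Returns (label, processing_time).
    """
    comm = z * T_cm
    gamma = w * T_cp / (m * comm)
    if t >= comm:
        # Theorem 1: one installment delivered before the release time.
        return "single installment", t + w * T_cp / m          # equation (17)
    if t * (1 + gamma) >= comm:
        # Condition (22) holds: two installments suffice.
        return "two installments", t + w * T_cp / m
    if gamma < 1 and t <= comm * (1 - gamma):
        # Condition (35) is violated: fall back to the heuristic of Figure 5.
        return "heuristic", comm * (1 + gamma) - t * gamma     # equation (39)
    # Otherwise the multi-installment strategy applies; (43) reduces to (17).
    return "multi-installment", t + w * T_cp / m


if __name__ == "__main__":
    for t in (2.0, 0.8, 0.6, 0.4):
        print(t, choose_strategy(t, 2, 1.0, 1.0, 1.0, 1.0))
```

The order of the tests mirrors the summary: the size of $t$ relative to $z T_{cm}$ first, then (22), then (35) when $\gamma < 1$.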
Figure 6. Timing diagram: feasible regions for various strategies. (a) is condition (22)
and (b) is (35).
4.2. Nonidentical Case
In this section, we propose an algorithm for the case when the release times of the processors
are arbitrary. When the release times of the processors are arbitrary, it is quite possible that
certain processors in the system need not be assigned any load because their release times could be
so large that the entire load can be processed without including these processors. The iterative
algorithm proposed here makes use of the processors efficiently at every stage of its iteration.
Also, the number of processors to be used is decided as and when the algorithm is
executed.
Figure 7. Timing diagram: arbitrary release times; $t_1 < z T_{cm}$.
Consider the timing diagram shown in Figure 7. We distribute the load $t_1/zT_{cm}$ to all the
$m$ processors in such a way that they all start computing at their respective release times and
stop computing at the same instant in time. We denote the load fractions assigned to processors
$p_1, \ldots, p_m$ as $\alpha_1, \ldots, \alpha_m$, respectively. From Figure 7, we obtain the following recursive equations:
$$\alpha_i w T_{cp} = (t_{i+1} - t_i) + \alpha_{i+1} w T_{cp}, \quad i = 1, \ldots, m-1. \tag{44}$$
Also, we have
$$\sum_{i=1}^{m} \alpha_i z T_{cm} = t_1. \tag{45}$$
Thus, we have $m$ linear equations with $m$ unknowns, which can be solved to obtain the individual
load fractions. For this, we express each of the $\alpha_i$, $i = 1, \ldots, m-1$, in terms of $\alpha_m$ as
$$\alpha_i = \alpha_m + \frac{t_m - t_i}{w T_{cp}}. \tag{46}$$
Now using (45), we obtain the value of $\alpha_m$ as
$$\alpha_m = \frac{t_1/zT_{cm} - \left[(m-1)t_m - (t_1 + \cdots + t_{m-1})\right]/wT_{cp}}{m}. \tag{47}$$
Substituting (47) in (46), we obtain the individual load fractions as
$$\alpha_i = \frac{t_1/zT_{cm} - \left[(m-1)t_m - (t_1 + \cdots + t_{m-1})\right]/wT_{cp}}{m} + \frac{t_m - t_i}{wT_{cp}}, \quad i = 1, \ldots, m-1. \tag{48}$$
Hence, for the load fraction $\alpha_i > 0$, $i = 1, \ldots, m$, using (48), we observe that the release time
of $p_i$ must satisfy the following condition:
$$t_i < \frac{t_1 wT_{cp}/zT_{cm} + \sum_{j=1}^{m} t_j}{m}. \tag{49}$$
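As a hedged illustration of (46) and (47), the following sketch computes the load fractions for a given set of sorted release times and checks that all processors stop at the same instant. The function name `load_fractions`, the argument names, and the numerical instance are assumptions made here for illustration; the load is passed explicitly so that the value $t_1/zT_{cm}$ used in the text is visible at the call site.

```python
from typing import List, Tuple


def load_fractions(release: List[float], load: float, w_tcp: float) -> Tuple[List[float], float]:
    """Solve (44)-(45) for sorted release times; return (fractions, finish time)."""
    m = len(release)
    t_m = release[-1]
    # Eq. (47): fraction assigned to the last processor p_m.
    alpha_m = (load - ((m - 1) * t_m - sum(release[:-1])) / w_tcp) / m
    # Eq. (46): every fraction expressed through alpha_m (i = m gives alpha_m itself).
    alphas = [alpha_m + (t_m - t_i) / w_tcp for t_i in release]
    # All processors stop together, so p_1 alone determines the finish time.
    finish = release[0] + alphas[0] * w_tcp
    return alphas, finish


# Hypothetical instance: w = 0.4, z = 0.2, T_cm = T_cp = 1.0.
w_tcp, z_tcm = 0.4, 0.2
release = [0.06, 0.08, 0.10]
alphas, finish = load_fractions(release, release[0] / z_tcm, w_tcp)
print([round(a, 4) for a in alphas], round(finish, 4))
```

For this instance the fractions sum to $t_1/zT_{cm} = 0.3$, as (45) requires, and each $t_i + \alpha_i wT_{cp}$ equals the same finish time.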
The finish time, denoted as $T(m)$, can be obtained from the timing diagram as
$$T(m) = t_1 + \alpha_1 w T_{cp}, \tag{50}$$
where $\alpha_1$ is obtained from (48) by substituting $i = 1$. Note that this finish time $T(m)$
corresponds to the computation of the load communicated until time $t_1$. We now present some
important results which play a crucial role in determining the number of processors to be used
at every stage of iteration.
LEMMA 1. In the load distribution strategy adopted above, if for some $j \in \{1, \ldots, m\}$, $\alpha_j < 0$,
then $\alpha_{j+k} \le 0$, for all $k = 1, \ldots, m-j$, where $\alpha_j$ is given by (48).
PROOF. From (48), it can be easily shown that when $\alpha_j < 0$, then
$$t_j > \frac{t_1 wT_{cp}/zT_{cm} + \sum_{i=1}^{m} t_i}{m}. \tag{51}$$
Since $t_{j+k} \ge t_j$, for all $k = 1, \ldots, m-j$,
$$t_{j+k} > \frac{t_1 wT_{cp}/zT_{cm} + \sum_{i=1}^{m} t_i}{m}. \tag{52}$$
Simplifying the above expression and comparing with (48), we conclude that $\alpha_{j+k} < 0$, for all
$k = 1, \ldots, m-j$. Hence the proof. ∎
The significance of the lemma is as follows. In the above load distribution methodology, it
has been implicitly assumed that the fraction of the total load, $t_1/zT_{cm}$, has been assigned to all
the $m$ processors in such a way that all the processors start at their respective release times and
finish their computation at the same instant in time. However, in general, not all the processors
need be assigned load from $t_1/zT_{cm}$. The lemma shows that if any of the $\alpha_j$, $j \in \{1, \ldots, m-1\}$,
has a value less than or equal to zero, then all the successive $\alpha_{j+k}$, $k = 1, \ldots, m-j$, also assume
zero or negative values. This means that the load $t_1/zT_{cm}$ can be redistributed in such a way
that it can be computed by time $t_j$ at the latest with $(j-1)$ processors alone. We shall show this
in the following lemma.
LEMMA 2. If $\alpha_j < 0$ for some $j \in \{1, \ldots, m\}$, then the load $t_1/zT_{cm}$ can be processed using
processors $p_1, \ldots, p_{j-1}$ in a time less than or equal to the release time of $p_j$.
PROOF. We need to show that $T(j-1) \le t_j$, where $T(j-1)$ is the processing time with
$(j-1)$ processors and $t_j$ is the release time of the processor $p_j$. Following similar steps as shown
for the $m$ processor case (Figure 7), the finish time expression with $(j-1)$ processors is given by
$$T(j-1) = t_1 + \frac{t_1 wT_{cp}}{(j-1) zT_{cm}} - \frac{(j-2) t_{j-1}}{j-1} + \frac{t_1 + \cdots + t_{j-2}}{j-1} + (t_{j-1} - t_1). \tag{53}$$
We shall prove this lemma by contradiction. Suppose $T(j-1) > t_j$, then
$$\frac{t_1 wT_{cp}}{zT_{cm}} + (t_1 + \cdots + t_{j-1}) > (j-1) t_j. \tag{54}$$
Now adding $(t_j + \cdots + t_m)$ on both sides and dividing by $m$, we obtain
$$\frac{t_1 wT_{cp}/zT_{cm} + \sum_{i=1}^{m} t_i}{m} > \frac{j t_j + t_{j+1} + \cdots + t_m}{m} \ge t_j. \tag{55}$$
Comparing (55) and (51), we see a clear contradiction, thus proving the lemma. ∎
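Collecting terms in (53) gives $T(j-1) = \left(t_1 wT_{cp}/zT_{cm} + t_1 + \cdots + t_{j-1}\right)/(j-1)$, which is the right-hand side of (49) written for $j-1$ processors. The following sketch uses this closed form to pick the number of participating processors and to check the statement of Lemma 2 numerically. The function names and the procedure of dropping the last processor until (49) holds are assumptions made for illustration; the release times and speed parameters are those quoted in Example 4.

```python
from typing import List


def finish_time(release: List[float], load: float, w_tcp: float) -> float:
    """Closed form of (53) for the processors whose release times are given."""
    return (load * w_tcp + sum(release)) / len(release)


def qualifying_processors(release: List[float], load: float, w_tcp: float) -> int:
    """Largest r such that p_1..p_r all receive a positive load fraction."""
    r = len(release)
    # By Lemma 1 it suffices to test the last candidate: drop it while its
    # release time violates condition (49) written for r processors.
    while r > 1 and release[r - 1] >= finish_time(release[:r], load, w_tcp):
        r -= 1
    return r


# Data of Example 4: w = 0.4, z = 0.2, T_cm = T_cp = 1.0.
w_tcp, z_tcm = 0.4, 0.2
release = [0.06, 0.10, 0.15]
load = release[0] / z_tcm

r = qualifying_processors(release, load, w_tcp)
t_finish = finish_time(release[:r], load, w_tcp)
print(r, round(t_finish, 4), t_finish <= release[r])
```

With these values only $p_1$ and $p_2$ qualify, and their common finish time does not exceed the release time of $p_3$, in agreement with Lemma 2.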
Now, at this juncture we observe that all the processors taking part in the computation of the
load till time $t_1$ finish their computations at the time instant given by (53), for some $j \in \{1, \ldots, m\}$.
Thus, this finish time of the processors becomes the release time of the processors for processing
the rest of the load. Thus, we can apply the techniques adopted for the identical case
hereafter. Also, all the processors in the system will take part in the computation of the rest of
the load.
As a special case, we consider the scenario in which $t_1 \ge zT_{cm}$. For this case, the timing diagram
is as shown in Figure 8. From this diagram, the load fraction $\alpha_i$ assigned to $p_i$, $i = 1, \ldots, m-1$,
is the same as (46). Also, we have the normalizing equation, given by
$$\sum_{j=1}^{m} \alpha_j = 1. \tag{56}$$
Thus, we have $m$ linear equations with $m$ unknowns, which can be solved to obtain the individual
load fractions. As explained in the above derivation, we express each $\alpha_i$, $i = 1, \ldots, m-1$, in terms
of $\alpha_m$, and using (56), we obtain the value of $\alpha_m$ as
$$\alpha_m = \frac{1 - \left[(m-1)t_m - (t_1 + \cdots + t_{m-1})\right]/wT_{cp}}{m}. \tag{57}$$
Now, substituting in (46), we obtain the individual load fractions as
$$\alpha_i = \frac{1 - \left[(m-1)t_m - (t_1 + \cdots + t_{m-1})\right]/wT_{cp}}{m} + \frac{t_m - t_i}{wT_{cp}}, \quad i = 1, \ldots, m-1. \tag{58}$$
Hence, the expression for the processing time of the load can be obtained from (50), where $\alpha_1$ is
given by (58). In this case, it may be noted that for $\alpha_i > 0$, the release time must satisfy the
following condition:
$$t_i < \frac{\sum_{j=1}^{m} t_j + wT_{cp}}{m}. \tag{59}$$
This condition is obtained by simplifying (58). As proved in Lemma 1, we choose the number of
processors by using (59). It may be noted that, since the total communication time of the entire
load is less than or equal to the release time of the processor $p_1$, we distribute the load in a single
installment. It is straightforward to verify that the results of Lemmas 1 and 2 also hold for this
case. Hence, if $r$ processors take part in the computation of the load (after applying (59)), then
using the result of Lemma 2, it can be shown that the processing time will be less than or equal
to the release time of $p_{r+1}$. Thus, using (59), an optimal choice on the number of processors can
be made. The following theorem proves the optimality of the strategy shown in Figure 8.
Figure 8. Timing diagram: arbitrary release times; $t_1 \ge zT_{cm}$.
THEOREM 2. In a bus network, let the release times of the processors be arbitrary, and let
$t_1 \ge zT_{cm}$. Let (59) be satisfied by $p_1, \ldots, p_r$. Then, the load distribution strategy is optimal
when the load distribution is such that all the $r$ processors start computing their respective load
fractions from their release times and stop computing at the same time.
PROOF. It is straightforward to show that any other possible load distribution will yield greater
processing time by contradicting (56) for that load distribution. We omit details. ∎
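The single-installment case $t_1 \ge zT_{cm}$ can be sketched in the same way, using (59) to select the processors and (57), (46), and (50) to compute the fractions and the finish time. The function name, the step-by-step removal of the last processor, and the numerical data below are assumptions chosen for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def single_installment(release: List[float], w_tcp: float) -> Tuple[int, List[float], float]:
    """Return (r, fractions of p_1..p_r, finish time) when t_1 >= z*T_cm."""
    r = len(release)
    # Condition (59) for r processors: t_r < (t_1 + ... + t_r + w*T_cp) / r.
    while r > 1 and release[r - 1] >= (sum(release[:r]) + w_tcp) / r:
        r -= 1
    used = release[:r]
    t_r = used[-1]
    # Eq. (57) with m replaced by r, then Eq. (46) for the other processors.
    alpha_r = (1.0 - ((r - 1) * t_r - sum(used[:-1])) / w_tcp) / r
    alphas = [alpha_r + (t_r - t_i) / w_tcp for t_i in used]
    finish = used[0] + alphas[0] * w_tcp  # Eq. (50)
    return r, alphas, finish


# Hypothetical data: w*T_cp = 0.4 and z*T_cm = 0.2, so t_1 = 0.25 >= z*T_cm.
r, alphas, finish = single_installment([0.25, 0.30, 0.90], w_tcp=0.4)
print(r, [round(a, 4) for a in alphas], round(finish, 4))
```

In this instance the third processor is released too late to help, the two remaining fractions sum to one as (56) requires, and the finish time is earlier than the release time of the excluded processor.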
It is worth mentioning here that when $t_1 < zT_{cm}$ and also when $t_i = 0$, $i = 1, \ldots, r$, then
it is obvious that these $r$ processors automatically qualify for the first installment. Also, in
this case, after using (49) to choose any additional processor, the load until the release time
of $p_{r+1}$ is distributed among the processors that take part in the computation. This aspect has
been implicitly assumed in the proposed algorithm. The complete algorithm is presented in the
Appendix. In the following section, we shall demonstrate different cases using the algorithm.
5. ILLUSTRATIVE EXAMPLES
Now we demonstrate the algorithm presented in the Appendix by means of some illustrative
examples. We use the results of the identical case algorithm and the results of the above lemmas
at the appropriate iteration of the algorithm. As mentioned earlier, without loss of generality, we
shall assume that the load distribution follows the order $p_1, \ldots, p_m$ and $t_i \le t_{i+1}$, $i = 1, \ldots, m-1$.
EXAMPLE 1. (Identical release times; two-installment case; Region A in Figure 6.) Consider a
bus network consisting of three processors. Let the processor speed parameter be $w = 0.2$, the link
speed parameter be $z = 0.1$, and let $T_{cm} = T_{cp} = 1.0$. Also, let the release times of the processors
be identical and given by $t = 0.06$. Hence, (22) is satisfied and we distribute the load $L$,
given by (21), among all the three processors. The individual load fractions are 0.2 each.
Since (22) is satisfied, the rest of the total load can be distributed in another installment. Hence,
in the second installment, the individual load fractions are given by 0.133. Hence, the optimal
processing time can be obtained from (43) and is given by 0.1267.
As explained in Section 4, it may be observed that the computation time of the processors in
the first installment exactly equals the total communication time of the remaining load. Hence,
during this computation time, the rest of the load is distributed equally to all the three processors
such that they all stop computing at the same instant in time. Clearly, this assignment is optimal.
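The numbers of Example 1 can be reproduced with a few lines of arithmetic. The sketch below assumes that the first installment equals the load the bus can deliver by time $t$, namely $t/zT_{cm}$; this is an assumption made here because (21) and (43) are not restated in this section, although it is consistent with the fractions quoted above. All variable names are illustrative.

```python
# Parameters of Example 1: m = 3, w = 0.2, z = 0.1, T_cm = T_cp = 1.0, t = 0.06.
m, w_tcp, z_tcm, t = 3, 0.2, 0.1, 0.06

# Assumption: the first installment is the load the bus can send by time t.
first_total = t / z_tcm
first = first_total / m             # per-processor fraction, installment 1
second = (1.0 - first_total) / m    # per-processor fraction, installment 2

compute_first = first * w_tcp               # time spent computing installment 1
comm_rest = (1.0 - first_total) * z_tcm     # time to communicate the remaining load
finish = t + (first + second) * w_tcp       # processors never idle after release

print(round(first, 3), round(second, 3), round(finish, 4))
print(abs(compute_first - comm_rest) < 1e-12)
```

The printed fractions and finish time agree with the values 0.2, 0.133, and 0.1267 quoted in the example, and the last line confirms that the first-installment computation time matches the communication time of the remaining load.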
EXAMPLE 2. (Identical release times; multi-installment strategy; $\gamma < 1$; Region B in Figure 6.)
In the above network, let the release times of the processors be $t = 0.05$. In this case, it can
be readily verified that (22) is violated; further, the value of $\gamma = 0.67$. Since $\gamma < 1$, we need to
verify (35). Using (35), we find that it is satisfied. Hence, we use (33) to find the number of
installments needed for the computation time to exceed the total communication time. In this
case, from (33), we find $k - 1 = 2$. Hence, the individual load fractions in the first and the second
installments are 0.167 and 0.1167, respectively. These are obtained by using (26). Hence, the
remaining amount of the load is distributed equally among all the three processors in the third
installment (it may be noted that the total number of installments required is $k = 3$). This
load fraction is given by 0.0496, obtained from (42). Hence, the processing time given by (43)
is 0.1167. Clearly, this processing time is optimal.
EXAMPLE 3. (Identical release times; multi-installment strategy; $\gamma = 1$; Region B in Figure 6.)
In the above network, let there be only two processors, i.e., $m = 2$. In this case, the value of the
parameter $\gamma = 1$. Also, let the release times of the processors be $t = 0.045$. It can be readily
verified that (22) is violated. Since the value of $\gamma = 1$, we use (41) to obtain $k - 1$, as in the
previous example. In this case, $k - 1 = 2$. Hence, in the first and in the second installment,
the individual load fractions are given by 0.225 and 0.225, respectively. These values are again
obtained by using (26). Now, using (42), we obtain the individual load fraction in the third
installment as 0.05. Hence, the optimal processing time is given by 0.145, obtained from (43).
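As a quick consistency check on Examples 1 to 3, the reported processing times can be compared with $t + wT_{cp}/m$. This expression is not one of the paper's numbered equations; it follows from the assumption, made here for illustration, that each of the $m$ identical processors receives a total of $1/m$ of the load and computes without idling from its release time $t$. The function name is hypothetical.

```python
def no_idle_finish(t: float, m: int, w_tcp: float) -> float:
    """Finish time if each of m identical processors computes 1/m of the
    load continuously, starting at its release time t (an assumption)."""
    return t + w_tcp / m


# (release time t, number of processors m, processing time reported in the text)
examples = {
    "Example 1": (0.06, 3, 0.1267),
    "Example 2": (0.05, 3, 0.1167),
    "Example 3": (0.045, 2, 0.145),
}
for name, (t, m, reported) in examples.items():
    print(name, round(no_idle_finish(t, m, 0.2), 4), reported)
```

For all three examples the computed value agrees with the reported processing time to four decimal places.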
EXAMPLE 4. (Arbitrary release times.) In the network used in Example 1, let $w = 0.4$, $z = 0.2$,
$T_{cm} = T_{cp} = 1.0$. Let the release times of the processors $p_1$, $p_2$, and $p_3$ be $t_1 = 0.06$, $t_2 = 0.1$,
and $t_3 = 0.15$, respectively. Since the available times are nonidentical, we follow the algorithm
described in Section 4.
In this case, the algorithm starts from Step 3. As described, we derive the number of processors
that qualify to share the load $t_1/zT_{cm}$ using (49). We find that $p_1$ and $p_2$ alone qualify. Hence,
we distribute this load among these two processors. The finish time of the processors is given
by 0.14, and the individual load fractions are given by 0.2 and 0.1, respectively. We reset the
variables $t$ and $t_i$, $i = 1, 2$, as $t = t_1$ and $t_i = 0.14$, $i = 1, 2$. Since the first "While loop" condition
is satisfied, we again choose the number of processors and we see that $p_3$ also qualifies. Again,
we observe that the release times of all the three processors are nonidentical, and hence, we use
the strategy described in Figure 7 to distribute the load in the time interval 0.06 to 0.14 among
the three processors. The individual load fractions are given by 0.14167 to $p_1$ and $p_2$, and 0.1167
to $p_3$. In this case, all the processors stop computing at the time instant 0.19667. Since all the
processors are utilized now ($r = 3$), we use the identical case algorithm and distribute the rest of
the load as follows. As described in the identical case algorithm, we see that (22) is satisfied, and
hence, we distribute the remaining load in just two installments. The processing time is given
by 0.23665. It may be observed that the choice of distributing the load in two installments is
also justifiable from Figure 6, as the conditions satisfy Region A in the figure.
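The first two stages of Example 4 can be retraced numerically with (46), (47), and (50). In the sketch below, the helper `distribute` and the idea of passing the stage load explicitly are assumptions made for illustration: in the second stage the load is taken to be what the bus can send between 0.06 and the first finish time, which generalizes the $t_1/zT_{cm}$ of (47) in the way the example describes.

```python
from typing import List, Tuple

W_TCP, Z_TCM = 0.4, 0.2  # Example 4: w = 0.4, z = 0.2, T_cm = T_cp = 1.0


def distribute(release: List[float], load: float) -> Tuple[List[float], float]:
    """Fractions and common finish time for `load` (Figure 7 strategy)."""
    m, t_m = len(release), release[-1]
    # Eq. (47) with the stage load in place of t_1 / (z*T_cm).
    alpha_m = (load - ((m - 1) * t_m - sum(release[:-1])) / W_TCP) / m
    # Eq. (46) for every processor, then Eq. (50) for the finish time.
    alphas = [alpha_m + (t_m - t_i) / W_TCP for t_i in release]
    return alphas, release[0] + alphas[0] * W_TCP


# Stage 1: only p1 and p2 qualify; the load is what the bus sends by t1 = 0.06.
stage1, finish1 = distribute([0.06, 0.10], 0.06 / Z_TCM)

# Stage 2: p1 and p2 become available at finish1, p3 at 0.15; the load is
# what the bus sends between 0.06 and finish1 (assumed generalization).
stage2, finish2 = distribute([finish1, finish1, 0.15], (finish1 - 0.06) / Z_TCM)

print([round(a, 4) for a in stage1], round(finish1, 4))
print([round(a, 4) for a in stage2], round(finish2, 4))
```

The first stage gives fractions 0.2 and 0.1 with finish time 0.14, and the second stage gives approximately 0.1417, 0.1417, and 0.1167 with finish time approximately 0.1967, matching the values quoted in the example.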
In the above example, the speed parameters of the processors and the links are chosen in
such a way that the use of the heuristic algorithm is avoided. However, using $w = 0.2$, $z = 0.3$,
$T_{cm} = T_{cp} = 1.0$, $m = 3$ and with the release times as $t_1 = 0.05$, $t_2 = 0.1$, and $t_3 = 0.2$, it may
be verified that after two installments, heuristics are needed to carry out the processing for the
rest of the load. We omit the details.
6. DISCUSSION OF THE RESULTS
In this section, we shall discuss some important points of the proposed algorithm. 
If the processors in the system are idle at the time of arrival of the processing load, then we
can immediately use the idle case algorithm shown in Figure 2. Here, since the parameter $n$, the
number of installments, is software tunable, by choosing sufficiently large $n$, we can reduce the
overall processing time to a greater extent, and the load distribution is optimal [17]. However, in
a practical scenario, the extent to which the load can be divided plays a crucial role. Hence, even
in this case, the value of $n$ is chosen to satisfy this divisibility requirement. On the other hand,
if the processors in the system are engaged in their own computational work when the processing
load arrives at the system, then the load distribution strategy described for the idle case cannot
yield optimal solutions for obvious reasons. The algorithm proposed in this paper describes
a methodology to distribute the load among the processors depending upon the release time
distribution, i.e., whether these times are identical or not. When the release times are identical
and are greater than or equal to the total communication time, the optimal processing time
(Theorem 1) is given by (17). However, when the release times are identical and less than the
total communication time, then we have proposed a multi-installment strategy for distributing
the load in an optimal manner. When condition (22) is satisfied, it has been shown that the
entire load can be distributed in just two installments, and the corresponding load distribution
is optimal. However, when (22) is not satisfied, it has been shown that the processing load is
distributed in more than two installments depending on the value of $\gamma$: $\gamma \ge 1$, or $\gamma < 1$ and
(35) is satisfied. When $\gamma < 1$ and (35) fails to hold, different methods seem plausible. The choice
among these methods depends on many issues. For example, the simplest of all these methods has
been suggested in Case 2 for the identical case. Here, the computation of all processors starts at
a time instant $t + \delta$ instead of at time $t$. In this case, the processors have to remain idle for an
additional $\delta$ units of time from their release times $t$. However, if this idle time is significantly
small, one can use this method without much delay in the processing time.
Another possible load distribution strategy is to use the idle case algorithm itself by choosing
an appropriate number of installments. In other words, $n$ must be chosen in such a way
that the idle times of all the processors after their release times are minimized. However, the use
of the idle case algorithm at any intermediate stage of the proposed algorithm for the arbitrary
release time case may not be possible because of the difficulty in obtaining a suitable $n$. Hence,
the proposed heuristic method seems preferable, both for its performance and for its simple
nature.
When the processor release times are arbitrary, a condition for choosing the number of processors
has been derived. The use of this condition to determine the required number of processors
is demonstrated in Example 4. In Example 4, it may be observed that the processor $p_3$ is not
utilized until the second installment. This shows that the processors $p_1$ and $p_2$ are utilized in
an optimal manner before $p_3$ is admitted. The algorithm proposed in this paper assigns a load
fraction to a processor only when condition (49) is satisfied. When this condition is violated by
a processor $p_i$, it will not be assigned any load. Also, we have shown in Lemma 1 that when
$p_i$ violates condition (49), all the successors of $p_i$ also violate this condition, and hence, we need
not assign any load fraction to those processors in that particular installment. Also, Lemma 2
shows that if $j$ processors qualify for a particular installment, then the corresponding amount of
load for that installment can be processed by $j$ processors in a time no greater than the release
time of $p_{j+1}$. Thus, each processor in the system, when admitted to share the processing load, is
utilized to the maximum possible extent.
One of the most intriguing questions to be answered is the optimality of the algorithm. As
can be seen from the above discussion, for the case when $\gamma < 1$ and (35) is violated, different
methods seem to be possible. Hence, the choice of a particular method is questionable. Hence, at
this stage it appears that the proposed algorithm need not be optimal when the parameter $\gamma < 1$
and condition (35) is violated. However, for the identical case, for all other conditions except
the one mentioned above, the solution provided is optimal. At this juncture, it remains open to
provide an optimal load distribution strategy when $\gamma < 1$ and (35) is violated (Region C in
Figure 6). Once this is provided, a rigorous proof of optimality of the algorithm for the nonidentical
case can be attempted.
7. CONCLUSIONS 
In this paper, an algorithm for scheduling divisible loads on bus networks was presented. All the
processors in the system are assumed to be equipped with front ends. The proposed algorithm
takes into account the arbitrary release times of the processors in the system and produces
a schedule that minimizes the processing time of the load. For certain cases of release time
distributions, optimality is assured. This is the first time in divisible load theory that an analysis
has been provided in which the processor release times are considered in the problem formulation.
When the processors are available from the time at which the load arrives at the system, a
multi-installment strategy adopted earlier in the literature [17] is applied to obtain the optimal
processing time. When the release times are nonzero and identical, closed-form expressions for the
processing time have been derived for certain cases, and for these cases, the proposed algorithm is
optimal. When the release times are arbitrary, referred to as nonidentical, an important condition
to determine the required number of processors was derived. Using this condition and the
strategies for the identical case, a general algorithm was proposed for the nonidentical case. Several
numerical examples were presented to demonstrate the operation of this algorithm. Finally,
some important points have been highlighted in Section 6. We have suggested different load
distribution strategies for a particular case (Case 2) of the identical case algorithm. At this stage,
these methods for this particular case do not warrant the optimality of the processing time. It
would be beneficial to explore various other strategies for this case which outperform the
strategies presented in this paper. In fact, once an optimal load distribution strategy has been
found for this particular case (Case 2), a rigorous proof of optimality for the nonidentical
case can be attempted. Also, it would be of interest to study the performance of the proposed
algorithm when the processors in the system are not equipped with front ends and also when
the BCU takes part in the processing of the load.
APPENDIX
We now present the algorithm for processing divisible loads when the processor release times
are arbitrary. Different cases (idle and nonidle) of processor release times have been explained
clearly.
ALGORITHM.
Let $t = 0$
Step 1. If the release times of the processors are exactly equal to the optimal start-up times
of the processors for some $n$ described for the idle case, then the optimal processing
time is $T(m, n)$, Exit;
Step 2. If $t_i = t$ for all $i = 1, \ldots, m$, then use the identical case algorithm. Let $r = m$. Exit;
Step 3. While ($t_1 < zT_{cm}$)
{
    Derive the number of processors $r$ using (49);
    If $t_i$, $i = 1, \ldots, r$ are identical, then use the identical case algorithm;
    If $t_i$, $i = 1, \ldots, r$ are arbitrary, then use the strategy described in Figure 7;
    Reset $t = t_1$, $t_i = T(r)$, $i = 1, \ldots, r$
}
Step 4. While ($t_1 \ge zT_{cm}$)
{
    If $t_i$, $i = 1, \ldots, m$ are identical, then use the identical case algorithm, Exit;
    else
    Derive the number of processors $r$ using (59);
    Use the load distribution strategy described in Figure 8; Exit;
}
ALGORITHM: IDENTICAL CASE.
If ($r = m$)
{
    If $t \ge zT_{cm}$, then the optimal processing time is given by (17). Exit;
    If $t < zT_{cm}$, then verify (22). If this condition is satisfied, the two-installment strategy is applied
    to obtain the optimal processing time. Exit;
    If $t < zT_{cm}$ and (22) is violated, then depending on the value of the parameter $\gamma$, use one of the
    following Cases 1 to 4;
}
If ($r < m$)
{
    Verify (22) with respect to $t_{r+1}$. If it is satisfied, then choose the required fraction of the load
    until $t_i$, $i = 1, \ldots, r$ and distribute it among $r$ processors in such a way that the finish time is $t_{r+1}$;
    Determine the time of communication, $t_c$, of the load from time $t_1$
    Reset $t = t_c$, $t_i = t_{r+1}$, for all $i = 1, \ldots, r$;
    If (22) is violated, then use one of the following Cases 1 to 4;
}
CASE 1. $\gamma > 1$
(a) Use (31) to determine the number of installments $(k - 1)$.
(b) Carry out the load distribution strategy as explained for the identical case until $k - 2$
installments. For the $(k - 1)$th installment, distribute the load in such a way that the computation
time of all the $r$ processors stops at time $t_{r+1}$; Determine the time, $t_c$, up to which the
load has been communicated by the BCU.
(c) Reset $t = t_c$, $t_i = t_{r+1}$, $i = 1, \ldots, r$;
CASE 2. $\gamma < 1$ and Condition (35) is satisfied.
(a) Use (33) to determine the number of installments $(k - 1)$.
(b) Carry out the load distribution strategy as explained for the identical case until $k - 2$
installments. For the $(k - 1)$th installment, distribute the load in such a way that the computation
time of all the $r$ processors stops at time $t_{r+1}$; Determine the time, $t_c$, up to which
the load has been communicated by the BCU.
(c) Reset $t = t_1$, $t_i = t_{r+1}$, $i = 1, \ldots, r + 1$;
CASE 3. $\gamma < 1$ and Condition (35) is violated
If ($r < m$), we do the following.
(a) We use the heuristic algorithm proposed earlier in such a way that the finish time of the
processors is $t_{r+1}$; Determine the time, $t_c$, up to which the load has been communicated
by the BCU.
(b) Reset $t = t_c$, $t_i = t_{r+1}$, $i = 1, \ldots, r + 1$;
If ($r = m$), we use the heuristic algorithm described earlier to distribute the entire load in
two installments.
CASE 4. $\gamma = 1$
(a) Use (41) to determine the number of installments $(k - 1)$.
(b) Carry out the load distribution strategy described for the identical case until $k - 2$
installments. For the $(k - 1)$th installment, distribute the load in such a way that the computation
time of all the $r$ processors stops at time $t_{r+1}$; Determine the time $t_c$ up to which the load
has been communicated by the BCU.
(c) Reset $t = t_c$, $t_i = t_{r+1}$, $i = 1, \ldots, r$.
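The top-level branching of the algorithm can be summarized by a small dispatch function. The sketch below covers Steps 2 to 4 only; Step 1 is not modelled because it depends on the optimal start-up times of the idle case, which are not restated here. The function name, the returned labels, and the numerical tolerance used to test for identical release times are assumptions made for illustration.

```python
from typing import List


def select_step(release: List[float], z_tcm: float) -> str:
    """Which branch of the appendix algorithm applies (Step 1 is not modelled)."""
    t1 = min(release)
    # Step 2: all release times identical (tolerance is an assumption).
    if all(abs(t - t1) < 1e-12 for t in release):
        return "Step 2: identical case algorithm"
    # Step 3: the earliest release time precedes the total communication time.
    if t1 < z_tcm:
        return "Step 3: iterate with (49) and the Figure 7 strategy"
    # Step 4: the whole load can be communicated before any processor is released.
    return "Step 4: single installment with (59) and the Figure 8 strategy"


print(select_step([0.06, 0.06, 0.06], 0.1))   # release times of Example 1
print(select_step([0.06, 0.10, 0.15], 0.2))   # release times of Example 4
print(select_step([0.25, 0.30, 0.90], 0.2))   # hypothetical data
```

For the release times of Example 4 the function selects Step 3, which is where the example states that the algorithm starts.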
REFERENCES 
1. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 
W.H. Freeman, New York, (1979). 
2. C.-H. Lee, D. Lee and M. Kim, Optimal task assignment in linear array networks, IEEE Transactions on 
Computers 41, 877-880 (1992).
3. C.S.R. Murthy and S. Selvakumar, Scheduling parallel programs for execution on multiprocessors, In Com- 
puting and Intelligent Systems, pp. 91-115, Tata McGraw-Hill, (1993). 
4. J. Xu, Multiprocessor scheduling of processors with release times, deadlines, precedence, and exclusion 
relations, IEEE Transactions on Software Engineering 19, 139-154 (1993). 
5. M. Veldhorst, A linear time algorithm to schedule trees with communication delays optimally on two 
machines, Technical Report RUU-CS-93-04, (January 1993). 
6. V. Bharadwaj, Distributed computation with communication delays: Design and analysis of load distribution 
strategies, Ph.D. Thesis, Faculty of Engineering, Indian Institute of Science, Bangalore, India, (June 1994). 
7. Y.C. Cheng and T.G. Robertazzi, Distributed computation with communication delays, IEEE Transactions 
on Aerospace and Electronic Systems 24, 700-712 (1988). 
8. T.G. Robertazzi, Processor equivalence for a linear daisy chain of load sharing processors, IEEE Transactions
on Aerospace and Electronic Systems 29, 1216-1221 (1993). 
9. H.J. Kim, G.-I. Joe and J.G. Lee, Optimal load distribution for tree network processors, (Preprint), (1993).
10. H.J. Kim, M.A. Iqbal, H. Park and V.K. Prasanna, Optimal configuration of host-satellite system for load 
distribution, (Preprint), (1993). 
11. V. Bharadwaj, D. Ghose and V. Mani, Optimal sequencing and arrangement in distributed single-level 
networks with communication delays, IEEE Transactions on Parallel and Distributed Systems 5, 968-976 
(1994). 
12. J. Sohn and T.G. Robertazzi, Optimal divisible job load sharing on bus networks, IEEE Transactions on
Aerospace and Electronic Systems 1 (1996).
13. J. Sohn and T.G. Robertazzi, Optimal load sharing for divisible job on bus networks, In Proceedings of the
1993 Conf. on Information Science and Systems, Volume 697, (August 1994).
14. S. Bataineh and T.G. Robertazzi, Ultimate performance limits for networks of load sharing processors,
Proceedings of the Conference on Information Sciences and Systems, March, 1992, pp. 794-799, Princeton
University, Princeton, NJ.
15. D. Ghose and V. Mani, Distributed computation with communication delays: Asymptotic performance
analysis, Journal of Parallel and Distributed Computing 23, 293-305 (1994).
16. V. Mani and D. Ghose, Distributed computation in linear networks: Closed-form solutions, IEEE Transactions
on Aerospace and Electronic Systems 30, 471-483 (1994).
17. V. Bharadwaj, D. Ghose and V. Mani, Multi-installment load distribution in tree networks with delays, 
IEEE Transactions on Aerospace and Electronic Systems 31 (1995) (to appear). 
18. V. Bharadwaj, D. Ghose and V. Mani, Multi-installment load distribution strategy in linear networks with 
communication delays, Presented at the 1st International Workshop on Parallel Processing, Bangalore,
India, December 26-29, 1994. 
19. J. Sohn and T.G. Robertazzi, A multi-job load sharing strategy for divisible jobs on bus networks, CEAS 
Technical Report 665, State University of New York at Stony Brook, (April 1993). 
20. S. Bataineh and M. Al-Ibrahim, Effect of fault-tolerance and communication delay on response time in a 
multiprocessor system with a bus topology, Computer Communications 17, 843-851 (1994). 
21. J. Sohn and T.G. Robertazzi, An optimum load sharing strategy for divisible jobs with time-varying processor 
speed and channel speed, CEAS Technical Report 706, State University of New York at Stony Brook, 
(January 1995). 
22. V. Bharadwaj, D. Ghose and V. Mani, A study of optimality conditions for load distribution in tree networks 
with communication delays, Technical Report 423/GI/02-92, Department of Aerospace Engineering, Indian 
Institute of Science, Bangalore, (December 1992). 
23. S. Bataineh and B. Al-Asir, An efficient scheduling algorithm for divisible and indivisible tasks in loosely
coupled multiprocessor systems, Software Engineering Journal (to appear). 
24. D.P. Bertsekas and J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice 
Hall, Englewood Cliffs, NJ, (1989).
