An efficient load distribution strategy for a distributed linear network of processors with communication delays  by Bharadwaj, V. et al.
Pergamon 
Computers Math. Applic. Vol. 29, No. 9, pp. 95-112, 1995 
Copyright©1995 Elsevier Science Ltd 
Printed in Great Britain. All rights reserved 
0898-1221/95 $9.50 + 0.00 
0898-1221(95)00039-9
An Efficient Load Distribution Strategy for a Distributed Linear Network of Processors with Communication Delays
V. BHARADWAJ, D. GHOSE AND V. MANI
Department of Aerospace Engineering 
Indian Institute of Science, Bangalore, 560012, India 
(Received August 1993; accepted September 1993) 
Abstract. In this paper, we present an improved load distribution strategy for arbitrarily divisible processing loads. The strategy minimizes the processing time in a distributed linear network of communicating processors by making efficient use of their front-ends. Closed-form solutions are derived for processing loads that originate at the boundary and in the interior of the network, under some important conditions on the arrangement of processors and links. An asymptotic analysis is carried out to explore the ultimate performance limits of such networks. Two important theorems are stated regarding the optimal load sequence and the optimal load origination point. A comparative study of this new strategy with an earlier strategy is also presented.
Keywords: Communication delays, Distributed processing, Linear networks, Load distribution, Scheduling problems.
1. INTRODUCTION 
The problem of obtaining an optimal load distribution in distributed computing systems has been of interest in recent years. Conventionally, processing loads are indivisible in the sense that they cannot be further divided and have to be processed in their entirety. Much of the work in this area has been done under the common appellation of task scheduling or load sharing in multiprocessors, usually with a queueing-theoretic approach; some recent results in this area are available in [1-14]. However, in many applications we encounter loads that are arbitrarily divisible, in the sense that the load can be divided into infinitesimally small fractions and distributed. Such loads are characterized by their large volume, in which each data element can be processed independently and requires exactly the same type of processing. Unlike the case of indivisible loads, there are no data dependency constraints due to precedence relations [7-10]. This is what makes the problem of optimal load distribution in such systems analytically tractable. All these aspects fall under the aegis of what has been termed 'divisible job theory.' Research in this area has been initiated only recently [15-20]. Applications of this theory include the processing of loads in a radar tracking system or an image processing system, and the processing of massive experimental data. This paper is a contribution in this direction.
In this paper, we consider a distributed linear network of communicating processors computing 
an arbitrarily divisible processing load. A load transfer in the communication network is assumed 
to be subject to a deterministic delay which is proportional to the amount of load. It is worth 
mentioning that the effect of communication delay has also been considered in the case of indi- 
visible loads [7,8,11-14]. One of the major problems here is to obtain efficient load distribution 
strategies that minimize the processing time. This aspect has been discussed in the context of divisible job theory in [15-20]. Cheng and Robertazzi [15] first formulated this load distribution problem in a linear network and presented a computational algorithm for obtaining the optimal load distribution. Mani and Ghose [16] derived closed-form solutions for the same problem when the network consists of identical processors and identical links. They also proved analytically a result regarding the optimal sequence of load distribution when the load originates in the interior of the network. Ghose and Mani [17] extended these results to an asymptotic performance analysis of the network for a large number of processors and decreasing communication delays, and discussed the relevant trade-off issues. Bataineh and Robertazzi [18] also obtained some of these asymptotic performance results, using a different approach based on the processor-equivalence concept. These ideas have been applied to tree networks too [19,20].
All the above studies show that the communication delay plays a very significant role in degrading the expected processing time performance of a linear distributed computing network. This is so even when the processors are equipped with front-ends, which receive and transmit processing loads while the processors themselves are engaged only in computing. The load distribution strategy followed in the above papers [15-18] does not utilize the front-ends efficiently. One reason is that each processor begins computing its load only after its front-end has received all the load that passes through it, which keeps the processor waiting unnecessarily long. In this paper, we propose a new strategy in which the front-ends of the processors are better utilized in distributing the load. The front-end of the load-originating processor divides the load into smaller fractions and communicates them to its successor one at a time. Each processor begins computing as soon as its front-end receives its own load fraction. It continues computing while the front-end receives and transmits subsequent load fractions to its successors. We show that this new strategy improves performance compared with the earlier strategies.
2. MOTIVATION AND DEFINITIONS
In this section, using a simple example we show that the processing time can be further 
reduced, compared to the earlier strategy proposed in [15], by following the new strategy. Next, 
we present some definitions and state some rules for load distribution which form the basis for 
the new strategy. 
2.1. Motivation for the New Strategy
Consider a linear network of three processors $P_0$, $P_1$, $P_2$ equipped with front-ends and connected via communication links $\ell_1$, $\ell_2$. The total processing load arrives at processor $P_0$ and has to be shared in an optimal manner so that the processing time is minimum. Using the load distribution strategy given in [15], we obtain the timing diagram in Figure 1a. In it, processors $P_1$ and $P_2$ remain idle for 45.5% and 63.6% of the processing time, respectively. Figure 1b shows the timing diagram for the new strategy. Here $P_0$ distributes the load fractions to $P_1$ and $P_2$ one at a time, and $P_1$ begins computing as soon as it has received its own load fraction. In this case, the processing time falls from 0.5238 to 0.5; that is, the earlier strategy takes 4.76% longer than the new one. However, this strategy does not improve performance when applied to a network of processors not equipped with front-ends. This is discussed later in Section 5. Hence, in this paper, we deal only with processors equipped with front-ends.
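The numbers in Figure 1 can be reproduced with a short calculation. The sketch below is our own illustration, not the authors' code, and the function names are hypothetical.

- For the earlier strategy of [15], it assumes the store-and-forward condition: $P_i$ finishes together with $P_{i+1}$ after the front-end of $P_i$ has forwarded everything destined beyond it.
- For the new strategy, it assumes that $P_{i+1}$ starts $\alpha_{i+1}(z_{i-1} + z_i + z_{i+1})T_{cm}$ after $P_i$, with $z_0 = z_{-1} = 0$. This is the recursion given as (3) in Section 3.1.

```python
"""Motivating example (Figure 1): m = 2, w = z = 1, Tcm = 0.5, Tcp = 1."""


def z_at(z, k):
    """Link parameter z_k; z[0] is a placeholder and z_0 = z_{-1} = 0."""
    return z[k] if k >= 1 else 0.0


def earlier_strategy(w, z, t_cp, t_cm):
    """Store-and-forward: all load meant for P_{i+1..m} crosses link i+1 at once."""
    m = len(w) - 1
    delta = t_cm / t_cp
    a = [0.0] * (m + 1)
    a[m] = 1.0
    tail = 1.0  # unnormalized sum of a[j] for j > i
    for i in range(m - 1, -1, -1):
        # P_i computes while the remaining load crosses link i+1 and P_{i+1} computes
        a[i] = (tail * z[i + 1] * delta + a[i + 1] * w[i + 1]) / w[i]
        tail += a[i]
    a = [x / tail for x in a]  # normalize so the fractions sum to 1
    start = [0.0] * (m + 1)
    for i in range(m):
        start[i + 1] = start[i] + sum(a[i + 1:]) * z[i + 1] * t_cm
    return a, start


def new_strategy(w, z, t_cp, t_cm):
    """Fractions sent one at a time; each processor starts once its own fraction arrives."""
    m = len(w) - 1
    delta = t_cm / t_cp
    a = [0.0] * (m + 1)
    a[m] = 1.0
    for i in range(m - 1, -1, -1):
        hop = z_at(z, i - 1) + z_at(z, i) + z_at(z, i + 1)
        a[i] = a[i + 1] * (w[i + 1] + delta * hop) / w[i]
    total = sum(a)
    a = [x / total for x in a]
    start = [0.0] * (m + 1)
    for i in range(m):
        hop = z_at(z, i - 1) + z_at(z, i) + z_at(z, i + 1)
        start[i + 1] = start[i] + a[i + 1] * hop * t_cm
    return a, start


if __name__ == "__main__":
    w, z = [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]
    for name, solver in (("earlier", earlier_strategy), ("new", new_strategy)):
        a, start = solver(w, z, t_cp=1.0, t_cm=0.5)
        print(f"{name}: T = {a[0] * w[0]:.4f}, waiting times = {[round(s, 4) for s in start[1:]]}")
```

With the Figure 1 parameters, the earlier strategy gives $T = 0.5238$ with waiting times 0.2381 and 0.3333. The new strategy gives $T = 0.5$ with waiting times 0.1667 and 0.3333.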
2.2. Definitions and Rules for Load Distribution
Figure 1. Motivating example ($w_0 = w_1 = w_2 = 1.0$, $z_1 = z_2 = 1.0$, $T_{cm} = 0.5$, $T_{cp} = 1.0$).
(a) Earlier strategy: $\alpha_0 w_0 T_{cp} = 0.5238$, $\alpha_1 w_1 T_{cp} = 0.2857$, $\alpha_2 w_2 T_{cp} = 0.1905$; waiting time of the first processor = 0.2381, of the second processor = 0.3333.
(b) New strategy: $\alpha_0 w_0 T_{cp} = 0.5$, $\alpha_1 w_1 T_{cp} = 0.3333$, $\alpha_2 w_2 T_{cp} = 0.1667$; waiting time of the first processor = 0.1667, of the second processor = 0.3333.

A linear network of $(m + 1)$ processors $P_0, P_1, \ldots, P_m$ connected via communication links $\ell_1, \ell_2, \ldots, \ell_m$ is shown in Figure 2a. If the load originates at processor $P_0$ (situated at one extreme end of the network), we refer to this case as the 'boundary case.' On the other hand, if the load originates at some interior processor, as shown in Figure 2b, we refer to it as the 'interior case.' For the interior case, the processors and links on either side of the load-originating processor ($P_0$) are redenoted as follows:
- left-hand side: processors $P^l_1, P^l_2, \ldots, P^l_L$ and links $\ell^l_1, \ell^l_2, \ldots, \ell^l_L$;
- right-hand side: processors $P^r_1, P^r_2, \ldots, P^r_R$ and links $\ell^r_1, \ell^r_2, \ldots, \ell^r_R$.
Thus $(R + L + 1)$ is the total number of processors in the network. The following definitions are used in the rest of the paper.
(I) LOAD DISTRIBUTION. This is defined as the fractions of the total processing load assigned to each processor in the network.
(II) PROCESSING TIME. For the boundary case, this is denoted by $T(m)$ and is defined as
$$T(m) = \max\left(T_0, T_1, \ldots, T_m\right), \tag{1}$$
where $T_k$ is the time difference between the instant at which the $k$th processor stops processing and the instant at which the root processor $P_0$ initiates the process. For the interior case, this is denoted by $T(R, L)$ and defined as
$$T(R, L) = \max\left(T_0, T^l_1, \ldots, T^l_L, T^r_1, \ldots, T^r_R\right), \tag{2}$$
with $T^l_i$ and $T^r_i$ similarly defined.
(III) SEQUENCE. In the interior case, the root processor $P_0$ may first distribute the load to the processors on the right-hand side and then to the processors on the left-hand side, or vice versa. Thus, there are two sequences of load distribution.
(IV) OPTIMAL LOAD SEQUENCE. In the interior case, this is defined as the sequence (of the two possible sequences) of load distribution for which the processing time is minimum.
The other parameters used in this paper are $z$ and $w$, which are inversely proportional to the speeds of the links and processors, respectively. The parameters $T_{cm}$ and $T_{cp}$ denote the communication time and computation time of a standard link and processor, respectively, for the entire load. Note that for a standard link and a standard processor, $z = 1$ and $w = 1$.
Figure 2. Linear network architecture: (a) Boundary case; (b) Interior case. In each panel, the label "Processing Load" marks the processor at which the load arrives.
In a given linear network, the new load distribution strategy is formally defined through the 
following rules. 
(i) The front-end of the root processor divides the total load into a number of fractions (the 
number being equal to the number of processors), keeps its own fraction for computation, 
and sends the other fractions one at a time. 
(ii) All the processors must perform computation continuously till the end. 
(iii) A processor starts computing its load fraction as soon as its front-end finishes receiving it. 
(iv) A front-end starts transmitting as soon as it has received the load from its predecessor 
and its successor is free. 
(v) At any given instant in time, a front-end can either receive or transmit the processing 
load, but not both. 
(vi) A processor and its front-end can perform computation and communication simultane- 
ously. 
As in previous studies [15-20], we assume that all processors stop computing at the same instant in time. Intuitively, this helps in an efficient utilization of the processors. We shall use this as the basis for obtaining the load distribution among the processors in the network.
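The rules can be read as a small scheduling procedure. The sketch below is our own illustration; the function `simulate_boundary` and its greedy ordering are assumptions, not taken from the paper. It works as follows:
- The root sends $\alpha_1, \alpha_2, \ldots, \alpha_m$ in order, and each fraction hops link by link.
- A hop over $\ell_j$ starts only when the sending front-end holds the fraction and both front-ends at the ends of $\ell_j$ are idle (rules (iv) and (v)).
- A processor starts computing as soon as its own fraction arrives (rule (iii)), independently of its front-end (rule (vi)).

```python
def simulate_boundary(alpha, w, z, t_cp, t_cm):
    """Greedy timing model of rules (i)-(vi) for the boundary case.

    alpha[0..m]: load fractions; w[0..m]: inverse processor speeds;
    z[1..m]: inverse link speeds (z[0] is an unused placeholder).
    Returns the start and finish times of computation for every processor.
    """
    m = len(alpha) - 1
    fe_free = [0.0] * (m + 1)  # instant at which each front-end becomes idle
    start = [0.0] * (m + 1)    # P0 computes its own fraction from t = 0
    for k in range(1, m + 1):  # rule (i): fractions leave the root one at a time
        ready = 0.0            # fraction k is held by P0's front-end from t = 0
        for j in range(1, k + 1):  # hop over link j, from P_{j-1} to P_j
            # rules (iv)-(v): sender holds the fraction and neither front-end is busy
            begin = max(ready, fe_free[j - 1], fe_free[j])
            end = begin + alpha[k] * z[j] * t_cm
            fe_free[j - 1] = fe_free[j] = end
            ready = end
        start[k] = ready       # rule (iii): computing starts once the fraction is in
    # rule (vi): computation overlaps with any later front-end activity
    finish = [start[i] + alpha[i] * w[i] * t_cp for i in range(m + 1)]
    return start, finish


if __name__ == "__main__":
    # Fractions of Figure 1b; all three processors should finish together
    start, finish = simulate_boundary([0.5, 1 / 3, 1 / 6], [1.0] * 3, [0.0, 1.0, 1.0], 1.0, 0.5)
    print(start, finish)
```

Feeding in fractions that satisfy the recursion derived in Section 3.1 makes every processor finish at the same instant, which is the assumption stated above.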
3. LOAD ORIGINATION AT THE BOUNDARY
In this section, we use the above-mentioned rules to obtain the timing diagram for load dis- 
tribution in the boundary case (Figure 2a). We also obtain the recursive equations and their 
closed-form solution. Using this closed-form expression, an asymptotic analysis on the processing 
time performance of the network is carried out next. 
3.1. Closed-Form Solution
In this case, the root processor $P_0$ divides the total load into $(m + 1)$ parts, namely $\alpha_0, \alpha_1, \ldots, \alpha_m$. The root processor keeps the fraction $\alpha_0$ for itself. It first transmits the fraction $\alpha_1$ to $P_1$ through $\ell_1$. Next, it transmits the fraction $\alpha_2$ to $P_2$ through $\ell_1$ and $\ell_2$ via the front-end of processor $P_1$. This process continues until all the load fractions have been communicated. The timing diagram for this strategy is shown in Figure 3. From this figure, we obtain the following recursive equations for the individual load fractions:
$$\alpha_i w_i T_{cp} = \alpha_{i+1}\left(w_{i+1} T_{cp} + z_{i+1} T_{cm} + z_i T_{cm} + z_{i-1} T_{cm}\right), \quad i = 0, 1, \ldots, m-1, \tag{3}$$
subject to
$$\alpha_i z_j \le \alpha_{i-1} z_{j+2}, \quad i = 4, 5, \ldots, m \ \text{ and } \ j = 1, 2, \ldots, i-3. \tag{4}$$
Figure 3. Timing diagram: Boundary case. (The diagram shows the computation intervals $\alpha_i w_i T_{cp}$ of $P_0, P_1, P_2, P_3, \ldots$ and the communication intervals on the links, such as $\alpha_{i+1} z_{i-1} T_{cm}$ and $\alpha_{i+2} z_{i-1} T_{cm}$.)
We assume that $z_0 = z_{-1} = 0$. The inequalities (4) are sufficient to ensure that a front-end is not involved simultaneously in receiving and transmitting loads. Furthermore, we have the normalizing equation
$$\sum_{i=0}^{m} \alpha_i = 1. \tag{5}$$
Equations (3) and (5) constitute a system of $(m + 1)$ linear equations in $(m + 1)$ unknowns, yielding a unique solution. This is a feasible solution if and only if it lies in the feasible region defined by (4). However, for arbitrarily chosen processor and link speeds, this feasibility condition may not be satisfied. Hence, we make some assumptions on these speeds that ensure the existence of a feasible solution. For this, we first solve (3) and (5). We can rewrite all the equations in (3) in terms of $\alpha_m$. Thus, the load fraction assigned to the $k$th processor can be written as
$$\alpha_k = \alpha_m \prod_{j=k}^{m-1} f_j, \quad k = 0, 1, \ldots, m-1, \tag{6}$$
where
$$f_j = \frac{1}{w_j}\left(w_{j+1} + \delta \sum_{q=j-1}^{j+1} z_q\right), \tag{7}$$
and $\delta = T_{cm}/T_{cp}$. Using (5), the value of $\alpha_m$ is obtained as
$$\alpha_m = \frac{1}{1 + \sum_{i=0}^{m-1} \prod_{j=i}^{m-1} f_j}. \tag{8}$$
Thus, the fraction of the load assigned to the $k$th processor is
$$\alpha_k = \frac{\prod_{j=k}^{m-1} f_j}{1 + \sum_{i=0}^{m-1} \prod_{j=i}^{m-1} f_j}. \tag{9}$$
From Figure 3, it can be seen that the processing time $T(m)$ is given by $\alpha_0 w_0 T_{cp}$:
$$T(m) = \frac{w_0 T_{cp} \prod_{j=0}^{m-1} f_j}{1 + \sum_{i=0}^{m-1} \prod_{j=i}^{m-1} f_j}, \tag{10}$$
where $f_j$ is as given in (7). From (7), it can be seen that $f_j > 1$ if $w_{j+1} \ge w_j$, for all $j$. This implies, from (9), that $\alpha_{k+1} < \alpha_k$ for all $k$. Now suppose $z_{k+1} \ge z_k$ for all $k$; then the inequalities in (4) are automatically satisfied. Thus, we assume that the processors and the links in the network are arranged in decreasing order of speed, i.e.,
$$w_k \le w_{k+1}, \quad z_k \le z_{k+1}, \quad \text{for all } k. \tag{11}$$
These assumptions are not as restrictive as they appear, since they cover a large number of networks, including those with identical processors and identical links. Hence, for all practical purposes, when (11) is satisfied, the solution to the load distribution problem is obtained by solving (3) and (5). When (11) is not satisfied, the closed-form solutions proposed here are not valid. The strategy can nevertheless still be employed to improve the performance for a given load distribution, as we demonstrate with an example in Section 5.
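Equations (7)-(10) translate directly into a few lines of code, together with a check of the feasibility condition (4). The sketch below is ours, and the function names are hypothetical. Lists are indexed so that `w[0..m]` holds $w_0, \ldots, w_m$ and `z[1..m]` holds $z_1, \ldots, z_m$, with `z[0]` as a placeholder.

```python
from math import prod


def f_factors(w, z, delta):
    """f_j of (7) for j = 0..m-1, using z_0 = z_{-1} = 0."""
    m = len(w) - 1

    def zq(q):
        return z[q] if q >= 1 else 0.0

    return [(w[j + 1] + delta * (zq(j - 1) + zq(j) + zq(j + 1))) / w[j] for j in range(m)]


def boundary_solution(w, z, t_cp, t_cm):
    """Load fractions (9) and processing time (10) for the boundary case."""
    f = f_factors(w, z, t_cm / t_cp)
    m = len(w) - 1
    tails = [prod(f[k:]) for k in range(m + 1)]  # prod_{j=k}^{m-1} f_j; empty product = 1 at k = m
    alpha = [t / sum(tails) for t in tails]      # sum(tails) equals the denominator of (8)-(10)
    return alpha, alpha[0] * w[0] * t_cp


def satisfies_constraint_4(alpha, z):
    """Front-end inequalities (4): alpha_i z_j <= alpha_{i-1} z_{j+2}."""
    m = len(alpha) - 1
    return all(alpha[i] * z[j] <= alpha[i - 1] * z[j + 2]
               for i in range(4, m + 1) for j in range(1, i - 2))


if __name__ == "__main__":
    alpha, T = boundary_solution([1.0, 1.0, 1.0], [0.0, 1.0, 1.0], 1.0, 0.5)
    print(alpha, T)  # Figure 1b: fractions 0.5, 1/3, 1/6 and T = 0.5
```

A network whose links slow down and then speed up again, for example $z = (10, 1, 0.1, 0.1)$, violates (4). This shows why ordering (11) is needed for the closed form to describe a valid schedule.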
3.2. Asymptotic Analysis
We use the above closed-form solution to obtain the ultimate performance limits of the network with respect to the number of processors. For this, we consider the situation in which all the processors are identical and all the links are identical; in other words, $w_i = w$ for all $i = 0, 1, \ldots, m$, and $z_j = z$ for all $j = 1, 2, \ldots, m$. In this case, the closed-form solution (10) can be written as
$$T(m) = \frac{3\sigma f_0 f_1 f^{m-2}\, w T_{cp}}{3\sigma (f_0 + 1) f_1 f^{m-2} + f^{m-1} - 1}, \quad m \ge 2, \tag{12}$$
where
$$\sigma = \frac{z\delta}{w}, \quad f_0 = 1 + \sigma, \quad f_1 = 1 + 2\sigma, \quad f = 1 + 3\sigma. \tag{13}$$
Hence,
$$T(\infty) = \lim_{m \to \infty} T(m) = \frac{a\, w T_{cp}}{a + b}, \tag{14}$$
where
$$a = 3\sigma f_0 f_1, \tag{15}$$
$$b = 6\sigma^2 + 6\sigma + 1. \tag{16}$$
A comparison with the earlier strategy proposed in [15] reveals the following interesting issues. From [17], we obtain $T'(\infty)$ for the earlier strategy as
$$T'(\infty) = \frac{2\, w T_{cp}}{1 + \sqrt{1 + 4/\sigma}}. \tag{17}$$
Comparing (14) and (17), it can easily be proved that $T(\infty) \le T'(\infty)$ for all $\sigma$. In Figure 4, we plot the processing time against the number of processors for $\sigma = 0.1$ and $\sigma = 1$. Taking $w T_{cp} = 1$, these performance curves are bounded by an upper limit of $T(m) = 1$ (as $\sigma \to \infty$) and a lower limit of $T(m) = 1/(m + 1)$ (as $\sigma \to 0$). These limits are the same for both strategies. When there are only two processors in the network, both strategies give identical time performance. As the number of processors increases, the new strategy performs increasingly better than the earlier one. As observed in the earlier studies [15,17], here too the performance curve saturates after a few processors.
In Figure 5, we plot $T(\infty)$ vs. $\log \sigma$. The plot shows that the new strategy always outperforms the earlier strategy, except in the limits $\sigma = 0$ and $\sigma = \infty$, where the two perform equally well.
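The comparison between (14) and (17) is easy to explore numerically. The following sketch is ours, with hypothetical function names; all times are normalized by $wT_{cp}$. It codes (12), (14)-(16) and (17).

```python
from math import sqrt


def T_new(m, sigma):
    """(12), normalized by w*Tcp; valid for m >= 2."""
    f0, f1, f = 1 + sigma, 1 + 2 * sigma, 1 + 3 * sigma
    num = 3 * sigma * f0 * f1 * f ** (m - 2)
    return num / (3 * sigma * (f0 + 1) * f1 * f ** (m - 2) + f ** (m - 1) - 1)


def T_new_inf(sigma):
    """(14)-(16): limit of the new strategy as m -> infinity."""
    f0, f1 = 1 + sigma, 1 + 2 * sigma
    a = 3 * sigma * f0 * f1
    b = 6 * sigma ** 2 + 6 * sigma + 1
    return a / (a + b)


def T_old_inf(sigma):
    """(17): limit of the earlier strategy of [15]."""
    return 2 / (1 + sqrt(1 + 4 / sigma))


if __name__ == "__main__":
    for sigma in (0.01, 0.1, 1.0, 10.0):
        print(f"sigma={sigma:5}: new {T_new_inf(sigma):.4f}  earlier {T_old_inf(sigma):.4f}")
```

For example, at $\sigma = 1$ the new strategy's limit is $18/31 \approx 0.581$, against $2/(1 + \sqrt{5}) \approx 0.618$ for the earlier one.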
Figure 4. $T(m)$ vs. $m$ (new strategy: solid lines; previous strategy: dashed lines), for $\sigma = 0$, $0.1$, $1$, and $\infty$, with $m$ ranging from 0 to 16.

Figure 5. $T(\infty)$ vs. $\log \sigma$ (new strategy: solid line; earlier strategy: dashed line), with $\log \sigma$ ranging from $-6$ to $5$.
4. LOAD ORIGINATION AT THE INTERIOR OF THE NETWORK
In this section, using the rules stated in Section 2.2, we obtain the timing diagram for the load distribution, the recursive equations, and their closed-form solution for a system of $(R + L + 1)$ processors (shown in Figure 2b). Section 4.1 gives the closed-form solution for the case when the root processor distributes the load first to the right-hand side processors and then to the left-hand side processors. Section 4.2 presents the ultimate performance limits for this sequence when all the processors are identical and all the links are identical. Finally, in Sections 4.3 and 4.4, we state and prove two important results regarding optimal sequencing and the optimal load origination point in such a network.
4.1. Closed-Form Solutions 
The timing diagram for the configuration of Figure 2b is shown in Figure 6. The load is assumed to be distributed first to the right-hand side and then to the left-hand side. Figure 6 shows the load distribution for only two processors on each side of the root processor. The load distribution for the remaining processors follows the same pattern as in Figure 3 for the boundary case. From Figure 6, the following recursive equations can be obtained. The load fractions for the left-hand side are denoted as $\alpha^l_1, \alpha^l_2, \ldots, \alpha^l_L$ for processors $P^l_1, P^l_2, \ldots, P^l_L$. Those for the right-hand side are denoted as $\alpha^r_1, \alpha^r_2, \ldots, \alpha^r_R$ for processors $P^r_1, P^r_2, \ldots, P^r_R$. The fraction of the total load that the root processor ($P_0$) keeps is denoted by $\alpha_0$. The recursive equations for the right-hand side and left-hand side are obtained as
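$$\alpha^r_i w^r_i T_{cp} = \alpha^r_{i+1}\left(w^r_{i+1} T_{cp} + z^r_{i+1} T_{cm} + z^r_i T_{cm} + z^r_{i-1} T_{cm}\right), \quad i = 0, 1, \ldots, R-1; \tag{18}$$
$$\alpha^l_i w^l_i T_{cp} = \alpha^l_{i+1}\left(w^l_{i+1} T_{cp} + z^l_{i+1} T_{cm} + z^l_i T_{cm} + z^l_{i-1} T_{cm}\right), \quad i = 1, 2, \ldots, L-1; \tag{19}$$
$$\alpha_0 w_0 T_{cp} = \alpha^l_1\left(w^l_1 T_{cp} + z^l_1 T_{cm}\right) + \sum_{i=1}^{R} \alpha^r_i z^r_1 T_{cm} + \sum_{i=2}^{R-1} \alpha^r_i z^r_2 T_{cm}; \tag{20}$$
where $z^r_0 = z^r_{-1} = z^l_0 = z^l_{-1} = 0$, $\alpha^r_0 = \alpha_0$, and $w^r_0 = w_0$. The normalizing equation is given by
$$\alpha_0 + \sum_{i=1}^{R} \alpha^r_i + \sum_{i=1}^{L} \alpha^l_i = 1. \tag{21}$$

Figure 6. Timing diagram: Interior case (two processors shown on each side of the root processor $P_0$, whose computation interval is $\alpha_0 w_0 T_{cp}$).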
As in the boundary case, the following constraints have to be satisfied:
$$\alpha^l_i z^l_j \le \alpha^l_{i-1} z^l_{j+2}, \quad i = 4, 5, \ldots, L, \quad j = 1, 2, \ldots, i-3, \tag{22}$$
$$\alpha^r_i z^r_j \le \alpha^r_{i-1} z^r_{j+2}, \quad i = 4, 5, \ldots, R, \quad j = 1, 2, \ldots, i-3. \tag{23}$$
Following the same arguments as in the previous section, we solve the set of equations (18)-(21), which gives $(R + L + 1)$ equations in $(R + L + 1)$ unknowns, to yield a unique solution. Each load fraction $\alpha^r_i$, $i = 0, 1, \ldots, R-1$, can be expressed in terms of $\alpha^r_R$. Hence, the load fraction $\alpha^r_i$ assigned to the $i$th processor $P^r_i$ is given by
$$\alpha^r_i = \alpha^r_R\, \mu^r(i, R-1), \quad i = 0, 1, \ldots, R-1, \tag{24}$$
where
$$\mu^r(x, y) = \prod_{j=x}^{y} f^r_j, \tag{25}$$
and
$$f^r_j = \frac{1}{w^r_j}\left(w^r_{j+1} + \delta \sum_{q=j-1}^{j+1} z^r_q\right). \tag{26}$$
Similarly, for the left-hand side, the load fraction for the $i$th processor $P^l_i$ is given by
$$\alpha^l_i = \alpha^l_L\, \mu^l(i, L-1), \quad i = 1, 2, \ldots, L-1, \tag{27}$$
where $\mu^l(i, L-1)$ and $f^l_j$ are defined as in (25) and (26), with the superscript $r$ replaced by $l$. Furthermore, from (20), $\alpha_0$ is also given by
$$\alpha_0 = \alpha^l_L\, \mu^l(0, L-1) + \alpha^r_R\left[\left\{\mu^r(1, R-1) + 1\right\}\frac{z^r_1 \delta}{w_0} + \theta^r(2, R-1)\,\frac{(z^r_1 + z^r_2)\delta}{w_0}\right], \tag{28}$$
where
$$\theta^r(x, y) = \sum_{i=x}^{y} \prod_{j=i}^{y} f^r_j. \tag{29}$$
Equating (24) with $i = 0$ and (28), we obtain a relationship between $\alpha^r_R$ and $\alpha^l_L$:
$$\frac{\alpha^r_R}{\alpha^l_L} = \frac{w_0\, \mu^l(0, L-1)}{\mu^r(1, R-1)\, w^r_1 - \theta^r(2, R-1)(z^r_1 + z^r_2)\delta - z^r_1 \delta}. \tag{30}$$
Using (30) in the normalizing equation (21), $\alpha^l_j$ is expressed in terms of $\alpha^r_R$ for all $j = 1, 2, \ldots, L$. From this equation, the value of $\alpha^r_R$ is obtained as
$$\alpha^r_R = \frac{1}{\Psi}, \tag{31}$$
where
$$\Psi = 1 + \theta^r(2, R-1) + (1 + f^r_0)\, \mu^r(1, R-1) + \frac{G^r}{G^l} K, \tag{32}$$
$$K = 1 + \theta^l(2, L-1) + \mu^l(1, L-1), \tag{33}$$
$$G^r = f^r_1 \frac{w^r_1}{w_0}\, \mu^r(2, R-1) - \frac{(z^r_1 + z^r_2)\delta}{w_0}\, \theta^r(2, R-1) - \frac{z^r_1 \delta}{w_0}, \tag{34}$$
$$G^l = \mu^l(0, L-1). \tag{35}$$
Let us denote the processing time for the entire load as $T(x, y)$, where the arguments $(x, y)$ form an ordered pair: the load is first sent to the right-hand side with $x$ processors and then to the left-hand side with $y$ processors. From the timing diagram in Figure 6, the processing time for the entire load is given by
$$T(R, L) = \alpha_0 w_0 T_{cp}, \tag{36}$$
where $\alpha_0$ is given by
$$\alpha_0 = \frac{\mu^r(0, R-1)}{\Psi}. \tag{37}$$
Thus, the processing time is given by
$$T(R, L) = \frac{\mu^r(0, R-1)\, w_0 T_{cp}}{\Psi}. \tag{38}$$
From (26) and (27), it can be seen that when $w^l_{i+1} \ge w^l_i$, we have $f^l_i \ge 1$, which implies that $\alpha^l_{i+1} \le \alpha^l_i$ for all $i$. Now if $z^l_{i+1} \ge z^l_i$ for all $i$, then the inequalities in (22) are automatically satisfied. Similar arguments hold for the right-hand side. Hence, as in the boundary case, we assume that the links and processors on both sides are arranged in decreasing order of speed. Thus,
$$w^l_k \le w^l_{k+1}, \quad z^l_k \le z^l_{k+1}; \quad w^r_k \le w^r_{k+1}, \quad z^r_k \le z^r_{k+1}, \quad \text{for all } k. \tag{39}$$
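The closed-form expression (38) is valid only when $R > 2$ and $L > 2$. By following a similar procedure, we can obtain the closed-form expressions when $R \le 2$ or $L \le 2$. These are given below:
(i)
$$T(1, 1) = \frac{f^r_0 f^l_0\, w_0 T_{cp}}{f^l_0 (1 + f^r_0) + (w^r_1/w_0)}; \tag{40}$$
(ii)
$$T(1, 2) = \frac{f^r_0 f^l_0 f^l_1\, w_0 T_{cp}}{f^l_0 f^l_1 (1 + f^r_0) + (1 + f^l_1)\left\{f^r_0 - z^r_1 \delta/w_0\right\}}; \tag{41}$$
(iii)
$$T(2, 1) = \frac{f^r_0 f^r_1\, w_0 T_{cp}}{1 + f^r_1 (1 + f^r_0) + \left\{f^r_0 f^r_1 - (1 + f^r_1)\, z^r_1 \delta/w_0\right\}/f^l_0}; \tag{42}$$
(iv)
$$T(2, 2) = \frac{f^r_0 f^r_1\, w_0 T_{cp}}{1 + f^r_1 (1 + f^r_0) + (f^r_1 w^r_1 - z^r_1 \delta)(1 + f^l_1)/(f^l_0 f^l_1 w_0)}; \tag{43}$$
(v) for $x > 2$,
$$T(x, 1) = \frac{\mu^r(0, x-1)\, w_0 T_{cp}}{\Psi_1}, \tag{44}$$
where
$$\Psi_1 = 1 + (1 + f^r_0)\, \mu^r(1, x-1) + \theta^r(2, x-1) + \frac{G^r_1}{G^l_1}, \tag{45}$$
$$G^r_1 = -\frac{1}{w_0}\left[z^r_1 \delta \left\{1 + \theta^r(2, x-1) + \mu^r(1, x-1)\right\} + z^r_2 \delta\, \theta^r(2, x-1) - w_0 f^r_0 f^r_1\, \mu^r(2, x-1)\right], \tag{46}$$
$$G^l_1 = f^l_0; \tag{47}$$
(vi) for $x > 2$,
$$T(1, x) = \frac{f^r_0\, w_0 T_{cp}}{\Psi_2}, \tag{48}$$
where
$$\Psi_2 = 1 + f^r_0 + \frac{G^r_2}{G^l_2}\left\{1 + \theta^l(2, x-1) + \mu^l(1, x-1)\right\}, \tag{49}$$
$$G^r_2 = \frac{w^r_1}{w_0}, \tag{50}$$
$$G^l_2 = \mu^l(0, x-1); \tag{51}$$
(vii) for $x > 2$,
$$T(2, x) = \frac{f^r_0 f^r_1\, w_0 T_{cp}}{\Psi_3}, \tag{52}$$
where
$$\Psi_3 = 1 + f^r_1 (1 + f^r_0) + \frac{G^r_3}{G^l_3}\left\{1 + \theta^l(2, x-1) + \mu^l(1, x-1)\right\}, \tag{53}$$
$$G^r_3 = f^r_0 f^r_1 - \frac{z^r_1 \delta}{w_0}(1 + f^r_1), \tag{54}$$
$$G^l_3 = \mu^l(0, x-1); \tag{55}$$
(viii) for $x > 2$,
$$T(x, 2) = \frac{\mu^r(0, x-1)\, w_0 T_{cp}}{\Psi_4}, \tag{56}$$
where
$$\Psi_4 = 1 + \theta^r(2, x-1) + (1 + f^r_0)\, \mu^r(1, x-1) + \frac{G^r_4}{G^l_4}(1 + f^l_1), \tag{57}$$
$$G^r_4 = -\frac{1}{w_0}\left[z^r_1 \delta \left\{1 + \theta^r(2, x-1) + \mu^r(1, x-1)\right\} + z^r_2 \delta\, \theta^r(2, x-1) - w_0\, \mu^r(0, x-1)\right], \tag{58}$$
$$G^l_4 = f^l_0 f^l_1. \tag{59}$$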
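Instead of coding the closed forms (24)-(59) case by case, one can also solve (18)-(21) numerically by back-substitution, which gives an independent check on the expressions above. The sketch below is ours; the function name and the 1-indexed list layout (placeholder at index 0) are assumptions. The steps are:
1. Fix $\alpha^r_R = 1$ and build the right-hand fractions with (18).
2. Obtain $\alpha^l_1$ from (20).
3. Build the left-hand fractions with (19).
4. Normalize with (21).

```python
def interior_time(w0, wr, zr, wl, zl, t_cp, t_cm):
    """Processing time T(R, L) from (18)-(21) by back-substitution.

    wr[1..R], zr[1..R], wl[1..L], zl[1..L] hold the inverse speeds of the
    right/left processors and links (index 0 is a placeholder). R, L >= 1.
    """
    R, L = len(wr) - 1, len(wl) - 1
    delta = t_cm / t_cp

    def zR(q):  # z^r_0 = z^r_{-1} = 0
        return zr[q] if q >= 1 else 0.0

    def zL(q):  # z^l_0 = z^l_{-1} = 0
        return zl[q] if q >= 1 else 0.0

    w_r = [w0] + list(wr[1:])  # w^r_0 = w_0

    # (18): alpha^r_i = f^r_i * alpha^r_{i+1}, starting from alpha^r_R = 1
    ar = [0.0] * (R + 1)
    ar[R] = 1.0
    for i in range(R - 1, -1, -1):
        f = (w_r[i + 1] + delta * (zR(i - 1) + zR(i) + zR(i + 1))) / w_r[i]
        ar[i] = f * ar[i + 1]
    alpha0 = ar[0]

    # Right-hand part of (20), divided by Tcp: the root front-end's busy time
    busy = delta * (zr[1] * sum(ar[1:]) + (zr[2] * sum(ar[2:R]) if R >= 3 else 0.0))

    # (20) solved for alpha^l_1, then (19) forward for alpha^l_2 .. alpha^l_L
    al = [0.0] * (L + 1)
    al[1] = (alpha0 * w0 - busy) / (wl[1] + zl[1] * delta)
    if al[1] <= 0:
        raise ValueError("no feasible solution: alpha^l_1 <= 0")
    for i in range(1, L):
        f = (wl[i + 1] + delta * (zL(i - 1) + zL(i) + zL(i + 1))) / wl[i]
        al[i + 1] = al[i] / f

    total = alpha0 + sum(ar[1:]) + sum(al[1:])  # normalization (21)
    return alpha0 / total * w0 * t_cp           # (36)


if __name__ == "__main__":
    ones = [0, 1.0, 1.0]
    print(interior_time(1.0, ones, ones, ones, ones, 1.0, 1.0))  # T(2, 2) with sigma = 1
```

For identical units with $\sigma = 1$, this gives $T(2, 2) = 18/34$, which agrees with (43).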
4.2. Asymptotic Analysis
We use the above closed-form solutions to obtain the ultimate performance limits of the system with respect to the number of processors. For this, we consider the situation in which all the processors are identical and all the links are identical. In other words, we have $w^l_i = w^r_i = w$ for all $i$, and $z^l_j = z^r_j = z$ for all $j$. In this case, the closed-form expressions given above reduce to the following (with $R > 2$ and $L > 2$):
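$$T(1, L) = \frac{3\sigma f_0^2 f_1 f^{L-2}\, w T_{cp}}{f^{L-2}\left\{3\sigma f_0 f_1 (1 + f_0) + 3\sigma f_1 + f\right\} - 1}, \tag{60}$$
$$T(R, 1) = \frac{3\sigma f_0^2 f_1 f^{R-2}\, w T_{cp}}{f^{R-2}\left\{3\sigma f_0 f_1 (2 + f_0) + f (f_0 - 2\sigma) - 3\sigma^2 f_1\right\} + (\sigma f - 1)}, \tag{61}$$
$$T(2, L) = \frac{3\sigma f_0 f_1^2 f^{L-2}\, w T_{cp}}{f^{L-2}\left\{3\sigma (1 + f_1)(1 + f_0 f_1) + 3\sigma f_0 f_1 + 1\right\} - 1}, \tag{62}$$
$$T(R, 2) = \frac{3\sigma f_0 f_1^2 f^{R-2}\, w T_{cp}}{f^{R-2}\left\{3\sigma f_1^2 (1 + f_0) + f_1 f + 2\sigma\right\} + (2\sigma f - 1)}, \tag{63}$$
$$T(R, L) = \frac{9\sigma^2 f_0^2 f_1^2 f^{R+L-4}\, w T_{cp}}{f^{R+L-4} K_1 - \sigma f^{R-2} + f^{L-2} K_2 + (3\sigma^2 - 2\sigma f)}, \tag{64}$$
where
$$K_1 = 3\sigma f_0 f_1 f + 9\sigma^2 f_0 f_1^2 (1 + f_0) + \sigma (f + 3\sigma f_1), \tag{65}$$
$$K_2 = (2\sigma f - 3\sigma^2)(f + 3\sigma f_1) - 3\sigma f_0 f_1, \tag{66}$$
and $f_0$, $f_1$, and $f$ are defined in (13). Now, by letting $R$ and $L$ tend to infinity individually and jointly in (60)-(64), we obtain the following expressions: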
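$$T(1, \infty) = \lim_{L \to \infty} T(1, L) = \frac{A\, w T_{cp}}{B}, \tag{67}$$
$$T(\infty, 1) = \lim_{R \to \infty} T(R, 1) = \frac{A\, w T_{cp}}{C}, \tag{68}$$
$$T(2, \infty) = \lim_{L \to \infty} T(2, L) = \frac{A f_1\, w T_{cp}}{D f_0}, \tag{69}$$
$$T(\infty, 2) = \lim_{R \to \infty} T(R, 2) = \frac{A f_1\, w T_{cp}}{E}, \tag{70}$$
$$T(\infty, L) = \lim_{R \to \infty} T(R, L) = \frac{3\sigma f_1 A f^{L-2}\, w T_{cp}}{K_1 f^{L-2} - \sigma}, \quad L > 2, \tag{71}$$
$$T(R, \infty) = \lim_{L \to \infty} T(R, L) = \frac{3\sigma f_1 A f^{R-2}\, w T_{cp}}{K_1 f^{R-2} + K_2}, \quad R > 2, \tag{72}$$
$$T(\infty, \infty) = \lim_{R \to \infty,\ L \to \infty} T(R, L) = \frac{3\sigma f_1 A\, w T_{cp}}{K_1}, \tag{73}$$
where
$$A = 3\sigma f_0^2 f_1, \tag{74}$$
$$B = 3\sigma f_0 f_1 (1 + f_0) + 3\sigma f_1 + f, \tag{75}$$
$$C = 3\sigma f_0 f_1 (2 + f_0) + f (f_0 - 2\sigma) - 3\sigma^2 f_1, \tag{76}$$
$$D = 3\sigma (1 + f_1)(1 + f_0 f_1) + 3\sigma f_0 f_1 + 1, \tag{77}$$
$$E = 3\sigma f_0 f_1^2 (1 + f_0) + f_0 f_1 f + 2\sigma f_0. \tag{78}$$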
Note that $T(\infty, L)$ and $T(R, \infty)$ are not equal even when $R = L$. This indicates that the sequence of load distribution affects the processing time; we analyze this further in the next section.
In [18], the value of $T(\infty, \infty)$ for the earlier strategy was evaluated. Comparing it with the present value, we note a considerable improvement in the processing time. This can be seen by substituting different values of $\sigma$ into both expressions.
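The identical-case formulas are easy to evaluate. The sketch below is ours; the function names are hypothetical, and all times are normalized by $wT_{cp}$. It codes (64)-(66) and the limits (71)-(73), and it can be used to confirm that $T(\infty, L) \ne T(R, \infty)$ for $R = L$.

```python
def constants(sigma):
    """f0, f1, f of (13), K1 and K2 of (65)-(66), and A of (74)."""
    f0, f1, f = 1 + sigma, 1 + 2 * sigma, 1 + 3 * sigma
    K1 = (3 * sigma * f0 * f1 * f + 9 * sigma**2 * f0 * f1**2 * (1 + f0)
          + sigma * (f + 3 * sigma * f1))
    K2 = (2 * sigma * f - 3 * sigma**2) * (f + 3 * sigma * f1) - 3 * sigma * f0 * f1
    A = 3 * sigma * f0**2 * f1
    return f0, f1, f, K1, K2, A


def T_RL(R, L, sigma):
    """(64): T(R, L) for identical processors and links."""
    f0, f1, f, K1, K2, _ = constants(sigma)
    num = 9 * sigma**2 * f0**2 * f1**2 * f ** (R + L - 4)
    den = (f ** (R + L - 4) * K1 - sigma * f ** (R - 2) + f ** (L - 2) * K2
           + (3 * sigma**2 - 2 * sigma * f))
    return num / den


def T_inf_L(L, sigma):
    """(71): R -> infinity."""
    _, f1, f, K1, _, A = constants(sigma)
    return 3 * sigma * f1 * A * f ** (L - 2) / (K1 * f ** (L - 2) - sigma)


def T_R_inf(R, sigma):
    """(72): L -> infinity."""
    _, f1, f, K1, K2, A = constants(sigma)
    return 3 * sigma * f1 * A * f ** (R - 2) / (K1 * f ** (R - 2) + K2)


def T_inf_inf(sigma):
    """(73): R, L -> infinity."""
    _, f1, _, K1, _, A = constants(sigma)
    return 3 * sigma * f1 * A / K1


if __name__ == "__main__":
    print(T_inf_L(4, 1.0), T_R_inf(4, 1.0), T_inf_inf(1.0))
```

At $\sigma = 1$ and $L = R = 4$, the first value exceeds the second. This is consistent with the sequencing result proved in the next section.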
4.3. Optimal Load Sequence 
In this section, we state and prove an important result regarding the sequence of load distribution for the interior case.
THEOREM 1. In the interior case with identical processors and identical links, with $x$ processors on the right and $y$ processors on the left, the processing time is a minimum if the root processor first distributes the load to the side with fewer processors.
PROOF. We have to show that:
(i) $T(1, x) < T(x, 1)$, for $x > 2$,
(ii) $T(2, x) < T(x, 2)$, for $x > 2$,
(iii) $T(x, y) < T(y, x)$, for $x > 2$, $y > 2$, if $x < y$.
Cases (i) and (ii) can be easily proved using (60)-(63). To prove case (iii), consider (64). The expression $T(x, y)$ is obtained from (64) by putting $R = x$ and $L = y$, and $T(y, x)$ is obtained by interchanging $x$ and $y$ in the resulting expression:
$$T(y, x) = \frac{9\sigma^2 f_0^2 f_1^2 f^{x+y-4}\, w T_{cp}}{f^{x+y-4} K_1 - \sigma f^{y-2} + f^{x-2} K_2 + (3\sigma^2 - 2\sigma f)}, \tag{79}$$
where $K_1$ and $K_2$ are given by (65) and (66), respectively. Note that the numerators of (64) (with $R = x$, $L = y$) and (79) are identical. Denoting the denominators of (64) and (79) as $D_1$ and $D_2$, respectively, the value of $(D_1 - D_2)$ is obtained as
$$D_1 - D_2 = \sigma^2 \left(f^{y-2} - f^{x-2}\right)\left(18\sigma^2 + 24\sigma + 6\right). \tag{80}$$
Since $\sigma > 0$ and $f > 1$, (80) shows that $(D_1 - D_2) > 0$ when $x < y$. This proves that $T(x, y) < T(y, x)$ if $x < y$.
Further, from (41) and (42), it can be verified that $T(1, 2)$ and $T(2, 1)$ have the same closed-form expression in the identical case. This completes the proof of the theorem. ∎
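The identity (80) can be checked numerically. The following sketch is ours, and the function names are hypothetical. It evaluates the two denominators of (64) and (79) directly and compares their difference with the right-hand side of (80).

```python
def denominators(x, y, sigma):
    """D1 (denominator of T(x, y) from (64)) and D2 (denominator of T(y, x), (79))."""
    f0, f1, f = 1 + sigma, 1 + 2 * sigma, 1 + 3 * sigma
    K1 = (3 * sigma * f0 * f1 * f + 9 * sigma**2 * f0 * f1**2 * (1 + f0)
          + sigma * (f + 3 * sigma * f1))
    K2 = (2 * sigma * f - 3 * sigma**2) * (f + 3 * sigma * f1) - 3 * sigma * f0 * f1
    c = 3 * sigma**2 - 2 * sigma * f
    D1 = f ** (x + y - 4) * K1 - sigma * f ** (x - 2) + f ** (y - 2) * K2 + c
    D2 = f ** (x + y - 4) * K1 - sigma * f ** (y - 2) + f ** (x - 2) * K2 + c
    return D1, D2


def rhs_80(x, y, sigma):
    """Right-hand side of (80)."""
    f = 1 + 3 * sigma
    return sigma**2 * (f ** (y - 2) - f ** (x - 2)) * (18 * sigma**2 + 24 * sigma + 6)


if __name__ == "__main__":
    for sigma in (0.1, 1.0, 5.0):
        D1, D2 = denominators(3, 6, sigma)
        print(sigma, D1 - D2, rhs_80(3, 6, sigma))
```

Because the numerators are equal, $D_1 > D_2$ is the same statement as $T(x, y) < T(y, x)$.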
In earlier studies on linear networks [15,16], the proposed load distribution strategy had the property that the processing time was independent of the sequence of load distribution [16]. With the present strategy, this is no longer true.
4.4. Optimal Load Origination
In this section, we state and prove an important theorem regarding the optimal load origination point in a linear network. We assume that all the processors are identical and all the links are identical. For notational simplicity, we redenote the processors as $p_0, p_1, p_2, \ldots, p_m$ from the right-hand side. We also assume that, when the load originates in the interior of the network, it is first distributed to the right-hand side processors and then to the left-hand side processors. Hence, when the load originates at the $i$th processor, there are $i$ processors on the right-hand side and $m - i$ processors on the left-hand side, and the processing time is denoted by $T(i, m - i)$. The problem is to find $i = i^*$, the optimal load origination point, such that $T(i^*, m - i^*)$ is a minimum, i.e.,
$$i^* = \arg\min_{i \in \{0, 1, \ldots, m\}} T(i, m - i). \tag{81}$$
To find this $i^*$, we use the following preliminary results.
LEMMA 1. For $m = 2$, $T(1, 1) < T(0, 2)$.
PROOF. The expressions for $T(1, 1)$ and $T(0, 2)$ can be derived from (40) and (10) as
$$T(1, 1) = \frac{f_0^2\, w T_{cp}}{1 + f_0 (1 + f_0)}, \tag{82}$$
$$T(0, 2) = \frac{f_0 f_1\, w T_{cp}}{1 + f_1 (1 + f_0)}. \tag{83}$$
Using the above expressions, the lemma is proved easily. ∎
LEMMA 2. For $m = 3$, $T(1, 2) < T(0, 3)$.
PROOF. The expressions for $T(1, 2)$ and $T(0, 3)$ can be obtained from (41) and (12) as
$$T(1, 2) = \frac{f_0 f_1\, w T_{cp}}{f_1 (1 + f_0) + 2}, \tag{84}$$
$$T(0, 3) = \frac{f_0 f_1 f\, w T_{cp}}{f f_1 (1 + f_0) + f + 1}. \tag{85}$$
Using (84) and (85), the lemma is proved.
LEMMA 3. For $m = 4$,
(i) $T(1, 3) < T(0, 4)$,
(ii) $T(1, 3) > T(2, 2)$.
PROOF.
(i) The expressions for $T(1, 3)$ and $T(0, 4)$ can be obtained from (60) and (12):
$$T(0, 4) = \frac{3\sigma f_0 f_1 f^2\, w T_{cp}}{f^2\left\{3\sigma f_1 + 3\sigma f_0 f_1 + f\right\} - 1}, \tag{86}$$
$$T(1, 3) = \frac{3\sigma f_0^2 f_1 f\, w T_{cp}}{f\left\{3\sigma f_0 f_1 (1 + f_0) + 3\sigma f_1 + f\right\} - 1}. \tag{87}$$
From (86) and (87), we can prove that $T(1, 3) < T(0, 4)$.
(ii) The expression for $T(2, 2)$ can be obtained from (43) as
$$T(2, 2) = \frac{f_0 f_1^2\, w T_{cp}}{f_1 (1 + f_0 f_1 + f_1) + (1 + f_1)}. \tag{88}$$
Similarly, we can show that $T(1, 3) > T(2, 2)$. ∎
LEMMA 4. For $m = 5$,
(i) $T(1, 4) < T(0, 5)$,
(ii) $T(2, 3) < T(1, 4)$.
PROOF.
(i) The expressions for $T(1, 4)$ and $T(0, 5)$ are obtained from (60) and (12) as
$$T(1, 4) = \frac{3\sigma f_0^2 f_1 f^2\, w T_{cp}}{f^2\left\{3\sigma f_0 f_1 + 3\sigma f_0^2 f_1 + 3\sigma f_1 + f\right\} - 1}, \tag{89}$$
$$T(0, 5) = \frac{3\sigma f_0 f_1 f^3\, w T_{cp}}{f^3\left\{3\sigma f_1 + 3\sigma f_0 f_1 + f\right\} - 1}. \tag{90}$$
With suitable algebraic manipulations, it can easily be proved that $T(1, 4) < T(0, 5)$.
(ii) The expression for $T(2, 3)$ is obtained from (62) as
$$T(2, 3) = \frac{3\sigma f_0 f_1^2 f\, w T_{cp}}{f\left\{3\sigma (1 + f_1)(1 + f_0 f_1) + 3\sigma f_0 f_1 + 1\right\} - 1}. \tag{91}$$
Similarly, it can be proved that $T(2, 3) < T(1, 4)$.
LEMMA 5. For $m > 5$,
(i) $T(1, m-1) < T(0, m)$;
(ii) $T(2, m-2) < T(1, m-1)$;
(iii) for $i \in \{2, 3, \ldots, \lfloor m/2 \rfloor\}$:
$T(i+1, m-i-1) \le T(i, m-i)$ if $\{f^{m-2i-1} g(\sigma) + 1\} < 0$,
$T(i+1, m-i-1) \ge T(i, m-i)$ if $\{f^{m-2i-1} g(\sigma) + 1\} > 0$,
where $g(\sigma) = 18\sigma^3 + 24\sigma^2 + 6\sigma - 1$.
PROOF. 
(i) The expressions for T(1, m - 1) and T(0, m) can be obtained from (60) and (12) as
$$T(1, m-1) = \frac{3\sigma f_0^2 f_1 f^{m-3}\, w T_{cp}}{f^{m-3}\{3\sigma f_0 f_1 + 3\sigma f_0^2 f_1 + 3\sigma f_1 + f\} - 1}, \qquad (89)$$
$$T(0, m) = \frac{3\sigma f_0 f_1 f^{m-2}\, w T_{cp}}{f^{m-2}\{3\sigma f_1 + 3\sigma f_0 f_1 + f\} - 1}. \qquad (90)$$
Appropriate algebraic manipulations can be done to prove that T(1, m - 1) < T(0, m).
(ii) The expressions for T(1, m - 1) and T(2, m - 2) can be obtained from (60) and (62), respectively, as
$$T(1, m-1) = \frac{3\sigma f_0^2 f_1 f^{m-3}\, w T_{cp}}{f^{m-3}\{3\sigma f_0 f_1 + 3\sigma f_0^2 f_1 + 3\sigma f_1 + f\} - 1}, \qquad (94)$$
$$T(2, m-2) = \frac{3\sigma f_0 f_1^2 f^{m-4}\, w T_{cp}}{f^{m-4}\{3\sigma (1+f_1)(1+f_0 f_1) + 3\sigma f_0 f_1 + 1\} - 1}. \qquad (95)$$
Similarly, from (94) and (95), it can be shown that T(2, m - 2) < T(1, m - 1).
(iii) First, we shall consider the case when i = 2. The expression for T(3, m - 3) can be obtained from (64) as
$$T(3, m-3) = \frac{9\sigma^2 f_0^2 f_1^2 f^{m-4}\, w T_{cp}}{f^{m-4} K_1 - \sigma f + f^{m-5} K_2 + (3\sigma^2 - 2\sigma f)}, \qquad (96)$$
where K_1 and K_2 are as defined earlier in (65) and (66). The expression for T(2, m - 2) is given in (95). Multiplying the numerator and denominator of (95) by 3σf_0 makes the numerators of (95) and (96) identical. Denoting the new denominators as D_2 and D_3, respectively, the value of D_2 - D_3 is obtained as
$$D_2 - D_3 = 3\sigma^2 \{f^{m-5} g(\sigma) + 1\}, \qquad (97)$$
where
$$g(\sigma) = 18\sigma^3 + 24\sigma^2 + 6\sigma - 1. \qquad (98)$$
Since the numerators are equal, the expression with the larger denominator gives the smaller processing time. Thus, from (97), (D_2 - D_3) < 0 if $\{f^{m-5} g(\sigma) + 1\} < 0$ and, hence, T(3, m - 3) < T(2, m - 2). Similarly, T(3, m - 3) > T(2, m - 2) if $\{f^{m-5} g(\sigma) + 1\} > 0$.
Now consider the general case i > 2. The expressions for T(i, m - i) and T(i + 1, m - i - 1) can be obtained from (64) as
$$T(i, m-i) = \frac{9\sigma^2 f_0^2 f_1^2 f^{m-4}\, w T_{cp}}{f^{m-4} K_1 - \sigma f^{i-2} + f^{m-2-i} K_2 + (3\sigma^2 - 2\sigma f)}, \qquad (99)$$
$$T(i+1, m-i-1) = \frac{9\sigma^2 f_0^2 f_1^2 f^{m-4}\, w T_{cp}}{f^{m-4} K_1 - \sigma f^{i-1} + f^{m-i-3} K_2 + (3\sigma^2 - 2\sigma f)}. \qquad (100)$$
Since the numerators of (99) and (100) are identical, we find the difference in their denominators. Denoting the respective denominators as D_i and D_{i+1}, the value of D_i - D_{i+1} is obtained as
$$D_i - D_{i+1} = \sigma f^{i-3} \{g(\sigma) f^{m-2i-1} + 1\}, \qquad (101)$$
where g(σ) is given by (98). Thus, from (101), it can be seen that (D_i - D_{i+1}) < 0 if $\{f^{m-2i-1} g(\sigma) + 1\} < 0$, which means that T(i + 1, m - i - 1) < T(i, m - i). Similarly, T(i + 1, m - i - 1) > T(i, m - i) if $\{f^{m-2i-1} g(\sigma) + 1\} > 0$. This proves the lemma. ∎
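As a numerical aside to Lemma 5: since g(0) = -1 and g'(σ) = 54σ² + 48σ + 6 > 0 for σ ≥ 0, g has exactly one positive root σ₀, and for σ ≥ σ₀ every bracket {f^{m-2i-1} g(σ) + 1} in Lemma 5(iii) is positive, so T(i + 1, m - i - 1) ≥ T(i, m - i) for all i ∈ {2, ..., ⌊m/2⌋}. The Python sketch below locates σ₀ by bisection; the helper names `g` and `positive_root_of_g` are our own and are not taken from the paper.

```python
def g(sigma: float) -> float:
    """g(sigma) = 18 sigma^3 + 24 sigma^2 + 6 sigma - 1, as defined in Lemma 5."""
    return 18 * sigma**3 + 24 * sigma**2 + 6 * sigma - 1


def positive_root_of_g(lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Bisection for the unique positive root of g.

    g(0) = -1 < 0 and g(1) = 47 > 0, and g is increasing for sigma >= 0,
    so [lo, hi] brackets exactly one root.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


sigma0 = positive_root_of_g()
print(f"sigma0 = {sigma0:.4f}")  # sigma0 = 0.1121
```

Bisection is used only because the sign change of g on [0, 1] is guaranteed; any standard root finder would serve equally well.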
Now we state the optimal load origination theorem based on the above lemmas.
THEOREM 2. In a linear network consisting of (m + 1) identical processors denoted as p_0, p_1, ..., p_m, connected by m identical links, and with the sequence of load distribution being first towards p_0, the processing time will be a minimum if the load originates at the processor p_{i*}, where i* is given as follows:
(I) when m = 0, i* = 0;
(II) when m = 1, i* = 0 or 1;
(III) when m = 2, i* = 1;
(IV) when m = 3, i* = 1 or 2;
(V) when m = 4 or 5, i* = 2;
(VI) when m > 5,
(a) if $\{f^{m-5} g(\sigma) + 1\} > 0$, then i* = 2,
(b) if $\{f^{m-5} g(\sigma) + 1\} = 0$, then i* = 2 or 3,
(c) if $\{f^{m-5} g(\sigma) + 1\} < 0$, then
(i) if $h(\sigma) = \lfloor h(\sigma) \rfloor$, then i* = h(σ) or h(σ) + 1,
(ii) if $h(\sigma) \neq \lfloor h(\sigma) \rfloor$, then $i^* = \lceil h(\sigma) \rceil$,
where $h(\sigma) = \frac{1}{2}\left[(m-1) + \frac{\ln(-g(\sigma))}{\ln f}\right]$.
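The form of h(σ) can be read off from (101): h(σ) is the (generally non-integer) value of i at which the bracket {f^{m-2i-1} g(σ) + 1} vanishes. A short derivation, assuming g(σ) < 0 (case (c)) and f > 1 as in the proof:

```latex
\begin{aligned}
f^{m-2i-1} g(\sigma) + 1 = 0
  &\iff f^{m-2i-1} = \frac{1}{-g(\sigma)} \\
  &\iff (m-2i-1)\ln f = -\ln\bigl(-g(\sigma)\bigr) \\
  &\iff i = \tfrac{1}{2}\Bigl[(m-1) + \frac{\ln(-g(\sigma))}{\ln f}\Bigr] = h(\sigma).
\end{aligned}
```

Because f > 1 and g(σ) < 0, the bracket increases with i: it is negative for i < h(σ) and positive for i > h(σ). Hence T(i, m - i) decreases up to the first integer i ≥ h(σ) and increases afterwards, which is why i* = ⌈h(σ)⌉ when h(σ) is not an integer, and h(σ) ties with h(σ) + 1 when it is.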
PROOF. The proofs for (I) and (II) are trivial.
(III) When m = 2, it can be seen from Lemma 1 that i* = 1.
(IV) When m = 3, it has been shown in Theorem 1 that T(1, 2) = T(2, 1). Further, using Lemma 2, we see that i* = 1 or 2.
(V) When m = 4, using Theorem 1, T(1, 3) < T(3, 1). From Lemma 3, we see that T(1, 3) < T(0, 4) and T(1, 3) > T(2, 2). Therefore, i* = 2 for m = 4. When m = 5, using Theorem 1,
we see that T(1, 4) < T(4, 1) and T(2, 3) < T(3, 2). Further, from Lemma 4, it can be seen that T(1, 4) < T(0, 5) and T(2, 3) < T(1, 4), which implies that i* = 2.
(VI) (a) From Lemma 5, it can be seen that, when $\{f^{m-5} g(\sigma) + 1\} > 0$, T(2, m - 2) < T(3, m - 3). Further, it is easily proved that $\{f^{m-2i-1} g(\sigma) + 1\} > 0$ for i ∈ {3, 4, ..., ⌊m/2⌋}, and hence T(i + 1, m - i - 1) > T(i, m - i) for i ∈ {3, 4, ..., ⌊m/2⌋}. This proves that i* = 2.
(b) If $\{f^{m-5} g(\sigma) + 1\} = 0$ then, similarly, $\{f^{m-2i-1} g(\sigma) + 1\} > 0$ for i ∈ {3, ..., ⌊m/2⌋}, and therefore i* = 2 or 3.
(c) For i ∈ {2, 3, ..., ⌊m/2⌋}, if $\{f^{m-2i-1} g(\sigma) + 1\} < 0$, then from Lemma 5, T(i + 1, m - i - 1) < T(i, m - i). The inequality $\{f^{m-5} g(\sigma) + 1\} < 0$ implies that g(σ) < 0; since f > 1, the bracket $\{f^{m-2i-1} g(\sigma) + 1\}$ then increases with i. Now consider the minimum i* ∈ {3, ..., ⌊m/2⌋} such that T(i*, m - i*) ≤ T(i* + 1, m - i* - 1). From Theorem 1, such an i* must exist. There are two possible cases:
(i) T(i*, m - i*) = T(i* + 1, m - i* - 1). This implies that $\{f^{m-2i^*-1} g(\sigma) + 1\} = 0$, from which i* = h(σ), h(σ) + 1.
(ii) T(i*, m - i*) < T(i* + 1, m - i* - 1). This implies that $\{f^{m-2i^*-1} g(\sigma) + 1\} > 0$, from which $i^* = \lceil h(\sigma) \rceil$.
It can be easily verified that T(i, m - i) < T(i + 1, m - i - 1) for i ≥ i* + 1 in case (i) and for i ≥ i* in case (ii). This gives the optimal load origination point for the case m > 5 and proves the theorem. ∎
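Theorem 2 reduces to a few arithmetic tests once σ and f are known. The Python sketch below encodes cases (I) to (VI); f is the ratio used throughout Section 4 and is simply passed in as a number here (its defining expression is not restated), with f > 1 assumed as in the proof. The helper `walk_lemma5` is a hypothetical cross-check that steps i upward from 2 while Lemma 5(iii) says T(i + 1, m - i - 1) < T(i, m - i); all function names and the tolerance `tol` are our own choices.

```python
import math


def g(sigma: float) -> float:
    """g(sigma) = 18 sigma^3 + 24 sigma^2 + 6 sigma - 1 (Lemma 5, eq. (98))."""
    return 18 * sigma**3 + 24 * sigma**2 + 6 * sigma - 1


def optimal_origin(m: int, sigma: float, f: float, tol: float = 1e-12) -> tuple:
    """Optimal load origination index (or tied indices) i* per Theorem 2.

    Assumes identical processors and links, distribution first towards p_0,
    and f > 1.
    """
    small_cases = {0: (0,), 1: (0, 1), 2: (1,), 3: (1, 2), 4: (2,), 5: (2,)}
    if m in small_cases:  # cases (I)-(V)
        return small_cases[m]
    bracket = f ** (m - 5) * g(sigma) + 1  # {f^(m-5) g(sigma) + 1}
    if bracket > tol:  # case (VI)(a)
        return (2,)
    if abs(bracket) <= tol:  # case (VI)(b)
        return (2, 3)
    # case (VI)(c): bracket < 0 forces g(sigma) < 0, so log(-g) is defined
    h = 0.5 * ((m - 1) + math.log(-g(sigma)) / math.log(f))
    if abs(h - round(h)) <= tol:  # h(sigma) is an integer: two optimal points
        k = round(h)
        return (k, k + 1)
    return (math.ceil(h),)


def walk_lemma5(m: int, sigma: float, f: float) -> int:
    """Start at i = 2 and move right while Lemma 5(iii) gives T(i+1) < T(i)."""
    i = 2
    while i <= m // 2 and f ** (m - 2 * i - 1) * g(sigma) + 1 < 0:
        i += 1
    return i


print(optimal_origin(20, 0.05, 1.5))  # (9,)
print(walk_lemma5(20, 0.05, 1.5))     # 9
```

For m > 5 the two functions should agree on the smallest optimal index: the closed form via h(σ), the walk by applying Lemma 5 one step at a time.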
5. DISCUSSIONS 
The underlying principle of the new strategy proposed in this paper lies in exploiting the independent functions of the processor and its front-end. This allows efficient utilization of the front-ends and yields a considerable improvement in time performance, because the processors can start processing their load fractions at an earlier instant. When the processors are not equipped with front-ends, this new strategy will not show any improvement, since the processors cannot perform computation and communication simultaneously.
In Section 3.1, we have assumed that the processors and links are arranged in decreasing order of speeds. This assumption is sufficient to obtain a set of linear equations with a feasible solution. For an arbitrary arrangement of processors and links, the set of equations will not yield a feasible solution. However, given a load distribution, the basic concept behind the new strategy can still be employed to improve the time performance over the earlier strategy [15]. The following example illustrates this. Consider the linear network shown in Figure 7. The inequalities (11) are violated here, but the system of Equations (3) and (5) can still be solved to yield α0 = 0.4711, α1 = 0.3768, α2 = 0.1077, α3 = 0.0154, α4 = 0.0154, and α5 = 0.0136. Note that with these values the inequalities in (4) are violated. Hence, the timing diagram given in Figure 3 is no longer valid. However, we can use the rules of load distribution given in Section 2 to distribute these load fractions as shown in the timing diagram in Figure 7. From the figure, it can be observed that the processors do not all stop at the same time instant, and the processing time, determined by processor p5, is 1.926. When the earlier strategy [15] is adopted for this linear network, the processing time obtained is 2.055. Thus, even in the case of an arbitrary arrangement, the new strategy shows an improvement in time performance.
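The numbers quoted for the Figure 7 example can be checked with a few lines of Python (the variable names are ours): the six load fractions should sum to one, and the two processing times give the relative gain of the new strategy over the strategy of [15] for this particular network.

```python
# Load fractions alpha_0 ... alpha_5 reported for the Figure 7 network
alphas = [0.4711, 0.3768, 0.1077, 0.0154, 0.0154, 0.0136]
t_new = 1.926  # processing time with the new strategy (set by p5)
t_old = 2.055  # processing time with the earlier strategy [15]

total = sum(alphas)
gain = (t_old - t_new) / t_old  # relative reduction in processing time

print(f"sum of fractions = {total:.4f}")  # sum of fractions = 1.0000
print(f"relative gain = {gain:.1%}")      # relative gain = 6.3%
```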
6. CONCLUSIONS 
In this paper, a new strategy of load distribution is proposed for a linear network of (m + 1) processors and m links. This strategy is mainly based on the concept of efficient utilization of the front-ends of the processors. Closed-form solutions have been derived for the case when the processing load originates at the boundary and for the case when it originates at the interior of the network. In both cases, we assume that the processors and links are arranged in decreasing order of speeds. Asymptotic performance analysis has been carried out for both cases, and a comparative study with the earlier strategy is presented. For the interior case, two theorems regarding the optimal sequence of load distribution and the optimal load origination point for a given sequence have been proved.
Figure 7. Example for arbitrary arrangement (timing diagram, not to scale).
This paper discusses the strategy only when links and processors are arranged in decreasing order of speeds. However, even for an arbitrary arrangement and a given load distribution, this strategy can be applied to obtain an improved time performance. An example with arbitrary processor and link speeds has been given, and a comparison with the earlier strategy [15] shows that the new strategy gives a better time performance.
The problem of obtaining the optimal load distribution that minimizes the processing time in a network having processors and links with arbitrary speeds remains open for the strategy proposed in this paper.
REFERENCES 
1. S.H. Bokhari, Assignment Problems in Parallel and Distributed Computing, Kluwer Academic Publishers, 
Boston, (1987). 
2. J. Błażewicz, M. Drabowski and J. Weglarz, Scheduling multiprocessor tasks to minimize schedule length,
IEEE Transactions on Computers C-35, 389-398 (1986).
3. R. Weber, On a conjecture about assigning jobs to processors of different speeds, IEEE Transactions on
Automatic Control 38, 166-170 (1993).
4. H. Lee and O.R. Liu Sheng, Optimal data allocation in the bus computer network, Proc. of IEEE Intl. 
Conf. on Comp. and Comm, pp. 394-399, Phoenix, AZ, (1990). 
5. F.D. Fracchia and L.V. Saxton, Approximation algorithms for scheduling on uniform processors, Information
Systems and Operational Research 31, 16-23 (1993).
6. N.G. Shivaratri, P. Krueger and M. Singhal, Load distribution for locally distributed systems, IEEE Com- 
puter Magazine 25, 33-44 (1992). 
7. W.W. Chu, L.J. Holloway, M.T. Lan and K. Efe, Task allocation in distributed data processing, IEEE
Computer Magazine 13, 57-69 (1980).
8. D.-T. Peng and K.G. Shin, Optimal scheduling of cooperative tasks in a distributed system using an 
enumerative method, IEEE Transactions on Software Engineering 19, 253-267 (1993). 
9. C.C. Price and M.A. Salama, Scheduling of precedence-constrained tasks on multiprocessors, The Computer 
Journal 33, 219-229 (1990). 
10. J. Xu, Multiprocessor scheduling of processes with release times, deadlines, precedence, and exclusion
relations, IEEE Transactions on Software Engineering 19, 139-154 (1993).
11. D. Fernandez-Baca, Allocating modules to processors in a distributed system, IEEE Transactions on Soft- 
ware Engineering 15, 1427-1436 (1989). 
12. C.-H. Lee, D. Lee and M. Kim, Optimal task assignment in linear array networks, IEEE Transactions on 
Computers 41, 877-880 (1992). 
13. R. Mirchandaney, D. Towsley and J.A. Stankovic, Analysis of the effects of delays on load sharing, IEEE 
Transactions on Computers 38, 1513-1525 (1989). 
14. K.G. Shin and M.S. Chen, On the number of acceptable task assignments in distributed computing systems, 
IEEE Transactions on Computers 39, 99-110 (1990). 
15. Y.C. Cheng and T.G. Robertazzi, Distributed computation with communication delays, IEEE Transactions 
on Aerospace and Electronic Systems 24, 700-712 (1988). 
16. V. Mani and D. Ghose, Distributed computation in linear networks: Closed-form solutions, IEEE Transac-
tions on Aerospace and Electronic Systems 30, 474-490 (1994).
17. D. Ghose and V. Mani, Distributed computation with communication delays: Asymptotic performance 
analysis, Journal of Parallel and Distributed Computing 23, 293-309 (1994). 
18. S. Bataineh and T.G. Robertazzi, Ultimate performance limits for networks of load sharing processors, Proc. 
of the Conf. on Information Sciences and Systems, Princeton, 794-799 (1992). 
19. Y.C. Cheng and T.G. Robertazzi, Distributed computation for tree network with communication delays, 
IEEE Transactions on Aerospace and Electronic Systems 26, 511-516 (1990). 
20. V. Bharadwaj, D. Ghose and V. Mani, Optimal sequencing and arrangement in single-level tree networks 
with communication delays, IEEE Transactions on Parallel and Distributed Systems 5, 968-976 (1994). 
