Novel Concepts In Divisible Load Scheduling With Realistic System Constraints by Suresh, S
Abstract 
In recent years, a great deal of attention has been focused on the divisible load scheduling
problem in a distributed computing system/network consisting of a number of processors
interconnected through communication links. A divisible load can be divided into any
number of fractions, which can be processed independently on the processors, as there are
no precedence relationships. In other words, a divisible load has the property that all the
elements in the load require the same type of processing. In a distributed computing
system, the load originates at one of the processors. This processor divides the load into
many fractions, keeps one of the fractions for itself to process/compute, and distributes
the remaining load fractions to other processors in the network. The load fractions
assigned to the processors are processed in parallel. The objective in divisible load
scheduling is that of finding the optimal load fractions assigned to the processors such
that the processing time of the entire processing load is a minimum.
The research work reported in the thesis presents divisible load scheduling by considering
the realistic system constraints in addition to the inherent communication and computation
delays on the processing time. Novel concepts are used to study the effect/influence
of realistic system constraints such as start-up delays, available buffer/memory size and
processor release times on the processing time. We consider single and multi-installment
load distribution strategies in blocking mode of communication. Another mode of
communication known as non-blocking mode of communication is also addressed in this
thesis. We present an application of divisible load scheduling methodology for optimal
partition of a neural network architecture for parallel training process. The main
objective in this work is to design and analyze load distribution strategies to minimize the
processing time of the entire processing load for a given network. The main contributions
in the thesis are summarized below.
First, we study the effect of start-up delays in a bus network with m child processors.
The divisible load is assumed to originate at the bus control unit, which divides the load
into m fractions and distributes the load fractions to the child processors in a sequence,
one after another. For a given sequence of load distribution, we present the recursive
load distribution equations and derive a closed-form expression for the processing time.
For the case of a bus network, when the start-up delays are not considered, it is known
that the processing time decreases with increase in the number of processors and also the
processing time is independent of the sequence of load distribution. It is shown in our
studies that, when the start-up delays are included, the processing time decreases up to
a certain number of processors, beyond which the processing time increases. It is also
shown that the processing time depends on the sequence of load distribution. Using
the closed-form expression, the optimal number of processors and the optimal sequence
of load distribution are obtained.
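The behaviour described above can be illustrated with a minimal sketch. It assumes an affine cost model that is common in the divisible load literature and is not necessarily the model used in the thesis: sending a fraction costs a fixed start-up delay plus a term proportional to the fraction, the control unit does no computing, and all child processors stop at the same instant. The function names, parameter names and default values are hypothetical.

```python
def bus_schedule(w, z=0.2, theta_cm=0.02, t_cm=1.0, t_cp=1.0):
    """Load fractions and finish time for one distribution sequence.

    Assumed model (not taken from the thesis): sending fraction a costs
    theta_cm + a*z*t_cm on the shared bus, computing it on child i costs
    a*w[i]*t_cp, the control unit does no computing, and every child
    stops at the same instant. Returns (fractions, time), or None when
    the equal-finish solution would need a negative fraction.
    """
    # Write each fraction as alpha_i = a[i]*alpha_1 + b[i].
    a, b = [1.0], [0.0]
    for i in range(1, len(w)):
        denom = z * t_cm + w[i] * t_cp
        # Child i-1 computes while child i receives and then computes:
        # alpha_{i-1} w_{i-1} t_cp = theta_cm + alpha_i (z t_cm + w_i t_cp)
        a.append(a[-1] * w[i - 1] * t_cp / denom)
        b.append((b[-1] * w[i - 1] * t_cp - theta_cm) / denom)
    alpha1 = (1.0 - sum(b)) / sum(a)  # the fractions must sum to 1
    alpha = [ai * alpha1 + bi for ai, bi in zip(a, b)]
    if min(alpha) < 0:
        return None
    finish = theta_cm + alpha[0] * (z * t_cm + w[0] * t_cp)
    return alpha, finish


def best_processor_count(max_m, **kw):
    """Sweep m = 1..max_m identical children and keep feasible times."""
    times = {}
    for m in range(1, max_m + 1):
        result = bus_schedule([1.0] * m, **kw)
        if result is not None:
            times[m] = result[1]
    return min(times, key=times.get), times
```

Under these assumptions, calling `best_processor_count(20)` sweeps the number of identical child processors, and comparing `bus_schedule([1.0, 2.0])` with `bus_schedule([2.0, 1.0])` shows how the finish time can change with the sequence once `theta_cm` is non-zero.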
Next, we extend this study to a single-level tree network with m child processors. Here,
the processing load is assumed to originate at the root processor, which divides the
load into m + 1 fractions, keeps its own fraction for processing, and distributes the
rest to the child processors in a given sequence, one after another. We present the
two combinatorial optimization problems that arise in scheduling divisible loads in a
single-level tree network with communication start-up delays. The first problem is that
of finding the optimal combination of participating processors for a given sequence of
load distribution. The second problem is that of finding the optimal sequence of load
distribution. It is shown in this thesis that both these problems are combinatorial in
nature and are difficult to solve. Hence, two genetic algorithm approaches are presented
for the solution of these problems. It is known that the solution obtained from genetic
algorithms cannot be guaranteed to be optimal; i.e., there is no formal proof of optimality.
But, for a large class of combinatorial optimization problems, it has been shown that the
genetic algorithm produces solutions that are optimal or close to the optimal solution. Hence,
we call the solution obtained upon termination of the genetic algorithm the 'best'
solution. The first genetic algorithm searches for the best combination of participating
processors for a given sequence of load distribution such that the processing time is a
minimum. The second genetic algorithm searches for the best sequence of load distribution
such that the processing time is a minimum. We also present numerical examples to show
the applicability of the genetic algorithm approach.
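A sequence search of the kind described above can be sketched as a small permutation genetic algorithm. This is an illustrative sketch only, not the algorithm of the thesis: the cost model (root computes while sending, fixed start-up delay per transfer, all processors stop together), the operators (tournament selection, order crossover, swap mutation, elitism) and every name and parameter value below are assumptions.

```python
import random


def tree_time(seq, w0, w, z, theta_cm):
    """Finish time of a single-level tree for one child sequence.

    Assumed model: the root computes while it sends, child k receives
    for theta_cm + alpha*z[k] and then computes for alpha*w[k], and all
    processors stop together. Infeasible sequences get infinite time.
    """
    # alpha_j = a[j]*alpha_0 + b[j], with index 0 the root.
    a, b, prev_w = [1.0], [0.0], w0
    for k in seq:
        denom = z[k] + w[k]
        a.append(a[-1] * prev_w / denom)
        b.append((b[-1] * prev_w - theta_cm) / denom)
        prev_w = w[k]
    alpha0 = (1.0 - sum(b)) / sum(a)
    if min(ai * alpha0 + bi for ai, bi in zip(a, b)) < 0:
        return float("inf")
    return alpha0 * w0


def order_crossover(p1, p2, rng):
    """Copy a slice of p1 and fill the rest in the order found in p2."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[i:j]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]


def ga_best_sequence(cost, n, pop_size=30, generations=60, seed=0):
    """Return the best sequence found and the best cost per generation."""
    rng = random.Random(seed)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        nxt = pop[:2]  # elitism: the two best survive unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=cost)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=cost)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.2:  # swap mutation
                x, y = rng.sample(range(n), 2)
                child[x], child[y] = child[y], child[x]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=cost)
    history.append(cost(best))
    return best, history


# Hypothetical five-child example.
W = [1.0, 2.0, 1.5, 3.0, 0.8]    # computation speed parameters
Z = [0.2, 0.1, 0.4, 0.3, 0.25]   # link speed parameters


def cost(seq):
    return tree_time(seq, 1.0, W, Z, 0.02)


best, history = ga_best_sequence(cost, len(W))
```

As noted in the text, the returned sequence is only the 'best' found upon termination; for a network this small it can be checked against an exhaustive search over all permutations.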
Another important issue in a distributed computing system is the available buffer
(memory) size at the processors. The processors may be engaged in local load processing or
sharing the resources with other processors. Because of this multi-tasking ability of each
processor, it may allocate limited buffer size for storing the load fraction of the divisible
load. Due to these buffer constraints, all processors participating in the computation
process need not stop computing at the same time instant. We consider a single-level
tree network to study the effect of buffer constraints on the processing time. A linear
programming formulation is presented to find the load fractions assigned to the
processors and the processing time for a given sequence of load distribution. The problem of
finding the optimal sequence of load distribution with buffer constraints is a
combinatorial optimization problem. Hence, we use the genetic algorithm approach for finding
the best sequence of load distribution. We also extend this study to a multi-level tree
network with buffer constraints.
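One plausible form of such a linear program is sketched below. It is an assumption for illustration, not the formulation of the thesis: the root (index 0) computes while it distributes, the children are served in the order 1, ..., m, start-up delays are omitted, and the symbols (load fractions \alpha_i, speed parameters w_i and z_i, unit times T_{cp} and T_{cm}, buffer limits B_i as fractions of the total load) are hypothetical notation.

```latex
\begin{aligned}
\min_{\alpha_0,\dots,\alpha_m,\,T}\quad & T \\
\text{subject to}\quad
& \alpha_0 w_0 T_{cp} \le T, \\
& \sum_{j=1}^{i} \alpha_j z_j T_{cm} + \alpha_i w_i T_{cp} \le T,
  \qquad i = 1,\dots,m, \\
& \sum_{i=0}^{m} \alpha_i = 1, \\
& 0 \le \alpha_i \le B_i, \qquad i = 0,\dots,m.
\end{aligned}
```

The finish-time conditions are written as inequalities rather than equalities because, once a buffer limit is active, a processor may stop before the others.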
Next, we present a multi-installment load distribution strategy in a single-level tree
network with communication start-up delay. In this strategy, the root processor
distributes the load fractions to the child processors in more than one installment. In
single installment load distribution, it is shown in this thesis that there exists an
optimal number of processors m* for which the processing time is a minimum. Similarly,
in multi-installment load distribution strategy, it is shown that there exists an optimal
sequence of load distribution, optimal number of processors m* (m* < m), and an
optimal number of installments n* (n* < n) such that the processing time is a minimum.
This is a difficult optimization problem to solve because for a given sequence of load
distribution, we have to first find the processors participating in each installment and
then obtain the load fractions assigned to them and the processing time. In this thesis,
we present a real-coded genetic algorithm approach to search for the optimal sequence
of load distribution, optimal number of processors and optimal number of installments,
such that the processing time is a minimum. Using the values of load fractions for any
given sequence of load distribution, we can search for a better sequence of load
distribution. We have also shown that this real-coded genetic algorithm approach can be used to
solve the multi-installment load distribution strategy with start-up delays and processor
release times.
We present an application of the divisible load scheduling methodology to find the
optimal partition of the neural network architecture. The multi-layer perceptron network
with backpropagation learning algorithm is widely used for solving practical problems.
In general, the training time required to solve practical problems is high. Hence, in
this thesis, we consider parallel implementation of the multi-layer perceptron network. Here,
the neural network architecture is partitioned into sub-networks and each sub-network
is trained on a different processor using a parallel learning algorithm. In this thesis, we
develop a new partitioning scheme called 'hybrid partitioning scheme' to reduce the
inter-processor communication in the training process. For a homogeneous system, we derive
a closed-form expression for the training time. It is shown in this thesis that the hybrid
partitioning scheme performs better than the earlier vertical partitioning scheme. When
the hybrid partitioning scheme is extended to heterogeneous systems, the neural network is
partitioned using the optimality principle presented in divisible load scheduling
methodology. A closed-form expression for the number of neurons assigned to each processor in the
network is derived. Using the closed-form expression, we also obtain the optimal number
of processors required to minimize the training time. Finally, we demonstrate our
methodology by solving the handwritten character recognition problem and structural
health monitoring problems in a network of workstations. The analytical and experimental
performances are measured and compared.
In the case of a single-level tree network, so far it has been assumed that each child
processor starts computing only after its front-end completely receives the load fraction
assigned to it. Load distribution based on this assumption is known as 'blocking mode of
communication'. This communication model introduces an idle time for all the processors
participating in the computation process. In the literature, a new communication model
known as 'non-blocking mode of communication' is presented, in which each processor
starts computing while its front-end starts receiving the load fraction. Many analyses,
such as when to distribute the loads to processors in the network, optimal sequencing
and arrangement of processors in the network, and the effect of start-up time, are
available in blocking mode of communication. It is interesting to analyze the above issues in
non-blocking mode of communication. In this thesis, we present an equivalent network in
blocking mode of communication for a network using non-blocking mode of
communication. This equivalent network can be used directly to analyze the issues in non-blocking
mode of communication.
