A Distributed Discrete-Time Neural Network Architecture for Pattern Allocation and Control by Chronopoulos, A.T. & Sarangapani, Jagannathan
Missouri University of Science and Technology 
Scholars' Mine 
Computer Science Faculty Research & Creative 
Works Computer Science 
01 Jan 2002 
A Distributed Discrete-Time Neural Network Architecture for 
Pattern Allocation and Control 
A.T. Chronopoulos 
Jagannathan Sarangapani 
Missouri University of Science and Technology, sarangap@mst.edu 
Recommended Citation 
A. Chronopoulos and J. Sarangapani, "A Distributed Discrete-Time Neural Network Architecture for 
Pattern Allocation and Control," Proceedings of the International, IPDPS 2002, Abstracts and CD-ROM 
Parallel and Distributed Processing Symposium, Institute of Electrical and Electronics Engineers (IEEE), 
Jan 2002. 
The definitive version is available at https://doi.org/10.1109/IPDPS.2002.1016613 
This Article - Conference proceedings is brought to you for free and open access by Scholars' Mine. It has been 
accepted for inclusion in Computer Science Faculty Research & Creative Works by an authorized administrator of 
Scholars' Mine. This work is protected by U. S. Copyright Law. Unauthorized use including reproduction for 
redistribution requires the permission of the copyright holder. For more information, please contact 
scholarsmine@mst.edu. 
A Distributed Discrete-Time Neural Network Architecture for Pattern
Allocation and Control  
Anthony T. Chronopoulos,
Dept. of Computer Science,
Univ. of Texas at San Antonio,
6900 N. Loop 1604 West,
San Antonio, TX 78249.
atc@cs.utsa.edu
Jagannathan Sarangapani






This study addresses the efficient implementation of a novel neural network algorithm on distributed systems for concurrent execution. We assume a distributed system of heterogeneous computers, with the neural network replicated on each computer. We propose an architecture model with efficient pattern allocation that takes into account the speed of the processors and overlaps communication with computation. The training pattern set is distributed among the heterogeneous processors, and the mapping stays fixed during the learning process. We provide a heuristic pattern allocation algorithm that minimizes the execution time of neural network learning. Under the condition that each processor performs a task directly proportional to its speed, we show that pattern allocation is a polynomial-time problem, solvable by dynamic programming.
1 Introduction
There is no consensus on how to simulate artificial neural networks on parallel machines. In recent years, researchers have tried to achieve maximal performance on their favorite (or available) parallel machine. Neural networks have been implemented on many parallel architectures (see e.g. [5, 6, 7, 9]).
Backpropagation or other multilayer neural networks
can be parallelized by network-partitioning, by pattern-
partitioning, or by a combination of these two schemes.

This research was supported, in part, by research grants from (1)
NASA NAG 2-1383 (1999-2001), (2) State of Texas Higher Education
Coordinating Board through the Texas Advanced Research/Advanced
Technology Program ATP 003658-0442-1999, (3) NSF ECS 9985739.
In network-partitioning, the nodes and weights of the neural network are partitioned among different processors, and thus the computations of node activations, node errors, and weight changes are parallelized. The idea of pattern-partitioning [11] is to distribute the training examples over the processors: the training set is sliced and one slice is assigned to each processor, while each processor node keeps a complete copy of the whole network.
The implementation of a neural network on a heterogeneous parallel architecture gives rise to a hard problem: the optimal mapping of the network and of the training patterns among the heterogeneous processors. This optimization model is generally an NP-complete integer (or mixed) programming problem, which can be solved either directly (for instance, by branch-and-bound) or simplified heuristically to a polynomial problem. The mapping algorithms can be static or dynamic. In the static case, we assume that the mapping is unchanged throughout the learning process. In the dynamic case, we assume that the background workload is time varying; hence, it may be necessary to perform a remapping as the workload changes.
Only a few mapping schemes have been reported for implementing neural network algorithms on parallel architectures with heterogeneous processors. Chu and Wah [5] presented an approximation algorithm for the mapping of large neural networks on multicomputers, given a user-specified error degree that can be tolerated in the final mapping. Saratchandran et al. [7] optimized pattern partitioning in backpropagation learning on a heterogeneous array of transputers. They solved the optimization problem in two ways: by branch-and-bound and by genetic algorithms.
There exist other approaches for optimal data parti-
tioning in distributed systems. Notable are some works
in divisible load theory [3, 12] where a divisible load can
Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS02) 
1530-2075/02 $17.00 © 2002 IEEE 
Figure 1. The master-slave model.
be arbitrarily partitioned and distributed to more than
one computer to achieve a faster execution time. For
non-divisible or discrete loads there are (non-optimal)
strategies which have been used such as the equal and
rectilinear allocation [10]. These strategies have been
used for grid problems but not for neural networks.
In this paper, we consider the pattern-partitioning scheme. This scheme is particularly suited for feedforward neural networks where the size of the training set is large compared to the size of the network. Pattern-partitioning is a coarse-grained method because the number of processors is limited by the number of patterns.
We next propose a novel feedforward architecture for a neural network pattern association algorithm based on dynamic programming.
Distributed Architecture Model:
(1) We consider a dedicated master-slave architecture
based on a network of heterogeneous computers (see
Figure 1). The computers are assumed to have the ca-
pability to perform computation and communication si-
multaneously which is the case in most existing systems.
(2) We consider the pattern partitioning scheme for
mapping our multilayer neural network algorithm for
learning pattern associations onto the computers.
Pattern allocation schemes:
(1) We propose a straightforward proportional alloca-
tion which takes into account the CPU speed of the com-
puters but does not overlap communication with compu-
tation.
(2) We then propose a novel pattern allocation algo-
rithm which takes into account the CPU speed of the
processors and overlaps computation and communica-
tion.
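As an illustration of scheme (1), the following minimal Python sketch splits a pattern set in proportion to processor speed. The function name `proportional_allocation`, the argument `times` (time per pattern on each processor), and the largest-remainder rounding are assumptions made for this example; the paper does not specify how fractional shares are rounded.

```python
def proportional_allocation(m, times):
    """Split m patterns among processors in proportion to their speeds.

    times[i] is the (assumed known) time processor i needs for one pattern,
    so its speed is taken to be 1 / times[i].
    """
    speeds = [1.0 / t for t in times]
    total = sum(speeds)
    # Ideal fractional shares, rounded down to whole patterns.
    shares = [m * s / total for s in speeds]
    alloc = [int(x) for x in shares]
    # Hand the leftover patterns to the largest fractional remainders
    # (a rounding choice made for this sketch only).
    leftover = m - sum(alloc)
    order = sorted(range(len(times)),
                   key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


print(proportional_allocation(60, [1, 2, 3]))
```

This allocation ignores communication time entirely, which is the limitation that scheme (2) is meant to address.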
2 A Neural Network Pattern Association
Model
In pattern association, the objective is to learn the associations via training or by physically associating the patterns with features. One of the methods commonly used in the literature for learning the association, or relationship, between inputs (patterns) and outputs (features or mapping) is neural networks, since they have been proven to have the universal approximation property. Typically, the training process of a NN (neural network) is quite involved, because the patterns may be taken from different regions and the mapping (or association) may be nonlinear. Therefore, NN training by updating the weights takes considerable time; furthermore, the weights may not converge unless a suitable and mathematically proven training algorithm is deployed. If the weights do not converge, the pattern associations cannot be learned. To address these problems, we propose (i) a distributed neural network learning algorithm for faster learning and (ii) a proof of convergence of the NN weights when the proposed weight tuning scheme is used.
In our past work [2], we used backpropagation and implemented the parallelized backpropagation algorithm via network partitioning, pattern partitioning, or a combination of these schemes. Since the backpropagation algorithm is not proven to converge, a novel NN weight tuning scheme is described in this paper. The proposed NN training algorithm via pattern partitioning not only improves the learning time (execution time) but also lets the NN learn the nonlinear pattern association (or mapping), which is proven using a Lyapunov-based analysis.
In this paper, a novel learning scheme is proposed for a multilayer NN. Patterns are partitioned, allocated to different processors, and trained using the proposed algorithm. Lyapunov stability analysis shows that the proposed NN weight tuning algorithm converges to a bounded set (error) when the weights are initialized at zero in the first cycle and at an average value thereafter. We consider two examples: one for learning a mapping via pattern partitioning and the other for real-time control.
Pattern Association
The ability of neural networks to approximate a large class of nonlinear systems makes them prime candidates for the identification of nonlinear pattern associations [8]. Let us consider a multi-input multi-output nonlinear association, to be constructed by a NN, as
\[ x(k+1) = f(x(k), x(k-1), \ldots, x(k-n+1)) + d(k) \tag{1} \]
where $f(\cdot) \in \Re^{n}$ is the nonlinear mapping or association to be constructed, $x(k) \in \Re^{n}$ is a vector consisting of patterns, and $d(k) \in \Re^{n}$ is the disturbance or noise, whose bound is given by $\|d(k)\| \le d_M$. Here the patterns are a function of time if $k$ is a time index. If the association is not a function of time, then $k$ could be a pattern number.
The objective is to construct a suitable model that, when subjected to the same input $x(k)$, will produce an output $\hat{x}(k+1)$ such that the actual output $x(k+1)$ and the estimated output $\hat{x}(k+1)$ are very close in some sense. Now taking the model to have the same structure as the association:
\[ \hat{x}(k+1) = \hat{f}(x(k), x(k-1), \ldots, x(k-n+1)) \tag{2} \]
Define the error in association as:
\[ e(k) = x(k) - \hat{x}(k) \tag{3} \]
Then the error in pattern association in the next interval is given by
\[ e(k+1) = x(k+1) - \hat{x}(k+1) = f(x(k), \ldots, x(k-n+1)) + d(k) - \hat{f}(\cdot) \tag{4} \]
\[ e(k+1) = \tilde{f}(\cdot) + d(k) \tag{5} \]
\[ \tilde{f}(\cdot) = f(\cdot) - \hat{f}(\cdot) \tag{6} \]
This is an error system in which the association error is driven by the functional estimation error. In the remainder of this paper we focus on selecting NN training algorithms that guarantee the convergence of the pattern association error.
3 Distributed NN Architecture
One of the most common programming models used in developing distributed-system applications is the master-slave model. In this model we have a control program called the master and a number of slave programs. The master program is responsible for spawning the slave programs, for initialization, and for the collection of results. The slave programs perform the computation on data allocated by the master or by themselves. The master-slave model involves no communication among the slaves; only the master can communicate with the slaves, by message passing. The structure of the master-slave model is shown in Figure 1.
We now describe the mapping of our proposed multilayer NN algorithm onto the computers. Our NN algorithm trains a given feedforward neural network for a given set of learning patterns. The training of the neural network can be viewed as discovering values for its weights in order to match the effective outputs of the network with the desired outputs, for each input pattern.
In our proposed NN learning, weights can be updated in two ways:
1. In the per-pattern regime the weights are updated after each training pattern is presented;
2. In the set-training regime the weight increments are computed for each training pattern. The increments are summed for all patterns and the weights are updated with the total increment after all patterns have been presented one time [13].
Pattern-partitioning schemes for parallelization are applicable only to set-training updating [9]. However, our pattern-partitioning scheme can be based on either a per-pattern regime or a set-training regime. Here we intend to use both and evaluate the differences.
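The difference between the two regimes can be seen on a toy problem. The sketch below uses a single linear weight trained with a plain delta rule; this rule, the function names, and the sample data are assumptions chosen for illustration and are not the weight tuning scheme proposed in this paper.

```python
def per_pattern_epoch(w, patterns, lr):
    """Per-pattern regime: update the weight right after each pattern."""
    for x, target in patterns:
        error = target - w * x
        w += lr * error * x
    return w


def set_training_epoch(w, patterns, lr):
    """Set-training regime: sum all increments, then update once."""
    total_increment = 0.0
    for x, target in patterns:
        error = target - w * x          # every increment uses the same old w
        total_increment += lr * error * x
    return w + total_increment


patterns = [(1.0, 2.0), (2.0, 4.0)]     # hypothetical (input, target) pairs
print(per_pattern_epoch(0.0, patterns, 0.1))
print(set_training_epoch(0.0, patterns, 0.1))
```

In the set-training regime each increment depends only on the weights at the start of the epoch, which is why the increments can be computed on different processors and summed afterwards.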
We assume a master-slave model with $n$ slaves (processors). The training set is partitioned into $n$ subsets, indexed by $i = 1, 2, \ldots, n$. These training subsets are distributed to the $n$ processors. Each slave process contains a complete copy of the whole neural network.
One epoch of our proposed NN algorithm has the fol-
lowing coarse description:
1. The weight changes and bias for the current epoch are initialized to zero.
2. Each slave process ($P_i$) carries out the training phase for each pattern assigned to it.
3. Each slave process also accumulates the weight changes and error according to its local patterns.
4. Each slave process sends the weight changes and errors to the master. The master process computes the sum of all weight changes and of all errors.
5. The master broadcasts the new weights to all slaves. The weights are updated on each slave. The master checks whether convergence is reached.
Remark: We do not use a reduce operation to implement the communication required in step 4 above, because in a LAN the reduction tree must be mapped onto a single bus, which is very inefficient. More importantly, the master is needed to assign patterns according to our algorithm.
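The epoch described above can be sketched sequentially in Python. This is a minimal sketch under stated assumptions: the slaves are simulated by a loop instead of separate processes, the model is a single linear weight with a delta rule (not the tuning law of this paper), and the names `slave_increment` and `master_epoch` are hypothetical.

```python
def slave_increment(w, local_patterns, lr):
    """Steps 2-3: a slave accumulates the increment and error of its slice."""
    dw, sq_err = 0.0, 0.0
    for x, target in local_patterns:
        err = target - w * x
        dw += lr * err * x
        sq_err += err * err
    return dw, sq_err


def master_epoch(w, slices, lr):
    """Steps 1, 4, 5: zero the totals, sum slave results, return new weight."""
    total_dw, total_err = 0.0, 0.0            # step 1
    for local_patterns in slices:             # slaves would run concurrently
        dw, sq_err = slave_increment(w, local_patterns, lr)
        total_dw += dw                        # step 4: master sums the results
        total_err += sq_err
    return w + total_dw, total_err            # step 5: weight to broadcast


slices = [[(1.0, 2.0)], [(2.0, 4.0)]]         # one hypothetical slice per slave
new_w, err = master_epoch(0.0, slices, 0.1)
print(new_w, err)
```

Because every slave starts from the same broadcast weight, the summed increment equals the one a single processor would compute over the whole training set in the set-training regime.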
The timing diagram for an epoch is shown in Figure 2. In this diagram we used the following notations:
• $P_1, P_2, \ldots, P_n$ are the slave processes.
• $M$ is the master process.
• $t_{init}$ is the time taken to initialize the weight changes and error.
• $P(i)$ is the number of patterns allocated to the slave process $P_i$.
• $T(i)$ is the time taken to perform the training phase of the algorithm for a single pattern.
• $t_{comm}$ is the time taken to send the weight changes and errors from the slaves to the master.
• $t_{bcast}$ is the time taken to broadcast the updated weights.
• $T_n$ is the parallel execution time on $n$ processors.
Figure 2. The timing diagram for an epoch
In order to obtain the minimum epoch time we have to overlap the communication time ($t_{comm}$) with computation time and find a proper pattern distribution among the processors.
4 Optimization of pattern mapping
We present next an optimization approach (see [2] for details). The allocation of tasks (or jobs) in distributed systems may be considered a special case of task scheduling, without imposed precedence relations in the execution of tasks. The purpose of a task allocation technique is to find a task assignment in which the total cost due to interprocessor communication and task execution is minimized. The task allocation problem is known to be NP-complete [1]. Optimal algorithms are obtained only in very restricted cases. For instance, simplified versions of task allocation can be solved by dynamic programming [4], a method which usually leads to a polynomial solution. The intractability of the problem has led to the introduction of many heuristics.
Our pattern-mapping optimization problem is a task allocation problem. We have to distribute $m$ patterns among $n$ heterogeneous processors, minimizing $T_n$. In other words, we have to minimize $T_n$ for one epoch on $n$ heterogeneous processors, considering also the transmission of the weights from each slave to the master. The computations are overlapped with communications according to the following strategy: each processor has to perform a task directly proportional to its speed. The fastest processor is always the last one to finish computing an epoch.
We shall cascade the computation times on the $n$ processors in the following way. The computation time for one epoch on a faster processor has to overlap (as much as possible) with the computation plus message passing times on a slower processor. This makes the faster processor the last one to send its weights to the master after one epoch. It also means that we have to map more patterns to a faster processor than to a slower one. Hence, we prefer to use the fastest processors over the slowest ones. Moreover, we shall actually use a subset of the available $n$ processors. This subset consists of the fastest processors.
We use the following notations:
• $\mathbf{P}$: a vector of $n$ elements, where $P(i) \ge 0$ is the number of patterns assigned to processor $i$, $1 \le i \le n$.
• $\mathbf{T}$: a vector of $n$ elements, where $T(i)$ is the processing time for one pattern (and one cycle) assigned to processor $i$, $1 \le i \le n$.
We shall suppose from now on, without loss of generality, that $T(1) \le T(2) \le \cdots \le T(n)$.
Our objective is to find an optimal vector Xd












minimizing OI . In this case, it is obviously better to
send more work to the faster processors (those with a
low OK L M ). The first processor (the fastest) has to be the
last one sending the weights to the master. Intuitively,
O#I is proportional to OK ] M NK ] M (and this means to NK ] M ,
since OK ] M is constant). Therefore, we have to mini-
mize NffiK ] M . The same result can be achieved if, instead
of maximizing the number of patterns processed by the
fastest processor, we minimize the number of patterns
processed by the slowest processor.
This task allocation problem can be simplified by considering the following assumption. A vector $\mathbf{P}^*$ is optimal if it minimizes (maximizes) the number of patterns allocated to the fastest (respectively, the slowest) available processor, under the following assumption:
Assumption 1:
\[ T(i)\,P^*(i) \ge T(i+1)\,P^*(i+1) + t_{comm}, \quad 1 \le i \le n-1 \tag{7} \]
This assumption reduces the size of the search space, giving us the possibility to find a polynomial solution to the optimization problem. The meaning of the restrictions is that we always try to have no idle time periods for the fastest processors, by keeping them busy as much as possible. The result is an unequal pattern distribution with overlapping of communication time and computation time.
For any vector $\mathbf{P}$ ($P(i) \ge 0$, $1 \le i \le n$, $P(1) + \cdots + P(n) = m$) respecting the restrictions:
\[ T(i)\,P(i) \ge T(i+1)\,P(i+1) + t_{comm}, \quad 1 \le i \le n-1 \tag{8} \]
we have the following two properties (the proofs are obvious, since $T(1) \le T(2) \le \cdots \le T(n)$):
Property 1:
\[ P(i) \ge P(i+1), \quad 1 \le i \le n-1 \tag{9} \]
Property 2:









i h-i p k lffii p ko7q r s tfftk6}
 
  
h-i p k lffii p ko7q r s tfft
(12)
because of the assumption on ordering the processors
according to their CPU speed.
Based on Assumption 1, we can find an optimal solution $\mathbf{P}^*$ by dynamic programming. However, $\mathbf{P}^*$ is not an optimal solution of the general optimization problem (i.e., the pattern mapping optimization without Assumption 1), and this can be easily shown by an example (see Example 2).
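The restriction of Assumption 1 is easy to check for a candidate allocation. The sketch below assumes the inequality reads "time per pattern times patterns on processor i is at least the same product on processor i+1 plus the communication time", with processors sorted fastest first; the function names and the sample values are hypothetical.

```python
def satisfies_assumption_1(alloc, times, t_comm):
    """True if times[i]*alloc[i] >= times[i+1]*alloc[i+1] + t_comm for all i.

    Processors are assumed sorted fastest first (times is non-decreasing).
    """
    return all(
        times[i] * alloc[i] >= times[i + 1] * alloc[i + 1] + t_comm
        for i in range(len(alloc) - 1)
    )


def is_non_increasing(alloc):
    """Property 1: a faster processor never gets fewer patterns."""
    return all(a >= b for a, b in zip(alloc, alloc[1:]))


times = [1, 2, 3]                       # hypothetical per-pattern times
print(satisfies_assumption_1([10, 4, 2], times, 1))
print(satisfies_assumption_1([6, 3, 2], times, 1))
```

Any allocation that passes the first check also passes the second, which is the content of Property 1.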
4.1 Dynamic programming solution (DP):
We will maximize the number of patterns allocated to the slowest processor in use, building a gain array $G$ of $n \times m$ elements, where $G(i, j)$ is the maximum number of patterns allocated to processor $i$ (the slowest) if we distribute $j$ patterns on processors $1, 2, \ldots, i$.
We initialize the array as follows:
\[ G(1, j) = j, \quad j = 1, \ldots, m \tag{13} \]
\[ G(i, 1) = 0, \quad i = 2, \ldots, n \tag{14} \]
We have to compute the rest of the elements of $G$. The optimality principle holds, since Assumption 1 is a recursive relation, and we have:
\[ G(i+1, j) = \max \{\, l \,\} \tag{15} \]
where $l$ ranges over the values that satisfy
\[ T(i)\, G(i, j - l) \ge T(i+1)\, l + t_{comm} \tag{16} \]
We can complete $G$ line by line, or column by column.




, we have 





i j#op u  k}{ (17)




i j#op u o2p k6}{ (18)
After completing $G$, the solution to our optimization problem is obtained as follows:
\[ P^*(n) = G(n, m) \]
\[ P^*(n-1) = G(n-1,\, m - P^*(n)) \]
\[ P^*(i) = G(i,\, (m - P^*(n) - P^*(n-1) - \cdots - P^*(i+1))), \quad 1 \le i \le n-1 \tag{19} \]
Fact:
The complexity of the DP algorithm is $O(nm \log m)$.
This follows from: (1) An element  i j u  k can be com-
puted in ¸
i ¹ º »~#k
time using binary search. (2) The
whole array can be completed in ¸
i w ~¼ k
time. Sub-
sequently, the backward phase of the dynamic program-
ming algorithm is in ½
i wk
time.
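The forward and backward phases of the dynamic programming scheme can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the array name `G`, the zero-based processor indices, the convention that an entry of 0 means "processor unused", and the recurrence (the largest share for the slowest processor that still satisfies Assumption 1 against the next faster one) are this example's reading of the text. A linear scan replaces the binary search mentioned above to keep the code short.

```python
def dp_allocate(m, times, t_comm):
    """Allocate m patterns to processors sorted fastest first."""
    n = len(times)
    # G[i][j]: max patterns for processor i when j patterns go to 0..i.
    G = [[0] * (m + 1) for _ in range(n)]
    for j in range(m + 1):
        G[0][j] = j                      # a single processor takes everything
    for i in range(1, n):
        for j in range(1, m + 1):
            best = 0                     # 0 means processor i stays unused
            for l in range(1, j + 1):
                if times[i - 1] * G[i - 1][j - l] >= times[i] * l + t_comm:
                    best = l
                else:
                    break                # feasibility is monotone in l
            G[i][j] = best
    # Backward phase: fix the slowest processor first, then move up.
    alloc = [0] * n
    remaining = m
    for i in range(n - 1, -1, -1):
        alloc[i] = G[i][remaining]
        remaining -= alloc[i]
    return alloc


print(dp_allocate(10, [1, 2, 3], 1))
```

With the hypothetical inputs above (10 patterns, per-pattern times 1, 2, 3 and a communication time of 1) the result respects the restriction between every pair of consecutive processors. Replacing the inner scan by a binary search over `l` gives the logarithmic cost per array element stated in the text.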
4.2 Reducing the complexity of DP algorithm
The complexity of the DP algorithm can be reduced based on the following observation. If there is a sufficient number of patterns, we can allocate them in two phases. Phase 1: We allocate most patterns in constant time so that all the processors will finish execution at the same time. Phase 2: We apply DP on the remaining patterns.
Figure 3. Pattern allocation for Example 1
We make the following assumptions: (i) all $n$ processors will be used in the most efficient way; (ii) $t_{comm}$ is not negligible compared to the execution time of one pattern on any processor, and so we want to overlap communication and computation.
The following example illustrates this technique:
Example 1:
We consider a system of ÀÆÅÈÇ processors. The to-
tal number of patterns is É2ÅÆÊ Ê . The execution times










Å,Ð . We assume the com-
munication time: Á Â Ã ÄffÄÈÅ
Í
. We compute the least
common multiplier ( Ñ Ò Ó ) of Ë-Ì Ô Î , Ô Å Í Õ Ï Õ Ç . In this
case Ñ Ò ÓÈÅ,Ç Ö and it represents the execution time for
one time block. We allocate 2 time blocks on each pro-
cessor as in Figure 3. In this figure the dark segments
represents Á Â Ã ÄÄ .
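The time-block idea of Phase 1 can be sketched in a few lines of Python. The per-pattern times used below are hypothetical and are not the values of Example 1; the function name `lcm_block` is likewise an assumption of this sketch.

```python
from functools import reduce
from math import gcd


def lcm_block(times):
    """Return the block length lcm(times) and patterns per block per processor."""
    block = reduce(lambda a, b: a * b // gcd(a, b), times)
    # In one block of this length, processor i completes block / times[i]
    # patterns, so all processors finish the block at the same moment.
    return block, [block // t for t in times]


block, per_block = lcm_block([2, 3, 4])   # hypothetical per-pattern times
print(block, per_block)
print([2 * p for p in per_block])         # patterns after two time blocks
```

Since the block length is a multiple of every per-pattern time, each processor handles a whole number of patterns per block, and the patterns left over after Phase 1 are the only ones passed to the DP algorithm.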
A formal description of this procedure is as follows.
Let
Ë#× Ø





Í Õ Ú Ú Ú Õ
À . We have
two cases: (I) when Á Â Ã ÄÄÜÛ Ë#× Ø Ã Â Ù and (II) when
Á Â Ã ÄffÄÝ
Ë× Ø
Ã Â Ù .
Case I:
1. Since the communication is to be overlapped with
computation, we subtract the number of patterns
ÉÂgÅßÞ2àá âãffä å





overlap with communication. Then, we determine




































Ã Â Ù .
Using the procedures described above we can reduce






























From the remark above the complexity of DP is
ø
Ì
À ÉÂ ý þ ßÉÂ
Î





. This is an important reduction in complex-
ity because in most practical situations ÉñÀ .
Remark:One can generalize this approach to consider
the case when it is more efficient to use fewer than
À processors. Then all ÉÂ7Å Þ Ùá âãffä å






Í Õ Ú Ú Ú Õ
À must be examined.
5 Multilayer NN design
In this paper, a three-layer NN is considered and the
convergence analysis is carried out for the error system
in equation (6). A novel learning scheme is proposed for
the NN. Assume that there exists some constant weights

and  for the three-layer NN such that the nonlinear
mapping 
Ì Ú Î

















is a matrix of hidden-to-output layer
weights,  is a matrix of input-to-hidden layer weights

Ì Ú Î
















Assumption: The ideal weights are bounded by
 

 7ÛÄ  and    7Û Ä  and the hidden
layer and the input activation functions are bounded by





























Ì Ú Î (22)












with the hidden layer output error
 
!#"!%$'& ! (25)
Using the above equation the error system in (6) can
be written as:
( ) *,+.- /
"

















) *1/+36 ) *1/	+7 ) *1/ (28)
The next step is to determine the NN weight updates
so that the boundedness of the associated error is guar-
anteed.
6 Weight updates for guaranteed perfor-
mance
Theorem: Given the nonlinear pattern association
system with the NN weight updates, and let the weight







) <) *1/ / =
&










) *,+.- / (30)
with :C AEDGF denoting constant learning rate pa-
rameters or adaptation gains, provided that the following
conditions hold
:H H ;
) <) *1/ /
















Then the pattern association error, e(k), and the weight
updates are uniformly ultimately bounded.









































































) *1/ @ (34)







































) <X) *1/ /
5
;





) <) *1/ /
5
;











) <X) *1/ /
5
;
) <X) *1/ /
KV$B:;
) <X) *1/ /
5
;














































Since L N , L \ , L
I























Now summing the change in the Lyapunov function

















F as long as (31) then (32) hold. The
definition of
Y
and the inequality (36) imply that every
initial condition in the set l will evolve entirely in l .
That is whenever the association error H H ( ) *1/ H H is outside






further imply that H H ( ) *1/ H H will not increase and will re-
main in l . This demonstrates that the association error







are bounded. The dynamics







) <X) *1/ /
5
;




























) O / = 2 ) O / @
5
(38)
where the association error is considered bounded.













) *1/ are as-
sured.
Remark 1: For the case of patterns that depend upon time, the above theorem shows that with the developed NN weight tuning scheme, the pattern association and weight estimation errors converge to a set as time increases. For the case of patterns that are not a function of time, the time index $k$ becomes the pattern number. In this case, with the patterns used as inputs to the NN over and over again during training, the NN will learn the association, since the pattern association error and the weight estimation error converge.
Remark 2: This theorem shows that even when the weights of the NN are initialized at zero or at any other value, the pattern association and weight estimation errors converge to a small set. This further implies that if the initial weight values are close enough to their targets, the pattern association and weight estimation errors converge faster than if the initial weights are far from their targets. However, in the case of pattern association, there is no guarantee that the initial weights will be selected close to their target values except when patterns are allocated based on their locations or regions. In such a case, the actual weights from the individual slave processors can be quite different, depending on their regions. Averaging the weight values from several slave processors and subsequently using them as initial weights pushes the actual weights closer to their targets in the subsequent cycles of training, and hence the overall convergence is faster. Therefore, by averaging the actual weights in the subsequent cycles and using the proposed weight updates, the NN learns the pattern associations efficiently, which leads to faster error convergence.
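The averaging step mentioned in Remark 2 amounts to an element-wise mean over the weight vectors returned by the slaves. The sketch below assumes each slave returns its weights as a flat list of equal length; the function name and the sample values are hypothetical.

```python
def average_weights(slave_weights):
    """Element-wise mean of the weight vectors returned by the slaves."""
    count = len(slave_weights)
    return [sum(column) / count for column in zip(*slave_weights)]


# Two hypothetical slaves, each holding a two-element weight vector.
print(average_weights([[1.0, 2.0], [3.0, 6.0]]))
```

The averaged vector would then be broadcast as the initial weights for the next training cycle.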
Control: In closed-loop adaptive control applications, NN identifiers are used to learn the nonlinear mapping and subsequently to use this information for control. Here $x(k)$ becomes the state of the unknown nonlinear system and $f(x(k))$ is the unknown nonlinear function to be approximated. The proposed algorithm can be employed for the identification of such unknown nonlinear systems. By distributing the patterns, here the states, the nonlinear function can be approximated globally.
References
[1] H. Ali and H. El-Rewini. On the intractability of task al-
location in distributed systems. Parallel Processing Let-
ters, 4:149–157, 1994.
[2] R. Andonie, A. T. Chronopoulos, D. Grosu, and
H. Galmeanu. Distributed backpropagation neural net-
works on a PVM heterogeneous system. In Proc. 10th
IASTED Intl. Conf. on Parallel and Distributed Systems
(PDCS’98), pages 555–560, October 1998.
[3] V. Bharadwaj, D. Ghose, V. Mani, and T. G. Robertazzi.
Scheduling Divisible Loads in Parallel and Distributed
Systems. IEEE Computer Society Press, Los Alamitos,
CA, 1996.
[4] B. Boffey. Distributed Computing-Associated Combina-
torics Problems. Blackwell Scientific Publications, Ox-
ford, 1992.
[5] L. C. Chu and B. W. Wah. Optimal mapping of neural-
network learning on message-passing multicomputers.
J. of Parallel and Distributed Computing, 14:319–339,
1992.
[6] M. Crespo, F. Piccoli, M. Printista, and R. Gallard. Par-
allel shaping of backpropagation neural networks in a
workstations-based distributed system. In Proc. EIS’98
Int. ICSC Symp. on Engineering of Intelligent Systems,
pages 709–715. ICSC Academic Press, February 1998.
[7] S. K. Foo, P. Saratchandran, and N. Sundararajan. Paral-
lel implementation of backpropagation neural networks
on a heterogeneous array of transputers. IEEE Trans. on
Syst., Man and Cybern. Part B: Cybernetics, 27(2):118–
126, February 1997.
[8] S. Jagannathan and F. Lewis. Multilayer discrete-time
neural-net controller with guaranteed performance. IEEE
Trans. on Neural Networks, 7(1):107–129, 1996.
[9] V. Kumar, S. Shekhar, and M. B. Amin. A scalable parallel
formulation of the backpropagation algorithm for hyper-
cubes and related architectures. IEEE Trans. Parallel and
Distributed Syst., 5:1073–1090, 1994.
[10] D. M. Nicol. Rectilinear partitioning of irregular data
parallel computations. J. of Parallel and Distributed
Computing, 23:119–134, 1994.
[11] H. Paugam-Moisy. Parallel neural computing based on
network duplicating. In I. Pitas, editor, Parallel Algo-
rithms for Digital Image Processing, Computer Vision,
and Neural Networks, pages 305–340. John Wiley &
Sons, 1993.
[12] J. Sohn, T. G. Robertazzi, and S. Luryi. Optimizing com-
puting costs using divisible load analysis. IEEE Trans.
Parallel and Distributed Syst., 9(3):225–234, March
1998.
[13] J. M. Zurada. Artificial Neural Systems. PWS Publishing
Company, Boston, 1992.
