Generalized Methodology for Array Processor Design of Real-time Systems by El hadidy, F. & Herrmann, O.E.
GENERALIZED METHODOLOGY FOR ARRAY PROCESSOR DESIGN OF 
REAL-TIME SYSTEMS 
F.  Moelaert El-Hadidy and O.E. Herrmann 
University of Twente, Laboratory for Network Theory 
P.O. Box 217, 7500 AE Enschede, The Netherlands 
Tel: +31-53-892822, Fax: +31-53-340045
E-mail: ferial@nt.el.utwente.nl
ABSTRACT 
Many techniques and design tools have been developed for mapping algorithms to array processors. Linear mapping is usually used for regular algorithms. Large and complex problems are not regular by nature, and regularization may cause a computational overhead that prevents real-time deadlines from being met. In this paper, a systematic design methodology for mapping partially-regular as well as regular Dependence Graphs is presented. In this approach the set of all optimal solutions is generated under the given constraints. Due to the nature of the problem and the tight timing constraints of real-time systems, the set of alternative solutions is limited. An image processing example is discussed.
1. INTRODUCTION

Array processors are well suited to efficiently implement a major class of signal processing algorithms due to their parallelism and regular data flow [KUN88]. A widely used approach for mapping algorithms to array processors is the Dependence Graph (DG) methodology. In this methodology, an algorithm is first developed in Single Assignment Code (SAC), where each variable is only allowed to have a single value. The algorithm is then represented in graphical form by a DG [KUN88], and the nodes of the DG are mapped onto an array processor. In the literature, several techniques and software packages have been reported for the automation of the mapping (see e.g. [QUI84], [MOL87], [RAO88], [ANN88], and [JAY91a]). Except for [JAY91a], only the mapping of regular DG's has been fully automated. The DG's for large and complex problems are not regular in general and are very difficult to make regular by adding dummy operations.

In an earlier paper [MOE92], based on work done in [JAY91b], we presented an Integer Linear Programming (ILP) formulation for mapping (semi-)regular DG's to array processors. In this paper, we use a branch-and-bound technique to obtain the set of optimal solutions for scheduling and projection. Our aim is to design a powerful methodology for systematically mapping M-dimensional DG's directly onto K-dimensional array processors for real-time systems, where M > K. This method offers a flexible platform to investigate the possible mapping alternatives under a given set of constraints. Since real-time systems have tight timing requirements, we show that the set of alternative solutions is limited. Further, we show that under a certain set of constraints the solution set is independent of the problem size.

In Sections 3 and 4 the branch-and-bound approach for the scheduling and projection problems is presented. Complexity issues are discussed in Section 5, and mapping to fixed-size arrays is discussed in Section 6.

2. HIERARCHICAL DG REPRESENTATION

The best way to manage the complexity of large systems is to adopt a hierarchically structured design. In the literature, hierarchical design environments have been treated in [KUN84], [ANN88], [THI88] and [JAY91a]. Here we adopt the hierarchical form of the SAC proposed in [JAY91a]. This form is referred to as Structured Single Assignment Code (S²AC). The graphical representation of an S²AC description is called a Structured Dependence Graph (SDG). The canonical forms of the S²AC and SDG are used to construct a DG with local-dependence edges in a Euclidean space of minimum dimension, which keeps projection simple.

An index point in a DG can, in general, contain a set of variables whose computations depend on variables from neighboring index points as well as from the same index point (multi-variable DG's). Single-variable DG nodes can be linearly scheduled, which is not always the case for multi-variable nodes [RAO88]. Yet most systematic methodologies proposed in the literature treat multi-variable nodes as single-variable nodes. Here the SDG is used to model single-variable as well as multi-variable DG's.
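As an illustration of the SAC-to-DG step (a textbook 1-D convolution, not the paper's edge detector), the sketch below lists the index points and edges of y[i] = sum_k w[k] * x[i - k] written in single assignment form, where index point (i, k) assigns its partial sum exactly once. The function name `convolution_dg` and the edge labels are hypothetical; the point is that, after localizing the broadcasts of w and x, every dependence becomes a local edge in a 2-D index space.

```python
def convolution_dg(n_out, n_taps):
    """Nodes and local edges of y[i] = sum_k w[k] * x[i - k] in single assignment form.

    Index point (i, k) assigns y[i][k] = y[i][k - 1] + w[k] * x[i - k] exactly once.
    Broadcasts are localized so that every edge joins neighboring index points.
    """
    nodes = [(i, k) for i in range(n_out) for k in range(n_taps)]
    edges = []
    for i, k in nodes:
        if k > 0:
            edges.append(((i, k - 1), (i, k), "y"))      # partial sum flows along k
        if i > 0:
            edges.append(((i - 1, k), (i, k), "w"))      # weight w[k] pipelined along i
        if i > 0 and k > 0:
            edges.append(((i - 1, k - 1), (i, k), "x"))  # x[i - k] pipelined diagonally
    return nodes, edges


nodes, edges = convolution_dg(n_out=3, n_taps=2)
print(len(nodes), len(edges))  # 6 9
```

Index points on the i = 0 row and k = 0 column receive w and x over external input edges, which is where the input nodes of the DG would sit.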
0-7803-24404/94/ $4.00 @ 1994 IEEE 4C. 14.1 145 
3. ALTERNATIVE NODE SCHEDULES 
For mapping regular iterative algorithms the systolic schedule is used, represented by the schedule vector $\vec{s}$. A systolic schedule implies that there is at least one delay on each edge of the resulting array processor. In semi-regular arrays, however, the best schedule does not necessarily lie along a linear path. Therefore a more efficient approach has to be derived.
We define a DG as a directed graph $G = \{V, E\}$, where $V$ is the set of nodes and $E$ is the set of directed edges. The set of nodes $V = I \cup N \cup O$ contains input nodes $I$, output nodes $O$, and intermediate nodes $N$. The feasibility of a schedule is determined by the partial ordering and the process assignment scheme. A node must have valid data on all its input edges before it can be scheduled. Given the earliest schedule of all nodes $n_j \in I$, the earliest schedule time of any node $n_j \in N \cup O$ can be found. The latest schedule time of nodes $n_j \in O$ is also known, since the system must meet a set of deadlines which require the outputs to be available before a specific time. The latest schedule of all nodes $n_j \in N \cup I$ can thus be calculated. Therefore, for each node $n_i$ an earliest schedule $s_i^e$ and a latest schedule $s_i^l$ are given. If $s_i^e = s_i^l$, node $n_i$ is called a critical node. A path containing only critical nodes is called a critical path. If $s_i^e > s_i^l$ for any $n_i \in V$, no valid schedule can meet the given deadline. For $s_i^e < s_i^l$ a range of schedules is available.
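The forward and backward passes described above can be sketched as follows, assuming (as in requirement 1 of Definition 3.2) one unit delay per edge, a single deadline shared by all sink nodes, and a DG given as a hypothetical `preds` dictionary mapping each node to the nodes with an edge into it. This is an illustrative ASAP/ALAP computation, not the authors' implementation.

```python
from graphlib import TopologicalSorter


def schedule_ranges(preds, input_times, deadline):
    """Earliest and latest schedule of every node, one unit delay per edge.

    preds: dict mapping a node to the nodes that have an edge into it.
    input_times: earliest times the system allows for the input nodes.
    deadline: latest time at which any output (sink) node may be scheduled.
    """
    order = list(TopologicalSorter(preds).static_order())
    succs = {n: [] for n in order}
    for n in order:
        for p in preds.get(n, ()):
            succs[p].append(n)

    earliest = {}
    for n in order:  # forward pass: one unit after the latest predecessor
        earliest[n] = max((earliest[p] + 1 for p in preds.get(n, ())),
                          default=input_times.get(n, 0))
    latest = {}
    for n in reversed(order):  # backward pass: one unit before the earliest successor
        latest[n] = min((latest[s] - 1 for s in succs[n]), default=deadline)
    return earliest, latest


preds = {"x": ["i"], "y": ["x"], "z": ["i"], "o": ["y", "z"]}
e, l = schedule_ranges(preds, input_times={"i": 0}, deadline=3)
critical = sorted(n for n in e if e[n] == l[n])  # ['i', 'o', 'x', 'y']
infeasible = any(e[n] > l[n] for n in e)         # False
```

Here the path i, x, y, o is a critical path, while node z keeps the schedule range [1, 2].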
Definition 3.1 Schedule range of a DG node: For a dependence graph $G = \{V, E\}$, given the earliest schedule time $s_i^e$ of all input nodes $n_i \in I$ and the latest schedule time $s_i^l$ of all output nodes $n_i \in O$, the schedule range $r_i$ of node $n_i$ is defined as $[s_i^e, s_i^l]$, where $s_i^e$ is the earliest schedule time of node $n_i$ and $s_i^l$ is its latest schedule time.
From the above we conclude that for a given DG, each node $n_i \in V$ can only be scheduled within the schedule range $r_i = [s_i^e, s_i^l]$, where $s_i^e, s_i^l \in \mathbb{Z}^+$. There are a number of basic requirements for generating a suitable schedule for a DG. Based on the features of array processors, we define the basic requirements for scheduling as follows.
Definition 3.2 Scheduling requirements:
1. The schedule of a node $n$ must be at least one unit delay higher than the highest schedule of all nodes having an edge to node $n$.
2. Input nodes $n_i \in I$ should not be scheduled earlier than the requirements defined by the system.
3. Output nodes $n_j \in O$ should not be scheduled later than the system-defined deadline.
The basic idea of the Alternative Node Schedules (ANS) algorithm is to generate an enumeration tree of all possible schedules of the DG, given the schedule ranges $r_i$, such that the requirements in Definition 3.2 are satisfied. The ANS algorithm generates a set of solutions $GS$. A solution $GP \in GS$ is a set $\{\langle n_j, s_j \rangle\}$, where $n_j$ is a node with schedule $s_j$. In its general form, the ANS algorithm generates a set of solutions which grows exponentially with the problem size. In practice the designer is only concerned with solutions that are optimal with respect to certain criteria. Note that introducing a wide schedule range gives a huge number of solutions, which may be redundant, time-consuming, and even hard to generate for large DG's. There exists a maximum range beyond which the set of solutions offers no further improvement. This range depends on the dimension of the array processor.
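A depth-first enumeration tree in the spirit of the ANS algorithm is sketched below. It is a minimal sketch under stated assumptions: nodes are visited in a given topological order, each node takes an integer time from its range $r_i$, and a branch is cut only when requirement 1 of Definition 3.2 leaves a node with an empty range. The optimality bounding described in the text is not modeled, and `ans_enumerate` is a hypothetical name.

```python
def ans_enumerate(order, preds, ranges):
    """Depth-first enumeration of all schedules inside the ranges r_i = [s_e, s_l].

    order: nodes in topological order; preds: node -> predecessor nodes;
    ranges: node -> (s_e, s_l). A branch is pruned as soon as requirement 1 of
    Definition 3.2 leaves a node with an empty range.
    """
    solutions = []

    def branch(k, partial):
        if k == len(order):
            solutions.append(dict(partial))
            return
        n = order[k]
        lo, hi = ranges[n]
        # Requirement 1: at least one unit after every predecessor.
        lo = max([lo] + [partial[p] + 1 for p in preds.get(n, ())])
        for t in range(lo, hi + 1):  # empty when lo > hi: branch pruned
            partial[n] = t
            branch(k + 1, partial)
            del partial[n]

    branch(0, {})
    return solutions


chain = {"x": ["i"], "o": ["x"]}
sols = ans_enumerate(["i", "x", "o"], chain, {"i": (0, 1), "x": (1, 2), "o": (2, 3)})
print(len(sols))  # 4
```

Even this three-node chain yields four schedules; each extra unit of range multiplies the branching, which is why the tight constraints of Definition 3.3 matter.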
A semi-regular DG contains a set of connected sub-DG's, each of which is regular. Keeping a uniform delay distribution within the sub-DG's simplifies the design. Let $R_i$ be the set of all edges along a linear path, and let $E_j$ be the set containing all edges on a selected number of linear paths belonging to several $R_i$ with parallel edges. We partition the set of edges in the DG into a number of sets $E_i$ such that $E_i \cap E_j = \emptyset$ for all $i \neq j$ and $\bigcup_i E_i = E$. A set $\{\langle E_i, d_i \rangle\}$ specifies a delay $d_i$ for each $E_i$ (i.e. all the edges in $E_i$ have the same delay $d_i$). Once the schedule time for an edge along a linear path in $E_i$ is chosen, the delay $d_i$ is fixed for all edges in $E_i$. This is done for all sets $E_i$.
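Whether a candidate schedule respects a partition $\{\langle E_i, d_i \rangle\}$ can be checked by computing the delay of every edge, assuming the delay of an edge (u, v) is simply the difference of the schedule times of its end nodes. The helper `uniform_delays` and the dictionary layout are hypothetical illustrations.

```python
def uniform_delays(schedule, edge_classes):
    """Return {class: d_i} if all edges of each class E_i share one delay, else None.

    schedule: node -> time; edge_classes: class name -> list of (src, dst) edges.
    The delay of an edge is taken to be schedule[dst] - schedule[src].
    """
    delays = {}
    for name, edges in edge_classes.items():
        ds = {schedule[v] - schedule[u] for u, v in edges}
        if len(ds) != 1:  # mixed delays (or an empty class) violate uniformity
            return None
        delays[name] = ds.pop()
    return delays


# 2 x 2 sub-DG: horizontal edges in class "h", vertical edges in class "v".
sched = {(i, j): i + 2 * j for i in range(2) for j in range(2)}
classes = {"h": [((i, 0), (i, 1)) for i in range(2)],
           "v": [((0, j), (1, j)) for j in range(2)]}
print(uniform_delays(sched, classes))  # {'h': 2, 'v': 1}
```

Moving a single node, for example setting `sched[(1, 1)] = 4`, breaks the uniform delay of class "h" and the check returns `None`.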
Further, for mapping from an M-dimensional DG to a K-dimensional array processor, any node can be connected to a maximum of $3^K - 1$ nodes scheduled in the same time slot. We now define the set of constraints for scheduling as follows.
Definition 3.3 Schedule constraints:
1. The maximum number of nodes $n_j$ having an edge to $n_i$ and the same schedule time must be less than $3^K - 1$.
2. For all nodes with only external input edges set $s_i^e = s_i^l$, and/or for all nodes with only external output edges set $s_i^e = s_i^l$.
3. For each sub-DG, create a set $\{\langle E_i, d_i \rangle\}$ for each of the $(3^K - 1)/2$ edge directions in the K-dimensional Euclidean space.
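The counts $3^K - 1$ and $(3^K - 1)/2$ follow from treating the neighbors of a position in a K-dimensional array as all nonzero offsets in $\{-1, 0, 1\}^K$ (an assumption about what "local" means here); each undirected edge direction pairs an offset with its negation. A short enumeration, with hypothetical function names:

```python
from itertools import product


def neighbor_offsets(K):
    """All nonzero offsets in {-1, 0, 1}^K: the 3**K - 1 neighbors of a position."""
    return [d for d in product((-1, 0, 1), repeat=K) if any(d)]


def edge_directions(K):
    """One representative per pair (d, -d): the (3**K - 1) / 2 edge directions."""
    # Keep the offset whose first nonzero component is positive.
    return [d for d in neighbor_offsets(K) if next(x for x in d if x) > 0]


print(len(neighbor_offsets(2)), len(edge_directions(2)))  # 8 4
print(len(neighbor_offsets(3)), len(edge_directions(3)))  # 26 13
```

For a 2-D array this gives the four directions horizontal, vertical and the two diagonals.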
We call the constraints in Definition 3.3, used for generating the set of optimal solutions $GS_{opt}$, tight constraints. A solution may be either linear or non-linear. Under tight constraints, the set of solutions $GS_{opt}$ is constant for a specific algorithm with respect to problem size.
Theorem 3.1 Bounding rule for the scheduling: The size of the set $GS_{opt}$ is independent of the problem size if the bounding constraints are tight. Furthermore, the sets of solutions for different sizes are equivalent.

Figure 1: The DG of the Edge Detector for a mask width of 8 and an image of $(M + 1) \times (L + 1)$ pixels.
Figure 2: Image Detector with Mask Width 4 (a) Image Width 4 (b) Image Width 8 (c) Image Width 16.
The reader is referred to [MOE94] for the proof. In the case of tight constraints the complexity of the ANS algorithm is $O(U^N)$, where $U$ is the number of sub-DG's and $N$ is the number of edge directions. In fact, even if the constraints are not so tight, there still exists a set of solutions that is independent of the size of the DG. Since the result is invariant to the problem size, the set of optimal schedules can be generated from a Reduced-size DG (RDG). The complexity of the algorithm for calculating the schedule times of all nodes in the DG is $O(N_r \times A_r)$, where $N_r$ is the number of nodes in the RDG, $A_r = \prod_{i=1}^{M} a_i$, and $a_i$ is the scaling factor for the $i$th dimension.
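One way to see why a schedule found on the RDG carries over to the full-size DG: if a regular sub-DG has a uniform delay $d_k$ along each axis $k$, the schedule of index point $j$ is $s_0 + \sum_k d_k j_k$, which does not depend on the size of the DG. The sketch below evaluates this once per node of the full DG, in line with a cost proportional to its number of nodes. The helper `scale_schedule` and the per-axis delay model are assumptions for illustration.

```python
from itertools import product


def scale_schedule(s0, delays, sizes):
    """Schedule every index point of a regular sub-DG from per-axis delays.

    Hypothetical helper: assumes a uniform delay d_k along axis k (as obtained
    on a reduced-size DG), so index point j gets s0 + sum_k d_k * j_k.
    sizes: (a_1, ..., a_M), the number of index points per axis.
    """
    return {j: s0 + sum(d * x for d, x in zip(delays, j))
            for j in product(*(range(a) for a in sizes))}


small = scale_schedule(0, (1, 2), (2, 2))  # reduced-size DG
large = scale_schedule(0, (1, 2), (3, 4))  # full-size DG, same delays
print(len(large), large[(2, 3)])           # 12 8
```

Shared index points, such as (1, 1), receive the same time in both the reduced and the full-size DG.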
We choose the edge detection problem as an example. The DG of the Edge Detector is three-dimensional. Figure 1 shows one part of the algorithm; the other part is identical. Readers are referred to [JAY91a] for the derivation. Due to the huge processing power requirement, parallel processing is needed. The black nodes on the far right add the results of both parts. White nodes are convolution functions and dark grey nodes are row-to-column translation functions. The DG for this problem is clearly inhomogeneous. We now apply the ANS algorithm for a mask width of 4 and image widths of 4, 8 and 16. The schedule range for all nodes is 2. A bounding constraint is added for each edge direction in all subgraphs. The behavior of the three image sizes is compared in the plot of Figure 2. In all three cases, the final set of possible schedules contains five optimal solutions.
4. ALTERNATIVE NODE PROJECTION

We construct an algorithm to find all valid linear and non-linear projections. Linear mapping involves projection along a straight line, whereas non-linear mapping means that multiple nodes, not necessarily along a straight line, map to the same PE. In certain circumstances, a non-linear mapping may offer some unique flexibility and advantages, such as fewer PE's, faster pipelining, or higher utilization of the array. This may allow non-linear scheduling. On the other hand, it usually incurs the expense of somewhat more sophisticated control. If an advantageous trade-off can be reached, a non-linear mapping may become preferred. We define the projection requirements based on the definition of array processors in [KUN88].
Definition 4.1 Projection requirements:
• Preserve spatially local interconnection.
• The dimension of the array processor determines the maximum number of communication links allowed.
• Input/output nodes should remain on the boundary.
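To contrast the two kinds of mapping, the sketch below implements the standard linear case: a projection matrix $P$ of size K x M maps index point $j$ to PE $Pj$, and all nodes along a projection direction $d$ with $Pd = 0$ share a PE. A non-linear mapping is then simply an arbitrary table from index points to PEs. The matrices and function names are illustrative assumptions, using a 3 x 3 DG projected onto a 1-D array.

```python
from itertools import product


def project(P, j):
    """PE coordinates of index point j under a linear projection matrix P (K x M)."""
    return tuple(sum(p * x for p, x in zip(row, j)) for row in P)


def collapses(P, d):
    """True if direction d lies in the null space of P (nodes along d share a PE)."""
    return all(c == 0 for c in project(P, d))


grid = list(product(range(3), repeat=2))  # 3 x 3 index space of a 2-D DG
P_row = [[0, 1]]    # projection along d = (1, 0)
P_diag = [[1, -1]]  # projection along d = (1, 1)
pes_row = {project(P_row, j) for j in grid}
pes_diag = {project(P_diag, j) for j in grid}
print(len(pes_row), len(pes_diag))  # 3 5
```

The two directions give arrays of different size for the same DG, which is one reason to enumerate mapping alternatives rather than fix a single projection.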
The set of permissible positions for placing a node $n_i$ on a PE depends on the current positions of all nodes that have an edge to node $n_i$. Each instance of the placement can be modeled by a rectangular volume which we call a polyrec. A polyrec is a polytope such that all the hyperplanes lying on its boundary are perpendicular to one of the axes and orthogonal to each other. Further, a polyrec is fully represented by two extreme points lying on its boundary.

Let $\vec{p}_j$ represent the location of a PE on the array processor such that $\vec{p}_j = [p_j^1, p_j^2, \ldots, p_j^K]^T$. Let $P_i$ be the set of all positions $\vec{p}_j$ of nodes $n_j$ having edges to node $n_i$. Given the set $P_i$, the set of all valid positions that node $n_i$ can occupy resides inside the polyrec $A_i$. The polyrec $A_i$ can be fully represented by two extreme points [MOE94].
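A minimal sketch of computing the two extreme points of $A_i$, assuming that local interconnection allows node $n_i$ to sit at any position within one unit (per axis) of each predecessor's PE, including the same PE. Under that assumption $A_i$ is the intersection of boxes $[\vec{p}_j - 1, \vec{p}_j + 1]$, which takes one pass over the $t_i$ predecessors and $K$ axes. The function name `polyrec` is hypothetical.

```python
def polyrec(positions):
    """Extreme points (low, high) of the polyrec A_i, or None if it is empty.

    positions: non-empty list of K-tuples, the PEs of nodes with an edge to n_i.
    Each predecessor allows the box [p - 1, p + 1]; A_i is their intersection,
    computed in O(t_i * K) for t_i predecessors.
    """
    K = len(positions[0])
    low = tuple(max(p[k] for p in positions) - 1 for k in range(K))
    high = tuple(min(p[k] for p in positions) + 1 for k in range(K))
    if any(lo > hi for lo, hi in zip(low, high)):
        return None
    return low, high


print(polyrec([(0, 0)]))          # ((-1, -1), (1, 1)): 3**2 = 9 positions
print(polyrec([(0, 0), (2, 1)]))  # ((1, 0), (1, 1))
print(polyrec([(0, 0), (3, 0)]))  # None: no local position reaches both
```

With a single predecessor the polyrec holds exactly $3^K$ positions; every additional predecessor can only shrink it.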
For $t_i$ edges into a node $n_i$ and an array of dimension $K$, the complexity of finding the polyrec $A_i$ is $O(t_i K)$. In general there are at most $3^K$ positions a node can be projected onto, where $K$ is the dimension of the array processor. We can now define the polyrec $A_i$ for each node $n_i$ in the DG once the positions of all the nodes having an edge to node $n_i$ are known. Based on this, an algorithm is developed to generate all possible solutions. We start the Alternative Node Projection (ANP) algorithm by placing all nodes which have only external input edges in the K-dimensional array processor space. Due to data dependencies these combinations are limited. The rest of the nodes are then placed using the polyrec $A_i$. We propose the following projection constraints:
Definition 4.2 Projection Constraints: The enumeration tree for projection solutions must be pruned for a node $n_i$ if any of the following conditions is violated:
• A node $n_i$ can only be projected onto a position $\vec{p}_j$ that lies within the polyrec $A_i$.
• Two nodes with the same schedule cannot be projected onto the same PE.
• The number of PE's should not exceed some upper bound.
• The maximum number of communication links, $3^K - 1$, must be respected for each PE.
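The placement loop of the ANP algorithm can be sketched as a depth-first enumeration that applies the first three pruning rules of Definition 4.2. Assumptions: nodes are visited in a topological order, nodes without predecessors are placed at candidate positions supplied by the caller, the polyrec is the local box intersection described above, and the link-count rule and Definition 4.3 are not modeled. All names are hypothetical.

```python
from itertools import product


def anp_enumerate(order, preds, schedule, inputs_at, max_pes):
    """Enumerate placements node -> PE under a simplified Definition 4.2.

    order: topological order; preds: node -> predecessors; schedule: node -> time;
    inputs_at: candidate PEs for nodes without predecessors; max_pes: PE budget.
    """
    solutions = []

    def candidates(n, place):
        ps = [place[p] for p in preds.get(n, ())]
        if not ps:
            return inputs_at[n]
        K = len(ps[0])
        low = [max(p[k] for p in ps) - 1 for k in range(K)]
        high = [min(p[k] for p in ps) + 1 for k in range(K)]
        # Rule 1: only positions inside the polyrec (empty if low > high).
        return list(product(*(range(lo, hi + 1) for lo, hi in zip(low, high))))

    def branch(k, place, busy):
        if k == len(order):
            solutions.append(dict(place))
            return
        n = order[k]
        for pe in candidates(n, place):
            slot = (pe, schedule[n])
            if slot in busy:
                continue  # rule 2: same schedule on the same PE
            if len(set(place.values()) | {pe}) > max_pes:
                continue  # rule 3: PE budget exceeded
            place[n] = pe
            busy.add(slot)
            branch(k + 1, place, busy)
            busy.remove(slot)
            del place[n]

    branch(0, {}, set())
    return solutions


chain = {"b": ["a"], "c": ["b"]}
times = {"a": 0, "b": 1, "c": 2}
for budget in (1, 2, 3):
    print(budget, len(anp_enumerate(["a", "b", "c"], chain, times, {"a": [(0,)]}, budget)))
```

For this three-node chain on a 1-D array the budgets 1, 2 and 3 admit 1, 7 and 9 placements, illustrating how the PE bound prunes the tree.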
Up to now, factors such as the complexity of the resulting PE and a non-uniform distribution of I/O nodes on the boundary have not been taken into account. The ultimate performance goal of an array processor system is a computation rate that balances the available I/O bandwidth with the host. To achieve this, we have to guarantee that the I/O nodes are uniformly distributed and match the interface to the outside world. An additional set of constraints is therefore needed.
Definition 4.3 Additional projection constraints:
• Input/output nodes should remain on the boundary.
• Prevent the mapping of nodes with different functionality onto the same PE (optional).
• Remove equivalent and similar solutions.
The ANP algorithm finds the set of all possible mappings $\{GP_i\}$ under the constraints in Definitions 4.2 and 4.3. It maps an M-dimensional DG to a K-dimensional array processor and finds the set of all possible linear and non-linear projections. No solution is found if a node violates the set of constraints for all intermediate solutions.
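The last rule of Definition 4.3 needs a notion of equivalence. As one plausible instance (an assumption, not necessarily the authors' definition), two placements that differ only by a translation of the whole array can be treated as equivalent; normalizing each placement so that its minimum corner is the origin then removes the duplicates. The helper names are hypothetical.

```python
def canonical(mapping):
    """Key of a placement (node -> PE tuple) that is invariant under translation."""
    K = len(next(iter(mapping.values())))
    origin = [min(pe[k] for pe in mapping.values()) for k in range(K)]
    return tuple(sorted((n, tuple(c - o for c, o in zip(pe, origin)))
                        for n, pe in mapping.items()))


def remove_translates(solutions):
    """Keep the first placement of every translation-equivalence class."""
    seen, unique = set(), []
    for sol in solutions:
        key = canonical(sol)
        if key not in seen:
            seen.add(key)
            unique.append(sol)
    return unique


sols = [{"a": (0,), "b": (1,)}, {"a": (5,), "b": (6,)}, {"a": (1,), "b": (0,)}]
print(len(remove_translates(sols)))  # 2
```

Reflections or rotations of the array could be folded into `canonical` in the same way if they are also considered equivalent.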
Linear projection has been thoroughly studied in the literature. Yet in certain circumstances a non-linear mapping may offer some unique flexibility and advantages. To extract the optimal linear and non-linear solutions $\{GP_i\}$ in terms of array processor characteristics and given constraints, we need to define an extra set of constraints which we call bounding rules. Let us define a bounding rule for a regular M-dimensional DG. The hyperplanes lying on the boundary of the DG bound a polytope, which we call a DG-polytope. Let $R_i$ be the set of all edges having the same direction along a linear path, and let $E_i$ be the set containing all edges on a selected number of paths.

Figure 3: All possible linear and semi-linear projections (a) Horizontal (b) Diagonal (c) L-shape (d) Inverted L-shape (e) Vertical.
Definition 4.4 The bounding rule for a regular M-dimensional DG of size $(a_1, a_2, \ldots, a_M) \in \mathbb{Z}_+^M$: For a set $R_i$ on the linear path joining two vertices of the DG-polytope and lying on the boundary, all edges in $R_i$ have to follow the same rule of projection, i.e. the edge directions after projection are identical to each other.
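Definition 4.4 can be checked on a candidate placement by computing the PE offset of every edge in a boundary set $R_i$ and requiring a single common offset. The sketch assumes a placement is a dictionary from index points to PE tuples; `same_projection_rule` is a hypothetical name.

```python
def same_projection_rule(placement, edge_set):
    """True if every edge (u, v) in edge_set maps to the same PE offset."""
    offsets = {tuple(b - a for a, b in zip(placement[u], placement[v]))
               for u, v in edge_set}
    return len(offsets) <= 1


# Boundary path (0, 0) -> (0, 1) -> (0, 2) of a 3 x 3 DG.
boundary = [((0, 0), (0, 1)), ((0, 1), (0, 2))]
uniform = {(0, j): (j,) for j in range(3)}           # every edge moves one PE right
broken = {(0, 0): (0,), (0, 1): (1,), (0, 2): (1,)}  # second edge stays on its PE
print(same_projection_rule(uniform, boundary), same_projection_rule(broken, boundary))
# True False
```

Used as an extra pruning test in the enumeration, this rule keeps I/O nodes on a boundary path evenly spaced across the array.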
Definition 4.4 guarantees that I/O nodes are mapped uniformly. This can be generalized to include the set $E_i$. The enhanced ANP algorithm uses the set $E_i$ to add bounding constraints. An example of the projection set $\{GP_i\}$ for a 3 x 3 matrix-vector multiplication using Definition 4.4 is given in Figure 3. This set contains linear as well as semi-linear mappings. Whether the DG is 3 x 3 or n x n, the set $\{GP_i\}$ contains 5 alternative solutions, which are equivalent for all sizes of the DG. In the case of an m x n DG where $m \neq n$, only solutions (a), (b) and (e) are possible. This means that, given a regular array and using Definition 4.4, the set $\{GP_i\}$ depends on the topology of the DG but not on its size. This is a very interesting result.
The above discussion assumes that the DG boundaries lie on hyperplanes orthogonal to each other. This is not always the case, e.g. for the sorting problem. We therefore define a general bounding constraint for regular DG's.
Definition 4.5 General bounding rule for a regular M-dimensional DG: For a set $E_i$ of the linear path joining two vertices of the DG-polytope and lying on the boundary of the DG, all edges in $E_i$ have to follow the same projection rule, i.e. the edge directions after mapping are identical to each other. If the path consists of floating nodes, create a set $E_i$ of all edges parallel to each other and entering the floating nodes. The source nodes of these parallel edges are treated as boundary nodes.

Figure 4: Different constraint techniques for the image detector (a) Local Boundary (b) Global Boundary (c) Global Boundary with orthogonal boundary constraints.
Figure 5: Solutions for an image width of N and a mask width of N using global bounding constraints.
A more general problem representation is given by semi-regular DG's. This is evident in algorithms which consist of a set of interconnected recurrence equations. DG's of such algorithms consist, for example, of a cascade of regular sub-DG's. Treating the DG as a whole eases the data reformatting and increases the pipeline rate.
In the edge detector example there are 34 possible projections when the boundary constraints of Definition 4.4 are placed on each sub-DG, as shown in Figure 4 (a). Notice that all the nodes in the last sub-DG (far right) are floating. In that case additional boundary constraints between sub-DG's are needed. Placing boundary constraints on the DG as a whole (Figure 4 (b)) results in 9 optimal solutions (Figure 5). Definition 4.5 is generalized for semi-regular DG's as follows: for all boundaries between sub-DG's, if any of the boundaries have floating nodes, add bounding constraints as above. We now state an important theorem.
Theorem 4.1 Tight bounding rules for projection of a DG: Given a DG, there exists a bounding constraint which generates the set of all optimal solutions $GP_{opt} = \{GP_i\}$. The set $GP_{opt}$ is independent of the problem size if the bounding constraints are tight and the chosen schedule applies a uniform distribution of delays.
See [MOE94] for the proof. This method can be applied to any form of mapping by changing the valid-position rules and the constraint rules. Even if the bounding constraints are not tight enough, the reduced-size DG will still generate a set of optimal solutions. This is evident in Figure 4 (b) and (c). Figure 4 (b) has constraints on the complete DG which are not so tight on the sub-DG's. In this case, both bounding constraints give the same set of mappings, shown in Figure 5. Yet the computation time for Figure 4 (b) is higher because the internal nodes have a higher degree of mobility. Further, the set of solutions in Figure 5 contains linear as well as non-linear projections. The non-linear solution (3) has a simpler PE, since the convolver and row-to-column translation nodes are mapped to different PE's. Furthermore, the number of PE's performing multiplication is fixed at 8, independent of the image size.
5. COMPLEXITY ISSUES AND SCALING

The time bound for both algorithms is limited by $|V| - 1$ stages, where $V$ is the set of nodes. The average computation per stage is proportional to the number of edges into a node. All computations along the hyperplane orthogonal to the flow of data have no mutual dependency, so they can be executed simultaneously. In general, however, there is always a certain degree of dependency which dictates the sequence of the computation. The order in which nodes are placed in each step influences the computation time but has no effect on the end result. Take the image detector example with an image length of 3 and a mask width of 3, and find the set $\{GP_i\}$ for the constraints given in Figure 4 (c). Different orderings give different distributions of the set of intermediate solutions. Three orders were simulated, as shown in the plot of Figure 6.
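The effect of the placement order can be reproduced on a toy problem. The sketch below is an abstraction, not the ANP algorithm itself: four chained variables take values in {0, 1, 2}, neighbors must differ by exactly one, and a breadth-first branch-and-bound extends all partial solutions one variable per stage, checking a constraint as soon as both of its variables are assigned. The names `stagewise` and `adjacent` are hypothetical.

```python
def stagewise(order, domain, constraints):
    """Breadth-first enumeration over variables in the given order.

    constraints: list of (u, v, predicate); each is checked once both u and v
    are assigned. Returns (final solutions, number of partial solutions per stage).
    """
    partial, sizes = [{}], []
    for var in order:
        grown = []
        for sol in partial:
            for val in domain:
                cand = {**sol, var: val}
                if all(pred(cand[u], cand[v]) for u, v, pred in constraints
                       if u in cand and v in cand):
                    grown.append(cand)
        partial = grown
        sizes.append(len(partial))
    return partial, sizes


def adjacent(x, y):
    return abs(x - y) == 1


chain = [("a", "b", adjacent), ("b", "c", adjacent), ("c", "d", adjacent)]
s1, sizes1 = stagewise("abcd", range(3), chain)  # follow the data flow
s2, sizes2 = stagewise("adbc", range(3), chain)  # jump to the far end first
print(sizes1, sizes2)  # [3, 4, 6, 8] [3, 9, 12, 8]
```

Both orders end with the same 8 solutions, but the second order peaks at 12 intermediate solutions because `d` is placed before any constraint can restrict it.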
The local maxima of the peaks increase as the computation proceeds because, within the search path, the internal nodes have a higher degree of freedom than the boundary nodes. For a small DG this is not a problem, but as the size of the DG increases the number of intermediate solutions grows exponentially. This may cause the algorithm to run out of memory before reaching a solution. There are two ways to solve this problem. One is to add internal constraints concentric to the boundary constraint. This reduces the internal peaks and speeds up the calculation, while guaranteeing that the set of optimal solutions $\{GP_i\}$ is the same. Since the result is invariant to the size according to Theorem 4.1, another way is to solve for small arrays and then scale up the result to the required size. The complexity of the algorithm for scaling is $O(W + E)$, where $W$ is the number of nodes in the DG and $E$ is the number of edges in the RDG.

Figure 6: Variation of the number of solutions over iterations for (a) Horizontal order (b) Vertical order (c) Breadth-first preordered spanning tree.
6. MAPPING TO A FIXED SIZE ARRAY

A major area of research on systematic design methods is dedicated to the general problem of mapping classes of algorithms onto regular array processors with a limited number of processing elements, communication links or memory size. Systematic design of processor arrays with a given dimension and a given number of PE's is called partitioning. Existing approaches to the partitioning problem, however, only partially treat problems such as mapping directly from an M- to a K-dimensional space, where M > K. Moreover, these approaches are bound to special structures. A unified approach to the partitioning problem that realizes all known partitioning schemes [TEI93] as well as linear and non-linear mapping is not available. The algorithm described in this paper can be used to map onto arrays with limited resources: either an upper limit on the number of PE's is imposed, or a boundary representation (b-rep) of the target array is defined.
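A discrete stand-in for such a boundary representation is sketched below: the target array is described as a set of PE coordinates built from boxes with ordinary set union and difference, and a (partial) placement is pruned if it leaves that region or exceeds a PE budget. This is an assumed simplification of the b-rep idea, not the authors' implementation; `box`, `fits` and the example region are hypothetical.

```python
from itertools import product


def box(low, high):
    """All integer PE coordinates in the closed box [low, high]."""
    return set(product(*(range(lo, hi + 1) for lo, hi in zip(low, high))))


# Hypothetical 2-D target array: a 4 x 4 grid with a 2 x 2 corner removed.
region = box((0, 0), (3, 3)) - box((2, 2), (3, 3))


def fits(placement, region, max_pes=None):
    """False if a (partial) placement leaves the region or exceeds the PE budget."""
    used = set(placement.values())
    return used <= region and (max_pes is None or len(used) <= max_pes)


print(len(region))                                 # 12
print(fits({"a": (0, 0), "b": (1, 1)}, region))    # True
print(fits({"a": (3, 3)}, region))                 # False
```

Such a predicate can be added as one more pruning test in the projection enumeration, next to the PE upper bound of Definition 4.2.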
7. CONCLUSIONS

A systematic approach is presented for mapping algorithms onto array processors. This approach uses the branch-and-bound technique to find the set of all optimal solutions. The power of this approach lies in its ability to generate the set of possible mapping alternatives using mixed linear and non-linear mapping. It has also been shown that the resulting set is limited and independent of the problem size, which is especially interesting for modeling large and complex problems. Further, mapping from an M-dimensional space to a K-dimensional space, where M > K, is done in one step.

For mapping to fixed-size arrays, it has been shown that different partitioning techniques can be modeled in the algorithms using regularized Boolean set operations for the design of 2- and 3-dimensional array processors.
8. REFERENCES

[ANN88] J. Annevelink, A Design Method for Implementing Signal Processing Algorithms on VLSI Processor Arrays, Ph.D. Thesis, University of Delft, The Netherlands, 1988.
[JAY91a] J.A.K.S. Jayasinghe, An Array Processor Design Methodology for Hard Real-Time Systems, Ph.D. Thesis, University of Twente, The Netherlands, 1991.
[JAY91b] J.A.K.S. Jayasinghe, F. Moelaert El-Hadidy and O.E. Herrmann, An Array Processor Design Methodology for Hard Real-Time Systems, IEEE International Symposium on Circuits and Systems, Singapore, 11-14 June 1991.
[KUN84] S.Y. Kung, J. Annevelink and P.M. Dewilde, Hierarchical Iterative Flow-Graph Integration for VLSI Array Processors, VLSI Signal Processing, IEEE Press, pp. 294-305, 1984.
[KUN88] S.Y. Kung, VLSI Array Processors, Prentice Hall, 1988.
[MOE92] F. Moelaert El-Hadidy and O.E. Herrmann, Integer Linear Programming Algorithms for Array Processor based Real-Time Systems, University of Twente, Internal Report number 92N188, August 1992.
[MOE94] F. Moelaert El-Hadidy, Generalized Methodologies for Array Processor Design of Real-Time Systems, Ph.D. Thesis, University of Twente, 1994.
[MOL87] Dan I. Moldovan, ADVIS: A Software Package for the Design of Systolic Arrays, IEEE Transactions on Computer-Aided Design, vol. CAD-6, no. 1, pp. 33-39, January 1987.
[QUI84] P. Quinton, Automatic Synthesis of Systolic Arrays from Uniform Recurrent Equations, Proceedings of the 11th Annual Symposium on Computer Architecture, pp. 208-214, July 1984.
[RAO88] Sailesh K. Rao and Thomas Kailath, Regular Iterative Algorithms and their Implementation on Processor Arrays, Proceedings of the IEEE, vol. 76, no. 3, pp. 259-269, March 1988.
[ROY86] V.P. Roychowdhury and T. Kailath, Regular Processor Arrays for Matrix Algorithms with Pivoting, ISL preprint, Stanford University, Stanford, CA, 1989.
[TEI93] J. Teich and L. Thiele, Partitioning of Processor Arrays: A Piecewise Regular Approach, Integration, the VLSI Journal, vol. 14, no. 3, pp. 297-332, February 1993.
[THI88] L. Thiele, On the Hierarchical Design of VLSI Processor Arrays, IEEE Symposium on Circuits and Systems, Helsinki, pp. 2517-2520, 1988.
