Many hard real-time systems need huge computing power, yet they are mostly designed by ad hoc methods. Array processors provide a viable means of achieving huge computing power, and they can be designed systematically. This paper presents a systematic design methodology for array processor based hard real-time systems.
Introduction
Real-time systems must not only produce logically correct results but also meet timing constraints. Depending on the type of timing constraint, real-time systems are divided into two groups: hard real-time systems and soft real-time systems [1], [2]. A soft real-time system must complete its computations as fast as possible so that a statistically described response time is satisfied. In a hard real-time system, computations must be finished before a given deadline.
Analogous to the status of VLSI design in its infancy, there is currently no scientific basis for hard real-time system design [2]. Most state-of-the-art hard real-time systems have been designed by ad hoc methods, but a scientific approach is essential because verification of ad hoc designs is costly and error prone. Due to their huge processing power requirements, almost all hard real-time systems need a multiprocessing environment. According to [2], a multiprocessor hard real-time system must possess the following features: homogeneity, scalability, survivability and flexibility.
Array processors consist of a set of modular processing elements (PEs) with spatially local communication, which makes them homogeneous and scalable. Survivability and flexibility can be introduced in the array processor design as well. Furthermore, systematic methods are used in array processor design. These factors make array processor based hard real-time systems very attractive. Array processors operating with synchronous (asynchronous) communication are called systolic (wavefront) arrays. As the array processor contains modular PEs, only design problems associated with regular or partially-regular dependence graphs are considered for array processor design.
The rest of this paper is organized as follows. In Section 2, we briefly describe the widely used dependence graph approach and its limitations for real-time array processor design. In Section 3, our design methodology is presented. Finally, conclusions are drawn in Section 4.
Dependence Graph Based Array Processor Design and its Limitations
[...] can be handled by these. Therefore, the current practice is to make the dependence graph (DG) regular while the algorithm is written in single assignment form [8]. If the given problem is not associated with a regular DG, dummy operations can be added to obtain a regular DG. The DGs for large and complex problems are not regular in general and are very difficult to make regular by adding dummy operations. Moreover, dummy nodes keep the PEs in the array processor busy unnecessarily, which can prevent the system from meeting hard real-time deadlines.
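To give a feel for the cost of dummy operations, the following minimal Python sketch counts the dummy nodes needed to pad a triangular index set to a square one. The triangular index set and the name `padding_overhead` are illustrative assumptions and do not come from the paper.

```python
def padding_overhead(n):
    """Pad the triangular index set {(i, j) : j <= i < n} to an n x n square.

    Returns (real nodes, dummy nodes, fraction of dummy nodes).
    """
    real = [(i, j) for i in range(n) for j in range(n) if j <= i]
    padded = [(i, j) for i in range(n) for j in range(n)]
    dummy = len(padded) - len(real)
    return len(real), dummy, dummy / len(padded)


real, dummy, fraction = padding_overhead(8)
print(real, dummy, fraction)  # 36 28 0.4375
```

In this hypothetical case almost half of the node executions in the regularized DG would be dummy work, which is the kind of overhead that can endanger a hard deadline.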

Structured Dependence Graph Based Array Processor Design
To simplify the construction of the single assignment code, we construct it hierarchically. This generates a set of DGs, which are then combined to obtain the DG of the given problem. This DG is then projected into an abstract processor array using integer programming. Due to the generality of this approach, it can be used for both partially-regular and regular DGs. Furthermore, it enables linear as well as nonlinear projection of the DGs. In general, the abstract processor array resulting from a projection of a partially-regular DG contains processors whose behaviors are time varying. With the help of a set of tags, the abstract processor array is mapped into an array processor. These tags control the time-varying behavior and improve the regularity, survivability and flexibility of the array processor. In the following subsections, we describe these design steps briefly. More details are given in [9].
The Structured Single Assignment Code (S2AC)
The S2AC description consists of a set of hierarchical routines, where each routine is described by a header and a body. Only a single assignment is made to every variable in each routine. We refer to the top-most routine as the level-0 routine, the routines in the next level as level-1 routines, and so on. To simplify the construction of the DG, level-i routines are only allowed to call routines in level-(i+1). Only atomic operations are used in the last level routines. The following syntax is used to write the S2AC description. All but the last level routines use the data types array and record as defined in conventional structured programming languages. An array represents a set of data on which the same operation is performed. A record represents a set of data on which different operations are performed. The header of a routine consists of the output variables, the name of the routine and the input variables. The body of a routine consists of four fields: the type declaration field (where the data types of input and output variables are declared), the initialization field (where local variables of the routine are initialized), the variable assignment field (where the values of variables are calculated by calling lower level routines or by performing atomic operations) and the output assignment field (where output variables are updated). The second and last fields are optional.
A formal description of the syntax of the S2AC description is given in [9] .
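The structural rules stated above (single assignment per routine, and level-i routines calling only level-(i+1) routines) can be sketched in Python as follows. This is not the S2AC syntax of [9]; the `Routine` record, the `check_s2ac` function and the example routine names are hypothetical and serve only to illustrate the two rules.

```python
from dataclasses import dataclass, field


@dataclass
class Routine:
    name: str
    level: int
    inputs: list    # header: input variables
    outputs: list   # header: output variables
    # variable assignment field: (target variable, callee name);
    # callee None stands for an atomic operation
    assignments: list = field(default_factory=list)


def check_s2ac(routines):
    """Return a list of violations of the two structural rules."""
    by_name = {r.name: r for r in routines}
    errors = []
    for r in routines:
        assigned = set()
        for target, callee in r.assignments:
            if target in assigned:
                errors.append(f"{r.name}: {target} assigned more than once")
            assigned.add(target)
            if callee is not None and by_name[callee].level != r.level + 1:
                errors.append(f"{r.name}: call to {callee} is not one level down")
    return errors


dot = Routine("dot", 1, ["a", "x"], ["d"], [("d", "mac")])
mac = Routine("mac", 2, ["a", "b", "c"], ["r"], [("r", None)])
good_top = Routine("matvec", 0, ["A", "x"], ["y"], [("y", "dot")])
bad_top = Routine("matvec", 0, ["A", "x"], ["y"], [("y", "mac"), ("y", "dot")])

print(check_s2ac([good_top, dot, mac]))  # []
print(len(check_s2ac([bad_top, dot, mac])))  # 2
```

The second call reports two violations: a level-0 routine calling a level-2 routine directly, and a second assignment to `y`.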
3.2
The structured dependence graph (SDG) contains a DG for each routine in the S2AC description. To indicate the hierarchy of the S2AC description, the SDG is defined in Definition 3.4 with the aid of the following auxiliary definitions.
Definition 3.1 Any edge that supplies (produces) data to (from) a DG is said to be an Input (Output) Edge of the DG. The node to (from) which the input (output) edge supplies (produces) data is said to be the Input (Output) Node.
The Canonical SDG
In array processor design, DGs containing only local-dependence edges are of importance. Furthermore, the projection of the DG becomes easy if we can construct the DG in a minimum-dimensional Euclidean space. Therefore, we define a canonical SDG, which is expanded to create the DG used in the succeeding design steps.
Integer Programming Formulation of the Scheduling and Projection of DGs
For a given n-dimensional DG, our goal is to project the DG into a lower dimensional abstract processor array such that timing constraints are not violated. For this, we solve two integer programming problems. For the scheduling problem, we assign a scalar $s_i$ to each DG node indicating its execution time. For the projection problem, we assign an m-vector $\vec{p}_i$ to each DG node. Here $\vec{p}_i$ indicates the target position of the processing element (PE) where the function of the $i$th DG node is performed.
3.4.1
The number of delay registers introduced into the systolic array depends primarily on the scheduling of the DG nodes, and hence we try to obtain the best schedule in the first place. The number of delay registers also depends on the projection, because the same delay register can sometimes be reused several times. As the projection is still unknown, we use the function [...]. To satisfy the precedence conditions, we introduce the following constraint for each edge in the DG:
$$s_{\text{destination node}} - s_{\text{source node}} \geq 1 \qquad (2)$$
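As a small illustration of constraint (2), the sketch below computes an as-soon-as-possible schedule by longest-path propagation and then checks the constraint on every edge. This is a swapped-in, simpler technique and not the integer program of the paper; the grid-shaped DG and all function names are assumptions made for the example.

```python
from collections import defaultdict, deque


def asap_schedule(nodes, edges):
    """Assign each node the earliest slot that respects s_dst - s_src >= 1."""
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    s = {n: 0 for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in successors[u]:
            s[v] = max(s[v], s[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("dependence graph contains a cycle")
    return s


def satisfies_precedence(s, edges):
    """Check constraint (2) on every edge."""
    return all(s[dst] - s[src] >= 1 for src, dst in edges)


def grid_dg(n):
    """Hypothetical n x n DG with edges along the two axis directions."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    edges = [((i, j), (i + 1, j)) for i in range(n - 1) for j in range(n)]
    edges += [((i, j), (i, j + 1)) for i in range(n) for j in range(n - 1)]
    return nodes, edges


nodes, edges = grid_dg(3)
s = asap_schedule(nodes, edges)
print(s[(2, 2)], satisfies_precedence(s, edges))  # 4 True
```

An integer programming formulation can additionally optimize an objective and honor input and output timing constraints, which this sketch does not attempt.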
We must schedule the DG such that the input and output timing constraints are met. Therefore, by introducing a variable TI [...] for all the edges along the same direction connected to the same type of DG nodes. In practice, it is convenient to restrict the number of neighboring PEs with which a given PE can communicate, in order to reduce the number of communication links necessary. We enforce this requirement by using the following constraint for each node k in the DG:
$$[\ldots] \qquad j \in N_k,\; j \neq i$$
where $N_k$ represents the set of nodes connected to node $k$ and $C$ is the maximum number of neighboring PEs with which a PE can communicate. By defining $g(T)$ such that $g(0) = 1$ and $g(T) = 0$ for $T = \pm 1, \pm 2, \ldots$, we determine whether nodes $i$ and $j$ have the same schedule time.
Apart from these constraints, we allow the designer to insert a set of optional constraints to ensure the projection of a set of DG nodes into the same PE. In this case, these nodes must be scheduled in different time slots which can be ensured by the following constraint:
where $U_k$ is the $k$th user-specified node set.
3.4.2
Once the scheduling is known, we can reduce the number of PEs necessary for the systolic array by projecting several DG nodes into a single PE in an abstract processor array. Therefore, for the projection problem, the objective function must represent a measure of the number of PEs. Let $i$ and $j$ be two neighboring nodes connected by an edge and [...].
As the $i$th DG node is projected into a PE in the abstract processor array located at $\vec{p}_i$, DG nodes $i$ and $j$ will be projected into the same PE when $\vec{p}_i = \vec{p}_j$ ($i \neq j$). Then we have $\vec{p}_i - \vec{p}_j = 0$. Let [...]. Furthermore, we cannot project two nodes into the same PE if they have the same schedule time. Therefore, for all DG node pairs having the same schedule time, we add the following constraint:
If there are $S_i$ DG nodes scheduled for the $i$th time slot, then the above equation introduces $\sum_i (S_i^2 - S_i)$ constraints. The number of constraints can be dramatically reduced by exploiting the near-neighbor communication property. In that case, we can relax the above constraint for all nonadjacent node pairs, which cannot be projected into the same PE without violating the near-neighbor communication property.
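The constraint count above can be checked numerically. The following Python sketch counts the ordered same-slot node pairs for a given schedule and tests whether a candidate placement puts two same-slot nodes on one PE. The 3 x 3 example schedule, the placements and the function names are assumptions chosen for illustration.

```python
from collections import Counter


def same_slot_constraint_count(schedule):
    """Sum of S_i^2 - S_i over all time slots (ordered same-slot pairs)."""
    slot_sizes = Counter(schedule.values())
    return sum(size * size - size for size in slot_sizes.values())


def violates_same_slot_rule(schedule, placement):
    """True if two nodes share both a time slot and a PE location."""
    occupied = set()
    for node, slot in schedule.items():
        key = (slot, placement[node])
        if key in occupied:
            return True
        occupied.add(key)
    return False


# Hypothetical 3 x 3 DG in which node (i, j) is scheduled at time i + j.
schedule = {(i, j): i + j for i in range(3) for j in range(3)}
by_row = {node: node[0] for node in schedule}      # project along j
all_on_one = {node: 0 for node in schedule}        # everything on one PE

print(same_slot_constraint_count(schedule))          # 10
print(violates_same_slot_rule(schedule, by_row))     # False
print(violates_same_slot_rule(schedule, all_on_one)) # True
```

The slot sizes here are 1, 2, 3, 2, 1, giving 0 + 2 + 6 + 2 + 0 = 10 pairwise constraints before any relaxation.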
In addition to the previous constraints, the following optional constraints can be added. To prevent two DG nodes from being projected into the same PE, we can add the constraint $\vec{p}_i - \vec{p}_j \neq 0$. On the other hand, the designer might prefer to project some specific nodes into a single PE. Constraints of the form $\vec{p}_i - \vec{p}_j = 0$ can be inserted to express such preferences. These optional constraints can first be left out and then gradually inserted to eliminate undesirable features of the resulting design. We recommend inserting these optional constraints by inspecting the members of the SDG, or automatically by specifying a set of predicates.
If one is only interested in linear projections, the constraint $\vec{p}_{\text{destination node}} - \vec{p}_{\text{source node}} = \vec{\delta}_l$ can be introduced for each different edge direction $l$.
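The linear-projection constraint says that every DG edge with the same direction must map to the same PE displacement. A minimal Python check of that property is sketched below; the grid DG, the two placements and the function names are hypothetical.

```python
def edge_displacements(edges, placement):
    """Map each DG edge direction to the set of PE displacements it produces."""
    result = {}
    for src, dst in edges:
        direction = tuple(d - s for s, d in zip(src, dst))
        delta = tuple(d - s for s, d in zip(placement[src], placement[dst]))
        result.setdefault(direction, set()).add(delta)
    return result


def satisfies_linear_constraint(edges, placement):
    """True if each edge direction yields exactly one PE displacement."""
    return all(len(d) == 1 for d in edge_displacements(edges, placement).values())


# Hypothetical 3 x 3 DG with edges along the two axis directions.
n = 3
edges = [((i, j), (i + 1, j)) for i in range(n - 1) for j in range(n)]
edges += [((i, j), (i, j + 1)) for i in range(n) for j in range(n - 1)]

linear = {(i, j): (i,) for i in range(n) for j in range(n)}
nonlinear = {(i, j): (i * i,) for i in range(n) for j in range(n)}

print(satisfies_linear_constraint(edges, linear))     # True
print(satisfies_linear_constraint(edges, nonlinear))  # False
```

In the second placement the edges along direction (1, 0) produce displacements 1 and 3, so no single displacement exists for that direction.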
Once the optimum values of $\vec{p}_i$ are known for all the DG nodes, an abstract processor array can be built as follows: for all DG nodes, define a PE at the location described by $\vec{p}_i$.
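The construction step just described can be sketched in Python by grouping DG nodes by their PE location and ordering each group by schedule time. The schedule, the placement and the name `build_abstract_array` are assumptions for the example, not the paper's notation.

```python
from collections import defaultdict


def build_abstract_array(schedule, placement):
    """Return {PE location: [(time slot, DG node), ...]} sorted by time."""
    array = defaultdict(list)
    for node, location in placement.items():
        array[location].append((schedule[node], node))
    return {location: sorted(tasks) for location, tasks in array.items()}


# Hypothetical 3 x 3 DG: node (i, j) runs at time i + j on PE (i,).
schedule = {(i, j): i + j for i in range(3) for j in range(3)}
placement = {(i, j): (i,) for i in range(3) for j in range(3)}

array = build_abstract_array(schedule, placement)
print(len(array))    # 3
print(array[(0,)])   # [(0, (0, 0)), (1, (0, 1)), (2, (0, 2))]
```

Each entry lists the DG nodes a PE executes over time, which is where time-varying PE behavior becomes visible when the nodes are of different types.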
3.4.3
The Scheduling and Projection Problems for Wavefront Array Design
According to [3], any systolic array can be converted into a wavefront array simply by replacing each synchronous communication link by an asynchronous communication link and replacing each delay register by an initial token. We adopt the same technique due to its simplicity. In summary, the scheduling and projection problems for wavefront array design are therefore identical to those of systolic array design.
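The systolic-to-wavefront conversion rule of [3] (each synchronous link becomes an asynchronous link, each delay register becomes an initial token) can be written down as a small Python sketch. The `SystolicLink` and `WavefrontLink` records and the PE names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystolicLink:
    src: str
    dst: str
    delays: int          # delay registers on the synchronous link


@dataclass(frozen=True)
class WavefrontLink:
    src: str
    dst: str
    initial_tokens: int  # tokens initially queued on the asynchronous link


def to_wavefront(links):
    """Replace every delay register by one initial token, link by link."""
    return [WavefrontLink(l.src, l.dst, l.delays) for l in links]


systolic = [SystolicLink("PE0", "PE1", 2), SystolicLink("PE1", "PE2", 0)]
print(to_wavefront(systolic))
```

The topology is unchanged by the conversion; only the link discipline and the initial state differ.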
Tag Based Control
In general, an abstract processor array resulting from a projection of a partially-regular DG contains PEs performing time-varying functions. We map this abstract processor array into an array processor by designing a set of super-PEs whose functionality is controlled by tags. A super-PE is modular and is capable of performing all the functions of a set of PEs in the abstract processor array. At each PE, we place a residential-tag to identify the different PE types. A set of mobile-tags is propagated through the array in a controlled fashion to control the time-varying behavior. The mobile tags are sent through the array via a so-called valid-tag-path, defined below.
Definition 3.8 Any out-tree of the DG is said to be a Valid-Tag-Tree if the root node of the tree represents the virtual node where all the input edges of the DG are virtually connected and the terminal nodes of any edge in the tree, except for the edges from the root node, are scheduled in consecutive time slots.
Definition 3.9 The path defined by the projection of a valid-tag-tree into the abstract processor array is said to be a Valid-Tag-Path.
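The timing condition of Definition 3.8 can be tested mechanically. The Python sketch below represents a candidate tree by a child-to-parent map and checks that every tree edge, other than those leaving the virtual root, is a DG edge whose end nodes are scheduled in consecutive time slots. The `ROOT` marker, the example DG and the function name are assumptions for illustration.

```python
ROOT = "virtual-root"  # hypothetical marker for the virtual root node


def is_valid_tag_tree(parent, schedule, dg_edges):
    """Check the consecutive-slot condition on a child -> parent map."""
    for child, par in parent.items():
        if par == ROOT:
            continue  # edges from the root node are exempt
        if par not in parent:
            return False  # parent is not itself attached to the tree
        if (par, child) not in dg_edges:
            return False  # tree edges must be DG edges
        if schedule[child] - schedule[par] != 1:
            return False  # end nodes must be in consecutive slots
    return True


# Hypothetical 2 x 2 DG with node (i, j) scheduled at time i + j.
dg_edges = {((0, 0), (1, 0)), ((0, 0), (0, 1)),
            ((1, 0), (1, 1)), ((0, 1), (1, 1))}
schedule = {(i, j): i + j for i in range(2) for j in range(2)}
tree = {(0, 0): ROOT, (1, 0): (0, 0), (0, 1): (0, 0), (1, 1): (1, 0)}

print(is_valid_tag_tree(tree, schedule, dg_edges))  # True

delayed = dict(schedule)
delayed[(1, 1)] = 3  # node (1, 1) now runs two slots after (1, 0)
print(is_valid_tag_tree(tree, delayed, dg_edges))   # False
```

The consecutive-slot condition is what lets a mobile tag travel along the tree at the same pace as the computation it controls.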
As the functions performed by the PEs are controlled by tags, the functions performed by a faulty PE can be switched off and executed on a neighboring PE. This provides survivability to the array processor. Different combinations of tags correspond to different algorithms. Therefore, we obtain a flexible array processor capable of executing any of a set of algorithms that can be mapped onto the selected topology, simply by sending different combinations of tags.
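A minimal Python sketch of tag-controlled behavior is given below: the function a super-PE performs in each step is selected by its resident tag together with the mobile tag that arrives. The tag names, the function table and the `SuperPE` class are hypothetical and are not the tag encoding of the paper.

```python
class SuperPE:
    """A PE whose operation is selected by (resident tag, mobile tag)."""

    def __init__(self, resident_tag, function_table):
        self.resident_tag = resident_tag
        self.function_table = function_table

    def step(self, mobile_tag, acc, a, b):
        operation = self.function_table[(self.resident_tag, mobile_tag)]
        return operation(acc, a, b)


# Hypothetical function table shared by all super-PEs.
TABLE = {
    ("inner", "mac"): lambda acc, a, b: acc + a * b,  # multiply-accumulate
    ("inner", "pass"): lambda acc, a, b: acc,         # forward unchanged
}


def run(pe, tags, data, acc=0):
    """Feed one mobile tag and one operand pair per time slot."""
    for tag, (a, b) in zip(tags, data):
        acc = pe.step(tag, acc, a, b)
    return acc


pe = SuperPE("inner", TABLE)
data = [(1, 2), (3, 4)]
print(run(pe, ["mac", "mac"], data))   # 14
print(run(pe, ["pass", "mac"], data))  # 12
```

The same hardware model computes two different results purely because a different tag sequence was sent, which is the sense in which tag combinations select algorithms.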
Conclusions
An array processor design methodology suitable for hard real-time systems has been presented. The scheduling and projection of the DG are solved using integer programming. By exploiting the regularity of the DG, the necessary integer programming problems can be solved efficiently. This methodology provides a unified approach for linear and nonlinear projection of regular and partially-regular DGs.
