High-level modeling and (quantitative) performance analysis of signal processing systems require high-level models for the applications (algorithms) and the implementations (architectures), a mapping of the former onto the latter, and a simulator for fast execution of the whole. Signal processing algorithms are very often nested-loop algorithms with a high degree of inherent parallelism. For such applications, this paper presents a suitable application model and a method to convert a given imperative executable specification into a specification in terms of this application model. The methods and tools are illustrated by means of an example.
of concurrently executing sequential processes. It expresses parallelism naturally, from the very fine-grained to the very coarse-grained, and does not impose any particular schedule in advance. The mapping then becomes the assignment of the processes to the microprocessor and the coprocessors, as shown by tools like ORAS [6] or SPADE [7]. Using these tools, a Y-chart [8] can be constructed, allowing the quality of mappings onto architectures to be assessed.
This paper describes the Compaan tool set, which automatically transforms certain Matlab applications into a process network description, as shown in Figure 1. It converts a Matlab application into a polyhedral reduced dependence graph (PRDG), which is subsequently converted into a Kahn process network (KPN) description [4]. The Compaan tool set is confined to affine nested loop programs (NLPs) [9], which appear often in the applications of interest.
The outline of the paper is as follows. Section 2 gives the Y-chart exploration scheme underlying our concept of modeling and simulation. Section 3 presents the Matlab-to-Process-Network compiler Compaan. Section 4 briefly reviews the tools MatParser and DgParser, which generate from a given imperative specification a single-assignment code and a reduced dependence graph, respectively. Section 5 introduces the tool that generates the processes in the Process Network specification of an application.
In line with Kienhuis et al. [10] we advocate that the development of heterogeneous architectures should follow a general scheme, which can be visualized by the Y-shaped figure in Figure 2 that drives the design of the architecture. Typically, a designer studies this set of applications, makes some initial calculations, and proposes an initial parameterized architecture. The effectiveness of this architecture is then evaluated by comparing its performance for different values of the parameters. For this performance analysis, each application is mapped onto the architecture and the performance of each application-architecture-mapping combination is evaluated. The resulting performance numbers may inspire the architecture designer to improve the architecture. The designer may also decide to restructure the application(s) or to modify the mapping of the application(s). These designer actions are denoted by the light bulbs in Figure 2(a).
In this approach, it is assumed that the applications and the architecture are specified in terms of a model of computation and a model of implementation, respectively. For the parallel execution of signal processing nested loop programs (NLPs), a natural model of computation is the Process Network model [4, 5], which is quite different from the usual imperative model of computation in which the applications are commonly specified. It is, therefore, necessary to provide a compiler that extracts the available parallelism from an application specified as an NLP, say in Matlab, and automatically converts it into a process network specification. We thus extend the Y-chart environment shown in Figure 2(a) to the environment shown in Figure 2(b).
In Sections 3 to 5 we focus on the upper right corner of Figure 2(b), which is implemented in our Compaan tool set.
We developed the Compilation of Matlab to Process Networks (Compaan) tool set, which transforms a nested loop program written in Matlab into a process network specification. Compaan performs this transformation in a number of steps, shown in Figure 3, leveraging many techniques available in the systolic array literature [11]. In Figure 3, a box represents a result and an ellipse represents an action or tool.
Compaan starts the transformation by converting a Matlab NLP specification into a single-assignment code (SAC) specification. This makes all parallelism available in the original Matlab specification explicit. Next, it derives the polyhedral reduced dependence

To appear in Parallel Processing Letters June/Sept 2001 session
Figure 3: The steps of the Compaan transformation (Step 1, Step 2, Step 3), with Step 3, process generation, shown in more detail.
Node Domain
A node domain is characterized by 1) an iteration domain, which is a polytope, 2) a function, and 3) a set of port domains. The function resides in each and every point in the iteration domain. It takes its arguments from its input ports and returns values to its output ports. A particular input port or output port belongs to the node domain's input port domain (IPD) or output port domain (OPD), respectively.
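As a minimal sketch of the three ingredients of a node domain, the Python fragment below stores an iteration domain as a set of affine inequalities, attaches a function, and adds one input port domain. The class names, the triangular domain, the port name `in_0`, and the bounding-box enumeration are assumptions made for illustration; they are not taken from the Compaan implementation.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, ...]
# One affine constraint: coeffs . x + const >= 0
Constraint = Tuple[Tuple[int, ...], int]


@dataclass
class Polytope:
    """Integer points x with coeffs . x + const >= 0 for every constraint."""
    constraints: List[Constraint]
    box: List[Tuple[int, int]]  # inclusive bounding box, used only to enumerate

    def contains(self, x: Point) -> bool:
        return all(sum(c * v for c, v in zip(coeffs, x)) + const >= 0
                   for coeffs, const in self.constraints)

    def points(self) -> List[Point]:
        ranges = [range(lo, hi + 1) for lo, hi in self.box]
        return [x for x in product(*ranges) if self.contains(x)]


@dataclass
class NodeDomain:
    iteration_domain: Polytope          # 1) the polytope
    function: Callable                  # 2) the function at every point
    input_port_domains: Dict[str, Polytope] = field(default_factory=dict)   # 3)
    output_port_domains: Dict[str, Polytope] = field(default_factory=dict)  # 3)


N = 4
# Hypothetical triangular iteration domain: 1 <= k <= N and k <= l <= N
triangle = Polytope(
    constraints=[((1, 0), -1), ((-1, 0), N), ((-1, 1), 0), ((0, -1), N)],
    box=[(1, N), (1, N)],
)
node = NodeDomain(iteration_domain=triangle, function=lambda value: value + 1)
# Hypothetical input port domain: the part of the triangle with k >= 2
node.input_port_domains["in_0"] = Polytope(
    constraints=triangle.constraints + [((1, 0), -2)], box=triangle.box)
```

In this sketch a port domain is simply a sub-polytope of the iteration domain, obtained by adding one inequality to the constraints of the node.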
Edge Domain
An edge domain is an ordered pair 
Example
To illustrate the notion of node and port domains, we show in Figure 
In the path from Matlab to the PRDG, Compaan uses the tools MatParser [12, 9] and DgParser [9]. MatParser is an array dataflow analysis compiler that finds all parallelism available in affine NLPs written in Matlab using a very aggressive data dependency analysis technique based on integer linear programming [13]. The analysis results in a static representation of the application, which enables us to analyse and manipulate it.
MatParser finds whether two variables are dependent on each other and, moreover, at which iteration. It partitions the iteration spaces defined by the affine control statements and gives the dependence vector between partitions. For the program given in Figure 1, which contains only for-next control statements, MatParser solves about a hundred parametric integer programming problems to find all data dependencies. In Figure 6, part of the output of MatParser is shown for the algorithm given in Figure 1. It shows how the iteration space spanned by the for-next iterators ) should be used. DgParser converts the SAC description into the PRDG description, which is a straightforward conversion. Accordingly, the shape of the node domain is given by the way the for-next loops are defined, and the partitioning of the node domain corresponds to the if/else conditions. The terms ipd and opd used in Figure 6 relate to the IPD and OPD defined in Section 5.
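To make concrete what finding the dependence "at which iteration" means, the sketch below runs a brute-force stand-in for the analysis on a small, hypothetical loop nest with a fixed N. It records for every iteration which earlier iteration wrote the value it reads. This is not MatParser's method: MatParser solves parametric integer programs, so its answer holds symbolically for all parameter values, whereas this enumeration only covers one value of N.

```python
from typing import Dict, Optional, Tuple

Iteration = Tuple[int, int]


def last_writers(N: int) -> Dict[Iteration, Optional[Iteration]]:
    """For the hypothetical loop nest

        for k = 1:N, for l = k:N,  x(k, l) = f(x(k-1, l))

    return, for every iteration (k, l), the iteration that produced the value
    it reads, or None when the value is an original input of the program.
    """
    writer_of_element: Dict[Iteration, Iteration] = {}
    source: Dict[Iteration, Optional[Iteration]] = {}
    for k in range(1, N + 1):            # iterations in execution order
        for l in range(k, N + 1):
            source[(k, l)] = writer_of_element.get((k - 1, l))  # read x(k-1, l)
            writer_of_element[(k, l)] = (k, l)                  # write x(k, l)
    return source


deps = last_writers(4)
# Partition of the iteration space by the origin of the value that is read
from_input = sorted(it for it, src in deps.items() if src is None)
from_node = sorted(it for it, src in deps.items() if src is not None)
# Dependence vectors (consumer iteration minus producer iteration)
vectors = {(it[0] - src[0], it[1] - src[1])
           for it, src in deps.items() if src is not None}
```

For this loop nest the enumeration splits the iteration space into the row k = 1, which reads original inputs, and the rows k >= 2, which read values computed one row earlier. Such a split is the kind of partition that appears as if/else conditions in a single-assignment program.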
Once DgParser has established a PRDG model of an algorithm, the Panda tool can generate a network description and the individual processes. The network description is straightforward, as it follows the topology of the PRDG. Each node in the PRDG is mapped onto a single process and each edge is mapped onto an unbounded FIFO.

%% Single Assignment Code Generated by MatParser
for k = 1 : 1 : K,
  for j = 1 : 1 : N,

As shown in Figure 3, the Panda tool divides the generation of a process into three different steps: domain scanning, data partitioning, and linearization. Because the PRDG in Figure 4 is not well suited to illustrate these three steps, we use another example in this section. This example is given in Figure 7 and Figure 8.
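The mapping of PRDG nodes onto processes and of edges onto unbounded FIFOs can be mimicked in a few lines of Python. The sketch below is an illustration under stated assumptions, not the code Panda generates: the three processes, the `run_network` helper, and the end-of-stream token `END` are hypothetical. Each process is a thread, each edge is a `queue.Queue` with no size limit, and reads block until a token is available.

```python
import queue
import threading

END = None  # hypothetical end-of-stream token, used only to stop the threads


def source(inputs, outputs):
    for i in range(5):
        outputs[0].put(i)
    outputs[0].put(END)


def double(inputs, outputs):
    for token in iter(inputs[0].get, END):   # blocking read until END arrives
        outputs[0].put(2 * token)
    outputs[0].put(END)


def make_sink(results):
    def sink(inputs, outputs):
        for token in iter(inputs[0].get, END):
            results.append(token)
    return sink


def run_network(nodes, edges):
    """nodes: name -> process(inputs, outputs); edges: (producer, consumer)."""
    inputs = {name: [] for name in nodes}
    outputs = {name: [] for name in nodes}
    for producer, consumer in edges:
        fifo = queue.Queue()                 # maxsize 0 means unbounded
        outputs[producer].append(fifo)
        inputs[consumer].append(fifo)
    threads = [threading.Thread(target=process,
                                args=(inputs[name], outputs[name]))
               for name, process in nodes.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


results = []
run_network({"source": source, "double": double, "sink": make_sink(results)},
            [("source", "double"), ("double", "sink")])
```

Because every process only performs blocking reads on its own input channels, the contents of `results` do not depend on how the threads happen to be scheduled.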
Domain Scanning
The first step in Panda is to scan the node domains by lexicographically ordering the points in the node's iteration domain. Given the iteration domains, the conversion to a nested-loop scan is then straightforward.

  for l = k : 1 : N,
    [out_0] = f( );
    [x_1( k, l)] = opd( out_0 );
    if k-1 >= 0,
      [in_0] = ipd( x_1(k-1,l) );
    else %% if -k >= 0
      [in_0] = ipd( x(k-1,l) );
    end
    [] = g( in_0 );
  end
end

(b) The SAC that MatParser produces.
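The conversion from an iteration domain to a nested-loop scan can be sketched for the two-dimensional case as follows. The constraint encoding, the function names, and the bounds chosen for the outer iterator k are assumptions for illustration. For each value of k, the bounds of the inner loop over l are read off from the inequalities that involve l, so the points are visited in lexicographic order without testing every point of a bounding box.

```python
import math
from typing import List, Optional, Tuple

Constraint = Tuple[int, int, int]  # (a, b, c) meaning a*k + b*l + c >= 0


def inner_bounds(k: int, constraints: List[Constraint]) -> Optional[Tuple[int, int]]:
    """Bounds on l for a fixed k, or None if a constraint on k alone fails.

    Assumes the domain is bounded, so both bounds end up finite.
    """
    lower, upper = -math.inf, math.inf
    for a, b, c in constraints:
        rest = a * k + c
        if b > 0:
            lower = max(lower, -(rest // b))   # l >= ceil(-rest / b)
        elif b < 0:
            upper = min(upper, rest // (-b))   # l <= floor(rest / -b)
        elif rest < 0:
            return None                        # k itself is outside the domain
    return lower, upper


def scan(constraints: List[Constraint], k_range: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Visit the integer points of the domain in lexicographic (k, l) order."""
    points = []
    for k in range(k_range[0], k_range[1] + 1):
        bounds = inner_bounds(k, constraints)
        if bounds is None:
            continue
        lower, upper = bounds
        for l in range(lower, upper + 1):
            points.append((k, l))
    return points


N = 4
# Hypothetical domain: 1 <= k <= N and k <= l <= N
domain = [(1, 0, -1), (-1, 0, N), (-1, 1, 0), (0, -1, N)]
scanned = scan(domain, (0, N + 1))
nested = [(k, l) for k in range(1, N + 1) for l in range(k, N + 1)]
```

For this domain, `scanned` and the hand-written nested loop `nested` enumerate the same points in the same order.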
Data Partitioning
MatParser generates a SAC description in which only the IPDs are explicitly specified. This means that the input argument in_0 in Figure 7 is surrounded by if/else statements, while the output value out_0 is not. A consequence of this is that output values can be generated that are never used by any input domain. This problem is illustrated in the top part of Figure 8; the solid shaded triangle is known explicitly while the dashed shaded triangle is not. The second step in Panda is to make the OPDs explicit.
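The effect of making an OPD explicit can be illustrated by brute force on a small example. In the sketch below, the triangular domain, the consumer condition k >= 2, and the function name are assumptions chosen for illustration; Panda derives the port domains symbolically rather than by enumeration. The producer writes x_1(k, l) at every point, the consumer reads x_1(k-1, l) only where its input condition holds, and the OPD is the set of produced tokens that some consumer iteration actually reads.

```python
from typing import Set, Tuple

Point = Tuple[int, int]


def explicit_opd(N: int) -> Tuple[Set[Point], Set[Point]]:
    """Return (opd, unused) for a hypothetical producer/consumer pair.

    Producer: writes x_1(k, l) for 1 <= k <= N, k <= l <= N.
    Consumer: at iteration (k, l) of the same domain, reads x_1(k-1, l)
              when k >= 2 (assumed input condition).
    """
    produced = {(k, l) for k in range(1, N + 1) for l in range(k, N + 1)}
    consumed = {(k - 1, l) for (k, l) in produced if k >= 2}
    opd = produced & consumed       # tokens that must be put on the channel
    unused = produced - consumed    # tokens that no consumer ever reads
    return opd, unused


N = 4
opd, unused = explicit_opd(N)
# For this example the OPD is described by one extra inequality: l - k - 1 >= 0
opd_by_inequality = {(k, l) for k in range(1, N + 1)
                     for l in range(k, N + 1) if l - k - 1 >= 0}
```

In this example the values produced on the diagonal k = l are never read, so a producer that only writes inside the explicit OPD avoids putting tokens on the channel that would never be consumed.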
Making the output port domains explicit is illustrated in Figure 9, which shows two communicating node domains. Comparing these port domains with the respective node domains, it follows that what remains to be checked at run time (to detect the port domains while scanning the node domains) is whether
Linearization
The channels between processes are one-dimensional FIFO buffers. Therefore, the order in which a consuming process reads tokens from a channel must be the same as the order in which tokens are written onto the channel by the producing process. Of course, the consuming process will in general use the read tokens in a different order (out-of-order consumption). The channel's FIFO and the consumer process's reorder memory are modeled as a single one-dimensional memory. This is shown in Figure 10.
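A small sketch may help to show how a FIFO and a reorder memory behave as one linear memory. Everything specific here is an assumption for illustration: the triangular domain, the consumer's column-by-column order, and the address polynomial, which was derived by hand for this domain by counting the points that precede (k, l) in the producer's write order. The consumer reads tokens strictly in FIFO order into consecutive memory locations and stops reading as soon as the address it needs has been filled.

```python
from collections import deque


def address(k: int, l: int, N: int) -> int:
    """Rank of (k, l) in the producer's lexicographic write order.

    Hand-derived for the hypothetical domain 1 <= k <= N, k <= l <= N:
    rows 1 .. k-1 hold (k-1)*(N+1) - (k-1)*k/2 points, plus (l-k) in row k.
    """
    return (k - 1) * (N + 1) - (k - 1) * k // 2 + (l - k)


def run(N: int):
    # Producer: writes tokens onto the channel row by row (k outer, l inner)
    fifo = deque()
    for k in range(1, N + 1):
        for l in range(k, N + 1):
            fifo.append(("x", k, l))

    # Consumer: needs the tokens column by column (l outer, k inner)
    memory = []      # linear memory filled strictly in FIFO order
    consumed = []
    for l in range(1, N + 1):
        for k in range(1, l + 1):
            addr = address(k, l, N)
            while len(memory) <= addr:          # read until the token arrived
                memory.append(fifo.popleft())
            consumed.append(memory[addr])
    return consumed, memory


N = 4
consumed, memory = run(N)
```

The consumer never reads the channel out of order; only the address computation differs from the write order, which is why a closed-form expression for the address is useful.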
The linearization method in Panda relies on methods to count the number of integral points contained within a polytope using so-called Ehrhart polynomials [16]. These methods are implemented in the library PolyLib [17]. The three steps (domain scanning, data partitioning, and linearization) result in a control program as shown in Figure 12 for the running example. As can be seen from the figure, 
