We present an approach to model dataflow architectures at a high level of abstraction using timed coloured Petri nets. We specifically examine the value of Petri nets for evaluating the performance of such architectures. For this purpose we assess the value of Petri nets both as a modelling technique for dataflow architectures and as an analysis tool that yields valuable performance data for such architectures through the execution of Petri net models. Because our aim is to use the models for performance analysis, we focus on representing the timing and communication behaviour of the architecture rather than the functionality. A modular approach is used to model architectures. We identify five basic hardware building blocks from which Petri net models of dataflow architectures can be constructed. In defining the building blocks we identify strengths and weaknesses of Petri nets for modelling dataflow architectures. A technique called folding is applied to build generic models of dataflow architectures. A timed coloured Petri net model of the Prophid dataflow architecture, which is being developed at Philips Research Laboratories, is presented. This model has been designed in the tool ExSpect. The performance of the Prophid architecture has been analysed by simulation with this model.
Introduction
Digital video signals for multi-media applications and digital television are high-throughput, stream-based video signals. Real-time processing of digital video signals requires high performance computation. A growing number of video applications have functions for which the computation time depends highly on the applied input data. Examples of this are encoding and decoding of images in the MPEG format. So, dynamic behaviour with variable data rates and data dependent computation is becoming a characteristic of real-time digital video signal processing. Because of the processing power required for real-time digital video signal processing, video algorithms are typically executed on special ICs, called video signal processors. Due to the dynamic behaviour of video signal processing algorithms we need a computational model for real-time video signal processing that can handle variable data rates and data dependent computation. The dynamic dataflow model [Buck, 1993] provides a formalism to describe algorithms with this kind of dynamic and parallel behaviour. A dataflow algorithm is a directed graph where the nodes represent tasks and the arcs define the data dependencies between the tasks. Data produced and consumed by the nodes is carried in tokens which flow along the arcs. The video signal processing algorithms have to be mapped onto architectures that will perform the computations. Dataflow architectures [Watlington and Bove, 1995] are particularly suited to implement this kind of stream-oriented video algorithm. Architectures and methodologies for mapping video signal processing algorithms onto architectures are being developed at Philips Research Laboratories. A video signal processing architecture called the Prophid architecture [Leijten et al., 1997] is being developed, which is the subject of our study.
Problem statement
In a design process there are usually several possibilities for implementation. To be able to make the right choices, these possible implementations need to be quantified, taking into account such measures as performance, chip area and power dissipation. In this paper we focus on methods for performance evaluation. These methods must permit the designer to explore the design space in search of the implementation that offers satisfactory performance. In the application domain of dataflow architectures for video signal processing, the performance depends on the following three elements:
• the architecture,
• the video signal processing algorithms that are used, and
• the mapping of the algorithms onto the architecture.
Note that we focus on programmable architectures that satisfy the performance needs of a class of algorithms. To analyse the performance of an architecture, a workload must be imposed on the architecture. A workload consists of a video processing algorithm, a mapping for this algorithm onto the architecture and streams of video samples as input data for the algorithm. With models of the architecture, the algorithms and the mapping we can analyse the performance. This is depicted schematically in Figure 1, which is called the Y-chart because of its shape.
For the performance analysis we have to select the performance criteria to evaluate, the so-called metrics, that will give us a measure for the performance of the modelled architecture with an algorithm mapped onto it. Some of the metrics that are of interest to us are the utilisation of the video signal processing elements, memory usage and throughput of the system. The performance numbers that are the result of the analysis quantify the performance of the architecture and give the designer better insight into the working of dataflow architectures and the mapping of algorithms onto these architectures. This information can help the designer to improve the architecture, the way algorithms are mapped onto the architecture and the way algorithms are structured. This is represented by the feedback loops with the light bulbs in Figure 1. The designer repeats the analysis until he finds an implementation that offers satisfactory performance.
Currently, design tools for performance analysis of high performance, heterogeneous systems such as dataflow architectures for video signal processing are lacking. Most available tools assist in modelling the architecture components at a detailed level. A high abstraction level is needed in an early stage of the design to evaluate different design choices. We propose to use timed coloured Petri nets to model dataflow architectures at a high level of abstraction for performance analysis. Petri nets seem to be a good candidate as a modelling technique for dataflow architectures for a number of reasons. Intuitively there is a match between the data driven execution of Petri nets and the dataflow model. Dataflow architectures are characterised by a high degree of parallelism. Petri nets offer the ability to model parallelism in a straightforward manner. Furthermore, higher level Petri net concepts such as token colouring [Jensen, 1992], timed Petri nets [Zuberek, 1991], stochastic delays [Molloy, 1982] and hierarchy provide expressive modelling power.
These Petri net concepts are supported by the tool ExSpect [ASPT, 1994]. In ExSpect we can design and execute Petri nets. This enables us to run simulations with our architecture model. Computers have become much faster over the last decade, which makes computer-based simulation of complex systems feasible.
We can formulate our problem statement as follows: how can timed coloured Petri nets be used to execute the Y-chart (Figure 1)? More specifically, we want to assess the value of Petri nets for modelling dataflow architectures, video algorithms and mappings. We want to use these models to exercise the loops of the Y-chart in order to analyse the performance of different solution points in the design space. Therefore we are interested in how we can configure a Petri net model with an architecture instance, an algorithm and a mapping, and measure performance metrics using the Petri net model.
Several studies on performance modelling of multiprocessor systems using Petri nets have been presented in the literature [Govindarajan et al., 1997; Madhukar et al., 1995; Marsan et al., 1986; Lindemann, 1992]. As an example we mention the work presented by Marsan et al. [1986] on performance models of multiprocessor systems. Our approach for modelling architectures for performance analysis is similar to the approach presented by Marsan et al. Their models differ from ours, however, in the granularity of the tasks executed on the architectures. In most previously published work the architectures that are considered consist of processors and memory components that communicate via an interconnection structure. The performance of the architectures is analysed by modelling the exchange of messages between the components and the execution of tasks on the processors. The architectures that we focus on are stream based architectures with sample based communication between the co-processors. Therefore our model of the communication and processing of video samples is more fine-grained. To our knowledge no work has previously been published on performance modelling of dataflow architectures using Petri nets.
Solution approach
We will assess the value of timed coloured Petri nets for modelling dataflow architectures using the requirements for models and modelling techniques that we presented in the previous section. We model the dataflow architectures at a high level of abstraction, such that we do not have to model all hardware details and we are still able to evaluate the performance of the architectures. To model an architecture we have to model its components and the interconnections among the components. Because we are interested in the performance of the architectures, we focus on the timing behaviour of the components. We model the behaviour of the components in terms of delays and synchronisations with the activities of other components. We use a modular approach by constructing Petri net models of basic building blocks of dataflow architectures. We use these building blocks to identify the strengths and weaknesses of Petri nets as a modelling tool for performance evaluation. Our aim is to construct a model with a high level of genericity that can be used to simulate different instances from a class of architectures. It must be possible to configure the model with the information of a video algorithm and a mapping such that we can execute the algorithm on our model of the architecture and analyse its performance by measuring a set of metrics.
Architecture
The architecture that is the subject of our study is the Prophid dataflow architecture as proposed by Leijten et al. [1997], which is being developed at the Philips Research Laboratories. A simplified structure of an architecture belonging to the class of Prophid architectures is shown in Figure 2.
An architecture of this class consists of a number of co-processors that autonomously process streams of video samples. The video processing algorithms that will be executed on this architecture are dynamic dataflow graphs that consist of coarse grained tasks, such as noise reduction and sample rate conversion. Tasks in the dataflow graphs are mapped onto the co-processors of the architecture. Multiple tasks may be mapped onto the same co-processor. Tasks that are mapped onto the same co-processor are executed multiplexed in time. In dynamic dataflow, tasks can consume tokens in dynamic, variable patterns from the input edges. The number of tokens that are produced on the outputs may also vary dynamically. This results in dynamic behaviour in the co-processors for consuming samples from the input FIFO buffers and producing samples to the output FIFO buffers. The co-processors can have a computational pipeline and can have dynamic, data-dependent computation times. The co-processors communicate via input and output FIFO buffers connected with a communication structure. The communication structure consists of routers, a global controller and a switch matrix. The video sample streams are organised in sample packets. The head of a packet contains routing information and the rest of a packet contains the actual video data (samples). The routers communicate sample packets via the switch matrix from their output buffer to an input buffer of the co-processor that will perform the next task in the dataflow graph. The routers interact with the global controller to obtain the destination information.
Hardware modelling with Petri Nets
The architectures will be realised in synchronous hardware. In hardware, operations can be executed in parallel. Parallel operations can be modelled by Petri net transitions that can fire in parallel or by transitions that can fire instantaneously in a sequence. The performance numbers that will be produced by our models must be independent of possible non-determinism in the execution or evaluation of the model. Therefore we have to make sure that different possibilities for firing a sequence of transitions that are enabled simultaneously will produce the same marking and result in the same performance numbers.
We use the timed coloured Petri net model with time stamps attached to tokens and delays associated with token production [Jensen, 1992; van Hee, 1994] . This form of delay is a natural choice to model dataflow architectures where the timing behaviour is determined by the delays that are involved with communication and processing of video samples. We assume in our Petri net model that the order in which tokens are placed in the places is preserved. This gives the places a FIFO property which can be exploited to model memory elements with a FIFO property.
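As an illustration of this token model, the following minimal Python sketch attaches a time stamp to each token and lets a place release tokens in arrival order. The names `Token`, `Place`, `put`, `head_available` and `take` are hypothetical and do not come from ExSpect; the sketch only assumes that a delay is added to the current time when a token is produced and that only the oldest token in a place is visible to consumers.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    colour: object   # value ("colour") carried by the token
    timestamp: int   # earliest model time at which the token may be consumed


class Place:
    """A place that keeps its tokens in arrival order (assumed FIFO property)."""

    def __init__(self):
        self.tokens = deque()

    def put(self, colour, now, delay=0):
        # The delay is associated with the production of the token.
        self.tokens.append(Token(colour, now + delay))

    def head_available(self, now):
        # Only the oldest token is visible; it is usable once its time stamp is reached.
        return bool(self.tokens) and self.tokens[0].timestamp <= now

    def take(self, now):
        if not self.head_available(now):
            raise RuntimeError("no token available at this time")
        return self.tokens.popleft()
```

A token produced at time 0 with a delay of one time unit cannot be taken at time 0, but becomes available at time 1, and tokens leave the place in the order in which they were put.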
We have identified a set of basic hardware constructs that can be seen as building blocks for the dataflow architectures of interest to us. These building blocks are:
(a) communication between 2 components (via a register),
(b) communication via a FIFO buffer,
(c) pipelined computation,
(d) merging multiple streams of samples (multiple producers, one consumer), and
(e) splitting one stream of samples into multiple streams (one producer, multiple consumers).
We discuss each construct in the following subsections and show how Petri nets can be used to model the hardware constructs. Each of the sections starts with a description of the hardware construct and a description of the model, followed by a discussion of the strengths and weaknesses of Petri nets for modelling that particular hardware construct at an abstract level. With the Petri net models for the basic hardware constructs we can assemble architecture models in a modular manner by either linking building block models or by combining several building blocks to create more complex constructs.
Communication
Hardware description. This basic construct is shown in Figure 4(a).
Model description. We can model this construct with the timed coloured Petri net shown in Figure 4(b). The components A and B are modelled with Petri net transitions. The registers R1 and R2 are modelled with Petri net places. The values of the registers are modelled with "coloured" tokens that have integer values. We need time in our Petri net to model the delays in the hardware for the read and write actions from and to the registers. To ensure that transitions A and B cannot fire at any given time, but only once per clock cycle, we use the self-loops containing the places value and idle to enable or disable these transitions. The delays given to the tokens in these places represent the time that a component is busy with its operation.
Discussion. We can conclude that concurrent read and write operations in hardware can be modelled with a timed Petri net where tokens have a time stamp. The order in which transitions fire is determined by the availability of tokens. We can use the tokens that model the values that are communicated between the hardware components to impose an order on the firing of transitions. Delays for communication are modelled by assigning delays to the tokens that model the communicated data. Delays for computation times are modelled by assigning delays to the tokens that represent whether a component is busy or idle.
FIFO buffer
Hardware description. The FIFO buffer construct is shown in Figure 5(a).
Model description. We can model the FIFO buffer with two Petri net places, one to represent the content of the buffer and one to represent the free buffer space. The Petri net model is shown in Figure 5(b). The delay of one clock cycle in the hardware when a value is written to the buffer is modelled by giving the token that transition A places in place FIFO a delay of one time unit. This also applies to the tokens that B places in free and R. The tokens in the self-loops containing the places idle and value determine the rate at which tokens can be consumed from the buffer and placed in the buffer.
Discussion. Because the order of the tokens in place FIFO with respect to their arrival time is preserved in the timed Petri net model with time stamps, we can model the FIFO buffer in such a compact way.
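The two-place FIFO model can be sketched as a small clock-stepped simulation. This is a simplified illustration under stated assumptions, not the ExSpect model: the function name `simulate_fifo` and its parameters are hypothetical, the self-loops with idle and value are approximated by fixed firing periods for A and B, and every produced token gets a delay of one time unit.

```python
from collections import deque


def simulate_fifo(capacity, n_samples, producer_period=1, consumer_period=2,
                  horizon=200):
    """Return the (sample, time) pairs consumed by B and the largest fill seen."""
    fifo = deque()                  # place FIFO: (sample, time stamp) tokens
    free = deque([0] * capacity)    # place free: time stamps of free-space tokens
    consumed, max_fill, next_sample = [], 0, 0
    for t in range(horizon):
        # Transition A needs an available free-space token; the written
        # sample becomes visible one time unit later.
        if (next_sample < n_samples and t % producer_period == 0
                and free and free[0] <= t):
            free.popleft()
            fifo.append((next_sample, t + 1))
            next_sample += 1
        # Transition B needs an available content token; the freed slot
        # becomes usable one time unit later.
        if t % consumer_period == 0 and fifo and fifo[0][1] <= t:
            sample, _ = fifo.popleft()
            free.append(t + 1)
            consumed.append((sample, t))
        # Tokens are conserved: content plus free space equals the capacity.
        assert len(fifo) + len(free) == capacity
        max_fill = max(max_fill, len(fifo))
    return consumed, max_fill
```

With a producer that is faster than the consumer, the free-space place throttles A once the buffer is full, and B still receives the samples in the order in which A wrote them.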
Pipeline
Hardware description. This construct, shown in Figure 6(a), contains a component B that performs a pipelined computation. The output that it produces depends on a number of consecutive input values produced by component A. The pipeline is blocking for consumption of input data if it is full and it is blocking for production of output data if it is not full. Each clock cycle component B tries to perform its operation.
Model description. We model the pipeline with a Petri net place pipeline, similar to our model of the FIFO buffer (section 5.2). The tokens that represent the data accumulate in this place. We model component B with two transitions, B-consume and B-produce, for adding and removing tokens from the pipeline respectively. If the pipeline is not full, B-consume consumes a token from the input place and places it in the pipeline place. The free spaces in the pipeline are modelled by tokens in a place free, just as in our model of the FIFO buffer. If the pipeline is full, B-produce removes a token from the pipeline and produces an output token. However, we cannot use the pipeline place alone to enable transition B-produce when the pipeline is full, because B-produce does not know how many tokens reside in the pipeline place. To model the "block on not full" property of B, we need an extra place, N, with a token that indicates the number of tokens in pipeline. To enable B-produce only when the pipeline is full, we give B-produce the precondition N = pipeline size.
Discussion. We can model a pipeline in a similar manner as a FIFO buffer. The FIFO property of places in the timed Petri net model enables us to use a place to model the content of the pipeline. The extra place is needed to provide the transition that removes tokens from the pipeline with information about the number of tokens in the pipeline.
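The role of the counter place N can be made concrete with a short untimed sketch. The function `run_pipeline` and its variable names are hypothetical; the sketch ignores delays, assumes a pipeline size of at least one, and simply emits the oldest token as the output token, since the model is uninterpreted.

```python
from collections import deque


def run_pipeline(inputs, pipeline_size):
    """Fire B-consume and B-produce until neither is enabled (untimed sketch)."""
    pending = deque(inputs)   # tokens waiting in the input place
    pipeline = deque()        # place pipeline: data tokens in arrival order
    free = pipeline_size      # number of tokens in place free
    n = 0                     # colour of the single token in place N
    outputs = []
    while pending or n == pipeline_size:
        if n == pipeline_size:
            # B-produce: enabled only by the precondition N = pipeline size.
            outputs.append(pipeline.popleft())
            free += 1
            n -= 1
        elif free > 0:
            # B-consume: enabled while a free-space token is available.
            pipeline.append(pending.popleft())
            free -= 1
            n += 1
    return outputs, list(pipeline)
```

For five input tokens and a pipeline of size three, the first output appears only after three tokens have accumulated, and two tokens remain in the pipeline when the input is exhausted.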
Multiple producers
Hardware description. This construct, shown in Figure 7(a), consists of a component B that merges two streams of data coming from components A1 and A2. These streams may have different rates. Component B reads values from its input channels according to its consumption pattern.
Model description. The rate differences with multiple inputs cause some difficulty for a model in Petri nets. Because a Petri net transition can fire only if all of its input places contain at least one token, we cannot use a transition with multiple input places when we want to model a component that can read either one or two input values. A way to model this variable consumption behaviour is to create different transitions for each possible consumption pattern and a token that enables exactly one of these transitions. In our construct with two producers, we have three transitions as shown in Figure 7(b). Transition B models consumption of one token from component A1, B' models consumption of two tokens, one from A1 and one from A2, and B" models consumption of one token from A2. The place consumption-pattern contains one token and is coloured to enable exactly one of the transitions B, B' or B", which have preconditions to check the colour of this token. The consumption pattern can be represented as a list that is stored in the token in consumption-pattern. When the transitions B, B' and B" consume a token, they place it back with the head of the list moved to the end, to enable the transition that is next in the consumption pattern.
Discussion. The variable token consumption cannot be modelled in a straightforward manner, because we cannot define firing rules for the Petri net transitions like we can for hardware components. In Petri nets it is not possible to define a transition that consumes a variable number of tokens from its input places or that fires when not all input places contain tokens. A hardware component, on the other hand, can consume tokens even if data is not available on all inputs.
To model these kinds of variable rates, we had to split up the consumption and use consumption pattern tokens that indicate what token should be consumed next.
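The consumption-pattern token can be illustrated with the following untimed sketch. The function `merge_streams` and the pattern entries "A1", "A2" and "both" are hypothetical encodings chosen for this example; the pattern list is assumed to be non-empty, and the names B, B' and B'' in the result stand for the three transitions of Figure 7(b).

```python
from collections import deque


def merge_streams(a1, a2, pattern):
    """Fire B, B' or B'' as selected by the rotating consumption pattern."""
    p1, p2 = deque(a1), deque(a2)   # input places fed by A1 and A2
    pat = deque(pattern)            # colour of the consumption-pattern token
    fired = []
    while True:
        head = pat[0]
        if head == "A1" and p1:               # transition B
            fired.append(("B", (p1.popleft(),)))
        elif head == "both" and p1 and p2:    # transition B'
            fired.append(("B'", (p1.popleft(), p2.popleft())))
        elif head == "A2" and p2:             # transition B''
            fired.append(("B''", (p2.popleft(),)))
        else:
            break   # the one transition selected by the pattern is not enabled
        pat.rotate(-1)   # place the token back with the head moved to the end
    return fired
```

Exactly one transition is a candidate at any moment, so the net never has to choose between consuming one or two tokens; the choice is carried entirely by the colour of the pattern token.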
Folding
We can map the components A1 and A2 onto a single Petri net and apply token colouring to keep the tokens that are produced by components A1 and A2 separated. The resulting model is shown in Figure 7(c). We call this technique of modelling multiple hardware components of the same type by one single Petri net folding. This makes the model more compact and easily configurable for a variable number of hardware components, possibly with different parameters such as firing rates. Essential for this approach is the fact that transitions fire instantaneously. This makes it possible to model multiple instances of a component with one transition. In our model of the hardware construct with multiple producers the folding technique reduces the number of transitions needed to model the possible consumption patterns. However, the delay of placing back the consumption-pattern token is different in the folded model. If component B consumes two input values simultaneously, transition B of the folded model consumes two tokens of different colour sequentially but instantaneously.
Multiple consumers
Hardware description. This construct is shown in Figure 8(a).
Model description. The Petri net model of this hardware construct is shown in Figure 8(b). The value that component A writes to register R is inspected by more than one component. In Petri nets a value is read from a place by consuming the token and thereby removing the token from the place. Therefore the value can only be read once, unless it is placed back into the place from which it was consumed. An alternative approach is to let transition A produce a number of identical tokens such that there are enough tokens for all consumptions from the transitions Bi. This requires that transition A has information about the number of consuming transitions and their token consumption behaviour. Since we want to model the component A such that it is independent of the number of consuming components and the rate at which they consume tokens, we consider this approach unacceptable. This means that we choose to let each transition that models access to register R place the token back into R to allow other transitions to read the value of this token as well. A problem arises when transitions A and Bi are enabled simultaneously. When they fire, they will consume the token from place R and place it back. The transitions Bi will leave the value of the token unchanged, but transition A will change its value. Therefore the order in which the transitions fire is of importance. If we do not impose restrictions on the order in which transitions A and Bi can fire in our Petri net, our model will be nondeterministic. This problem can be solved by extending the Petri net model with priorities associated with the transitions. By giving transition A a higher priority than transitions Bi, transition A will always fire first. As a result of this, all transitions Bi will read the new value of the token in place R.
We can apply the folding technique to this hardware construct. The folded model is shown in Figure 8(c) .
Discussion. This model shows that we have to use a method of placing back tokens when multiple components read the same input, because token consumption is destructive in Petri nets. A result of placing back tokens is that we can no longer use the availability of these tokens to correctly model the timing and order of the communication between components. Synchronous hardware has two phases in each clock cycle to perform its operations. In the first phase input data is read and computations are carried out. In the second phase all registers and signals are updated with their new values. This ensures that the input data is stable when the output data is being computed.
In Petri nets computation of new output results and assignment of the results to places is not separated into phases. When a Petri net transition fires, it computes the output values and produces the results (possibly with a delay). When more than one transition is enabled at the same time, it is possible that each of the enabled transitions performs its operations on a different state of input values. Therefore we must impose explicit restrictions in our model such that transitions fire in the correct order. This can be done by using priorities for the transitions or using appropriate delays such that transitions will be enabled in the correct order.
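The effect of the firing order can be shown with a small sketch in which A and the readers Bi fire once at the same model time. The function `fire_simultaneously` and the numeric `priority` mapping are hypothetical; the sketch assumes that enabled transitions fire one after another in descending priority and that every transition places the token back into R.

```python
def fire_simultaneously(old_value, new_value, reader_names, priority):
    """Fire A and every reader once at the same model time.

    Transitions fire sequentially in descending priority; each one consumes
    the token from place R and places a token back.
    """
    names = ["A"] + list(reader_names)
    order = sorted(names, key=lambda name: -priority[name])  # stable for ties
    token_in_r = old_value
    seen = {}
    for name in order:
        token = token_in_r        # consume the token from R
        if name == "A":
            token = new_value     # A changes the value of the token
        else:
            seen[name] = token    # a reader only inspects the value
        token_in_r = token        # place the token back into R
    return token_in_r, seen
```

When A has the highest priority every reader observes the new value. If one reader is allowed to fire before A and another after it, the two readers observe different states of R within the same instant, which is exactly the situation that cannot occur in two-phase synchronous hardware.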
Petri net model of the architecture
We have chosen to model the architecture at an abstract token level, that is, we define uninterpreted models for performance analysis. Tokens in our model represent samples of a video signal, but they do not contain the values of the samples. We do not model the functionality of the co-processors, but take only timing behaviour into consideration. This allows us to use our model for performance analysis. The accuracy of our performance analysis depends on the accuracy of the timing behaviour in our model. Because we do not model the functionality of the co-processors, we cannot cope with data dependent behaviour of co-processors. Data dependent behaviour must be approximated in our model with stochastic delays for the computation times and variable or stochastically determined consumption patterns.
We have used the structure of the Prophid architecture, shown in Figure 2, in our Petri net model of the architecture. This makes the model easy to understand because the flow of the tokens in the Petri net corresponds to a great extent with the flow of data in the architecture. We can model the components of the architecture using the models of the constructs presented in the previous section. Because we want to be able to change the number of co-processors and the number of input and output buffers for each co-processor without having to make a lot of changes in our model, we designed the model such that it provides this flexibility. We mapped all components of the same type onto one component that models them on separate levels. We call this the folded model of the architecture, shown in Figure 9(a). The Petri net model of the architecture is shown in Figure 9(b). We use token colouring in our Petri net to separate the tokens of the different "levels" of our folded model. The label of a token indicates the location of the corresponding video sample in the architecture. We use timed Petri nets with delays associated with the production of tokens. The firing of a transition is instantaneous. This makes it possible to let one transition model different instances of a set of components, such as the input buffers. This approach keeps the Petri net model compact, and every possible instance of the class of Prophid architectures is captured with this Petri net. A configuration of the architecture with a certain number of co-processors and buffers is instantiated in the model with a set of token colours and an initial marking that represents the initial hardware state. For example, the set of buffers and their sizes in a configuration is determined by the number and colour of initial tokens in the free-space places of the buffers.
We applied the building block techniques that we presented in the previous section to construct the model of the architecture. The input and output buffers are modelled by two places representing buffer content and free space according to the FIFO buffer model. The co-processors consume video samples from a number of input buffers using some consumption pattern and have an internal pipeline for the computation. Therefore the consume transition that models consumption of video samples by the co-processors is a combination of the consuming transition of the FIFO buffer building block (Figure 5(b)), the consuming transition of the multiple producer building block with its consumption-pattern place (Figure 7(b)) and the consuming transitions of the pipeline building block (Figure 6(b)). This also applies to the produce transition that models the production of video samples by the co-processors. Here the producing transitions from the same three building block models have been combined.
There are two places with multiple consumers in our Petri net model that require tokens to be placed back after they have been consumed. These involve the tokens residing in the place current-workspace (in local-controller) and the place enabled (in global-controller).
Here we have to explicitly impose an order in the firing of transitions that consume a token from one of these places. This can be done by giving one transition priority over another.
The exact functionality of the global controller and local controller is beyond the scope of this paper. Therefore, we will not go into detail about the modelling of these components.
We measure the performance metrics of the architecture by extending the Petri net with transitions that collect tokens that mark an event in the architecture during execution of the application. We let the transitions that model the architecture, such as consume, produce a token each time a video sample token is consumed. These event tokens are translated to the appropriate value for the metric. For example, event tokens from the consume transition are translated into values for interarrival times and utilisation of the co-processor.
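How event tokens might be translated into metric values can be sketched as follows. The exact translation used in the model is not described here, so this is only an assumed scheme: each event token is taken to carry the time of a consume firing and the number of cycles the co-processor is busy afterwards, busy intervals are assumed not to overlap, and the function names are hypothetical.

```python
def interarrival_times(event_times):
    """Differences between the time stamps of consecutive event tokens."""
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


def utilisation(events, window_start, window_end):
    """Fraction of the window during which the co-processor is busy.

    events: iterable of (start_time, busy_duration) pairs, assumed not to overlap.
    """
    busy = 0
    for start, duration in events:
        lo = max(start, window_start)
        hi = min(start + duration, window_end)
        busy += max(0, hi - lo)   # count only the part inside the window
    return busy / (window_end - window_start)
```

For instance, three events at times 0, 5 and 10 that each keep the co-processor busy for two cycles give interarrival times of five cycles and a utilisation of 6/15 over the window from 0 to 15.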
Simulations
We used our model to simulate an instance of the Prophid architecture class with a benchmark application mapped onto it that imposes a realistic workload on the architecture. The simulations were carried out using ExSpect [ASPT, 1994] . The architecture instance consists of 12 co-processors. The benchmark application is a video signal processing algorithm in the form of a dataflow graph consisting of 17 tasks, shown in a simplified form in Figure 10 . The algorithm combines two video signals into a multi-window video signal. Two different input sources are used for the streams of video samples. A number of tasks are performed on the streams of video samples such as resizing and quality enhancement of the image before the two streams are combined into one output stream.
The mapping that we used in the simulations is generated by a mapping tool. Different mappings are possible. The influence of different mappings on the performance can be evaluated by exercising the feedback loop to the mapping in the Y-chart (Figure 1) .
Fig. 10. Benchmark application graph.
The input and output signals consist of full size video frames. The video signals have a pattern of video data and blanking periods in between video frames. We ran a simulation for 4 video frames. The simulation time was approximately one day per video frame. This corresponds to nearly 1500 transition firings per second. We measured the following performance metrics:
• utilisation of the co-processors and routers,
• buffer contents (average contents over time and relative frequency of different buffer fillings),
• throughput rates of the sample streams in the co-processors,
• response time for the global controller, and
• error rates for a number of real-time constraints.
Figure 11 shows some of the results of our simulations. In the measurements of the utilisation, shown in Figure 11(a), we can see the frame pattern of the sample stream. This graph shows the percentage of the total available processing time that one particular task of the dataflow graph requires from the co-processor onto which it has been mapped. In this case we see that it utilises 40% of the time during the active video frames and at the beginning of each frame it peaks at 50%. Figure 11(b) shows the average usage of an input buffer as a function of time. We see that during the active video frames the buffer of size 64 contains on average 48 samples. The plotted values are averages over intervals of 8192 clock cycles (2 video lines) in order to reduce the amount of measurement data. A disadvantage of this is that it evens out peaks in the measurement data. That is why we also measure the distribution of the number of times that each buffer usage occurs during the simulation. This is plotted in Figure 11(c). It shows that the same buffer, after a run-in period, always contains between 32 and 64 samples. These simulation results were used to verify that this co-processor never has underflow on its input buffer, which was one of the real-time constraints.
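The two views of the buffer measurements, interval averages and the distribution of fill levels, can be sketched as below. The function names and the per-cycle trace format are assumptions made for this example; only the interval length of 8192 clock cycles is taken from the text.

```python
from collections import Counter


def window_averages(fill_per_cycle, window=8192):
    """Average buffer fill over consecutive intervals of `window` clock cycles."""
    chunks = (fill_per_cycle[i:i + window]
              for i in range(0, len(fill_per_cycle), window))
    return [sum(chunk) / len(chunk) for chunk in chunks]


def fill_distribution(fill_per_cycle):
    """Relative frequency of each buffer fill level over the whole trace."""
    counts = Counter(fill_per_cycle)
    total = len(fill_per_cycle)
    return {level: count / total for level, count in sorted(counts.items())}
```

A trace that alternates between 32 and 64 samples averages to 48 in every interval, so the interval averages alone would hide both extremes, while the distribution still shows that only the levels 32 and 64 occur.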
Conclusions
We have proposed a method to use timed coloured Petri nets to model dataflow architectures at a high level of abstraction. We identified five basic hardware building blocks that can be used to construct models of dataflow architectures. These five building blocks are sufficient to model the class of dataflow architectures of our interest. We used Petri net models of these building blocks to assess the value of Petri nets for modelling dataflow architectures.
We draw our conclusions on the value of Petri nets as a modelling tool for performance analysis of dataflow architectures based on the criteria that we presented in Section 2.
Executability. Petri nets are executable, which enables us to simulate the modelled architecture. With simulations we can measure metrics that quantify the performance of the architecture under a workload that is imposed by a benchmark application and the streams of input data. We consider the availability of tools such as ExSpect and their ease of use to be important issues for this type of research.
Ease of modelling. Petri nets facilitate the modelling of parallelism in dataflow architectures. The timing behaviour of the hardware can be modelled with delays associated with tokens in the Petri net model. There is, however, a difference between the communication among parallel hardware components and that among Petri net transitions that can fire simultaneously, and this deserves attention during modelling. In hardware, the computation of output data and the updating of registers are separated into two phases. In Petri nets they are not separated, so the state of the modelled hardware can change after each firing. A sequence of transitions that fire at the same moment in time to model parallel operations can therefore see a different state at each firing. In hardware such intermediate state changes cannot occur. In some cases this requires additional Petri net constructs that explicitly impose a firing order on the transitions in order to model the hardware behaviour correctly. Petri nets offer expressive ways to model components such as FIFO buffers and pipelines. These components can be modelled very compactly with a few Petri net places. The consumption and production patterns of co-processors in a dynamic dataflow architecture cannot be modelled with transitions that have the same firing rules as the co-processor. The patterns must be split up and modelled separately.
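The difference between the two-phase hardware update and a sequence of firings at the same time instant can be shown with a toy example. The sketch below assumes two hypothetical registers, a and b, that each load the other's value in one clock cycle. It is an illustration only and is not taken from the Prophid model.

```python
def two_phase_step(state):
    """Hardware-style clock cycle: compute all outputs first, then update registers."""
    # Phase 1 reads only the old values; phase 2 commits them all at once.
    return {"a": state["b"], "b": state["a"]}

def sequential_firings(state, order):
    """Firings at the same time instant: each one sees the state left by the previous one."""
    state = dict(state)
    for reg in order:
        source = "b" if reg == "a" else "a"
        state[reg] = state[source]  # may read a value that was already overwritten
    return state

start = {"a": 1, "b": 2}
print(two_phase_step(start))                   # the registers exchange their values
print(sequential_firings(start, ["a", "b"]))   # second firing sees an intermediate state
print(sequential_firings(start, ["b", "a"]))   # the result depends on the firing order
```

In this sketch the sequential version never reproduces the exchange, whichever order is chosen. This is the kind of situation in which extra constructs, for instance places that hold the old values until all firings of the cycle have completed, are needed to recover the hardware behaviour.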
Configurability. The model that we constructed for a class of dataflow architectures can be configured to represent different instances of this class. We applied a folding technique to model a set of hardware components of the same type with one Petri net. This has the advantage that our model of the architecture can be configured without changing the Petri net. It is configured by defining the initial marking of the Petri net. By defining the initial marking we can also configure the model to execute a particular dataflow graph and mapping. This permits us to explore different points in the design space.
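The idea of folding can be sketched in ordinary code: one processing rule serves every component, and the set of components and their parameters comes from the initial marking alone. The example below is a simplified analogy, not the ExSpect model. The component identifiers, the per-item delays, and the function name are hypothetical.

```python
def simulate(initial_marking, work_items):
    """Apply one folded 'process' rule to every component named in the marking.

    initial_marking: {component_id: delay_per_item}, i.e. token colour -> configuration
    work_items:      list of (component_id, arrival_time)
    Returns the completion time of the last item handled by each component.
    """
    free_at = {cid: 0 for cid in initial_marking}    # one 'idle' token per component
    finished = {}
    for cid, arrival in sorted(work_items, key=lambda item: item[1]):
        start = max(arrival, free_at[cid])           # wait until that component is idle
        free_at[cid] = start + initial_marking[cid]  # same rule, colour-specific delay
        finished[cid] = free_at[cid]
    return finished

# Two configurations of the same model; only the marking differs.
small = simulate({"cp0": 2, "cp1": 5},
                 [("cp0", 0), ("cp0", 1), ("cp1", 0)])
large = simulate({"cp0": 2, "cp1": 5, "cp2": 1},
                 [("cp0", 0), ("cp0", 1), ("cp1", 0), ("cp2", 3)])
print(small)
print(large)
```

Adding a third component requires no change to the processing rule, which mirrors how the folded Petri net is reconfigured through its initial marking.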
Efficiency. The main drawback of our approach is the efficiency of the simulations. Experiments showed that the efficiency was not very high for realistic applications involving large amounts of video samples. It takes a few days to simulate real-life examples. The efficiency is determined by the way we modelled the architecture and by the speed of the simulation engine. The simulation engine that we used interprets object code of the Petri net model. Higher simulation speed can be achieved with an executable program of the Petri net model that is specified in C++. This is currently under development.
Accuracy. We chose to model dataflow architectures at a high level of abstraction with only the timing and communication behaviour. The Petri net model does not contain the functionality of the co-processors or the exact hardware implementation of the components. Therefore our model is as accurate as the timing model that we use for the architecture. This involves assumptions about the computation time and stochastic approximations for possible data-dependent behaviour. Given these abstractions in our model, the Petri net models produce accurate, cycle-based performance numbers. A specification of the Petri net model in C++ will make it easier to model the functionality of the co-processors. This will allow us to analyse data-dependent behaviour in the applications more accurately than by using stochastic approximations.
