Timing analysis model for network based multiprocessor
systems
Arno Moonen1,2, Marco Bekooij2 and Jef van Meerbergen1,2
1Eindhoven University of Technology
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
2Philips Research Laboratories Eindhoven
WDC31, Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands
{Arno.Moonen,Marco.Bekooij,Jef.van.Meerbergen}@philips.com
Abstract: In this paper an embedded multiprocessor system on top of a
network on chip is proposed which is amenable to timing analysis. This
multiprocessor system is intended for multimedia applications that
process data streams. The temporal behavior of applications executed
on this multiprocessor system is derived with a Synchronous Data Flow
(SDF) graph in which computation, communication, buffer sizes, and
arbitration are modeled. This graph can be transformed into an event
graph, which is a special case of a Petri net, from which properties
like the minimal throughput can be derived with results of MaxPlus
Linear System Theory [1]. Our main contribution in this paper is an
SDF model of the network in which an arbiter is applied which allows
the transfer of a possibly varying but bounded number of words per
period.
Keywords— Embedded Systems, Multi-processor, Pre-
dictable temporal behavior, Network-on-chip
I. INTRODUCTION
Consumers have high expectations about the quality and robustness of
multimedia systems like DVD players, set-top boxes, and television
sets. The applications executed by these multimedia systems require a
computational performance in the order of 10-100 GOps. Such a
performance can be delivered by embedded programmable multiprocessor
systems. In order to guarantee a certain quality level it is necessary
that these multiprocessor systems exhibit a predictable temporal
behavior. The communication between processors can be made predictable
by making use of a lossless packet switched network that supports
connections with a guaranteed throughput service. Connections with a
guaranteed throughput service provide a guaranteed bandwidth and a
bounded delay and jitter. Given these guaranteed throughput
connections it is shown in this paper that the worst-case temporal
behavior of the system can be derived by making use of an SDF model in
which the computation, communication, arbitration and storage are
modeled. A comparison with related network performance analysis models
is presented in the next section before our SDF model is presented in
subsequent sections.
II. RELATED WORK
An overview of service disciplines for packet-switched
computer networks with a guaranteed performance is
given in [2]. The applied service discipline determines
when packets are transferred from one speciﬁc input of a
router in the network to one speciﬁc output of the router.
The maximal delay and jitter of a connection are derived given the
applied service disciplines in the network and a characterization of
the traffic at the inputs of the network. For so-called
work-conserving service disciplines, derivation of the maximal delay
requires that the traffic on the boundary as well as inside the
network can be characterized. Characterization of traffic in the
network is difficult and may not always be possible if there are
so-called traffic loops.
In this paper it is shown that the need to characterize traffic can be
eliminated by making use of flow control between every producer and
consumer of data in the system. Examples of producers and consumers in
the system are processors and routers. It will also be shown that due
to the flow control the network and the system cannot become unstable
even if there are traffic loops. The worst-case arrival time of data
at any point in the system is derived from an SDF model of the system.
This SDF model includes an SDF model of a network connection. The SDF
model of a network connection accurately captures the worst-case
temporal behavior even though a variable number of words may be
transferred per period.
In this paper the term arbitration policy is used instead of service
discipline because all local arbitration decisions for links in the
network as well as processors and memory ports are handled in a
uniform way.
III. OUTLINE
For the sake of clarity, new concepts and elements are introduced step
by step in this paper. The organization of this paper is as follows.
First the SDF graph and its
properties are described in section IV. Then, in section V,
an SDF graph is used to derive the temporal behavior of a
multiprocessor system which executes two software tasks
which communicate via a point to point connection. In
section VI this point to point connection is replaced by
a network with shared interconnect. It is assumed that
TDMA arbitration is applied inside the network and that
each period it is guaranteed that up to N words of data can
be transferred. For this type of arbitration an SDF model
is introduced and a proof of its correctness is presented in
section VII. It is a surprising result that an SDF model of such an
arbiter can be derived, because the amount of data transferred per
period does not need to be constant. Given
this SDF model of the arbiter, the temporal behavior of a
system with a network is derived. In section VIII, a net-
work is introduced in which ﬂow control via a guaranteed
throughput connection in the opposite direction is applied.
In the same section, an SDF model is presented which cap-
tures this type of ﬂow control. Finally in section IX a mul-
tiprocessor system is described which allows the execution
of multiple tasks per processor. Each of these tasks might
produce a substantial amount of data. The data produced
by a task is stored in its local memory after which it is
copied by a Communication Assist (CA) in a small but
fast FIFO at the input of the network. The worst-case tem-
poral behavior of the system is derived with an SDF model
which includes a model of a network connection. This net-
work model is applied in section X in the SDF model of a
system in which multiple software tasks are executed on a
processor.
IV. SYNCHRONOUS DATA FLOW GRAPH PROPERTIES
Some useful properties of SDF graphs are presented in this section.
These SDF graphs are used in this paper to derive the worst-case
temporal behavior of a multiprocessor system which includes a
packet-switched network.
First of all, an SDF graph is deﬁned as follows:
Definition 1 (Synchronous Data Flow Graph.) The tuple
(V,E,d,Tv,O,I) defines a Synchronous Data Flow (SDF) graph, where
• V is the set of nodes (actors),
• E ⊆ V × V is the set of edges,
• duv : E → N is a function describing the number of
initial tokens on an edge (u,v) ∈ E,
• Tv : V → R+ is a function describing the worst-case
execution time of actor v ∈ V .
• Ouv : E → N is a function describing the number of
tokens produced on edge (u,v) ∈ E by actor u.
• Iuv : E → N is a function describing the number of
tokens consumed from edge (u,v) ∈ E by actor v.
An arbitrary SDF graph is depicted in ﬁgure 1. The
nodes in an SDF graph are called actors. Actors have a
well deﬁned input/output behavior and a worst-case exe-
cution time. Actors produce and consume tokens. A token
is a container in which a ﬁxed amount of data can be stored
and is depicted in figure 1 as a black dot. If more than one token is
present on an edge then the number of tokens (duv) is specified next
to the dot. An actor has a worst-case execution time which is denoted
by TAx in figure 1. An actor can fire (start its execution) once every
incoming edge holds at least the number of tokens specified at the
head of that edge. The specified number of tokens is consumed from the
input edges of the actor before the execution of the actor finishes,
that is within the worst-case execution time of the actor. The
number at the tail of an edge denotes the number of to-
kens an actor produces before the execution of the actor
ﬁnishes. Actors with internal state are modeled in an SDF
with a self edge, like the self edge of actor A4 in ﬁgure 1.
This self edge is given one initial token such that the next
execution cannot start before the previous execution of the
actor is ﬁnished.
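The firing rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the two-actor graph, the dictionary layout and the function names `can_fire` and `fire` are assumptions made for this example and are not taken from figure 1.

```python
# Minimal SDF firing-rule sketch. The graph below is hypothetical and is
# not the graph of figure 1.

def can_fire(actor, edges):
    """True if every incoming edge of `actor` holds enough tokens."""
    return all(e["tokens"] >= e["cons"]
               for (src, dst), e in edges.items() if dst == actor)

def fire(actor, edges):
    """Consume tokens from all inputs, then produce tokens on all outputs."""
    if not can_fire(actor, edges):
        raise RuntimeError(f"actor {actor} is not enabled")
    for (src, dst), e in edges.items():
        if dst == actor:
            e["tokens"] -= e["cons"]
    for (src, dst), e in edges.items():
        if src == actor:
            e["tokens"] += e["prod"]

# Edge (u, v): u produces `prod` tokens per firing, v consumes `cons`.
edges = {
    ("A", "B"): {"prod": 2, "cons": 3, "tokens": 0},
    # Self edge with one initial token: models an actor with internal state.
    ("B", "B"): {"prod": 1, "cons": 1, "tokens": 1},
}

fire("A", edges)                          # A has no inputs: always enabled
enabled_after_one = can_fire("B", edges)  # 2 tokens available, 3 needed
fire("A", edges)
enabled_after_two = can_fire("B", edges)  # 4 tokens available, 3 needed
fire("B", edges)                          # leaves 1 token on edge (A, B)
```

Timing is deliberately left out of this sketch; it only shows when an actor is enabled and how token counts change per firing.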
Fig. 1. A Synchronous Data Flow (SDF) graph example. (Figure
annotations: actors A1, A2, A3 and A4, each with a worst-case
execution time of 1 ms.)
An SDF graph can be transformed into a Homogeneous
Synchronous Data Flow (HSDF) graph on which analy-
sis is performed. An algorithm which transforms any SDF
graph into an HSDF graph is described in [3]. An HSDF
graph is a special case of an SDF graph in which the exe-
cution of an actor results in the consumption of one token
from every incoming edge of the actor and the production
of one token on every outgoing edge of the actor.
An HSDF graph can be executed in a self-timed manner, which is defined
as a sequence of firings of HSDF actors in which the actors start
immediately when there is at least one token on each input of the
actor. If the HSDF graph is strongly connected and a FIFO ordering is
maintained for the tokens, then the self-timed execution of the HSDF
graph has some important properties. A FIFO ordering is maintained if
the completion events of firings of a specific actor occur in the same
order as the corresponding start events. This is the case if an actor
has a constant execution time or belongs to a cycle in the HSDF graph
with only one token. The properties of the self-timed execution of
such HSDF graphs are derived in [1] with MaxPlus algebra.
First of all, the most important property of the self-timed execution
of an HSDF graph is that it is deadlock-free if there is at least one
initial token on every cycle in the HSDF graph. Secondly, the
execution of the HSDF graph is monotonic, i.e. decreasing actor
execution times result in non-increasing actor start times. Third, an
HSDF graph will always enter a periodic regime. More precisely, there
exist a K ∈ N, an N ∈ N and a λ ∈ R, such that for all v ∈ V, k > K
the start time s(v,k + N) of actor v in iteration k + N is described
by:
s(v,k + N) = s(v,k) + λ · N (1)
Equation 1 states that the execution enters a periodic regime after K
executions of an actor in the HSDF graph. The time one period spans is
λ · N. The number of firings of an actor v in one period is denoted by
N. Thus, λ is equal to the inverse of the average throughput measured
over one period.
The Maximum Cycle Mean (MCM) [3] of an HSDF graph, which is equal to
λ, is given by equation 2. In the literature the MCM of an HSDF graph
is also called the maximal cost to time ratio [4]. The Cycle Mean (CM)
of a simple cycle c in the HSDF graph G is given by equation 3. In
this equation d(c) denotes the number of initial tokens on the edges
in cycle c. The Worst Case Execution Time (WCET) of actor v is denoted
by WCET(v). The MCM of an HSDF graph can be derived with a
pseudopolynomial algorithm [5] [4].
$$\mathrm{MCM}(G) = \max_{c \in C_G} \mathrm{CM}(c) \qquad (2)$$
$$\mathrm{CM}(c) = \frac{\sum_{v \text{ on } c} \mathrm{WCET}(v)}{d(c)} \qquad (3)$$
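To make equations 2 and 3 concrete, the sketch below computes the MCM of a small HSDF graph by enumerating its simple cycles. This brute-force enumeration is not the pseudopolynomial algorithm of [5] [4]; it is only suitable for tiny graphs. The three-actor graph, its execution times and the function name `maximum_cycle_mean` are hypothetical.

```python
from fractions import Fraction

def maximum_cycle_mean(wcet, edges):
    """Brute-force MCM: maximum over all simple cycles of
    (sum of WCETs of the actors on the cycle) / (initial tokens on it).

    wcet  : dict actor -> integer worst-case execution time
    edges : list of (src, dst, initial_tokens)
    """
    best = None

    def dfs(start, node, visited, time_sum, tokens):
        nonlocal best
        for (u, v, d) in edges:
            if u != node:
                continue
            if v == start:                      # closed a simple cycle
                total = tokens + d
                if total == 0:
                    raise ValueError("cycle without initial tokens (deadlock)")
                cm = Fraction(time_sum, total)
                if best is None or cm > best:
                    best = cm
            elif v not in visited and v > start:
                # Only visit actors "larger" than the start actor, so each
                # cycle is enumerated once, from its smallest actor.
                dfs(start, v, visited | {v}, time_sum + wcet[v], tokens + d)

    for s in sorted(wcet):
        dfs(s, s, {s}, wcet[s], 0)
    return best

# Hypothetical graph: a ring A -> B -> C -> A with 3 initial tokens,
# plus a self edge on B with 1 initial token.
wcet = {"A": 2, "B": 3, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 3), ("B", "B", 1)]
mcm = maximum_cycle_mean(wcet, edges)
```

In this example the ring has a cycle mean of (2 + 3 + 1) / 3 = 2 and the self edge on B has a cycle mean of 3 / 1 = 3, so the self edge is the critical cycle.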
The worst-case start-times of the actors during the tran-
sition state as well as the steady state can be derived by
simulation. During this simulation, all actors must have an
execution time equal to their worst-case execution time.
The start-times observed during this simulation are equal
to the worst-case start times of the actor due to the mono-
tonicity of the HSDF. From equation 1 it follows that a
periodic regime will be entered and therefore simulation
can be stopped after the ﬁrst period of the periodic regime.
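The simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions: every actor always takes exactly its worst-case execution time, initial tokens are available at time 0, and the two-actor graph (with a self edge on each actor) and the function name `self_timed_start_times` are hypothetical.

```python
from collections import deque

def self_timed_start_times(wcet, edges, firings):
    """Self-timed execution of an HSDF graph at worst-case execution times.

    wcet    : dict actor -> execution time
    edges   : list of (src, dst, initial_tokens)
    firings : number of firings to simulate per actor
    Returns a dict actor -> list of start times.
    """
    # Each edge is a FIFO of token availability times.
    queues = [deque([0] * d) for (_, _, d) in edges]
    ins = {v: [i for i, e in enumerate(edges) if e[1] == v] for v in wcet}
    outs = {v: [i for i, e in enumerate(edges) if e[0] == v] for v in wcet}
    starts = {v: [] for v in wcet}
    progress = True
    while progress:
        progress = False
        for v in wcet:
            if len(starts[v]) < firings and all(queues[i] for i in ins[v]):
                # An actor starts as soon as its last input token is there.
                start = max([queues[i].popleft() for i in ins[v]], default=0)
                starts[v].append(start)
                for i in outs[v]:
                    queues[i].append(start + wcet[v])
                progress = True
    return starts

# Hypothetical graph: P (1 time unit) feeds C (2 time units); the edge
# from C back to P carries 2 initial tokens; each actor has a self edge.
wcet = {"P": 1, "C": 2}
edges = [("P", "C", 0), ("C", "P", 2), ("P", "P", 1), ("C", "C", 1)]
starts = self_timed_start_times(wcet, edges, firings=5)
period = starts["P"][-1] - starts["P"][-2]
```

After a short transient the start times of P advance by a constant amount per firing, which is the periodic regime of equation 1.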
V. COMMUNICATION VIA A FIFO
In this section the throughput of a simplified multiprocessor system
is derived by applying the analysis techniques that were presented in
the previous section. For ease of understanding, a multiprocessor
system with only two processors which communicate via a FIFO is
considered in this section. The application on this multiprocessor
system consists of two communicating HSDF actors. Similar techniques
will be applied in section IX to derive the throughput of a
multiprocessor system in which the processors communicate via a
network and execute an application described by an SDF graph.
In ﬁgure 2 a multiprocessor system is shown which con-
sists of two processors. These two processors communi-
cate via a FIFO. The application executed on this system is
represented by an HSDF graph which is shown in ﬁgure 3.
It is assumed that the actor P is executed on processor
proc1 and the actor C is executed on processor proc2. The
tokens communicated between actor P and actor C have
a size of one word which is produced (or consumed) by
a processor in one clock cycle. Processor proc1 is stalled
(for example by stopping its clock) if the FIFO is full and
processor proc2 is stalled if the FIFO is empty.
Fig. 2. A multiprocessor system which consists of two processors
(proc 1 and proc 2) which communicate via a FIFO. The data
producing/consuming processor is stalled if the FIFO is full/empty.
The capacity of the FIFO is assumed to be 2 tokens.
Fig. 3. An implementation unaware HSDF graph with actors P and C.
Successive executions of the actors result in a stream of tokens
between the actors.
The throughput of this multiprocessor system can be derived with the
implementation aware HSDF graph shown in figure 4. Each actor in this
HSDF graph is annotated with its Worst-Case Execution Time (WCET). The
WCET of an actor must include the time the processor is stalled during
the execution of the actor. In this example the processors are not
stalled during the execution of an actor, because an actor only fires
once all the space or data it needs to finish its execution is
available. This is the case because the next firing of actor P takes
place as soon as the previous execution of this actor has finished and
there is space for one token in the FIFO. That there is space is
indicated by the deactivation of the full signal. After the full
signal is deactivated the processor can start executing the actor, and
it will finish its execution because it is assumed that the full
signal is ignored by the processor until the execution of the actor is
finished. The actor C will fire as soon as there is one token in the
FIFO and the empty signal is deactivated. The processor will not stall
during the execution of actor C because the empty signal is ignored
until the processor finishes the execution of actor C.
In this example it is assumed that the capacity of the
FIFO between the processors is two tokens. This FIFO
capacity is modeled in the implementation aware HSDF in
ﬁgure 4 by the 2 initial tokens on the edge from actor C to
actor P.
Fig. 4. An implementation aware HSDF graph in which the edge from C
to P with 2 initial tokens models a FIFO with a capacity of 2 tokens.
(Figure annotations: C 2 ms, P 1 ms.)
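As a worked illustration of equations 2 and 3 for the graph of figure 4, assume that the figure annotations give WCET(P) = 1 ms and WCET(C) = 2 ms, and assume in addition that each actor has a self edge with one initial token (as for actors with internal state in section IV). Both the assignment of the execution times and the self edges are assumptions made for this example.

```latex
\begin{align*}
\mathrm{CM}(P \to C \to P) &= \frac{\mathrm{WCET}(P) + \mathrm{WCET}(C)}{2}
                            = \frac{1\,\mathrm{ms} + 2\,\mathrm{ms}}{2} = 1.5\,\mathrm{ms} \\
\mathrm{CM}(P \to P) &= \frac{1\,\mathrm{ms}}{1} = 1\,\mathrm{ms}, \qquad
\mathrm{CM}(C \to C) = \frac{2\,\mathrm{ms}}{1} = 2\,\mathrm{ms} \\
\mathrm{MCM} &= \max(1.5,\ 1,\ 2)\,\mathrm{ms} = 2\,\mathrm{ms}
\end{align*}
```

Under these assumptions the slower actor C is the bottleneck, and the guaranteed minimal throughput is 1 / (2 ms) = 500 tokens per second. Without the self edges only the cycle through the FIFO remains and the MCM would be 1.5 ms.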
The MCM of the HSDF graph in figure 4 equals the inverse of the
guaranteed minimal throughput of the system. This throughput will be
obtained if there is no interaction with the environment. For systems
that do interact with the environment it should be guaranteed that no
data is lost due to overflow or underflow of the buffers between the
system and the environment. A typical example of such a system is a
system in which the input data is provided by a strictly periodic
external source like an A/D converter and consumed by a strictly
periodic external sink like a D/A converter. In order to verify that
the FIFOs at the input and output of the system do not overflow or
underflow, the strictly periodic source and sink must be modeled as
actors in the HSDF graph, as is shown in figure 5. The source and sink
actors are given a WCET equal to the length of the period of the
strictly periodic clock of the A/D and D/A converters. The self edge
with one initial token ensures that the next execution of the source
and sink actor cannot start before the previous execution is finished.
Thus, the self edge in combination with the WCET of the source and
sink actors enforces a maximal execution frequency of these actors
during the self-timed execution of the HSDF graph in the simulator.
Fig. 5. An HSDF graph which is used to prove that, given a strictly
periodic source and sink, the FIFOs at the input and at the output of
the system have sufficient capacity. (Figure annotations: actors So,
P, C and Si with execution times 2 ms, 1 ms, 2 ms and 2 ms; token
counts 3, 4 and 2; So and Si lie outside the system boundary.)
During self-timed execution of the HSDF graph in a simulator it should
be verified that the time between successive executions of the source
as well as the sink actor is equal to the WCET of these actors. This
simulation can be stopped after the first period of the periodic
regime. If the source and sink actor execute strictly periodically in
the simulator then it is guaranteed that, in an implementation of the
system, the FIFO between the source and the system never overflows and
the FIFO between the system and the sink never underflows. The reason
is that, due to monotonicity, tokens will not arrive and depart later
in the implementation than during a simulation run in which all actors
have an execution time equal to their worst-case execution time. If
tokens do not depart later than during the simulation run, then the
FIFO between the source and the system holds the same number of tokens
or fewer. If tokens do not arrive later than during the simulation
run, then a greater or equal number of tokens is in the FIFO between
the system and the sink in the implementation.
In the next section a network instead of FIFO between
two communicating processors will be introduced and the
worst-case arrival times of tokens will be derived.
VI. COMMUNICATION VIA A NETWORK
In this section a network for the communication be-
tween processors is introduced in the multiprocessor sys-
tem. It is assumed that this network supports the guaran-
teed throughput service which guarantees that a predeﬁned
amount of data can be transferred per period. Given such
a network it is possible to derive the worst-case temporal
behavior of the system.
The packet switched network that is considered in this paper contains
shared interconnect between two routers, which is called a link
between these routers. The routers in this network are schematically
depicted in figure 6 as switches which stay in one position during a
predefined time interval and then switch to the next position. During
this time interval a predefined amount of data can be transferred from
an input FIFO into an output FIFO. Data does not need to be buffered
inside the network because it is assumed that the switches in this
network are synchronized.
The controller which moves the switch into the next
position can be seen as a Time Division Multiple Ac-
cess (TDMA) arbiter which grants the link during a time-
interval for transfer of data from an input FIFO into an
output FIFO. The TDMA arbitration performed by this ar-
biter is modeled in an HSDF model which is used for the
derivation of the worst-case temporal behavior of the sys-
tem.
Fig. 6. Model of a time shared unidirectional connection in the
network.
The switch in ﬁgure 6 is represented as a router in the
multiprocessor system in ﬁgure 7. This router can be seen
as a special purpose processor which transfers data during
an interval from one of its inputs to one of its outputs. No
data is transferred if the input FIFO of the router is empty
or if the output FIFO is full.
Fig. 7. Two processors (proc 1 and proc 2) which communicate via a
packet switched network consisting of FIFO1, a router and FIFO2. The
data producing/consuming processor is stalled if the local FIFO is
full/empty.
An abstract model of the TDMA arbitration performed
in this router is depicted in ﬁgure 8. The arrival time of
the j-th token in FIFO1 and FIFO2 is respectively denoted
by a(j) and b(j). The TDMA arbitration is represented by
a time wheel which rotates every period TA. During the
interval TA1 up to N tokens are transferred from FIFO1
into FIFO2. It is not exactly known when these tokens are
transferred during the interval TA1 but it is sure that these
tokens have arrived in FIFO2 at the end of interval TA1.
The end of the interval TA1 is denoted by f(j). At this
time the j-th token is transferred, therefore a(j) ≤ f(j)
and b(j) ≤ f(j).
In figure 9 an HSDF model is shown of the arbitration performed inside
the router. In the next section it will be proven that if the j-th
token does not arrive later at observation point a in the
implementation than at observation point â in the HSDF model, then it
will also not arrive later at observation point b in the
implementation than at point b̂ in the HSDF model. In other words, it
will be proven that equation 4 implies equation 5. In [6] it is proven
that in this case tokens in the implementation do not arrive later at
observation points than tokens in the HSDF model of the system, which
includes the HSDF model of an arbiter.
Fig. 8. Abstract model of a TDMA arbiter which allows the transfer of
maximally N tokens during the interval TA1 per period TA. (Figure
annotations: n tokens, 0 ≤ n ≤ N, are transferred per period from
FIFO1, with arrival times a(j), into FIFO2, with arrival times b(j).)
Fig. 9. HSDF model of the TDMA arbiter which allows the transfer of
maximally N tokens during the interval TA1 per period TA. (Figure
annotations: actor A with execution time TA, actor A1 with execution
time TA1, N initial tokens, and observation points â(j), ĉ(j) and
b̂(j).)
a(j) ≤ â(j), j ≥ 0 (4)
b(j) ≤ b̂(j), j ≥ 0 (5)
The implementation aware HSDF model of the multi-
processor system in ﬁgure 10 includes the HSDF model
of the arbitration performed inside the router. The N1 ini-
tial tokens on the edge from actor A1 to actor P model
that FIFO1 has a capacity of N1 tokens and that processor
proc1 will be stalled if FIFO1 is full. The N2 initial tokens
on the edge from actor C to actor A model that FIFO2 has
a capacity of N2 tokens and that the router does not trans-
fer data if FIFO2 is full. The minimum throughput of the
system in ﬁgure 7 is equal to the inverse of the MCM of
the HSDF graph in ﬁgure 10.
Fig. 10. Implementation aware HSDF model which includes an HSDF model
of the TDMA arbitration performed inside the router. (Figure
annotations: actors P, A, A1 and C, where A and A1 model the router;
initial token counts N, N1 and N2.)
VII. HSDF MODEL OF A TDMA ARBITER
In this section it is proven that equation 4 implies equation 5 for
the TDMA arbiter in figure 8 and the HSDF graph in figure 9.
It is given that FIFO1 in figure 8 is initially empty. Therefore, the
arrival of the first N tokens (0 ≤ j ≤ N−1) in FIFO2 satisfies
inequality 6.
f(j) ≤ a(j) + TA + TA1 (6)
Given that initially N tokens are on the self edge of actor A in
figure 9, the arrival time b̂(j) of the first N tokens is given by
equation 7.
b̂(j) = â(j) + TA + TA1 (7)
From equation 4, equation 6 and equation 7 it follows that equation 8
holds.
f(j) ≤ b̂(j), 0 ≤ j ≤ N − 1 (8)
This establishes the base case. For the inductive step we show that
the induction hypothesis in equation 9 implies
f(j + N) ≤ b̂(j + N) for j ≥ 0.
f(j) ≤ b̂(j) (9)
For the implementation and j ≥ 0 the following equa-
tions hold in which the intermediate variables tx and ty
are deﬁned:
tx = a(j + N) + TA + TA1 (10)
ty = f(j) + TAn + TA1 = f(j) + TA (11)
f(j + N) ≤ max(tx,ty) (12)
Equation 12 holds because two situations can occur. If a(j+N) > f(j)
then token j+N has arrived after the execution of actor A1 has
finished (see figure 11). In this case there are fewer than N tokens
in FIFO1 at the moment that token j+N arrives in FIFO1. After arrival
of token j+N it will take maximally TA + TA1 before this token departs
from FIFO1 because up to N tokens can be transferred per rotation of
the time wheel.
If a(j + N) ≤ f(j) then token j + N has arrived in FIFO1 before the
execution of actor A1 has finished (see figure 12). After the
execution of actor A1 in which the j-th token is transferred, it takes
maximally TAn+TA1 before the execution of actor A1 in which the
(j + N)-th token is transferred has finished.
Fig. 11. Arrival of token j+N in FIFO1 after the execution of actor
A1, in which the j-th token is transferred, has finished. (Timeline:
f(j), then a(j+N), then f(j+N) at most TA + TA1 after a(j+N).)
Fig. 12. Arrival of token j+N in FIFO1 before the execution of actor
A1, in which the j-th token is transferred, has finished. (Timeline:
a(j+N), then f(j), then f(j+N) at most TAn + TA1 after f(j).)
For the SDF model in figure 9 the following equations hold for j ≥ 0.
ĉ(j + N) = max(â(j + N), b̂(j) − TA1) + TA (13)
b̂(j + N) = ĉ(j + N) + TA1 (14)
Equation 13 substituted in equation 14 results in:
b̂(j + N) = max(â(j + N), b̂(j) − TA1) + TA + TA1 (15)
Therefore, for j ≥ 0 the following equations hold in which the
intermediate variables tp and tq are defined:
tp = â(j + N) + TA + TA1 (16)
tq = b̂(j) + TA (17)
b̂(j + N) = max(tp,tq) (18)
It follows from equation 4 that tx ≤ tp and from our induction
hypothesis in equation 9 it follows that ty ≤ tq. Because equation 19
holds, equation 12 and equation 18 together prove that equation 20
holds for j ≥ 0:
tx ≤ tp ∧ ty ≤ tq ⇒ max(tx,ty) ≤ max(tp,tq) (19)
f(j + N) ≤ b̂(j + N) (20)
Together with the base case in equation 8, it follows by induction
that f(j) ≤ b̂(j) for all j ≥ 0. Because b(j) ≤ f(j), we have
therefore proven that equation 4 implies equation 5.
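The result of this section can be checked numerically. The sketch below compares one possible TDMA implementation with the bound of equations 7 and 15, taking â(j) = a(j). The implementation details are assumptions made for this example only: the slot of rotation k occupies the interval from k·TA to k·TA + TA1, only tokens already present in FIFO1 at the start of the slot are transferred, and FIFO2 never fills up. All function and parameter names are hypothetical, and integer time units are used to avoid rounding effects.

```python
import random

def tdma_finish_times(arrivals, n_max, t_a, t_a1):
    """f(j) for an assumed TDMA arbiter: per rotation of length t_a, up to
    n_max tokens present at the slot start are transferred, and they are
    all in FIFO2 by the end of the slot (slot start + t_a1)."""
    finish, slot, i = [], 0, 0
    while i < len(arrivals):
        # Skip rotations in which the head token has not yet arrived.
        slot = max(slot, -(-arrivals[i] // t_a))   # ceiling division
        start = slot * t_a
        sent = 0
        while i < len(arrivals) and sent < n_max and arrivals[i] <= start:
            finish.append(start + t_a1)
            i += 1
            sent += 1
        slot += 1
    return finish

def hsdf_bound(arrivals_hat, n_max, t_a, t_a1):
    """b_hat(j) according to equation 7 (first N tokens) and equation 15."""
    b_hat = []
    for j, a in enumerate(arrivals_hat):
        if j < n_max:
            b_hat.append(a + t_a + t_a1)
        else:
            b_hat.append(max(a, b_hat[j - n_max] - t_a1) + t_a + t_a1)
    return b_hat

random.seed(0)
N, TA, TA1 = 3, 10, 4
arrivals = sorted(random.randint(0, 200) for _ in range(50))
f = tdma_finish_times(arrivals, N, TA, TA1)
b_hat = hsdf_bound(arrivals, N, TA, TA1)
bound_holds = all(fj <= bj for fj, bj in zip(f, b_hat))
```

Such a check does not replace the proof; it only illustrates that, for this assumed implementation and arrival pattern, the HSDF model is conservative.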
VIII. MULTIPLE ROUTERS
In section VI a network was considered with only one router. However,
large scale multiprocessor systems will contain networks with a number
of routers. Buffering of data inside the network is undesirable and
can be prevented by synchronization of the switches inside the routers
and by taking care that data is only transferred through the network
if it can be stored at the output of the network. In this case it is
necessary to inform the sending router how much space there is in the
output buffer. This information is sent in the opposite direction via
a second guaranteed throughput connection. In order to derive the
throughput of the system this second connection must be taken into
account in the HSDF model of the system. In this section such an HSDF
model is derived for the multiprocessor system in figure 13 on which
the application in figure 3 is executed.
The amount of space that is available in FIFO2 in ﬁg-
ure 13 depends on when processor 2 reads data from this
FIFO. As soon as data has been read it is necessary to in-
form router R1 that space has become available in FIFO2
such that router R1 can start to transfer data from FIFO1 to
FIFO2. A straightforward solution would be the transfer of
a credit token to router R1 for each word read from FIFO2.
However, in order to save bandwidth in the network the amount of space
that is available in FIFO2 is sampled by the router R2 with a fixed
period and sent as one data word to router R1.
Fig. 13. Network with multiple routers. (Figure annotations:
processor 1 and processor 2 are connected via FIFO1, routers R1 and
R2, and FIFO2 over a bi-directional network link; the amount of space
in the FIFO is sent back over this link.)
In figure 14 an implementation aware HSDF model is shown which
includes a model of the network. In this model actor C sends a credit
token to actor R1 to indicate that a data token has been consumed from
FIFO2 by this actor. This credit token arrives at actor R after TLc,
which reflects that router R1 in the implementation periodically
receives a credit token in which a value is stored which indicates how
much space has become available since the last credit token was sent.
Fig. 14. HSDF model of a connection in the network with multiple
routers. (Figure annotations: actors P, R1, R11, Lp, Lc and C; initial
token counts N1, N2 and N3.)
IX. MULTIPROCESSOR TEMPLATE
An assumption made in the previous section was that the application
was described as an HSDF graph and that the tokens had a size of one
word. In this section, the multiprocessor system is extended such that
it can be used to execute an application which is described by an
arbitrary SDF graph in which there are no restrictions on the size of
the tokens. The throughput of this system is derived with an SDF
model. The type of applications for which this multiprocessor system
is intended is described in more detail in [7].
The multiprocessor template in ﬁgure 15 includes be-
sides a network also a local data memory (DMEM) for
each processor and a Communication Assist (CA) which
copies data between the FIFOs in the Network Interface
(NI) and the data memory. The input and output data of the
actors that are executed on a processor are stored in logical
FIFO buffers in the local memory of the processor. These
logical FIFO buffers can be implemented in software with
a Cheap [8] like protocol. The data can be randomly writ-
ten and read by the processor within the space reserved for
a token in these logical FIFOs.
In order to derive the WCET of an actor it is necessary
to know in advance the number of cycles that the processor
will be stalled during the execution of the actor. Stalling of
the processor due to absence of data is prevented by check-
ing that all input tokens for the actor are present in the lo-
cal memory as well as sufﬁcient output space is present
in the local memory before an actor ﬁres. The number of
stall cycles caused by each load and store operation that
is executed on the processor will be maximally one cycle
by taking care that the bus is granted to the processor at
least once every other clock cycle. Given maximally one
stall cycle for each memory access it is possible to derive
the WCET of an actor with static program analysis tech-
niques [9].
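The effect of the one-stall-per-access guarantee on the WCET can be written as a simple bound. The symbols below are introduced here for illustration and do not appear in the original: C_exec(v) is the worst-case number of cycles of actor v without stalls, and n_mem(v) is the worst-case number of load and store operations executed by actor v.

```latex
\mathrm{WCET}_{\mathrm{cycles}}(v) \;\le\; C_{\mathrm{exec}}(v) + n_{\mathrm{mem}}(v)
```

For example, an actor with at most 1000 stall-free cycles and at most 200 memory accesses would, under this bound, need at most 1200 cycles. The figures in this example are hypothetical.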
Actors executed on a processor access the logical FIFOs in the memory
instead of the FIFOs in the network interface in order to allow the
production and consumption of tokens that are larger than the capacity
of the FIFOs in the network interface. These tokens are typically
larger than the capacity of these FIFOs (32 words) because these FIFOs
must have a small physical size in order to handle the high access
frequency of the routers in the network.
Fig. 15. Multiprocessor template suitable for the execution of SDF
graphs. The dashed line indicates the path along which the data flows
from an actor that produces the data to an actor that consumes this
data. (Figure annotations: processor 1 and processor 2, each with an
IMEM, a bus, a data memory (DMEM1, DMEM2), a communication assist
(CA1, CA2) and a network interface (NI1, NI2) connected to a router R
of the network.)
The communication assists are responsible for the trans-
fer of data between the FIFOs in the network interface and
the data memory. A communication assist transfers up to
N tokens during a predeﬁned interval just like a router in
the network does. It can therefore be represented by the
HSDF model that is shown in ﬁgure 9.
The SDF model from which the throughput of the mul-
tiprocessor system can be derived is obtained by replacing
each edge in the implementation unaware SDF graph by
the model of a guaranteed throughput connection. Such an
SDF model is shown in ﬁgure 16. The model of a guar-
anteed throughput connection is surrounded by a dashed
box in this ﬁgure. This model includes the arbitration
performed by the CAs. In this model it is assumed that
credit tokens in the network are handled by the network
interfaces instead of by the routers. Each actor inside the
dashed box consumes and produces one token per execution. The numbers
of tokens produced by actor P and consumed by actor C are equal to the
values of the parameters n8 and n9, respectively. The values of the
parameters N1, N3, N5, and N7 are respectively equal to the capacities
of the FIFOs in DMEM1, NI1, NI2, and DMEM2. The values of the
parameters N2, N4 and N6 represent the maximum number of tokens
transferred per period by respectively CA1, NI1 and CA2.
Fig. 16. SDF model of a guaranteed throughput connection. (Figure
annotations: actors P, CA1, CA11, NI, NI1, Lp, Lc, CA2, CA21 and C;
rates n8 and n9; parameters N1 to N7.)
X. PROCESSORS SHARING
In the previous section an example was presented in
which only one actor was executed on each processor. In
this section the execution of multiple actors on a proces-
sor is considered. The execution of multiple actors on one
processor requires a local scheduler. This local scheduler
can be seen as an arbiter whose task is to grant one actor
out of a set of actors to start its execution on the proces-
sor. Before an actor executes, it ﬁrst checks whether there
are sufﬁcient tokens on each of its inputs and there is suf-
ﬁcient amount of space available on each of its outputs.
The actor returns immediately if it detects that there are
an insufﬁcient number of tokens or there is an insufﬁcient
amount of space available.
Fig. 17. SDF model in case round-robin arbitration is applied.
In [6] it has been shown that the arbitration of a processor can
be modeled in an SDF graph. This is possible if so-called
predictable arbitration policies are applied, for which a waiting
time Twait and a use time Tuse can be defined. Examples of
predictable arbitration policies are Rate Monotonic, Earliest
Deadline First, TDMA and round-robin. If, for example,
round-robin arbitration is applied, then the effects of the
arbitration can be modeled by replacing each actor in the
implementation-unaware SDF graph with the SDF graph that is shown
in figure 17.
Fig. 18. Implementation unaware SDF model.
In figure 18 an implementation-unaware SDF graph is shown for
which it is assumed that actors A1 and A3 are executed on
processor 1. The implementation-aware SDF graph for this
application is shown in figure 19. Each box in this figure
represents an HSDF model of a guaranteed throughput connection.
The WCET of actor W1 in this model is equal to the WCET of actor
A3, and the WCET of actor W3 is equal to the WCET of actor A1,
because it was assumed that actors A1 and A3 share processor 1.
Actor A2 in figure 18 is not replaced in figure 19 by the SDF
graph shown in figure 17 because this actor is the only actor
that is executed on processor 2.
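The assignment of WCETs to the wait actors can be illustrated with a short sketch. It assumes that, under round-robin arbitration, the worst-case waiting time of an actor is the sum of the WCETs of the other actors mapped onto the same processor. For two actors per processor this matches the rule stated above for W1 and W3; the generalization to more than two actors is an assumption of this sketch. The function name `wait_wcets` and the numeric WCET values are hypothetical.

```python
def wait_wcets(wcet, mapping):
    """Return the WCET of the wait actor of every actor.

    wcet    -- {actor: worst-case execution time}
    mapping -- {actor: processor on which the actor is executed}
    Assumption: under round-robin, an actor waits at most for one
    execution of every other actor on the same processor.
    """
    actors_on = {}
    for actor, processor in mapping.items():
        actors_on.setdefault(processor, []).append(actor)

    waits = {}
    for actors in actors_on.values():
        total = sum(wcet[a] for a in actors)
        for a in actors:
            # Everything on this processor except the actor itself.
            waits[a] = total - wcet[a]
    return waits


# Hypothetical WCETs; A1 and A3 share processor 1, A2 runs alone.
example_wcet = {"A1": 5, "A2": 7, "A3": 3}
example_mapping = {"A1": 1, "A2": 2, "A3": 1}
example_waits = wait_wcets(example_wcet, example_mapping)
```

In this example the wait actor of A1 receives the WCET of A3 and vice versa, while A2 has a waiting time of zero, which corresponds to A2 not being replaced by the model of figure 17.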
Fig. 19. Implementation aware SDF model. Each box in this
figure represents an HSDF model of a connection.
The worst-case arrival times of tokens in the system can be
found by simulating the SDF graph in figure 19 in a self-timed
manner. The MCM of the system can be determined after the SDF
graph in figure 19 is transformed into an HSDF graph. This MCM is
equal to the inverse of the minimal throughput of the system.
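As an illustration of the last step, the sketch below computes the MCM of a small HSDF graph by enumerating its simple cycles. It assumes the usual definition of the cycle mean as the sum of the execution times of the actors on a cycle divided by the number of initial tokens on that cycle. The brute-force enumeration is chosen for readability and is only practical for small graphs; it is not the algorithm of [5]. The function name and the example graph are hypothetical.

```python
from fractions import Fraction


def max_cycle_mean(exec_time, edges):
    """Maximum cycle mean of an HSDF graph, or None if it is acyclic.

    exec_time -- {actor: integer execution time}
    edges     -- list of (source, destination, initial tokens)
    """
    successors = {}
    for src, dst, tokens in edges:
        successors.setdefault(src, []).append((dst, tokens))
    # Fixed actor order so that each simple cycle is visited once,
    # starting from its lowest-ranked actor.
    rank = {actor: i for i, actor in enumerate(exec_time)}
    best = None

    def walk(start, node, time_sum, token_sum, visited):
        nonlocal best
        for nxt, tokens in successors.get(node, []):
            if nxt == start:
                cycle_tokens = token_sum + tokens
                if cycle_tokens == 0:
                    raise ValueError("cycle without tokens: deadlock")
                mean = Fraction(time_sum, cycle_tokens)
                if best is None or mean > best:
                    best = mean
            elif rank[nxt] > rank[start] and nxt not in visited:
                walk(start, nxt, time_sum + exec_time[nxt],
                     token_sum + tokens, visited | {nxt})

    for actor in exec_time:
        walk(actor, actor, exec_time[actor], 0, {actor})
    return best


# Hypothetical two-actor graph with a self-edge on A.
example_times = {"A": 2, "B": 3}
example_edges = [("A", "B", 0), ("B", "A", 1), ("A", "A", 1)]
example_mcm = max_cycle_mean(example_times, example_edges)
example_throughput = 1 / example_mcm  # minimal throughput
```

Exact rational arithmetic is used so that cycle means such as 5/2 are compared without rounding.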
XI. CONCLUSION
In this paper an embedded multiprocessor system is proposed
which includes a packet switched network that supports
communication between processors with a predefined guaranteed
bandwidth and a maximum latency. A synchronous data flow model of
such a network is presented which is used to derive the
worst-case temporal behavior of an application that is executed
on this multiprocessor system. It is proven that this synchronous
data flow model captures the worst-case behavior of the TDMA
arbiter in the network even though this arbiter allows a variable
number of tokens to be transferred in one time slice. The local
scheduling performed on the processors can also be captured in
the same SDF graph. The minimal throughput and the worst-case
arrival times of tokens in the system can be derived from this
SDF graph.
REFERENCES
[1] F. Baccelli, G. Cohen, G.J. Olsder, and J.-P. Quadrat,
Synchronization and Linearity, John Wiley & Sons, Inc., 1992.
[2] H. Zhang, “Service disciplines for guaranteed performance ser-
vices in packet-switching networks”, Proceedings of the IEEE,
October 1995.
[3] S. Sriram and S.S. Bhattacharyya, Embedded Multiprocessors:
Scheduling and Synchronization, Marcel Dekker, Inc, 2000.
[4] E.L. Lawler, Combinatorial Optimization: Networks and Matroids,
Holt, Rinehart and Winston, New York, NY, USA, 1976.
[5] J. Cochet-Terrasson, G. Cohen, S. Gaubert, M. McGettrick, and J.-
P. Quadrat, “Numerical computation of spectral elements in max-
plus algebra”, in Proc. IFAC Conf. on Syst. Structure and Control,
1998.
[6] Marco Bekooij and Jef van Meerbergen, “Timing analysis of data
driven hard-RT multiprocessor systems”, Not published yet.
[7] M. Bekooij, O. Moreira, P. Poplavko, B. Mesman, M. Pastrnak,
and J. van Meerbergen, “Predictable embedded multiprocessor
system design”, Accepted for: Proceedings of the SCOPES workshop,
September 2004.
[8] O.P. Gangwal, A. Nieuwland, and P. Lippens, “A scalable and
ﬂexible data synchronization scheme for embedded hw-sw shared-
memory systems”, International Symposium on System Synthesis,
2001, pp. 1–6, ACM.
[9] Y-T. S. Li and S. Malik, Performance analysis of real-time embed-
ded software, ISBN 0-7923-8382-6, Kluwer academic publishers,
1999.