Semantics-Preserving Implementation of Synchronous Specifications Over Dynamic TDMA Distributed Architectures by Potop-Butucaru, Dumitru et al.
Dumitru Potop-Butucaru, Akramul Azim, Sebastian Fischmeister
To cite this version:
Dumitru Potop-Butucaru, Akramul Azim, Sebastian Fischmeister. Semantics-Preserving
Implementation of Synchronous Specifications Over Dynamic TDMA Distributed Architectures.
International Conference on Embedded Software (EMSOFT), Oct 2010, Scottsdale, AZ, United
States. ACM, pp.199-208, 2010, <10.1145/1879021.1879048>. <inria-00544665>
HAL Id: inria-00544665
https://hal.inria.fr/inria-00544665
Submitted on 8 Dec 2010
Dumitru Potop-Butucaru
INRIA,
Unité de recherche de
Rocquencourt,
France
dumitru.potop@inria.fr
Akramul Azim
Department of Electrical and
Computer Engineering
University of Waterloo,
Canada
aazim@uwaterloo.ca
Sebastian Fischmeister
Department of Electrical and
Computer Engineering
University of Waterloo,
Canada
sfischme@uwaterloo.ca
ABSTRACT
We propose a technique to automatically synthesize pro-
grams and schedules for hard real-time distributed (embed-
ded) systems from synchronous data-flow models. Our tech-
nique connects the SynDEx scheduling tool and the Network
Code toolchain in a seamless flow of automatic model trans-
formations that go all the way from specification to imple-
mentation.
Our contribution is the non-trivial connection between the
models manipulated by SynDEx and by the Network Code
toolchain, at both formal and tool level. We provide an
algorithm for converting the data-dependent schedule ta-
bles output by SynDEx into Network Code programs which
can be seen as an “assembly code” level for time-driven dis-
tributed real-time systems. The main difficulty is to en-
sure the preservation of both functionality and the real-time
guarantees computed by SynDEx in the presence of clock
drifts (which are abstracted away in the scheduling model
of SynDEx). Existing tools can convert the resulting Net-
work Code programs into software and hardware-accelerated
execution units.
Categories and Subject Descriptors: D.3.4 [Program-
ming languages]: Processors–Code generation; D.4.7 [Op-
erating systems]: Organization and Design–Distributed sys-
tems, Real-time systems and Embedded systems
General Terms: Algorithms
Keywords: synchronous model, distributed real-time im-
plementation, SynDEx, Clocked Graphs, Network Code, dy-
namic TDMA, clock synchronization
1. INTRODUCTION
This work addresses the implementation of real-time em-
bedded control systems. In the context of model-driven de-
velopment, software developers specify functions in a high-
level data-flow language such as SCADE [20] or Simulink [8].
These formalisms follow cycle-based execution models, where
the various dataflow blocks are cyclically executed in an order
compatible with the data dependencies specified by dataflow
arcs. Conditional execution mechanisms encode execution
modes where each block is executed in specific states and for
specific inputs of the system.
The cyclic execution model is also that of periodic real-
time systems. However, producing efficient real-time sched-
ules and implementations for data-flow specifications with
data dependencies and conditional execution remains chal-
lenging. Spreading the application across a network of in-
terconnected processors further complicates this problem, as
algorithms need to consider several computing elements and
communication lines.
To allow formal reasoning of tractable complexity, related
approaches [17, 23, 11, 29, 8] usually work with high-level
abstractions of the execution hardware. For instance, Wu et
al. [11] assume that communications take zero time, whereas
Caspi et al. [8] work on time-triggered architectures (TTA),
which already offer high-level services such as a global time
reference.
Of particular relevance to this work is the architecture
description formalism of SynDEx [17, 23]. It offers abstract
notions of processor and contention-free communication line,
including an execution model and timing information. The
SynDEx tool takes as input an architecture description and
a SCADE-like synchronous dataflow specification. It pro-
duces a model of the scheduled system in the form of a
data-dependent schedule table assigning a fixed start date
to each (conditioned) computation and communication. Ex-
isting “back-end” code generators use this schedule table to
generate event-driven implementation code by basically im-
plementing the abstract architecture model over the actual
hardware.
In this paper, our synthesis target is the hardware abstrac-
tion layer provided by the Network Code formalism [13],
which we use as an “assembly code” level for time-triggered
distributed embedded systems. To improve the timing pre-
dictability of a processor and its communication interfaces,
the Network Code formalism forces each computation and
communication to run in a fixed amount of time. Existing
tools can convert such programs into efficient implementa-
tions in both software and hardware.
Previous work relied on introducing new abstractions or algorithms to map
the high-level models to low-level execution models. Unlike such approaches,
we integrate in a single flow the existing SynDEx scheduling tool and the
code generation toolchain built around the Network Code formalism, as
pictured in Figure 1. Thus, we allow the automatic synthesis of hard
real-time distributed (embedded) systems with dynamic TDMA communication
buses from SCADE-like synchronous data-flow formalisms. Our contribution is
the non-trivial glue connecting the two tools and the associated formal
models.

Figure 1: Proposed implementation flow. The dashed boxes identify the
existing SynDEx and Network Code scheduling/synthesis flows.
To implement the schedule table generated by SynDEx,
we synthesize the Network Code programs of each proces-
sor. To ensure the correct synchronization between the time-
triggered Network Code programs on the separate processors
of the network, a form of global time is needed on top of the
Network Code formalism and tools. Our main contribution
for this is the definition of an efficient clock synchroniza-
tion mechanism that takes advantage of the specific form
of the SynDEx-generated schedules. We use the clock drift
model to (1) automatically update the timing model taken
as input by SynDEx so that the output scheduling table
also considers the overheads due to clock drift, and (2) to
automatically update the output of SynDEx to insert clock
synchronization communications whenever the bus is idle for
too long (but all operations scheduled by SynDEx are left
unchanged). The resulting implementation has no media
access conflicts, so that the temporal guarantees originally
computed by SynDEx hold.
The remainder of the paper is structured as follows: Sec-
tion 2 gives a brief overview of related work. Section 3
presents the SynDEx scheduling flow, insisting on the defini-
tion of the functional specification, architecture, and static
schedule models. Section 4 presents the Network Code for-
malism, and Section 5 gives an overview of our implemen-
tation flow. Then, Section 6 explains how Network Code
programs are generated from static schedules, and Section 7
explains how the clock drift is accounted for. We conclude
in Section 8.
2. RELATED WORK
We already cited several approaches to the (distributed)
real-time implementation of conditional data-flow specifica-
tions. Our work differs from them in two main points: (1)
the generation of dynamic TDMA communication protocols,
and (2) the reliance on architecture abstraction techniques,
instead of defining new scheduling algorithms.
On the protocol level, our approach differs from existing work, as we
automatically synthesize an application-tailored, optimized medium access
protocol. Traditional real-time and embedded networking protocols grant
applications only limited control over the communication behaviour at run
time. For example, all message identifiers (=priorities) on a CAN bus [24]
must be unique, and application developers usually ensure this by statically
assigning priorities to messages, thereby defining the behaviour offline.
Although it is technically feasible to reserve multiple identifiers for the
same message, this approach complicates the design. More flexible protocols
for embedded networking like Powerlink Ethernet [12], FlexRay [15],
VARAN [27], FTT-CAN [1] and its derivative FTT-Ethernet [21] partially
depend on application state information; for example, an application can
request to use the asynchronous slot at the end of the cycle. Yet decisions
in these protocols are still made at the start of the communication round or
earlier. In contrast to these protocols, our work tries to optimally tailor
the communication and computation schedule to the needs of the data-flow
application by permitting decisions at any time in the schedule.
Tailoring the communication and computation behaviour to data-flow
dependencies can significantly improve performance. Several case studies
across different application areas have shown this, including control
theory [28], hybrid systems [4], video-on-demand, hierarchical scheduling
frameworks [10], and, in general, bursty demand models [9, 22].
Some recent work explores similar ideas, but uses different mechanisms. For
example, in [28], the authors generate so-called state-based schedules from
high-level specifications such as control systems. Their work, as well as
the initial work [2], uses automata to express the schedule and thus
requires regular specifications. Our work uses the notion of communication
rounds, which permits a lower complexity for analysis, verification, and
generation [3].
3. THE SYNDEX SCHEDULING APPROACH
The AAA methodology and the SynDEx scheduling tool [17, 18] have been
developed by a team led by Yves Sorel to allow the fast automatic generation
of efficient distributed real-time implementations of synchronous dataflow
specifications. As pictured in Figure 1 (the upper dashed box), the SynDEx
scheduling tool takes as input a functional specification and a timed model
of the target execution architecture, and produces a schedule table which is
a model of the statically-scheduled real-time implementation. To exemplify
the SynDEx flow, we shall use throughout the paper a simple example taken
from [23].

Figure 2: Example of dataflow specification (labels visible in the figure:
FS IN, LP IN, F1, F2, F3, G, M, N, the conditioned nodes C1 and C2, ID, V,
and the activation conditions LP, ¬LP, FS, ¬FS).
3.1 Functional specification
The functional specification formalism of SynDEx is a hi-
erarchical synchronous dataflow language similar to Lustre
[20] or its SCADE graphical counterpart. However, various
gateways allow the use of specifications written in Scicos [6]
(a free Simulink-like formalism), Signal [19] (a synchronous
dataflow language), etc.
A SynDEx specification consists of dataflow blocks, which
have input and output ports, and dataflow arcs. Each da-
taflow arc connects one output port to one input port. At
each execution cycle where it is executed, a dataflow block
reads all its input ports and computes all its output ports.
Dataflow nodes can be elementary, which corresponds to
calls to library functions performing elementary computa-
tions such as “+”, “read_data” etc., or composed nodes de-
fined as a dataflow formed of other nodes. The execution of
a composed node takes place as if the node has been replaced
by its dataflow expansion. To allow conditional execution, a
composed node can have several expansions, each one with
its own activation condition telling when the particular ex-
pansion is used. Such nodes are called conditioned nodes.
Figure 2 shows an example input for SynDEx. Instead of the standard
graphical representation, we use an ad-hoc one allowing the compact
representation of the hierarchy levels. The specification represents a
system with two switches (Boolean inputs) controlling its execution: low
precision (LP) vs. normal precision (¬LP), and fail-safe (FS) vs. normal
operation (¬FS). In low-precision mode, fewer operations are executed than
in normal-precision mode. In fail-safe mode, the actuation operation that
gets executed (N) does not use any of the inputs, because the sensors or the
treatment chain are assumed to be faulty (control is done using default
values).
The specified system behaves as follows: At each execution cycle, the
dataflow nodes FS IN and LP IN read FS and LP from the environment. If
LP = false, then the execution of the conditioned node C1 is given by the
expansion in its upper tab (labeled with activation condition ¬LP). If
LP = true, then the expansion of the lower tab is used, with activation
condition LP. Similarly, the upper and lower tabs of C2 give its expansions
with activation conditions FS and ¬FS, respectively. The input and output
ports of the conditioned nodes are represented with circles in the
expansions (as opposed to boxes, which are reserved to nodes). Note that the
ID output of C1 is computed by both of C1's expansions, but only the
expansion of C2 with activation condition ¬FS uses it in computations (N,
executed when FS holds, reads no input). The activation conditions of the
expansions of a conditioned node are Boolean expressions over the inputs of
the node.

Figure 3: Example of architecture specification (processors P1, P2, and P3
connected by one broadcast bus; annotated computation durations: LP IN=1,
FS IN=1, F1=3, F2=8, F3=2 and F3=3, G=3, M=3, N=3; annotated communication
durations: boolean=2, V type=2, ID type=5).
Dataflow blocks having no dependency between them can
execute concurrently. For instance, if FS = true then N
can be executed as soon as FS is read, independently of the
execution of F1, F2, F3, or G. On the contrary, the com-
putation of M must wait until both FS (with value false)
and ID have arrived.
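The conditional execution described above can be illustrated with a small Python sketch. It is not SynDEx input syntax; the table `ACTIVATION` and the function `active_nodes` are hypothetical names introduced here, and the activation conditions are the ones printed next to each node on the schedule table of Figure 4.

```python
# Activation condition of each elementary node of the running example,
# written as a predicate over the two Boolean inputs LP and FS.
# (Hypothetical encoding; conditions taken from the labels of Figure 4.)
ACTIVATION = {
    "LP_IN": lambda LP, FS: True,
    "FS_IN": lambda LP, FS: True,
    "F1": lambda LP, FS: not LP,   # normal-precision expansion of C1
    "F2": lambda LP, FS: not LP,
    "F3": lambda LP, FS: not LP,
    "G": lambda LP, FS: LP,        # low-precision expansion of C1
    "N": lambda LP, FS: FS,        # fail-safe expansion of C2
    "M": lambda LP, FS: not FS,    # normal-operation expansion of C2
}


def active_nodes(LP, FS):
    """Return the set of nodes executed in a cycle with the given inputs."""
    return {name for name, cond in ACTIVATION.items() if cond(LP, FS)}


for LP in (False, True):
    for FS in (False, True):
        print(f"LP={LP}, FS={FS}:", sorted(active_nodes(LP, FS)))
```

In low-precision fail-safe cycles only the two input nodes, G, and N execute, which matches the remark that fewer operations run in low-precision mode.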
3.2 Architecture description
A hardware architecture description in SynDEx is a bipartite graph defining
the interconnect between processors (computing elements) and communication
lines, annotated with timing information. The execution model associated
with processors and buses, as well as the associated timing model, remains
abstract, but we shall see that it is adapted to our TDMA target platforms.
An example of architecture description is given in Figure 3. It has 3
processors, named P1, P2, and P3, and one broadcast bus connecting them.¹
Each processor is capable of executing one sequential pro-
gram describing the execution of the assigned dataflow nodes.
Each bus is capable of executing a sequence of non-overlapping
broadcast message passing communications (all processors
receive all messages sent on a bus). The buses are free from
errors (permanent or transient). They do not provide con-
tention detection/handling mechanisms, because SynDEx
will ensure that no time frame is assigned to two commu-
nications that can happen in the same execution cycle (and
successive execution cycles do not overlap).
3.2.1 Timing model
To allow real-time scheduling, durations are associated with all:
• Elementary dataflow blocks, on the processors that can execute them. For
instance, node F1 can be executed only by P1, and its duration is three
time units.
• Data transfers of the various types of messages exchanged between nodes,
on the buses that can perform these communications. For instance, the
transmission of a value of type ID type can be done by our bus in five
time units.
In our case, most dataflow nodes can only be executed on a single processor,
the notable exception being F3, which can be executed, with different
durations, by both P2 and P3. The durations can be used to specify complex
partial allocations of the computations to processors, whether they are due
to physical constraints (e.g., I/O operations on the I/O processor) or to
allocation patterns desired by the user.

¹ Other types of buses and communication lines can be defined, but we are
not interested in them in this paper.

Figure 4: Schedule table for our example (columns: P1, P2, P3, and Bus;
time flows downward from 0 to 16). Scheduled computations: LP IN@true,
FS IN@true, F1@(LP=false), F2@(LP=false), F3@(LP=false), G@(LP=true),
N@(FS=true), M@(FS=false). Scheduled bus communications: Send(P1,LP)@true,
Send(P1,FS)@true, Send(P1,ID)@(LP=false ∧ FS=false),
Send(P2,ID)@(FS=false ∧ LP=true), Send(P1,V)@(LP=false).
Real-time durations are interpreted as worst-case durations in the absence
of all interference.
All control (tests, branching, protocol stacks, etc.) is assumed to execute
in zero time, so its actual cost must be included in the worst-case
durations provided above. The system has a precise global real-time
reference and uses time-triggered execution of computation and communication
based on this global clock.
3.3 Static scheduling
Recall that our synchronous functional specifications have a cycle-based
execution model where the same finite description of decisions and
computations is traversed at each execution cycle. Then, a natural way of
providing a real-time schedule of its infinite execution is to compute a
schedule of one execution cycle. This finite schedule is then repeated, each
repetition representing one cycle.
This is the approach taken by SynDEx. The schedule computed for our small
example is given in Figure 4. The schedule is static, in the sense that each
computation and communication has a fixed start date inside the execution of
the cycle.
In the schedule table of Figure 4, each column represents one processor or
the bus. The description of each scheduled operation o (computation or
communication) includes the operation to be performed (denoted
operation(o)), the start date date(o), the duration duration(o), and its
activation condition clk(o). For instance, the last operation on the bus is
the sending by processor P1 of the variable V. This operation takes two time
units and is executed in cycles where LP = false.
At most one operation is active at a time on each pro-
cessor or bus. Therefore, two operations in one column can
only overlap if their activation conditions are mutually ex-
clusive. In our example, the two communication operations
that overlap in time have to have exclusive activation con-
ditions. In Figure 4, the width (horizontal projection) of
the operation boxes intuitively represents their activation
conditions, in a way similar to Venn diagrams. Two oper-
ations have exclusive conditions when their projections are
exclusive.
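The overlap rule can be checked mechanically. The sketch below is a minimal illustration in Python, not part of SynDEx: `BUS_OPS`, `overlap`, `exclusive`, `overlapping_pairs`, and `conflicts` are hypothetical names. The durations come from the annotations of Figure 3, whereas the start dates are an assumption read off the timeouts of Listing 1.

```python
from itertools import combinations, product

# (name, start date, duration, activation condition over LP and FS)
# Start dates are an assumption reconstructed from Listing 1.
BUS_OPS = [
    ("Send(P1,LP)", 1, 2, lambda LP, FS: True),
    ("Send(P1,FS)", 3, 2, lambda LP, FS: True),
    ("Send(P1,ID)", 5, 5, lambda LP, FS: not LP and not FS),
    ("Send(P2,ID)", 6, 5, lambda LP, FS: LP and not FS),
    ("Send(P1,V)", 13, 2, lambda LP, FS: not LP),
]


def overlap(a, b):
    """True if the half-open intervals [date, date + duration) intersect."""
    return a[1] < b[1] + b[2] and b[1] < a[1] + a[2]


def exclusive(c1, c2):
    """True if no valuation of (LP, FS) satisfies both conditions."""
    return not any(c1(lp, fs) and c2(lp, fs)
                   for lp, fs in product((False, True), repeat=2))


def overlapping_pairs(ops):
    """Pairs of operations of one column that overlap in time."""
    return [(a[0], b[0]) for a, b in combinations(ops, 2) if overlap(a, b)]


def conflicts(ops):
    """Overlapping pairs whose activation conditions are not exclusive."""
    return [(a[0], b[0]) for a, b in combinations(ops, 2)
            if overlap(a, b) and not exclusive(a[3], b[3])]


print(overlapping_pairs(BUS_OPS))
print(conflicts(BUS_OPS))
```

Under these assumptions the only pair that overlaps in time is the two transmissions of ID, and their conditions cannot hold in the same cycle, so the bus column is conflict-free.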
The scheduling algorithm of SynDEx respects the data de-
pendencies of the specification. More precisely, in the sched-
ule table, all data needed to perform a scheduled operation o
is available at the operation start. When o is a computation
on a processor, all the variables needed to compute the ac-
tivation condition clk(o) and all the inputs of the dataflow
node operation(o) are available on the processor at date
date(o). When o is the sending of variable V by processor
P , then (1) V is available on P at date date(o), and (2) all
the variables needed to compute clk(o) are available on all
processors connected to the bus at date date(o). Condition
(2) ensures that all processors can decode all bus traffic.
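Condition (2) can be illustrated with a short check, sketched here in Python under simplifying assumptions: a Boolean input is taken to be known on every processor once its unconditional broadcast ends, and the start dates are the ones assumed from Listing 1. The names `BUS_SENDS`, `broadcast_done`, and `undecodable` are hypothetical.

```python
# (variable sent, start date, duration, variables read by the activation
# condition). Dates are an assumption reconstructed from Listing 1.
BUS_SENDS = [
    ("LP", 1, 2, set()),
    ("FS", 3, 2, set()),
    ("ID", 5, 5, {"LP", "FS"}),   # Send(P1,ID)@(LP=false and FS=false)
    ("ID", 6, 5, {"LP", "FS"}),   # Send(P2,ID)@(FS=false and LP=true)
    ("V", 13, 2, {"LP"}),         # Send(P1,V)@(LP=false)
]


def broadcast_done(sends):
    """Date at which each unconditionally broadcast variable is known
    to every processor connected to the bus (end of its transfer)."""
    return {var: date + duration
            for var, date, duration, needs in sends if not needs}


def undecodable(sends):
    """Sends whose activation condition reads a variable that is not yet
    available everywhere at the start date of the send."""
    done = broadcast_done(sends)
    return [(var, date) for var, date, _, needs in sends
            if any(done.get(v, float("inf")) > date for v in needs)]


print(broadcast_done(BUS_SENDS))
print(undecodable(BUS_SENDS))
```

With these dates, LP is known everywhere at date 3 and FS at date 5, so every conditioned transfer starts late enough for all processors to evaluate its guard.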
These properties ensure that the bus schedule is easily
implementable as a set of distributed programs on the ac-
tual hardware. Various back-ends [17] exist translating such
schedule tables into code running over asynchronous commu-
nication buses such as CAN [24] or Ethernet. We shall see
in the next section that the time-triggered nature of these
tables is easily mapped onto the time-triggered execution
model of Network Code.
4. THE NETWORK CODE FORMALISM
Network Code is a domain-specific programming language
for the implementation of distributed real-time systems with
time-triggered (TDMA) communication systems. It orig-
inates from the goal to functionally and temporally tai-
lor properties of the computation and communication be-
haviour. The formalism specifically concentrates on mes-
saging and the media access control layer. The language
consists of a small set of assembly-like instructions with
well defined operational semantics. Besides software proto-
types, Network Code has been implemented as a hardware-
accelerated special processor [14] on top of Ethernet and
inside a network switch [7]. We use Network Code to give
a time-triggered implementation of the static schedule table
generated by SynDEx.
Network Code assumes a set of computation units, corresponding to the
processors of a SynDEx architecture description, and a set of broadcast
buses connecting these processors.

Figure 5: Internal organization and execution model of one processor
The buses use a time-division multiple access (TDMA)
communication scheme which, if correctly scheduled, elimi-
nates all errors due to message collision/interference. Com-
munication slots and rounds form the building blocks of such
a scheme. A slot starts at a given time and ends after a
known duration. Within the slot’s time span at most one
computation unit has write access to the bus. A sequence
of slots defines a communication round after which the be-
haviour repeats.
 1  START: wait(1)
    L1:    if true then
 3           future(L2, 2)
             send(bus_id, sizeof(LP), LP)
 5           halt()
           endif
 7         wait(16)
           goto(START)
 9  L2:    if true then
             future(L3, 2)
11           send(bus_id, sizeof(FS), FS)
             halt()
13         endif
           wait(14)
15         goto(START)
    L3:    if (not LP) and (not FS) then
17           future(L5, 8)
             send(bus_id, sizeof(ID), ID)
19           halt()
           endif
21         wait(1)
           goto(L4)
23  L4:    if LP and (not FS) then
             future(START, 11)
25           wait(5)
             receive(bus_id, ID)
27           halt()
           endif
29         wait(7)
           goto(L5)
31  L5:    if (not LP) then
             future(START, 4)
33           send(bus_id, sizeof(V), V)
             halt()
35         endif
           wait(4)
37         goto(START)
Listing 1: The Network Code program generated from the schedule table of
Figure 4 for the network processor of P1
Figure 5 outlines the architecture of one computation unit
in our system showing the hardware in greyed boxes and
the software in white boxes. To execute the computation
operations (in our case dataflow nodes) scheduled on it, a
computation unit has one computation processor. To control
the TDMA bus connections (i.e., send and receive messages),
a computation unit has one network processor per connected
bus. The processor and network processors are connected
through a shared memory. They also share a common time
reference, called the clock of the computation unit. Each of
the computation and network processors is executing exactly
one program and the programs of a computation node are
in sync due to the common time reference.
In our concrete implementation, the network processor
is an application-specific instruction set (ASIP) processor.
The Network Code toolchain (the lower dashed box in Fig-
ure 1) takes the Network Code programs associated to the
network processors of the various computation units and
generates a binary executable for the available hardware-
accelerated implementation called the Network Code Pro-
cessor [14]. The binary executable contains the Network
Code program and the memory setup as a byte stream that
is uploaded into the configuration memory of the FPGA
hardware. It also takes the Network Code program of the
computation processor and generates executable code.
Listing 1 shows a simplified, readable version of the Network Code program
which we automatically synthesize from the schedule table of Figure 4 for
the network processor of the P1 processor of our example. This program
handles the bus communications of P1 and implements the bus schedule of
Figure 4. It starts by waiting for one time unit (the wait(1) statement of
line 1). Then, it sends on the bus a message containing the value of LP. The
future statement of line 3 sets up a timer that will jump to label L2 after
two time units. The combination of future and the halt statement in line 5
(which blocks execution) ensures that L2 will be reached at date three, even
if the send completes earlier.
Starting from label L4, the program shows the code corresponding to the send
of ID by processor P2 (which P1 receives). The execution condition of the
receive operation (LP ∧ ¬FS) guards its execution. The receive statement
simply collects the value of ID from a buffer after the completion of the
actual communication (after waiting for 5 time units). The future statement
prescribes a jump to the program start, because after the receive operation
completes there is no bus operation of compatible activation condition in
the current execution cycle. Therefore, we need to start the next cycle
(after the prescribed eleven time units, counted from the beginning of the
receive operation).
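The timing behaviour of Listing 1 can be replayed with a few lines of Python. This is only a sketch of the control flow, not the Network Code interpreter: the table `PROGRAM` transcribes the guards and timeouts of the listing, while `ACTIONS` and `run_cycle` are hypothetical names. Each traced date is the date at which the corresponding label is reached, that is, the start date of the bus operation.

```python
# label -> (guard, branch taken when the guard holds, fall-through branch);
# a branch is (delay until the next label is reached, next label).
PROGRAM = {
    "L1": (lambda LP, FS: True, (2, "L2"), (16, "START")),
    "L2": (lambda LP, FS: True, (2, "L3"), (14, "START")),
    "L3": (lambda LP, FS: not LP and not FS, (8, "L5"), (1, "L4")),
    "L4": (lambda LP, FS: LP and not FS, (11, "START"), (7, "L5")),
    "L5": (lambda LP, FS: not LP, (4, "START"), (4, "START")),
}
ACTIONS = {"L1": "send LP", "L2": "send FS", "L3": "send ID",
           "L4": "receive ID", "L5": "send V"}


def run_cycle(LP, FS, first_wait=1):
    """Replay one execution cycle; return (trace, date of return to START)."""
    date, label, trace = first_wait, "L1", []
    while label != "START":
        guard, taken, skipped = PROGRAM[label]
        if guard(LP, FS):
            # future + halt: the next label is reached after a fixed delay
            trace.append((date, ACTIONS[label]))
            delay, label = taken
        else:
            # wait + goto: the fall-through path of the label
            delay, label = skipped
        date += delay
    return trace, date


for LP in (False, True):
    for FS in (False, True):
        print(LP, FS, run_cycle(LP, FS))
```

Whatever the values of LP and FS, control returns to START at date 17, so the relative timeouts of the listing add up to the same cycle length on every path.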
5. FRAMEWORK OVERVIEW
The main advantage in giving a Network Code-based im-
plementation to the SynDEx-generated schedule tables is
that both formalisms have a time-triggered execution model.
However, several problems remain.
The main formal problem is that the Network Code for-
malism does not define a global time reference for the dis-
tributed platform. Therefore, the development of a running
TDMA distributed system using Network Code necessar-
ily passes through the definition of a clock synchronization
mechanism. Moreover, the use of SynDEx for scheduling
means that the clock synchronization overheads must be
compatible with the timing model of SynDEx. We shall
address this problem in Section 7. Until then, we shall
assume that we have a global time reference.
To complete the definition of our implementation problem,
we assume that (1) there are no permanent or transient mes-
sage transmission errors on the bus, and (2) there is no need
for data packetization, the data being already cut into small-
enough pieces at specification level. These assumptions will
be the subject of future work.
Under these assumptions, our implementation problem is
that of synthesizing the Network Code programs implement-
ing the schedule table of SynDEx. For each processor P
specified in the SynDEx architecture description (=Network
Code computation unit), we need to synthesize:
• One “computation program”, denoted
ComputationProgram(P )
• One “communication program”, denoted
NetworkProgram(B,P ) per bus B the processor P is
connected to (the program of the associated network
processor).
It is interesting to note that existing SynDEx back-ends
structure the generated event-driven code in a similar way,
with one computation thread and one communication thread
per connected bus for each processor.
The main difficulty during the generation of the Network
Code programs is related to the move from the absolute
time references (start dates) of the SynDEx schedule tables
to the relative time (timeouts) manipulated by the future
and wait statements of Network Code.
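As a minimal illustration of this move from absolute to relative time, the Python sketch below converts a sorted list of start dates into the initial wait, the gaps between consecutive dates, and the final wait that closes the cycle. The function name `relative_timeouts` is hypothetical, and the dates and cycle length are the ones assumed from Listing 1. The actual timeouts also depend on durations and activation conditions, which Section 6 takes into account; the gaps computed here only cover the simplest case.

```python
def relative_timeouts(start_dates, cycle_length):
    """Turn absolute start dates into relative delays.

    Returns (initial wait, gaps between consecutive dates,
    delay from the last date to the end of the cycle).
    """
    dates = sorted(set(start_dates))
    first = dates[0]
    gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]
    wrap = cycle_length - dates[-1]
    return first, gaps, wrap


# Bus start dates and cycle length assumed from Listing 1.
print(relative_timeouts([1, 3, 5, 6, 13], 17))
```

The initial wait, the gaps, and the final wait always sum to the cycle length, which is the invariant the generated programs must preserve on every execution path.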
6. NETWORK CODE GENERATION
This section explains how the Network Code programs
mentioned above are generated from the schedule table. We
provide the two algorithms generating respectively: (1) the
Network Code programs of the various bus interfaces and
(2) the computation program.
6.1 Network Processor code generation
To simplify the algorithm, we introduce the following def-
initions and notations:
• $\lambda$ denotes the global length of the schedule.
• $S_B$ is the set of communication operations scheduled on bus $B$.
• $D = \{d \mid \exists o \in S_B : date(o) = d\}$ is the set of dates of
operations in $S_B$. We shall assume that $D = \{d_1, \ldots, d_n\}$ with
$d_1 < \ldots < d_n$.
• Under the previous notation, the generated Network Code program uses
$n+1$ jump labels, which are START, L$_1$, ..., L$_n$.
• $S_B^i = \{o \in S_B \mid date(o) = d_i\}$ is the set of bus operations
scheduled on date $d_i$, for $1 \le i \le n$. We shall assume that
$S_B^i = \{o_1^i, \ldots, o_{k_i}^i\}$, where $k_i > 0$ is the number of
elements of $S_B^i$.
• $clk_i = \bigvee_{j=1}^{k_i} clk(o_j^i)$ is the union (disjunction) of
the activation conditions of the operations of $S_B^i$.
• $c_i$ is the activation condition defining when L$_i$ is jumped to. It is
computed by the translation algorithm. By construction, $clk_i$ implies
$c_i$.
The algorithm building the Network Code program
NetworkProgram(B,P ) is provided in Function 1. It fol-
lows the simplified Network Code conventions used through-
out the paper for the send and receive primitives:
• The send operations take as argument the bus identi-
fier, the variable (memory zone), and the variable size.
• The receive operations take as argument a variable
(memory zone) which must be large enough to allow
the storage of the transmitted data.
Function 1 Computation of NetworkProgram(B,P)
Input: P, $S_B$, bus_id, $\lambda$
Output: NetworkProgram(B,P)
 1: for i = 1 to n do $c_i$ := false done
 2: for i = 1 to n do
 3:   for j = 1 to $k_i$ do
 4:     {Step 1: Build the code for $o_j^i$}
 5:     if $operation(o_j^i) = Send(P, V)$ for some variable V then
 6:       Let $OP_j^i$ be the piece of code:
 7:         send(bus_id, sizeof(V), V)
 8:     else
 9:       {$operation(o_j^i) = Send(P', V)$ for some $P' \neq P$}
10:       Let $OP_j^i$ be the piece of code:
11:         wait($duration(o_j^i)$)
12:         receive(bus_id, V)
13:     end if
14:     {Step 2: Build the future statement}
15:     Let $d_m$ be the smallest element of $D$ with
        $d_m \ge d_i + duration(o_j^i)$ and such that there exists
        $o' \in S_B(d_m)$ with $clk(o_j^i) \wedge clk(o') \neq false$.
16:     if such a $d_m$ exists then
17:       $c_m := c_m \vee clk(o_j^i)$
18:       Let $F_j^i$ be the piece of code:
19:         future($d_m - d_i$, L$_m$)
20:     else
21:       Let $F_j^i$ be the piece of code:
22:         future($\lambda - d_i$, START)
23:     end if
24:     {Step 3: Assemble the full code for $o_j^i$}
25:     Let $C_j^i$ be the piece of code:
26:       if ($clk(o_j^i)$) then
27:         $F_j^i$
28:         $OP_j^i$
29:         halt()
30:       endif
31:   end for
32:   {Step 4: Where to go if no $o_j^i$ is executed}
33:   Let $d_m$ be the smallest element of $D$ with
      $d_m \ge d_i + \max\{duration(o_j^i) \mid 1 \le j \le k_i\}$ and such
      that there exists $o' \in S_B(d_m)$ with
      $clk(o') \wedge (c_i \wedge \neg clk_i) \neq false$.
34:   if such a $d_m$ exists then
35:     $c_m := c_m \vee (c_i \wedge \neg clk_i)$
36:     Let $F^i$ be the piece of code:
37:       wait($d_m - d_i$)
38:       goto(L$_m$)
39:   else
40:     Let $F^i$ be the piece of code:
41:       wait($\lambda - d_i$)
42:       goto(START)
43:   end if
44:   {Step 5: Assemble the full code for $S_B^i$}
45:   Let $C^i$ be the piece of code:
46:     L$_i$: $C_1^i$
47:            $C_2^i$
48:            ...
49:            $C_{k_i}^i$
50:            $F^i$
51: end for
52: {Step 6: Final code assembly}
53: Let NetworkProgram(B,P) be the piece of code:
54:   START: wait($d_1$)
55:   $C^1$
56:   ...
57:   $C^n$
To clarify the way the algorithm works, we already gave in
Listing 1 of Section 4 the Network Code program it generates
from the schedule table of Figure 4 for the network processor
of processor P1. We also explained there the functioning of
the generated code. We explain here how code generation
takes place.
The algorithm groups together all scheduled operations of
S_B with the same start date, and assigns a jump label to
every such date/group. In our example, each operation has a
different start date, so we have 5 jump labels (L1 to L5)
plus the START label defining the entry point of the
program, which is also reached at the beginning of every
cycle. Each of the L_i labels points to a sequence of guarded
statements, followed by a delayed unconditional jump
statement. Each of the guarded statements corresponds to an
operation of S_B belonging to the group associated with L_i.
In the guarded statement associated with operation o, the
guard is the activation condition clk(o), and the statement
includes the actual code for executing operation(o) (a send
or receive statement). The remaining future and halt
statements ensure two important properties:
• The operation code takes a fixed amount of time (the
one specified by the schedule).
• After this fixed duration, control is given to the next
L_i jump label (= date in the schedule table) where an
operation may be activated. If no such jump label
exists, control is given to START.
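The layout of the code attached to one jump label can be sketched as a small text generator. This is only an illustration in Python, not the authors' tool: the helper names guarded and label_block are hypothetical, the statement syntax is copied from Function 1, and the delay and target of the trailing jump in the usage lines are made-up values.

```python
def guarded(clk, future_stmt, op_stmt):
    """Guarded statement of Step 3: guard, future, operation code, halt."""
    return [
        f"if( {clk} ) then",
        f"  {future_stmt}",
        f"  {op_stmt}",
        "  halt()",
        "endif",
    ]


def label_block(label, guarded_stmts, delay, target):
    """Code of Step 5: all guarded statements of one start date, followed by
    the delayed unconditional jump taken when no guard holds."""
    lines = [line for stmt in guarded_stmts for line in stmt]
    lines += [f"wait ({delay})", f"goto ({target})"]
    lines[0] = f"{label}: {lines[0]}"
    return "\n".join(lines)


# Group of label L3: one send guarded by its activation condition.
# future(8,L5) is the value the paper gives for this operation; "wait (4)"
# and "goto (L4)" are hypothetical placeholders for the fall-through jump.
send_id = guarded("LP=false and FS=false", "future(8,L5)",
                  "send (bus_id, sizeof(ID), ID)")
code = label_block("L3", [send_id], 4, "L4")
print(code)
```

The future statement comes before the operation code so that the timeout is armed at the operation start date, which is what gives the guarded statement its fixed duration.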
For instance, the guarded code after label L3 encodes the
Send(P1,ID)@(LP=false∧FS=false) operation of Figure 4.
Counting from its start date (5), the next date at which an
operation has an activation condition that is not exclusive
with that of our operation is 13. Therefore, the future
statement is set to give control in 8 (=13-5) time units to
the label L5 corresponding to date 13.
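The search performed in Step 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions: activation conditions are modelled as predicates over the Boolean variables LP and FS, exclusiveness is checked by enumerating all valuations, and the schedule table below is hypothetical except for the dates 5 and 13 and the condition of Send(P1,ID), which come from the text. The names compatible and future_target are not from the paper.

```python
from itertools import product

VARS = ("LP", "FS")  # Boolean variables used in the activation conditions


def compatible(c1, c2):
    """True when c1 ∧ c2 ≠ false, i.e. some valuation satisfies both."""
    return any(
        c1(v) and c2(v)
        for values in product((False, True), repeat=len(VARS))
        for v in [dict(zip(VARS, values))]
    )


def future_target(table, i, duration, clk):
    """Return (d_m - d_i, m) for an operation of group i, or None if control
    must go back to START. `table` is a date-sorted list of
    (start_date, [activation conditions])."""
    d_i = table[i][0]
    for m, (d_m, conds) in enumerate(table):
        if d_m >= d_i + duration and any(compatible(clk, c) for c in conds):
            return d_m - d_i, m
    return None


always = lambda v: True
clk_send_id = lambda v: not v["LP"] and not v["FS"]

# Hypothetical table: dates 0, 2, 9, the duration 4 and the condition at
# date 9 are assumptions chosen to be consistent with the example.
table = [
    (0, [always]),
    (2, [always]),
    (5, [clk_send_id]),
    (9, [lambda v: v["LP"]]),
    (13, [always]),
]
delay, m = future_target(table, 2, 4, clk_send_id)
print(f"future({delay},L{m + 1})")
```

With this table the group at date 9 is skipped because its condition is exclusive with that of the send, so the search stops at date 13.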
It is possible that even if we jump to a certain L_i label,
none of the associated operations is activated. This can
happen in cases where c_i ∧ ¬clk_i ≠ false. In our example,
L3 exhibits this behavior, because c_i = true and clk_i =
((LP = false) ∧ (FS = false)). To cover these cases, the
code following each label ends with an unconditional jump
to the next jump label where an operation has a condition
which is not exclusive with c_i ∧ ¬clk_i. The computation of
the c_i activation conditions is realized in lines 1, 17,
and 35 of the algorithm.²
² The advantage of this translation scheme is its simplicity
and closeness to the scheduling tables of SynDEx, in the
sense that execution decisions are made at the operation
start date, not before. Better algorithms can be built which
anticipate these decisions, minimizing at the same time the
number of jumps. However, this requires a form of lifetime
analysis which would needlessly complicate this paper.
The algorithm works as follows: Step 1 builds the code
for the actual send or receive operations. Step 2 builds
the associated future statement, and Step 3 assembles the
whole guarded code. Step 4 builds the unconditional jump
code, and Step 5 builds the whole code associated with a
jump label L_i. Finally, Step 6 assembles the whole program.
Note that the generated code includes the jumps to the
START label that are needed to start a new computation at
each cycle. Also note, in lines 14 and 32, the slight
optimization that uses an analysis of the activation
conditions to minimize the number of jumps between labels
(the code would also work by replacing L_m with the label
L_(i+1), but with larger computational requirements). Of
course, the generated code still leaves considerable room
for optimization.
6.2 The computation program
Given that the execution model is identical to that of the
communication programs, the only modification needed in
Function 1 to generate ComputationProgram(P) is in Step
1, which must be fully replaced with:
1: {Step 1: Build the code for o_j^i}
2: Assume operation(o_j^i) is the execution of function F
3: Let OP_j^i be the piece of code:
4:   call (F,parameter_list )
where the call statement executes the library function F
(provided by the user) with the given parameter list (list of
variables).
7. CLOCK DRIFT MANAGEMENT
We have seen in the previous section that the Network
Code language includes instructions for temporal control,
such as wait(d) and future(d,l). Given the nature of
our time-triggered implementations, the system must
measure time consistently on all processors. So far, we
assumed that, for instance, any two calls of wait(d) (on the
same processor, or on different processors) will wait for the
same duration on a real-time clock. In this section, we show
how to realize this assumption, so that all processors
execute the scheduling table produced by SynDEx precisely.
In general, if the assumption is false, then the generated
code will be a priori incorrect. In the context of Figure 4, if
the clock of P2 is faster than that of P1, then it will
potentially trigger the execution of G (at local date 4) while
the data needed to compute its activation test (LP) is still
unavailable on P2 (i.e., not received and therefore
potentially invalid). Similar problems occur if the speeds of
the local clocks can change over time.
The assumptions made in Section 5 require us to address
the classical problem of clock synchronization in distributed
systems. A number of different approaches exist for this
problem [5, 16, 25, 26, 30]. Our novel contribution to
this area is that we fit the clock drift management around
the application demands. Previous work treated clock drift
as a separate problem, different from the application. We
integrate the clock drift management directly into the
application and the communication schedule.
Our approach has two main elements: (1) Take advantage
of our specific scheduling approach to integrate a clock drift
model directly in our scheduling and synthesis flow and (2)
if necessary, use an additional standalone clock synchroniza-
tion algorithm on top of the integrated drift model.
During the synthesis, we have full control over the com-
putations and communications in the system. Thus, our
tool can exert fine control over the communication struc-
ture to attain very low synchronization overhead without
changing the already scheduled computations and
communications generated by SynDEx. Supplementary
synchronization messages are added only if the bus is idle
for long periods of time, to ensure that clock drift remains
within the required bounds. The advantage of our technique
is that the SynDEx scheduling tool can be used as-is: its
scheduling policies need not be changed to account for the
clock drift that builds up along the schedule.
For simplicity, we define our clock drift management tech-
nique for the case where the system has a single bus. To de-
fine it, we assume that the developer knows that the target
hardware satisfies the following properties:
1. The real-time durations of any two wait(d) statements
in the system differ by less than α∗d, where α is a given
constant.
2. The low-level communication hardware (in our imple-
mentation, Ethernet controllers) detects and signals
the end of send and receive operations. These events
serve as clock synchronization points. Moreover, the
end of a receive operation occurs after the end of the
corresponding send, but less than β time units later,
where β is a given constant.
3. The duration of a communication can be precisely com-
puted from the length l of the transmitted data using
a function comm(l).
In addition, we assume that the developer provided a full
SynDEx architecture description, including timing informa-
tion. We shall denote with γ the longest duration of a bus
communication specified in the architecture description.
Recall now that the SynDEx scheduler ensures that all
processors know the expected start date of each
communication. Assume that a communication operation o with
start date s_o transports data of length l_o. Then, the
expected end date of the operation is e_o = s_o + comm(l_o).
Using the communication end events to set the local clock
value of all processors to e_o therefore synchronizes them
with an accuracy of β. The timers associated with future and
wait statements must be updated with the difference between
the old clock value and e_o. If a timer reaches 0 through
this operation, the associated jump is triggered. From now
on, we shall assume that this clock synchronization
mechanism is provided by the Network Code code
generators over the given hardware.
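The resynchronization step described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the Network Code runtime: the class LocalClock, its method names, and the comm function used in the usage lines are all hypothetical, and time is counted in integer units.

```python
class LocalClock:
    """Local clock of one processor with its pending wait/future timers."""

    def __init__(self, now=0):
        self.now = now
        self.timers = {}  # jump label -> remaining time

    def arm(self, label, delay):
        self.timers[label] = delay

    def on_communication_end(self, s_o, l_o, comm):
        """Resynchronize at a communication end event.

        Sets the clock to e_o = s_o + comm(l_o), shifts every timer by the
        difference between e_o and the old clock value, and returns the
        labels whose timers have expired."""
        e_o = s_o + comm(l_o)
        shift = e_o - self.now
        self.now = e_o
        fired = []
        for label in sorted(self.timers):
            self.timers[label] -= shift
            if self.timers[label] <= 0:
                fired.append(label)
        for label in fired:
            del self.timers[label]
        return fired


comm = lambda length: 2 * length   # hypothetical comm(l)
clock = LocalClock(now=12)         # slow clock: reads 12 when e_o is 13
clock.arm("L5", 1)
clock.arm("START", 10)
fired = clock.on_communication_end(s_o=5, l_o=4, comm=comm)
print(fired, clock.now, clock.timers)
```

A clock that runs ahead of e_o yields a negative shift, so its timers are lengthened instead of shortened.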
However, by itself, this clock synchronization mechanism
does not ensure the correctness of the implementation,
because the clock synchronization is not exact and because
between the synchronization points the clocks may drift
further apart. Assume, for instance, that β = 0, α > 0, and
that the duration of a send operation is exactly that
specified in the SynDEx schedule. Then, if the receiver has a
faster clock than the sender, the wait statement of the
receiver will trigger the reading of the reception buffer
before the send is complete, potentially resulting in
corruption of data.
Figure 6: Clock drift after a bus communication. Here, we
have −α ∗ d ≤ t3 − t2 ≤ β + α ∗ d. No supplementary
communication or piggybacking of time information is needed
to achieve this precision. [Timing diagram for P1 and P2
along the real-time flow: the send end (t0) and the receive
end (t1) each start a wait(d); the two waits end at t2 and
t3.]
Figure 7: Scheduling table for a producer-consumer example
on two processors. [Table over P1, P2 and Bus, dates 0 to 7:
Producer@true, Send(P1,Data)@true, Consumer@true.]
Figure 6 shows the model for computing the clock drifts
starting from a clock synchronization point (communication
end). If we bound the bus idle space between two succes-
sive communications, then we obtain a bound on the clock
drift. For instance, if we bound the idle space by γ, the
real-time distance between the timeouts of two wait or fu-
ture statements measured on two different processors and
starting from the same synchronization point is bounded by
−2 ∗ α ∗ γ ≤ t3 − t2 ≤ β + 2 ∗ α ∗ γ (because the d of
Figure 6 is bounded by 2 ∗ γ).
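The bound can be spelled out in one line. The decomposition of d used here is our reading of the parenthetical, not a statement from the paper: we assume that between two consecutive synchronization points the bus is idle for at most γ and then carries one communication of duration at most γ.

```latex
\begin{aligned}
d &\le \underbrace{\gamma}_{\text{idle}}
     + \underbrace{\gamma}_{\text{longest communication}} = 2\gamma,\\
-\alpha d \le t_3 - t_2 \le \beta + \alpha d
  \;&\Longrightarrow\;
  -2\alpha\gamma \le t_3 - t_2 \le \beta + 2\alpha\gamma .
\end{aligned}
```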
When applying this drift model to the schedules gener-
ated by SynDEx and implemented into Network Code, the
first problem that arises is that a processor with a faster
clock can terminate a receive while the send is incomplete
(the case where t3 − t2 is negative). To ensure that this
never happens, we have to reserve more time than actually
needed by the message transmission to compensate for the
differences in clock speed and bus access overheads. Thus,
to schedule a message of type t (with transmission duration
duration(t) = d_t), SynDEx will use instead a longer
duration d′_t to ensure that the reception is never
interrupted. Under the given drift model, the best (smallest)
value for d′_t to ensure correctness is d_t + ⌈2 ∗ α ∗ γ⌉. A
similar problem occurs with computations, meaning that we
need to add ⌈2 ∗ α ∗ γ⌉ to each operation duration. The
updated timing model is the one given as input to SynDEx.
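The update of the timing model amounts to a few lines of arithmetic. The Python sketch below is only an illustration: the function names, the operation names, and all numeric values are hypothetical, and exact fractions are used so that the ceiling is not affected by floating-point rounding.

```python
import math
from fractions import Fraction


def drift_margin(alpha, gamma):
    """Margin ⌈2·α·γ⌉ added to every duration given to SynDEx."""
    return math.ceil(2 * alpha * gamma)


def padded_timing(durations, alpha, gamma):
    """Return the updated timing model: each duration d becomes d + margin."""
    margin = drift_margin(alpha, gamma)
    return {name: d + margin for name, d in durations.items()}


# Hypothetical figures: alpha = 0.1 %, longest bus communication gamma = 400.
alpha, gamma = Fraction(1, 1000), 400
timing = {"Producer": 3, "Send(P1,Data)": 2, "Consumer": 2}
padded = padded_timing(timing, alpha, gamma)
print(drift_margin(alpha, gamma), padded)
```

Here 2 ∗ α ∗ γ = 0.8, so every computation and communication is lengthened by one time unit.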
Finally, the output of SynDEx needs to be updated to
ensure that the bus cannot be idle for longer than γ time
units. To ensure this, clock synchronization communications
are inserted in bus idle sections, before giving the resulting
scheduling table to our translation algorithms. When no
specific synchronization mechanisms exist on the bus, clock
synchronization messages can be assumed to be
communications of the shortest available type the bus can
transmit (so that the communication takes minimal time on
the bus). The generated code ensures exclusive bus usage by
itself, thus the system is free of collisions and requires no
white space detection mechanism.
Figure 8: The example of Figure 7 with clock drift
management. [Table over P1, P2 and Bus, dates 0 to 10:
Producer@true, two Send(P1,Sync)@true operations,
Send(P1,Data)@true, Consumer@true.]
We show how our clock drift management technique works
using two examples. The first one is a very simple producer-
consumer example. Its schedule table generated by SynDEx
without the use of clock drift management is given in Fig-
ure 7. The producer on P1 executes the code to generate the
data. Then it communicates the data on the bus, and finally
the consumer in P2 executes and uses the data. To produce
the schedule table using our clock drift management tech-
nique, we assume that ⌈2 ∗ α ∗ γ⌉ = 1, so that the duration
of each computation and communication is increased by 1
in the timing model given to SynDEx. The output of our
technique is given in Figure 8. This schedule table includes
the dataflow operations scheduled by SynDEx, as well as the
two Sync operations added in the long bus idle sections. We
have assumed that the shortest bus communication takes 1
time unit.
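The insertion of Sync operations into long idle sections of the bus can be sketched as below. The paper does not spell out a placement policy, so this Python sketch makes its own assumptions: idle time is counted from date 0 or from the end of the previous bus operation, each Sync is placed as late as the bound γ allows, and the tail of the cycle after the last communication is left untouched. The function name and the numeric values are hypothetical and do not reproduce Figure 8.

```python
def insert_syncs(bus_ops, gamma, sync_len):
    """Insert Sync operations so that the bus is never idle for more than
    gamma before a scheduled communication.

    bus_ops: date-sorted list of (start, duration, name) for one cycle.
    Assumes 0 < sync_len <= gamma."""
    result = []
    busy_until = 0
    for start, duration, name in bus_ops:
        while start - busy_until > gamma:
            # Latest start that keeps the idle gap within gamma and still
            # fits before the next scheduled communication.
            sync_start = min(busy_until + gamma, start - sync_len)
            result.append((sync_start, sync_len, "Sync"))
            busy_until = sync_start + sync_len
        result.append((start, duration, name))
        busy_until = start + duration
    return result


# Hypothetical schedule: one data transfer starting at date 6.
schedule = insert_syncs([(6, 3, "Send(P1,Data)")], gamma=2, sync_len=1)
print(schedule)
```

Placing each Sync as late as possible keeps the number of inserted messages small; other policies that respect the γ bound would serve equally well.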
The second example, given in Figure 9, shows the schedule
table of our initial example (of Section 3), as produced
when using the clock drift management. We assumed that
⌈2∗α∗γ⌉ = 1 and that the Sync operations take the shortest
duration of a communication, as specified by the architec-
ture description of Figure 3 (no increment is needed, because
we are not interested in the data). A single, conditioned
Sync operation is added, represented by the two boxes with
identical label in our graphical representation.
8. CONCLUSION
Model-driven development relies on sound mechanisms
and tools for generating code from high-level specifications.
A well-understood approach is to create abstractions for
modelling, architecture, and execution of systems, and to
rely on their semantics when generating code. A number of
abstractions exist; related work usually defines one and
then relies on other work to combine the different models.
In this work, we create a full suite as a framework in which
we combine a modelling abstraction (synchronous data-flow),
an architecture abstraction (execution, communication du-
rations, and clock drift), with an execution abstraction (time-
triggered computation and execution as found in the Net-
work Code formalism). We show how we solved the chal-
lenges occurring when building such a framework and the
accompanying algorithms, and we provided a guiding exam-
ple to illustrate the tool chain.
For the future, we mention here only two extensions of
our implementation flow which seem particularly interest-
ing. The first one is the ability to take as input multi-clock
synchronous specifications, and to output multi-period real-
time implementations. This implies extensions to both the
SynDEx flow, and the SynDEx-Network Code glue.
The second extension line aims at refining the execution
model of the Network Code formalism, to take into account
the costs of control. The current approach, which considers
that all tests, jumps, and timeouts take no time, works well
when these costs are negligible with respect to the costs of
communications and dataflow computations. But having a
finer accounting of control costs would allow us to handle
specifications of finer grain. This direction should also in-
clude work on (1) the synthesis of time-triggered code that
minimizes the cost of control, and (2) better clock synchro-
nization mechanisms.
9. ACKNOWLEDGEMENTS
This research was supported in part by NSERC DG 357121-
2008, ORF RE03-045, and ISOP IS09-06-037.
10. REFERENCES
[1] L. Almeida, P. Pedreiras, and J.A.G. Fonseca. The
FTT-CAN protocol: Why and how. IEEE Trans. on
Industrial Electronics (TIE), 49(6):1189–1201,
December 2002.
[2] R. Alur and G. Weiss. Regular specifications of
resource requirements for embedded control software.
In Proceedings RTAS’08, Washington, DC, USA, 2008.
[3] M. Anand. Conditional models for compositional
design of real-time embedded systems. Ph.D. thesis,
University of Pennsylvania, Philadelphia, PA, USA,
May 2008.
[4] M. Anand, S. Fischmeister, Y. Hur, J. Kim, and
I. Lee. Generating reliable code from hybrid systems
models. IEEE Transactions on Computers, 2010. To
appear.
[5] E. Armengaud and A. Steininger. Remote
measurement of local oscillator drifts in FlexRay
networks. In Proceedings DATE’09, pages 1082–1087,
2009.
[6] S. L. Campbell, J.-P. Chancelier, and R. Nikoukhah.
Modeling and Simulation in Scilab/Scicos with
ScicosLab 4.4 (second edition). Springer, 2010.
[7] G. Carvajal and S. Fischmeister. A TDMA Ethernet
switch for dynamic real-time communication. In
Proceedings FCCM’10, Charlotte, United States, May
2010.
[8] P. Caspi, A. Curic, A. Maignan, C. Sofronis,
S. Tripakis, and P. Niebert. From Simulink to
SCADE/Lustre to TTA: a layered approach for
distributed embedded applications. In Proceedings
LCTES’03, San Diego, California, USA, 2003.
[9] S. Chakraborty, L.T.X. Phan, and P.S. Thiagarajan.
Event count automata: A state-based model for
stream processing systems. In Proceedings RTSS’05,
Washington, DC, USA, 2005.
Figure 9: Schedule table for our example, including clock
drift management. [Table over the computation and
communication resources P1, P2, P3 and Bus, along the time
flow (dates 0 to 22); it contains the operations of the
initial example plus a conditioned Sync@(FS=true) operation,
drawn as two boxes.]
[10] A. Easwaran, M. Anand, and I. Lee. Compositional
analysis framework using EDP resource models. In
Proceedings RTSS’07, Washington, DC, USA, 2007.
[11] P. Eles, K. Kuchcinski, Z. Peng, A. Doboli, and
P. Pop. Scheduling of conditional process graphs for
the synthesis of embedded systems. In Proceedings
DATE’98, Paris, France, 1998.
[12] Ethernet Powerlink Standardisation Group (EPSG).
Ethernet Powerlink V2.0 – Communication Profile
Specification, 2003.
[13] S. Fischmeister and I. Lee. A verifiable language for
programming real time communication schedules.
IEEE Transactions on Computers, 56(11):1505–1519,
2007.
[14] S. Fischmeister, R. Trausmuth, and I. Lee. Hardware
acceleration for conditional state-based
communication scheduling on real-time Ethernet.
IEEE Transactions on Industrial Informatics, 5, 2009.
[15] FlexRay Consortium. FlexRay Communications
System – Protocol Specification, June 2004. Version
2.0.
[16] M. Fugger, E. Armengaud, and A. Steininger. Safely
stimulating the clock synchronization algorithm in
time-triggered system: A combined formal and
experimental approach. IEEE Transactions on
Industrial Informatics, 5(2):132–146, 2009.
[17] T. Grandpierre, C. Lavarenne, and Y. Sorel.
Optimized rapid prototyping for real-time embedded
heterogeneous multiprocessors. In Proceedings
CODES’99, Rome, Italy, May 1999.
[18] T. Grandpierre and Y. Sorel. From algorithm and
architecture specification to automatic generation of
distributed real-time executives: A seamless flow of
graphs transformations. In Proceedings
MEMOCODE’03, Mont Saint-Michel, France, June
2003.
[19] P. Le Guernic, J.-P. Talpin, and J.-C. Le Lann.
Polychrony for system design. Journal for Circuits,
Systems and Computers, 12:261–304, 2002.
[20] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud.
The synchronous dataflow programming language
Lustre. Proceedings of the IEEE, 79(9):1305–1320,
September 1991.
[21] P. Pedreiras, P. Gai, L. Almeida, and G.C. Buttazzo.
FTT-Ethernet: a flexible real-time communication
protocol that supports dynamic QoS management on
Ethernet-based systems. IEEE Transactions on
Industrial Informatics, 1(3):162–172, August 2005.
[22] L. T. X. Phan, S. Chakraborty, and P. S. Thiagarajan.
A multi-mode real-time calculus. In Proceedings
RTSS’08, Barcelona, Spain, 2008.
[23] D. Potop-Butucaru, R. Simone, Y. Sorel, and
J. Talpin. Clock-driven distributed real-time
implementation of endochronous synchronous
programs. In Proceedings EMSOFT’09, Grenoble,
France, 2009.
[24] Robert Bosch GmbH. CAN Specification, Version 2,
September 1991.
[25] D. Salyers, A. Striegel, and C. Poellabauer. A light
weight method for maintaining clock synchronization
for networked systems. In Proceedings ICCCN’08,
pages 522–526, 2008.
[26] K. Sun, P. Ning, and C. Wang. Secure and resilient
clock synchronization in wireless sensor networks.
IEEE Journal on Selected Areas in Communications,
24(2):395–408, 2006.
[27] VARAN–versatile automation random access network.
www.varan-bus.net. Visited Mar. 2009.
[28] G. Weiss, S. Fischmeister, M. Anand, and R. Alur.
Specification and analysis of network resource
requirements of control systems. In Proceedings
HSCC’09, San Francisco, United States, April 2009.
[29] Dong Wu, B. M. Al-Hashimi, and P. Eles. Scheduling
and mapping of conditional task graphs for the
synthesis of low power embedded systems. In
Proceedings DATE’03, Munich, Germany, 2003.
[30] M. Zhang, J. Shi S. Shen, and T. Zhang. Simple clock
synchronization for distributed real-time systems. In
Proceedings ICIT’08, pages 1–5, 2008.
