Verifying parallel dataflow transformations with model checking and its application to FPGAs by Stewart, R. (Robert) et al.
Journal of Systems Architecture 101 (2019) 101657 
Verifying parallel dataflow transformations with model checking and its 
application to FPGAs 
Robert Stewart a , ∗ , Bernard Berthomieu d , Paulo Garcia c , Idris Ibrahim a , Greg Michaelson a , 
Andrew Wallace b 
a Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK 
b Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK 
c Faculty of Engineering and Design, Carleton University, Ottawa, Canada 
d LAAS-CNRS, Université de Toulouse, Toulouse, France 
Article info
Keywords: 
Dataflow 
FPGAs 
Model checking 
Program transformation 
Parallelism 
Abstract
Dataflow languages are widely used for programming real-time embedded systems. They offer high level abstraction above hardware, and are amenable to program analysis and optimisation. This paper addresses the challenge of verifying parallel program transformations in the context of dynamic dataflow models, where the scheduling behaviour and the amount of data each actor computes may depend on values only known at runtime.
We present a Linear Temporal Logic (LTL) model checking approach to verify a dataflow program transformation, using three LTL properties to identify cyclostatic actors in dynamic dataflow programs. The workflow abstracts dataflow actor code to Fiacre specifications to search for counterexamples of the LTL properties using the Tina model checker. We also present a new refactoring tool for the Orcc dataflow programming environment, which applies the parallelising transformation to cyclostatic actors. Parallel refactoring using verified transformations speedily improves FPGA performance, e.g. 15.4× speedup with 16 actors.
1. Introduction 
In the dataflow model of execution, the firing of actors depends only on data availability, allowing each actor in a program to execute asynchronously without a global control flow sequentialising their execution. Dataflow languages are a natural abstraction for FPGAs, since a program’s dataflow structure is distributed across the programmable hardware fabric. In other words, each actor in a dataflow program is compiled to an independent processing hardware block, subject to FPGA resource availability. Moreover, large kernel variables like intermediate arrays may use on-chip block RAM (BRAM). BRAMs are distributed across FPGA fabric, meaning there is no memory contention when multiple actors update their kernel variables, e.g. intermediate arrays. That is, both computation and memory access are inherently parallel. For FPGAs, dataflow programming environments represent a high level abstraction above Hardware Description Languages (HDLs), i.e. Verilog or VHDL. The dataflow programming model can be exploited for high FPGA performance, e.g. [1], or as an Intermediate Representation in compilers for higher level FPGA languages, e.g. [2]. The dataflow abstraction enables program analysis, e.g. optimal static scheduling [3], and program transformation (this paper).
1.1. Parallel program transformation
The goal of parallelisation is to speed up performance by executing code across multiple processing elements, e.g. the cores of a CPU. Parallel transformations must preserve a program’s functional semantics, i.e. a transformed program must have identical output for the same inputs as the original program.
Refactoring Software Code. Refactoring tools exploit language properties to parallelise code. For example, the referential transparency of pure code and equational laws can be exploited to rewrite a map, which sequentially applies a function to every element in a collection, as a parMap that performs the applications in parallel. Software languages for which parallel transformation tools exist include Haskell [4], Erlang [5] and C++ [6].
Refactoring Embedded Systems Code. Static dataflow code is simple to parallelise because information about scheduling and data rates can be deduced at compile time. However, static dataflow language primitives are inexpressive, inhibiting the expression of algorithms with dynamic control flow and dynamic data rates. Introducing dynamic dataflow features to a language, e.g. value-dependent scheduling and dynamic data rates, complicates auto-parallelisation of dataflow code.
∗ Corresponding author.
E-mail address: r.stewart@hw.ac.uk (R. Stewart).
https://doi.org/10.1016/j.sysarc.2019.101657
Received 26 February 2019; Received in revised form 13 September 2019; Accepted 5 October 2019
Available online 23 October 2019
1383-7621/© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
R. Stewart, B. Berthomieu and P. Garcia et al. Journal of Systems Architecture 101 (2019) 101657 
Fig. 1. Verifying and Transforming Cyclo-Static Dataflow. 
1.2. Related work
This work is influenced by the Box Calculus [7], a program rewrite system for the Hume embedded systems language [8]. The approach specifies Hume programs with Lamport’s temporal logic of actions (TLA) [9] and uses deductive verification in TLA to verify Box Calculus transformations [10]. This is exploited for parallelism by transforming single Hume boxes into multiple boxes using high level patterns like divide-and-conquer. The language for implementing Hume boxes (analogous to actors) is purely functional and stateless, mapping input patterns directly to output expressions, and any state is carried via feedback wires. This contrasts with our work, where we aim to support stateful dataflow languages like CAL, where actors can have internal variables that persist values between firings. This complicates parallelisation, due to the possibility of changing the read/write order on mutable variables, or introducing race conditions. Moreover, Hume lacks tooling for programmers to apply program transformations. In contrast, our work uses model checking to verify transformations of stateful actors, and we have extended the Eclipse based Orcc IDE with a graphical tool to apply a parallel transformation when model checking enables it.
The most salient related work in the context of parallelising dataflow programs is the StreamIt compiler. That approach is restricted to parallelising stateless actors only [11], whereas our use of model checking enables the parallelisation of stateful actors too. Related work that combines dataflow models with model checking includes determining minimum dataflow buffer sizes [12], and enabling compile-time scheduling of multirate static actors [13]. On the hardware side, related work uses model checking to verify that two Verilog/VHDL modules satisfy the same global requirements [14], hence enabling one to replace another with dynamic partial reconfiguration. None of these approaches consider the parallelisation of hardware designs or the verification of dataflow graph transformations.
1.3. Our approach and contributions
Approach. Our approach (Fig. 1) extends dataflow parallelisation beyond stateless actors, to actors that are stateful and contain firing rules with different data rates, provided they have cyclo-static properties. This is important because in practice cyclo-static actors very often hold state between action firings (Section 7.2). This is achieved by using model checking to identify parallelisable multi-rate static (MRDF) and cyclo-static (CSDF) actors that coexist with dynamic (DDF) actors in dataflow programs. These dataflow models are described in Section 2.2. The approach is general to any dynamic dataflow language; this paper demonstrates the approach with the CAL dataflow language [15] in the Orcc development environment [16]. It marks cyclostatic actors as parallelisable, to enable an interactive graphical program transformation tool.
Contributions. This paper makes the following contributions:
• An abstraction of dataflow actors to Fiacre, a formalised language for representing behavioural and timing aspects of embedded and distributed systems (Section 3).
• Three Linear Temporal Logic (LTL) properties that model cyclo-static dataflow properties. Model checking Fiacre abstractions of actors identifies cyclostatic actors in dynamic dataflow programs (Section 4).
• A graphical interactive refactoring tool that automates the parallelisation of potentially stateful cyclostatic actors (Section 5).
• An evaluation showing counterexamples of the cyclostatic LTL properties, and the efficacy of parallel transformations of a cyclostatic actor on an FPGA with two case studies (Section 6).
2. Background
Task parallel languages provide either a high level programming model of spawning tasks and defining synchronisation points, or a lower level model with a fixed set of actors, explicit point-to-point FIFO connections and FIFO depths. Software oriented task parallel models are often more high level than parallel languages for hardware. Software languages often support spawning new tasks/actors at runtime across threads on a multicore CPU. This is unsuitable when compiling programs to programmable hardware (i.e. application specific circuits), where the fixed task graph is mapped into the hardware with place and route, and cannot be changed without re-synthesis.
There are numerous programming models to express task parallel programs, e.g. fork/join APIs and threads, where data flows are implicit between parallel tasks. Dataflow programming models offer a more explicit way to express independent parallel tasks, and which tasks communicate. They clearly separate computation and communication, and are popular for embedded systems programming, especially FPGAs, where the entire task graph and communication routing must be fixed in hardware prior to execution.
2.1. Properties of dataflow languages
Dataflow programs are directed graphs of connected actors. There are many dataflow languages for multicore processors and embedded systems, each exhibiting a trade-off between expressivity and reasoning power. The hardware architecture they are designed to target reflects their functionality, specifically the graph structure, scheduling and data rates of programs expressed with them, shown in Fig. 2.
Fig. 2. Task Parallel Languages.
Graph structure. Some languages support dynamic task graphs, i.e. where the number of actors changes during runtime. An example is Cilk [17], a C++ fork/join model that creates parallel threads with spawn and synchronises their results with sync. This is similar to OpenMP’s task thread creation and taskwait synchronisation pragmas [18]. With other languages, the number of tasks (or actors) and the task graph topology are static at compile time, and do not change at runtime.
Scheduling. This is the selection of executable code within an actor. Some dataflow languages support only fixed scheduling policies, e.g. PREESM [19], whilst other models support code execution choice driven by data availability and data values using pattern matching, e.g. Hume [20] and CAPH [21]. Some languages support both, e.g. Cx [22] has non-blocking reads/writes on default ports (static scheduling) and synchronised ports (dynamic scheduling), where data availability determines if an executable rule is enabled.
Data rates. An actor with a static data rate produces and consumes the same number of values every time it is executed. An actor with a dynamic data rate is free to consume/produce data sequences of arbitrary length with no periodic pattern.
2.2. Dataflow models
The models considered in our approach are: 
1. Multi-Rate Dataflow (MRDF) [23] , where actors consume and pro-
duce a fixed number of tokens every firing. 
2. Cyclo-Static Dataflow (CSDF) [24] , where actors contain mul-
tiple fireable actions. Actions may have different consump-
tion/production rates, but the sequence of action executions must
be periodic. 
3. Dynamic Dataflow (DDF) [25] , where, when a DDF actor is fired
its scheduler picks an enabled execution rule and executes it. The
enabled status of an execution rule can depend on both the avail-
ability and the values of tokens, and hence the data rate is un-
known at compile time. If there are no fireable rules, execution
is deferred and the actor’s scheduler tries again to find an enabled
execution rule. 
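The distinction between the three models can be illustrated on a recorded trace of per-firing data rates. The following is a minimal sketch, not the paper's method: the names `shortest_period` and `classify` are hypothetical, and a finite trace can only suggest a classification, whereas the model checking approach in this paper reasons over all behaviours of an actor.

```python
from typing import List, Optional, Tuple

Rate = Tuple[int, int]  # (tokens consumed, tokens produced) for one firing


def shortest_period(rates: List[Rate]) -> Optional[int]:
    """Return the shortest period that repeats across the whole trace, or None.

    Only periods that fit at least twice into the trace are considered.
    """
    n = len(rates)
    for period in range(1, n // 2 + 1):
        if all(rates[i] == rates[i % period] for i in range(n)):
            return period
    return None


def classify(rates: List[Rate]) -> str:
    """Label an observed trace as MRDF-like, CSDF-like or DDF-like."""
    period = shortest_period(rates)
    if period is None:
        return "DDF"      # no periodic pattern observed
    return "MRDF" if period == 1 else "CSDF"
```

For instance, a trace alternating between rates (2, 0) and (1, 1) is labelled CSDF-like, because the sequence of action rates is periodic with period 2.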
Dataflow actors can have multiple actions but only one action can fire at a time, in contrast to synchronous languages, e.g. Quartz [26], where all enabled actions execute in a single firing.
Some dataflow programming environments only support static dataflow models, e.g. direct feedthrough function blocks in Simulink [27]. The open source Orcc development environment [16] supports dynamic dataflow programming with CAL [15].
Real world applications of dynamic dataflow properties include real-time Twitter data processing [28] and MPEG decoding, a widely used dataflow benchmark with practical relevance. [29] presents an MPEG implementation in CAL. The disadvantage of dynamic dataflow is that it inhibits program analysis, meaning a programmer has to revert to manual program refactoring to improve performance, and although this can improve FPGA performance [30], this approach is cumbersome and error-prone.
2.3. Constructing static dataflow graph structures with CAL
CAL is a dataflow language for programming real-time embedded systems, which was developed as part of the Ptolemy II project ([31]). It is a dataflow language for constructing dataflow graphs with (1) a fixed set of actors and connections, (2) support for dynamic scheduling and (3) support for dynamic data rates. These two dynamic properties make CAL suitable for implementing relatively complex algorithms on embedded systems, e.g. algorithms for which control flow and the input/output data size is determined by runtime values, at the expense of lost static analysis and program transformation opportunities due to these dynamic properties.
Each CAL actor encapsulates an algorithmic kernel. Actors are connected together with lossless order-preserving FIFOs, through which tokens flow. Fig. 3a shows a volume amplifier example. Actor ports are named ❶. The actor may have a store of private variables ❷. Actors contain discrete non-interruptible execution rules, called actions. Actions match on input patterns to consume tokens and output patterns to produce tokens (❸ and ❺). Actions may also update store variables ❹. A Finite State Machine (FSM) determines enabled actions from each FSM state ❻, and the initial FSM state, which in this example is s0. When multiple actions are fireable from a given state, a priority block can disambiguate multiple enabled actions ❼.
2.4. Implementing dynamic dataflow languages
Since action selection for dynamic actor firing is a runtime decision, implementations of dynamic dataflow models need a runtime scheduling mechanism to determine which action to execute next. Fig. 4 shows the architecture components necessary for supporting dynamic dataflow execution. Each actor consumes input streams and produces output streams via ports. Connections (edges) enable data to flow between ports. An internal store associates values with actor-local variables.
CAL includes a primitive called guard (Section 3.1), which supports the firing of an action depending on the values of incoming tokens and of variables in the actor’s store. Satisfying both of the following conditions enables an action for firing:
1. there is a sufficient number of tokens to match its input patterns,
2. its guard evaluates to true (if a guard exists).
Fig. 5 shows the scheduling algorithm for action selection. If there are multiple enabled actions, and an absence of a priority statement to disambiguate action selection, the runtime scheduler chooses one at random.
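The enabling conditions and the selection policy described above can be sketched as follows. This is an illustrative approximation only: the `Action` record, the `enabled` and `select` functions and the single-input-port simplification are assumptions made for the sketch, not the Orcc runtime or the algorithm of Fig. 5.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    name: str
    tokens_needed: int   # size of the input pattern (one input port assumed)
    guard: Optional[Callable[[List[int], Dict[str, int]], bool]] = None


def enabled(action: Action, fifo: List[int], store: Dict[str, int]) -> bool:
    """Enabled if enough tokens are queued and the guard (if any) holds."""
    if len(fifo) < action.tokens_needed:
        return False
    if action.guard is None:
        return True
    return action.guard(fifo[:action.tokens_needed], store)


def select(actions, fifo, store, priority=None, rng=random):
    """Pick one enabled action: by priority order if given, else at random."""
    candidates = [a for a in actions if enabled(a, fifo, store)]
    if not candidates:
        return None      # nothing fireable: the scheduler tries again later
    if priority:
        ranked = [a for name in priority for a in candidates if a.name == name]
        if ranked:
            return ranked[0]
    return rng.choice(candidates)
```

When no priority list is supplied and several actions are enabled, the choice is random, which mirrors the non-determinism that later sections abstract for model checking.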
2.5. Actor data rates
An FSM transition system drives actor execution, e.g. the FSM in Fig. 3b for the amplifier actor. Actions have data rates [C/P], meaning they consume C tokens and produce P tokens when transitioning between FSM states when their actions are fired.
Consider an actor A consuming tokens from a connection v and producing tokens on a connection u. Firing an action in this actor will consume $c_A^v$ tokens from connection v and produce $p_A^u$ tokens to connection u. Successive execution of actor A may select different actions internally, e.g. $c_A^v(1) = 3$ means the 1st firing of A resulted in a consumption of 3 tokens whilst $c_A^v(2) = 5$ means that on the 2nd firing 5 tokens were consumed. More generally, $c_A^v(n)$ and $p_A^u(n)$ are the number of tokens consumed and produced on the n-th firing of actor A.
Definition 1. Data rates. The number of tokens produced on edge u after n firings for actor A is
$$P_A^u(n) = \sum_{i=1}^{n} p_A^u(i, a)$$
where $a \in Actions_A$. The number of tokens consumed from edge v after n firings for actor A is
$$C_A^v(n) = \sum_{i=1}^{n} c_A^v(i, a).$$
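As a worked instance of Definition 1, the cumulative rates are running sums over the per-firing rates. The sketch below is illustrative: the function name `cumulative_rate` is hypothetical, the consumption values 3 and 5 come from the example above, and the production values are made up for the illustration.

```python
from typing import Sequence


def cumulative_rate(per_firing: Sequence[int], n: int) -> int:
    """Total tokens moved on one edge after the first n firings."""
    if not 0 <= n <= len(per_firing):
        raise ValueError("n must lie within the recorded firings")
    return sum(per_firing[:n])


# Per-firing consumption on edge v: 3 tokens on the 1st firing, 5 on the 2nd.
consumed_on_v = [3, 5, 3, 5]
# Hypothetical per-firing production on edge u (assumed for illustration).
produced_on_u = [1, 0, 1, 0]
```

With these values, `cumulative_rate(consumed_on_v, 2)` corresponds to $C_A^v(2) = 3 + 5$.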
Consider actor A with actions $a_1$ and $a_2$, both enabled after 5 actor firings. If $c_A^u(5, a_1) \neq c_A^u(5, a_2)$, then the actor’s data rate may not be static, i.e. it would be a dynamic actor, and it may not be possible to answer:
1. can the program execute with bounded buffer size? 
2. will parallelising an actor preserve the actor’s functional semantics?
Fig. 3. An Amplifier Dataflow Actor.
Fig. 4. An Architecture for Dynamic Actors.
2.6. Program preserving parallelisation
Parallelism can speed up execution. However, parallelisation (and any other program transformation) must preserve a program’s functional semantics. Two properties of dynamic dataflow programs that must be preserved by parallel program transformation are:
1. Functional data rates: An actor may require multiple input to-
kens to compute its function and produce an output. Such an
atomic operation is often implemented as multiple action firings
( Section 7.2 ). Parallel instances of an actor should read and write
enough tokens to preserve the atomic behaviour of the original
sequential actor. 
2. Stores: An actor may have a store of local variables. A program
transformation should preserve the modification sequence of an
actor’s store. 
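The first property can be illustrated with a small sketch in which the input stream is dealt to parallel instances in whole blocks, so that each instance always receives enough tokens for one atomic operation. This is a simplified illustration under stated assumptions, not the transformation implemented in the paper: the helper names are hypothetical, the block size stands in for the consumption rate of one complete cycle, and the kernel is assumed to carry no state from one block to the next.

```python
from typing import Callable, List

Block = List[int]


def split_round_robin(tokens: List[int], block: int, workers: int) -> List[List[Block]]:
    """Deal consecutive blocks of `block` tokens to `workers` lanes in turn."""
    lanes: List[List[Block]] = [[] for _ in range(workers)]
    for start in range(0, len(tokens), block):
        lanes[(start // block) % workers].append(tokens[start:start + block])
    return lanes


def merge_round_robin(lanes: List[List[Block]]) -> List[int]:
    """Collect one result block per lane in turn, restoring the original order."""
    merged: List[int] = []
    rounds = max((len(lane) for lane in lanes), default=0)
    for r in range(rounds):
        for lane in lanes:
            if r < len(lane):
                merged.extend(lane[r])
    return merged


def parallel_apply(kernel: Callable[[Block], Block], tokens: List[int],
                   block: int, workers: int) -> List[int]:
    """Apply `kernel` per block on each lane, then merge the outputs in order."""
    lanes = split_round_robin(tokens, block, workers)
    results = [[kernel(b) for b in lane] for lane in lanes]
    return merge_round_robin(results)


def sequential_apply(kernel: Callable[[Block], Block], tokens: List[int],
                     block: int) -> List[int]:
    """Reference behaviour: one instance processes every block in order."""
    out: List[int] = []
    for start in range(0, len(tokens), block):
        out.extend(kernel(tokens[start:start + block]))
    return out
```

Under these assumptions the parallel and sequential outputs coincide, which is the sense in which the functional data rate is preserved.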
Section 4 shows that, with model checking, one can parallelise actors expressed with a dynamic dataflow language to increase their throughput performance, whilst preserving the actor’s functional semantics.
3. Abstracting actors to models for verification
Our approach uses model checking to identify cyclostatic actors for parallelisation. Since CAL is a dataflow programming language and not a model description language, this section shows how we abstract pertinent features of actor code into Fiacre.
3.1. Abstract dataflow transition system for CAL
An abstract transition machine model for describing dataflow actors [32] is now presented to formalise the actor model. It describes the data rates and scheduling rules for actors in terms of transitions on an abstract machine. The firing of actions is determined by an actor’s FSM, a labeled transition system between control states. An actor is a sequential process, communicating values by transitioning between states in the actor’s FSM. Executions of statements in the body of an action can compute output token values and can update the store. The state machine receives tokens and reacts to them, possibly entering another state, and possibly producing tokens. The dataflow transition:
$$\sigma\langle action_1 \leftarrow e\rangle \xrightarrow{\; s \,\mapsto\, s' \;} \sigma'\langle action_2 \leftarrow e'\rangle \qquad \text{(fire)}$$
describes a transition from FSM control state σ to σ′ by firing $action_1$, and at σ′ the enabled action is $action_2$. The store e with the input stream s is the input state, and the firing of $action_1$ produces a new store e′ and outputs the s′ stream. In the following example:
The store e is {accum = 0} before the 1st firing. The action is sumStream, the input stream s is [x] consumed from input port in1. The new store e′ is {accum = 0+x} after the firing.
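The store update described for sumStream can be mimicked with a small function that performs one (fire) step. This is a hedged sketch rather than the CAL listing: the function name `fire_sum_stream` is hypothetical, and it is assumed that the action has no output pattern, so the output stream s′ is empty.

```python
from typing import Dict, List, Tuple

Store = Dict[str, int]


def fire_sum_stream(store: Store, in1: List[int]) -> Tuple[Store, List[int], List[int]]:
    """One (fire) step: consume [x] from in1, return (e', remaining input, output)."""
    x, rest = in1[0], in1[1:]
    new_store = dict(store, accum=store["accum"] + x)
    return new_store, rest, []   # assumed: sumStream produces no tokens
```

Starting from the store {accum = 0} and an input stream beginning with x, the returned store is {accum = 0 + x}.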
Fig. 5. Scheduling Algorithm to Execute Dynamic Actors.
Fig. 6. Automaton for Fiacre Process P.
1 From [35].
The following (fire-guard) rule supports dynamic dataflow. It adds a predicate function that guards the firing of the transition. This function guard E must evaluate to true for the action to fire. The E predicate can use token values from s and store values from e.
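$$\frac{e \vdash E \qquad guard\ E \rightsquigarrow true}{\sigma\langle action_1 \leftarrow e\rangle \xrightarrow[guard\ E]{\; s \,\mapsto\, s' \;} \sigma'\langle action_2 \leftarrow e'\rangle} \qquad \text{(fire-guard)}$$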
An example is: 
This sumTen action consumes token x, adding it each time to accum in the store with 10 successive firings. The guard E predicate checks that count is less than 10. Another action (omitted) might produce the value of accum on the 11th firing, resetting the count and accumulation variables.
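The guarded behaviour described for sumTen can be sketched as a pair of functions, one for the guard and one for the firing. This is an illustration under assumptions, not the CAL source: it is assumed that sumTen increments count on every firing and that the omitted reset action is absent, so the actor simply stops firing once the guard fails.

```python
from typing import Dict, List

Store = Dict[str, int]


def sum_ten_enabled(store: Store, in1: List[int]) -> bool:
    """Enabled when a token is available and the guard count < 10 holds."""
    return len(in1) >= 1 and store["count"] < 10


def sum_ten_fire(store: Store, in1: List[int]) -> Store:
    """Consume one token x, add it to accum and (assumed) increment count."""
    x = in1.pop(0)
    return {"accum": store["accum"] + x, "count": store["count"] + 1}


def run_sum_ten(tokens: List[int]) -> Store:
    """Fire sumTen for as long as it stays enabled, then return the store."""
    store = {"accum": 0, "count": 0}
    in1 = list(tokens)
    while sum_ten_enabled(store, in1):
        store = sum_ten_fire(store, in1)
    return store
```

Feeding more than ten tokens leaves the surplus in the queue, since the guard disables the action after the tenth firing.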
If there are multiple fireable actions from σ′ then we use • to indicate a runtime scheduler choice to disambiguate two or more actions that both can fire from the same control state. Hence when there are multiple enabled actions at σ′ we write:
$$\frac{e \vdash E \qquad guard\ E \rightsquigarrow true}{\sigma\langle action_1 \leftarrow e\rangle \xrightarrow[guard\ E]{\; s \,\mapsto\, s' \;} \sigma'\langle \bullet \leftarrow e'\rangle} \qquad \text{(fire-guard)}$$
An example of two (fire-guard) actions in an actor is: 
The positivesRepeat action produces twice any positive integer consumed, whilst negativesIdentity simply propagates any negative or 0 values consumed. Here, • means that after firing either of these, the scheduler will not know which action to fire next until the value of x becomes known after consuming the token on the next firing. This becomes problematic for verifying cyclo-static data rates when two or more ambiguous actions have different production or consumption rates.
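The effect of value-dependent action selection on data rates can be made concrete with a small sketch. It rests on an interpretation that should be treated as an assumption: positivesRepeat is read here as emitting the consumed token twice (production rate 2), and negativesIdentity as emitting it once (production rate 1). The function names are hypothetical.

```python
from typing import List, Tuple


def fire(x: int) -> Tuple[str, List[int]]:
    """Return the action selected for token x and the tokens it produces."""
    if x > 0:
        return "positivesRepeat", [x, x]     # assumed: token emitted twice
    return "negativesIdentity", [x]          # negative or 0 passes through


def rate_trace(tokens: List[int]) -> List[Tuple[int, int]]:
    """Per-firing (consumed, produced) rates for a given input stream."""
    return [(1, len(fire(x)[1])) for x in tokens]
```

Because the production rate of each firing depends on the sign of the token, the rate sequence follows the input values rather than a fixed period, which is why such an actor cannot be shown to be cyclo-static.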
3.2. Actor abstractions
The CAL primitives we abstract for model checking are: 
1. The production and consumption rates of each action . We
need the model checker to verify if there exists a cyclic data rate
sequence. 
2. Updates to variables in an actor’s store . These variables can
be used to conditionally fire actions, i.e. the (fire-guard) rule. 
3. Statements inside action bodies . Specifically, if/then/else,
variable assignment and loop statements, because they can up-
date variables in an actor’s store. 
4. Actor FSMs . They enforce action scheduling and we need to
know (1) if exactly one periodic cycle of visited states exists, and
(2) the cycle’s data rate, i.e. the sum of data rates for each action
representing a state transition in the cycle. 
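The two questions asked of an actor FSM in item 4 can be illustrated structurally. The sketch below is only a simplified stand-in for the model checking described in this paper: the FSM encoding and the name `single_cycle_rate` are assumptions, guards and store values are ignored, and any state offering more than one outgoing action is conservatively treated as not having a single fixed cycle.

```python
from typing import Dict, List, Optional, Tuple

# FSM encoded as: state -> list of (action name, next state)
Fsm = Dict[str, List[Tuple[str, str]]]


def single_cycle_rate(fsm: Fsm, rates: Dict[str, Tuple[int, int]],
                      start: str) -> Optional[Tuple[int, int]]:
    """Return the summed (consumed, produced) rate of the cycle, or None."""
    order: List[str] = []     # states in visiting order
    actions: List[str] = []   # action fired when leaving order[i]
    state = start
    while state not in order:
        outgoing = fsm.get(state, [])
        if len(outgoing) != 1:
            return None       # a choice (or dead end): no single fixed cycle
        action, nxt = outgoing[0]
        order.append(state)
        actions.append(action)
        state = nxt
    cycle = actions[order.index(state):]   # drop any lead-in before the cycle
    consumed = sum(rates[a][0] for a in cycle)
    produced = sum(rates[a][1] for a in cycle)
    return consumed, produced
```

For a two-state FSM that alternates a read action with rate (2, 0) and a write action with rate (0, 1), the cycle's data rate is (2, 1).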
3.3. Fiacre
We abstract CAL actor code to Fiacre [33] to obtain a model for checking cyclostatic properties. Fiacre is a formal intermediate model to represent both the behavioural and timing aspects of embedded and distributed systems for formal verification. It is a target language of program modelling tools for use with verification engines. From Fiacre specifications, the Fiacre compiler produces enriched Petri nets, in which transitions can test and modify a set of data variables. Fiacre therefore provides a high level, compositional syntax for Time Petri Nets. TINA [34] is a model checking toolset to analyse these nets.
3.3.1. Fiacre language
The primary Fiacre primitive is a process. Fiacre processes can contain deterministic statements including assignment and loops (Section 3.4). As with CAL actors, a Fiacre process describes sequential behaviour defined by a set of control states and a set of process transitions. Processes can contain non-deterministic choice, which we map FSM scheduling to in Section 3.5. CAL actor ports are mapped to Fiacre process ports, and dataflow connections are mapped to Fiacre channels.
The following Product Fiacre example (see footnote 1) reads pairs of integers from port b then sends their product out from port a.
The evaluation sequence in a Fiacre process is guided by a labelled transition system. The Fiacre process P below has four control states with labelled transitions between them. Ports denote here pure synchronization labels (without communications). The automaton for this process is in Fig. 6.
In the abstraction for model checking, actor FSMs map naturally to Fiacre process transitions. The transition labels are the actions to execute to get to new FSM control states. In our approach, action code bodies are abstracted (Section 3.4) and then inlined into Fiacre transitions between control states.
3.3.2. Fiacre evaluation rules
The Fiacre semantics presented here are taken from [36]. Labelled relations express the semantics of Fiacre statements.
The evaluation rules that follow have the shape:
$$\frac{P_1 \;\ldots\; P_n}{e \vdash E \rightsquigarrow v}$$
which states that under conditions $P_1$ to $P_n$, the value of expression E with store e is v, where ⇝ is expression evaluation.
The semantics of statements is expressed operationally by a labelled relation. The big step relation holds triples $(S, e) \overset{l}{\Rightarrow} (S', e')$ in which S is a statement, e and e′ are stores, $S' \in \{\mathbf{done}\} \cup \{\mathbf{self}\} \cup \{\mathbf{target}\ s \mid s \in \Lambda\}$, where Λ is the declared set of states of the process. The ⇒ transition updates the store from e to e′, moving execution to statement S′, and the label is either a communication action l or silent action ϵ.
3.4. Abstracting dataflow actions to small step Fiacre evaluation rules
This section shows the abstraction of the pertinent CAL code primitives given in Section 3.2, i.e. procedures, data rates, and actor FSMs, to small step Fiacre rules. Each abstraction rule from actors to processes is written ⟦ ⟧ = …, which translates CAL to Fiacre evaluation rules.
3.4.1. Action procedures
Action bodies may include statements that modify the actor’s store. The translation maps these statements, i.e. for loops and if/then/else statements, to deterministic Fiacre process statements.
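Conditional if/then/else CAL statements are translated to the following evaluation rules:
$$[\![\, \mathbf{if}\ E\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{end} \,]\!] =$$
$$\frac{e \vdash E \rightsquigarrow true \qquad (S_1, e) \overset{l}{\Rightarrow} (S, e')}{(\mathbf{if}\ E\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{end},\ e) \overset{l}{\Rightarrow} (S, e')} \quad \text{(if-then-else)}$$
$$\frac{e \vdash E \rightsquigarrow false \qquad (S_2, e) \overset{l}{\Rightarrow} (S, e')}{(\mathbf{if}\ E\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{end},\ e) \overset{l}{\Rightarrow} (S, e')} \quad \text{(if-then-else)}$$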
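Conditional while loops are translated as:
$$[\![\, \mathbf{while}\ E\ \mathbf{do}\ S\ \mathbf{end} \,]\!] =$$
$$\frac{e \vdash E \rightsquigarrow true \qquad (S;\ \mathbf{while}\ E\ \mathbf{do}\ S\ \mathbf{end},\ e) \overset{l}{\Rightarrow} (S', e')}{(\mathbf{while}\ E\ \mathbf{do}\ S\ \mathbf{end},\ e) \overset{l}{\Rightarrow} (S', e')} \quad \text{(while)}$$
$$\frac{e \vdash E \rightsquigarrow false}{(\mathbf{while}\ E\ \mathbf{do}\ S\ \mathbf{end},\ e) \overset{\epsilon}{\Rightarrow} (\mathbf{done}, e)} \quad \text{(while)}$$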
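CAL loops with a fixed iteration count are translated as:
$$[\![\, \mathbf{foreach}\ v_1 \,.\, v_2\ \mathbf{do}\ S\ \mathbf{end} \,]\!] =$$
$$\frac{(V := v_1;\ S\ \ldots\ V := v_2;\ S,\ e) \overset{l}{\Rightarrow} (S', e')}{(\mathbf{foreach}\ V\ \mathbf{do}\ S\ \mathbf{end},\ e) \overset{l}{\Rightarrow} (S', e')} \quad \text{(for each)}$$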
For assignment, Fiacre offers a number of forms but we only need the simplest here. $e[X \mapsto u]$ is store e extended with the pair (X, u).
$$[\![\, v := E \,]\!] = \frac{e \vdash E \rightsquigarrow u}{(X := E,\ e) \overset{\epsilon}{\Rightarrow} (\mathbf{done},\ e[X \mapsto u])} \quad \text{(assignment)}$$
CAL action bodies can contain multiple procedures, i.e. a sequence of assignments, if/then/else blocks and loops. Fiacre statements can be compositions of other statements, and this is used to compile multiple CAL statements ($S_1; S_2$), e.g. an if statement following a while loop, to the following Fiacre (composition) evaluation rules. In the second rule, the static semantics of Fiacre ensures that at most one of labels $l_1$ and $l_2$ is not empty, so $l_1.l_2$ reduces to either $l_1$ or $l_2$.
$$[\![\, S_1; S_2 \,]\!] =$$
$$\frac{(S_1, e) \overset{l}{\Rightarrow} (\mathbf{target}\ s,\ e')}{(S_1; S_2,\ e) \overset{l}{\Rightarrow} (\mathbf{target}\ s,\ e')} \quad \text{(composition)}$$
$$\frac{(S_1, e) \overset{l_1}{\Rightarrow} (\mathbf{done}, e') \qquad (S_2, e') \overset{l_2}{\Rightarrow} (S', e'')}{(S_1; S_2,\ e) \overset{l_1.l_2}{\Rightarrow} (S', e'')} \quad \text{(composition)}$$
3.4.2. Actions with guards
Actions with guards can only fire if the guard predicate function evaluates to true in the current context. Fiacre has an on statement, which blocks if its expression evaluates to false, preventing execution of subsequent statements. The mapping models the actor guard predicate guard E with on, where s is the input queue and s′ is the output queue for $action_1$ when executed. The execution of a sequential composition of Fiacre statements abstracted from action bodies depends on the boolean value of the on condition.
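With an environment context of an actor’s store and input tokens s, the abstraction of action guards is:
$$[\![\, guard\ E \,]\!] = \frac{e \vdash E \rightsquigarrow true \qquad (S, e) \overset{l}{\Rightarrow} (S', e')}{(\mathbf{on}\ E\ S,\ e) \overset{l}{\Rightarrow} (S', e')} \quad \text{(on)}$$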
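The statement rules above can be read as a recursive evaluator over stores. The following sketch is a simplified interpretation and not the Fiacre semantics itself: it assumes every statement terminates in done with a silent label, omits communication labels, target states and foreach, and uses a hypothetical tuple encoding of statements together with a `Blocked` exception for a false on condition.

```python
from typing import Any, Callable, Dict, Tuple

Store = Dict[str, Any]
Expr = Callable[[Store], Any]   # an expression is evaluated against a store


class Blocked(Exception):
    """Raised when an `on` condition is false, so the statement cannot proceed."""


def run(stmt: Tuple, store: Store) -> Store:
    """Evaluate a statement to completion and return the updated store e'."""
    kind = stmt[0]
    if kind == "assign":          # ("assign", X, E): extend e with X -> value
        _, name, expr = stmt
        return {**store, name: expr(store)}
    if kind == "if":              # ("if", E, S1, S2)
        _, cond, then_s, else_s = stmt
        return run(then_s if cond(store) else else_s, store)
    if kind == "while":           # ("while", E, S)
        _, cond, body = stmt
        while cond(store):
            store = run(body, store)
        return store
    if kind == "seq":             # ("seq", S1, S2): composition S1; S2
        _, first, second = stmt
        return run(second, run(first, store))
    if kind == "on":              # ("on", E, S): block unless E holds
        _, cond, body = stmt
        if not cond(store):
            raise Blocked()
        return run(body, store)
    raise ValueError(f"unknown statement: {kind}")
```

The store is never mutated in place; each rule returns a new store, matching the $e \Rightarrow e'$ reading of the rules.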
3.4.3. Actor communication
Actors share the results of their computations by explicitly passing tokens to each other via connections between their ports. The parallelising transformation of actors in Section 5 needs to know its cyclo-static data rate, so we capture token communication in the abstraction. That is, during execution of Fiacre statements we keep a count of the incoming and outgoing tokens into and out of the process.
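The syntax for token communication in Fiacre is ! and ?, corresponding to the (send) and (receive) Fiacre rules respectively:
$$\frac{e \vdash E_1 \rightsquigarrow v_1 \quad \ldots \quad e \vdash E_n \rightsquigarrow v_n}{(p_\tau\,!\,E_1, \ldots, E_n,\ e) \overset{p_\tau\, v_1, \ldots v_n}{\Longrightarrow} (\mathbf{done}, e)} \quad \text{(send)}$$
$$\frac{e' = e[X_1 \mapsto v_1, \ldots, X_n \mapsto v_n] \qquad [\,e' \vdash E \rightsquigarrow true\,]}{(p_\tau\,?\,X_1 \ldots X_n\ [\mathbf{where}\ E],\ e) \overset{p_\tau\, v_1, \ldots v_n}{\Longrightarrow} (\mathbf{done}, e')} \quad \text{(receive)}$$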
Communications are sequences $p_\tau\, v_1 \ldots v_n$ where $p_\tau$ is a port identified by a label τ and the $v_1 \ldots v_n$ are values.
Fiacre has a single-communication constraint whereby only one (send) or (receive) statement can occur in a (process-action) execution. In Fiacre, values are consumed from ports with ?, e.g. in1?x. Whilst it is possible to consume multiple values from a Fiacre port, e.g. in1?x,y,z, this is constrained by the data type of the port, e.g. where the bytepair channel is a byte # byte tuple in the earlier example in Section 3.3.1, where exactly two values must be consumed every time, e.g. with in1?x,y. The same Fiacre constraint applies for producing values with !, e.g. out1!x,y. The single communication rule is enforced by Fiacre’s type checker to ensure that only one label is ever carried by a ⇒ transition update. In CAL, values are consumed/produced from ports by pattern matching, e.g. in1:[x,y,z] and out1:[3,x+2,0].
The communication semantics for dynamic dataflow differ from Fiacre’s single-communication semantics in two ways:
1. An action can have multiple communication events, because an
action can have both an input pattern and an output pattern and
these patterns can match on multiple input/output ports. 
2. Actions within the same actor can pattern match on the same
ports but consume/produce a different number of tokens, e.g. : 
Therefore to support dynamic dataflow with Fiacre's single-communication
constraint, we give the Fiacre process ports a channel
type. For each token in an action's input pattern, Fiacre's front function
on the queue reads from this internal channel, the value is then
removed with dequeue and the counter monitoring how many tokens
are consumed is incremented. Likewise for an action's output patterns,
a Fiacre queue receives all output tokens, and the counter monitoring
how many tokens are produced is incremented by the size of that queue.
R. Stewart, B. Berthomieu and P. Garcia et al. Journal of Systems Architecture 101 (2019) 101657 
Fig. 7. Abstracting Actor Scheduling. 
 
Actions that have multiple communication events, i.e. multiple tokens
in input/output patterns or consuming/producing from/to multiple
ports, are separated into multiple intermediate single-communication
events when abstracted to Fiacre.
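The token counting described above can be illustrated with a small Python sketch. It is not the Fiacre abstraction itself: the class `PortCounter` and its method names are hypothetical, and a Python deque stands in for the Fiacre queue behind a port. It only shows how a multi-token pattern is split into single reads, each of which bumps a counter.

```python
from collections import deque


class PortCounter:
    """Counts tokens moved through a queue-typed port (hypothetical helper)."""

    def __init__(self, tokens=()):
        self.queue = deque(tokens)  # stands in for the Fiacre queue
        self.consumed = 0           # tokens consumed so far
        self.produced = 0           # tokens produced so far

    def consume(self, n):
        """Match an input pattern of n tokens as n single reads."""
        values = []
        for _ in range(n):
            values.append(self.queue[0])  # read the front of the queue
            self.queue.popleft()          # then dequeue that value
            self.consumed += 1
        return values

    def produce(self, out_queue):
        """Emit an output pattern; the counter grows by the queue size."""
        self.produced += len(out_queue)
        return list(out_queue)


port = PortCounter([5, 7, 9])
x, y, z = port.consume(3)            # mirrors the CAL pattern in1:[x,y,z]
sent = port.produce([3, x + 2, 0])   # mirrors the CAL pattern out1:[3,x+2,0]
```

After this run both counters hold 3, which is the kind of per-firing rate information the later LTL properties rely on.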
3.4.4. Actor schedules to process selections
The selection of actions in dataflow actors is determined by the actor's
FSM between control states. Given an actor's FSM that specifies
multiple next states from the current state σ, the abstraction maps to
Fiacre's select statement as:
Where [] separates the non-deterministic choices. The following actor
FSM example has one control state s0:
These FSMs are abstracted to Fiacre selection statements. Consider
the following contrived example for actn1 and actn2, where both
consume a token but produce no tokens:
Because the actions actn1 and actn2 have one communication
event, and a guard (abstracted to an on statement after consuming x),
the FSM will be abstracted to the following Fiacre selection statement:
The abstraction of actor FSMs uses three Fiacre rules (from), (to) and
(select), shown in Fig. 7. This requires FSM transitions to be grouped
by the control states, for all control states Σ in the FSM, that transitions
start from. The (from) rule is used for each source state, the action
used to transition between states is then abstracted and inlined into each
corresponding transition, and finally the (to) rule is appended to these
statements to arrive at the destination state, shown with set comprehension
as follows:

$$\llbracket \mathit{FSM} \rrbracket = \{\, (\mathbf{from}\ \sigma\ (\mathbf{select}\ (\llbracket \mathit{action} \rrbracket ;\ (\mathbf{to}\ \sigma'))^{+}\ \mathbf{end})) \;\mid\; \sigma, \sigma' \in \Sigma \,\wedge\, \{(\mathit{action}, \sigma') \mid (\sigma, \mathit{action}, \sigma') \in \mathit{FSM}\} \,\}$$
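As a rough illustration of the grouping step in this set comprehension, the following Python sketch collects FSM transitions by their source state. The triple encoding `(source, action, target)` and the function name `group_by_source` are assumptions made for this example, not part of the Orcc or Fiacre tooling.

```python
from collections import defaultdict


def group_by_source(fsm):
    """Group (source, action, target) triples by their source state.

    Each key corresponds to one `from` block; each (action, target) pair
    corresponds to one branch of the `select` ending in a `to`.
    """
    grouped = defaultdict(list)
    for source, action, target in fsm:
        grouped[source].append((action, target))
    return dict(grouped)


# The contrived example from the text: one control state, two actions.
fsm = [("s0", "actn1", "s0"), ("s0", "actn2", "s0")]
selections = group_by_source(fsm)
```

Here `selections` maps `s0` to both actions, so a single `from s0` block would hold a two-way selection.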
This restructures the FSM to group transitions by their source state,
to map actor FSMs to the syntactic structure of Fiacre's select statement.
3.5. Abstracting action firing to big step Fiacre evaluation rules
The previous section abstracted data rates, procedures and actor
FSMs to small step Fiacre rules. We now require a big step Fiacre rule
to abstract the atomic firing of a dataflow actor's action.
The following big-step (process-action) rule abstracts the sequential
composition of multiple small-step executions. It describes statement
executions between (from) and (to) rules, i.e. originating from an initial
control state and arriving at a target state, where → is a process action
getting from control state s to s′ and $\xRightarrow{\,l\,}$ is a big-step
statement execution with the (process-action) rule:

$$\frac{(\mathbf{from}\ s\ S,\; e) \xRightarrow{\;l\;} (\mathit{target}\ s',\; e')}{(s, e) \rightarrow (s', e')} \quad \text{(process-action)}$$

A transition from the from control state to the target control state in the
(process-action) rule happens in totality or not at all. If a transition
includes an on statement then tokens will only be consumed/produced if the
on predicate evaluates to true. This peek semantics is equivalent to CAL's
semantics for guard.
This big step rule corresponds to firing a dataflow actor's action, i.e.
the (fire) or (fire-guarded) dataflow execution rules from Section 3.1,
which capture the actor CAL code we intend to model check. One big
step (process-action) evaluation corresponds to multiple small step
evaluations of action selection (select), production and consumption rates
(send) and (receive), and procedures inside an action body i.e.
(assignment), (if-then-else) and (foreach).
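Consider an action 1 with an action body with a sequence of statements
S1; S2, and the body of an action 2 with statement S3. Abstracting
this (fire) to Fiacre is as follows:

$$\sigma\langle (S_1; S_2) \leftarrow e \rangle \xRightarrow{\;s \mapsto s'\;} \sigma'\langle (S_3) \leftarrow e_2 \rangle \quad \text{(fire)}$$

$$=\quad \frac{\dfrac{\llbracket S_1 \rrbracket = (S_1, e) \xRightarrow{l_1} (S_2, e_1) \qquad \llbracket S_2 \rrbracket = (S_2, e_1) \xRightarrow{l_2} (S_3, e_2)}{(\mathbf{from}\ \sigma\ (S_1; S_2),\; e) \xRightarrow{\;l\;} (\mathit{target}\ \sigma',\; e_2)}}{(\sigma, e) \rightarrow (\sigma', e_2)} \quad \text{(process-action)}$$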
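Consider the same action, but this time with a guard predicating its
firing. Abstracting this (fire-guard) to Fiacre is as follows:

$$\frac{e \vdash E \qquad \mathit{guard}(s, E) \rightsquigarrow \mathit{true}}{\sigma\langle (S_1; S_2) \leftarrow e \rangle \xRightarrow[\;\mathit{guard}(s, E)\;]{\;s \mapsto s'\;} \sigma'\langle (S_3) \leftarrow e_2 \rangle} \quad \text{(fire-guard)}$$

$$=\quad \frac{\dfrac{\llbracket S_1 \rrbracket = (S_1, e) \xRightarrow{l_1} (S_2, e_1) \qquad \llbracket S_2 \rrbracket = (S_2, e_1) \xRightarrow{l_2} (S_3, e_2)}{(\mathbf{from}\ \sigma\ ((\mathbf{on}\ E\ S_1); S_2),\; e) \xRightarrow{\;l\;} (\mathit{target}\ \sigma',\; e_2)}}{(\sigma, e) \rightarrow (\sigma', e_2)} \quad \text{(process-action)}$$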
3.6. Tracking data rates
We introduce a P_static boolean and two numbers P and
P_prev that count the number of produced tokens in the current and
previous cycle respectively, to check for cyclostatic data rates. Variables
C_static, C and C_prev are injected to track consumption rates.
To demonstrate this with an example, the CAL actor implementation
on the left of Fig. 8 is a predicate filter that outputs only positive integer
input tokens and discards negative integers. The actor scheduler is:

$$\begin{aligned}
s_0\langle \mathit{pos} \leftarrow \{\} \rangle &\xrightarrow[\;x \geq 0\;]{\;[x] \mapsto [x]\;} s_0\langle \bullet \leftarrow \{\} \rangle \\
s_0\langle \mathit{neg} \leftarrow \{\} \rangle &\xrightarrow[\;x \geq 0\;]{\;[x] \mapsto [x]\;} s_0\langle \bullet \leftarrow \{\} \rangle
\end{aligned}$$
Fig. 8. Translating a Predicate Filter to Fiacre. 
 
The Fiacre translation is on the right of Fig. 8. In the first complete
cycle of a cyclo-static actor, i.e. (s0, e) → (s0, e), P_static and
C_static will be false since this cycle computes the actor's production
rate. In all subsequent cycles, these are true if the actor has a fixed
data rate.
To determine data rates for cyclostatic actors, the abstraction to Fiacre
intercepts all actor transitions to the user defined initial state, e.g.
s0, introducing an additional Fiacre state sInit that compares previously
recorded data rates for s0 → … → s0. The sInit → s0 Fiacre transition also
consumes tokens from the FIFO for this transition sequence.
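The bookkeeping described in this section can be sketched in Python as follows. This is an illustrative model only: the class `RateTracker` and the helper `run` are hypothetical, and the per-cycle token counts passed to `run` are made-up inputs rather than measurements from the paper.

```python
class RateTracker:
    """Tracks one data rate (production or consumption) across cycles."""

    def __init__(self):
        self.count = 0       # tokens in the current cycle (P or C)
        self.prev = None     # tokens in the previous cycle (P_prev or C_prev)
        self.static = False  # P_static or C_static; false in the first cycle

    def token(self, n=1):
        self.count += n

    def end_of_cycle(self):
        """Called when the actor returns to its initial state (via sInit)."""
        if self.prev is not None:
            self.static = self.count == self.prev
        self.prev = self.count
        self.count = 0


def run(tokens_per_cycle):
    """Return the value of the static flag after each completed cycle."""
    tracker = RateTracker()
    flags = []
    for produced in tokens_per_cycle:
        tracker.token(produced)
        tracker.end_of_cycle()
        flags.append(tracker.static)
    return flags
```

For example, `run([2, 2, 2])` gives `[False, True, True]`, whereas a filter-like rate such as `run([1, 0, 1])` never reaches `True`.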
4. Cyclo-Static actors as LTL formula
Once the mapping abstracts Fiacre process descriptions from actors,
the Fiacre compiler translates them to enriched Petri nets ([37]) for the
purpose of LTL verification by TINA to search for counterexamples of
cyclostatic properties. LTL (Linear Temporal Logic) is a temporal logic in
which a formula can encode time (logical) and future state observations
along runs, where each state in time has a single successor.
4.1. LTL Model checking approach
The LTL formula we use has logical operators negation (not),
conjunction (and), disjunction (or) and implication (⇒), and temporal
operators eventually (◊), always (□) and until (until). In addition, it
has atomic properties asserting that a particular process instance in
the Fiacre description is in some particular Fiacre state and that one of
its variables has a particular value. In the following LTL formulas these
atomic properties are written State(s) and Value(x = v), respectively.
Our LTL model checking approach is:
1. Translate the actor to a Fiacre process ( Section 3 ). 
2. Abstract an actor’s initial FSM state and store variable values into
three LTL formulas ( Sections 4.2.1, 4.2.2 and 4.2.3 ). 
3. Model check the Fiacre process for counterexamples of the LTL
properties that model cyclostatic actors. 
• If a counterexample cannot be found, the actor is classified as
a multirate static or cyclostatic actor and can be parallelised
using the algorithm in Section 5.2 . 
• Otherwise the actor is classified as dynamic and is not suitable
for parallelisation. 
4.2. The cyclo-static assertions
Cyclo-static actors always have an infinitely repeating periodic cycle
of FSM transitions between control states. We capture cyclostatic
dataflow semantics in the following LTL properties:
1. The variables in an actor’s store are bound to their initial value
after each periodic cycle. Property cyclic_store , Section 4.2.1 . 
2. There exists a periodic cycle between FSM control states. Property
periodic_sequence , Section 4.2.2 . 
3. The data rates of an actor are static. Property static_rate ,
Section 4.2.3 . 
The model checker searches for violations of these LTL properties to
identify dynamic actors. We now give more detail of these three LTL
properties.
Fig. 9. Orcc IR for Dataflow Transformation (from [38] with permission). 
4.2.1. Periodic process configuration
This LTL property is for checking that all values in an actor's store
are reset on completion of the periodic cycle. The LTL formula is generated
from actor code, and ensures that all store variables are at their
initial value when the initial FSM state is returned to. In the following
example the actor's store is {x = 0}, which can be extracted directly
from the programmer's code e.g. int x := 0;, and the actor's initial
FSM state is s0, which the programmer specifies in the FSM declaration
(Section 2.3):
4.2.2. Periodic state transition sequence
The LTL property to verify the existence of a periodic cycle in an
actor's state transition combines observations of control states with the
until temporal property. Consider an actor with states s0, s1, s2 and s3,
and a periodic cycle s0 → s1 → s3 → s2 → s0. The property for proving
that this is the sole possible sequence, and that it infinitely occurs
is:
where, for any two Fiacre states i and j, To(i,j) stands for the
formula:
State(i) and (State(i) until State(j)) 
At some position along a run, To(si,sj) holds if the Fiacre state
of the actor at that position is si, the actor is in Fiacre state sj at
some later position in the run, and it remains in state si until then.
Property periodic_sequence asserts that, in each run, the initial
state of the actor is s0 and that at each position in the run exactly
one of the To(si,sj) propositions in the disjunction holds. The absence
of counterexamples proves an infinitely periodic FSM cycle for
this actor. In practice, extracting this LTL formula for automated
verification will require a CAL language extension for expressing explicit
cyclic sequences that the programmer intends the model checker to
verify.
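To make the reading of To(i,j) concrete, here is a small Python sketch that evaluates it on a finite prefix of a run. This is only a finite-trace approximation written for illustration: LTL is defined over infinite runs, TINA does not work this way, and the function names `to` and `holds_periodic` are hypothetical. Positions in the final run of identical states are skipped because a finite prefix cannot show their successor.

```python
def to(trace, pos, i, j):
    """State(i) and (State(i) until State(j)), evaluated at position pos."""
    if trace[pos] != i:
        return False
    for state in trace[pos:]:
        if state == j:
            return True   # reached j, having stayed in i until then
        if state != i:
            return False  # left i for some other state first
    return False          # j is never reached within this finite prefix


def holds_periodic(trace, cycle):
    """Check that exactly one To(si,sj) holds at each judged position."""
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    end = len(trace) - 1
    while end > 0 and trace[end - 1] == trace[-1]:
        end -= 1  # skip the trailing run, which cannot be judged
    return trace[0] == cycle[0] and all(
        sum(to(trace, pos, i, j) for i, j in edges) == 1
        for pos in range(end)
    )


cycle = ["s0", "s1", "s3", "s2"]
good = ["s0", "s0", "s1", "s3", "s2", "s0"]
bad = ["s0", "s1", "s2", "s0"]
```

With these inputs `holds_periodic(good, cycle)` is true, while `bad` fails at the position where s1 is followed by s2 instead of s3.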
4.2.3. Cyclo-Static periodic data rate
Finally, we check for the same static data rate for every loop through
the verified periodic cycle with:
The LTL static_production_rate property reads: at each state along each
run, either P_static is true or it eventually becomes continuously
true. A static consumption rate is checked with the static_consumption_rate
property, which tests for the same temporal observations for cyclic
consumption rates.
Whilst a simple mechanism, the P_static and C_static
booleans enable the model checker to search for a single action firing
sequence counterexample whereby the P and C data rate counters deviate
from a fixed value after an initial periodic cycle. Should a counterexample
not exist, i.e. an actor is cyclo-static, these two values are used
by the parallelisation algorithm in Section 5.2.
4.3. Model checking an actor
Processes cannot be model checked against the LTL rules above
without communicating with another process that feeds data to it and
consumes its results. Therefore the following testbench is composed in
parallel with the actor being model checked:
Their parallel composition is expressed in a top level Fiacre
component:
5. Dataflow transformations
Our dataflow transformation system supports the parallelisation of
static and cyclostatic actors. It is implemented using the Java based Orcc
Intermediate Representation (IR) API, shown in Fig. 9, by instantiating
the Actor and Connection classes to implement the parallelisation
algorithm in Section 5.2.
5.1. Programming environment
We have extended the open source Orcc dataflow programming
environment ([16]) with a refactoring tool, shown in Fig. 10, for
parallelising static and cyclo-static actors which introduces fork and join
actors to scatter/gather data. The user chooses the parallelism degree, i.e.
how many actors to take the place of the previous single actor. Orcc
has multiple backends for different target architectures. We have added
Fig. 10. Interactive Dataflow Transformation. 
two additional backends to support the tool: 1) CAL, to generate new
actor code for the fork and join actors and 2) Fiacre, for model checking
actors. Fig. 11 shows the graphical consequence of applying the
transformations.
5.2. Parallelisation algorithm
For an actor A consuming from edge u and producing to edge v, the
transformation algorithm is:
1. Extract using model checking the consumption and production
data rates $C_A^u(n)$ and $P_A^v(n)$ for a complete cyclostatic cycle, where
n is the number of firings in a periodic cycle. This is done by
parsing the P and C values from the output from the Tina model
checker.
2. Create N parallel instances A_1 to A_N with the Orcc IR API, where
N is the user-selected parallelism factor (Fig. 10).
3. Create a fork actor with Orcc IR that distributes data from edge
u across A_1, …, A_N in $C_A^u(n)$ chunks.
4. Create a join actor with Orcc IR to consume $P_A^v(n)$ chunks from
actors A_1 to A_N.
5. Output the joined chunks as a sequential stream to edge v.
After this transformation the graph remains partially sequential due
to the linear stream of data that the fork actor consumes then distributes.
Fig. 12 shows the sequential nature of stream propagation as it broadcasts
tokens to each parallel instance in sequence on an FPGA. In this
case each parallel actor is receiving streams of 10 elements, 1 token
per cycle. Here, the high out1_SEND signal instructs the first parallel
actor that the data signal is valid for the first 10 cycles and the low
signals out2_SEND, out3_SEND and out4_SEND instruct the other three
actors to not read the data signal during these cycles. Stream gathering
by the join actor is sequential also. This data propagation latency limits
the speedup since some actors may be idle waiting for their next input
stream, so there is a balance between the task size, potential parallel
speedups and the latency overheads of parallelism.
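The fork/join structure of the algorithm can be sketched in plain Python. This is a software model under stated assumptions, not the Orcc IR implementation: chunks are assumed to be handed out round-robin, the actor is modelled as a function from one input chunk to one output chunk, and the names `fork`, `join` and `parallelise` are hypothetical.

```python
def fork(stream, n_actors, c):
    """Scatter consecutive chunks of c tokens round-robin to n_actors."""
    inputs = [[] for _ in range(n_actors)]
    for k in range(0, len(stream), c):
        inputs[(k // c) % n_actors].append(stream[k:k + c])
    return inputs


def join(outputs):
    """Gather one chunk from each actor in turn into a sequential stream."""
    merged = []
    rounds = max(len(chunks) for chunks in outputs)
    for r in range(rounds):
        for chunks in outputs:
            if r < len(chunks):
                merged.extend(chunks[r])
    return merged


def parallelise(stream, actor, n_actors, c):
    """Apply actor to each chunk via fork and join (run sequentially here)."""
    inputs = fork(stream, n_actors, c)
    outputs = [[actor(chunk) for chunk in chunks] for chunks in inputs]
    return join(outputs)


rows = [3, 1, 2, 9, 8, 7, 5, 4, 6, 0, 2, 1]  # four rows of width 3
result = parallelise(rows, sorted, 2, 3)
```

Because chunks are gathered in the same order they were scattered, the output stream matches what a single actor would produce, whatever the value of `n_actors`.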
6. Evaluation
This section evaluates the model checking based classification of actor
models using three actors that implement (1) sparse matrix compression,
(2) matrix row sorting and (3) dynamic time warp. One of these
actors is identified as not being cyclostatic, the other two are, and hence
are parallelisable. They are parallelised using the interactive transformation
tool, replacing these sequential actors with up to 16 parallel actors.
The speedup results are given in Section 6.3.
6.1. Counterexample
An actor can implement a sparse matrix compression algorithm
incrementally as a matrix streams through an actor. It consumes the matrix
in row major order, and outputs a (row,column,value) tuple for non-zero
values. The transition rules of the actor are:

$$\begin{aligned}
s_0\langle \mathit{compress} \leftarrow \{w, h\} \rangle &\xrightarrow[\;v \neq 0\;]{\;\langle v \rangle \mapsto \langle v, w, h \rangle\;} s_0\langle \bullet \leftarrow \{w', h'\} \rangle \\
s_0\langle \mathit{ignore} \leftarrow \{w, h\} \rangle &\xrightarrow[\;v = 0\;]{\;\langle v \rangle \mapsto \langle \rangle\;} s_0\langle \bullet \leftarrow \{w', h'\} \rangle
\end{aligned}$$

The actor has one control state s0 and two actions compress and ignore
that both transition from s0 back to s0. Each matrix value in the stream
disambiguates action scheduling: compress fires on non-zero values, and
ignore fires on zero values. The compress action increments width and
height counters, as w′ and h′ in the store after the transition.
The actor is 25 lines of CAL code. The Fiacre description is 79 lines of
code. The TINA model checker generates a synchronised Büchi automaton
and a transition system from the LTL formula in Section 4, then
looks for paths through the transition system for counterexamples. The
model checker finds a counterexample for the static_data_rate property.
State 60 in the Büchi automaton is:
The model checker finds a transitional path
60 → 76 → 88 → 97 → 109 → 115 → 127 → 47 → 59 → 76. There is a
loop from state 76 to itself. In this loop is state 97, which in the
counterexample is:
Therefore the model checker proves the static_production_rate property false,
since P_static being false at state 60 serves as the antecedent of the ⇒
implication, and the consequent:
does not hold, because state 97 is always returned to. This is due to
the nature of sparse arrays: the number of non-zero values in the matrix
determines the data rates.
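The data dependence behind this counterexample is easy to see in a plain Python model of the compression. This is only an illustration of the algorithm's behaviour, not the CAL actor or its Fiacre abstraction, and the function name `compress` is chosen here to mirror the action name.

```python
def compress(matrix):
    """Emit a (row, column, value) tuple per non-zero value, row-major."""
    return [
        (r, c, v)
        for r, row in enumerate(matrix)
        for c, v in enumerate(row)
        if v != 0
    ]


# Two inputs with the same consumption (four values each) ...
sparse = [[0, 5], [0, 0]]
dense = [[1, 2], [3, 0]]

# ... but different production, so no fixed production rate exists.
sparse_out = compress(sparse)
dense_out = compress(dense)
```

Both matrices cost four firings to consume, yet one yields a single tuple and the other three, which is the behaviour the model checker reports as a violation of the static rate property.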
6.2. Parallelising verified cyclo-static actors
Two benchmarks are now used to show the performance improvements
of parallelisation of verified cyclostatic actors with the mechanised
algorithm from Section 5.2.
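6.2.1. Matrix row sorting
The matrix row sorting actor sorts every row in a matrix in ascending
numerical order using bubble sort. The pre-sorted nature of the input
determines the clock cycle latency to perform the sorting. There is also
the latency of consuming and producing rows, one cycle per element.
The actor's FSM is in Fig. 13. With an initial store {row = [], i = 0}, the
scheduling is:

$$\begin{aligned}
s_0\langle \mathit{receive} \leftarrow \{\mathit{elems}, i\} \rangle &\xrightarrow[\;i < \mathit{width}\;]{\;\langle a \rangle \mapsto \langle \rangle\;} s_0\langle \bullet \leftarrow \{\mathit{row} + \{a\},\, i + 1\} \rangle \\
s_0\langle \mathit{sort} \leftarrow \{\mathit{elems}, i\} \rangle &\xrightarrow[\;i = \mathit{width}\;]{\;\langle \rangle \mapsto \langle \rangle\;} s_1\langle \bullet \leftarrow \{\{\mathit{row}'\},\, 0\} \rangle \\
s_1\langle \mathit{output} \leftarrow \{(e : \mathit{elems}'), i\} \rangle &\xrightarrow[\;i < \mathit{width}\;]{\;\langle \rangle \mapsto \langle e \rangle\;} s_1\langle \bullet \leftarrow \{\{\mathit{elems}'\},\, i + 1\} \rangle \\
s_1\langle \mathit{reset} \leftarrow \{\{\}, i\} \rangle &\xrightarrow[\;i = \mathit{width}\;]{\;\langle \rangle \mapsto \langle \rangle\;} s_0\langle \bullet \leftarrow \{\{\},\, 0\} \rangle
\end{aligned}$$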
Fig. 11. Parallelising an Actor with the Dataflow Transformation Tool. 
Fig. 12. Scattering the Data Stream from the fork Actor.
Fig. 13. FSM of the Matrix Row Sort Actor. 
 
Fig. 14. FSM of the Dynamic Time Warp Actor. 
The model checker is unable to find counterexamples of the LTL
properties. For a matrix of width 100, the sorting actor A has a periodic
cycle of 202 firings: 100 × receive, 100 × output, 1 × sort
and 1 × reset. The fork actor scatters $C_A^u(202) = 100$ tokens to each
parallel actor and the join actor gathers $P_A^v(202) = 100$ tokens from each
actor.
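The firing count can be reproduced by walking the four actions of the schedule in Python. This sketch is a model for counting only: it uses Python's built-in sort in place of bubble sort, and the function name `row_sort_cycle` is hypothetical.

```python
def row_sort_cycle(row):
    """Simulate one periodic cycle of the row sorting actor.

    Returns (firings, consumed, produced, output).
    """
    firings = consumed = produced = 0
    store = []
    for a in row:          # s0: receive fires while i < width
        store.append(a)
        consumed += 1
        firings += 1
    store.sort()           # s0 -> s1: sort fires once when i == width
    firings += 1
    output = []
    for e in store:        # s1: output fires while i < width
        output.append(e)
        produced += 1
        firings += 1
    firings += 1           # s1 -> s0: reset fires once when i == width
    return firings, consumed, produced, output


firings, consumed, produced, output = row_sort_cycle(list(range(100, 0, -1)))
```

For a row of width 100 this gives 202 firings with 100 tokens consumed and 100 produced, independent of the values in the row, which is why the rates are cyclo-static.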
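6.2.2. Dynamic time warp
Dynamic time warping (DTW) is an algorithm to measure similarity
of two temporal sequences. Applications of DTW include automated
speech recognition. The actor takes two sequences S and T, then outputs
the optimal match between them. Consuming each sequence element
costs one cycle, followed by multiple clock cycles to compute the
optimal match, then one cycle to output that match. The actor's FSM
is in Fig. 14. With an initial store {s = [], t = [], dwt = [][], i = 0} and a
sequence length n, the scheduling is:

$$\begin{aligned}
s_0\langle \mathit{receiveS} \leftarrow \{s, t, \mathit{dwt}, i\} \rangle &\xrightarrow[\;i < \mathit{width}\;]{\;\langle a \rangle \mapsto \langle \rangle\;} s_0\langle \bullet \leftarrow \{s + \{a\},\, t,\, \mathit{dwt},\, i + 1\} \rangle \\
s_0\langle \mathit{reset} \leftarrow \{s, t, \mathit{dwt}, i\} \rangle &\xrightarrow[\;i = \mathit{width}\;]{\;\langle \rangle \mapsto \langle \rangle\;} s_1\langle \bullet \leftarrow \{s,\, t,\, \mathit{dwt},\, 0\} \rangle \\
s_1\langle \mathit{receiveT} \leftarrow \{s, t, \mathit{dwt}, i\} \rangle &\xrightarrow[\;i < \mathit{width}\;]{\;\langle a \rangle \mapsto \langle \rangle\;} s_1\langle \bullet \leftarrow \{s,\, t + \{a\},\, \mathit{dwt},\, i + 1\} \rangle \\
s_1\langle \mathit{dwtDistance} \leftarrow \{s, t, \mathit{dwt}, i\} \rangle &\xrightarrow[\;i = \mathit{width}\;]{\;\langle \rangle \mapsto \langle \rangle\;} s_2\langle \bullet \leftarrow \{s,\, t,\, \mathit{dwt},\, 0\} \rangle \\
s_2\langle \mathit{dwtDistance} \leftarrow \{s, t, \mathit{dwt}, i\} \rangle &\xrightarrow{\;\langle \rangle \mapsto \langle \mathit{dwt}[n][n] \rangle\;} s_0\langle \bullet \leftarrow \{[],\, [],\, [][],\, 0\} \rangle
\end{aligned}$$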
Again, the model checker is unable to find counterexamples of the
LTL properties. For input sequence lengths of 40, the DTW actor A has
a periodic cycle of 83 firings: 40 × receiveS, 40 × receiveT, 1 ×
reset, 1 × dwtDistance and 1 × outputMatch. The fork actor
scatters $C_A^u(83) = 80$ tokens to each parallel actor and the join actor
gathers $P_A^v(83) = 1$ token from each actor.
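For readers unfamiliar with DTW, the following Python sketch shows the textbook dynamic programming formulation together with the firing arithmetic quoted above. It is not the paper's CAL actor: the absolute-difference cost and the function names `dtw_distance` and `dtw_firings` are assumptions made for this example.

```python
def dtw_distance(s, t):
    """Textbook DTW: the value in the last cell of the cost table."""
    n, m = len(s), len(t)
    inf = float("inf")
    dtw = [[inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])  # assumed local cost
            dtw[i][j] = cost + min(
                dtw[i - 1][j],      # insertion
                dtw[i][j - 1],      # deletion
                dtw[i - 1][j - 1],  # match
            )
    return dtw[n][m]


def dtw_firings(n):
    """Firings per cycle: n receiveS, n receiveT, reset, distance, output."""
    return 2 * n + 3
```

The table always has the same shape for a given sequence length, so the actor consumes 2n tokens and produces one per cycle regardless of the input values; `dtw_firings(40)` reproduces the 83 firings reported above.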
6.3. Performance results
For the two benchmarks, the Orcc compiler generates an FPGA circuit
description using the Xronos Verilog backend ([39]). The results in
Fig. 15 show the parallel speedup using 4, 8, 12 and 16 actors. Speedup
is T_1 / T_n, where T_1 is the clock cycles with one actor and T_n is the cycles
with n actors. The dashed line shows ideal linear speedup. The Verilog
simulation testbench randomises the input values for 128 input sets for
Fig. 15. Speedups after Parallel Actor Transformation. 
Table 1
Actor properties.
Property                              Occurrence
Guard on input token value            15% of actions
Guard on store value                  56% of actions
Shared input ports                    49% of input ports
Shared output ports                   41% of output ports
Actors with a store                   78% of actors
Stores modified by multiple actions   62% of store variables
2 https://github.com/orcc/orc-apps.
both benchmarks, i.e. when using 16 actors each actor processes 8 inputs.
Fig. 15a shows matrix row sorting speedups for different row lengths.
The number of elements in a matrix row, and the pre-sorted arrangement
of its values, determines the clock cycle latency of sorting the elements
since randomly ordered inputs require repeated sorting. The sequential
algorithm for a row length of 1000 requires 1,546,477 clock cycles.
There is a 5.6× speedup for 12 and 16 actors when processing rows
of lengths between 10 and 200. From a speedup of 4.3 with 12 actors
there is a drop in performance to a 3.3 speedup with 16 actors to sort
just 10 elements. At such small workloads, most of the 16 actors will
not be firing and instead the data propagation latency overheads result
in degraded performance.
Fig. 15b shows dynamic time warping speedups for a length of 40 for
sequences S and T. The sequential algorithm requires 1,332,979 clock
cycles. Speedup with 16 actors is almost linear (15.4).
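The speedup metric and the workload split can be written out as a short Python calculation. The helper names `speedup` and `efficiency` are introduced here for illustration, and parallel efficiency is a derived figure that the paper itself does not report.

```python
def speedup(t1, tn):
    """Speedup T_1 / T_n from clock cycle counts."""
    return t1 / tn


def efficiency(s, n):
    """Fraction of ideal linear speedup achieved with n actors."""
    return s / n


# 128 input sets shared between 16 parallel actors.
inputs_per_actor = 128 // 16

# The reported 15.4x DTW speedup on 16 actors, as a fraction of ideal.
dtw_efficiency = efficiency(15.4, 16)
```

Here `inputs_per_actor` is 8, matching the text, and `dtw_efficiency` comes to roughly 0.96 of the ideal linear speedup.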
Speedups of 5.6× and 15.4× with our approach take significantly
less time to achieve compared to using direct Verilog or VHDL optimisations.
This is because our tooling automates fork/join data management
and verifies the optimisation can be safely applied with model checking,
which if done at the Verilog/VHDL level would require extensive
testbench simulation.
The CAL and Fiacre implementations, generated hardware descriptions
and raw results are in an Open Access dataset [40].
7. Discussion
The previous section demonstrated how parallelising actors can elicit
speedups. However this is not always the case, e.g. when an actor is not
on the critical path of a program's execution [41]. Worse still, rather
than increasing performance it can have a countereffect of generating
larger programs, which in turn can reduce achievable clock frequencies
or exceed hardware resource constraints. Tooling can help direct a
programmer to bottlenecks, which parallelism may alleviate.
In Section 7.1 we present a tool we have developed for Orcc that
addresses this issue. The motivation is to present hardware costs to the
user, e.g. identifying the actor with the longest datapath, which in turn
affects the overall achievable clock speed for the entire dataflow graph.
Then in Section 7.2 we quantify the CAL language features used in real
world code. The use of some CAL language features potentially results
in dynamic dataflow behaviours. The frequent occurrence of these
language constructs in each actor justifies the need for model checking.
7.1. Hardware profiling
The Xronos FPGA backend for Orcc uses Xilinx's open source OpenForge
[42] compiler to generate hardware cost reports for each actor at
compile time. These are: 1) the number of BRAM blocks, and 2) the
datapath depth for each action in an actor. The datapath depth determines
the minimum latency (microseconds) between each clock cycle.
We have added to the Orcc programming environment a tool that
lifts these hardware costs into the visual dataflow editor (Fig. 16), to
communicate hardware bottlenecks to the user. Green actors have small
datapath depths, relative to the orange critical actor(s) which have the
longest datapath. BRAM counts for implementing actor internal variables
are also shown to the programmer.
7.2. Real world dynamic dataflow programs
A static analysis approach with small actors may be able to identify
static, cyclostatic and dynamic actors, e.g. an actor with just one action.
However scaling static analysis to identify models of computation of
larger actors, e.g. multiple guarded actions and store variables updated
by multiple actions, would be very challenging and the authors are not
aware of a dataflow framework that can do this automatically. That is,
applying static analysis on actors implemented in a dynamic dataflow
language to classify their dataflow model of computation.
We assess dataflow language features used in a real world code
repository 2 to evaluate the size of CAL actors in practice. It includes 823
actors in 555 dataflow graphs written by 23 developers, across multiple
domains including cryptography, image processing and network
protocols.
Table 1 shows the frequency of CAL language features in this repository
that can impact on the complexity of classifying an actor's model
of computation, e.g. whether it has cyclostatic or dynamic data rates.
An external control signal from an input port is in a guard predicate for
15% of actions, and internal control signals in the store appear in a guard
predicate for 56% of actions, i.e. whose private variable values may
determine runtime scheduling choices. Most actors are stateful, with 78%
containing at least one variable in its store. Multiple actions consume
Fig. 16. Visualising FPGA Resource Costs. 
the same port for 49% of all input ports. Multiple actions write to the
same port for 41% of all output ports. Multiple actions modify store
variables for 62% of all store variables.
Static analysis would not be feasible at these scales of code complexity.
Machine checked verification would be the only feasible option to
overcome this, e.g. with the model checking approach that this paper
presents.
Automating the verification and parallelisation approach for the full
CAL language will require some additional engineering: (1) the
implementation of a parser for the TINA model checking output, (2) an
extension of the verification approach to support multiple input or output
ports, and (3) a CAL language extension for explicit cyclic sequences to
be expressed (Section 4.2.2). Supporting multiple actor ports (task 2)
will likely be the most trivial, because there is a direct mapping from
CAL ports to Fiacre ports (Section 3.3.1). Extending the syntax of CAL
FSMs to support a programmer's claim of a deterministic cyclic sequence
through states (task 3), for model checking to verify, will require an
extension to Orcc's frontend i.e. changes to the Xtext grammar, and adding
it to Orcc IR by extending the Orcc EMF (Eclipse Modelling Framework)
model. Processing the output of the TINA model checker (task 1) will
require a parser, or a machine readable output format added to TINA's
backend.
8. Conclusion
As the performance of embedded architectures increases, program
sizes that they can support also grow, as does the complexity and
dynamic nature of algorithms in programs. As embedded accelerators become
more widely adopted, the importance of effective code optimisations
grows. Previous work ([43]) explores the NP-hard problem of
cyclic scheduling of multiple actors mapped to parallel processing
architectures. In contrast, this paper presents a program transformation
approach that parallelises a single, potentially stateful, actor into multiple
actors to exploit parallel architectures. The approach separates cyclo-static
from dynamic actors using LTL model checking. The verification
phase takes seconds to run. The graphical parallel refactoring tool
speedily increases FPGA performance, e.g. a 15.4× speedup with 16 actors.
This paper is a step towards auto-parallelisation of dynamic dataflow
programs. Verifying other parallelising transformations, e.g. task and
pipelined parallelism, and verifying transformation of dynamic actors,
would extend our approach. The broader aim of this work is to integrate
automated formal verification more widely into optimising compilers
and parallel runtime systems for embedded systems.
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Acknowledgements
We acknowledge the support of the Engineering and Physical Sciences
Research Council grant references EP/K009931/1 (Programmable embedded
platforms for remote and compute intensive image processing
applications), EP/N014758/1 (The Integration and Interaction of Multiple
Mathematical Reasoning Processes) and EP/N028201/1 (Border Patrol:
Improving Smart Device Security through Type-Aware Systems Design),
and the Scottish Funding Council for a SICSA Postdoctoral and Early
Career Researcher Exchanges grant.
References
[1] D. Bhowmik, P. Garcia, A.M. Wallace, R. Stewart, G. Michaelson, Power efficient
dataflow design for a heterogeneous smart camera architecture, in: 2017 Conference
on Design and Architectures for Signal and Image Processing, DASIP 2017,
September 27–29, 2017, IEEE, Dresden, Germany, 2017, pp. 1–6.
[2] R. Stewart, K. Duncan, G. Michaelson, P. Garcia, D. Bhowmik, A. Wallace, RIPL: A
Parallel Image processing language for FPGAs, TRETS 11 (1) (2018) 7:1–7:24.
[3] E.A. Lee, D.G. Messerschmitt, Static scheduling of synchronous data flow programs
for digital signal processing, IEEE Trans. Comput. 36 (1) (1987) 24–35.
[4] C. Brown, H. Loidl, K. Hammond, ParaForming: forming parallel haskell programs
using novel refactoring techniques, in: TFP 2011, May 16–18, 2011, Revised Selected
Papers, Springer, Madrid, Spain, 2011, pp. 82–97.
[5] C. Brown, M. Danelutto, K. Hammond, P. Kilpatrick, A. Elliott, Cost-Directed
refactoring for parallel erlang programs, Int. J. Parallel Program. 42 (4) (2014) 564–582.
[6] C. Brown, K. Hammond, M. Danelutto, P. Kilpatrick, H. Schöner, T. Breddin,
Paraphrasing: generating parallel programs using refactoring, in: FMCO 2011, October
3–5, 2011, Revised Selected Papers, Springer, Turin, Italy, 2011, pp. 237–256.
[7] G. Grov, G. Michaelson, Hume box calculus: robust system development through
software transformation, Higher-Order Symbol. Comput. 23 (2) (2010) 191–226.
[8] K. Hammond, G. Michaelson, The design of Hume: a high-level language for the
real-time embedded systems domain, in: International Seminar on Domain-Specific
Program Generation, Dagstuhl Castle, Germany, March 23–28, 2003, Springer, 2003,
pp. 127–142.
[9] L. Lamport, The temporal logic of actions, ACM Trans. Programm. Lang. Syst. 16 (3)
(1994) 872–923.
[10] G. Grov, Reasoning About Correctness Properties of a Coordination Programming
Language, Heriot-Watt University, Edinburgh, UK, 2009 Ph.D. thesis.
[11] W. Thies, M. Karczmarek, S.P. Amarasinghe, StreamIt: a language for streaming
applications, in: Proceedings of the 11th International Conference on Compiler
Construction, Springer-Verlag, London, UK, 2002, pp. 179–196.
[12] M. Geilen, T. Basten, S. Stuijk, Minimising buffer requirements of synchronous
dataflow graphs with model checking, in: DAC 2005, San Diego, CA, USA, June
13–17, 2005, ACM, 2005, pp. 819–824.
[13] X. Zhu, R. Yan, Y. Gu, J. Zhang, W. Zhang, G. Zhang, Static optimal scheduling for
synchronous data flow graphs with model checking, in: FM 2015, Oslo, Norway,
June 24–26, 2015, Springer, 2015, pp. 551–569.
[14] I. Grobelna, Model checking of reconfigurable FPGA modules specified by petri nets,
J. Syst. Archit. - Embed. Syst. Des. 89 (2018) 1–9.
[15] J. Eker, J.W. Janneck, CAL Language Report Specification of the CAL Actor
Language, Technical Report, EECS Department, University of California, Berkeley, 2003.
[16] H. Yviquel, A. Lorence, K. Jerbi, G. Cocherel, A. Sanchez, M. Raulet, Orcc:
multimedia development made easy, in: ACM Multimedia Conference, October 21–25, 2013,
ACM, Barcelona, Spain, 2013, pp. 863–866.
[17] R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, Y. Zhou, Cilk:
an efficient multithreaded runtime system, J. Parallel Distrib. Comput. 37 (1) (1996)
55–69.
[18] L. Dagum, R. Menon, OpenMP: an industry standard API for shared-memory
programming, IEEE Comput. Sci. Eng. 5 (1) (1998) 46–55.
[19] M. Pelcat, K. Desnos, J. Heulot, C. Guy, J.-F. Nezan, S. Aridhi, Preesm: A
dataflow-based rapid prototyping framework for simplifying multicore DSP
programming, in: EDERC 2014 6th European Embedded Design in, 2014, pp. 36–40.
[20] K. Hammond, G. Michaelson, Hume: a domain-specific language for real-time
embedded systems, in: GPCE 2003, Erfurt, Germany, September 22–25, 2003,
Proceedings, in: Lecture Notes in Computer Science, Springer, 2003, pp. 37–56.
[21] J. Sérot, F. Berry, S. Ahmed, CAPH: a language for implementing stream-processing
applications on FPGAs, in: Embedded Systems Design with FPGAs, Springer New
York, 2013, pp. 201–224.
[22] Synflow, The Cx programming language, 2015, (https://synflow.gitlab.io).
[23] R. Lauwereins, M. Engels, M. Adé, J.A. Peperstraete, Grape-II: a system-Level
prototyping environment for DSP applications, IEEE Comput. 28 (2) (1995) 35–43.
[24] G. Bilsen, M. Engels, R. Lauwereins, J.A. Peperstraete, Cycle-static dataflow, IEEE
Trans. Signal Process. 44 (2) (1996) 397–408.
[25] E.A. Lee, T.M. Parks, Dataflow process networks, Proc. IEEE 83 (5) (1995) 773–801.
[26] K. Schneider, J. Brandt, Quartz: a synchronous language for model-based design
of reactive embedded systems, in: S. Ha, J. Teich (Eds.), Handbook of
Hardware/Software Codesign, Springer Netherlands, Dordrecht, 2017, pp. 29–58.
[27] MATLAB, Simulink, 2018, (https://uk.mathworks.com/products/simulink.html).
[28] D.G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, M. Abadi, Naiad: a timely
dataflow system, in: SOSP 2013, Farmington, PA, USA, November 3–6, 2013, ACM,
2013, pp. 439–455.
[29] J.W. Janneck, I.D. Miller, D.B. Parlour, G. Roquier, M. Wipliez, M. Raulet,
Synthesizing hardware from dataflow programs - an MPEG-4 simple profile decoder case
study, Signal Process. Syst. 63 (2) (2011) 241–249.
[30] R. Stewart, D. Bhowmik, A.M. Wallace, G. Michaelson, Profile guided dataflow
transformation for FPGAs and CPUs, Signal Process. Syst. 87 (1) (2017) 3–20.
[31] J. Eker, J.W. Janneck, E.A. Lee, J. Liu, X. Liu, J. Ludvig, S. Neuendorffer, S.R. Sachs,
Y. Xiong, Taming heterogeneity - the ptolemy approach, Proc. IEEE 91 (1) (2003)
127–144.
[32] J.W. Janneck, Actors and their composition, Formal Aspect. Comput. 15 (4) (2003)
349–369.
[33] B. Berthomieu, J.-P. Bodeveix, P. Farail, M. Filali, H. Garavel, P. Gaufillet, F. Lang,
F. Vernadat, Fiacre: an intermediate language for model verification in the topcased
environment, ERTS 2008, Toulouse, France, 2008.
[34] B. Berthomieu, P. Ribet, F. Vernadat, The tool TINA - Construction of abstract state
spaces for petri nets and time petri nets, Int. J. Prod. Res. 42 (14) (2004) 2741–2756.
[35] B. Berthomieu, S. dal Zilio, F. Vernadat, A FIACRE V3.0 Primer,
Technical Report, LAAS-CNRS Université de Toulouse, France, 2012.
http://projects.laas.fr/fiacre/doc/primer.pdf
[36] B. Berthomieu, J.-P. Bodeveix, M. Filali, H. Garavel, F. Lang, D.L. Botlan,
F. Vernadat, S. dal Zilio, The Syntax and Semantics of Fiacre, Technical Report, LAAS-CNRS
Université de Toulouse, France, 2012.
[37] T. Murata, Petri nets: properties, analysis and applications, Proc. IEEE 77 (4) (1989)
541–580.
[38] E. Bezati, S.C. Brunet, M. Mattavelli, J.W. Janneck, High-level synthesis of dynamic
dataflow programs on heterogeneous MPSoC platforms, in: International Conference
on Embedded Computer Systems: Architectures, Modeling and Simulation,
SAMOS 2016, July 17–21, IEEE, Agios Konstantinos, Samos Island, Greece, 2016,
pp. 227–234.
[39] E. Bezati, High-Level Synthesis of Dataflow Programs for Heterogeneous Platforms:
Design Flow Tools and Design Space Exploration, School of Engineering, Ecole
Polytechnique Fédérale de Lausanne, Switzerland, 2015 Ph.D. thesis.
[40] R. Stewart, Open Access dataset for "Verifying Parallel Dataflow
Transformations with Model Checking and its Application to FPGAs", 2019,
https://doi.org/10.17861/85ff96b4-2c6b-4f58-8322-74f0ab45f684
[41] S.C. Brunet, M. Mattavelli, J.W. Janneck, Buffer optimization based on critical path
analysis of a dataflow program design, in: 2013 IEEE International Symposium on
Circuits and Systems (ISCAS2013), Beijing, China, May 19–23, 2013, IEEE, 2013,
pp. 1384–1387.
[42] E. Bezati, H. Yviquel, M. Raulet, M. Mattavelli, A unified hardware/software
Co-synthesis solution for signal processing systems, in: DASIP 2011, Tampere, Finland,
November 2–4, 2011, IEEE, Tampere, Finland, 2011, pp. 186–191.
[43] C. Hanen, A. Munier, A study of the cyclic scheduling problem on parallel processors,
Discrete Applied Mathematics 57 (2–3) (1995) 167–192.

Dr Robert Stewart is an Assistant Professor at Heriot-Watt
University. His interests are at the interface between programming
languages and computer architectures. They span parallel
programming functional languages for multicore HPC and
embedded architectures, dataflow models for embedded systems
and hardware verification.
Dr Bernard Berthomieu has interests in semantics and
implementation of concurrent programming languages. Bernard
is the developer of the programming language LCS, based on
a higher order variant of Robin Milner's Calculus of
Communicating Systems embedded into Standard ML. Bernard is also
interested in the analysis techniques for Petri Nets, Time Petri
Nets, and related formalisms for concurrent systems. Bernard
is the developer of the TINA model checking toolbox.
Dr Paulo Garcia is an Assistant Professor of Systems and
Computer Engineering at Carleton University. Paulo received his
BSc, MSc and PhD degrees from the University of Minho, in
2008, 2011 and 2015. His research interests include languages
for hardware-software co-design, computer architectures,
system software and hybrid CPU-FPGA systems design.
Dr Idris Skloul Ibrahim is an Assistant Professor at
Heriot-Watt University. His interests are ad hoc network protocols,
computer systems, computer algebra on Cloud infrastructures,
mobile applications and parallel dataflow transformations.
Professor Greg Michaelson is a Professor of Computer
Science at Heriot-Watt University. He has a BSc in Computer Science
from the University of Essex, an MSc in Computational Science from
the University of St Andrews and a PhD from Heriot-Watt
University. His expertise is in the design, analysis and
implementation of programming languages, in particular functional
languages for multi-processor platforms. He is a Fellow of the
British Computer Society.
Professor Andrew Wallace is a Professor of Signal and
Image Processing at Heriot-Watt University. Andrew received his
BSc and PhD degrees from the University of Edinburgh in 1972
and 1975, with particular interests in LiDAR signal processing,
video analytics and parallel, many-core architectures. He has
published extensively, receiving a number of best paper and
other awards. He has secured funding from EPSRC, the EU
and other industrial and government sponsors. He is a
chartered engineer and a Fellow of the Institution of Engineering
and Technology.
