Multi-threaded code generation from Signal program to OpenMP by Hu, Kai et al.
This is an author-deposited version published in: http://oatao.univ-toulouse.fr/
Eprints ID: 12684
To link to this article: DOI: 10.1007/s11704-013-3906-4
URL: http://dx.doi.org/10.1007/s11704-013-3906-4
To cite this version: Hu, Kai and Zhang, Teng and Yang, Zhibin. Multi-threaded code generation from Signal program to OpenMP. (2013) Frontiers of Computer Science in China, vol. 7 (n° 5). pp. 617-626. ISSN 1673-7350
Multi-threaded code generation from Signal program to OpenMP
Kai HU1, Teng ZHANG2, Zhibin YANG2,3
1 State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
2 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
3 IRIT-CNRS, Université de Toulouse, Toulouse 31062, France
Abstract The use of multi-core processors will become a trend in safety critical systems. For safe execution of multi-threaded code, automatic code generation from formal specification is a desirable method. Signal, a synchronous language dedicated to the functional description of safety critical systems, provides sound semantics for deterministic concurrency. Although sequential code generation of Signal has been implemented in the Polychrony compiler, the deterministic multi-threaded code generation strategy is still far from mature. Moreover, existing code generation methods rely on a specific multi-thread library, which limits cross-platform execution. OpenMP is an application program interface (API) standard for parallel programming, supported by several mainstream compilers on different platforms. This paper presents a methodology for translating Signal programs to OpenMP-based multi-threaded C code. First, the intermediate representation of the core syntax of Signal using synchronous guarded actions is defined. Then, according to the compositional semantics of Signal equations, the Signal program is synthesized into a dependency graph (DG). After parallel tasks are extracted from the dependency graph, the Signal program can finally be translated into OpenMP-based C code which can be executed on multiple platforms.
Keywords multi-thread, synchronous language, Signal, code generation, OpenMP
E-mail: hukai@buaa.edu.cn
1 Introduction
Multi-core processors have been widely used in high-performance computing and general-purpose computing. With the increase of functional and non-functional demands, multi-core architecture will become indispensable in safety critical systems such as avionics, aerospace and automobile control. Multi-threaded software is necessary to make full use of the computing resources of multi-core processors. At present, two types of strategies have been developed to aid programming multi-threaded software. One is the application program interfaces (APIs) and libraries provided by Unix-like OS [1] and Windows [2]; the other includes several parallel programming technologies such as MPI [3] for multi-processor distributed systems, and OpenMP [4] and Intel TBB [5] for shared memory architectures, which provide mechanisms to describe high level parallel algorithms.
However, these two strategies fail to satisfy the strict quantitative indicators of functional and non-functional properties demanded in safety critical systems. If the executions of the embedded software are non-deterministic, they may cause undesirable consequences such as delayed reactions and race conditions. In addition, parallel programming is error-prone because programmers have to specify the synchronization and resource sharing among threads. This makes multi-threaded coding for safety-critical applications a high-risk programming activity [6].
To solve this problem, using model-based development and automatic code generation technology based on formal methods has become a trend in academia and industry. One of the available formal methods used in safety critical systems is the synchronous language [7, 8], built on a mathematical model combining the synchronous hypothesis and deterministic concurrency. In the synchronous hypothesis, time is abstracted as a partial discrete logical time series and actions executed by the system are abstracted as discrete steps of computing. The input, computation and output take no time at each instant (the unit of discrete logical time). Due to the abstract time model, the inherent functional properties are preserved, which makes synchronous languages suitable for the functional design of systems. The mainstream synchronous languages include Esterel [9], Lustre [10], and Signal [11], among which Signal is a multi-clocked language in which no global clock is pre-defined and every signal has its own clock. Compared to mono-clocked synchronous languages, the multi-clocked model is more suitable for the description of distributed systems and multi-core systems.
Endochrony [12] and weak endochrony [13] properties have been proposed to generate deterministic code from Signal. In an endochronous Signal program, the clock of each signal can be computed from a "root clock". The Polychrony compiler [14, 15] not only supports sequential code generation, but also provides the function of multi-threaded code generation from the endochronous program, based on the clustering method. According to the data dependency relations, the Signal program is divided into tasks which are "forked" as threads at runtime. These threads communicate with each other through the "wait-notify" system call, while in each thread the sequential code is executed. However, there are still some implicit concurrencies in the program which may not be discovered, and the use of "wait-notify" takes time in the synchronization among threads.
The weak endochrony property, as its name implies, is less strict than endochrony. If the relation among signals meets the full-diamond condition [13], it is possible to generate deterministic multi-threaded code. [16] proposes a methodology for checking the weak endochrony property based on bounded model-checking. Since the model-checking method is expensive for code generation, that paper proposes another method based on the isochrony [17] property, which is suitable for compositional design. This method, however, cannot fully cover all weakly endochronous programs. [18] proposes a methodology for generating deterministic multi-threaded code from weakly endochronous programs based on synchronous flow dependence graphs [19]. Every statement in the program corresponds to a thread and the threads synchronize with each other based on the "wait-notify" system call. On the basis of the atom theory proposed in [13], [20] presents a general method to check weak endochrony on multi-clocked synchronous programs. The corresponding strategy of multi-threaded code generation is given in [21]. However, some restrictions must be met. For instance, data types of the program interface should be finite and delay equations should be replaced by clock relation equations. Therefore, some weakly endochronous programs may be rejected.
Another strain of methodology is proposed in [22]. It translates synchronous guarded actions, an intermediate representation for mono-clocked synchronous languages, into an OpenMP-based multi-threaded program. The synchronous guarded actions are first translated into a dependency graph (DG) and then, from the DG, tasks are divided for parallel execution. Finally, OpenMP-based C code can be generated according to the task partition. Since OpenMP has been implemented by several compilers on different OS, the generated code can be executed on multiple platforms.
As an API standard for parallel programming, OpenMP provides abundant mechanisms for the description of high level parallel algorithms. The newest version of OpenMP supports fine-grained scheduling and task balancing, which can increase the performance of the program. However, few studies of multi-threaded code generation for Signal have chosen OpenMP as the target language. Drawing on the idea presented in [22], this paper introduces a methodology for discovering the implicit parallelism of the Signal program and translating the endochronous Signal program to OpenMP-based C code. However, some vital changes are made to fit the characteristics of Signal. Firstly, while [22] directly translates a program written in synchronous guarded actions, primitive constructs in Signal first need to be translated into a representation of synchronous guarded actions. With regard to this, some new features are added to synchronous guarded actions. For instance, in order to represent the implicit clock relations defined in each primitive construct, Boolean variables representing the clocks are introduced. Moreover, in [22], the action dependency graph (ADG) is a bipartite graph in which variables and guarded actions are vertices. This paper proposes a DAG (directed acyclic graph)-like form of DG in which nodes are guarded actions and edges represent the dependency relations between nodes.
The paper is structured as follows. An informal introduction to Signal is provided in Section 2. In Section 3, the paper proposes an intermediate representation of the core syntax of Signal using synchronous guarded actions. Based on the compositional semantics and data dependency relations, the formal definition of DG is given. In Section 4, methods of finding the implicit parallelism of the Signal program and the task partition from DG are proposed. Finally, in Section 5, the translation from partitioned tasks to OpenMP-based C code is defined and an example is analyzed to validate the methodology proposed in this paper.
2 Introduction to Signal 
2.1 Syntax and corresponding semantics
As mentioned in Section 1, time is abstracted and the behaviors of the system are divided into a discrete series of instants. At each instant, the input, computing and output are executed instantaneously and simultaneously. The unbounded series of typed values are called signals. Signals in the program can be present or absent at each instant, and the clock of a signal is defined as the series of subscripts of the instants at which the signal is present.
In Signal, primitive constructs (core syntax) are provided to express the relations between signals, defined in Table 1. Note that the clock of signal s is denoted as "^s".
Operators in Signal not only depict the data dependency relations but also imply clock relations among signals. According to the clock relations, operators can be divided into two types, mono-clocked operators and multi-clocked operators. In equations of mono-clocked operators, including Relation and Delay, operand signals are synchronous, that is, at any instant, all signals are present or absent at the same time. In contrast, operand signals of multi-clocked operators, such as Sampling and Merge, may have different clocks. For instance, in the Sampling equation shown in Table 1, the left-hand side value O is present only when the right-hand side values s1 and s2 are present and s2 evaluates to true.
Table 1 Primitive constructs of Signal
Name      Syntax                    Informal semantics
Relation  O := f(s1, s2, ..., sn)   When s1, s2, ..., sn are present, O is present and the value is f(s1, s2, ..., sn); otherwise signal O is absent
Delay     O := s1 $ init c          The clocks of O and s1 are equal; when s1 is present, the value of O is the previous present value of s1; the initial value of O is c
Sampling  O := s1 when s2           O will be present and evaluated to s1 only when s1 and s2 are present and s2 evaluates to true
Merge     O := s1 default s2        When s1 is present, O is present and evaluated to s1; otherwise when s2 is present, O is present and evaluated to s2; if neither s1 nor s2 is present, signal O is absent
Apart from the primitive constructs listed in Table 1, Signal also provides other extended constructs such as the clock operator "^" and the memory operator "cell". Moreover, nested processes, modules and other mechanisms are defined in Signal to specify large systems with components at various rates. The details of the syntax can be found in [23].
Relations among signals and their clocks are defined as equations in Signal. The basic unit of a Signal program, called a process, consists of a set of equations. Two basic operators, respectively called synchronous composition and local definition, are applied to the process. The syntax and corresponding semantics are shown in Table 2.
Table 2 Primitive operations on the process
Name                     Syntax                           Informal semantics
Synchronous composition  P|Q                              P and Q are processes. The behavior of P|Q is the conjunction of the mutual behaviors of P and Q [24]
Local definition         P where t_1 s1 ... t_n sn; end   P is a process and s1, ..., sn are signals. The scope of s1, ..., sn is restricted to P, which means they are not visible outside P [24]
From the introduction given above, we can give the abstract syntax of a process, shown below:
P, Q ::= x := y f z  |  P|Q  |  P/x
A process (P or Q) consists of the synchronous composition (P|Q) of dataflow equations. P/x is the local definition of signals. The dataflow equation "x := y f z" states that the value of x is determined by the input signals y and z and the operation f on them.
To simplify the translation, this paper sets a few restrictions on the source Signal program: all equations are written in primitive constructs; every signal can only be defined once in the program [24]; all equations are in the same process, which means that there is no subprocess in the Signal program. Moreover, in Signal, a program is a process and shares the same syntax [24]. In the remainder of the paper, the source program is a flattened process written in primitive constructs.
Two IDEs, RT-builder [25] and Polychrony [26], are based on Signal. The former is a commercial product, while Polychrony is open source for academic use. The code generated by the Polychrony compiler takes the form of an infinite loop of elementary iterations. In each iteration, the program reads from the input, computes, and writes to the output. More details of this code generation principle can be found in [14]. In this paper, we use the iteration as the execution model of the generated code.
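The iteration-based execution model can be sketched in C as follows. This is only an illustration of the read, compute, write cycle, not the code emitted by Polychrony: the type `io_ctx`, the helpers `read_inputs`, `compute_step` and `write_outputs`, and the array-backed input are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical I/O context: inputs come from an array, outputs are recorded. */
typedef struct {
    const int *inputs;    /* input values, one per iteration */
    size_t     n_inputs;  /* number of available inputs */
    size_t     next;      /* index of the next input to read */
    int        last_out;  /* last value written to the output */
    size_t     n_written; /* number of outputs written so far */
} io_ctx;

/* Read one input; returns false when the environment has no more data. */
static bool read_inputs(io_ctx *io, int *in) {
    if (io->next >= io->n_inputs) return false;
    *in = io->inputs[io->next++];
    return true;
}

/* Placeholder computation for one instant (assumption: doubles the input). */
static int compute_step(int in) { return 2 * in; }

static void write_outputs(io_ctx *io, int out) {
    io->last_out = out;
    io->n_written++;
}

/* Loop of elementary iterations: read, compute, write.
   It stops only when the input is exhausted. Returns the number of iterations. */
static size_t run_iterations(io_ctx *io) {
    assert(io != NULL);
    size_t steps = 0;
    int in;
    while (read_inputs(io, &in)) {
        write_outputs(io, compute_step(in));
        steps++;
    }
    return steps;
}
```

In generated code the body of the loop would be the translated tasks of one instant; here a single placeholder computation stands in for them.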
2.2 An example of Signal
A Signal program presented in [24] called ABRO is used to illustrate the code generation in the paper. Figure 1 is a finite state machine specification of the ABRO process.
Fig. 1 A finite state machine specification of the ABRO process
ABRO emits signal O when input signals A and B have been received. When input signal R is received, ABRO comes back to the initial state and begins to wait for the inputs. Once O has been emitted, it will not be emitted again until R has arrived to reset the state. The original version of the Signal program of ABRO can be found in [24]. Here we give a modified version, shown in Fig. 2.
1: process ABRO =
2: ( ? boolean A, B, R; ! event O; )
3: (| A_received ^= B_received ^= after_R_until_O
4: | A_received ^= A ^= B ^= R
5: | RT := not R when R
6: | A_received := RT default AR
7: | AT := A when A
8: | AR := AT default Adelay
9: | Adelay := A_received $ init false
10: | BT := B when B
11: | B_received := RT default BR
12: | BR := BT default Bdelay
13: | Bdelay := B_received $ init false
14: | from_R_before_O := not O default RR
15: | RR := Re default after_R_until_O
16: | Re := R when R
17: | after_R_until_O := from_R_before_O $ init true
18: | O := true when ABR
19: | ABR := A_received when Arr
20: | Arr := B_received when after_R_until_O |)
21: where
22: boolean A_received, B_received, from_R_before_O
23: , Adelay, Bdelay, AR, BR, RR, ABR, Arr, after_R_until_O
24: , AT, BT, RT, Re; end;
Fig. 2 Signal program of ABRO process
In the process ABRO, A, B and R are Boolean typed input signals and O is an event typed output signal, as shown in Line 2. Lines 3 to 20 are dataflow equations specifying the clock and value relations among signals. Lines 3 and 4 synchronize the input signals A, B, R with the intermediate signals A_received, B_received and after_R_until_O. By analyzing the clock relations among signals, it can be deduced that the clock of the input signals is the only root clock of this program, so the program is endochronous.
3 Intermediate representation for primitive constructs and the program synthesis method
Due to the declarative feature of Signal, one of the indispensable steps when generating imperative code is to translate the source program into an intermediate form with the information of clock hierarchy and data dependency relations. Section 3.1 presents a method for translating primitive constructs of Signal into code blocks of synchronous guarded actions, which are used to represent the clock and data dependencies among the operands of equations. To represent the data dependency relations for the whole program, another intermediate form called DG is defined in Section 3.2. Implicit concurrency of the program can then be detected by analyzing the DG, as presented in Section 4. Note that the source program to be translated should be endochronous. Based on the definition given in [18], the informal description of endochrony is: a Signal program is endochronous if and only if the clocks of all signals can be computed according to the internal clock relations and no external environment runtime information is needed, which equivalently means that there is a root clock in the program.
3.1 Synchronous guarded actions for primitive constructs 
Based on the semantics of the core syntax given in Section 2, the synchronous guarded action is defined as below, which is different from [22]:
A synchronous guarded action is a four-tuple (R, L, B, O). R is the set of signals which represent the right-hand side values of the primitive constructs. L is the set of the left-hand side values. B is the code block defined as (G, A), taking the form "if G then A". G is a Boolean expression and A is the set of actions to be executed when G holds. O is the output set of the signals which can be used in other blocks' right-hand side values.
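The four-tuple can be pictured as a small C data structure. The sketch below is one possible encoding, not the representation used by the authors: signals are identified by integer ids, the guard and the action are function pointers, and the clock C_O is stored in a `clock[]` array rather than listed in R. All type and function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SIGNALS 8

/* A set of signals, identified by small integer ids (assumption of this sketch). */
typedef struct {
    int    ids[MAX_SIGNALS];
    size_t count;
} signal_set;

/* Values and clocks of all signals at the current instant. */
typedef struct {
    int  value[MAX_SIGNALS];
    bool clock[MAX_SIGNALS];
} env;

/* A synchronous guarded action (R, L, B, O), with B = (G, A). */
typedef struct {
    signal_set rhs;              /* R: signals read by the block */
    signal_set lhs;              /* L: signals written by the block */
    bool (*guard)(const env *);  /* G: Boolean guard */
    void (*action)(env *);       /* A: actions executed when G holds */
    signal_set out;              /* O: signals usable by other blocks */
} guarded_action;

/* Execute the block "if G then A"; returns true if the action ran. */
static bool run_block(const guarded_action *ga, env *e) {
    assert(ga != NULL && e != NULL);
    if (ga->guard(e)) { ga->action(e); return true; }
    return false;
}

/* Example block for O := s1 + s2 with ids s1 = 0, s2 = 1, O = 2. */
static bool guard_sum(const env *e) { return e->clock[2]; }
static void action_sum(env *e) { e->value[2] = e->value[0] + e->value[1]; }

static guarded_action make_sum_block(void) {
    guarded_action ga = {
        .rhs = { {0, 1}, 2 },
        .lhs = { {2}, 1 },
        .guard = guard_sum,
        .action = action_sum,
        .out = { {2}, 1 },
    };
    return ga;
}
```

Keeping R and L as explicit sets is what later allows the dependencies between blocks to be derived mechanically.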
There are three kinds of signals: input, output and intermediate signals. Although input signals cannot be the left-hand side value of an equation, the clock relations can be specified to decide at which instants the value can be read. Intermediate signals and output signals can be the left-hand and right-hand side values of an equation.
The synchronous guarded action representations of the core syntax are defined below. We use the syntax of C as the style of pseudo code in the block, and the clock of signal "s" is denoted as "C_s". Note that before constructing blocks for each equation in the program, clock analysis needs to be completed to divide all signals into clock equivalence classes so that synchronous signals have the same clock representation in each block.
a) O := f(s1, s2, ..., sn)
Right-hand side signals: s1, s2, ..., sn, C_O
Left-hand side signal: O
if (C_O == true)
    O = f(s1, s2, ..., sn);
Output of the block: O
In the block above, "f" is an n-ary instant operator. The right-hand side operands of the block are the operands of operator "f" and the left-hand side operand of the block is O. The implicit clock relation is "^O = ^s1 = ... = ^sn", while C_O is defined as the common clock of these signals. Signals on the right-hand side are needed to compute the value of O. The Boolean expression in the block, "C_O == true", means that O can be computed only when O is present at this instant. The output of the block indicates that after the execution of the code block, O can be used as a right-hand side value and, if O is an output signal, the write action can be executed.
b) O := s1 default s2
Right-hand side signals: C_s1, s1
Left-hand side signals: O, C_O
if (C_s1 == true) {
    O = s1;
    C_O = true;
}
Output of the block: C_O, O
Right-hand side signals: C_s1, C_s2, s2
Left-hand side signals: O, C_O
if (C_s1 == false && C_s2 == true) {
    O = s2;
    C_O = true;
}
Output of the block: C_O, O
From the semantics of operator Merge, two corresponding code blocks are constructed. Signal s1 has priority over s2. If s1 is present, O is assigned the value of s1. If s1 is absent and s2 is present, O is assigned the value of s2. Note that apart from the assignment to O, the clock of O, denoted by C_O, should be set to true if s1 or s2 is present.
c) O := s1 when s2
Right-hand side signals: s1, s2, C_s1, C_s2
Left-hand side signals: O, C_O
if (C_s1 == true && C_s2 == true && s2 == true) {
    O = s1;
    C_O = true;
}
Output of the block: C_O, O
From the block shown above, we can see that if s1 and s2 are present and s2 evaluates to true (which means the type of s2 should be Boolean or event), O is present and evaluates to the value of s1.
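The blocks for Relation, Merge and Sampling translate almost literally into C. The sketch below is a minimal illustration under stated assumptions: a signal is modeled as a value paired with a presence flag (playing the role of C_s), values are `int`, `f` is taken to be addition, and each function resets the presence flag itself instead of relying on a separate reset at the end of the iteration. The names `sig`, `relation_add`, `merge` and `sampling` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* A signal at one instant: its value and its clock flag (C_s in the text). */
typedef struct {
    int  value;
    bool present;
} sig;

/* a) O := f(s1, s2), with f = addition chosen only for illustration.
   The common clock c_o is assumed to come from clock analysis. */
static void relation_add(sig *o, const sig *s1, const sig *s2, bool c_o) {
    assert(!c_o || (s1->present && s2->present)); /* operands are synchronous */
    o->present = c_o;
    if (c_o) o->value = s1->value + s2->value;
}

/* b) O := s1 default s2: two guarded blocks, s1 has priority. */
static void merge(sig *o, const sig *s1, const sig *s2) {
    o->present = false;
    if (s1->present) { o->value = s1->value; o->present = true; }
    if (!s1->present && s2->present) { o->value = s2->value; o->present = true; }
}

/* c) O := s1 when s2: s2 carries a Boolean value (non-zero means true). */
static void sampling(sig *o, const sig *s1, const sig *s2) {
    o->present = false;
    if (s1->present && s2->present && s2->value != 0) {
        o->value = s1->value;
        o->present = true;
    }
}
```

The two `if` statements in `merge` correspond to the two code blocks of case b); their guards are mutually exclusive, so at most one assignment to O happens per instant.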
d) O := s1 $ init c
Right-hand side signals: C_O
Left-hand side signals: Null
if (C_O == true) {
}
Output of the block: Null
Since Delay is a mono-clocked operator, s1 and O have the same clock, which means that when s1 is present at the instant, O is also present and can be used as a right-hand side value. The clock of O and s1, denoted as C_O, is the single right-hand side signal. However, the corresponding code block has no action since no data dependency is defined in the equation. How to assign a value to the memory signal will be introduced in Section 3.2.
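Section 3.2 places the update of memory signals in a terminal node at the end of each iteration, with the initial value assigned before the first iteration. A minimal C sketch of that life cycle for the equation `Adelay := A_received $ init false` of ABRO is given below. The struct `abro_state` and the function names are assumptions, and the input value is passed as a parameter instead of being read from the environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool c_1;        /* clock of the root clock class */
    bool a;          /* input signal A (root clock) */
    bool a_received; /* defined by other nodes during the iteration */
    bool adelay;     /* memory signal: Adelay := A_received $ init false */
} abro_state;

/* Executed once, before the first iteration: initial values of memory signals. */
static void init_memory(abro_state *s) {
    s->c_1 = false;
    s->a = false;
    s->a_received = false;
    s->adelay = false; /* "init false" */
}

/* Start of an iteration: set the root clock, then take the root-clock input. */
static void initial_node(abro_state *s, bool a_in) {
    s->c_1 = true;
    s->a = a_in;
}

/* End of an iteration: update the memory signal if its clock was true,
   then reset the clock for the next iteration. */
static void terminal_node(abro_state *s) {
    assert(s != NULL);
    if (s->c_1) s->adelay = s->a_received;
    s->c_1 = false;
}
```

During an iteration, blocks that read `adelay` therefore see the value of `a_received` from the previous instant at which the root clock was true, which is the intended Delay semantics.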
As for the input signals in the root clock set, at the beginning of each iteration, read actions should be executed. The corresponding clock should also be set to true.
Right-hand side signals: Null
Left-hand side signals: i
{ read(i); C_i = true; }
Output of the block: i, C_i
For an input signal not belonging to the root clock set, the clock can be extracted from the clock calculation, denoted as C_s. If C_s evaluates to true, the read action can be executed, so the read is nested in the same block that assigns true to C_s.
Left-hand side signals: i, other signals
Output of the block: i, other signals
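One way such a nested read might look in C is sketched below. It is purely illustrative: the clock C_s is assumed to be derived from a Boolean root-clock input `r` (as if `i` had the clock of `r when r`), and `channel`, `read_input`, `step_state` and `clock_and_read` are hypothetical names that do not come from the paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input channel backed by an array. */
typedef struct {
    const int *data;
    size_t     len;
    size_t     pos;
} channel;

static int read_input(channel *ch) {
    assert(ch->pos < ch->len);
    return ch->data[ch->pos++];
}

/* State of one iteration for this sketch. */
typedef struct {
    bool r;   /* value of a root-clock Boolean input, already read */
    bool c_1; /* root clock */
    bool c_s; /* derived clock of input i (assumed here: "r when r") */
    int  i;   /* input signal that is not in the root clock set */
} step_state;

/* The block that sets C_s also performs the read of i, so i is only
   consumed from the environment at instants where it is present. */
static void clock_and_read(step_state *st, channel *ch_i) {
    if (st->c_1 && st->r) {
        st->c_s = true;
        st->i = read_input(ch_i);
    }
}
```

Nesting the read under the guard keeps the environment and the program in step: no value of `i` is consumed at an instant where its clock is false.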
Two other primitive constructs include the local definition (P/x) and synchronous composition (P|Q). They have no corresponding code blocks of synchronous guarded actions. The local definition operator enables one to restrict the scope of a signal to a process [24]. Intermediate signals are defined in this part and they are invisible from the outside of the process. Synchronous composition is the union of equations defined in the program. Equations communicate with each other by common signal variables. The behavior of the program can be seen as the conjunction of mutual behaviors of all equations [24]. Based on the semantics of synchronous composition, code blocks of synchronous guarded actions generated from the program are composed into the DG according to the data dependencies among the code blocks. The DG is used to describe the behavior of the whole program and to explore the implicit concurrencies.
3.2 Synthesis method based on DG
After generating the code block of synchronous guarded actions for each equation in the program, the DG can be constructed. All signals belonging to the same class are synchronous. The definition of DG is given below: DG is defined as $(NS, \rightarrow)$. NS is the set of nodes, each of which represents a code block of synchronous guarded actions. $\rightarrow$ is the precedence relation between nodes defined over NS as follows: $s_1 \rightarrow s_2$ if and only if some signals exist both in the right-hand side of $s_2$ and in the left-hand side of $s_1$, which indicates that to execute the code in $s_2$, we first need to get the execution result of $s_1$. Note that the two code blocks for the operator Merge defined in Section 3.1 are treated as one node in DG. A DG is correct if every cycle "$s_0 \rightarrow \cdots \rightarrow s_n \rightarrow s_0$" is a pseudo cycle: the conjunction of all guard expressions of the synchronous guarded actions $s_0, \ldots, s_n$ involved in the cycle is false. From this, it is also easy to see that DG is not strictly a DAG because there may be pseudo cycles in the graph. Note that if dependencies from clocks to their corresponding signals (input signals) are added, clock constraints are set. In this case, values read from the environment must meet the constraints to guarantee correct execution.
To generate a complete DG, some situations need to be considered. Some blocks are for the read of an input signal with no right-hand side signal. These nodes are composed into a single node called the initial node. Since there is no precedence among these nodes, they can be arranged in any order when they are composed. Furthermore, the clock of the root clock class, denoted as C_1, has to be set to true at the front of the initial node. At the end of the iteration, a terminal node is also needed in which every signal on the left-hand side of a Delay equation (denoted as memory signals) is set to its new value under the condition that it has been present at the last iteration. These blocks are then synthesized into the terminal node, in which all the clocks of the clock equivalence classes also need to be set to false. Note that the initial assignments to memory signals need to be executed before the iteration begins. Furthermore, according to the definition given in Section 3.1, some blocks (denoted as b1) may have right-hand side signals which are memory signals, but the code block of synchronous guarded actions (denoted as b2) for the Delay equation has no left-hand side value. In this case, tests on whether the clocks of memory signals evaluate to true are included in the guard condition of b1 and no precedence relation needs to be specified between the nodes respectively containing b1 and b2.
Redundancies in the generated DG can be found as follows:
1) Boolean subexpressions may be duplicated in the condition expression of the code blocks. For instance, according to the algorithm given in Section 3.1, the corresponding condition of the equation "c := b when not b" is "C_b==true && C_b==true && b==false".
2) If there are blocks with signals in the same clock equivalence class as the left-hand side value, there will be duplicated assignments to the clock.
To eliminate these redundancies, the duplicated Boolean expressions are deleted first. Then, traversing from the initial node, if there is a clock assignment in a node, the same assignment in its subsequent nodes is deleted.
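The precedence relation of Section 3.2 can be computed directly from the right-hand and left-hand side sets of the blocks. The sketch below assumes that each signal is mapped to one bit of a mask, so that "some signal is in the left-hand side of one node and in the right-hand side of another" becomes a bitwise test. The enum constants, `dg_node` and `build_dg` are hypothetical names. Clock variables are left out of the example masks for brevity, and memory signals such as Adelay are omitted from the right-hand side sets because they impose no precedence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* One bit per signal of the ABRO fragment used in the example. */
enum {
    SIG_R          = 1u << 0,
    SIG_RT         = 1u << 1,
    SIG_AT         = 1u << 2,
    SIG_AR         = 1u << 3,
    SIG_A          = 1u << 4,
    SIG_A_RECEIVED = 1u << 5
};

/* A DG node: the signals its block reads (R) and writes (L), as bit masks. */
typedef struct {
    unsigned rhs; /* R */
    unsigned lhs; /* L */
} dg_node;

/* edge[i][j] is true when node i precedes node j, i.e. some signal is in
   the left-hand side of i and in the right-hand side of j. Self-edges are
   not recorded in this sketch. */
static void build_dg(const dg_node *nodes, size_t n,
                     bool edge[MAX_NODES][MAX_NODES]) {
    assert(n <= MAX_NODES);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            edge[i][j] = (i != j) && ((nodes[i].lhs & nodes[j].rhs) != 0);
}
```

For example, with nodes for `RT := not R when R`, `AT := A when A`, `AR := AT default Adelay` and `A_received := RT default AR`, the computed edges are node 0 to node 3, node 1 to node 2, and node 2 to node 3.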
4 Task partition strategy 
As defined in Section 3, DG is a kind of DAG on which a precedence relation is defined among nodes. Informally, the precedence relation indicates the dependency between nodes. If no precedence relation exists between two nodes, they can run in parallel. In this section, we partition the set of nodes of DG into tasks. The precedence relation on tasks is compatible with the precedence relation on nodes.
A task is defined as a set of nodes belonging to NS of DG. TS is a two-tuple $(T, \rightarrow)$, in which T is a partition of NS and $\rightarrow$ is the precedence relation among tasks on T. A task t of T is an anti-chain in the reflexive transitive closure $\rightarrow^{*}$ of $\rightarrow$ (i.e., nodes in a task cannot be compared: $(\forall t \in T)(\forall n_1 \in t)(\forall n_2 \in NS)\,(n_1 \rightarrow^{*} n_2) \Rightarrow ((n_1 = n_2) \lor (n_2 \notin t))$). Among tasks belonging to T, $t_1 \rightarrow t_2$ if and only if there exists at least one node in $t_2$ which is preceded by nodes in $t_1$. Note that although cycle checking has been done after the construction of DG, there may be pseudo cycles of tasks since pseudo cycles are allowed in DG. To deal with this problem, the guard of a task t is defined as the disjunction of the guards of the nodes belonging to t. If the conjunction of all the guards of the tasks involved in the cycle is false, the cycle is a pseudo cycle. Because of pseudo cycles, the result of some nodes in $t_2$ may be required by some nodes in $t_1$ (when some condition C is true) and conversely (when the condition C is false). In this paper we only consider a DAG of nodes: the processing of programs with cycles and pseudo cycles is not described here. One can then use a topological sorting to partition nodes into tasks so that the result of the partition is a total order of tasks: for all tasks $t_1, t_2, \ldots, t_n$ belonging to T, a series of them, $t_{m_1} \rightarrow \cdots \rightarrow t_{m_{n-1}} \rightarrow t_{m_n}$, exists. As a result, no task pair will be allowed to execute in parallel. Here we illustrate the task partition with the example in Section 2.
Part of the code is shown in Fig. 3. Lines 3 and 4 show that the root clock of the program is the clock of input signals A, B and R. Synchronized with these signals, the intermediate signals A_received, B_received and after_R_until_O also belong to the root class. The initial node, as a result, contains the read of the input signals and the assignment to the root clock C_1. Then, for each equation in the program, the corresponding code block of synchronous guarded actions is generated. Finally, these nodes are composed into the DG shown in Fig. 4. Arrows in the figure represent the precedence relation $\rightarrow$. Note that since Adelay and Bdelay are memory signals and the guard expression "C_1==true" implies that Adelay and Bdelay are present (Adelay and Bdelay are in the root clock class), the corresponding nodes for the two Delay equations are omitted and there is no arrow explicitly illustrating the precedence relation between Adelay and AR nor between Bdelay and BR.
3: (| A_received ^= B_received ^= after_R_until_O
4: | A_received ^= A ^= B ^= R
5: | RT := not R when R
6: | A_received := RT default AR
7: | AT := A when A
8: | AR := AT default Adelay
9: | Adelay := A_received $ init false
10: | BT := B when B
11: | B_received := RT default BR
12: | BR := BT default Bdelay
13: | Bdelay := B_received $ init false
Fig. 3 Fragment of ABRO process
Fig. 4 DG corresponding to Signal program in Fig. 3
The result of the task partition is shown in Fig. 5. Nodes of DG are divided into four tasks. Arrows in the figure illustrate the total order among the four tasks: task1 is executed first and task4 is the last one to be executed. Nodes in the same task can be executed in parallel. For instance, in task2 there are three nodes preceded by the node in task1. However, there are no precedence relations among these nodes, so they can be executed in parallel.
After the task partition, the generated tasks are used for the OpenMP code generation, which is presented in Section 5.
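A partition with the properties required in this section can be obtained by levelling an acyclic DG: a node is placed in the first task that comes after the tasks of all its predecessors. The C sketch below is one such procedure under stated assumptions, not necessarily the algorithm used by the authors. The DG is given as an adjacency matrix `edge[i][j]` (node i precedes node j), and `partition_tasks` and `task_of` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* Assign each node of an acyclic DG to a task index. Nodes without
   predecessors go to task 0; any other node goes to the task following
   the latest task of its predecessors. Returns the number of tasks,
   or 0 if n is 0 or a cycle prevents the partition from completing. */
static size_t partition_tasks(size_t n, bool edge[MAX_NODES][MAX_NODES],
                              size_t task_of[MAX_NODES]) {
    assert(n <= MAX_NODES);
    bool done[MAX_NODES] = { false };
    size_t assigned = 0, n_tasks = 0;
    while (assigned < n) {
        bool ready[MAX_NODES] = { false };
        size_t n_ready = 0;
        /* A node is ready when all of its predecessors are in earlier tasks. */
        for (size_t j = 0; j < n; j++) {
            if (done[j]) continue;
            bool ok = true;
            for (size_t i = 0; i < n; i++)
                if (edge[i][j] && !done[i]) { ok = false; break; }
            if (ok) { ready[j] = true; n_ready++; }
        }
        if (n_ready == 0) return 0; /* remaining nodes form a cycle */
        for (size_t j = 0; j < n; j++)
            if (ready[j]) { task_of[j] = n_tasks; done[j] = true; assigned++; }
        n_tasks++;
    }
    return n_tasks;
}
```

Two nodes that end up in the same task cannot be related by a path in the DG, so each task is an anti-chain, and the task indices give the total order in which tasks are executed.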
5 OpenMP based C code generation and case study
OpenMP, an API for shared-memory parallel programming in C/C++ and FORTRAN, provides users with several mechanisms such as compiler directives, programming interfaces and environment variables for the high level description of parallel algorithms. This section introduces the method of mapping the tasks partitioned in Section 4 to OpenMP-based C code.
The basic syntax of directives in OpenMP is shown in
Fig. 6. OpenMP offers several directives. For instance, the
directive "parallel for" is used for the parallelization of "for"
loops; the directive "parallel sections" is used to specify code
blocks that can be executed in parallel. In OpenMP 3.0, the
directive "task" was added to support the parallelization of
irregular data, iteration and recursive calls. Since the actions of
our code blocks are simple computations, we choose the directive
"parallel sections" for the parallelization, whose syntax is shown
in Fig. 7. Each code block to be executed in parallel is enclosed in
its own directive "#pragma omp section".
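As a minimal illustration of two of the directives named above, the following C sketch contrasts "parallel for" with "parallel sections". The function names scale and min_max are hypothetical and chosen only for this example. If the compiler is not asked to enable OpenMP, the pragmas are ignored and the code runs sequentially with the same result.

```c
/* "parallel for": the iterations of one loop are divided among threads. */
void scale(double *v, int n, double k)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        v[i] *= k;
    }
}

/* "parallel sections": each section is an independent code block that is
 * executed once by one thread. Assumption: n >= 1 and min_out != max_out,
 * so the two sections write to different variables. */
void min_max(const int *v, int n, int *min_out, int *max_out)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int m = v[0];
            for (int i = 1; i < n; ++i) {
                if (v[i] < m) m = v[i];
            }
            *min_out = m;
        }
        #pragma omp section
        {
            int m = v[0];
            for (int i = 1; i < n; ++i) {
                if (v[i] > m) m = v[i];
            }
            *max_out = m;
        }
    }
}
```

The second form is the one that fits the generated code: the blocks of a task are few, different from one another, and each is a short computation.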
Moreover, a race condition occurs when multiple threads access the
same shared variable at the same time and at least one of them
writes it, which makes the result of the execution
non-deterministic. Clauses such as private, shared and reduction are
provided to specify the scope and sharing property of variables and
thus handle this problem. The clause private(list) declares that
each thread has its own copy of the variables in the list. The
clause shared(list) declares the list of variables shared among
threads. The clause reduction(operator:list) specifies an operation
on one or more variables. Each thread has duplicates of the
variables in the list
Fig. 5 Task partitions of DG in Fig. 4
#pragma omp directive [clause [clause] ...]
Fig. 6 Syntax of OpenMP directive 
and when all threads finish their executions, the original variables
are updated by combining their duplicates with the specified
operator. However, every signal is defined only once in the source
program, so parallel nodes in the same task share no common
variables. Consequently, race conditions will not appear in the
generated code.
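For readers unfamiliar with these clauses, here is a small hedged C sketch of private, shared and reduction on a loop. It is a generic OpenMP illustration and not part of the generated code; the function name sum_of_squares and its variables are hypothetical.

```c
/* Sum of squares of v[0..n-1].
 * sq    : private, each thread works on its own copy.
 * v, n  : shared, only read by the threads.
 * total : reduction, each thread accumulates into a private duplicate and
 *         the duplicates are added to the original variable at the end. */
long sum_of_squares(const int *v, int n)
{
    long total = 0;
    long sq = 0;
    #pragma omp parallel for private(sq) shared(v, n) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        sq = (long)v[i] * v[i];
        total += sq;
    }
    return total;
}
```

Without the reduction clause, the concurrent updates of total would be a race. The generated code discussed here needs no such clause, because no variable is written by two blocks of the same task.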
We take the form of the infinite loop of elementary iterations
from [14] as the structure of the generated code. Tasks
are generated into the OpenMP structure as the core of
the iteration. Here we only give the method for translating tasks.
Firstly, every node is translated into a C code block. Secondly,
#pragma omp parallel sections [clause[[,] clause] ...]
{
[#pragma omp section]
structured-block
[#pragma omp section
structured-block]
...
}
Fig. 7 Syntax of directive parallel sections 
each translated code block is enclosed in the directive
"#pragma omp section". Then, all blocks belonging to the
same task are enclosed by the directive "#pragma omp parallel
sections". Finally, the sequential order of these blocks is
determined according to the total order specified among
tasks. Note that the initial and terminal nodes of the DG are put
at the front and the rear of the iteration, respectively, and the
initial assignments to the memory signals should be put before the
iteration part.
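The translation steps above can be sketched as a small text generator. This is only an illustration under stated assumptions: the Task structure, the append, emit_task and emit_tasks functions, and the choice to emit a single-block task as plain sequential code are hypothetical and are not the implementation used in the paper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation: a task is a list of C code blocks that
 * have already been translated from DG nodes. */
typedef struct {
    const char **blocks;
    size_t count;
} Task;

/* Append text to the NUL-terminated buffer out of capacity cap.
 * Returns 0 on success, -1 if the text does not fit. */
static int append(char *out, size_t cap, const char *text)
{
    size_t used = strlen(out);
    size_t len = strlen(text);
    if (used + len + 1 > cap) {
        return -1;
    }
    memcpy(out + used, text, len + 1);
    return 0;
}

/* Emit one task: every block goes into its own "section", and a task
 * with several blocks is wrapped in "parallel sections". */
static int emit_task(char *out, size_t cap, const Task *t)
{
    int rc = 0;
    int parallel = t->count > 1;
    if (parallel) rc |= append(out, cap, "#pragma omp parallel sections\n{\n");
    for (size_t i = 0; i < t->count; ++i) {
        if (parallel) rc |= append(out, cap, "#pragma omp section\n");
        rc |= append(out, cap, "{ ");
        rc |= append(out, cap, t->blocks[i]);
        rc |= append(out, cap, " }\n");
    }
    if (parallel) rc |= append(out, cap, "}\n");
    return rc;
}

/* Emit all tasks in their total order. Requires cap >= 1.
 * Returns 0 on success, -1 if the buffer was too small (the buffer
 * content is then incomplete and should be discarded). */
int emit_tasks(char *out, size_t cap, const Task *tasks, size_t n)
{
    int rc = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        rc |= emit_task(out, cap, &tasks[i]);
    }
    return rc;
}
```

Because the tasks are emitted one after another, the implicit barrier at the end of each "parallel sections" region enforces the total order among tasks at run time.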
According to the method given above, the program fragment in
Fig. 3 can be translated to the OpenMP structure shown in
Fig. 8. We can see that sequential code is generated
according to the order of the tasks. Since task2, task3, and
task4 have multiple code blocks, the corresponding OpenMP
directives "#pragma omp parallel sections" are generated for each
of them. Code blocks which can be executed in parallel
are enclosed in the directive "#pragma omp section". Note
that although assignments to the memory signals Adelay and
Bdelay are not shown in Fig. 8, the values of these two signals
can be determined when they appear on the right-hand side of an
assignment statement, as Section 3.2 has indicated.
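To make the role of the memory signals concrete, the following is a hedged, runnable sketch of one elementary iteration for the fragment in Fig. 3, written in the style of Fig. 8. The names AbroState and abro_step, the encoding of each input as a boolean that is true exactly when the event is present, and the reset of the clock flags C_2, C_3 and C_4 at the start of every iteration are assumptions made for this illustration; they are not taken from the paper's generator. Adelay and Bdelay are never assigned: they are simply read from the values stored by the previous iteration.

```c
#include <stdbool.h>

/* Memory signals kept between iterations (initial value: false). */
typedef struct {
    bool A_received;
    bool B_received;
} AbroState;

/* One elementary iteration (assumed encoding: a, b, r are true exactly
 * when the corresponding input event is present). */
void abro_step(AbroState *s, bool a, bool b, bool r)
{
    /* task1, plus the assumed per-iteration reset of the clock flags */
    bool C_1 = true, C_2 = false, C_3 = false, C_4 = false;
    bool AT = false, BT = false, RT = false, AR = false, BR = false;
    /* Delayed signals: the values stored by the previous iteration. */
    bool Adelay = s->A_received, Bdelay = s->B_received;

    #pragma omp parallel sections /* task2 */
    {
        #pragma omp section
        { if (C_1 && a) { AT = a; C_3 = true; } }
        #pragma omp section
        { if (C_1 && b) { BT = b; C_4 = true; } }
        #pragma omp section
        { if (C_1 && r) { RT = !r; C_2 = true; } }
    }
    #pragma omp parallel sections /* task3 */
    {
        #pragma omp section
        { if (C_3) { AR = AT; } if (!C_3 && C_1) { AR = Adelay; } }
        #pragma omp section
        { if (C_4) { BR = BT; } if (!C_4 && C_1) { BR = Bdelay; } }
    }
    #pragma omp parallel sections /* task4 */
    {
        #pragma omp section
        { if (C_2) { s->A_received = RT; }
          if (!C_2 && C_1) { s->A_received = AR; } }
        #pragma omp section
        { if (C_2) { s->B_received = RT; }
          if (!C_2 && C_4) { s->B_received = BR; } }
    }
}
```

Starting from a state with both fields false, calling abro_step with only A present sets A_received, a following call with only B present sets B_received while A_received is kept through Adelay, and a call with R present resets both. Within each region, every section writes a different variable, so no data-sharing clause is needed.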
6 Conclusion and future works 
This paper presented a methodology for transforming endochronous
Signal programs (using the core syntax) into OpenMP-based C code.
First, the translation of Signal core primitives to code blocks of
synchronous guarded actions was described. Then, the formal
definition of the DG was presented, which is used to explore the
implicit concurrency. From the DG, the definition of a task was
given: the nodes of the DG are partitioned into tasks, tasks are
executed in sequence, and the nodes within each task can be executed
in parallel. Finally, the method for translating tasks into
OpenMP-based C code was introduced. Using this approach, the
generated program can run on multi-core processors, increasing the
utilization of computation resources. Moreover, since the generation
target OpenMP is a multi-platform standard, few modifications are
needed for multi-platform execution.
However, several improvements can be accomplished from 
read(A); read(B); read(R);
C_1 = true;
#pragma omp parallel sections
{
    #pragma omp section
    { if (C_1 == true && A == true) { AT = A; C_3 = true; } }
    #pragma omp section
    { if (C_1 == true && B == true) { BT = B; C_4 = true; } }
    #pragma omp section
    { if (C_1 == true && R == true) { RT = !R; C_2 = true; } }
}
#pragma omp parallel sections
{
    #pragma omp section
    {
        if (C_3 == true) { AR = AT; }
        if (C_3 == false && C_1 == true) { AR = Adelay; }
    }
    #pragma omp section
    {
        if (C_4 == true) { BR = BT; }
        if (C_4 == false && C_1 == true) { BR = Bdelay; }
    }
}
#pragma omp parallel sections
{
    #pragma omp section
    {
        if (C_2 == true) { A_received = RT; }
        if (C_2 == false && C_1 == true) { A_received = AR; }
    }
    #pragma omp section
    {
        if (C_2 == true) { B_received = RT; }
        if (C_2 == false && C_4 == true) { B_received = BR; }
    }
}
Fig. 8 OpenMP-based C code corresponding to the Signal program
fragment in Fig. 3
the current study. For instance, the method proposed in this
paper does not allow parallel execution among tasks,
which may restrict the possibility of generating more efficient
code. Another problem is that the methodology does
not support the transformation of weakly endochronous Signal
programs, which limits the practicality of the study. In
future work, we will study how to check weak endochrony
and generate deterministic code from weakly endochronous
programs. Moreover, Signal provides arrays of processes to
handle data arrays, which are suitable for parallel execution.
Generating better OpenMP code for these features is also one
of our objectives.
Acknowledgements This work was supported by the National Natural
Science Foundation of China (Grant Nos. 61073013 and 61003017) and the
Aviation Science Foundation of China (2012ZC51025). Grateful
acknowledgement is made to Mr. Mamoun FILALI-AMINE and Prof. Jean-Paul
BODEVEIX from IRIT-CNRS and Prof. Paul Le Guernic from INRIA, who
gave a lot of instructive advice on this paper.
References 
1. IEEE POSIX standardization authority.
http://standards.ieee.org/regauth/posix/
2. Microsoft windows threads. http://msdn.microsoft.com/
3. MPI: A message-passing interface standard version 3.0.
http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
4. The OpenMP API specification for parallel programming.
http://openmp.org/wp/
5. Intel thread building blocks. http://www.threadingbuildingblocks.org/
6. Lee E A. The problem with threads. Computer, 2006, 39(5): 33-42
7. Benveniste A, Berry G. The synchronous approach to reactive and
real-time systems. Proceedings of the IEEE, 1991, 79(9): 1270-1282
8. Benveniste A, Caspi P, Edwards S A, Halbwachs N, Le Guernic P, De
Simone R. The synchronous languages 12 years later. Proceedings of
the IEEE, 2003, 91(1): 64-83
9. Berry G, Gonthier G. The Esterel synchronous programming language:
design, semantics, implementation. Science of Computer Programming,
1992, 19(2): 87-152
10. Halbwachs N, Caspi P, Raymond P, Pilaud D. The synchronous data
flow programming language Lustre. Proceedings of the IEEE, 1991,
79(9): 1305-1320
11. Le Guernic P, Gautier T, Le Borgne M, Le Maire C. Programming
real-time applications with Signal. Proceedings of the IEEE, 1991,
79(9): 1321-1336
12. Le Guernic P, Talpin J P, Le Lann J C. Polychrony for system design.
Journal of Circuits, Systems, and Computers, 2003, 12(3): 261-303
13. Potop-Butucaru D, Caillaud B, Benveniste A. Concurrency in
synchronous systems. Formal Methods in System Design, 2006, 28(2):
111-130
14. Besnard L, Gautier T, Talpin J P. Code generation strategies in the
polychrony environment.
http://hal.inria.fr/docs/00/37/24/12/PDF/RR-6894.pdf
15. Besnard L, Gautier T, Le Guernic P, Talpin J P. Compilation of
polychronous data flow equations. In: Synthesis of Embedded Software,
1-40. Springer, 2010
16. Talpin J P, Ouy J, Gautier T, Besnard L, Le Guernic P. Compositional
design of isochronous systems. Science of Computer Programming,
2012, 77(2): 113-128
17. Benveniste A, Caillaud B, Le Guernic P. Compositionality in dataflow
synchronous languages: specification and distributed code generation.
Information and Computation, 2000, 163(1): 125-171
18. Jose B A, Shukla S K, Patel H D, Talpin J P. On the deterministic
multi-threaded software synthesis from polychronous specifications. In:
Proceedings of the 6th ACM/IEEE International Conference on Formal
Methods and Models for Co-Design. 2008, 129-138
19. Maffeïs O, Le Guernic P. Combining dependability with architectural
adaptability by means of the Signal language. In: Static Analysis,
99-110. Springer, 1993
20. Potop-Butucaru D, Sorel Y, De Simone R, Talpin J P. From concurrent
multi-clock programs to deterministic asynchronous implementations.
Fundamenta Informaticae, 2011, 108(1): 91-118
21. Papailiopoulou V, Potop-Butucaru D, Sorel Y, De Simone R, Besnard
L, Talpin J. From design-time concurrency to effective implementation
parallelism: The multi-clock reactive case. In: Proceedings of the 2011
Electronic System Level Synthesis Conference. 2011, 1-6
22. Baudisch D, Brandt J, Schneider K. Multithreaded code from
synchronous programs: extracting independent threads for OpenMP. In:
Proceedings of the 2010 Conference on Design, Automation, and Test
in Europe. 2010, 949-952
23. Besnard L, Gautier T, Le Guernic P. Signal v4-Inria version: reference
manual, 2008
24. Gamatie A. Designing embedded systems with the Signal programming
language. Springer, 2010
25. RT-builder, geensys. http://www.geensys.com/
26. Polychrony. http://www.irisa.fr/espresso/Polychrony/
Kai Hu is an associate professor at Beihang University, China. He
received his PhD degree from Beihang University in 2001. From 2001
to 2004, he did post-doctoral research at Nanyang Technological
University, Singapore. Since 2004, he has been the leader of the
LDMC team in the Institute of Computer Architecture (ICA), Beihang
University. His research interests concern embedded real-time
systems and high performance computing. He cooperates closely with
IRIT and INRIA in France on the study of AADL and synchronous
languages.
Teng Zhang received his BE in computer science and engineering from
Beihang University in 2011. He is now a master's degree student at
the same university. His research interests include synchronous
languages, modeling of embedded systems and formal methods.
Zhibin Yang received his PhD degree from Beihang University, China,
in February 2012. Since April 2012, he has been a postdoc in the
IRIT research laboratory of the University of Toulouse, France. His
research interests include safety-critical real-time systems, formal
verification, AADL, and synchronous languages.
