Separate compilation of hierarchical real-time programs into linear-bounded Embedded Machine code  by Ghosal, Arkadeb et al.
Science of Computer Programming 77 (2012) 96–112
Separate compilation of hierarchical real-time programs into
linear-bounded Embedded Machine code✩
Arkadeb Ghosal a, Daniel Iercan b,∗, Christoph M. Kirsch c, Thomas A. Henzinger d,
Alberto Sangiovanni-Vincentelli a
a University of California, Berkeley, CA, USA
b “Politehnica” University of Timisoara, Romania
c University of Salzburg, Austria
d Institute of Science and Technology (IST), Austria
Article info
Article history:
Received 14 April 2008
Received in revised form 18 November 2009
Accepted 1 June 2010
Available online 16 June 2010
Keywords:
Real-time
Coordination language
Hierarchy
Compiler
Virtual machine
Abstract
Hierarchical Timing Language (HTL) is a coordination language for distributed, hard real-time
applications. HTL is a hierarchical extension of Giotto and, like its predecessor, is
based on the logical execution time (LET) paradigm of real-time programming. Giotto is
compiled into code for a virtual machine, called the Embedded Machine (or E machine).
If HTL is targeted to the E machine, then the hierarchical program structure needs to be
flattened; the flattening makes separate compilation difficult, and may result in E machine
code of exponential size. In this paper, we propose a generalization of the E machine,
which supports a hierarchical program structure at runtime through real-time trigger
mechanisms that are arranged in a tree. We present the generalized E machine, and a
modular compiler for HTL that generates code of linear size. The compiler may generate
code for any part of a given HTL program separately in any order.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
With the rapid increase in complexity of distributed embedded controllers (e.g., automotive drive-by-wire controllers,
avionics fly-by-wire controllers, intelligent building operators, etc.), efficient specification, verification, and synthesis of such
systems are of utmost importance. Platform-based design [1] is one of the proposed approaches to handle such complexity
by separating functional specification (capturing what the controller is supposed to do) from architectural specification
(capturing the resources available to implement the controller). Given the two specifications, the designer checks whether a
valid implementation of the functional specification over the given architecture is possible or not; validity includes semantic
preservation as well as maintaining system properties like latency, reliability, and cost. Such design models are increasingly
becoming critical for system design where sub-systems (e.g., automotive controllers) are supplied by different parties.
Conceptualization of platform-based design can be observed in industry in specification models (e.g., AUTOSAR [2], which
restricts interface definition of functions), and communication models (e.g., FlexRay [3], which assumes a global notion of
time).
An effective modeling of the separation-of-concerns approach is the logical execution time (LET) [4] model. In the LET
model, a task is specified by release and termination events, which are independent of the underlying platform. The LET
model restricts the reading of inputs to release events, and the writing of outputs to termination events. An implementation of
the task is valid if the execution can be completed within the time window specified by the release and termination events.
Compared with traditional approaches, where task output is available as soon as the execution is complete [5], the LET model
provides portability as well as efficiency in analysis. In contrast to the synchrony paradigm (e.g., [6,7]), which assumes that
the underlying platform is much faster than the response time of tasks, the LET model provides restricted expressiveness,
and thus requires less formal checking (e.g., absence of zero-time cycles).
✩ Supported by the EU ArtistDesign Network of Excellence on Embedded Systems Design and the Austrian Science Fund P18913-N15.
∗ Corresponding author.
E-mail addresses: arkadeb@gmail.com (A. Ghosal), daniel.iercan@aut.upt.ro (D. Iercan), ck@cs.uni-salzburg.at (C.M. Kirsch), tah@ist.ac.at
(T.A. Henzinger), alberto@eecs.berkeley.edu (A. Sangiovanni-Vincentelli).
doi:10.1016/j.scico.2010.06.004
The hierarchical timing language (HTL) is a hierarchical coordination language for distributed, hard real-time
applications [8,9] based on the LET model. HTL programs determine portable and predictable real-time behavior of periodic
software tasks running on a distributed system of host computers. An HTL program specifies task-to-host mappings, task
frequencies and dependencies, mode switching, and I/O times; HTL programs do not specify task implementations, which
are assumed to be done in some general purpose language such as C or Java. A task in HTL is essentially sequential code
that reads inputs, computes, and writes outputs. HTL offers two fully hierarchical programming constructs: (sequential,
conditional, parallel) composition of tasks as well as refinement of abstract tasks into concrete tasks. An abstract task has
a frequency, specific I/O times and dependencies, and a worst-case execution time (WCET), but no implementation. An
abstract task is a temporally conservative placeholder for a concrete task with an implementation. A concrete task that
refines an abstract task should have the same frequency as the abstract task, at least as much time to compute, i.e., possibly
relaxed I/O times and dependencies, and a WCET no larger than that of the abstract task. The result is that a concrete
HTL program is time-safe (schedulable) if it refines a time-safe abstract HTL program [8]. In general, checking refinement
in HTL is exponentially faster than checking time safety (schedulability). Thus, the HTL methodology advocates designing,
when possible, hierarchical programs with abstract tasks, which are checked for time safety, and refined into more detailed,
concrete tasks.
After checking refinement and time safety, HTL programs are compiled into E code for the Embedded Machine [10]
or E machine. E code is virtual machine code with specific instructions for timing I/O activity, native task computation,
and host-to-host communication. Time-safe E code is portable and predictable, and therefore provides a hardware- and
OS-independent target abstraction for compiling possibly distributed real-time programs. Further compiling E code into
native code is possible but has so far not been necessary even for high-performance applications such as helicopter flight
control [11,12]. However, E code was originally designed as a target for compiling non-hierarchical programs written in
Giotto [4], which is the predecessor of HTL. As a consequence, HTL compilation into conventional E code involves flattening
of the HTL programs, and may therefore result in exponentially larger E code programs. Flattening also prohibits compiling
parts of large HTL programs separately. Since the original HTL compiler [8] has to flatten an HTL description before compiling
it into E code, we will refer to this compiler as the flattening compiler in the rest of the paper.
In this paper, we propose to extend E code by adding instructions for maintaining hierarchical program structure at
runtime to enable separate compilation of (parts of) HTL programs into E code programs, whose size is linear bounded with
respect to the size of the HTL code. Our solution trades off runtime performance for compile-time convenience, and E code
size, because execution of E code compiled from flattened HTL programs may result in lower runtime overhead than the
execution of such hierarchical E code, or HE code. All original and most new instructions can be executed in constant time.
However, one new instruction that involves traversing hierarchical structure requires linear time with respect to the size
of the original HTL program. This may only be avoided by flattening HTL programs prior to compilation. For simplicity
and clarity, we have chosen to define the new instructions in RISC style, where most instructions have rather simple,
“atomic” semantics. So far, runtime performance has not been an issue but may easily be improved using CISC-style macro
instructions. Since the new HTL compiler preserves hierarchical structure of an HTL description at compile time, we will
refer to this compiler as the hierarchy-preserving compiler in the rest of the paper.
Code generation for languages based on the LET model. Other than Giotto, real-time languages like TDL [13] and timed
multitasking (TM) [14] are also based on the LET model. Among these languages, TM, an actor-based language, can express
hierarchy, i.e., an actor can contain another actor; however, code generation for TM programs does not explicitly address the
hierarchical structure. The main difference between the HTL compiler presented in this paper, and the compilers developed
for the other languages based on LET, is that the HTL compiler explicitly addresses the hierarchical structure while the others
do not.
Code generation for languages based on the synchrony model. An advantage of HTL over existing synchronous
languages is the support for refinement of tasks into task groups with precedences while maintaining the LET model. Lustre
addresses hierarchical structure through the Simulink-to-SCADE/Lustre-to-TTA [15] tool chain; the tool accepts discrete
time models written in Simulink, translates the models to Lustre models, verifies system properties (e.g., schedulability),
and generates code for a target time-triggered architecture. Nevertheless, when a Simulink schema is translated into
Lustre models, the hierarchical structure is flattened; thus the Lustre compiler does not address hierarchical structure at
compile time. Taxys [16] is a tool chain that combines Esterel with the model checker Kronos, and generates an
application-specific scheduler that ensures timing commitment of tasks; Taxys does not address hierarchical structure at compile time.
Another difference is that the HTL compiler generates code for a virtual machine, which makes the generated code portable
across implementations, while the above tool chains generate code for specific targets. Recent work (e.g., [17,18]) addresses
modularity in synchronous languages; however, as discussed earlier, the syntactic/semantic constraints of HTL are tighter,
which simplifies semantic verification and code generation.
Fig. 1. Overview of the three-tank system.
Fig. 2. Overview of the implementation.
Overview. Throughout the paper we use a real-time controller for a three-tank system (3TS) to illustrate the
contributions. There are three tanks T1, T2, and T3 (Fig. 1) each with an evacuation tap; the taps can be controlled to trigger
perturbations in the system. The tank T3 is connected to the tanks T1 and T2; the interconnections between the tanks are
controlled by the taps. Two pumps, P1 and P2, feed water into the tanks T1 and T2, respectively. The goal of the controller
is to maintain the level of water in the tanks T1 and T2 in the presence and absence of perturbations. If there is no
perturbation, a P (proportional) controller is used; under perturbations, a PI (proportional integral) controller is used [19].
The implementation (Fig. 2) of the controller is distributed over three E machines — two of them regulate the pumps
feeding the tanks T1 and T2; and the third one implements the communication across the E machines, and the plant.
The compile-time support for separate HTL compilation into HE code is provided by extending an existing HTL compiler
implementation [8]. The implementation of runtime support for HE code is developed by extending an existing E machine
implementation [10]. Each E machine is implemented in C on a Unix machine. The tasks are implemented in C. The schedulers
in the Unix machines are used for scheduling the released tasks. The E machines communicate with each other through UDP.
Communication with the system (e.g., reading levels of water in tanks, and sending fill debit commands to pumps) is done via
a TCP server implemented on a Windows 98 machine. The Windows 98 machine communicates with the 3TS plant through
a data acquisition card, namely, DAC 98. Refer to [20] for implementation details and demo.
Section 2 presents the major constructs of HTL. Section 3 discusses the design of HE code, Section 4 presents an overview
of compile-time support for HTL programs in HE code, and Section 5 presents the compilation of HTL programs into HE code.
Section 6 analytically compares the complexity of the HTL compiler proposed in this paper with that of the original HTL compiler [8];
Section 7 makes the comparison by generating code for prototype HTL programs.
2. Hierarchical timing language
The coordination language HTL is centered around three core models: the computation model, the communication model,
and the composition model. The computation model specifies the interface of tasks, the communication model specifies
the communication between task interfaces, and the composition model specifies the (sequential, parallel, conditional, and
hierarchical) interaction between tasks.
The computation model in HTL is the logical execution time (LET) task model. A LET task is a block of sequential code with no
internal synchronization point. Each task has a release event and a termination event specified by clock ticks or completion
events of other tasks. The task reads the inputs at the release event (even if the task starts executing later), and the task
updates the outputs at the termination event (even if the task terminates earlier). The LET model decouples the time when
the inputs are read, and outputs are written from actual execution, which makes the model time- and value-deterministic,
portable and composable [10].
The communication model of HTL is based on the notion of communicator [8]; a communicator is a typed variable that
can be accessed (read from or written to) with a specified periodicity. Communicators are used to exchange data with the
environment (sensors and actuators are special cases of communicators) or between tasks. A task in HTL reads from certain
instances of some communicators, computes a function, and writes to certain instances of other communicators. The latest
read instance, and the earliest write instance implicitly specify the available execution window (or the LET) for the task.
Fig. 3. HTL program for the three-tank controller (HTL keywords are shown in italics).
The composition model specifies the model of interaction between tasks. A group of tasks communicating over a set
of communicators form a sequential composition or an HTL mode. A mode can switch to a different mode depending on
specific switching scenarios; a network of modes connected with each other through conditional switches is a conditional
composition (of tasks) or an HTL module. A group of modules composed in parallel form an HTL program, and allows parallel
execution of tasks (in different modules). The hierarchical composition is specified through refinements — any mode in an
HTL program can be refined by another HTL program; refinement allows selective replacement of tasks (which are higher
in the refinement) by other tasks (which are lower in the refinement) during execution. Rules on refinement ensure that
timing properties of the (refined) higher level programs are maintained by the (refining) lower level programs [9].
2.1. HTL code for the three-tank controller
An HTL program, P_3TS, for the three-tank controller is shown in Fig. 3. Code snippets related to the major constructs
have been shown; refer to [20] for the complete code. The program consists of three modules (T1, T2, and IO), and several
communicators (h1, u1, etc.). Each communicator has a type, a period, and an initial value. For example, the communicator h1
(type double, period 100 ms, and initial value 0) denotes the height of water in tank T1, while the communicator u1 (type
double, period 100 ms, and initial value 0) denotes the required motor current (for pump P1) computed by the controller.
The module T1 implements controller for P1, the module T2 implements controller for P2, and the module IO implements
the communication between the tanks. Each module is a network of modes with one being the start mode. For example,
module T1 has one mode m_T1, which is also the start mode. A module restricts the tasks that can be invoked in constituent
modes by specifying a set of task declarations, i.e., the modes in a module can only invoke tasks declared in the module.
A task declaration in a module defines the type of input/output interface for the task — a task invocation in a mode binds
the input/output interface of the task invocation to specific communicators; the type of the communicators must match the
declared type of the input/output interface. Each mode has an associated period which denotes the invocation rate of the
tasks invoked in the mode. The module T1 declares a task t_T1, which computes the current for pump P1: the task reads
an input variable of type double, and updates an output variable of type double. In the mode m_T1, the task is invoked
— the invocation of task t_T1 reads from the third instance of the communicator h1, and updates the fourth instance of
the communicator u1. Thus the task reads the current height of water in the tank T1, and computes the required current
to be applied to pump P1. The period of the mode being 500 ms, and the periods of the communicators being 100 ms, the
logical execution window for the invocation of task t_T1 is between 300 and 400 ms. The structure of the module T2 is very
similar to that of the module T1. The module T2 consists of one mode m_T2, and declares a task t_T2; the mode invokes the
task to compute the required value of motor current (represented by the communicator u2) for pump P2 from the height
(represented by the communicator h2) of water in tank T2. The module IO also has one mode readWrite, which invokes
tasks to communicate between controller, tanks and pumps. The task invocations in the mode readWrite are discussed in
Section 2.3.
Fig. 4. HTL refinements for the three-tank controller.
2.2. Refinement in HTL
A mode (referred to as parent mode) can be refined by another HTL program (referred to as refinement program); any
mode in the refinement program is a child mode of the parent mode. The mode m_T1 is refined by the program P_T1 (Fig. 4),
which has a single module with two modes m_T1_P and m_T1_PI. The modes m_T1_P and m_T1_PI are child modes for
parent mode m_T1. Each task (referred to as child task) in a child mode maps to a unique task (referred to as parent task) in
the parent mode. The modes m_T1, m_T1_P, and m_T1_PI invoke tasks t_T1, t_T1_P, and t_T1_PI, respectively, with
t_T1 being the parent of the other two tasks. During execution, the parent task is replaced by the child task, i.e., instead
of t_T1, either t_T1_P or t_T1_PI executes. The declaration for a parent task does not include the name of the function
that implements the computation of the task; however, the declaration for a child task includes the function name. For
example, the declaration of the parent task t_T1 does not include a function name, while the declaration of t_T1_P includes
the function name f_T1_P. The task t_T1_PI is a child task but is also a parent to tasks in refinement program P_T1_2PI;
hence the corresponding task declaration has no associated function. The function is not defined in the HTL program, and is
expressed in a different programming language (e.g., C); the function body is linked at runtime. While the task t_T1 represents
a control task for pump P1, the tasks t_T1_P and t_T1_PI are the proportional (P) and proportional integral (PI) versions
of the control algorithm, respectively. In other words, a parent task is an abstract specification, while the child tasks are
concrete implementations of the specification. Instead of specifying a functional behavior, a parent task specifies the timing
behavior of the concrete task, e.g., parent mode and child mode have identical periods, a child task cannot be released
(resp. terminated) later (resp. earlier) than the parent task, and the WCET of the child task is bounded by that of the parent
task; these constraints are referred to as refinement constraints. There can be tasks (in parent mode) that are not parent to
any child task — these tasks will execute in parallel with the tasks in child mode.
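The refinement constraints listed above can be illustrated with a minimal sketch. The names `TaskTiming` and `refines` are hypothetical and are not part of HTL or its compiler, and the WCET values in the usage note are invented for illustration; only the four checks (equal periods, release not later, termination not earlier, WCET not larger) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTiming:
    """Hypothetical timing summary of one task invocation (times in ms)."""
    period_ms: int        # period of the mode that invokes the task
    release_ms: int       # release event (latest read time)
    termination_ms: int   # termination event (earliest write time)
    wcet_ms: float        # worst-case execution time


def refines(child: TaskTiming, parent: TaskTiming) -> bool:
    """Check the refinement constraints between a child and its parent task."""
    return (
        child.period_ms == parent.period_ms                 # identical mode periods
        and child.release_ms <= parent.release_ms           # not released later
        and child.termination_ms >= parent.termination_ms   # not terminated earlier
        and child.wcet_ms <= parent.wcet_ms                 # WCET bounded by the parent
    )
```

For example, with an assumed parent `TaskTiming(500, 300, 400, 50.0)` for t_T1, a child with the same window and a WCET of 40 ms satisfies the constraints, while a child with a WCET of 60 ms or a release at 350 ms does not.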
Mode refinement does not add expressiveness; an HTL program with multiple levels of refinement can be translated
into an equivalent flat program without refinement. The mode m_T1 can be replaced by the switching modes m_T1_P
and m_T1_PI. However, mode refinement helps in a structured and concise specification. In the top-level, mode m_T1
invokes control task for pump P1; however no distinction is made for different scenarios. In the second level (program
P_T1) the distinction is made between absence and presence of perturbations, thus requiring the use of P and PI control
tasks. There can be subsequent refinement (e.g., P_T1_2PI) that distinguishes slower and faster invocations of PI control
task for pump P1. Refinement helps in succinct expression of choice (a task is parent to several children tasks in different
sequential modes), change (the IO of a child task may differ from that of the parent), space (empty parent tasks that can
be refined later), and replacement (replacing a refinement program with another). Mode refinement helps in conservatively
simplifying program analysis: e.g., the schedulability check need only be done for the top-level program, and the refinement
constraints preserve [8] the schedulability across the hierarchy. Fig. 5 shows a schematic view of the hierarchical structure
of the program.
Fig. 5. HTL program for the three-tank controller.
Fig. 6. Interaction between tasks and communicators for the three-tank controller.
2.3. Inter-task communication in HTL
Tasks communicate via communicators. A communicator can be accessed (read from or written to) at the periodicity
stated by the communicator definition. To avoid non-determinism at a point of access, only one task can write to a
communicator; additionally, a communicator update must precede reading from the communicator. The first constraint is
preserved by syntactic checks on an HTL program; the second constraint is ensured by runtime code. For example, the task
t_T1 reads the third instance of the communicator h1, and writes to the fourth instance of the communicator u1. No other
task can write to that instance of the communicator u1; also any task reading that instance of the communicator must do so
after the communicator has been updated. The latest read and the earliest write time implicitly specify the LET of the task;
in the case of the task t_T1, the LET is from 300 to 400 ms. Tasks within a module can communicate via the communicators
defined in the program containing the module. Tasks in a refinement program can communicate via communicators defined
in the refinement program and any ancestor program.
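The rule that the latest read and the earliest write determine the LET can be sketched as a small computation. The function `let_window` is a hypothetical helper, and the sketch assumes that instance k of a communicator with period p is accessed k·p milliseconds after the start of the mode period, which matches the 300 ms and 400 ms figures quoted for t_T1.

```python
from typing import Iterable, Tuple

Access = Tuple[int, int]  # (communicator instance, communicator period in ms)


def let_window(reads: Iterable[Access], writes: Iterable[Access]) -> Tuple[int, int]:
    """Return (release, termination) in ms relative to the start of the mode period."""
    release = max(instance * period for instance, period in reads)        # latest read
    termination = min(instance * period for instance, period in writes)  # earliest write
    if release >= termination:
        raise ValueError("empty LET window: latest read is not before earliest write")
    return release, termination


# t_T1 reads instance 3 of h1 and writes instance 4 of u1, both with period 100 ms.
window = let_window(reads=[(3, 100)], writes=[(4, 100)])
```

With these inputs the window is (300, 400), the LET stated for t_T1.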
Tasks can also communicate with one another through untimed variables referred to as ports. Fig. 6 shows two tasks
t_filterH and t_estimateV1 (in mode readWrite) communicating via port local_h1. The task t_filterH reads
the communicator h1 (at 300 ms), and writes to the communicator h1f and to the port local_h1. The task t_estimateV1
reads the port local_h1 and the communicator u1, and writes to the communicator h1c. The task t_filterH reads the
height of water in the tank T1, and computes the actual height (represented by communicator h1f) by filtering out noise
due to ripples; the task t_estimateV1 computes the expected height (represented by communicator h1c) of water in
the tank based on the mathematical model of the controlled plant in the absence of perturbation. The difference in the
values of the communicators h1f and h1c denotes the perturbation in the tank, and hence provides the switching condition
between modes m_T1_P and m_T1_PI. Both the tasks write to communicators at the end of the period; hence the task
t_estimateV1 cannot access the result from the other task t_filterH until the end of the period. This may result in a
delay of one mode period (500 ms), and may not be tolerable. The internal communication between tasks within a period
is served by ports to avoid unnecessary delays — through the local port local_h1, the task t_estimateV1 can access
the output of the task t_filterH as soon as the task t_filterH completes execution; the task t_estimateV1 does not
need to wait until the end of the LET for the task, which is at 500 ms. The use of ports provides a hybrid LET model where a task's
result can be read at a point within the LET window of the task to avoid any delay. Ports are local to modules, i.e., the tasks
declared in a module can only use the ports of that module for communication; the ports cannot be used for communication
outside the module.
Fig. 7. Mode switch for HTL programs (active modes are shown in bold font).
2.4. Mode switching in HTL
HTL allows mode switching (at the end of mode periods) to model changes in control laws. In the complete specification
of the program P_3TS there are three modules with one mode each: m_T1, readWrite, and m_T2. The mode m_T1 is
further refined by program P_T1 with two modes m_T1_P and m_T1_PI. The mode switching happens based on a switch
condition defined in each of the modes. The switch condition is checked at the end of mode period, which is 500 ms in
the example. The mode m_T1_PI is further refined by program P_T1_2PI with two modes, m_T1_PI_S and m_T1_PI_F,
switching between themselves. In HTL, switches for a parent mode and its child modes are enabled simultaneously due to
constraints on timing behavior. The HTL semantics prioritizes the mode switch check (and subsequent action) of the parent
mode over those of the children. Consider an instance when the modes m_T1, m_T1_PI, and m_T1_PI_F are active (Fig. 7).
The mode m_T1 has no mode switches, i.e., it is invoked repeatedly. Depending upon the mode switches in the other two
modes there are three possible scenarios at the end of each mode period:
• switch possibility 1: the switch conditions for both the modes m_T1_PI and m_T1_PI_F are false; there is no mode switch,
and the three modes remain active.
• switch possibility 2: the switch condition for the mode m_T1_PI_F is true but the switch condition for the mode m_T1_PI is
false; the mode m_T1_PI_F switches to the mode m_T1_PI_S, while the modes m_T1_PI and m_T1 remain active.
• switch possibility 3: the switch condition for the mode m_T1_PI is true; the mode m_T1_PI switches to the mode m_T1_P,
the mode m_T1 remains active, and the mode m_T1_PI_F is terminated; the switch condition for the mode m_T1_PI_F
does not matter in the transition.
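The three possibilities follow from giving the parent's switch check priority over those of its children. A minimal sketch, assuming the active modes are represented as a chain from the top level down and each enabled switch as an entry in a dictionary; `resolve_switches` is a hypothetical name, and activating the start mode of a refinement of the switch target is deliberately left out.

```python
from typing import Dict, List


def resolve_switches(active_chain: List[str], switch_target: Dict[str, str]) -> List[str]:
    """Apply parent-first mode switching at the end of a mode period.

    active_chain lists the active modes from the top level down.
    switch_target maps a mode to its target mode when its switch condition is true;
    modes whose switch condition is false are simply absent from the mapping.
    """
    for depth, mode in enumerate(active_chain):
        target = switch_target.get(mode)
        if target is not None:
            # The highest enabled switch wins; all modes below it are terminated.
            return active_chain[:depth] + [target]
    return list(active_chain)  # no switch condition is true


chain = ["m_T1", "m_T1_PI", "m_T1_PI_F"]
case1 = resolve_switches(chain, {})
case2 = resolve_switches(chain, {"m_T1_PI_F": "m_T1_PI_S"})
case3 = resolve_switches(chain, {"m_T1_PI": "m_T1_P", "m_T1_PI_F": "m_T1_PI_S"})
```

Here `case1`, `case2`, and `case3` correspond to switch possibilities 1, 2, and 3; in the third case the entry for m_T1_PI_F is ignored because the switch of its parent is taken first.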
2.5. Distribution in HTL
HTL modules can be distributed over several hosts. Distribution is specified through a mapping of top-level modules
to hosts. All refinements of all modes in a top-level module are bound to the same host to which the module is mapped.
The distribution is implemented by replicating shared communicators on all hosts, and then having the tasks that write to
the shared communicators broadcast the outputs. For this purpose, the LET model is extended to include both the WCETs
and the worst-case output transmission times (WCTTs). The semantics (i.e., the real-time behavior) of an HTL program is
independent of the number of hosts; however, code generation and program analysis take the distribution into account. For
the three-tank controller, the three modules T1, T2, and IO are implemented on three different hosts or E machines. The
address of the host machine is included in the module description; e.g., module T1 is implemented on a host with address
192.168.1.1.
3. The Embedded machine
The Embedded machine or E machine controls the release of tasks, and the time when variable values are exchanged (i.e.,
copied or initialized). The variables are accessed through drivers [10]. A task or a driver is implemented in any other language,
e.g., C. The E machine maintains a queue of triggers to order the updating of variables, and the release of tasks. A trigger
remembers the action (exchange of variable values or release of tasks) to be taken at an event. An event is either progress
in time (time event) or completion of a task (completion event) or a combination of both. When the event (associated with
a trigger) occurs (i.e., trigger is enabled), the action (associated with the trigger) is executed.
Fig. 8. Structure of the hierarchy-extended E machine.
The original E machine [10] has been extended to support the hierarchical structure of HTL at runtime; the modified
E machine is referred to as the hierarchy-extended E machine. The code for the hierarchy-extended E machine is referred
to as Hierarchical E code (HE code). Fig. 8 depicts the schematic structure of the hierarchy-extended E machine. The
hierarchy-extended E machine maintains three FIFO queues of triggers. If multiple triggers in a queue are enabled at an
event, then corresponding actions are executed in FIFO order. While one FIFO queue orders the actions of simultaneously
enabled triggers, the parallel FIFO queues provide a second ordering on simultaneously enabled triggers. In the case of code
generated for HTL programs, the multiple queues are used to order the communicator updates, the mode switch checks,
the communicator reads, and the task releases. Each trigger also maintains a parent trigger, and a set of children triggers to
remember the position in the hierarchy. The hierarchy-extended E machine has two LIFO stacks, and four registers to track
hierarchy. One stack remembers the position of the code being executed in the hierarchy of the whole program, and the
other is used to add parent and children information to newly created triggers. The registers store references to triggers.
An HE program is a sequence of HE code instructions possibly generated by compiling high-level descriptions like HTL
programs. The execution of an HE program generates a set of traces. A trace is a (possibly infinite) sequence of configurations
where each configuration is a tuple (state, writeQ, switchQ, readQ, tasks, PC, R0, R1, R2, R3, parentStack, addressStack):
state is the state of the variables (communicators and ports for an HTL program); writeQ, switchQ, and readQ are FIFO queues
of triggers; tasks is a set of released tasks; PC is a program counter tracking the address of the current instruction being
executed; R0, R1, R2, and R3 are registers that store trigger names; parentStack is a stack of trigger names; and addressStack
is a stack of addresses. For any two consecutive configurations u_{i−1}, u_i where i > 0, u_i is the result of a time event,
a completion event, or the execution of an HE code instruction at configuration u_{i−1}. The execution starts with a default value of
each variable, empty trigger queues, empty task sets, empty registers, and empty stacks.
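The configuration tuple can be pictured as a record with one field per component. This is only a sketch: the class `Configuration` and the helper `initial_configuration` are hypothetical, the Python types are assumptions, and the start address is passed in by the caller because the initial value of PC is not fixed by the description above.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set


@dataclass
class Configuration:
    """One configuration of the hierarchy-extended E machine (illustrative only)."""
    state: Dict[str, float] = field(default_factory=dict)   # communicator and port values
    writeQ: Deque[str] = field(default_factory=deque)       # FIFO queue of trigger names
    switchQ: Deque[str] = field(default_factory=deque)      # FIFO queue of trigger names
    readQ: Deque[str] = field(default_factory=deque)        # FIFO queue of trigger names
    tasks: Set[str] = field(default_factory=set)            # released tasks
    PC: Optional[int] = None                                 # address of current instruction
    R0: Optional[str] = None                                 # registers hold trigger names
    R1: Optional[str] = None
    R2: Optional[str] = None
    R3: Optional[str] = None
    parentStack: List[str] = field(default_factory=list)    # LIFO stack of trigger names
    addressStack: List[int] = field(default_factory=list)   # LIFO stack of addresses


def initial_configuration(defaults: Dict[str, float], start_address: int) -> Configuration:
    """Default variable values, empty queues, task set, registers, and stacks."""
    return Configuration(state=dict(defaults), PC=start_address)


cfg = initial_configuration({"h1": 0.0, "u1": 0.0}, start_address=0)
```

Using `default_factory` gives every configuration its own queues and stacks, so modifying one configuration never leaks into another.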
A trigger g is a tuple (e, a, par, clist), where e is an event, a is an address, par is a trigger name, and clist is a list of trigger
names. An event is a pair (n, cmps), where n ∈ N≥0 and cmps is a set of task names. The non-negative integer n denotes the
number of time tick events being waited for. The set cmps denotes the tasks whose completion event is being waited for.
A trigger is enabled when n = 0 and cmps = ∅. When a trigger is created, it is assigned a unique name until the trigger is
removed. A trigger name is the reference to a trigger; a trigger g can be accessed through the respective name name(g). A
trigger cannot be duplicated; however it can be modified when events occur. A trigger is modified if the associated event,
parent, or children information is changed. A trigger is associated with at most one queue. The registers store trigger names.
A register can be copied and/or reset without affecting the trigger referenced by the name stored in the register. For a register
Rx (where x ∈ {0, 1, 2, 3}), Rx.par (resp. Rx.clist) denotes the parent (resp. the children) of the trigger referred to by the name
in Rx.
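The trigger and event tuples defined above can be pictured as small records. The following is only an illustrative sketch: the class and field names (`Event`, `Trigger`, `ticks`) are assumptions made for this example and are not part of the HE code definition; `None` stands in for the undefined parent ⊥.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    ticks: int = 0                            # n: time tick events still awaited
    cmps: set = field(default_factory=set)    # tasks whose completion is awaited


@dataclass
class Trigger:
    name: str                                 # unique reference while the trigger exists
    event: Event
    address: int                              # a: HE code address run once enabled
    par: Optional[str] = None                 # parent trigger name (None models ⊥)
    clist: list = field(default_factory=list) # children trigger names

    def enabled(self) -> bool:
        # A trigger is enabled exactly when n = 0 and cmps is empty.
        return self.event.ticks == 0 and not self.event.cmps


# A trigger that still waits for two ticks and for task "t1" (hypothetical names).
g = Trigger("g1", Event(ticks=2, cmps={"t1"}), address=40)
```

Registers would then hold only the `name` strings, so copying or resetting a register leaves the trigger record itself untouched.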
Fig. 9. States of the hierarchy-extended E machine.
The address stack tracks the hierarchical position; e.g., in the case of code generated from an HTL program, the stack
remembers the program, the mode, and the module for which code is being executed. The parent stack remembers the
hierarchy of the triggers enabling switch actions between modes. There are two operations to access the stacks: push and pop.
The operation push(addressStack, a) inserts the address a on the stack addressStack, while the operation pop(addressStack)
returns the last value inserted in the stack addressStack. The operation push(parentStack, Rx) inserts the trigger name stored
in the register Rx (where x ∈ {0, 1, 2, 3}) on the stack parentStack; the operation pop(parentStack) returns the last value stored
in the stack parentStack.
The hierarchy-extended E machine can be modeled as a finite-state automaton. Fig. 9 illustrates the different possible
states of the hierarchy-extended E machine. The hierarchy-extended E machine starts by interpreting HE code at address
zero (interpreting state). When there is no HE code instruction to be interpreted (e.g., a return instruction has been executed,
and there is no address in the stack addressStack), the hierarchy-extended E machine goes into the writing state; in this state,
the trigger queue writeQ is checked for active triggers. If an active trigger is found, then the hierarchy-extended E machine
returns to the interpreting state, and starts interpreting HE code at the address associated with the active trigger found. If
no active trigger is found in the queue writeQ, then the hierarchy-extended E machine goes into the switching state; in this
state, the trigger queue switchQ is checked for active triggers. If an active trigger is found in switchQ, then the E machine
goes into the interpreting state, and starts interpreting HE code at the address associated with the active trigger found;
otherwise the E machine goes into the post-switching state. In the post-switching state, the trigger queue readQ is checked
for active triggers. If an active trigger is found in the queue readQ, then the E machine goes into the interpreting state, and
starts interpreting the HE code at the address associated with the active trigger found; otherwise the E machine moves
into the waiting state. The E machine stays in the waiting state until an event occurs, when the virtual machine switches to
the updating state. In the updating state, triggers in each of the three queues are updated. After all the triggers have been
updated, the E machine switches to the writing state and starts checking for active triggers; the entire process presented
above repeats.
When the hierarchy-extended E machine checks for active triggers in any of the queues (e.g., writeQ, switchQ, or readQ ),
the queue being checked is traversed in FIFO order. When an active trigger (·, a, ·, ·) is found, the program counter PC is set
to the address a, the name of the trigger is stored in register R0, and the trigger is removed from the queue in which it was
found. The hierarchy-extended E machine starts executing instructions at address a until a return instruction is executed.
When a return instruction is executed, the trigger (which triggered the code execution) is deleted from the system, and code
execution starts from the address popped from the address stack. The HE code interpretation continues until the address
stack is empty. When the address stack is empty, the hierarchy-extended E machine starts searching again for other active
triggers in the three queues in the following order: writeQ, switchQ, and readQ. When the hierarchy-extended E machine is
in the updating state as a result of a time event (or a completion event), the event information of the triggers in all
three queues is updated. For a time event: each trigger ((n, ·), ·, ·, ·), where n > 0, is updated to ((n − 1, ·), ·, ·, ·). For
a completion event of a task t: each trigger ((·, cmps), ·, ·, ·) is updated to ((·, cmps \ {t}), ·, ·, ·) if t ∈ cmps; the task t is
removed from the task set.
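The event updates and the FIFO search for enabled triggers described above can be sketched as follows. This is a simplified illustration, not the E machine implementation: triggers are plain dictionaries, the function names are hypothetical, and the interpretation of the code at the trigger address is omitted.

```python
def on_time_tick(queues):
    """Time event: every trigger with n > 0 moves to n - 1."""
    for queue in queues:
        for trig in queue:
            if trig["n"] > 0:
                trig["n"] -= 1


def on_completion(queues, tasks, task):
    """Completion event of `task`: drop it from every cmps set and from the task set."""
    for queue in queues:
        for trig in queue:
            trig["cmps"].discard(task)
    tasks.discard(task)


def next_enabled(queues):
    """Scan the queues in the given order (writeQ, switchQ, readQ), each in FIFO
    order; remove and return the first enabled trigger, or None if there is none."""
    for queue in queues:
        for index, trig in enumerate(queue):
            if trig["n"] == 0 and not trig["cmps"]:
                del queue[index]
                return trig
    return None


# Hypothetical triggers: "w" and "r" wait one tick, "s" waits for task "t".
write_q = [{"name": "w", "n": 1, "cmps": set(), "addr": 10}]
switch_q = [{"name": "s", "n": 0, "cmps": {"t"}, "addr": 20}]
read_q = [{"name": "r", "n": 1, "cmps": set(), "addr": 30}]
queues = [write_q, switch_q, read_q]
tasks = {"t"}

on_time_tick(queues)
first = next_enabled(queues)        # the write trigger
second = next_enabled(queues)       # the read trigger; "s" still waits for task t
on_completion(queues, tasks, "t")
third = next_enabled(queues)        # the switch trigger
```

Restarting the scan from writeQ after each handled trigger mirrors the search order writeQ, switchQ, readQ stated in the text.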
Table 1
HE code syntax and semantics.
Instruction                 Effect
call(d)                     state → state′ (b)
release(t)                  tasks′ = tasks ∪ {t}
writeFuture(e, a)           writeQ′ = writeQ ◦ g′, R1′ = name(g′) (a)
switchFuture(e, a)          switchQ′ = switchQ ◦ g′, R1′ = name(g′) (a)
readFuture(e, a)            readQ′ = readQ ◦ g′, R1′ = name(g′) (a)
jumpIf(cnd, a′)             if predicate cnd is true, then PC′ = a′, else PC′ = next(a)
jumpAbsolute(a′)            PC′ = a′
jumpSubroutine(a′)          addressStack′ = push(addressStack, next(a)); PC′ = a′
copyRegister(Rx, Ry)        Ry = Rx
pushRegister(Rx)            parentStack′ = push(parentStack, Rx)
popRegister(Rx)             Rx′ = pop(parentStack)
getParent(Rx, Ry)           Ry = Rx.par
setParent(Rx, Ry)           Rx.par = Ry
copyChildren(Rx, Ry)        Rx.clist = Ry.clist
updateChildren(Rx, Ry)      set the trigger name in Ry as the parent of all the triggers in Rx.clist
deleteChildren(Rx)          for all trigger names in Rx.clist: (recursively) delete the triggers pointed to by the
                            children list and remove the triggers from the queue
replaceChild(Rx, Ry, Rz)    in Rx.clist, replace the trigger name identical to that in Ry by the name in Rz
cleanChildren(Rx)           Rx.clist = ∅
return()                    PC′ = pop(addressStack)
(a) g′ = (e, a, ⊥, ∅).
(b) Driver d is executed.
x, y, z ∈ {0, 1, 2, 3}.
The effect of executing an HE code instruction is presented in Table 1. Let the configuration before executing an instruction
be (state, writeQ, switchQ, readQ, tasks, PC, R0, R1, R2, R3, parentStack, addressStack). After the instruction has been executed,
let the new configuration be (state′, writeQ′, switchQ′, readQ′, tasks′, PC′, R0′, R1′, R2′, R3′, parentStack′, addressStack′).
Each instruction is associated with an address, denoting the location of the corresponding instruction. The instruction
at the address a is denoted as ins(a), and the next address following a is denoted as next(a). When ins(a) is being executed,
PC′ = next(a) unless otherwise mentioned. A parameter has the same value over the execution unless otherwise
mentioned.
Once a trigger is handled and removed from the queue, the trigger is deleted from the system when the code block (started
by the trigger) ends. For a general HE code program, a garbage collector may be necessary to properly remove all de-referenced
triggers, and to ensure that there is no dangling reference fault (e.g., trigger name is being used but the trigger itself has been
deleted). Code generated from an HTL program does not create any such problem; so we avoid the definition of a formal
garbage collector. All of the above instructions except deleteChildren can be executed in constant time. The execution of
deleteChildren requires time linear in the size of the original HTL description of the involved children.
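The recursive nature of deleteChildren, which is what makes it the only non-constant-time instruction, can be sketched as below. The data layout (a dictionary of trigger records plus queues of trigger names) and the function name are assumptions for illustration only. As in Table 1, the children list of the starting trigger is left for a separate cleanChildren step.

```python
def delete_children(name, triggers, queues):
    """Recursively delete all triggers reachable through the children list of
    `name`, and remove them from whatever queue holds them."""
    for child in list(triggers[name]["clist"]):
        if child in triggers:
            delete_children(child, triggers, queues)   # grandchildren first
            for queue in queues:
                if child in queue:
                    queue.remove(child)
            del triggers[child]


# Hypothetical hierarchy: p has children c1 and c2; c1 has child g.
triggers = {
    "p": {"clist": ["c1", "c2"]},
    "c1": {"clist": ["g"]},
    "c2": {"clist": []},
    "g": {"clist": []},
}
switch_q = ["p", "c1", "c2", "g"]

delete_children("p", triggers, [switch_q])
```

Each descendant is visited once, which is consistent with the linear bound stated above.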
4. Handling HTL in HE code
An HTL program specifies interactions between tasks, (sequential, parallel, and conditional) composition of tasks, and
refinement of tasks. In order to fully support any HTL program, HE code has to provide means for handling all of the
HTL constructs mentioned above. Communication between tasks and communicators (and ports) is supported through
the call instruction. For any communication between a task and a variable (ports or communicators), a driver function
is implemented in a regular programming language (e.g., C, Java, etc.); at runtime the driver is invoked through the
call instruction. The sequential composition of tasks (mode) is supported through inter-task communication over ports
and communicators. The conditional composition of tasks (modes) is supported through the jumpIf instruction. The HTL
compiler generates a jumpIf (cnd, a) instruction for each mode switch in a mode — the condition cnd is the mode switch
condition, and the address a denotes the destination mode address. Each mode switch condition is verified at the end of
the mode period — if a switch is found to be true, then the HE code execution jumps to the destination mode. The parallel
composition of tasks (module) is supported through the use of triggers, i.e., by using instructions writeFuture, switchFuture,
and readFuture. For any mode in a module, the HTL compiler generates writeFuture instructions (one for each variable
update), readFuture instructions (one for each variable read), and a single switchFuture instruction (for the set of mode
switches). The triggers ensure that the modes are executed periodically, and that the active modes in different modules
are executed in parallel. Refer to Section 5 for the detailed code generation scheme.
Handling refinement in HE code is by far the most difficult challenge. There are two major concerns that need to
be addressed: tracking the current position in the hierarchy (i.e., which program, module, or mode is being executed),
and maintaining the hierarchical relation between the modes. The first is done by subroutine-like calls to initialize and
execute programs, modules, and modes; refer to Section 5 for details. Intuitively, the address stack stores the addresses
of programs, modules, and modes in a tree-like fashion so that the E machine knows which program, module, or mode is to
be initialized/executed once the current one has been initialized/executed. Maintaining the hierarchical relation is more
involved, and is done through triggers and HE code instructions. For HTL programs, the compiler generates triggers as
follows: all triggers associated with writing communicators are stored in the write queue, all triggers associated with mode
Fig. 10. Handling switch checks in HE code (panels a to f).
switch checks are stored in the switch queue, and all triggers associated with reading communicators (and subsequently
releasing tasks) are stored in the read queue. The writing of communicators in a module, reading of communicators in a
mode, and releasing of tasks in a mode are independent of other modes, modules, and programs. The above holds if the
HTL program is race free (ensured by structural checks), and if all communicators are written before they are read (ensured
by handling triggers in the write queue before those in the switch and read queues). However, checking switches (and
subsequent actions) in a mode depends on other modes. For code generated from HTL, triggers in the write and the read
queue have no parent and children information; in other words they do not carry any hierarchy information. Only triggers
in the switch queue have hierarchy information.
In HTL, switches for a parent mode and its children modes are enabled simultaneously due to constraints on timing
behavior. The HTL semantics prioritizes the mode switch check (and subsequent action) of the parent mode over those of
the children. Section 2.4 discusses a specific instance of mode switch in the HTL program for the three-tank control system.
The switching action of HTL is reflected in the HE code as follows. The compiler generates code in such a way that there is
exactly one trigger per mode in the switch queue, i.e., the implicit tree in the switch queue is the hierarchy of the modes in
the program. When a trigger in the switch queue is enabled, the corresponding mode switch is checked. If the mode switch
is false then the mode is reinvoked; otherwise all triggers (in the switch queue) related to the modes in the refinement
program of the mode are removed, and the target mode is invoked.
We will illustrate the handling of the scenarios shown in Fig. 7. Fig. 10.a shows the associated triggers in the switch queue;
instead of the queue, the implicit tree structure has been shown. Consider the third scenario (of Fig. 7) when the switch condition
for the mode m_T1_PI is true. The triggers in the switch queue from the refinement program of the mode m_T1_PI are
removed (Fig. 10.b). A new trigger for the target mode m_T1_P is generated (Fig. 10.c), the parent information is transferred
to the new trigger (Fig. 10.d), and the trigger for the mode m_T1_PI is removed. The trigger for the mode m_T1_PI_F is
removed without even checking whether the switch condition is true or false. In the other scenarios, the switch condition
of the mode m_T1_PI is false, i.e., the mode will be reinvoked. A new trigger is created for mode m_T1_PI in the switch
queue (Fig. 10.e) with no parent and children information. The parent and children information of the old trigger for the
mode m_T1_PI is redirected to the new trigger for the mode m_T1_PI (Fig. 10.f), and the old trigger for the mode m_T1_PI
is removed from the switch queue. Next, the E machine traverses the queue to check switches for the mode m_T1_PI_F.
5. HTL compiler
In general, the compilation (Fig. 11) of any HTL program involves specific analyses of the program followed by code
generation. The analyses ensure that the input program satisfies the constraints on parallel composition of modules and
refinement of modes [8], and that the program tasks are schedulable relative to the target platform. The WCET/WCTT
information for tasks is provided by an external tool. If the constraints are satisfied, then the code generator generates code
for a distributed implementation. The code generation is done by compiling the whole program for each host, where each
host maintains a copy of each communicator and port; however, tasks are executed on the host only if the corresponding
mode (in which the task is invoked) is mapped onto that host. Whenever a task completes execution, the output is
broadcast to all hosts and stored in local ports; when a communicator (on a host) is to be written, the value of the local
port is copied to the communicator. The released tasks are dispatched for execution by an EDF scheduler; the scheduler is
external to the E machine.
The hierarchy-preserving compiler generates code for programs, modules, and modes by invoking Algorithm 1a,
Algorithm 1b, and Algorithm 2a, respectively. The compiler uses symbolic addresses to refer to different parts of the
code. For each program P, programInit[P] (resp. programStart[P]) denotes the address of the HE code block that initializes
(resp. executes) the program P. For each module M, moduleInit[M] (resp. moduleStart[M]) denotes the address of the HE code
block that initializes (resp. executes) the module M. For each mode m, modeStart[m] is the address of the HE code block that
starts the mode m, and targetMode[m] is the address of the HE code block that is executed when another mode switches
to the mode m. Each mode m is divided in uniform units corresponding to the smallest period between two time events
(i.e., a communicator read or a communicator update) in the mode m. Given a mode m, the duration of a unit γ[m] is the
greatest common divisor of all access periods of all communicators read or written in the mode m; the total number of units
is π[m]/γ[m], where π[m] is the period of the mode m. For each unit i of every mode m, the compiler generates separate code
blocks for updating communicators, checking switches (and related actions), and reading communicators (and releasing
tasks): the address of the HE code block that writes communicators is unitWrite[m, i], the address of the HE code block
that checks switch conditions is unitSwitch[m, i], and the address of the HE code block that reads the communicators is
Fig. 11. Structure of the compiler and runtime system.
unitRead[m, i]. HTL semantics requires that at any instant, communicator writes, mode switch checks, communicator
reads, and task releases are performed in the above order to maintain consistency of communicator values across all
modules. The address of the HE code block that sets the execution order of communicator writes, switch checks, and
communicator reads (and task releases) is modeBody[m]. Instructions may contain forward references to any of the above symbolic
addresses, which may require fix-up during compilation.
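The unit duration γ[m] and the number of units π[m]/γ[m] defined above amount to a short computation. The sketch below uses hypothetical function names and made-up periods; the check that the mode period is a multiple of the unit duration is an assumption of this example rather than a statement taken from the text.

```python
from functools import reduce
from math import gcd


def unit_duration(access_periods):
    """gamma[m]: greatest common divisor of all communicator access periods."""
    return reduce(gcd, access_periods)


def unit_count(mode_period, access_periods):
    """pi[m] / gamma[m]: number of uniform units in one mode period."""
    gamma = unit_duration(access_periods)
    if mode_period % gamma != 0:
        # Assumption of this sketch: the mode period is a multiple of gamma.
        raise ValueError("mode period is not a multiple of the unit duration")
    return mode_period // gamma


# Hypothetical mode with period 500 accessing communicators at periods 100 and 250.
gamma = unit_duration([100, 250])
units = unit_count(500, [100, 250])
```

For these example values the compiler would generate write, switch, and read code blocks for units 0 through `units - 1`.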
Algorithm 1 Code generation for programs and modules
(a) GenECodeForProgramOnHost(P, h)
    set programInit[P] to PC and fix up
    // initialize communicators
    ∀c ∈ communicators(P): emit(call(init(c)))
    // initialize all the modules in P
    ∀M ∈ modules(P): emit(jumpSubroutine(moduleInit[M]))
    // return from initialization subroutine of P
    emit(return)
    set programStart[P] to PC and fix up
    // start all the modules in P
    ∀M ∈ modules(P): emit(jumpSubroutine(moduleStart[M]))
    // return from start subroutine of P
    emit(return)
(b) GenECodeForModuleOnHost(M, h)
    set moduleInit[M] to PC and fix up
    // initialize task ports
    ∀p ∈ taskPorts(M): emit(call(init(p)))
    // return from initialization subroutine of M
    emit(return)
    set moduleStart[M] to PC and fix up
    // start the start mode of M
    emit(jumpSubroutine(modeStart[start[M]]))
    // return from start subroutine of M
    emit(return)
Algorithm 1a generates code for a program P on a host h. The code at the address programInit[P] initializes all
the communicators declared in the program P by calling the corresponding initialization drivers (init(·) denotes the
initialization driver for a communicator or a port); this is followed by calls to the initialization subroutines of each module
in the program P. The HE code at the address programStart[P] calls the start subroutine of each module M in the program P.
Algorithm 1b generates code for a module M on host h. The code at the address moduleInit[M] initializes all task ports
(denoted by taskPorts(M)) of the tasks in the module M by calling respective initialization drivers. All tasks maintain two
sets of local ports, called task input ports and task output ports, which are not accessible by other tasks. Before a task is
released, values of the communicators and the ports (that are read by the released task) are copied into the task input ports;
the values stored in the task input ports are used by the task during execution. At completion, the task output ports are
updated; the communicators and the ports are written from the task output ports when the writing is due. The code at the
address moduleStart[M] calls the start subroutine of the start mode, start[M], of the module M.
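The role of symbolic addresses and fix-up in Algorithm 1 can be illustrated with a small emitter. This is a sketch under stated assumptions: the `Emitter` class, the tuple encoding of instructions, and the stub that stands in for the module code are all hypothetical, and only the shape of Algorithm 1a is reproduced.

```python
class Emitter:
    def __init__(self):
        self.code = []      # emitted instructions as (opcode, operand) pairs
        self.labels = {}    # symbolic address -> numeric address

    def set_label(self, label):
        # "set <label> to PC": PC is the address of the next emitted instruction.
        self.labels[label] = len(self.code)

    def emit(self, opcode, operand=None):
        self.code.append((opcode, operand))

    def link(self):
        """Fix-up: replace symbolic jump targets by numeric addresses."""
        return [(op, self.labels[arg]) if op == "jumpSubroutine" else (op, arg)
                for op, arg in self.code]


def gen_program(em, program, communicators, modules):
    """Follows the structure of Algorithm 1a."""
    em.set_label(("programInit", program))
    for c in communicators:
        em.emit("call", f"init({c})")
    for m in modules:
        em.emit("jumpSubroutine", ("moduleInit", m))   # forward reference
    em.emit("return")
    em.set_label(("programStart", program))
    for m in modules:
        em.emit("jumpSubroutine", ("moduleStart", m))  # forward reference
    em.emit("return")


def gen_module_stub(em, module):
    """Placeholder for Algorithm 1b: only defines the two labels."""
    em.set_label(("moduleInit", module))
    em.emit("return")
    em.set_label(("moduleStart", module))
    em.emit("return")


em = Emitter()
gen_program(em, "P", ["c1"], ["M1"])   # program code is emitted first
gen_module_stub(em, "M1")              # module labels become known later
linked = em.link()
```

Because the program code refers to the module only through labels, the two generators could equally be called in the opposite order.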
Algorithm 2a generates code for a mode m on host h. We will use the following auxiliary operators for Algorithm 2a. The
set readDrivers(m, i) contains the drivers that load the tasks in the mode m with values of the communicators that are read
by these tasks at the unit i. The set writeDrivers(m, i) contains the drivers that load the communicators with the output
of the tasks in the mode m that write to these communicators at the unit i. The set portDrivers(t) contains the drivers
that load the task input ports of the task t with the values of the ports read by the task t . The set complete(t) contains the
events that signal the completion of the tasks on which the task t depends, and that signal the read time of the task t . The
set releasedTasks(m, i) contains the tasks (invoked in the mode m) with no precedences, that are released at the unit i.
The set precedenceTasks(m) contains the tasks in the mode m that depend on other tasks.
Algorithm 2 Code generation for modes
(a) GenECodeForModeOnHost(m, h)
0   GenECodeToStartModeOnHost(m, h);
1   i := 0
2   while i < π[m]/γ[m] do
3     set unitWrite[m, i] to PC and fix up
4     // write communicators with the values of task output ports
5     ∀d ∈ writeDrivers(m, i): emit(call(d))
6     // wait for other triggers to become enabled
7     emit(return)
8     if (i = 0)
9       GenECodeToTestSwitchesForModeOnHost(m, h)
10      GenECodeToPreserveHierarchyForModeOnHost(m, h)
11      emit(return)
12    end if
13    GenECodeToReleaseTasksForModeOnHost(m, h, i)
14    if (i < π[m]/γ[m] − 1)
15      // jump to the next unit of mode m
16      emit(writeFuture(γ[m], unitWrite[m, i + 1]))
17      emit(readFuture(γ[m], unitRead[m, i + 1]))
18    end if
19    // wait for other triggers to become enabled
20    // OR return from body subroutine of m
21    emit(return)
22    i := i + 1
23  end while
(b) GenECodeToStartModeOnHost(m, h)
0   set modeStart[m] to PC and fix up
1   // check the mode switches
2   ∀(cnd, m′) ∈ switches(m):
3     emit(jumpIf(cnd, targetMode[m′]))
4   set targetMode[m] to PC and fix up
5   emit(jumpSubroutine(modeBody[m]))
6   if (program P refines m)
7     // increment the level
8     emit(getParent(R0, R3))
9     emit(pushRegister(R3))
10    emit(setParent(R0, R2))
11    emit(cleanChildren(R0))
12    emit(jumpSubroutine(programStart[program[m]]))
13    // decrement the level
14    emit(popRegister(R3))
15    emit(setParent(R0, R3))
16    emit(cleanChildren(R0))
17  end if
18  // return from start subroutine of m
19  // OR wait for other triggers to become enabled
20  emit(return)
Algorithm 2a starts by invoking Algorithm 2b, which emits HE code to start a mode. Algorithm 2b emits code (at address
modeStart[m]) for checking all the mode switches (lines 1–3) in a mode m to ensure that the switches are checked before the
mode m is invoked for the first time. Next, code is generated (at address targetMode[m]) to handle the case when no switch
is enabled: a call to code at modeBody[m], followed by a call to the refinement program (if any). This sets the execution of a
mode before the execution of the refinement program. Code emission at lines 6–17 checks whether a refinement program
exists, and subsequently updates the hierarchy information if there is one. Before the code for the refinement program is
invoked (line 12), the hierarchy is updated (lines 7–11), as refinement adds one level of hierarchy; once the invocation
of the refinement program completes, the level is restored (lines 13–16). The hierarchy is updated by pushing the parent
of the trigger that is referred to by the trigger name in the register R0 onto the stack (lines 8–9). The parent of the trigger
pointed to by the register R0 is then changed to the trigger pointed to by the register R2 (which contains a pointer to the
last trigger added to the switch queue), and the children list is reset (code for the refinement program has yet to be invoked,
thus there is no children information). In effect, for the execution of the refinement program, parent of the trigger pointed
to by the register R0 points to the parent trigger of all the triggers that will be added in the switch queue for that program.
To restore the hierarchy level, the parent of the trigger pointed to by the register R0 is updated by popping the parent
stack.
After the call to Algorithm 2b returns, Algorithm 2a generates HE code for each mode unit i. Lines 4–7 generate E code
that calls the driver for each communicator being written at the unit i. Next, Algorithm 3a is invoked for the mode unit zero
to generate code that tests the mode switch conditions, and takes necessary action when a switch is enabled. In HTL, modes
can switch only at period boundaries; thus the switches are checked only for unit zero. If no mode switch occurs (line 4),
then the code jumps to address modeBody[m]. If a mode switch occurs, then all the children of the last enabled trigger in the
switch queue (its name is stored in register R0) are removed from the switch queue (lines 5–8). The removal of the children
is recursive, thus all the children of the subsequent children are also removed. Once the children are removed, the code
execution jumps (lines 9–10) to the target address targetMode[m′] of the destination mode m′.
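The relative jump targets emitted by Algorithm 3a can be checked with a small layout sketch. It assumes that PC in the expressions PC + 2 and PC + 4 denotes the address of the instruction being emitted; this reading is an interpretation for the purpose of the example, and the function name and tuple encoding are hypothetical.

```python
def gen_switch_tests(base, switches):
    """Emit the switch-test block for a mode, starting at address `base`.
    `switches` is a list of (condition, destination mode) pairs."""
    code = []
    for cnd, target in switches:
        pc = base + len(code)
        code.append(("jumpIf", cnd, pc + 2))          # taken: go to deleteChildren
        pc = base + len(code)
        code.append(("jumpAbsolute", pc + 4))         # not taken: skip this switch
        code.append(("deleteChildren", "R0"))         # cancel refinement triggers
        code.append(("cleanChildren", "R0"))
        code.append(("jumpAbsolute", ("targetMode", target)))
    return code


# Hypothetical mode with two switches, to modes m2 and m3.
code = gen_switch_tests(0, [("c1", "m2"), ("c2", "m3")])
```

Under this reading, a false condition falls through to the test of the next switch, and after the last switch control reaches the first address after the block, where the code for modeBody[m] generated by Algorithm 3b begins.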
After code for mode switches has been generated, Algorithm 3b is invoked to sequence the execution order of
communicator writes, switch checks, and communicator reads (and subsequent task release), for the unit zero of mode
m. This is done by emitting a future instruction (line 1) for address unitWrite[m, 0] (trigger will be added to queue writeQ), a
future instruction (line 2) for address unitSwitch[m, 0] (trigger will be added to queue switchQ), and a future instruction (line
9) for address unitRead[m, 0] (trigger will be added to queue readQ). Whenever a trigger is created and added to a queue, the
relevant trigger pointer is stored in register R1. Once a trigger is added in the switch queue, the hierarchy information has to
be updated (lines 3–8). Depending on how the code is reached, there are two scenarios: first, the code is invoked by handling
an enabled trigger in the switch queue, i.e., a mode switch has occurred or a mode is being reinvoked, and second, the code
Algorithm 3 Code generation for mode switches and hierarchy preservation
(a) GenECodeToTestSwitchesForModeOnHost(m, h)
0   set unitSwitch[m, 0] to PC and fix up
1   // check the mode switches
2   ∀(cnd, m′) ∈ switches(m):
3     emit(jumpIf(cnd, PC + 2))
4     emit(jumpAbsolute(PC + 4))
5     // cancel all triggers related to the refining
6     // program of m, and its subprograms
7     emit(deleteChildren(R0))
8     emit(cleanChildren(R0))
9     // switch to mode m′
10    emit(jumpAbsolute(targetMode[m′]))
(b) GenECodeToPreserveHierarchyForModeOnHost(m, h)
0   set modeBody[m] to PC and fix up
1   emit(writeFuture(π[m], unitWrite[m, 0]))
2   emit(switchFuture(π[m], unitSwitch[m, 0]))
3   emit(getParent(R0, R3))
4   emit(replaceChild(R3, R0, R1))
5   emit(updateChildren(R0, R1))
6   emit(setParent(R1, R3))
7   emit(copyChildren(R1, R0))
8   emit(copyRegister(R1, R2))
9   emit(readFuture(0, unitRead[m, 0]))
is invoked when a mode is executed for the first time. In both scenarios, the register R0 records the relevant hierarchy
information. In the first scenario, it stores the name of the last trigger in the switch queue that was handled; by semantics, if
any trigger is handled, then the name is stored in the register R0. In the second scenario, it stores the name of the last trigger
in the switch queue that was created. The code at lines 3–7 sets the parent of the trigger pointed to by the register R0 as a
parent for the trigger pointed to by the register R1, and copies the children list from the trigger pointed to by the register R0
to the trigger pointed to by the register R1. A new trigger for the read queue may remove the information of the last trigger
added to the switch queue from the register R1; so a copy of the register R1 is stored in the register R2 (line 8).
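The effect of lines 3–8 of Algorithm 3b on the trigger hierarchy can be traced with the following sketch, which applies the register instructions of Table 1 to a small example. The dictionary layout, the function name, and the trigger names are hypothetical; skipping replaceChild when the parent is undefined (modeled by `None`) is an assumption of this sketch, since the text does not spell out that case.

```python
def preserve_hierarchy(triggers, regs):
    """Mirror lines 3-8 of Algorithm 3b; `regs` maps R0..R3 to trigger names."""
    r0, r1 = regs["R0"], regs["R1"]
    regs["R3"] = triggers[r0]["par"]                      # getParent(R0, R3)
    r3 = regs["R3"]
    if r3 is not None:                                    # replaceChild(R3, R0, R1)
        triggers[r3]["clist"] = [r1 if c == r0 else c
                                 for c in triggers[r3]["clist"]]
    for child in triggers[r0]["clist"]:                   # updateChildren(R0, R1)
        triggers[child]["par"] = r1
    triggers[r1]["par"] = r3                              # setParent(R1, R3)
    triggers[r1]["clist"] = list(triggers[r0]["clist"])   # copyChildren(R1, R0)
    regs["R2"] = r1                                       # copyRegister(R1, R2)


# Hypothetical reinvocation: "old" is the handled switch trigger of a mode,
# "new" is the trigger just created by switchFuture for the same mode.
triggers = {
    "root": {"par": None, "clist": ["old"]},
    "old": {"par": "root", "clist": ["child"]},
    "child": {"par": "old", "clist": []},
    "new": {"par": None, "clist": []},
}
regs = {"R0": "old", "R1": "new", "R2": None, "R3": None}

preserve_hierarchy(triggers, regs)
```

After the call, the new trigger occupies the position of the old one in the tree, which corresponds to the redirection described for Fig. 10.e and Fig. 10.f.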
Once the call to Algorithm 3b returns, Algorithm 4 is invoked. Algorithm 4 generates code (at address unitRead[m, i]) that
reads (lines 2–3) all communicators (by calling drivers that copy from communicators into the task input ports) that are to
be read at unit i, and releases all tasks (with no precedences) that should be released at the unit i. For the unit zero (line 6),
code is generated to release precedence tasks (lines 7–15). For each task t with precedences, a trigger is added to the queue
readQ : the trigger is activated at the completion of preceding tasks of t , and the subsequent code writes the input ports of
the task t , and releases the task t .
The last stage in code generation for a unit of mode m consists of emitting code to jump from one unit to the next (lines
14–18, Algorithm 2a). The generated code adds triggers to the write queue and the read queue only; no triggers are added
to the switch queue as switches are not possible in other units except unit zero. For the unit zero, the future instructions
that ensure continuity of execution are generated by Algorithm 3b.
Algorithm 4 GenECodeToReleaseTasksForModeOnHost(m, h, i)
0 set unitRead[m, i] to PC and fix up
1 if (mode m is contained in a module on host h)
2 // read communicators into task input ports
3 ∀d ∈ readDrivers(m, i):emit(call(d))
4 // release tasks with no precedences
5 ∀t ∈ releasedTasks(m, i):emit(release(t))
6 if (i = 0)
7 // release tasks with precedences
8 ∀t ∈ precedenceTasks(m):
9 // wait for tasks on which t depends to complete
10 emit(readFuture(complete(t), PC + 2))
11 emit(jumpAbsolute(PC + 3+ |portDrivers(t)|))
12 // read ports of tasks on which t depends,
13 // then release t
14 ∀d ∈ portDrivers(t):emit(call(d))
15 emit(release(t))
16 // wait for other triggers to become enabled
17 emit(return)
18 end if
19 end if
The code generation algorithm for programs/modules/modes accesses other programs, modules, or modes through
symbolic addresses, and does not influence the code generation of other programs, modules, and modes. Thus parts of HTL
programs can be compiled in any order separately.
Table 2
Compiler complexity, where p ∈ N>0 is the number of programs in the HTL description, m ∈ N≥p is the number of
modules in the HTL description, and n ∈ N>0 is the number of modes per module.
Compiler               Code size      Instructions/instant   Triggers in queue(s)
Flattening             O(n^(m−p))     O(n^(m−p))             O(m − p)
Hierarchy-preserving   O(mn^2)        O(mn)                  O(m)
6. Complexity analysis
The complexity of the two HTL compilers, flattening and hierarchy-preserving, is analyzed in terms of efficiency of code
generation and runtime overhead. The efficiency of code generation is measured by the number of instructions generated (by
a compiler) for a given HTL program. The runtime overhead is the average time spent at each instant in executing instructions
and searching for enabled triggers; the overhead is measured as the number of instructions that have to be interpreted per
instant, and the number of triggers in the trigger queues.
Given an HTL description with p ∈ N>0 programs (i.e., one top-level program and p − 1 refinement programs), m ∈ N≥p
modules, and n ∈ N>0 modes per module, Table 2 shows the code generation efficiency, and the runtime overhead of the
flattening HTL compiler and the hierarchy-preserving HTL compiler. The bounds presented represent both upper and lower
bounds for the hierarchy-preserving HTL compiler. For the flattening HTL compiler, the bounds represent both upper and
lower bounds for a constrained set of HTL descriptions as discussed below; for all other HTL descriptions, the bounds are
only upper bounds. The upper and lower bounds for the generated code size and for the number of instructions that have
to be interpreted per instant are identical when all but one HTL program (consider it to be program P) contain one module
each, and the program P (which should also be a direct child of the root program) contains the remaining modules; the
number of remaining modules is m − p + 1 (accounting for at least one module for each program). The upper and lower
bounds for the number of triggers in trigger queues are identical when all the HTL programs except the root HTL program
contain one module, and the root HTL program contains m − p + 1 modules.
The focus of the analysis is to understand the handling of hierarchy by the two compilation processes. The code generation
for handling communicators, ports, and task invocations is the same for both the compilers, thus the non-hierarchy
description is kept to a minimum as follows: the root program contains one communicator declaration; non-root programs
do not contain communicator declarations; and modes do not invoke any task. Any mode m in a module can switch to any
other mode m′ in the module.
The generated E code size increases exponentially with the number of modules for the flattening compiler, while the
size of generated HE code is linear bounded for the hierarchy-preserving compiler. The exponential explosion in the case of
the flattening compiler is caused by the flattening process. In order to merge the refining program into the refined mode,
the flattening accounts for all possible combinations between the modes in parallel modules from the refining program.
The combination of modes generates an exponential number of new modes and mode switches, which in the end leads to
an exponential growth of the number of instructions. On the other hand, the number of generated instructions decreases
exponentially with the number of programs; this is due to the fact that as the number of programs increases for a constant
number of modules, the number of parallel modules in the refinement decreases, which implies fewer merging modules,
and thus fewer modes. This makes the bound O(n^(m−p)). The hierarchy-preserving compiler generates the same number of
instructions for each mode independent of its position in the hierarchy. Thus, for n modes per module (where each mode
can switch to any of the other n − 1 modes), the number of generated instructions is O(mn^2).
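To make the contrast concrete, the following Python sketch (an illustration only, not part of either HTL compiler) enumerates the combined modes that flattening has to consider for k parallel modules, and counts the mode switches that remain when every module is handled on its own. The module and mode names are hypothetical, and the counts cover modes and mode switches only, not actual E code or HE code instructions.

```python
from itertools import product


def flattened_modes(modules):
    """Return every combination of one mode per parallel module.

    This mirrors the mode product that flattening has to account for;
    for k modules with n modes each it yields n**k combined modes.
    """
    return list(product(*modules))


def separate_mode_switches(modules):
    """Count mode switches when every module is handled on its own.

    Assumes, as in the analysis, that each mode can switch to any other
    mode of the same module: n * (n - 1) switches per module.
    """
    return sum(len(modes) * (len(modes) - 1) for modes in modules)


# Hypothetical refinement: k = 3 parallel modules with n = 2 modes each.
modules = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]

combined = flattened_modes(modules)
print(len(combined))                    # combined modes after flattening
print(separate_mode_switches(modules))  # switches with modules kept apart
```

Adding one more two-mode module doubles the number of combined modes, whereas the per-module switch count grows by a fixed amount; this is the shape of the exponential versus linear behaviour discussed above.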
The number of instructions that have to be executed per instant is given by the number of mode switches that have to
be checked in the worst case. Since the number of mode switches in a mode for a flattened program is exponential in terms
of the number of parallel modules in the refinement, the number of instructions that have to be executed per instant is
also exponential in terms of the number of parallel modules in the refinement. For the hierarchy-preserving compiler, the
E machine checks (in the worst case) all mode switches for the active mode in each of the modules, which means that in the
worst case the number of instructions is bounded by O(nm) for each instant.
The third column in Table 2 compares the number of triggers in trigger queues. In the case of the flattening compiler,
there will be one trigger in the trigger queue for each parallel module in the root program of an HTL description; i.e., in the
worst case, there will be m − p triggers in the trigger queue. For the hierarchy-preserving compiler, there will be one trigger
in each of the three trigger queues (i.e., write queue, switch queue, and read queue) for each module. Thus, in the worst case
when all modules are executed, the number of triggers in the queues is O(m) for the hierarchy-preserving compiler.
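The two runtime measures discussed above can be written down as small counting functions. The Python sketch below is a rough illustration under stated assumptions: the function names are hypothetical, constant factors are dropped, and for the flattened case it assumes that a combined mode may switch to any other combined mode of the k parallel modules in the refinement.

```python
def hierarchy_preserving_overhead(n, m):
    """Worst-case per-instant counts for the hierarchy-preserving compiler.

    Returns (switch_checks, triggers) for m modules with n modes each.
    """
    # The active mode of each module has n - 1 outgoing mode switches,
    # so at most m * (n - 1) checks, which is within the O(nm) bound.
    switch_checks = m * (n - 1)
    # One trigger per module in each of the write, switch, and read queues.
    triggers = 3 * m
    return switch_checks, triggers


def flattened_switch_checks(n, k):
    """Mode switches leaving one combined mode of k parallel modules.

    Assumption: every other combination of modes is a possible target.
    """
    return n ** k - 1


def flattened_triggers(m, p):
    """Worst-case trigger count for the flattening compiler, as stated in the text."""
    return m - p


print(hierarchy_preserving_overhead(2, 7))
print(flattened_switch_checks(2, 4))
print(flattened_triggers(7, 1))
```

The hierarchy-preserving compiler therefore trades a somewhat larger (but still linear) number of queued triggers for a number of switch checks that no longer depends exponentially on the number of parallel modules.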
7. Experimental analysis
This section compares the efficiency of the flattening compiler and the hierarchy-preserving compiler through
experiments. In order to compare the runtime overhead, the time spent in interpreting E code (generated by the flattening
Fig. 12. Number of E code instructions generated by the flattening HTL compiler for an HTL description with 1 ≤ p ≤ 7 programs, 1 ≤ m ≤ 7 modules,
and n = 2 modes per module. [Plot: generated E code size (0 to 5000) against number of programs (1 to 7) and number of modules (1 to 7).]
Fig. 13. Number of HE code instructions generated by the hierarchy-preserving HTL compiler for an HTL description with 1 ≤ p ≤ 7 programs, 1 ≤ m ≤ 7
modules, and n = 2 modes per module. [Plot: generated HE code size (0 to 5000) against number of programs (1 to 7) and number of modules (1 to 7).]
compiler), and HE code (generated by the hierarchy-preserving compiler) for the three-tank controller has been measured;
the delay introduced by code interpretation is below 1% for both E code and HE code. This result indicates that for relatively
simple HTL descriptions (i.e., with no parallel modules in the refinement), the runtime overhead introduced by interpreting
HE code is not significantly higher than the runtime overhead introduced by interpreting E code.
The code size is compared for HTL descriptions with p programs (i.e., one top-level program and p − 1 refinement
programs) and m modules (p ≤ m), where each module has n = 2 modes switching between themselves. For each such
scenario there are a number of possible HTL descriptions. All the HTL descriptions for which 1 ≤ p ≤ 7 and 1 ≤ m ≤ 7 have
been automatically generated, and compiled with both HTL compilers. For each scenario the highest generated code size
for each compiler has been recorded. Fig. 12 shows the plot of the largest generated code size for the flattening compiler;
Fig. 13 shows the plot of the largest generated code size for the hierarchy-preserving compiler. In the worst case, the size
of the generated E code (4364 instructions) is one order of magnitude larger than the size of the generated HE code (387
instructions). The outcome of the experiments also supports the analytical results discussed in the previous section.
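As a small worked check of the numbers above, the following Python sketch lists the (p, m) scenarios covered by the experiment and computes the ratio between the two reported worst-case code sizes. Only the two instruction counts are taken from the text; the variable names are illustrative, and the sketch does not reproduce the generation of the individual HTL descriptions within each scenario.

```python
# Scenarios with 1 <= p <= 7 programs and p <= m <= 7 modules.
scenarios = [(p, m) for p in range(1, 8) for m in range(p, 8)]

# Worst-case code sizes reported in the text (number of instructions).
e_code_worst = 4364   # flattening compiler
he_code_worst = 387   # hierarchy-preserving compiler

ratio = e_code_worst / he_code_worst

print(len(scenarios))    # number of (p, m) scenarios
print(round(ratio, 1))   # how many times larger the E code is
```

The ratio is a little over eleven, which is consistent with the "one order of magnitude" statement.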
8. Conclusion
We have designed and implemented a new HTL compiler, which preserves the hierarchical structure of an HTL program
at compile time. We have shown that the hierarchy-preserving compiler generates code that is linear in size in the number
of modules, and introduces runtime overhead that is linear in the number of modules. The results (both analytical and
experimental) show that the previously proposed flattening compiler [8] can generate exponentially many instructions in
terms of the number of modules, and can introduce exponential overhead in terms of the number of modules.
The HTL implementation discussed in this paper has been developed originally for Unix-based platforms; the
implementation was also used for the three-tank controller discussed in the paper. Recently the implementation has
been ported to Java, and a microcontroller-based hardware platform. The HTL implementation for Java was developed
on top of the Exotasks platform [21]. The implementation has been tested by developing the low-level controller for
JAviator [22], a quad-rotor helicopter. The HTL implementation for Java provides a 300 µs accuracy [21]. The microcontroller-based
HTL implementation [23] has also been tested by implementing the low-level controller for JAviator. Although the
microcontroller platform is very limited with respect to the available resources (e.g., CPU clock frequency: 16 MHz, flash
memory: 128 KB, EEPROM data memory: 4096 B, SRAM data memory: 4096 B), the timing accuracy provided by this
implementation is below 4 µs. So far the JAviator low-level control applications have been tested only on a software-simulated
helicopter. The hierarchy-preserving compiler has also been extended for HTL programs with logical reliability
constraints [9]; the hierarchy-extended E machine has been used to implement a fault-tolerant version of the three-tank
controller [24].
References
[1] A. Sangiovanni-Vincentelli, L. Carloni, F.D. Bernardinis, M. Sgroi, Benefits and challenges for platform-based design, in: Proceedings of the 41st Annual
Design Automation Conference, ACM, San Diego, CA, USA, 2004, pp. 409–414.
[2] AUTOSAR, http://www.autosar.org/.
[3] FlexRay, http://www.flexray.com/.
[4] T.A. Henzinger, B. Horowitz, C.M. Kirsch, Giotto: a time-triggered language for embedded programming, in: Proceedings of the IEEE, vol. 91, IEEE,
2003, pp. 84–99.
[5] G.C. Buttazzo, Hard Real-Time Computing Systems, Kluwer Academic Publisher, 1997.
[6] F. Boussinot, R. de Simone, The ESTEREL language, in: Proceedings of the IEEE, vol. 79, IEEE, 1991, pp. 1293–1304.
[7] N. Halbwachs, P. Caspi, P. Raymond, D. Pilaud, The synchronous data-flow programming language LUSTRE, in: Proceedings of the IEEE, vol. 79, IEEE,
1991.
[8] A. Ghosal, T.A. Henzinger, D. Iercan, C.M. Kirsch, A. Sangiovanni-Vincentelli, A hierarchical coordination language for interacting real-time tasks,
in: Proceedings of the 6th ACM and IEEE International Conference on Embedded Software, ACM, Seoul, Korea, 2006, pp. 132–141.
[9] A. Ghosal, A hierarchical coordination language for reliable real-time tasks, Ph.D. Thesis, University of California, Berkeley, 2008.
[10] T.A. Henzinger, C.M. Kirsch, The Embedded Machine: predictable, portable real-time code, ACM Transactions on Programming Languages and Systems
29 (2007) 33–61.
[11] C.M. Kirsch, M.A.A. Sanvido, T.A. Henzinger, W. Pree, A Giotto-based helicopter control system, in: Proceedings of the Second International Conference
on Embedded Software, in: Lecture Notes in Computer Science, vol. 2491, Springer-Verlag, 2002, pp. 46–60.
[12] C.M. Kirsch, M.A.A. Sanvido, T.A. Henzinger, A programmable microkernel for real-time systems, in: Proceedings of the 1st ACM/USENIX International
Conference on Virtual Execution Environments, ACM, Chicago, IL, USA, 2005, pp. 35–45.
[13] E. Farcas, C. Farcas, W. Pree, J. Templ, Transparent distribution of real-time components based on logical execution time, in: Proceedings of the 2005
ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, ACM, 2005, pp. 31–39.
[14] J. Liu, E.A. Lee, Timed multitasking for real-time embedded software, IEEE Control Systems Magazine 23 (2003) 65–75.
[15] P. Caspi, A. Curic, A. Maignan, C. Sofronis, S. Tripakis, P. Niebert, From Simulink to SCADE/Lustre to TTA: a layered approach for distributed embedded
applications, in: Proceedings of the 2003 ACM SIGPLAN Conference on Language, Compiler, and Tool Support for Embedded Systems, ACM, San Diego,
CA, USA, 2003, pp. 153–162.
[16] V. Bertin, E. Closse, M. Poize, J. Pulou, J. Sifakis, P. Venier, D. Weil, S. Yovine, Taxys = Esterel + Kronos. A tool for verifying real-time properties of
embedded systems, in: Proceedings of the 40th IEEE Conference on Decision and Control, vol. 3, IEEE, Orlando, FL, USA, 2001, pp. 2875–2880.
[17] R. Lublinerman, S. Tripakis, Modular code generation from triggered and timed block diagrams, in: Proceedings of the 2008 IEEE Real-Time and
Embedded Technology and Applications Symposium, IEEE, 2008, pp. 147–158.
[18] R. Lublinerman, C. Szegedy, S. Tripakis, Modular code generation from synchronous block diagrams: modularity vs. code size, in: Proceedings of the
36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM, Savannah, GA, USA, 2009, pp. 78–89.
[19] D. Iercan, Contributions to the development of real-time programming techniques and technologies, Ph.D. Thesis, University ‘‘Politehnica’’ of
Timisoara, 2008.
[20] HTL Home Page, http://htl.cs.uni-salzburg.at/.
[21] J. Auerbach, D.F. Bacon, D. Iercan, C.M. Kirsch, V.T. Rajan, H. Röck, R. Trummer, Low-latency time-portable real-time programming with exotasks, ACM
Transactions on Embedded Computing Systems 8 (2009).
[22] S. Craciunas, C. Kirsch, H. Röck, R. Trummer, The JAviator: a high-payload quadrotor UAV with high-level programming capabilities, in: Proceedings
of AIAA Guidance, Navigation and Control Conference and Exhibit, GNC, AIAA, Honolulu, Hawaii, 2008.
[23] D. Iercan, L. Fedorovici, Porting hierarchical timing language on a microcontroller based platform, in: Proceedings of the 5th International Symposium
on Applied Computational Intelligence and Informatics, IEEE, Timisoara, Romania, 2009, pp. 283–288.
[24] K. Chatterjee, A. Ghosal, T.A. Henzinger, D. Iercan, C.M. Kirsch, C. Pinello, A. Sangiovanni-Vincentelli, Logical reliability of interacting real-time tasks,
in: Proceedings of International Conference on Design, Automation and Test in Europe, ACM, Munich, Germany, 2008, pp. 909–914.
