WCRT Algebra and Scheduling Interfaces for Esterel-Style Synchronous Multithreading by von Hanxleden, Reinhard et al.
Institut für Informatik und Praktische Mathematik der
Christian-Albrechts-Universität zu Kiel
Olshausenstr. 40
D – 24098 Kiel

Abstract. The abstractions used in system design typically limit themselves to encapsulating
and guaranteeing functionality, not timing. Hence, it is very difficult to transfer results
on timing behavior across layers, e. g., from the application level through the operating
system level to the hardware level. The choice of the model of computation plays a big
role in facilitating this transfer. In the realm of reactive systems, the synchronous model
of computation has some appeal here, as it inherently limits the number of operations per
reaction, and addresses concurrency and preemptive behavior at the language level.
Recently, reactive processing architectures have been proposed as execution platforms for
synchronous languages, notably Esterel. Initially, these architectures were driven by the
desire for high performance with low resource usage, including low power consumption.
However, by now they have also demonstrated their benefits in terms of predictability.
Preliminary work on worst case reaction time (WCRT) analysis has been promising: fairly
simple heuristics already achieve an accuracy typically in the 30–40% range. However,
these methods so far lack formal grounding, and do not exploit knowledge about signal
consistency etc. To provide a formal basis for WCRT analysis, we here propose a
type-theoretic, algebraic approach. This approach not only makes it possible to verify the
correctness of WCRT analysis methods, but also opens the door for more exact analyses, as it
makes it possible to capture functionality and timing precisely and to trade off precision
against analysis effort.
This approach is still under development; this report presents first results on suitable
interface types and the proper characterization of instantaneous nodes, delay nodes and
concurrency. As a concrete application, it builds on a multi-threaded Esterel processor,
the Kiel Esterel Processor (KEP).
Key words: Worst Case Reaction Time analysis, Interface Algebra, Synchronous Lan-
guages, Esterel, Multithreading, Reactive Processing, Kiel Esterel Processor
Table of Contents
1 Introduction
2 Synchronicity and Timing Predictability
  2.1 Reactive Processing
3 A Type-Theoretic Approach to WCRT Analysis
4 Example of Esterel-style Multi-threading, KEP Assembler
5 WCRT Interfaces at Work
  5.1 Introducing Interface Types
  5.2 Instantaneous Behavior: Transient Nodes
      Weaving Paths
      Weaving Nets
      Bundling Abstractions
      Data Dependency and Degrees of Precision
  5.3 Sequential Behavior: Delay Nodes
  5.4 Concurrent Behavior: Fork and Join
6 Conclusion
A WCRT Interface Types
B Multi-threading Composition
      Multi-threading of Surface Interfaces
      Multi-threading of Depth Interfaces
      Putting it Together: Adding Fork and Join
1 Introduction
Reconciling performance and predictability in embedded systems is a challenge that spans all lay-
ers of hard- and software development. The use of abstraction layers, for example distinguishing
the operating system (OS) layer from an application software layer above and a hardware layer
below, is fundamental to computer science. However, as observed by Edwards and Lee [1], these
abstractions typically limit themselves to encapsulate and guarantee functionality, not timing.
Hence, even though there is a significant body of work that addresses predictability at different
abstraction layers, considering for example schedulability, worst case execution times (WCET),
or circuit timing, it is very difficult to transfer results across layers. However, end users are not
interested in results that apply only to one layer—they care about timing guarantees for complete
systems.
The choice of the model of computation—and its model of time—has a profound influence
on how easy or difficult it is to provide timing guarantees across abstraction layers. From the
predictability point of view, a very appealing candidate in the embedded systems domain is the
synchronous model of computation [2]. This model supports deterministic timing in several ways,
as further argued in Section 2. Furthermore, languages built on that model generally have a well-
established formal semantics that allows reasoning about functional as well as timing properties
from the ground up. In this paper, we first give an overview of how the synchronous model
together with a suitable execution platform can provide system-level timing predictability, and we
illustrate this with the case of multi-threaded execution of Esterel-style synchronous programs.
The main contribution of this paper then is the introduction of an algebraic framework for
precisely capturing WCRT characteristics for this execution approach.
2 Synchronicity and Timing Predictability
The synchronous model of computation divides physical time into a sequence of discrete ticks,
or instants. The abstraction is that at each tick, outputs are synchronous with the inputs. In
other words, computations take place instantaneously, interspersed with durations of inactivity
between ticks. Synchronous languages generally do not permit unbounded computations within
a tick. For example, the language Esterel provides a loop construct [3], but each loop iteration
must include at least one tick-delimiting instruction, and the compiler must be able to verify
this. This simplifies the problem of determining the maximal number of instructions per tick,
which leads to the worst case reaction time (WCRT). The situation is quite different for classical
imperative languages such as Java or C, which permit unbounded loops and unbounded recursion,
and thus only language subsets (e. g., with statically bounded loop iterations) are amenable
to WCET analyses. Another helpful characteristic of (strict) synchrony is that the statuses of
signals, which are basic communication means in synchronous programs, evolve monotonically. In
other words, there can be no oscillations between signal presence and absence, thus guaranteeing
convergence after a finite number of computations. This contrasts, for example, with Harel’s
original Statecharts dialect [4], which assumes a weaker (non-strict) form of synchrony in which
computations are also assumed to not consume any time, but signal statuses are allowed to
oscillate and computations within a tick are unbounded. Finally, the synchronous paradigm also
supports concurrent and preemptive control flow, with a deterministic semantics regarding both
functionality and timing characteristics. This again contrasts with classical imperative languages,
which either do not support non-sequential control flow at all (e. g., C relegates this to the OS
level, subject to run-time scheduling decisions), or support it only in a rather haphazard fashion
(e. g., Java threads [5]).
2.1 Reactive Processing
Synchronous programs may be compiled into hardware or software. The traditional software de-
sign flow is to first compile the synchronous program into a classical imperative language, such
as C, and to compile and run the resulting program on a standard microprocessor [6]. This
approach preserves the nice semantical properties of synchronous programs at a functional level.
The timing properties, however, are only partially preserved with this approach. Computations
are still finite per tick, and the synthesized C-code should not have unbounded loops, for ex-
ample. However, depending on the synthesis approach used, the control flow may still be rather
complex and difficult to analyze (for example, computed gotos). Furthermore, standard proces-
sors typically employ various techniques that improve average execution time, at the expense of
worst case execution time and predictability [7].
An alternative, more recent approach for executing synchronous programs is to run them on
processors that directly support reactive control flow. This reactive processing approach builds on
instruction set architectures (ISAs) that can express concurrency and preemption and preserve
functional determinism [8]. There have been various proposals on how to support concurrency
in reactive processing, including sequentialization [9], parallel execution [10], and, most recently,
multi-threading [11, 12]. The latter appears to be the most effective at this point, and
significantly outperforms classical software-based execution strategies while using minimal resources.
In the multi-threaded reactive processing approach, a combination of static scheduling, hardware-
supported context switching, and fixed machine instruction execution times assures timing de-
terminism. This has been exploited in a compiler which translates Esterel programs into multi-
threaded assembler code, and as part of the compilation process analyzes the WCRT in terms
of instruction cycles [13]. The assembler code is then annotated with this WCRT, which at run
time configures a hardware unit, the TickManager, which provides timing stability at the logical
tick level [14]. This decouples physical system reaction times from the tick-specific computation
requirements, which is for example desirable for control loop stability. Note that, as an alter-
native, one may still choose not to make use of the TickManager and instead let the processor
run freely, i. e., start the next tick as soon as the current tick finishes. This can improve average
case performance, at the expense of reaction time jitter and possibly higher power requirements.
Furthermore, our experiments indicate that there is no dramatic difference between WCRT and
average case reaction time (ACRT)—typically a factor of around 1.5 [13].
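The trade-off between a TickManager-controlled tick length and free-running execution can be illustrated with a minimal Python sketch. The function names and the per-tick instruction counts are hypothetical and chosen for illustration only; they are not KEP measurements, and the padding model is a simplifying assumption about what a TickManager-like unit does.

```python
def tick_lengths(instr_counts, wcrt=None):
    """Return the physical length (in instruction cycles) of each tick.

    instr_counts: instructions actually executed per logical tick
                  (hypothetical sample data).
    wcrt:         if given, every tick is padded to this length, as a
                  TickManager-like unit is assumed to do; if None, the
                  processor runs freely and starts the next tick at once.
    """
    if wcrt is None:
        return list(instr_counts)
    if max(instr_counts) > wcrt:
        raise ValueError("WCRT bound violated: the bound was not conservative")
    return [wcrt for _ in instr_counts]


def jitter(lengths):
    """Jitter is taken here as the spread between longest and shortest tick."""
    return max(lengths) - min(lengths)


counts = [6, 9, 4, 9, 7]           # hypothetical instructions per tick
free = tick_lengths(counts)        # free-running mode
padded = tick_lengths(counts, 9)   # padded to an assumed WCRT of 9 ics

print(jitter(free), sum(free))      # 5 35
print(jitter(padded), sum(padded))  # 0 45
```

In this toy data set, padding removes the jitter at the price of a longer total run time, which mirrors the trade-off described above.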
The WCRT analysis technique developed so far already provides fairly promising results.
A relatively simple heuristic provides an accuracy typically in the 30–40% range [13]. However,
this heuristic still makes conservative and simplifying assumptions, and is not grounded in a
formal timing model. We believe that such a formal model will be instrumental in the further
development of these techniques to cover different processing platforms and several levels of
abstraction between hardware and software layers. A semantically grounded notion of WCRT
interface types will be crucial to make our techniques scale up to industrial-sized systems without
losing tight control of correctness and exactness of WCRT analysis. In this paper we take first
steps towards such an interface model.
3 A Type-Theoretic Approach to WCRT Analysis
Imperative synchronous programming, as exemplified by Esterel and Statecharts, provides
predictability in terms of determinism and bounded reaction in the face of powerful language
constructs for concurrency, state hierarchy, priorities, and strong and weak preemption. These
constructs induce sophisticated control dependencies, so that WCRT analysis for such languages
and for processors that directly support them is non-trivial. The risk of over- and
under-approximations jeopardizes the quality of WCRT analysis and thus creates a conflict between
performance and predictability.
Combining performance with predictability involves a trade-off between analysis time and
execution time. Optimizations in the efficiency of timing analysis are paid for by compromis-
ing the quality of timing results. Under-approximations may result in a loss of coverage and
over-approximations in a loss of exactness. Thus, the system under analysis (SUA) becomes less
predictable or runs at slower (virtual) clock speed. Optimizations in speed and predictability of
the SUA, on the other hand, require sophisticated data-dependent analyses, which are
computationally expensive. This creates a direct conflict with the performance and predictability of
the WCRT algorithms themselves. To strike this trade-off, a scalable and modular approach towards
timing analysis is called for, in which precision and efficiency can be adjusted systematically
within wide margins.
The existing WCRT algorithms such as [13] are neither compositional nor scalable in terms of
precision. They are global analyses on the complete control-flow graph of a monolithic program.
Also they run at the ground level of atomic program statements rather than hierarchical sub-
systems. In this paper we propose a theory of WCRT interfaces for synchronous programming
and show how it can be employed to obtain type-directed and modular WCRT analyses which
– give precise statements about exactness and coverage of timing values, supporting a variety
of timing abstractions;
– are dedicated to express the implicit control flow of imperative synchronous programming
languages;
– are scalable across component hierarchies and the software-hardware abstraction boundary.
As an interface theory our WCRT algebra operates on matrices of delay values characterizing
whole sub-systems rather than individual nodes like existing graph-theoretic WCRT algorithms
do. It combines max-plus algebra (N, max, +, 0, −∞), see e.g. [15], with Boolean logic¹ to reason
about implicit control-flow.
The key element of WCRT analysis is the interchange (distribution) of max and +. In its
general form, the WCRT is the max-of-sums
$$\max\Bigl(\sum_{i \in p_1} d_{i1},\; \sum_{i \in p_2} d_{i2},\; \ldots,\; \sum_{i \in p_n} d_{in}\Bigr)$$
where pj are execution paths of the system and
dij the delay of path segment i in path pj. However, the number n of paths is exponential in the
number of elementary nodes of a system. Practicable WCRT analyses therefore reduce the max-
of-sums to the polynomial complexity of sum-of-maxs employing various forms of dependency
abstraction. Unfortunately, the obvious distribution of max(d1 + e1, d1 + e2, d2 + e1, d2 + e2) =
max(d1, d2) + max(e1, e2) is exact only if we have a full set of path combinations. In general,
there will be dependencies ruling out certain paths, in which case we only get conservative
over-approximations, e.g., max(d1, d2) + max(e1, e2) ≥ max(d1 + e1, d2 + e1, d2 + e2). On the
other hand, max(d1 + e1, d2 + max(e1, e2)) = max(d1 + e1, d2 + e1, d2 + e2) which eliminates
one addition operation, does work in this case. The art of WCRT analysis consists in finding a
judicious trade-off between forming the maximum operation early in order to aggregate data and
refining dependency paths for the sake of exactness. A practicable WCRT algebra must be able
to express and control this trade-off. In Sec. 5.1 we sketch a type theory which achieves this by
coupling timing delays d with logic formulas φ. A pair d : φ specifies the semantical meaning of
d within the control-flow of a program. Logical operations on the formulas then go hand-in-hand
with arithmetic operations on timing. E.g., suppose a schedule activates control points X and Y
with a delay of d1 and d2 instruction cycles. If they are independent then both control points are
jointly active within the maximum of both delays, i.e., (d1 : X) ∧ (d2 : Y) ≅ (max(d1, d2) : X ∧ Y).
On the other hand, if reaching Y is causally dependent on having reached X first then we must
take addition, i.e., (d1 : X) ∧ (X ⊃ (d2 : Y)) ≅ (d1 + d2 : X ∧ Y) where ∧, ⊃ are logical
conjunction and implication, respectively. Thus, in general, the computation of paths involves
functional analysis of implicit control-flow using logic reasoning rather than just following pointers
in a graph (of explicit dependencies). We will illustrate this in Secs. 5.2–5.4 below by way of an
extended example.
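The interchange of max and + discussed in this section can be checked numerically. The following Python sketch uses hypothetical delay values and helper names (`max_of_sums`, `sum_of_maxes`, `jointly_active`) that do not appear in the report; it only illustrates the identities stated above and is not the WCRT algebra itself.

```python
from itertools import product

def max_of_sums(paths):
    """Exact bound: maximum over all feasible paths of the summed delays."""
    return max(sum(path) for path in paths)

def sum_of_maxes(*segments):
    """Abstraction: sum of the per-segment maxima (cheap but conservative)."""
    return sum(max(segment) for segment in segments)

# Hypothetical delays in instruction cycles.
d1, d2 = 5, 2
e1, e2 = 1, 7

# With the full set of path combinations the distribution is exact.
full = list(product((d1, d2), (e1, e2)))
exact_full = max_of_sums(full)
approx = sum_of_maxes((d1, d2), (e1, e2))

# Assume a dependency rules out the path d1 + e2.
feasible = [(d1, e1), (d2, e1), (d2, e2)]
exact_feasible = max_of_sums(feasible)
refined = max(d1 + e1, d2 + max(e1, e2))

print(exact_full, approx, exact_feasible, refined)   # 12 12 9 9

def jointly_active(dx, dy, dependent):
    """Delay after which both control points are active: + if reaching the
    second depends causally on the first, max if they are independent."""
    return dx + dy if dependent else max(dx, dy)
```

With these values the sum-of-maxes bound (12) over-approximates the exact bound on the feasible paths (9), while the partially refined expression recovers it.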
The idea of modularizing embedded systems programming and specifically synchronous pro-
gramming using interfaces is not new. Mostly these interface models focus on causality issues,
which amounts to dependency analysis without quantitative time. On the other hand, there exist
numerous approaches to classical WCET analysis [16] but only few on WCRT analysis [14].
Logothetis, Schneider and Metzler [17, 18] have employed model checking to perform a precise
timing analysis for the synchronous language Quartz, which is similar to Esterel. However, their
problem is WCET since they are interested in computing the number of logical ticks required
to perform a certain transformational computation, such as a primality test. Instead, in reactive
system WCRT one considers how long it may take to compute a single logical tick. WCRT is an
orthogonal issue to WCET and has been rarely investigated in the literature so far.
André et al. [19] employ a causally simple notion of module in the sense that no instantaneous
interaction between modules is permitted. Such a model is not suitable for WCRT. Hainque et
al. [20] use a topological abstraction of the underlying circuit graphs (or syntactic structure
of Boolean equations) to derive a fairly rigid component dependency model. A component is
assumed executable iff all of its inputs are available; after component execution all of its outputs
become defined. The former restriction means that single-threaded execution cannot be modeled
compositionally. The interface model also does not cover data dependency and thus cannot deal
with dynamic schedules and does not support WCRT, either.
¹ To be more precise, we use a finite-valued Heyting algebra in which A ∨ ¬A is not a tautology. This
difference, however, is not essential for this paper.
The causality interfaces of Lee et al. [21] are more flexible. These are functions δ:Pi×Po → D
associating with every pair (i, o) ∈ Pi ×Po of input and output ports an element δ(i, o) ∈ D of a
dependency domain D, which expresses if and how an output o depends on input i. The domain
D is a linearly ordered dioid structure (D, 0, 1, ⊗, ⊕, <), where ⊗ models sequential composition
(with neutral element 1) and ⊕ is the parallel composition of dependencies (with neutral element 0).
Causality analysis is then performed by multiplication on the global system matrix describing the
dependencies between any two signals. Using an appropriate dioid structure D, one can perform
the analyses of [20] as well as restricted forms of WCRT. However, Lee’s interfaces cannot express
the difference between an output depending on the joint presence of several values as opposed
to depending on each input individually. In other words, they do not support full AND- and
OR-type synchronization dependencies and hence can represent neither multi-threading nor
multi-processing. The work reported here can be seen as an extension of Lee’s to address these
deficiencies.
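As an illustration of causality analysis by matrix multiplication over a dioid, the following Python sketch instantiates D with max-plus, i.e., ⊕ = max and ⊗ = +. This choice of D, the function name `mp_mul`, and the three-signal matrix are assumptions made for the example; they are not taken from [21].

```python
NEG_INF = float("-inf")   # neutral element of max, read as "no dependency"

def mp_mul(a, b):
    """Max-plus matrix product: (a (x) b)[i][j] = max over k of a[i][k] + b[k][j]."""
    inner = len(b)
    return [[max(a[i][k] + b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical system with three signals: entry [i][j] is the delay with which
# signal j depends directly on signal i, or -inf if there is no dependency.
A = [[NEG_INF, 2,       5],
     [NEG_INF, NEG_INF, 4],
     [NEG_INF, NEG_INF, NEG_INF]]

A2 = mp_mul(A, A)                  # dependencies through exactly two steps
worst = max(A[0][2], A2[0][2])     # longest delay from signal 0 to signal 2

print(A2[0][2], worst)             # 6 6
```

Each entry of such a product is one scalar per input-output pair, which is why this form cannot distinguish a dependency on the joint presence of several inputs from a dependency on each input individually.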
Similar restrictions apply to recent work [22, 23] combining network calculus [15, 24] with
real-time interfaces in which sequential connection and concurrent composition operators play
an analogous role to Lee’s sequential and parallel composition, respectively. On top of that, these
works are concerned with the compositional modeling of regular execution patterns rather than
stabilization processes that make up the components inside each execution cycle of a synchronous
program. Existing interface theories [21–23], which aim at the verification of resource constraints
for real-time scheduling, handle rather delicate timing properties such as task execution latency,
arrival rates, resource utilization, throughput, accumulated cost of context switches, and so on.
On the other hand, in those works, the dependency on data and control flow is largely abstracted.
For instance, since the task sequences of Henzinger and Matic [23] are independent of each other,
their interfaces do not model concurrent forking and joining of threads. The causality expressible
there is even more restricted than that by Lee et al. [21] in that it permits only one-to-one
associations of inputs with outputs. The interfaces of Wandeler and Thiele [22] for modular
performance analysis in real-time calculus are like those of Henzinger and Matic [23] but without
sequential composition of tasks and thus do not model control flow.
AND- and OR-type synchronization dependencies are important for WCRT in synchronous
programming since reachability of control nodes in general depends both conjunctively and dis-
junctively on the presence of data. Moreover, execution may depend on the absence of data
(negative triggering conditions), which makes compositional modeling rather a delicate matter
in the presence of logical feedback loops. This severely limits the applicability of existing interface
models. The assume-guarantee style specification [22, 23] does not address causality issues arising
from feedback and negative triggering conditions. The interface automata of de Alfaro, Henzinger,
Lee, and Xiong [25, 26] model synchronous macro-states and assume that all stabilization processes
(sequences of micro-states) can be abstracted into atomic interaction labels. The introduction of
transient states [27] alleviates this to some extent, but the focus is still on regular (scheduling)
behavior. The situation is different, however, for cyclic systems, in which causality information is
needed. The interface problem in WCRT for stabilization processes is quite a different game—it
is simpler and more complex at the same time. It is simpler since we do not need sophisticated
resource and timing models; in a stabilization process the only resource is time—here instruction
cycles—which is only consumed once rather than in a regular pattern. It is more complex since we
need to model more sophisticated synchronization effects and solve causality and full-abstraction
problems due to feedback and negative dependencies. Because of the complications arising from
causality issues, there is currently no robust component model for imperative synchronous pro-
gramming.
Before introducing our approach in the next section, let us add a couple of general
remarks on our philosophy. Our approach to WCRT is specification-oriented and type-theoretic.
The idea we wish to stress is that timing does not have any meaning without at the same time
specifying the functional behavior to which the timing information is attached. Without function,
timings are just numbers signifying nothing. Working with (matrices of) numbers alone makes
it easy to lose track of semantical correctness and exactness and thus the connection between
performance and predictability. For illustration consider the following analogy: Imagine we are
given a full blueprint of all layers (metal, poly-silicon, etc.) of a VLSI solid-state circuit. We
know all its physical parameters and could simulate and fabricate the circuit. Yet, we would
not be able to sensibly tell what its timing is without knowing what the inputs and outputs are
and what function the circuit is supposed to execute. So just as correct functionality depends
on timing, correct timing depends on functionality. By using the type-theoretic structure f :
φ we bring out both the separation and the intimate coupling of scheduling bounds f and
scheduling types φ. The scheduling shape φ acts as a type specification for the timing matrix f, making
clear the semantical meaning of f, and the scheduling bound f in turn acts as a quantitative
implementation of the schedule φ. We believe that type theory seen as an extension of algebra
provides a powerful framework for coupling functional and non-functional specifications that has
not yet been exploited to its full potential.
4 Example of Esterel-style Multi-threading, KEP Assembler
Esterel [3] programs communicate with the environment and internally via signals, which are
either present or absent during one instant. Signals are set present by the emit statement and
tested with the present test. They are reset at the start of each instant. Esterel statements can
be either combined in sequence (;) or in parallel (||). The loop statement simply restarts its
body when it terminates. All Esterel statements are considered instantaneous, except for the
pause statement, which pauses for one instant, and derived statements like halt=loop pause,
which stops forever. Esterel supports multiple forms of preemption, e. g., via the abort statement,
which simply terminates its body when some trigger signal is present. Abortion can be either
weak or strong. Weak abortion permits the execution of its body in the instant the trigger signal
becomes active; strong abortion does not. Both kinds of abortions can be either immediate or
delayed. The immediate version already senses for the trigger signal in the instant its body is
entered, while the delayed version ignores it during the first instant in which the abort body is
started.
Consider the Esterel fragment in Figure 1a. It consists of two threads. The first thread (G)
emits signals R, S, T depending on some input signal I. In any case, it emits signal U and terminates
instantaneously. The second thread (H) continuously emits signal R until signal I occurs. Thereafter, it
either halts, when E is present, or emits S and terminates otherwise.
The main problems when executing Esterel on standard processors are the handling of pre-
emption and synchronous parallelism. The Kiel Esterel Processor (KEP) [11] handles abortion
by watchers, which are executed in parallel with their body and simply set the program-counter
when the trigger signal becomes present. Synchronous parallelism is executed by multi-threading.
The KEP manages multiple threads, each with its own program counter and a priority. In each
instruction cycle, the processor determines the active instruction from the thread with the
highest priority and executes it. New (sub-)threads are initialized by the PAR instruction. The PARE
instruction ends the initialization of parallel threads and sets the program counter of the current
thread to the corresponding join. By changing the priorities of the threads, arbitrary interleavings
can be specified; the compiler has to ensure that the priorities respect all signal dependencies,
i. e., all possible emits of a signal are performed before any test of the signal. The specific
interleavings are not relevant for the WCRT analysis under multi-threading. Therefore, priority
changing instructions may be treated like the padding statement nothing which has no effect
on the system state other than adding a time delay. Additionally, for all parallel threads one
join instruction is executed, which checks whether all threads have terminated in the current
instant. If this is the case, the whole parallel terminates and the join transfers control to
the next instruction. Otherwise the join blocks. The instructions pause and halt end the execution for
the current instant; the execution is resumed from the same point in the next instant.
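The priority-based dispatching described above can be sketched as a toy simulation in Python. This is a deliberately simplified model under stated assumptions: priorities are fixed, signals, watchers and the join are not modeled, and the thread encoding (`prio`, `code`) as well as the instruction lists are hypothetical rather than actual KEP code.

```python
def run_instant(threads):
    """Execute one instant and return the trace of (thread, instruction) pairs.

    threads: dict name -> {"prio": int, "code": [str, ...]} (hypothetical
    encoding). A thread stays active until it executes PAUSE or runs off the
    end of its code. Each executed instruction costs one instruction cycle.
    """
    pc = {name: 0 for name in threads}
    active = {name for name, t in threads.items() if t["code"]}
    trace = []
    while active:
        # Dispatch: the active thread with the highest priority runs next.
        name = max(active, key=lambda n: threads[n]["prio"])
        instr = threads[name]["code"][pc[name]]
        trace.append((name, instr))
        pc[name] += 1
        if instr == "PAUSE" or pc[name] == len(threads[name]["code"]):
            active.discard(name)   # done for this instant
    return trace


threads = {
    "G": {"prio": 1, "code": ["PRESENT I", "EMIT U"]},
    "H": {"prio": 2, "code": ["EMIT R", "PAUSE", "EMIT S"]},
}
trace = run_instant(threads)
print(trace)
print(len(trace))   # 4 instruction cycles in this instant
```

Because every instruction costs one cycle and the dispatch order is fixed by the priorities, the length of the trace is fully determined by the program, which is the property the WCRT analysis relies on.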
Most instructions, like emit or entering an abort block, are executed in exactly one instruction
cycle. The pause instruction is executed both in the instant it is entered, and in the instant it
is resumed, to check for weak and strong abortions, respectively. Note that the halt instruction
is executed in one cycle. While halt and loop pause are functionally equivalent, their execution
times differ. The latter has a worst case reaction time (WCRT) of 3 instruction cycles (ics) while
the former only needs 1 ic.

[Figure 1: only fragments of the panels are recoverable.]

Esterel program (fragment):
    [ % thread G
      present I then
        emit R
      end present;
      [...]
      when immediate I;
      [...]

KEP assembler (fragment):
    L01: T0: PAR 1,G0,1
    L02:     PAR 1,H0,2
    L03:     PARE A1
    L04: G0: PRESENT I,G1
    L05:     EMIT R
    L06: G1: PRESENT I,G3
    L07:     GOTO G2
    L08: G3: EMIT S
    L09:     EMIT T
    L10: G2: EMIT U
    [...]
    L15: H1: PRESENT E,H2
    L16:     HALT
    [...]

(e) KEP sample Trace
Fig. 1: A simple Esterel program T with its corresponding control-flow graph and the resulting KEP
Assembler

The concurrent KEP assembler graph (CKAG, see Fig. 1c) captures the control flow, both
standard control and abortions, of an Esterel program. We distinguish two kinds of edges,
instantaneous and non-instantaneous. Instantaneous edges can be taken immediately when the
source node is entered; they reflect control flow starting from instantaneous statements or weak
abortions of delayed statements. Non-instantaneous edges can only be taken in an instant where
control started in their source node, like control flow from pause statements or strong abortions.
The CKAG can be derived from the Esterel program by structural translation. For a given
CKAG, the generation of KEP Assembler is straightforward (see Fig. 1c). Most nodes are
translated into one instruction; only fork nodes are expanded to multiple instructions to initialize the
threads. In our example, the fork v0 is transformed into three instructions (L01–L03).
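The distinction between the two kinds of edges can be made concrete with a small Python sketch of a graph encoding. The node names, the uniform costs and the dictionary layout are hypothetical, and the longest-path function is a naive illustration that assumes the instantaneous edges are acyclic; it is not the analysis of [13] nor the interface algebra developed below.

```python
from functools import lru_cache

# Hypothetical CKAG fragment: node -> (cost in ics, instantaneous successors,
# non-instantaneous successors). Names and costs are illustrative only.
CKAG = {
    "present": (1, ("emit", "pause"), ()),
    "emit":    (1, ("pause",), ()),
    "pause":   (1, (), ("halt",)),   # control leaves pause only in a later instant
    "halt":    (1, (), ()),
}

@lru_cache(maxsize=None)
def longest_instantaneous(node):
    """Longest chain of instruction cycles reachable from `node` within the
    instant in which `node` is entered (follows instantaneous edges only)."""
    cost, instantaneous, _delayed = CKAG[node]
    return cost + max((longest_instantaneous(s) for s in instantaneous), default=0)

print(longest_instantaneous("present"))   # 3: present, emit, pause
```

Non-instantaneous edges are ignored here because they can only be taken in a later instant and therefore do not contribute to the instruction count of the instant in which the source node is entered.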
5 WCRT Interfaces at Work
The WCRT of an Esterel program is the maximal number of instructions that are executed
during one instant. WCRT differs from WCET fundamentally in that it deals with the timing of
stabilization rather than iteration processes. WCRT assumes that all dependencies in the control
flow are acyclic and the propagation of control is a monotonic process in which each atomic
control point is executed at most once. On the other hand, WCRT for Esterel-style
synchronous processing must handle non-atomic control flow including features such as hierar-
chical and concurrent threads, priorities and preemption. In the following we sketch the basic
elements of WCRT for synchronous processing and develop an algebra for modular timing anal-
yses. This algebra is an extension and adaptation of the intuitionistic propositional stabilization
theory presented in [28].
5.1 Introducing Interface Types
An execution σ is a finite and monotonically increasing sequence σ = ∅ ⊆ σ(0) ⊆ σ(1) ⊆ σ(2) ⊆
· · · ⊆ σ(n−1) of sets of control signals σ(i) ⊆ S (0 ≤ i < |σ| = n) called events. This includes the
special case n = 0 of the empty execution σ = ∅. For all executions, empty or not, we refer to the
initial ∅ as σ(−∞). Signals S contain the control-flow labels as well as input and output signals of
the program. E.g. for the program in Fig. 1c {L0−L20, G0−G3,H0−H3, I, E,R, S, T, U} ⊆ S.
We will also need activation controls active(v) ∈ S for nodes v in the hierarchical decomposition
of the program. An execution σ models the micro-sequence of instruction cycles (ic) which are
executed within a single synchronous instant. Each step σ(i) ↦ σ(i + 1) records the change of
controls between two successive activations of the thread. Accordingly, the number of instructions
executed by σ is the number of changes or steps |σ| − 1 rather than the number of events.
Example 1. One possible execution of the program T in Fig. 1c would be as follows. Initially,
control signal T0 is set, so σ(0) = {T0}. Then the PAR and PARE instructions making up the fork
node v0 are executed in line numbers L01, L02, L03 each taking one ic. The two PAR instructions
set up internal counters for thread control, which does not change the set of events in the signals
of Fig. 1c, which are the signals that we are interested in. Hence, σ(1) = σ(2) = {T0}. However,
after the PARE both control signals G0, H0 become present bringing threads G and H to life.
This means σ(3) = {T0, G0,H0}. The next instruction could be any of the two first instructions
of G or H. As it happens, the KEP Assembler Fig. 1d assigns higher priority to H so that our
execution continues with wabort (node v8), i.e., σ(4) = {T0, G0,H0, L12}. This brings up the
pause instruction v9. Now, depending on whether signal I is present or not the execution of
pause either moves to v12 (weak immediate abort) or terminates. Let us assume the latter, i.e.,
σ(5) = {T0, G0,H0, L12} which is the same set as σ(4) but now thread H is finished up for the
instant and has entered an implicit wait state. The execution continues with the first instruction
of G, the present node v1 at label G0. Since I is assumed absent, its execution effects a jump
to label G1, i.e., σ(6) = {T0, G0,H0, L12, G1}. Thereafter, we run sequentially through nodes
v3, v5, v6, v7 giving σ(7) = {T0, G0,H0, L12, G1, G3}, σ(8) = {T0, G0,H0, L12, G1, G3, L9},
σ(9) = {T0, G0,H0, L12, G1, G3, L9, L10}. Executing the final emit instruction v7 hits the join
at entry L11, so that σ(10) = {T0, G0,H0, L12, G1, G3, L9, L10, L11}. Now both threads G and
H are finished. It takes one execution step of the join node v16 to detect this and to terminate the
synchronous instant of T with the final event σ(11) = {T0, G0,H0, L12, G1, G3, L9, L10, L11}.
Overall, we get an execution σ = σ(0)σ(1) · · ·σ(11) of the outer-most main thread of T from T0
consisting of 11 instruction cycles.
Note that signal L19 is not included in the last event σ(11) because control remains inside
the pause node v9 of T . Only in the next logical instant when T is resumed in v9 and thread H
eventually comes out at control point L19 (if signal I is present and E absent), then executing
the join v16 will bring us to control point L19 and out of T instantaneously.
Note further that the difference ∆i = σ(i + 1) \ σ(i) may be an arbitrary subset of S. It may
be empty as with i ∈ {0, 1, 4, 10}, contain exactly one control as for i ∈ {3, 5, 6, 7, 8, 9} or more as
in i = 2. In general, ∆i will encompass more than one signal when a thread forks into concurrent
sub-threads or if other concurrent threads get executed between the two activations i and i + 1
of the thread represented by σ.
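The bookkeeping of Example 1 can be replayed mechanically. The following is a minimal Python sketch, assuming the events are exactly those listed in the example; the names `added`, `events`, `deltas` and `is_monotone` are illustrative and not part of the report. It checks monotonicity, counts the instruction cycles, and classifies the differences ∆i; the empty differences it reports include i = 0, since σ(1) = σ(0).

```python
# Signals added by each step of Example 1 (an empty set means no change).
added = [
    {"T0"},        # sigma(0): initial control
    set(),         # sigma(1): first PAR (L01)
    set(),         # sigma(2): second PAR (L02)
    {"G0", "H0"},  # sigma(3): PARE (L03) starts threads G and H
    {"L12"},       # sigma(4): wabort v8
    set(),         # sigma(5): pause v9, thread H waits
    {"G1"},        # sigma(6): present v1 with I absent
    {"G3"},        # sigma(7)
    {"L9"},        # sigma(8)
    {"L10"},       # sigma(9)
    {"L11"},       # sigma(10): final emit v7 reaches the join
    set(),         # sigma(11): join v16 ends the instant
]

# Build the cumulative events sigma(0) .. sigma(11).
events = []
current = set()
for new in added:
    current = current | new
    events.append(frozenset(current))


def is_monotone(sigma):
    """An execution must be an increasing chain of signal sets."""
    return all(a <= b for a, b in zip(sigma, sigma[1:]))


def deltas(sigma):
    """Differences sigma(i+1) minus sigma(i) for every step i."""
    return [sigma[i + 1] - sigma[i] for i in range(len(sigma) - 1)]


steps = len(events) - 1  # instruction cycles = number of steps
d = deltas(events)
empty = [i for i, x in enumerate(d) if len(x) == 0]
single = [i for i, x in enumerate(d) if len(x) == 1]
multi = [i for i, x in enumerate(d) if len(x) > 1]
print(steps, empty, single, multi)
```

The count of steps, not of events, is what the text calls the number of executed instructions.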
A set of executions S defines a schedule. The possible schedules of a program will be specified
by a scheduling type φ generated by the language
φ ::= A | true | false | φ ∧ φ | ¬φ | φ ⊃ φ | φ ∨ φ | φ ⊕ φ | φ ‖ φ | ◦φ.
We write S |= φ (σ |= φ) to say that schedule S (execution σ) validates the type φ. As a type,
each signal A ∈ S represents the statement that “A is active (is present, has been traversed, is
scheduled) in all executions of the schedule”. The constant true is validated by all schedules and
false only by the empty schedule or the schedule which contains only the empty execution. The
type operators ¬, ⊃ are (intuitionistic) negation and implication. The two operators ∨ and ⊕ are
two forms of logical disjunction to encode internal and external non-determinism and ∧, ‖ are
two forms of logical conjunction related to true concurrency (multi-processing) and interleaving
concurrency (multi-threading), respectively. Finally, ◦ is the operator to express execution delays.
We will keep matters brief and present only some essential constructions in this theory. In
fact, a fragment of the language will suffice to define interface types for KEP programs as far as
they are treated in this paper. To begin with, define a
– basic control type to be an expression ζ built from literals A, ¬A (A ∈ S) and constants true,
false using conjunction ∧ and disjunction ⊕.
Basic control types satisfy S |= ζ iff σ |= ζ for all σ ∈ S, i.e., they express properties of
individual executions. On executions, ζ behaves like a standard Boolean combination of the
atomic statements A (A present throughout σ) and ¬A (A absent throughout σ). For instance,
σ |= A⊕¬A says that signal A is constant in σ, i.e., it is either present from the start, A ∈ σ(0)
or never becomes active, A ∉ σ(|σ| − 1). Since signals which are not active initially may occur in
the course of an execution the type A⊕¬A is not a tautology, i.e., A⊕¬A ≇ true. This reveals
the intuitionistic nature of negation which is crucial to handle the semantics of synchronous
languages in a compositional and fully abstract way [29, 30]. For special signals like activation of
nodes active(v) it is safe to assume active(v)⊕¬active(v) ∼= true since these are state signals and
[...] j lij over literals lij .
Basic controls ζ are used to specify scheduling interaction at the input and output side of a
program block. When used as an output we need to express that ζ occurs delayed after some
maximal number of ics, d say. We write σ, d |= ζ or σ |= d : ζ for this as an abbreviation of
σ′ |= ζ where σ′ = σ(d)σ(d + 1) · · ·σ(|σ| − 1) is the suffix of σ starting after d ics. Note that if
the delay is larger than the length of the execution, d > |σ| − 1 then this suffix is empty σ′ = ∅
and thus σ |= d : ζ for all ζ, even ζ = false is validated. This is natural since by stepping beyond
the final event within a thread’s instant an inconsistent state is reached. We will see that this
may be exploited for optimizations in WCRT analysis. The limit cases are +∞ : false ∼= true
and −∞ : false ∼= false. The specification wait =df 1 : false is of particular interest. It says that
an execution has at most one event, i.e., σ |= wait iff |σ| ≤ 1. If non-empty such an execution
has reached the end of the scheduling instant and is pausing in a final event σ(0) ⊆ S. We permit
wait to be used as a third constant besides true and false inside basic controls. The reaction time
of an execution σ may then either be specified as σ |= d : wait or σ |= d + 1 : false depending on
whether we are interested in the number of steps or the number of events in σ.
– An output control is an expression ψ = ◦ζ1 ⊕ ◦ζ2 ⊕ · · · ⊕ ◦ζn with basic controls ζi. S |= ψ
specifies that schedule S reaches at least one of the controls ζj after a bounded number of ics.
The selection of which ζj is activated is expressed by ⊕ since it is an internal choice which is
dynamically resolved during each execution. Each operator ◦ stands for a possibly different
delay depending on which output ζj is taken. In contrast to this, an output control such as
ψ = ◦(ζ1 ⊕ ζ2 ⊕ · · · ⊕ ζn) only specifies one bound for all exits ζj .
– An input control is an expression φ = ζ1 ∨ ζ2 ∨ · · · ∨ ζm where the disjunction ∨ refers to
the external non-determinism resolved by the environment which determines how a program
block is started. There is also no delay involved which is why we do not need operator ◦.
Formally, S |= φ if there is at least one ζi such that S |= ζi.
Notice the change of quantifiers between input and output regarding executions: S |= ζ1 ∨ ζ2
requires ∃i ∈ {1, 2}.∀σ ∈ S. σ |= ζi and is an external choice whereas S |= ζ1 ⊕ ζ2 is ∀σ ∈ S.∃i ∈
{1, 2}. σ |= ζi which expresses an internal choice.
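The quantifier swap between ∨ and ⊕ can be made concrete on literals. The sketch below is an illustration only: it assumes the reading given above (A holds on σ iff A ∈ σ(0), ¬A holds iff A is not in the last event, and the empty execution validates everything), and the helper names `holds`, `external_choice` and `internal_choice` are hypothetical.

```python
def holds(literal, sigma):
    """Check a literal 'A' or '!A' on one execution (a list of sets)."""
    if not sigma:
        return True  # the empty execution validates every type
    if literal.startswith("!"):
        return literal[1:] not in sigma[-1]  # absent throughout
    return literal in sigma[0]  # present from the start


def external_choice(schedule, literals):
    """S |= z1 v z2: one literal works for all executions."""
    return any(all(holds(l, s) for s in schedule) for l in literals)


def internal_choice(schedule, literals):
    """S |= z1 (+) z2: every execution picks its own literal."""
    return all(any(holds(l, s) for l in literals) for s in schedule)


s1 = [{"A"}]          # execution with A present from the start
s2 = [{"B"}]          # execution with B present from the start
late = [set(), {"A"}]  # A becomes active only during the execution

mixed_internal = internal_choice([s1, s2], ["A", "B"])
mixed_external = external_choice([s1, s2], ["A", "B"])
late_constant = internal_choice([late], ["A", "!A"])
print(mixed_internal, mixed_external, late_constant)
```

The last check mirrors the remark that A ⊕ ¬A is not a tautology: an execution in which A appears late satisfies neither literal.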
Finally, we build (input-output) interface types for KEP program fragments as implications
φ ⊃ ψ between input controls φ = ∨_{i=1}^{m} ζi and output controls ψ = ⊕_{j=1}^{n} ◦ξj . The input
controls φ capture all the possible ways in which the program fragment can be started within an
instant and the output controls sum up the ways in which it can be exited during the instant. In
other words, the ζi and ξj represent logical input and output lines of the program. Intuitively,
S |= φ ⊃ ψ says that whenever any set of executions from schedule S enters the program through
one of the input controls ζi then within some bounded number dij of ics all these executions
are guaranteed to exit through one of the output controls ξj . The bounds dij may depend on
the choice of input and output control, in general. To capture the bounds we associate with
each interface type a delay matrix of shape n × m. Our type specifications then become logical
expressions of the form D : φ ⊃ ψ consisting of an interface type φ ⊃ ψ together with a timing matrix
D. The former describes the qualitative aspect of scheduling, the latter captures the quantitative
part of the interface. Formally, φ ⊃ ψ is a type specification for schedules S and the instrumented
[...]

Fig. 2: The four types of thread paths: through path (a), sink path (b), source path (c), internal path (d).

Let us look at how interface types are used. Figure 2 de-
picts a program fragment T abstracted into a reactive box
with input and output controls. The paths inside T seen in
Figure 2 illustrate the four ways in which a reactive block T
may participate in the execution of a logical tick: Threads
may (a) pass straight through the block entering at some
input control ζ and exiting at output control ξ; (b) enter
through ζ but pausing inside, waiting there for the next in-
stant; (c) start the tick inside the block and eventually (in-
stantaneously) leave through some exit control ξ, or (d) start
inside the block and never leave it during the current instant.
These paths or rather sections of a path are called through
paths, sink paths, source paths and internal paths, respectively.
The canonical interface type for such a block T (considering only one input control ζ and one
output control ξ) separates [...]
$\begin{pmatrix} d_{thr} & d_{src} \\ d_{snk} & d_{int} \end{pmatrix}$ : (ζ ∨ active) ⊃ (◦ξ ⊕ ◦wait)
If one of the paths does not exist its associated delay is set to −∞. A block T can be classified
according to the paths that are executable in it, i.e., that have dtype ≥ 0 (rather than dtype = −∞)
for type ∈ {thr, src, snk, int}. Specifically, we call T a
– through node, if dthr ≥ 0, and Nthr the set of all through nodes
– source node, if dsrc ≥ 0, and Nsrc the set of all source nodes
– sink node, if dsnk ≥ 0, and Nsnk the set of all sink nodes
– internal node, if dint ≥ 0, and Nint the set of all internal nodes.
A delay node is a node with at least one non-instantaneous path, i. e., Ndel = Nsrc ∪Nsnk ∪Nint.
A strong delay node is a delay node without any through path, hence Nsdel = Ndel \ Nthr. A
transient node is a through node that contains only through paths, i.e., dsrc = dsnk = dint = −∞.
Thus Ntrans = Nthr \Ndel. We assume that each cyclic dependency loop in the program is broken
by at least one strong delay node.
It is useful to classify the exits of a node T according to the type of path on which they
can appear. We call an exit instantaneous if it can only be activated on through paths (type a)
and non-instantaneous if it can be reached only by source paths (type c). The successor nodes
reached by them are referred to accordingly as instantaneous successors and non-instantaneous
successors of T . In the KEP assembler graphs of [13] two other types of successor nodes are
distinguished, the control successors and exit successors. Since control successors are activated
either by through paths (type a) or source paths (type c) they constitute the general case from the
WCRT scheduling point of view. The exit successors, which are introduced by exit-trap blocks,
are activated exclusively as part of through paths. Thus, they are instantaneous successors in
our terminology.
[...] : active ⊃ (◦ξ ⊕ ◦wait).
By logical transformations of interfaces various optimizations can be achieved. For instance, for
transient nodes we reduce Tsrf as follows:
Tsrf ∼= (dthr, −∞)^T : ζ ⊃ (◦ξ ⊕ ◦wait) ∼= ζ ⊃ ((dthr : ξ) ⊕ (−∞ : wait))
∼= ζ ⊃ (dthr : ξ ⊕ ((−∞ + 1) : false)) ∼= ζ ⊃ (dthr : ξ ⊕ (−∞ : false))
∼= ζ ⊃ (dthr : ξ ⊕ false) ∼= ζ ⊃ (dthr : ξ)
∼= (dthr) : ζ ⊃ ◦ξ,
while Tdpt = (−∞, −∞)^T : active ⊃ (◦ξ ⊕ ◦wait) ∼= true is simply dropped. Moreover, without
loss of generality we may suppose that Tsrf is normalized so it satisfies dthr ≤ dsnk for all sink
nodes T . Otherwise, if dsnk < dthr we would have
(dthr : ξ) ⊕ (dsnk : wait) ∼= (dthr : ξ) ⊕ (dsnk + 1 : false)
⪯ (dthr : ξ) ⊕ (dsnk + 1 : ξ)
∼= max(dthr, dsnk + 1) : ξ
∼= dthr : ξ,
where φ ⪯ ψ means that all executions satisfying φ also satisfy ψ. In the other direction, it
trivially holds that dthr : ξ ⪯ (dthr : ξ) ⊕ (dsnk : wait). Hence, whenever dsnk < dthr the two types
(dthr : ξ) ⊕ (dsnk : wait) and (dthr : ξ) are equivalent which means essentially that the sink paths
are redundant and thus could be pruned. Operationally, if dsnk < dthr then the through path
(dthr) : ζ ⊃ ◦ξ of T dominates the WCRT while the sink path (dsnk) : ζ ⊃ ◦wait in T cannot
contribute to the longest execution. In this case we might as well assume dsnk = −∞, i.e., that
T is not a sink node at all. The same arguments apply to the depth interfaces Tdpt, for which we
may thus assume dsrc ≤ dint or otherwise dint = −∞.
In general, the interface type of a program T will mention a number of controls ζ1, ζ2, . . . ζm
and ξ1, ξ2, . . . , ξn on the input and output side for which the type would be
T = D : (ζ1 ∨ ζ2 · · · ∨ ζm) ⊃ (◦ξ1 ⊕ ◦ξ2 ⊕ · · · ⊕ ◦ξn) (1)
with a WCRT matrix D of shape n × m. The terminology above can be applied, mutatis mu-
tandis, to such general controls. A composite program will be made up of a number of program
fragments Ti each with its interface Di : φi ⊃ ψi. The total specification is the logical conjunction
∧_i Di : φi ⊃ ψi in WCRT type theory. The basic controls appearing in φi, ψi describe the
[...]
∧_i Di : φi ⊃ ψi ⪯ D : φ ⊃ ψ (2)
in which the individual timing interfaces are combined into a total delay matrix D for an external
interface φ ⊃ ψ such that D is the smallest (component-wise) matrix of values such that (2)
holds. The external interface φ ⊃ ψ determines the functional precision with which we are
computing the WCRT of the composite system. For instance, instead of an interface like (1) which
[...] and ξ =df ◦⊕_{j∈J} ξj might consider merely subsets I ⊆ {1, . . . ,m} and J ⊆ {1, . . . , n} of inputs
and outputs bundled into a single control. Such an interface ζ ⊃ ◦ξ which specifies only one delay
value is more abstract than (1). We can trade off precision and efficiency of the WCRT analysis
within wide margins by choosing different types φi ⊃ ψi for the components and φ ⊃ ψ for the
composite program in (2).
Of course, we do not expect to get an equivalence ∼= but only an inclusion ⪯ in (2) if the
calculation of D involves timing abstractions. In general, the right-hand side of (2) will include
more executions than the left-hand side. E.g., this occurs naturally whenever the composite type
φ ⊃ ψ does not include all internal signals mentioned in the types φi ⊃ ψi. Then the right-hand
side of (2) does not constrain the executions on those internal signals while the left-hand side
[...]

// Esterel Program G:
present I then
  emit R
end present;
[...]

// KEP FRAGMENT G
L04: G0: PRESENT I,G1
L05:     EMIT R
L06: G1: PRESENT I,G3
L07:     GOTO G2
L08: G3: EMIT S
L09:     EMIT T
L10: G2: EMIT U
(c)
Fig. 3: Control Flow G (a), Esterel (b), KEP Assembler (c).

How is the composition (2) per-
formed? In general, we can use any sound
and complete logical calculus for WCRT
type theory as described, e.g., in [28]. For
our special interface types this calculus
reduces to matrix multiplication in max-
plus algebra (N,+,max, 0,−∞) [15] com-
bined with logical reasoning on basic con-
trols ζ, which is a slight generalization of
Boolean algebra. We will explain this in
the following sections by way of our run-
ning example from Fig. 1. More details on
the semantic theory of WCRT algebra can
be found in the appendix, Sec. A.
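As an illustration of the max-plus structure (N, +, max, 0, −∞) mentioned above, here is a minimal Python sketch of matrix multiplication in which addition plays the role of multiplication and max the role of addition. The function name `maxplus_matmul` and the list-of-rows matrix encoding are assumptions made for this example, not notation from the report.

```python
NEG_INF = float("-inf")  # neutral element of max: "no path"


def maxplus_matmul(a, b):
    """Max-plus product: c[i][j] = max over k of (a[i][k] + b[k][j])."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("matrix shapes do not connect up")
    cols = len(b[0])
    return [
        [max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# The max-plus identity has 0 on the diagonal and -inf elsewhere.
identity = [[0, NEG_INF], [NEG_INF, 0]]
column = [[1], [1]]
unchanged = maxplus_matmul(identity, column)

# A delay of 1 on the first line only, applied after `column`.
lifted = [[1, NEG_INF], [NEG_INF, 0]]
result = maxplus_matmul(lifted, column)
print(unchanged, result)
```

Rows index output controls and columns index input controls, matching the n × m shape of the delay matrices.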
5.2 Instantaneous Behavior:
Transient Nodes
Let us begin by illustrating the role
of type specifications for WCRT in the
single-threaded execution of transient
nodes. Although in this case WCRT anal-
ysis is equivalent to computing longest
paths and straight-forward even without
timing interfaces, it will give a first idea of
the power of types for modularization and
abstraction. This will lay out the play-
ground in which we can later deal with
Esterel-style pausing, preemption, multi-
threading, etc. General synchronous flow
graphs live at a higher level of abstraction in which control dependencies are more implicit and
thus WCRT analysis no longer identical to the longest path problem.
Our example is the sequential program G of Fig. 3 which is the fragment of Fig. 1c consisting
of nodes v1–v7. Each of them is a KEP assembler instruction which is entered either sequentially
through its instruction number L4–L10 or through an explicit jump to a control flow label such
as G0–G3. For instance, node v3 is accessed both through its linear instruction number L6 as
well as by jump to its label G1. In contrast, node v4 is only accessed through its line number
L7 while node v5 only by jumping to its label G3. The present nodes v1 and v3 are tests which
branch to their two successor instructions depending on the status of signal I. If I is present
then v1 moves to instruction v2 which immediately follows it and if I is absent then v1 passes
control to instruction v3 by jumping to label G1.
Let us assume that each of the basic instructions present, emit, goto takes 1 processor cycle
[...]
[...] : (L6 ∨ G1) ⊃ ◦L7 ⊕ ◦G3
[...]
[...] : (G2 ∨ L10) ⊃ ◦L11.
Logically speaking, the problem of WCRT for G amounts to obtaining the tightest bound d such
[...]

Fig. 4: The path p1 in G.

Weaving Paths The naive strategy would be to enumer-
ate all paths from G0 to L11, sum up the delays on each
path and then take the maximum. There are four paths
p1 = G0L5L6L7G2L11, p2 = G0G1L7G2L11, p3 =
G0L5L6G3L9L10L11 and p4 = G0G1G3L9L10L11.
Each of these paths defines a sub-graph of G with specific side-
inputs and side-outputs. For instance, p1 as indicated in Fig. 4
has the side-outputs G1, G3 and side-inputs G1, L10 so that
its full scheduling type is G0∨L10∨G1 ⊃ ◦G1⊕◦G3⊕◦L11.
This type says that if p1 gets activated through control edges
G0, L10 or G1 then it must be terminated through one of the
exits G1, G3 or L11. The timing matrix associated with this
type is
[...]
The entries −∞ in D1 mean that there is no causal control
flow from the corresponding input to the corresponding out-
put line. D1 can be obtained by successively multiplying (in
max-plus algebra) the timing matrices of the individual nodes
traversed by p1. We explain below in Sec. 5.2 how this is done.
At this point note that for WCRT of G we are not actually
interested in the exact timing of the side-inputs, i.e., the fact
that p1 can also be executed from L10 and G1. One way to suppress this information is by
pre-composing D1 with the timing map (0, −∞, −∞)^T : G0 ⊃ ◦G0 ⊕ ◦L10 ⊕ ◦G1 giving
p′1 = D′1 : G0 ⊃ ◦G1 ⊕ ◦G3 ⊕ ◦L11 (3)
where D′1 = D1 · (0, −∞, −∞)^T = (1, 3, 5)^T.
The type (3) stipulates that every execution which enters path p′1 through input G0 either leaves
p1 through exit G1 in at most 1 ic, through exit G3 in at most 3 ic or through L11 within
no more than 5 ics. D′1 in contrast to D1 no longer records the exact delay between particular
combinations of inputs and outputs of p1. For instance, the fact that L11 is reached in p1 from
L10 in only 1 ic of time and from G1 in at most 3 ics instead of 5, is lost. Similarly, the information
that from input L10 we cannot reach output G3 at all, indicated by the entry −∞ in D1 is not
present in D′1 any more.
In talking about executions along p1 we also assume that the path is not exited through
G1 or G3. Can we suppress the references to side-outputs G1 and G3 on the right of ⊃ of the
scheduling type (3)? Well, not directly, because if we simply drop them the resulting scheduling
type G0 ⊃ ◦L11 would guarantee that all executions entering p1 through G0 eventually come
out at L11. This is obviously false. However, we can fix this using negations and conjunctions.
The right type is (G0∧¬G1∧¬G3) ⊃ ◦L11 which strengthens the assumption G0 by ¬G1∧¬G3
to the effect that only executions starting in G0 not involving G1 or G3 are considered. Indeed
we can construct a timing for this type and justify its semantic correctness as follows: First, by
purely logical reasoning on types (not involving any matrix calculations) we argue that D′1 which
has type G0 ⊃ ◦G1⊕◦G3⊕◦L11 also must have type (G0∧¬G1∧¬G3) ⊃ ◦false⊕◦false⊕◦L11
and then compose with the sound schedule (−∞,−∞, 0) : (false ∨ false ∨L11) ⊃ ◦L11 to obtain
p1 = D′′1 : (G0 ∧ ¬G1 ∧ ¬G3) ⊃ ◦L11    D′′1 = (−∞, −∞, 0) · D′1 = (5). (4)
which is the proper type specification of path p1 without side-inputs and side-outputs. For the
other paths we get in a similar fashion
p2 = D′′2 : (G0 ∧ ¬L5 ∧ ¬G3) ⊃ ◦L11    D′′2 = (4) (5)
p3 = D′′3 : (G0 ∧ ¬G1 ∧ ¬L7) ⊃ ◦L11    D′′3 = (6) (6)
p4 = D′′4 : (G0 ∧ ¬L5 ∧ ¬L7) ⊃ ◦L11    D′′4 = (5). (7)
The path schedules (4)–(7) can now be woven together to obtain the final result G = (6) :
G0 ⊃ ◦L11. First, recall that [D1, D2] : (φ1 ∨ φ2) ⊃ ◦L11 is the sum of D1 : φ1 ⊃ ◦L11 and
D2 : φ2 ⊃ ◦L11. So we can combine (4)–(7) as D′′ =df [(5), (4), (6), (5)] = (5, 4, 6, 5) with the
type
D′′ : ((G0 ∧ φ1) ∨ (G0 ∧ φ2) ∨ (G0 ∧ φ3) ∨ (G0 ∧ φ4)) ⊃ ◦L11 (8)
in which φ1 =df ¬G1 ∧ ¬G3, φ2 =df ¬L5 ∧ ¬G3, φ3 =df ¬G1 ∧ ¬L7 and φ4 =df ¬L5 ∧ ¬L7. We
pre-compose (8) with the timing (0, 0, 0, 0)^T : ⊕_{i=1}^{4}(G0 ∧ φi) ⊃ ⊕_{i=1}^{4} ◦(G0 ∧ φi):
(5, 4, 6, 5) · (0, 0, 0, 0)^T = (6) : ((G0 ∧ φ1) ⊕ (G0 ∧ φ2) ⊕ (G0 ∧ φ3) ⊕ (G0 ∧ φ4)) ⊃ ◦L11. (9)
Then consider that by distributivity ⊕_{i=1}^{4}(G0 ∧ φi) is logically equivalent to G0 ∧ ⊕_{i=1}^{4} φi.
Moreover, under single-threaded execution (and only then; see footnote 2) the type equivalence
⊕_{i=1}^{4} φi = (¬G1 ∧ ¬G3) ⊕ (¬L5 ∧ ¬G3) ⊕ (¬G1 ∧ ¬L7) ⊕ (¬L5 ∧ ¬L7) ≡ true
holds in G since every execution thread must make a split decision for either exit L5 or G1 at
node v1 and for either L7 or G3 at node v3. Hence, every thread satisfies one of the summands
¬G1 ∧ ¬G3, ¬L5 ∧ ¬G3, ¬G1 ∧ ¬L7 or ¬L5 ∧ ¬L7. Taking all together gives ⊕_{i=1}^{4}(G0 ∧ φi) ≡
G0 ∧ ⊕_{i=1}^{4} φi ≡ G0 ∧ true ≡ G0 and thus (9) turns into (6) : G0 ⊃ ◦L11 which finally is the
WCRT of graph G.
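The naive path-weaving strategy amounts to enumerating all node paths and taking the maximum of their lengths. The sketch below assumes the successor relation read off the KEP fragment of Fig. 3 (v1 branches to v2 or v3, v3 branches to v4 or v5) and a cost of 1 ic per node; the dictionary `SUCC` and the function `paths` are illustrative names.

```python
# Successor relation of nodes v1..v7 of G, derived from the KEP fragment.
SUCC = {
    "v1": ["v2", "v3"],  # PRESENT I,G1: fall through (L5) or jump (G1)
    "v2": ["v3"],        # EMIT R, then L6
    "v3": ["v4", "v5"],  # PRESENT I,G3: fall through (L7) or jump (G3)
    "v4": ["v7"],        # GOTO G2
    "v5": ["v6"],        # EMIT S
    "v6": ["v7"],        # EMIT T
    "v7": [],            # EMIT U, exit at L11
}


def paths(node):
    """Yield every node path from `node` to the exit of G."""
    if not SUCC[node]:
        yield [node]
        return
    for nxt in SUCC[node]:
        for rest in paths(nxt):
            yield [node] + rest


all_paths = list(paths("v1"))
delays = sorted(len(p) for p in all_paths)  # 1 ic per instruction
wcrt = max(delays)
print(delays, wcrt)
```

The four path lengths correspond to the delays of p2, p1, p4 and p3, and the maximum is the bound obtained in (9).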
Weaving Nets WCRT analysis by enumeration of paths, though sound, is of worst-case exponential
complexity. A more efficient way of going about it is to exploit dynamic programming
techniques. In the following we illustrate this process in terms of the decomposition of G seen
in Fig. 5. The strategy is to propagate WCRT information forward through G describing and
composing (in this case) sub-nets N1, N2, N3 rather than paths. In doing so we extend the
scheduling interfaces appropriately in order to make the matrices match up:
Footnote 2: For multi-threaded execution several exits may be taken in one and the same run by
different threads. Also observe that, in contrast to ⊕, the disjunction φ1 ∨ φ2 ∨ φ3 ∨ φ4 is not
equivalent to true! This would hold under static and deterministic scheduling in which all
executions are in one of the schedules φi. Since the exits from statements v1 and v3 are
data-dependent, different executions may choose different paths.

Fig. 5: Decomposition of G.

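1. We obtain the scheduling type of N1 by combining v1 of type
G0 ⊃ ◦L5 ⊕ ◦G1 with v2 of type L5 ⊃ ◦L6. To compose them we
first lift v2 to type L5 ∨ G1 ⊃ ◦L6 ⊕ ◦G1 by pre-multiplying it with
(0, −∞)^T : L6 ⊃ ◦L6 ⊕ ◦G1, which gives
v′2 = (0, −∞)^T · (1) = (1, −∞)^T : L5 ⊃ ◦L6 ⊕ ◦G1.
Next we combine v′2 with (−∞, 0)^T : G1 ⊃ ◦L6 ⊕ ◦G1 into
v′′2 = [v′2, (−∞, 0)^T] = $\begin{pmatrix} 1 & -\infty \\ -\infty & 0 \end{pmatrix}$ : L5 ∨ G1 ⊃ ◦L6 ⊕ ◦G1,
where as before [D1, D2] : (φ1 ∨ φ2) ⊃ ψ is the sum of D1 : φ1 ⊃ ψ
and D2 : φ2 ⊃ ψ. The shapes of matrices v1 and v′′2 now connect up:
N1 = v′′2 · v1 = $\begin{pmatrix} 1 & -\infty \\ -\infty & 0 \end{pmatrix}$ · (1, 1)^T = (2, 1)^T : G0 ⊃ ◦L6 ⊕ ◦G1.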
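2. Next we construct the WCRT of block N2 : (L6 ∨ G1) ⊃
◦G2 ⊕ ◦G3 by composing v3 : (L6 ∨ G1) ⊃ ◦L7 ⊕ ◦G3 with v4 :
L7 ⊃ ◦G2. With the help of (0, −∞)^T : G2 ⊃ ◦G2 ⊕ ◦G3 and
(−∞, 0)^T : G3 ⊃ ◦G2 ⊕ ◦G3 we extend the schedule of v4 to become
v′4 = [(0, −∞)^T · (1), (−∞, 0)^T] = $\begin{pmatrix} 1 & -\infty \\ -\infty & 0 \end{pmatrix}$ : (L7 ∨ G3) ⊃ ◦G2 ⊕ ◦G3,
so that
N2 = v′4 · v3 = $\begin{pmatrix} 1 & -\infty \\ -\infty & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}$ : (L6 ∨ G1) ⊃ ◦G2 ⊕ ◦G3.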
3. The third step is to build N3 : (G2 ∨ G3) ⊃ ◦L11 from v5 : G3 ⊃ ◦L9, v6 : L9 ⊃ ◦L10 and
v7 : (G2 ∨ L10) ⊃ ◦L11. v5 and v6 can be directly multiplied: v6 · v5 = (1) · (1) = (2) : G3 ⊃ ◦L10.
Then we lift v6 · v5 to type (G2 ∨ G3) ⊃ ◦G2 ⊕ ◦L10 and pre-multiply with v7:
N3 = (1, 1) · $\begin{pmatrix} 0 & -\infty \\ -\infty & 2 \end{pmatrix}$ = (1, 3) : (G2 ∨ G3) ⊃ ◦L11.
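4. If we compose the three sub-nets N1, N2, N3 in sequence, our schedule of G all the way
from entry point G0 to exit L11 is complete:
G = N3 · N2 · N1 = (1, 3) · $\begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}$ · (2, 1)^T = (1, 3) · (4, 3)^T = (6) : G0 ⊃ ◦L11.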
This is indeed the weight of the longest path p3.
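The four weaving steps can be replayed numerically. This is a sketch under stated assumptions: every node costs 1 ic, rows index outputs and columns index inputs, and the lifted matrices are derived from the node types quoted in the steps rather than copied from the report. The helper `mm` and all variable names are hypothetical.

```python
NEG_INF = float("-inf")


def mm(a, b):
    """Max-plus matrix product (max replaces sum, + replaces times)."""
    return [
        [max(row[k] + b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for row in a
    ]


# Step 1: N1 : G0 -> L6 (+) G1
v1 = [[1], [1]]                          # G0 -> L5, G0 -> G1
v2_lifted = [[1, NEG_INF], [NEG_INF, 0]]  # L5 -> L6 costs 1, G1 passes through
N1 = mm(v2_lifted, v1)

# Step 2: N2 : (L6 v G1) -> G2 (+) G3
v3 = [[1, 1], [1, 1]]                    # both entries reach L7 and G3 in 1 ic
v4_lifted = [[1, NEG_INF], [NEG_INF, 0]]  # L7 -> G2 costs 1, G3 passes through
N2 = mm(v4_lifted, v3)

# Step 3: N3 : (G2 v G3) -> L11
v65_lifted = [[0, NEG_INF], [NEG_INF, 2]]  # G2 passes through, G3 -> L10 costs 2
v7 = [[1, 1]]                             # G2 -> L11, L10 -> L11
N3 = mm(v7, v65_lifted)

# Step 4: sequential composition of the three sub-nets
G = mm(N3, mm(N2, N1))
print(N1, N2, N3, G)
```

Each product touches only small matrices, which is why this forward propagation avoids the exponential blow-up of explicit path enumeration.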
Bundling Abstractions There are of course other ways of arriving at (an approximation
of) the WCRT, corresponding to different network decompositions of G. It is also possible to
condense the timing information by bundling the inputs and outputs of N1, N2, N3 before they
are composed. For instance, one might decide to compress the scheduling type of N1 into a single
entry-exit delay N1′ : G0 ⊃ ◦(L6 ⊕ G1) which specifies the worst-case delay for an execution
entering through G0 to come out at L6 or G1, without distinguishing between threads exiting
on L6 and those exiting on G1. This is applied also to N2 and N3 as indicated in Fig. 6.

Fig. 6: Bundle Abstraction of G.

This compression is done algebraically by pre-composing N1
with (0, 0) : (L6 ∨ G1) ⊃ ◦(L6 ⊕ G1) which yields N1′ = (0, 0) · N1 =
(0, 0) · (2, 1)^T = (2) : G0 ⊃ ◦(L6 ⊕ G1). In the same way, we could
[...] = (3) : (G2 ⊕ G3) ⊃ ◦L11.
Each of the scheduling types Ni′ is a max-abstraction of the original
interface Ni. This is seen from the fact that semantically Ni ⊆ Ni′,
i.e., each execution satisfying Ni is also an execution under schedule
Ni′. Thus, the timing value associated with Ni′ is an upper bound
of all schedules in Ni. Yet, it is not exact because Ni′ is properly
larger than Ni. For instance, N1′ contains an execution of duration 2
which exits on G1 while N1 (exactly) states that all threads leaving
through G1 consume no more than 1 ic. The same is true of the other
sub-nets N2′ and N3′. As a result of this imprecision the composition
N3′ · N2′ · N1′ = (3) · (2) · (2) = (7) yields an over-approximation of
G’s WCRT.
Why would we want to abstract the sub-nets in this way and
lose exactness? The answer is that composing N1′, N2′, N3′ is more
efficient since it involves only scalars rather than matrices.
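The trade-off between the exact and the bundled composition can be checked with a few lines. In this sketch N1 = (2, 1)^T is taken from the text, while N2 and N3 are matrices derived under the 1-ic-per-instruction assumption; `bundle` and `mm` are illustrative helper names.

```python
NEG_INF = float("-inf")


def mm(a, b):
    """Max-plus matrix product."""
    return [
        [max(row[k] + b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for row in a
    ]


def bundle(matrix):
    """Max-abstraction: collapse all inputs and outputs into one delay."""
    return max(max(row) for row in matrix)


N1 = [[2], [1]]        # G0 -> L6, G0 -> G1
N2 = [[2, 2], [1, 1]]  # (L6, G1) -> (G2, G3), derived
N3 = [[1, 3]]          # (G2, G3) -> L11, derived

exact = mm(N3, mm(N2, N1))[0][0]
bundled = [bundle(N1), bundle(N2), bundle(N3)]
approx = sum(bundled)  # scalar max-plus product is ordinary addition
print(exact, bundled, approx)
```

The bundled result is a sound upper bound that is one ic larger than the exact value, obtained with scalar additions only.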
Data Dependency and Degrees of Precision A full and exact
WCRT specification encapsulating program G as a component would
require mention of program labels G1, G3, G2 which are accessible
from outside for jump statements. Therefore, the interface type for
single-threaded scheduling of G would be
G = (6, 4, 3, 1) : (G0 ∨ G1 ∨ G3 ∨ G2) ⊃ ◦L11.
This is still not the exact description of G since it does not express the dependency of the WCRT
on signal I. If I is present then all threads must take control edges L5 and L7 rather than G1
or G3 which are blocked. If I is absent then both G1 and G3 must be taken instead. As a result
the longest path p3 = G0L5L6G3L9L10L11 with delay 6 is not executable. To capture this we
consider signal I as just another control input and refine the WCRT scheduling type of G as
follows:
G = (5, 5, 3, 4, 3, 1) : ((G0 ∧ I) ∨ (G0 ∧ ¬I) ∨ (G1 ∧ I) ∨ (G1 ∧ ¬I) ∨ G3 ∨ G2) ⊃ ◦L11. (10)
The inclusion of signal I in the interface has now resulted in the distinction of two different delay
values 3 and 4 for G1 ⊃ ◦L11 depending on whether I is present or absent during the reaction.
On the other hand, G0 split into controls G0 ∧ I and G0 ∧ ¬I produces the same delay of 5 ics
in both cases, which is a decrease of WCRT compared to 6 : G0 ⊃ ◦L11 from above. Assuming
that input signal I is causally stable, i.e., I ⊕ ¬I ∼= true it is possible to optimize the interface
without losing precision: since (G0∧ I)⊕ (G0∧¬I) ∼= G0∧ (I ⊕¬I) ∼= G0∧ true ∼= G0 the map
(0, 0)^T : G0 ⊃ ◦(G0 ∧ I) ⊕ ◦(G0 ∧ ¬I) is sound and can be used to compress the two entries of
value 5 in (10) into a single value 5 = max(5, 5) giving
G = (5, 3, 4, 3, 1) : (G0 ∨ (G1 ∧ I) ∨ (G1 ∧ ¬I) ∨ G3 ∨ G2) ⊃ ◦L11.
In the same vein, but this time without referring to stability, we could further bundle G1 ∧ I
and G3 into a single control with the single delay (3) : ((G1∧ I)⊕G3) ⊃ ◦L11 at the same level
of precision.
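The effect of treating I as a control input can be reproduced by following the single executable path for each status of I. The sketch assumes the branch behaviour of the KEP fragment in Fig. 3 (a present node falls through when I is present and jumps when I is absent) and 1 ic per instruction; the table `NEXT`, the map `ENTRY` and the function `delay` are hypothetical names.

```python
# Successor of each node of G as a function of the status of signal I.
NEXT = {
    "v1": lambda i: "v2" if i else "v3",  # PRESENT I,G1
    "v2": lambda i: "v3",
    "v3": lambda i: "v4" if i else "v5",  # PRESENT I,G3
    "v4": lambda i: "v7",
    "v5": lambda i: "v6",
    "v6": lambda i: "v7",
    "v7": lambda i: None,                 # exit at L11
}

# Entry controls of G and the node each one activates.
ENTRY = {"G0": "v1", "G1": "v3", "G3": "v5", "G2": "v7"}


def delay(control, i_present):
    """Instruction cycles from an entry control to L11 for a fixed I."""
    node, cycles = ENTRY[control], 0
    while node is not None:
        cycles += 1
        node = NEXT[node](i_present)
    return cycles


interface = {
    (c, i): delay(c, i) for c in ENTRY for i in (True, False)
}
print(interface)
```

With I fixed, the entry G0 needs 5 ics in both cases, so the value 6 of the I-oblivious interface is never attained.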
Still, if we only ever intend to use G as a composite transient node G0 ⊃ ◦L11 the typing
G = (5) : G0 ⊃ ◦L11 might be sufficient. The different WCRT type specifications of G are
summarized in Fig. 7. Considering that G is a transient block, i.e., it does not have any delay
nodes, the depth interface of G is trivial: Gdpt = (−∞, −∞)^T : active ⊃ ◦L11 ⊕ ◦wait . Though
[...]

(6) : G0 ⊃ ◦L11 (worst-case abstraction from I)
(6, 4, 3, 1)T : (G0 ∨ G1 ∨ G3 ∨ G2) ⊃ ◦L11
(5, 3, 4, 3, 1)T : (G0 ∨ (G1 ∧ I) ∨ (G1 ∧ ¬I) ∨ G3 ∨ G2) ⊃ ◦L11
(5, 3, 4, 1)T : (G0 ∨ ((G1 ∧ I) ⊕ G3) ∨ (G1 ∧ ¬I) ∨ G2) ⊃ ◦L11
(5) : G0 ⊃ ◦L11 (accounting for dependency on I)
Fig. 7: Component G and some of its WCRT Types with varying precision and conciseness.
All the operations performed above are supported by semantically sound proof rules in WCRT
type theory. The logical manipulation of types often can be also done implicitly and hard-coded
into the graph-theoretic search strategies that make up the cleverness of a particular WCRT
algorithm. Where interface types are not used directly in the calculations they provide for a
highly compositional fine-analysis which allows us to validate our WCRT algorithms in terms of
precise statements about correctness and exactness. Due to their logical-symbolic nature WCRT
interfaces can be applied in rather general situations which involve data and higher control-flow
constructs as used in synchronous programming. Some aspects of the latter will be expounded
in the following sections.
5.3 Sequential Behavior: Delay Nodes
Now we take a look at sequential control flow which initiates and terminates in pause and halt
nodes. We illustrate how these are related to the scheduling types active and wait . We use the
example seen in Fig. 8 which is the fragment of nodes v8–v15 from our running example in Fig. 1a.
Nodes wabort, emit, goto, present, nothing are transient and specified as before in Sec. 5.2. But
now the instantaneous paths are broken by the delay nodes v9 and v13.
Consider the pause node v9. It can be entered by two controls, the line number L12 and
program label H3 and left via two kinds of exits, a non-instantaneous edge L13 and an
instantaneous exit H1 (weak abortion). When a control thread enters v9 then either it terminates the
current instant inside the node or leaves through the weak abort H1 (data-dependent, if signal
I is present) continuing the current instant, instantaneously. A thread entering v9 never exits
through L13 in the same instant. On the other hand, if a thread is started (resumed) from inside
the pause node v9 then control can only exit through L13. This suggests to specify the pause
[...]
$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ : (H3 ∨ L12) ⊃ ◦H1 ⊕ ◦wait (11)
(1) : active(v9) ⊃ ◦L13 (12)
[...]
  when immediate I;
[...]
L15: H1: PRESENT E,H2
L16:     HALT
L17: H2: EMIT S
L18:     NOTHING
(c)
Fig. 8: Sequential Control Flow H (a), Esterel program (b) and KEP Assembler (c).

The specification (11) says that if pause is entered through H3 or L12 it can be left through H1
or terminate (wait) inside the pause. In both cases execution takes 1 instruction cycle, either
to move the program counter forward to H1 or to reach an internal wait state. Since there are
no differences in the delays we could bundle the inputs H3, L12 and compress the matrix as
(1, 1)^T : (H3 ⊕ L12) ⊃ ◦H1 ⊕ ◦wait or even (1) : (H3 ⊕ L12) ⊃ ◦(H1 ⊕ wait) without losing
information over (11). Still, we could do even better and record the dependency of control on
signal I:
$\begin{pmatrix} 1 & -\infty \\ -\infty & 1 \end{pmatrix}$ : ((H3 ⊕ L12) ∧ I) ∨ ((H3 ⊕ L12) ∧ ¬I) ⊃ ◦H1 ⊕ ◦wait .
This separates the threads which must stop inside the pause from those which must leave via H1
due to a weak immediate abort on signal I.
The specification (12) accounts for threads starting in the pause which must necessarily pass
control to L13 within one instruction cycle. This is why (12) does not include a source path
of type active(v9) ⊃ ◦H1. In a similar way we can model strong or non-immediate aborts. For a
delayed (non-immediate) weak abort, there is no path from L12 to H1. For a strong abort, the
abortion cannot be taken in the instant when v9 is entered, thus there is no path from H3 or
L12 to H1.
The halt node v13 in Fig. 8 (equivalent to an infinitely pausing loop pause, but faster) is not
only a sink for control threads entering through L16 but it also has an internal path of length 1
(which is repeated at every instant). It is a strong delay node and specified by
(1, 1) : (active(v13) ∨ L16) ⊃ ◦wait . (13)
Now let us determine the WCRT of H, essentially refining the strategy informally described
in [13]. First, generalizing the notion of surface interfaces, we define the surface reaction time
of a control edge A of graph H as the (component-wise) smallest vector A.srf such that A.srf :
A ⊃ ◦L19 ⊕ ◦wait is derivable from H's WCRT theory.³ Obviously, the value H0.srf is
the surface interface Hsrf of program H in Fig. 8 seen as a reactive box with (sole) input H0.
Following the strategy of the WCRT algorithm of [13], the computation of H0.srf proceeds by
depth-first search forward from H0: We compose node v8 of type (1) : H0 ⊃ ◦L12 with the
through part (1, 1)^T : L12 ⊃ ◦H1 ⊕ ◦wait of v9's interface given in (11) to get (1, 1)^T · (1) =
(2, 2)^T : H0 ⊃ ◦H1 ⊕ ◦wait. This reduces the computation of H0.srf to that of computing
H1.srf : H1 ⊃ ◦L19 ⊕ ◦wait via H0.srf = [H1.srf, (−∞, 0)^T] · (2, 2)^T, where [H1.srf, (−∞, 0)^T]
is the lifting of H1.srf : H1 ⊃ ◦L19 ⊕ ◦wait to type (H1 ∨ wait) ⊃ ◦L19 ⊕ ◦wait so it can be
composed with (2, 2)^T. As shown below we eventually get H1.srf = (3, 2)^T and thus
$$H_{\mathit{srf}} = H0.\mathit{srf} = [H1.\mathit{srf}, (-\infty, 0)^T] \cdot (2, 2)^T = \begin{pmatrix} 3 & -\infty \\ 2 & 0 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix} : H0 \supset \circ L19 \oplus \circ\mathit{wait} \tag{14}$$
which tells us that the longest through path (H0 L12 H1 H2 L18 L19) has 5 ics and the longest
sink path (H0 L12 H1 L16) consumes 4 ics.
³ In [13] this number is denoted A.inst and computed for complete programs in which all paths are
terminated by pause or halt nodes. Dangling exits like L19 of Fig. 8 are not considered. Also, the
WCRT is defined for nodes rather than edges as we do here. These are minor differences, however.
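The composition steps behind (14) can be replayed mechanically. The following is a minimal Python sketch, assuming that the operator · is matrix multiplication in which addition is replaced by maximum and multiplication by addition (max-plus), with rows indexing the exits (L19, wait) and columns indexing the entries. The helper name maxplus_matmul and all variable names are illustrative and do not appear in the original.

```python
NEG_INF = float("-inf")  # delay of a path that does not exist

def maxplus_matmul(a, b):
    """Max-plus matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

v8 = [[1]]                      # (1) : H0 ⊃ ◦L12
v9_entry = [[1], [1]]           # (1, 1)^T : L12 ⊃ ◦H1 ⊕ ◦wait, taken from (11)
h1_srf_lifted = [[3, NEG_INF],  # first column: H1.srf = (3, 2)^T as stated in the text
                 [2, 0]]        # second column: (−∞, 0)^T for a thread that already waits

h0_to_h1 = maxplus_matmul(v9_entry, v8)           # (2, 2)^T : H0 ⊃ ◦H1 ⊕ ◦wait
h0_srf = maxplus_matmul(h1_srf_lifted, h0_to_h1)  # (5, 4)^T : H0 ⊃ ◦L19 ⊕ ◦wait
print(h0_to_h1, h0_srf)
```

The two components of h0_srf correspond to the longest through path (5 ics) and the longest sink path (4 ics) named above.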
Like the surface interfaces, the depth interfaces, too, may be generalized: Each node v has
a depth reaction time⁴ in H, v.dpt, which is the smallest vector m derivable such that m :
active(v) ⊃ ◦L19 ⊕ ◦wait. It characterizes the maximal duration of an instant which starts inside
v, either leaving at exit L19 or terminating in the pause or halt nodes of H. It only needs to
be computed for source nodes and internal nodes. For all others we have v.dsrc = v.dint = −∞
which implies v.dpt = −∞, too. In graph H of Fig. 8 only the pause node v9 and the halt node
v13 have a source or internal WCRT.
For instance, the depth interface of (13) gives delay 1 : active(v13) ⊃ ◦wait for the halt. Since
this internal path is the only way to start in v13 and eventually terminate or exit H, the depth
reaction time of v13 is
v13.dpt = (−∞, 1)^T : active(v13) ⊃ ◦L19 ⊕ ◦wait.
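In contrast, the depth interface (12) of the pause node v9 only contains a source path (1) :
active(v9) ⊃ ◦L13. All source paths active(v9) ⊃ ◦L19 ⊕ ◦wait of H must have this source path
as their prefix. So, we must first determine the worst case delay of type L13 ⊃ ◦L19 ⊕ ◦wait, which
is nothing but L13.srf. Again by depth-first search we get L13.srf from functional composition
$$L13.\mathit{srf} = H3.\mathit{srf} \cdot (1) \cdot (1) = H3.\mathit{srf} \cdot (2) : L13 \supset \circ L19 \oplus \circ\mathit{wait} \tag{15}$$
from nodes v10 of type (1) : L13 ⊃ ◦L14 and v11 of type (1) : L14 ⊃ ◦H3 together with the
surface delay H3.srf : H3 ⊃ ◦L19 ⊕ ◦wait. From (11) we extract (1, 1)^T : H3 ⊃ ◦H1 ⊕ ◦wait. This
type specifies two ways for termination: either directly inside the pause or continuing via H1.srf :
H1 ⊃ ◦L19 ⊕ ◦wait. We can compute the latter using v12's type (1, 1)^T : H1 ⊃ ◦H2 ⊕ ◦L16 as
follows: We first combine v13's surface type (1) : L16 ⊃ ◦wait from (13) and the interfaces of both
v14 and v15, which together give (2) : H2 ⊃ ◦L19, into
$$\begin{pmatrix} 2 & -\infty \\ -\infty & 1 \end{pmatrix} : (H2 \lor L16) \supset \circ L19 \oplus \circ\mathit{wait}$$
and then compose with v12's type to obtain
$$H1.\mathit{srf} = \begin{pmatrix} 2 & -\infty \\ -\infty & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} : H1 \supset \circ L19 \oplus \circ\mathit{wait}.$$
We embed this information canonically as [H1.srf, (−∞, 0)^T] : (H1 ∨ wait) ⊃ ◦L19 ⊕ ◦wait and
compose with (1, 1)^T : H3 ⊃ ◦H1 ⊕ ◦wait for the final result
$$H3.\mathit{srf} = \begin{pmatrix} 3 & -\infty \\ 2 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 3 \end{pmatrix} : H3 \supset \circ L19 \oplus \circ\mathit{wait}.$$
Now we plug this into (15) and get
$$L13.\mathit{srf} = \begin{pmatrix} 4 \\ 3 \end{pmatrix} \cdot (2) = \begin{pmatrix} 6 \\ 5 \end{pmatrix} : L13 \supset \circ L19 \oplus \circ\mathit{wait}.$$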
⁴ This number is denoted v.next in [13].
Finally, the depth reaction time of v9 is obtained by composing L13.srf with (1) : active(v9) ⊃
◦L13, obtaining
v9.dpt = L13.srf · (1) = (7, 6)^T : active(v9) ⊃ ◦L19 ⊕ ◦wait.
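The chain of compositions leading to v9.dpt can be sketched in Python as follows. This is an illustration under two stated assumptions: the operator · is read as a max-plus matrix product, and the block matrix named exits is assembled here from the types (2) : H2 ⊃ ◦L19 and (1) : L16 ⊃ ◦wait mentioned in the text. The names maxplus_matmul, lift and exits are hypothetical helpers, not part of the original.

```python
NEG_INF = float("-inf")  # delay of a path that does not exist

def maxplus_matmul(a, b):
    """Max-plus matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lift(srf):
    """Lift a column (through, sink) to [srf, (−∞, 0)^T] with extra input wait."""
    return [[srf[0][0], NEG_INF],
            [srf[1][0], 0]]

# Rows: exits (L19, wait). Columns: entries (H2, L16). Assembled by assumption.
exits = [[2, NEG_INF],   # H2 reaches L19 after 2 ics (v14, v15); L16 never reaches L19
         [NEG_INF, 1]]   # H2 never waits; L16 waits after 1 ic (halt node v13)
v12 = [[1], [1]]         # (1, 1)^T : H1 ⊃ ◦H2 ⊕ ◦L16
v9_entry = [[1], [1]]    # (1, 1)^T : H3 ⊃ ◦H1 ⊕ ◦wait, extracted from (11)

h1_srf = maxplus_matmul(exits, v12)              # (3, 2)^T
h3_srf = maxplus_matmul(lift(h1_srf), v9_entry)  # (4, 3)^T
l13_srf = maxplus_matmul(h3_srf, [[2]])          # (6, 5)^T, equation (15)
v9_dpt = maxplus_matmul(l13_srf, [[1]])          # (7, 6)^T
print(h1_srf, h3_srf, l13_srf, v9_dpt)
```

Each intermediate value is a column vector whose first entry bounds the paths ending in L19 and whose second entry bounds the paths ending in wait.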
Having v9.dpt and v13.dpt available we can compute the depth interface of block H: Since
the instant can start in the pause node v9 or the halt node v13 the active control of H is
active = active(v9) ⊕ active(v13). So, we are looking for the smallest bound for type (active(v9) ⊕
active(v13)) ⊃ ◦L19 ⊕ ◦wait. Formally, we get this by combining v9.dpt and v13.dpt into the
matrix [v9.dpt, v13.dpt] : (active(v9) ∨ active(v13)) ⊃ ◦L19 ⊕ ◦wait and then pre-composing with
(0, 0)^T : (active(v9) ⊕ active(v13)) ⊃ ◦active(v9) ⊕ ◦active(v13). This gives
$$H_{\mathit{dpt}} = \begin{pmatrix} 7 & -\infty \\ 6 & 1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 7 \\ 6 \end{pmatrix} : \mathit{active} \supset \circ L19 \oplus \circ\mathit{wait}.$$
In general, Hdpt is computed as the (component-wise) maximum over all depth reaction times
for all delay nodes in H. This is perfectly uniform in the interfaces of H's nodes since we could
well include all nodes into the maximum, max{vi.dpt | 8 ≤ i ≤ 14}, postulating active(H) =
⊕_{i=8}^{14} active(vi). However, since vi.dpt = −∞ for nodes different from v9 and v13, all nodes
except the delay nodes drop out of the global maximum.
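The component-wise maximum described above can be illustrated with a few lines of Python. This is only a sketch: the dictionary layout and the names dpt and h_dpt are assumptions made for illustration, while the values for v9 and v13 are the depth reaction times derived in the text.

```python
NEG_INF = float("-inf")  # no source or internal path

# Depth reaction times as pairs (delay to L19, delay to wait), for i = 8..14.
dpt = {f"v{i}": (NEG_INF, NEG_INF) for i in range(8, 15)}
dpt["v9"] = (7, 6)          # pause node: v9.dpt = (7, 6)^T
dpt["v13"] = (NEG_INF, 1)   # halt node: v13.dpt = (−∞, 1)^T

# Component-wise maximum over all nodes; transient nodes contribute −∞ only.
h_dpt = tuple(max(column) for column in zip(*dpt.values()))
print(h_dpt)
```

Including the transient nodes does not change the result, which is the uniformity the paragraph points out.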
Note on Optimization. The WCRT algorithm presented in [13] is somewhat more generous about
surface and depth interfaces than what we have discussed above. Here we take these interfaces to
have output control ◦L19 ⊕ ◦wait, distinguishing the WCRT of through and source paths ending
in L19 from those of sink and internal paths ending in wait. The algorithm of [13] (implicitly,
by hard-coding maximum operations) merges both types of path into the single control ◦L19.
This is sound since the types ◦L19 ⊕ ◦wait and ◦L19 are equivalent in the sense that every
schedule well-timed for one is also well-timed for the other. For instance, we have d1 : L19 ⊕ d2 :
wait ⪯ d1 : L19 ⊕ d2 + 1 : false ⪯ d1 : L19 ⊕ d2 + 1 : L19 ⪯ max(d1, d2 + 1) : L19 and
d : L19 ⪯ d : L19 ⊕ −∞ : wait. Provided⁵ d1 > d2, the optimization from ◦L19 ⊕ ◦wait to ◦L19
does not involve any loss of precision for WCRT due to the maximum operation.
In our example program H this optimization can be exploited as follows: The surface type
of v13 is reworked as (1) : L16 ⊃ ◦wait ≅ (2) : L16 ⊃ ◦false ⪯ (2) : L16 ⊃ ◦L19, which is
then combined with (2) : H2 ⊃ ◦L19 (obtained from the surface types of v14 and v15) to give
[(2), (2)] = (2, 2) : (H2 ∨ L16) ⊃ ◦L19. This is multiplied with node v12's interface (1, 1)^T :
H1 ⊃ ◦H2 ⊕ ◦L16 to produce the surface type H1.srf = (2, 2) · (1, 1)^T = (3) : H1 ⊃ ◦L19. From
there we get H3.srf = (4) : H3 ⊃ ◦L19 and finally v9.dpt = (7) : active(v9) ⊃ ◦L19 without ever
having to include possible paths ending in wait. The reason is that these paths may be safely
pruned in the search.
On the other hand, suppose the halt node v13 encapsulated an entire sub-system which took
100 ics to reach termination on the sink path inside (rather than just 1). Applying the same
approximating optimization, which replaces (100) : L16 ⊃ ◦wait with (100) : L16 ⊃ ◦L19,
we would get H1.srf = (2, 100) · (1, 1)^T = (101) : H1 ⊃ ◦L19 and further v9.dpt = (105) :
active(v9) ⊃ ◦L19 rather than the more exact v9.dpt = (7, 105)^T : active(v9) ⊃ ◦L19 ⊕ ◦wait.
The latter is more faithful with respect to instantaneous paths that may be added from control
point L19 onwards in the external context of H (specifically, parallel compositions of H with
other programs), which should not be weighted with delay 105 but 7. This is actually a source
of over-approximation in the existing algorithm [13]. Keeping track of the scheduling type (the
difference between ◦L19 ⊕ ◦wait and ◦L19) can help to control the trade-off between efficiency
and precision.
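The loss of precision in this thought experiment can be reproduced numerically. The sketch below assumes a max-plus reading of the operator · and rebuilds both variants of the computation of v9.dpt for a configurable sink delay. The function names v9_dpt_exact and v9_dpt_merged are hypothetical, and the 2×2 exit matrix is an assumed assembly of the types named in the text.

```python
NEG_INF = float("-inf")  # delay of a path that does not exist

def maxplus_matmul(a, b):
    """Max-plus matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def v9_dpt_exact(sink_delay):
    """Keep ◦L19 and ◦wait apart: result is a column (through, sink)."""
    exits = [[2, NEG_INF], [NEG_INF, sink_delay]]   # (H2 ∨ L16) ⊃ ◦L19 ⊕ ◦wait
    h1 = maxplus_matmul(exits, [[1], [1]])          # via v12
    lifted = [[h1[0][0], NEG_INF], [h1[1][0], 0]]   # [H1.srf, (−∞, 0)^T]
    h3 = maxplus_matmul(lifted, [[1], [1]])         # via the pause entry H3
    return maxplus_matmul(maxplus_matmul(h3, [[2]]), [[1]])

def v9_dpt_merged(sink_delay):
    """Merge wait into ◦L19 as in the optimization: result is a single bound."""
    exits = [[2, sink_delay]]                       # (H2 ∨ L16) ⊃ ◦L19
    h1 = maxplus_matmul(exits, [[1], [1]])          # via v12
    h3 = maxplus_matmul(h1, [[1]])                  # via the pause entry H3
    return maxplus_matmul(maxplus_matmul(h3, [[2]]), [[1]])

exact = v9_dpt_exact(100)     # (7, 105)^T
merged = v9_dpt_merged(100)   # (105)
harmless = v9_dpt_merged(2)   # (7), the original example after reworking v13 to (2)
print(exact, merged, harmless)
```

With the merged variant, a context that continues from L19 would inherit the bound 105, whereas the exact variant still exposes the through-path bound 7.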
⁵ We used this special condition in Sec. 5.1 to normalize interfaces.
5.4 Concurrent Behavior: Fork and Join
Fig. 9: Program T from Fig. 1a in which threads G and H are executed concurrently.
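Consider Fig. 9 in which the two sub-programs G and H with input controls G0, H0 and exits
L11, L19 have now been combined inside the concurrent fork-join block of our original example
program T from Fig. 1a. As discussed in the previous sections, these have the WCRT interfaces
$$G = \begin{pmatrix} 5 & -\infty \\ -\infty & -\infty \end{pmatrix} : (G0 \lor \mathit{active}(G)) \supset \circ L11 \oplus \circ\mathit{wait} \tag{16}$$
$$H = \begin{pmatrix} 5 & 7 \\ 4 & 6 \end{pmatrix} : (H0 \lor \mathit{active}(H)) \supset \circ L19 \oplus \circ\mathit{wait}. \tag{17}$$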
How is multi-threading of G and H expressed? The logical conjunction G ∧ H is not appropriate
because it would say that both threads run in parallel, whereas G and H are supposed to be
scheduled in an interleaved fashion. For instance, the conjunction (d1, d2) : ◦L11 ∧ ◦L19 of two
threads producing exit signals L11 and L19 in d1 and d2 ics, respectively, would imply
max(d1, d2) : ◦(L11 ∧ L19), which is correct for concurrent multi-processing, while for
multi-threading it should be the sum d1 + d2 : ◦(L11 ∧ L19) instead. To capture the fact that G
and H share the same processor and thus are interleaved, a new operator G ‖ H on scheduling
types is introduced to represent the concurrent configuration in Fig. 9. The definition of
T = G ‖ H is developed in Sec. B. In the following, let us sum up its basic properties in
application to G and H.
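The difference between the two readings of concurrency comes down to a single operation on the delays. A small Python illustration, with hypothetical function names and with d1 = d2 = 5 chosen only as example values:

```python
def multiprocessing_bound(d1, d2):
    """Threads run on separate processors: the slower one determines the bound."""
    return max(d1, d2)

def multithreading_bound(d1, d2):
    """Threads are interleaved on one processor: their instruction cycles add up."""
    return d1 + d2

d1, d2 = 5, 5  # example delays (in ics) until L11 and L19 are reached
parallel = multiprocessing_bound(d1, d2)
interleaved = multithreading_bound(d1, d2)
print(parallel, interleaved)
```

The conjunction ∧ corresponds to the first function, while the operator ‖ has to account for the second.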
Firstly, we need to extend the interfaces G and H somewhat to instrument the synchronization
between them. E.g., G is strengthened to G′ =df G ∧ ((L11 ∧ ¬L19) ⊃ wait) to express that
thread G runs into a wait state whenever it reaches its exit control L11⁶ and H is not already at
its exit L19 (¬L19). Symmetrically, H is adapted to H′ =df H ∧ ((L19 ∧ ¬L11) ⊃ wait). These
conjunctive additions do not change the WCRT interfaces (16) and (17). They merely strengthen
the behavior with regard to the exit controls L11, L19, wait rather than the causality between
entry controls G0, H0, active(G), active(H) and these exits.
Secondly, for the source behavior of composite T we introduce a new activation control
active(T) which starts the threads G and H according to active(T) ⊃ (active(G) ∧ active(H)) ⊕
(active(G) ∧ ¬active(H)) ⊕ (¬active(G) ∧ active(H)). This is because source paths of T can arise
from any non-empty subset of source paths from the sub-threads G and H. In terms of scheduling
types this is presented systematically as
(0, 0, 0, −∞)^T : active(T) ⊃ active(◦{G,H}) (18)
with the abbreviation active(◦{G,H}) =df ◦(active(G) ∧ active(H)) ⊕ ◦(active(G) ∧ ¬active(H)) ⊕
◦(¬active(G) ∧ active(H)) ⊕ ◦(¬active(G) ∧ ¬active(H)). The entry −∞ in (18) says that the
case is excluded where T is active but ¬active(G) ∧ ¬active(H) holds, i.e., no thread is active.
Thirdly, when running the depth executions of T from delay nodes inside G and H we always
assume that those threads which are not part of this depth execution have already terminated
instantaneously in the previous instant and thus wait at their exits. Thus, under the global control
active(T) we strengthen the depth interfaces of G and H to wait at their exits whenever they
are inactive. We put G′′ =df G′ ∧ (¬active(G) ⊃ (L11 ∧ wait)) and H′′ =df H′ ∧ (¬active(H) ⊃
(L19 ∧ wait)). This has an effect on the depth interfaces of G and H. We now get the immediate
reaction L11 in case of ¬active(G) and L19 if ¬active(H). This can be accounted for in new
interfaces
$$G'' = \begin{pmatrix} 5 & -\infty & 0 \\ -\infty & -\infty & -\infty \end{pmatrix} : (G0 \lor \mathit{active}(G) \lor \lnot\mathit{active}(G)) \supset \circ L11 \oplus \circ\mathit{wait}$$
$$H'' = \begin{pmatrix} 5 & 7 & 0 \\ 4 & 6 & -\infty \end{pmatrix} : (H0 \lor \mathit{active}(H) \lor \lnot\mathit{active}(H)) \supset \circ L19 \oplus \circ\mathit{wait}$$
with the added third column vector (0, −∞)^T of type ¬active(G) ⊃ ◦L11 ⊕ ◦wait and ¬active(H) ⊃
◦L19 ⊕ ◦wait, respectively.
⁶ Note that labels L11 and H0 point to the same program instruction, viz. the first instruction of H.
Since they encode different ways of reaching the same state of the program counter they cannot be
identified. In particular, we can have H0 active in an execution in which H has been started without
L11 being true, viz. as long as G is not yet finished.
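The composition G′′ ‖ H′′ now generates all possible interleavings of executions specified
by type G′′ with those from type H′′. On the input side there are 9 possible ways of combining the
three input controls G0 ∨ active(G) ∨ ¬active(G) of G with those of H0 ∨ active(H) ∨ ¬active(H)
from H. However, we are only interested in the surface paths activated by G0 ∧ H0 and the
depth paths activated by the combinations contained in active(◦{G,H}). Instead of computing
the full interleaving we therefore treat the surface and depth parts separately. The surface
interfaces are
$$G''_{\mathit{srf}} = \begin{pmatrix} 5 \\ -\infty \end{pmatrix} : G0 \supset \circ L11 \oplus \circ\mathit{wait}, \qquad H''_{\mathit{srf}} = \begin{pmatrix} 5 \\ 4 \end{pmatrix} : H0 \supset \circ L19 \oplus \circ\mathit{wait}. \tag{19}$$
Notice how the entry −∞ in G′′srf expresses that program G does not have a sink path. The
depth interfaces are
$$G''_{\mathit{dpt}} = \begin{pmatrix} -\infty & 0 \\ -\infty & -\infty \end{pmatrix} : (\mathit{active}(G) \lor \lnot\mathit{active}(G)) \supset \circ L11 \oplus \circ\mathit{wait} \tag{20}$$
$$H''_{\mathit{dpt}} = \begin{pmatrix} 7 & 0 \\ 6 & -\infty \end{pmatrix} : (\mathit{active}(H) \lor \lnot\mathit{active}(H)) \supset \circ L19 \oplus \circ\mathit{wait}. \tag{21}$$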
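Again the first column (−∞, −∞)^T in G′′dpt records that G possesses neither source nor internal
paths. It turns out that the interleaving G′′srf ‖ H′′srf adds up the delays of the two threads for
every combination of their exits, giving
$$G''_{\mathit{srf}} \parallel H''_{\mathit{srf}} = (10, 9, -\infty, -\infty)^T \tag{22}$$
of type
(G0 ∧ H0) ⊃ (◦(L11 ∧ L19) ⊕ ◦(L11 ∧ wait) ⊕ ◦(wait ∧ L19) ⊕ ◦(wait ∧ wait)).
This gives separate WCRT for all paths started in G0 ∧ H0 in which both sub-systems have
through paths (L11 ∧ L19), one has a through path but the other pauses (L11 ∧ wait, wait ∧ L19)
and those in which both threads end up in a pause (wait ∧ wait). In the last three cases the
composite system pauses. They can be merged by the inclusion
$$\begin{pmatrix} 0 & -\infty & -\infty & -\infty \\ -\infty & 0 & 0 & 0 \end{pmatrix} : ((L11 \land L19) \lor (L11 \land \mathit{wait}) \lor (\mathit{wait} \land L19) \lor (\mathit{wait} \land \mathit{wait})) \supset \circ(L11 \land L19) \oplus \circ\mathit{wait}. \tag{23}$$
Composing this exit-abstraction (23) with (22) gives the surface interface of T:
$$T_{\mathit{srf}} = \begin{pmatrix} 0 & -\infty & -\infty & -\infty \\ -\infty & 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 10 \\ 9 \\ -\infty \\ -\infty \end{pmatrix} = \begin{pmatrix} 10 \\ 9 \end{pmatrix} : (G0 \land H0) \supset \circ(L11 \land L19) \oplus \circ\mathit{wait}.$$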
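The surface composition can be sketched in Python under two assumptions that are not spelled out in the surviving text: the interleaving of two delay matrices is taken to be a Kronecker-style product in which paired entries are added, and the surface bound of G is taken as 5 ics for its through path with no sink path (−∞), as suggested by the path listing G(5) later in this section. The function names interleave and maxplus_matmul are illustrative.

```python
NEG_INF = float("-inf")  # delay of a path that does not exist

def maxplus_matmul(a, b):
    """Max-plus matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def interleave(g, h):
    """Kronecker-style product: one row per exit pair, one column per entry pair,
    with the delays of the two threads added because they share one processor."""
    return [[g[i][k] + h[j][l]
             for k in range(len(g[0])) for l in range(len(h[0]))]
            for i in range(len(g)) for j in range(len(h))]

g_srf = [[5], [NEG_INF]]   # assumed: G0 ⊃ ◦L11 ⊕ ◦wait, through 5, no sink path
h_srf = [[5], [4]]         # H0 ⊃ ◦L19 ⊕ ◦wait, from (14)

# Rows: (L11∧L19), (L11∧wait), (wait∧L19), (wait∧wait).
pairs = interleave(g_srf, h_srf)

# Exit abstraction (23): the last three exit pairs all mean that T pauses.
merge = [[0, NEG_INF, NEG_INF, NEG_INF],
         [NEG_INF, 0, 0, 0]]
t_srf = maxplus_matmul(merge, pairs)
print(pairs, t_srf)
```

Under these assumptions the through bound of T is 10 and the sink bound is 9, before fork and join are added.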
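For the depth interface of T we apply the same strategy: First we compose the two depth
interfaces (20) and (21),
$$G''_{\mathit{dpt}} \parallel H''_{\mathit{dpt}} = \begin{pmatrix} -\infty & -\infty & 7 & 0 \\ -\infty & -\infty & 6 & -\infty \\ -\infty & -\infty & -\infty & -\infty \\ -\infty & -\infty & -\infty & -\infty \end{pmatrix}$$
of type
active({G,H}) ⊃ (◦(L11 ∧ L19) ⊕ ◦(L11 ∧ wait) ⊕ ◦(wait ∧ L19) ⊕ ◦(wait ∧ wait)),
subject to the abbreviation active({G,H}) =df (active(G) ∧ active(H)) ∨ (active(G) ∧ ¬active(H)) ∨
(¬active(G) ∧ active(H)) ∨ (¬active(G) ∧ ¬active(H)). Finally, the input and output controls are
adjusted again: on the input side we connect with active(T) using (18), which drops the irrelevant
case ¬active(G) ∧ ¬active(H), and on the output side we merge the pausing exits using (23):
$$T_{\mathit{dpt}} = \begin{pmatrix} 0 & -\infty & -\infty & -\infty \\ -\infty & 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} -\infty & -\infty & 7 & 0 \\ -\infty & -\infty & 6 & -\infty \\ -\infty & -\infty & -\infty & -\infty \\ -\infty & -\infty & -\infty & -\infty \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 0 \\ 0 \\ -\infty \end{pmatrix} = \begin{pmatrix} 0 & -\infty & -\infty & -\infty \\ -\infty & 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 7 \\ 6 \\ -\infty \\ -\infty \end{pmatrix} = \begin{pmatrix} 7 \\ 6 \end{pmatrix} : \mathit{active}(T) \supset \circ(L11 \land L19) \oplus \circ\mathit{wait}.$$
This is precisely what we should expect: Since block G is transient the depth interface of T =
G ‖ H is entirely determined by that of H.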
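The depth composition can be checked with the same kind of sketch. It assumes, as an interpretation rather than a statement of the original, that ‖ acts on delay matrices as a Kronecker-style product with added entries, and it encodes the depth interfaces of G′′ and H′′ with columns (active, ¬active) and rows (exit, wait). All Python names are hypothetical.

```python
NEG_INF = float("-inf")  # delay of a path that does not exist

def maxplus_matmul(a, b):
    """Max-plus matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def interleave(g, h):
    """Kronecker-style product with added delays (threads share one processor)."""
    return [[g[i][k] + h[j][l]
             for k in range(len(g[0])) for l in range(len(h[0]))]
            for i in range(len(g)) for j in range(len(h))]

g_dpt = [[NEG_INF, 0],        # G is transient; if inactive it sits at L11 (delay 0)
         [NEG_INF, NEG_INF]]
h_dpt = [[7, 0],              # active(H): (7, 6)^T; inactive H sits at L19 (delay 0)
         [6, NEG_INF]]

product = interleave(g_dpt, h_dpt)                   # 4×4 matrix over exit and entry pairs
start = [[0], [0], [0], [NEG_INF]]                   # (18): at least one thread is active
merge = [[0, NEG_INF, NEG_INF, NEG_INF],             # (23): exit abstraction
         [NEG_INF, 0, 0, 0]]

t_dpt = maxplus_matmul(merge, maxplus_matmul(product, start))
print(product[0], product[1], t_dpt)
```

The first two rows of product agree with the matrix rows (−∞, −∞, 7, 0) and (−∞, −∞, 6, −∞) shown in the text, and the result equals the depth bound of H alone.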
To sum up, we have established the WCRT interface of T = G ‖ H from Fig. 9 as
$$\begin{pmatrix} 10 & 7 \\ 9 & 6 \end{pmatrix} : ((G0 \land H0) \lor \mathit{active}(T)) \supset \circ(L11 \land L19) \oplus \circ\mathit{wait}.$$
This models the concurrent composition of G and H but not yet the interface of the composite
program T with its fork and join nodes. The fork node is specified by
$$\mathit{fork} = \begin{pmatrix} 4 & -\infty \\ -\infty & 1 \end{pmatrix} : (T0 \lor \mathit{active}) \supset (\circ(G0 \land H0) \oplus \circ\mathit{active}).$$
The entry (4) : T0 ⊃ ◦(G0 ∧ H0) of fork includes the ics for two PAR, one PARE and one JOIN.
Since the JOIN is always executed when at least one thread is active, its execution time is added
to the fork, not the join. The join itself is the identity matrix. Adding fork and join on the input
and output side, respectively, we obtain
$$T = \mathit{join} \cdot \begin{pmatrix} 10 & 7 \\ 9 & 6 \end{pmatrix} \cdot \mathit{fork} = \begin{pmatrix} 14 & 8 \\ 13 & 7 \end{pmatrix} : (T0 \lor \mathit{active}) \supset (\circ L20 \oplus \circ\mathit{wait})$$
as the WCRT for the composite program T. The longest through path is exemplified by the
sequence of nodes v0(3) + {v1+v2+v3+v4+v7}G(5) + {v8+v9+v12+v14+v15}H(5) + v16(1) = 14.
A longest sink path is v0(3) + {v1+v2+v3+v4+v7}G(5) + {v8+v9+v12+v13}H(4) + v16(1) = 13. As
a maximal source path we could take {}G(0) + {v9+v10+v11+v9+v12+v14+v15}H(7) + v16(1) = 8
and finally a possible longest internal path {}G(0) + {v9+v10+v11+v9+v12+v13}H(6) + v16(1) = 7.
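The four path lengths just listed can be cross-checked by wrapping the interface of G ‖ H with fork and join. In the sketch below the core matrix uses the surface bounds (10, 9) and depth bounds (7, 6) of G ‖ H, the fork entry 4 is taken from the text, and the fork entry 1 for the active input is an assumption derived from the v16(1) term in the source and internal paths. The operator · is again read as a max-plus product, and all names are illustrative.

```python
NEG_INF = float("-inf")  # delay of a path that does not exist

def maxplus_matmul(a, b):
    """Max-plus matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Rows: exits (through, wait). Columns: entries (surface, depth).
core = [[10, 7],
        [9, 6]]
fork = [[4, NEG_INF],    # T0: two PAR, one PARE and one JOIN
        [NEG_INF, 1]]    # active: only the JOIN (assumed from v16(1))
join = [[0, NEG_INF],    # identity matrix of the max-plus product
        [NEG_INF, 0]]

t = maxplus_matmul(join, maxplus_matmul(core, fork))
through, source = t[0]
sink, internal = t[1]
print(through, sink, source, internal)
```

The four entries line up with the longest through, sink, source and internal paths enumerated above.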
In a practical WCRT algorithm such as the one of [13] many of the matrix multiplications
shown above are executed implicitly and more efficiently in the combinatorics of traversing the
program's control flow graph, forming maxima and additions as we go along. This is possible
only so far as control flow dependencies are represented explicitly in the graph. In general, with
data-dependencies, this may be an exponential problem so that symbolic techniques are needed.
The point of the above is to highlight the essential algebraic nature of WCRT analysis.
6 Conclusion
We introduced an interface algebra for compositional analysis of WCRT in synchronous
multi-threading. We have applied these interfaces to compute the WCRT for programs running on the
Kiel Esterel Processor. The approach is very flexible: From considering all possible data, which
gives an exact WCRT for the price of possible exponential computation time, to abstracting
from all internal behavior, which is very fast but might lead to a large over-approximation,
all levels of exactness can be applied. Even though it is still under development, the interface
algebra results in tighter WCRT analysis for the KEP compared to existing algorithms. For the
small example in Section 4 we computed a WCRT of 14, while the algorithm described in [13]
computes an over-approximation of 16. This is due to the better handling of parallel composition.
Since the interfaces are compositional, we should also be able to get a better performance for
the WCRT computation itself. Furthermore, data-dependencies with arbitrary precision can be
easily expressed in the interface algebra, to rule out impossible executions and get an even tighter
WCRT.
Encouraged by the manual computations, we want to implement a WCRT algorithm based on
the new interfaces. There are still some constructs of Esterel that cannot be directly expressed in
the interface algebra so far. These are in particular exit-traps, which are a third form of abortion
to express exceptions. Furthermore, in this paper we consider all abortions as immediate, which
might lead to more and longer paths than actually executable. We would like to extend our
interfaces to cover exit-traps and non-immediate aborts as well. A further step would be to
integrate thread priorities into the interfaces, to reduce the number of considered paths. So far,
abortions are handled by adding transitions to all pause nodes inside them. It might be more
natural to extend the control flow graphs by hierarchy to directly express the hierarchical nature
of abortions. This could also be captured by the interfaces.
So far, we use the interfaces to compute WCRTs for Esterel programs on the KEP. But this
approach could as well be used for other execution platforms, which implement a similar form
of synchronous multi-threading or multi-processing.
References
1. Edwards, S., Lee, E.A.: The case for the Precision Timed (PRET) machine. Technical report no.
ucb/eecs-2006-149, EECS Department, University of California, Berkeley (November 2006)
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-149.pdf.
2. Benveniste, A., Caspi, P., Edwards, S.A., Halbwachs, N., Guernic, P.L., de Simone, R.: The
Synchronous Languages Twelve Years Later. In: Proceedings of the IEEE, Special Issue on Embedded
Systems. Volume 91. (January 2003) 64–83
3. Berry, G., Cosserat, L.: The ESTEREL Synchronous Programming Language and its Mathematical
Semantics. In: Seminar on Concurrency, Carnegie-Mellon University. Volume 197 of Lecture Notes
in Computer Science (LNCS)., Springer-Verlag (1984) 389–448
4. Harel, D.: Statecharts: A visual formalism for complex systems. Science of Computer Programming
8(3) (June 1987) 231–274
5. Lee, E.A.: The problem with threads. IEEE Computer 39(5) (2006) 33–42
6. Potop-Butucaru, D., Edwards, S.A., Berry, G.: Compiling Esterel. Springer (May 2007)
7. Berg, C., Engblom, J., Wilhelm, R.: Requirements for and design of a processor with
predictable timing. In Thiele, L., Wilhelm, R., eds.: Perspectives Workshop: Design of Systems
with Predictable Behaviour. Number 03471 in Dagstuhl Seminar Proceedings, Internationales
Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany (2004)
http://drops.dagstuhl.de/opus/volltexte/2004/5.
8. von Hanxleden, R., Li, X., Roop, P., Salcic, Z., Yoong, L.H.: Reactive processing for reactive systems.
ERCIM News 66 (October 2006) 28–29 http://ercim-news.ercim.org/content/view/51/82/.
9. Li, X., von Hanxleden, R.: The Kiel Esterel Processor - a semi-custom, configurable reactive
processor. In Edwards, S.A., Halbwachs, N., v. Hanxleden, R., Stauner, T., eds.: Synchronous
Programming - SYNCHRON'04. Number 04491 in Dagstuhl Seminar Proceedings, Internationales
Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany (2005)
http://drops.dagstuhl.de/opus/volltexte/2005/159.
10. Yoong, L.H., Roop, P., Salcic, Z., Gruian, F.: Compiling Esterel for distributed execution. In:
International Workshop on Synchronous Languages, Applications, and Programming (SLAP’06),
Vienna, Austria (March 2006)
11. Li, X., Boldt, M., von Hanxleden, R.: Mapping Esterel onto a multi-threaded embedded processor.
In: Proceedings of the 12th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS’06), San Jose, CA (October 21–25 2006)
12. Yuan, S., Andalam, S., Yoong, L.H., Roop, P.S., Salcic, Z.: Starpro - a new multithreaded direct
execution platform for Esterel. In: Proceedings of Model Driven high-Level Programming of Embedded
Systems (SLA++P), Workshop at ETAPS '08, Budapest, Hungary (April 2008)
13. Boldt, M., Traulsen, C., von Hanxleden, R.: Worst case reaction time analysis of concurrent reactive
programs. Electronic Notes in Theoretical Computer Science 203(4) (June 2008) 65–79.
Proceedings of the International Workshop on Model-driven High-level Programming of Embedded
Systems (SLA++P 2007), March 2007, Braga, Portugal.
14. Li, X., Lukoschus, J., Boldt, M., Harder, M., von Hanxleden, R.: An Esterel Processor with Full
Preemption Support and its Worst Case Reaction Time Analysis. In: Proceedings of the International
Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), New York,
NY, USA, ACM Press (September 2005) 225–236
15. Baccelli, F.L., Cohen, G., Olsder, G.J., Quadrat, J.P.: Synchronisation and Linearity. John Wiley
& Sons (1992)
16. Wilhelm, R., Engblom, J., Ermedahl, A., Holsti, N., Thesing, S., Whalley, D., Bernat, G.,
Ferdinand, C., Heckmann, R., Mueller, F., Puaut, I., Puschner, P., Staschulat, J., Stenström, P.: The
determination of worst-case execution times - overview of the methods and survey of tools. ACM
Transactions on Embedded Computing Systems (TECS) 7(3) (2008)
17. Logothetis, G., Schneider, K.: Exact high level WCET analysis of synchronous programs by symbolic
state space exploration. In: Design, Automation and Test in Europe (DATE), Munich, Germany,
IEEE Computer Society (March 2003) 196–203
18. Logothetis, G., Schneider, K., Metzler, C.: Exact low-level runtime analysis of synchronous programs
for formal verification of real-time systems. In: Forum on Design Languages (FDL), Frankfurt,
Germany, Kluwer (2003)
19. André, C., Boulanger, F., Péraldi, M.A., Rigault, J.P., Vidal-Naquet, G.: Objects and synchronous
programming. European Journal on Automated Systems 31(3) (1997) 417–432
20. Hainque, O., Pautet, L., Biannic, Y.L., Nassor, E.: Cronos: A separate compilation toolset for
modular Esterel applications. In Wing, J.M., Woodcock, J., Davies, J., eds.: World Congress on
Formal Methods. Volume 1709 of Lecture Notes in Computer Science., Springer (September 1999)
1836–1853
21. Lee, E.A., Zheng, H., Zhou, Y.: Causality interfaces and compositional causality analysis. In:
Foundations of Interface Technologies (FIT’05). ENTCS, Elsevier (2005)
22. Wandeler, E., Thiele, L.: Real-time interfaces for interface-based design of real-time systems with
fixed priority scheduling. In: Proceedings of the ACM International Conference on Embedded
Software (EMSOFT'05). (September 2005)
23. Henzinger, T., Matic, S.: An interface algebra for real-time components. In: Proceedings of the 12th
Annual Real-Time and Embedded Technology and Applications Symposium (RTAS), Los Alamitos,
CA, USA, IEEE Computer Society (2006) 253–266
24. Boudec, J.L., Thiran, P.: Network Calculus - A theory of deterministic queuing systems for the
internet. Volume 2050 of Lecture Notes in Computer Science. Springer (2001)
25. de Alfaro, L., Henzinger, T.: Interface automata. In: Proc. Foundations of Software Engineering,
ACM Press (2001) 109–120
26. Lee, E.A., Xiong, Y.: System-level types for component-based design. In: Workshop on Embedded
Software EMSOFT 2001, Lake Tahoe, CA, USA (October 2001)
27. Lee, E.A., Xiong, Y.: A behavioral type system and its application in Ptolemy II. Formal Aspects
of Computing 13(3) (August 2004) 210–237
28. Mendler, M.: Characterising combinational timing analyses in intuitionistic modal logic. The Logic
Journal of the IGPL 8(6) (November 2000) 821–853
29. Lüttgen, G., Mendler, M.: Axiomatizing an algebra of step reactions for synchronous languages.
In Brim, L., Jančar, P., Křetínský, M., Kučera, A., eds.: International Conference on Concurrency
Theory (CONCUR'02). Number 2421 in Lecture Notes in Computer Science, Brno, Springer (August
2002) 386–401
30. Lüttgen, G., Mendler, M.: Towards a model-theory for Esterel. In Maraninchi, F., Girault, A.,
Rutten, E., eds.: Synchronous Languages, Application, and Programming (SLAP ’02). Volume 65,5
of ENTCS., Elsevier Science (2002)
A WCRT Interface Types
Executions and schedules are typed using logical expressions of the form f :φ consisting of an
interface formula φ (scheduling type) together with a timing function f (scheduling bound). The
canonical interfaces used in this paper (introduced in Sec. 5.1) are of the form
(ζ1 ∨ ζ2 · · · ∨ ζm) ⊃ (◦ξ1 ⊕ ◦ξ2 ⊕ · · · ⊕ ◦ξn) (24)
with basic controls ζi and ξj . For these types the timing functions are essentially delay matrices
of shape n×m. These interfaces are particular structures in a more general theory that we shall
briefly outline now. First, recall our language of types
φ ::= A | true | false | ¬φ | φ ∧ φ | φ ∨ φ | φ ⊕ φ | φ ⊃ φ | φ ‖ φ | ◦φ,
where A ∈ S ranges over a set of control signals. General WCRT theory associates with every
scheduling type φ a set of scheduling bounds or reaction bounds Bnd(φ) uniformly as follows:
Bnd(false) = 1                                   Bnd(true) = 1
Bnd(A) = 1                                       Bnd(¬φ) = 1
Bnd(φ ∧ ψ) = Bnd(φ) × Bnd(ψ)                     Bnd(φ ∨ ψ) = Bnd(φ) + Bnd(ψ)
Bnd(φ ⊕ ψ) = Bnd(φ) × Bnd(ψ)                     Bnd(φ ⊃ ψ) = Bnd(φ) → Bnd(ψ)
Bnd(◦φ) = (N ∪ {−∞}) × Bnd(φ)                    Bnd(φ ‖ ψ) = Bnd(φ) × Bnd(ψ),
where 1 = {0} is a distinguished singleton set. More generally, we use the notation n for n ∈ N to
denote the set {0, 1, . . . , n−1}, discretely ordered. Elements of the disjoint sum Bnd(φ)+Bnd(ψ)
are presented as pairs (0, f) where f ∈ Bnd(φ) or (1, g) where g ∈ Bnd(ψ). An element f ∈ Bnd(φ)
is a form of generalized higher-order timing matrix for schedules of shape φ.
Let σ = ∅ ⊆ σ(0) ⊆ σ(1) ⊆ σ(2) ⊆ · · · ⊆ σ(n − 1) ⊆ S be an execution. We define a sub-
execution σ′ ⊆ σ to be a sub-sequence σ′ = ∅ ⊆ σ′(0) ⊆ σ′(1) ⊆ σ′(2) ⊆ · · · ⊆ σ′(m − 1) ⊆ S
consisting of an arbitrary number of events σ′(i) = σ(fi) of σ with a monotonic function f :m → n.
Such a sub-sequence models a sub-thread of σ which observes only a certain subset of events
from σ according to when it is scheduled. We write σ = σ1 + σ2 to express that execution σ can
be partitioned into sub-executions σ1, σ2 ⊆ σ such that each computation step (σ(i), σ(i + 1))
in σ is contained in σ1 or in σ2. As a degenerate case the empty execution can always be split
∅ = ∅ + ∅. Finally, for every d ∈ N ∪ {−∞} we define the delayed execution σ[d, :] to be the
sequence σ[d, :] = σ(d) ⊆ σ(d + 1) ⊆ σ(d + 2) ⊆ · · · ⊆ σ(n − 1). If d ≥ |σ| then σ[d, :] = ∅ is the
empty execution. If d = −∞ then σ[d, :] = ∅σ, i.e., an initial empty event is added to σ.
We say that an execution σ validates a scheduling type φ with bound f ∈ Bnd(φ) written
σ |= f : φ, according to the rules
σ |= 0 : false iff |σ| = 0
σ |= 0 : true iff always
σ |= 0 : A iff ∀j. 0 ≤ j < |σ| ⇒ A ∈ σ(j)
σ |= (f, g) : φ ∧ ψ iff σ |= f : φ and σ |= g : ψ
σ |= (0, f) : φ ∨ ψ iff σ |= f : φ
σ |= (1, g) : φ ∨ ψ iff σ |= g : ψ
σ |= (f, g) : φ ⊕ ψ iff σ |= f : φ or σ |= g : ψ
σ |= f : φ ⊃ ψ iff ∀σ′ ⊆ σ. ∀g ∈ Bnd(φ). (σ′ |= g : φ ⇒ σ′ |= f g : ψ)
σ |= (t, f) : ◦φ iff σ[t, :] |= f : φ
σ |= (f, g) : φ1 ‖ φ2 iff ∃σ1, σ2 ⊆ σ. σ = σ1 + σ2 and σ1 |= f : φ1 and σ2 |= g : φ2.
One shows by induction on type φ that the empty execution validates all types. Similarly, for all
delay times d after the end of an execution, d ≥ |σ|, we have σ[d, :] |= f : φ for all types f : φ,
including 0 : false. Validity is monotonic in the sense that as the delay time increases more and
more types may become valid. Formally, if σ[d, :] |= f : φ then also σ[d′, :] |= f : φ for all d′ ≥ d.
More generally, truth is inherited by sub-schedules, i.e., if σ |= f : φ and σ′ ⊆ σ then σ′ |= f : φ,
too. The inclusion σ[d′, :] ⊆ σ[d, :] for d′ ≥ d is a special case of this. We will sometimes write
σ, d |= f : φ for σ[d, :] |= f : φ.
Notice how ‖ captures multi-threading. An execution σ validates φ1 ‖ φ2 if it can be divided
into two sub-executions σ1, σ2 each of which validates its part of the composition. In general
σi overlaps with parts of the other execution σj (i ≠ j). All interleaved activities inside σ2
overlapping with σ1 are happening concurrently from σ1's local point of view and can involve
the simultaneous occurrence of control signals. So, although at the outermost level of a
master-thread σ each step (σ(i), σ(i + 1)) involves only a single control signal, a sub-thread σ1 ⊆ σ
may well experience the occurrence of several signals in the same local instruction cycle. This is
why in multi-threading (and in multi-processing for that matter) we need to consider arbitrary
execution sequences σ.
A set of executions S defines a schedule. Thus, our semantics associates with every pair f : φ
a schedule [[f : φ]] = {σ | σ |= f : φ }. It is useful to view f : φ as a specification of executions
which combines a qualitative aspect φ with the quantitative aspect f ∈ Bnd(φ). The scheduling
type φ captures the abstract causal relationships between the control points and the scheduling
bound f refines this chronometrically by giving concrete numeric distances and durations. The
colon as a binary connective separates these concerns. Specifications may be compared naturally
in terms of their executions. We write f : φ ≅ g : ψ if [[f : φ]] = [[g : ψ]] and f : φ ⪯ g : ψ if
[[f : φ]] ⊆ [[g : ψ]].
Example 2. For instance, the pair (5, 0) : ◦false specifies all executions which have at most 5
events and thus consume at most 4 cycles. The type false represents abstract termination and
(5, 0) measures the duration until it occurs. Note that the second component 0 ∈ Bnd(false) = 1
in the pair (5, 0) is needed for systematic reasons to deal with the general case, uniformly. It does
not have any semantic meaning, however, and can be dropped systematically if needed. So, we
would write 5 : ◦false instead.
The pair λx. (12, 0) : A ⊃ ◦B comprises all executions in which control point B is necessarily
passed after at most 11 scheduling steps after any occurrence of A. Here, the type A ⊃ ◦B
expresses a causal control flow from A to B and the bound λx. (12, 0) measures the distance.
Again, the arguments x ∈ Bnd(A) = 1 and value 0 ∈ Bnd(B) = 1 are irrelevant in this function,
whence we will write (12) : A ⊃ ◦B more compactly.
In a concrete WCRT analysis problem we are given some schedule S and a scheduling type
φ (as its specification) and ask for a stabilization bound f such that S ⊆ [[f : φ]]. If such a
bound exists we say that S is well timed for φ with bound f , and write S |= f : φ. If we are
not interested in the bound itself, we write S |= φ to express that S is well-timed for φ, i.e.,
there exists f ∈ Bnd(φ) such that S ⊆ [[f : φ]]. In this way each scheduling type φ defines a class
of well-timed schedules [[φ]] =df {S | S |= φ} while a pair f : φ specifies individual executions.
Because of this it is possible to use expressions f : φ themselves as generalized “control signals”
in types such as (f : φ) ∧ ψ or ◦(f : φ). We simply define Bnd(f : φ) =df 1 and σ |= 0 : (f : φ) iff
σ |= f : φ.
For any given schedule S there may be infinitely many bounds under which S is well-timed
for a type. We will be interested in optimal bounds. To make this formal we introduce a partial
ordering ⊑ on bounds, so that f ⊑ g means f is tighter than g. The ordering on Bnd(φ) is
generated by induction on type φ from the natural ordering ≤ on N, taking point-wise ordering
on products Bnd(φ)×Bnd(ψ) and function spaces Bnd(φ) → Bnd(ψ). For disjoint unions Bnd(φ)+
Bnd(ψ) we take the discrete ordering, so that (i, f) ⊑ (j, g) iff i = j and f ⊑ g. Then, a scheduling
bound f ∈ Bnd(φ) is exact or worst-case for S and φ, if for all g ∈ Bnd(φ) such that g ⊑ f we
have f = g iff S |= g : φ.
Example 3. Bounded termination can be specified by the scheduling type ◦false because for any
execution σ, the statement σ |= n : ◦false says that σ is finite and has length at most n, i.e.,
|σ| ≤ n. Now suppose schedule SG is the set of all possible executions of a program G. The
WCRT of G then is the maximal number of steps in all executions WCRT(G) = max{|σ| − 1 |
σ ∈ SG}. This is one less than the minimal upper bound on the length of all executions, i.e.,
WCRT(G) = min{n − 1 | ∀σ ∈ SG. |σ| ≤ n} and thus the same as the optimal scheduling bound
m such that SG |= m + 1 : ◦false. Specifically if SG = ∅, then WCRT(G) = −∞.
If program G is not stand-alone but a program fragment then SG is the set of all possible
executions of G in arbitrary program contexts into which G is embedded. Let us suppose that
G is started through control signal active and every thread entering G must leave G via a
unique control point end. In that case WCRT(G) is the least upper bound on the number of
cycles of any execution σ ∈ SG between the occurrence of active and end. In other words, we
want the optimal bound f such that SG |= (f + 1) : active ⊃ ◦end. In this paper the unique
termination point used is the global constant false. Abbreviating wait =df 1 : false we may write
SG |= (f) : active ⊃ ◦wait .
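The reading of bounded termination in Example 3 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: an execution σ is modelled as a list of control events (sets of signal names), the schedule `S_G` is a made-up toy example, and the function names are hypothetical.

```python
import math

NEG_INF = -math.inf  # stands in for the report's -∞


def satisfies_bounded_termination(sigma, n):
    """sigma |= n : ◦false, read as |sigma| <= n."""
    return len(sigma) <= n


def wcrt(schedule):
    """WCRT(G) = max{|sigma| - 1 | sigma in S_G}; -∞ for the empty schedule."""
    return max((len(sigma) - 1 for sigma in schedule), default=NEG_INF)


# Hypothetical schedule: two executions of 3 and 5 control events.
S_G = [
    [{"active"}, {"A"}, {"end"}],
    [{"active"}, {"A"}, {"B"}, {"C"}, {"end"}],
]

m = wcrt(S_G)
# m + 1 is the optimal bound n such that S_G |= n : ◦false
tight = all(satisfies_bounded_termination(s, m + 1) for s in S_G)
too_small = all(satisfies_bounded_termination(s, m) for s in S_G)
```

Here `m + 1` bounds the length of every execution while `m` does not, which mirrors the statement that WCRT(G) is one less than the minimal upper bound on the execution length.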
We have the following natural monotonicity property which says that if we relax the schedul-
ing bound or reduce the set of schedules we are not losing well-timedness:
Proposition 1. Let R,S be schedules and f, g ∈ Bnd(φ) scheduling bounds for type φ such that
R ⊆ S and f ⊑ g. Then, S |= f : φ implies R |= g : φ.
On the other hand if we enlarge the schedule (i.e., include more executions) then the worst-
case scheduling bound becomes larger in general and similarly if we tighten the bounds we get
fewer executions satisfying the bound. A bound f ∈ Bnd(φ) is a uniform bound for φ if S |= f : φ
for any schedule S, written |= f : φ.
The partial ordering (Bnd(φ),⊑) depends on the structure of a scheduling type φ and measures
the amount of WCRT information that is associated with φ. In this respect the most simple class
of types is that for which Bnd(φ) is (order) isomorphic to 1. This happens precisely if |Bnd(φ)| = 1.
Such types are called units since they only carry trivial timing information. An example of unit
propositions are the double negated formulas. Such statements may have uniform bounds, but
they do not contain any information. More precisely, it can be shown that if f ∈ Bnd(¬¬φ) and
σ |= f : ¬¬φ then for all g ∈ Bnd(¬¬φ), σ |= g : ¬¬φ. Thus, ¬¬φ either has a uniform scheduling
bound or it does not; and if it does all bounds are uniform bounds. Moreover, they cannot be
distinguished from each other by the relation |=. Hence they might just as well all be identified.
The unit types, which we denote by meta-variable ζ, are characterized syntactically as follows:
ζ ::= true | false | A | ζ ∧ ζ | ζ ⊕ ζ | φ ⊃ ζ,
where φ is an arbitrary type. The basic controls introduced in Sec. 5.1 are a particular class
of unit types. It is natural to exploit the isomorphisms Bnd(ζ) ≅ 1 and identify all bounds
f ∈ Bnd(ζ) canonically with the unique 0 ∈ 1. In fact we may simply identify ζ with 0 : ζ and
write d : ζ instead of (d, 0) : ◦ζ. Note that no non-empty execution satisfies σ |= −∞ : A for
any control signal A which means that timing delay −∞ can be used for non-existing control
dependencies.
Proposition 2. The type operators ∧, ⊕ and ‖ correspond to the arithmetic operations min,
max and +, respectively. More precisely, the values min(d1, d2), max(d1, d2) and d1 + d2 are the
optimal scheduling bounds such that
d1 : X ∧ d2 : Y ⪯ min(d1, d2) : (X ⊕ Y ) ≅ min(d1, d2) : (X ∨ Y )
d1 : X ∧ d2 : Y ⪯ max(d1, d2) : (X ∧ Y )
((d1 : X) ∧ (X ⊃ wait)) ‖ ((d2 : Y ) ∧ (Y ⊃ wait)) ⪯ d1 + d2 : X ∧ Y
(d1 : X) ∧ (X ⊃ (d2 : Y )) ⪯ d1 + d2 : X ∧ Y
d1 : X ⊕ d2 : Y ⪯ max(d1, d2) : X ⊕ Y.
A somewhat larger but still rather special class of types are the formulas for which Bnd(φ)
is canonically order-isomorphic to a Cartesian product of numbers, i.e., to N^n for some n ≥ 0.
Here, by canonically order-isomorphic we mean that Bnd(φ) ≅ N^n can be derived by the natural
isomorphisms of partial orderings N^0 ≅ 1, N^1 ≅ N, 1^τ ≅ 1, N^k × N^n ≅ N^(k+n), n × k ≅ nk,
n+k ≅ n + k, n → k ≅ k^n, n → N^k ≅ N^(nk) alone. This implies that, for instance, the well-known
(dove-tailing) bijection between N and N^2 does not count as canonical. The scheduling types φ
with Bnd(φ) ≅ N^n are called elementary. Elementary formulas are referred to by θ and generated
by the grammar
θ ::= θ ∧ θ | ◦ζ | φ ⊃ θ,
where ζ is a unit and φ is ◦-free. Note that every unit type is elementary. Elementary scheduling
types are of special interest since Bnd(φ) is a lattice and its elements first-order objects, i. e.
vectors of natural numbers. The following Proposition 3 provides the basis for WCRT analysis:
Proposition 3. Let θ be an elementary type and S a schedule such that S is well-timed for θ.
Then, the set { f | S |= f : θ } ordered by ⊑ is (nonempty and) a complete lower semi-lattice.
Prop. 3 implies the existence of unique worst-case stabilization bounds for elementary schedul-
ing types. More precisely, it says that for every schedule S we have a worst case reaction bound
relative to any scheduling type φ by putting WCRT(S, φ) = ⊓{f | S |= f : φ}. As a degenerate
case we have WCRT(S, φ) = −∞ if S = ∅.
The WCRT analyses suggested in this paper are implementations of scheduling bounds for














where xij and ykl are min-terms of literals over control signals S. The set of scheduling bounds
for such θ is canonically order-isomorphic to N^(|K|×|I|), whose elements can be understood as |K| × |I|
timing matrices relative to the base controls ζi and ξk. The logical conjunction of these interfaces
in a fixed set of base controls corresponds to matrix multiplication in max-plus algebra
(N,+,max, 0,−∞). Furthermore, using logical reasoning on base controls ζi, ξj we can massage
the semantics of timing matrices very much like we do with base transformations in ordinary
linear algebra. This combination of timing algebra and logical reasoning, which is as tight as it is uniform,
is the key to expressing timing abstractions and distinguishes our WCRT algebra from previous
work on component interfaces such as [21].
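To make the max-plus reading concrete, the following Python sketch multiplies two timing matrices in the algebra (N, +, max, 0, −∞). It is a generic illustration of max-plus matrix multiplication, not the report's implementation; the function name and the example matrices are our own assumptions, and −∞ is represented by `-math.inf`.

```python
import math

NEG_INF = -math.inf  # "no path" / non-existing control dependency


def maxplus_matmul(a, b):
    """Max-plus product: c[i][j] = max_k (a[i][k] + b[k][j])."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [max(a[i][k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2x2 timing matrix followed by a 2x1 timing matrix.
a = [[2, NEG_INF],
     [1, 3]]
b = [[4],
     [0]]
c = maxplus_matmul(a, b)  # longest delay through either intermediate control

# The max-plus identity has 0 on the diagonal and -∞ elsewhere.
identity = [[0, NEG_INF],
            [NEG_INF, 0]]
```

Ordinary addition plays the role of multiplication (delays along a path add up) and max plays the role of addition (the worst of the alternative paths is kept), so an entry −∞ removes a path from consideration.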
B Multi-threading Composition
This section explains how the operator ‖ is used to express multi-threading and also fills in some
technical details supporting the developments in Sec. 5.4.
In multi-threading we need to enforce termination of threads. There must be some means
of expressing that a schedule does not lock up inside a thread which has reached a certain
exit point but rather hands over to some other thread. In fork-join blocks this may be either a
concurrent sibling thread or the parent thread. Stopping can in fact be specified by the special
type wait =df 1 : false which is satisfied on singleton intervals, i.e., σ |= wait iff |σ| ≤ 1.⁷
Take the example in Fig. 10 (b): If T1 reaches its exit control Y1 then T1 is finished and yields
back to the scheduler, i.e. either to thread T2 or the parent thread which means leaving the join
through Y . This yielding of T1 to T2 is specified by the scheduling type (Y1 ∧ ¬Y2) ⊃ wait . Any
execution σ1 of T1 satisfying σ1 |= (Y1 ∧ ¬Y2) ⊃ wait which reaches control point Y1 in some
cycle i must either include Y2 in the very next cycle i + 1 or terminate, i.e., Y1 occurs in the last
control event of σ1. Formally, σ |= (Y1 ∧ ¬Y2) ⊃ wait iff ∀i < |σ| − 1. Y1 ∈ σ(i) ⇒ Y2 ∈ σ(i + 1).
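The formal condition just stated can be checked mechanically on a finite execution. The sketch below assumes, purely for illustration, that an execution is a Python list of sets of control-signal names; the function name `yields_to_sibling` is hypothetical.

```python
def yields_to_sibling(sigma, y_own="Y1", y_other="Y2"):
    """Check: for all i < |sigma| - 1, y_own in sigma(i) implies y_other in sigma(i + 1)."""
    return all(
        y_other in sigma[i + 1]
        for i in range(len(sigma) - 1)
        if y_own in sigma[i]
    )


# Y1 occurs in the last control event: the thread terminates there.
ok_terminates = yields_to_sibling([{"a"}, {"Y1"}])
# Y1 is followed by Y2 in the very next cycle.
ok_hands_over = yields_to_sibling([{"Y1"}, {"Y2"}, {"b"}])
# Y1 is followed by an event without Y2: the thread neither stops nor yields.
violates = yields_to_sibling([{"Y1"}, {"b"}])
```

The quantifier ranges only over i < |σ| − 1, so an occurrence of Y1 in the final event is always admissible, matching the "terminate" alternative in the text.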
Now, consider the type
φ1 ‖ φ2 =df ◦Y1 ∧ ((Y1 ∧ ¬Y2) ⊃ wait) ‖ ◦Y2 ∧ ((Y2 ∧ ¬Y1) ⊃ wait). (25)
7 Note the expression (¬X) ⊃ wait means that X has a down-time of at most 1 ic. The combination
















Fig. 10: Interleaved concurrent threads T1, T2 (a) and composite system T with both join and fork (b).
and assume σ |= (d1, d2) : φ1 ‖ φ2 and d1, d2 ≥ 1. Then, by definition, there are sub-executions
σi ⊑ σ with σ = σ1 + σ2 such that σi |= di : φi for i, j ∈ {1, 2} and i ≠ j. Let us look at σ1
which is the sub-execution in which thread T1 is executed, satisfying σ1 |= d1 : φ1. The other case
σ2 |= d2 : φ2 is perfectly symmetric. First, σ1 |= d1 : ◦Y1 says that after maximal delay d1 (taken
relative to σ1, not σ) σ1 will activate the exit control Y1 of T1. Second, σ1 |= 0 : ((Y1∧¬Y2) ⊃ wait)
makes sure that when Y1 appears then either σ1 terminates or Y2 appears in the very next
cycle. Thus, the execution σ passes through Y2 or continues outside of σ1 (in σ2) no later than
immediately in the next cycle after T1 has reached Y1. The same applies to the sub-execution
σ2 |= d2 : φ2 with delay d2. Since the total execution σ = σ1 + σ2 is fully covered by σ1 and σ2
we conclude that σ must reach the conjunction of Y1 and Y2 within d1 + d2 ics⁸, i.e.,
σ |= d1 + d2 : ◦(Y1 ∧ Y2). (26)
Let us see in more detail why this must be true. First, if σ is not longer than d1 + d2 then
(26) holds trivially. Hence, suppose σ comprises at least d1 + d2 + 1 ics, i.e., |σ| ≥ d1 + d2 + 1.
It is easy to see that then at least one of the executions σi must contain at least di + 1 cycles.
For otherwise, if both |σ1| ≤ d1 and |σ2| ≤ d2 then |σ| ≤ |σ1| + |σ2| − 1 ≤ d1 + d2 − 1 which is a
contradiction. The subtraction of 1 here is due to the fact that σ1 and σ2 form a transition cover
of σ, and the number of transitions of an execution is one less than its length.
Consider the case that |σ1| ≥ d1 +1. Since σ1 |= d1 : ◦Y1 we know that after a maximal delay
d1 the schedule σ1 will activate the exit control Y1, i.e., σ1, i1 |= 0 : Y1 for some i1 ≤ d1 ≤ |σ1|−1.
Further, because of σ1 |= ((Y1 ∧ ¬Y2) ⊃ wait), we either have i1 = |σ1| − 1 or Y2 ∈ σ1(i1 + 1).
The latter implies that σ1, i1 + 1 |= Y1 ∧ Y2 and thus (26) because i1 + 1 ≤ d1 + 1 ≤ d1 + d2.
Thus it remains to look at the former case in which we conclude |σ1| = d1 +1. Thus, schedule
σ2 must have length |σ2| ≥ |σ|−|σ1|+1 = |σ|−d1 ≥ d2 +1. By an argument analogous to above,
σ2 |= d2 : ◦Y2 and σ2 |= ((Y2 ∧ ¬Y1) ⊃ wait) give us Y2 ∈ σ2(j2) for some j2 ≤ d2 ≤ |σ2| − 1
such that either (i) j2 = d2 = |σ2| − 1, or (ii) Y1 ∈ σ2(j2 + 1). Again, (ii) yields (26) immediately
because then σ2, j2 + 1 |= Y1 ∧ Y2 and j2 + 1 ≤ d2 + 1 ≤ d1 + d2.
Now what if (i) is true? Consider the absolute cycle times i and j in the global schedule σ
which correspond to the relative indices i1 and j2 in the sub-executions σ1 and σ2, respectively.
Recall that for these indices we have σ, i |= 0 : Y1 and σ, j |= 0 : Y2. By transition cover
σ = σ1 + σ2 and we must have j − j2 ≤ |σ1| − 1 and i − i1 ≤ |σ2| − 1. From this it follows that
max(i, j) ≤ max(|σ2|+ i1 −1, |σ1|+ j2 −1) = max(d2 +1+d1 −1, d1 +1+d2 −1) = d1 +d2. This
proves (26). Since our argument is symmetric in σ1 and σ2 we have thus proven (26) whenever
|σi| ≥ di + 1 for some i = 1, 2.
One can show in fact that the sum d1 +d2 is the best uniform delay d such that σ |= (d1, d2) :
φ1 ‖ φ2 implies σ |= d1 + d2 : ◦(Y1 ∧ Y2) for arbitrary executions σ. More precisely, one can show
that for any number d < d1 + d2 we can find an execution such that σ |= (d1, d2) : φ1 ‖ φ2 but
8 Recall that ic is a time unit short for ’instruction cycle’
σ ⊭ d : ◦(Y1 ∧ Y2) (see also Prop. 2). Of course, this is the WCRT for the joint instantaneous
execution of T1 and T2 under multi-threading ignoring signal cross-dependencies.
To keep matters concise we introduce the derived operator
φ1 ‖X,Y φ2 =df (φ1 ∧ ((X ∧ ¬Y ) ⊃ wait)) ‖ (φ2 ∧ ((Y ∧ ¬X) ⊃ wait))
with bounds Bnd(φ1 ‖X,Y φ2) =df Bnd(φ1) × Bnd(φ2) such that σ |= (f, g) : φ1 ‖X,Y φ2 iff
σ |= (f, g) : (φ1 ∧ ((X ∧ ¬Y ) ⊃ wait)) ‖ (φ2 ∧ ((Y ∧ ¬X) ⊃ wait)).
Multi-threading of Surface Interfaces Let us further assume each thread inside a fork-join
has (at most) one instantaneous entry Xi and one instantaneous exit control Yi and is specified
by its surface WCRT interface:
T1,srf = (d1, e1)^T : X1 ⊃ (◦Y1 ⊕ ◦wait) = X1 ⊃ (d1:Y1 ⊕ e1:wait)
T2,srf = (d2, e2)^T : X2 ⊃ (◦Y2 ⊕ ◦wait) = X2 ⊃ (d2:Y2 ⊕ e2:wait).
The single entry Xi is justified since there are no inter-level transitions which would enter a thread
from outside the concurrent block. There may be inter-thread dependencies between concurrent
threads, though, but we ignore these for the moment. As explained above, if there are several
instantaneous exits Yik in a thread Ti then these are assumed to have been bundled into a single
exit Yi = ⊕kYik, which simply amounts to forming the maximum as we have seen. Moreover,
without loss of generality we may suppose that the Ti are normalized so that we have di ≤ ei for
all sink nodes.
The surface interface of T1,srf ‖Y1,Y2 T2,srf (the system seen in Fig. 10 (a) without fork or join)
is given by the smallest d, e such that
T1,srf ‖Y1,Y2 T2,srf ⪯ X1 ∧ X2 ⊃ d : (Y1 ∧ Y2) ⊕ e : wait .
To obtain these the timing interface T1,srf ‖Y1,Y2 T2,srf is transformed in context⁹ X1 ∧ X2 as
follows:
X1,X2 ⊢ T1,srf ‖Y1,Y2 T2,srf = (X1 ⊃ (d1:Y1 ⊕ e1:wait)) ‖Y1,Y2 (X2 ⊃ (d2:Y2 ⊕ e2:wait))
= (true ⊃ (d1:Y1 ⊕ e1:wait)) ‖Y1,Y2 (true ⊃ (d2:Y2 ⊕ e2:wait))
= (d1:Y1 ⊕ e1:wait) ‖Y1,Y2 (d2:Y2 ⊕ e2:wait)
⪯ ((d1 + d2):(Y1 ∧ Y2) ⊕
(d1 + e2):(Y1 ∧ wait) ⊕
(e1 + d2):(wait ∧ Y2) ⊕
(e1 + e2):(wait ∧ wait))
where we exploit Prop. 2 and the equivalence (true ⊃ φ) ≅ φ. This type adds up all possible
ways in which the through paths and sink paths of T1 and T2 may interleave. The underlying























(d1 + d2, d1 + e2, e1 + d2, e1 + e2)^T
The type Y1 ∧ Y2 specifies the cases where each block Ti executes a through path from Xi to
Yi which amounts to a through path of T1,srf ‖Y1,Y2 T2,srf. The types Y1 ∧ wait and wait ∧ Y2
cover the interleaving of a through path in one block with a sink path in the other and finally
wait ∧wait types the executions in which both blocks eventually pause through sink paths. Since
a single pausing thread forces the whole fork-join block to pause we get that in all cases except the
first one, the composition T1,srf ‖Y1,Y2 T2,srf pauses, i.e., it executes a sink path. This is reflected
9 Proving an (in-)equation φ ⪯ ψ in the context X1 ∧ X2, i.e., X1 ∧ X2 ⊢ φ ⪯ ψ amounts to proving
φ ⪯ ψ using the equations X1 ≅ true and X2 ≅ true (which are equivalent to the equation X1 ∧ X2 ≅ true).
by the (in)equations Y1 ∧ wait ⪯ wait , wait ∧ Y2 ⪯ wait , wait ∧ wait ≅ wait which we can use to
condense the type of T1,srf ‖Y1,Y2 T2,srf into
T1,srf ‖Y1,Y2 T2,srf ⪯ (d1 + d2, max(d1 + e2, e1 + d2, e1 + e2))^T : X1 ∧ X2 ⊃ (◦(Y1 ∧ Y2) ⊕ ◦wait) (27)
considering that x:wait ⊕ y:wait ≅ max(x, y):wait in general. This means that the desired through
WCRT of T is d = d1 + d2 and the sink WCRT e = max(d1 + e2, e1 + d2, e1 + e2).
In the computation of the sink WCRT e it is useful to distinguish between sink and non-sink
nodes. Provided at least one system T1 or T2 is a (normalized) sink node satisfying di ≤ ei, one can
simplify e = max(d1+e2, e1+d2, e1+e2) = max(d1, e1)+max(d2, e2). Thus, in this case, we simply
take the maximum of instantaneous and sink paths in each block separately and add them up. If
none of T1 or T2 is a sink node (e1 = e2 = −∞) we get e = max(d1 + e2, e1 + d2, e1 + e2) = −∞,
i.e., T1 ‖Y1,Y2 T2 is not a sink node either. This suggests the following simple WCRT strategy on
normalized surface interfaces: We always compute the sink WCRT as
e = max(d1, e1) + max(d2, e2).
Then, if we find d = d1 + d2 ≥ e, we normalize and put e = −∞. This is very useful since
e = max(d1, e1) + max(d2, e2) is compositional and thus more efficient than the general e =
max(d1 + e2, e1 + d2, e1 + e2) from (27).
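The two ways of computing the surface interface of the composition can be compared in a small Python sketch. It assumes normalized inputs in the sense of the text (finite di ≥ 0, and either ei = −∞ for a non-sink node or di ≤ ei for a sink node); the function names are hypothetical and −∞ is modelled by `-math.inf`. To compare like with like, the sketch applies the same normalization step (drop e when d ≥ e) to the result of the general formula.

```python
import math

NEG_INF = -math.inf


def normalize(d, e):
    """Drop the sink bound when it does not exceed the through bound."""
    return (d, NEG_INF) if d >= e else (d, e)


def compose_surface_general(d1, e1, d2, e2):
    """Through and sink WCRT as in (27)."""
    return d1 + d2, max(d1 + e2, e1 + d2, e1 + e2)


def compose_surface_fast(d1, e1, d2, e2):
    """Compositional variant: e = max(d1, e1) + max(d2, e2), then normalize."""
    return normalize(d1 + d2, max(d1, e1) + max(d2, e2))


# T1 is a sink node (d1 = 2 <= e1 = 5), T2 is not (e2 = -∞).
general = compose_surface_general(2, 5, 3, NEG_INF)
fast = compose_surface_fast(2, 5, 3, NEG_INF)
```

In this example both variants give a through WCRT of 5 and a sink WCRT of 8. When neither block is a sink node, the fast variant first computes e = d1 + d2 and then resets it to −∞ in the normalization step.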
Another potential optimization arises if both T1 and T2 are sink nodes with di ≤ ei. Then,
max(d1 + e2, e1 + d2, e1 + e2) = e1 + e2. Thus, for sink interfaces, the instantaneous WCRT
for concurrent blocks can be calculated separately for the through paths and sink paths. The




















which might be exploited for parallelization of WCRT analyses.
Multi-threading of Depth Interfaces The specification for T1,srf ‖Y1,Y2 T2,srf was built from
the surface interfaces of T1,srf and T2,srf which only cover through paths and sink paths. To get
the full picture the concurrent composition still needs to include the source paths and internal
paths that make up the depth interface active ⊃ (◦(Y1 ∧ Y2) ⊕ ◦wait). Let us look at them now.
The interface types of T1,dpt, T2,dpt for source and internal paths are, respectively,
T1,dpt = (s1, t1)^T : active1 ⊃ (◦Y1 ⊕ ◦wait) = active1 ⊃ (s1:Y1 ⊕ t1:wait)
T2,dpt = (s2, t2)^T : active2 ⊃ (◦Y2 ⊕ ◦wait) = active2 ⊃ (s2:Y2 ⊕ t2:wait).
As before we may assume that internal nodes (i.e., those for which ti ≠ −∞) satisfy si ≤ ti since
otherwise the internal paths would not contribute to the critical path starting in block Ti and
we could replace ti by −∞ without changing the WCRT.
Obviously, to obtain a source path active ⊃ ◦(Y1∧Y2) which leaves the system instantaneously
we need to interleave the source paths from T1 and T2 and take into account that only one of
the threads has paused while the other has terminated instantaneously at exit control Yi. In
general, for every nonempty subset of pausing threads we calculate the interleaving of their
source paths, while assuming that all other threads are already terminated instantaneously,
contributing nothing.
Unfortunately, we cannot simply put the depth types T1,dpt and T2,dpt in parallel as before,
say
T1,dpt ‖Y1,Y2 T2,dpt
= active1 ⊃ (s1:Y1 ⊕ t1:wait) ‖Y1,Y2 active2 ⊃ (s2:Y2 ⊕ t2:wait)
⪯ active1 ∧ active2 ⊃ (s1 + s2) : (Y1 ∧ Y2) ⊕ max(t1 + s2, s1 + t2, t1 + t2) : wait .
The reason is that if a source path, say from T1 has WCRT s1 = −∞, i.e., if T1 is not a source
node, then the overall source WCRT of T would come out as s1 + s2 = −∞ + s2 = −∞ even if
T2 is a source node with s2 ≥ 0. In other words, T1 would block the execution of the source
node T2. Instead, the right source WCRT would be s2. This is the delay we get if T2 starts
from its pause and block T1 has been instantaneously terminated in the previous clock cycle
and is waiting at the exit control Y1 rather than an internal pause node (from which it would
never reach exit Y1 because s1 = −∞). The key observation is that source-paths in a concurrent
composition are not composed conjunctively like the through-paths but disjunctively as seen in
the following table:
active1 ⊃ ◦Y1    active2 ⊃ ◦Y2    active1 ∧ active2 ⊃ ◦(Y1 ∧ Y2)
s1 ≥ 0           s2 ≥ 0           s1 + s2
s1 = −∞          s2 ≥ 0           s2
s1 ≥ 0           s2 = −∞          s1
s1 = −∞          s2 = −∞          −∞
To implement this disjunctive behavior we might try to coerce −∞ to 0 before we add the
paths as in max(0, s1) + max(0, s2). However, this would give us a source WCRT of 0 in case
s1 = s2 = −∞ which is not correct. For if none of the blocks T1 and T2 is a source node their
composition T should not be a source node either.
The right way to combine the source paths of concurrent blocks T1, T2, ..., Tn is to take the
maximum over all possible ways in which some subset of blocks are started (activei) from an
internal pause while the other blocks already wait at their exit controls Yi. The degenerate case
where all blocks are at their exits Yi must be excluded since no source path of the composition
T can start in this way. Any path which reaches Y1 ∧ Y2 ∧ · · · ∧ Yn would have left the block T
instantaneously in the previous logical tick.
To achieve this composition we activate the source paths of T not with active ⊃ (active1 ∧
active2) but with
active ⊃ (active1 ⊕ active2) ∧ (active1 ⊕ ¬active1) ∧ (active2 ⊕ ¬active2). (28)
This says that once the fork-join block is active at least one block Ti executes a source path
(active1 ⊕ active2) and also that in those executions the activation controls behave as constants
(activei ⊕¬activei), i.e., they are either switched on or switched off, throughout. The right-hand
side of type (28) is equivalent to listing all the possible cases
active ⊃ (active1 ∧ active2) ⊕ (active1 ∧ ¬active2) ⊕ (¬active1 ∧ active2) (29)
in which one block is activated and all others are switched off. Note however that this expansion
(29) is exponential in the number of concurrent blocks, in contrast to (28) which is linear. In
general, for blocks T1, T2, . . . , Tn (28) would be
active ⊃ (active1 ⊕ · · · ⊕ activen) ∧ (active1 ⊕ ¬active1) ∧ · · · ∧ (activen ⊕ ¬activen).
The other measure to take is to instrument the parallel composition T1 ‖Y1,Y2 T2 so that the
source paths of an activated block Ti do not get preempted by the other block Tj if it is not
active, too. To make inactive blocks wait at their join we strengthen their specification as follows:
T ∗i =df Ti ∧ ((active ∧ ¬activei) ⊃ (Yi ∧ wait)). (30)
The type (active∧¬activei) ⊃ (Yi ∧wait) adds to Ti waiting configurations specified by Yi ∧wait
whenever the parallel composition of which Ti is a part is activated (active) but Ti itself is not
(¬activei). The type Yi ∧ wait consists of singleton events in which control Yi is present. The
conjunction Yi ∧ wait simulates Ti waiting at its exit Yi.
Putting (30) together with (28) we obtain:
active, activei ⊢ T∗i = Ti ∧ ((active ∧ ¬activei) ⊃ (Yi ∧ wait))
    = Ti ∧ ((true ∧ false) ⊃ (Yi ∧ wait))
    = Ti ∧ (false ⊃ (Yi ∧ wait))
    = Ti ∧ true
    = Ti
    = activei ⊃ (si:Yi ⊕ ti:wait)
    = true ⊃ (si:Yi ⊕ ti:wait)
    = si:Yi ⊕ ti:wait .
active, ¬activei ⊢ T∗i = Ti ∧ ((active ∧ ¬activei) ⊃ (Yi ∧ wait))
    = Ti ∧ ((true ∧ true) ⊃ (Yi ∧ wait))
    = Ti ∧ (Yi ∧ wait)
    = (activei ⊃ (si:Yi ⊕ ti:wait)) ∧ (Yi ∧ wait)
    = (false ⊃ (si:Yi ⊕ ti:wait)) ∧ (Yi ∧ wait)
    = true ∧ (Yi ∧ wait)
    = Yi ∧ wait .
Now we put the two blocks in parallel in all three activation contexts (29) using the approximation
φ ∧ wait ⪯ wait and x:φ ⊕ y:φ ⪯ max(x, y):φ:
active, active1, active2 ⊢ T∗1 ‖Y1,Y2 T∗2 = (s1:Y1 ⊕ t1:wait) ‖Y1,Y2 (s2:Y2 ⊕ t2:wait)
    ⪯ (s1 + s2) : (Y1 ∧ Y2) ⊕ (s1 + t2) : (Y1 ∧ wait) ⊕
      (t1 + s2) : (wait ∧ Y2) ⊕ (t1 + t2) : (wait ∧ wait)
    ⪯ (s1 + s2) : (Y1 ∧ Y2) ⊕ max(t1 + s2, s1 + t2, t1 + t2) : wait (31)
active, active1, ¬active2 ⊢ T∗1 ‖Y1,Y2 T∗2 = (s1:Y1 ⊕ t1:wait) ‖Y1,Y2 (Y2 ∧ wait)
    ⪯ s1:(Y1 ∧ Y2) ⊕ t1:(Y2 ∧ wait). (32)
active, ¬active1, active2 ⊢ T∗1 ‖Y1,Y2 T∗2 = (Y1 ∧ wait) ‖Y1,Y2 (s2:Y2 ⊕ t2:wait)
    ⪯ t2:(Y1 ∧ wait) ⊕ s2:(Y1 ∧ Y2). (33)
Finally, we sum up (31)–(33) under active as specified in (28), or equivalently (29):
active ⊢ T∗1 ‖Y1,Y2 T∗2 ⪯ (s1 + s2) : (Y1 ∧ Y2) ⊕ max(t1 + s2, s1 + t2, t1 + t2) : wait ⊕
      s1:(Y1 ∧ Y2) ⊕ t1:(Y2 ∧ wait) ⊕
      t2:(Y1 ∧ wait) ⊕ s2:(Y1 ∧ Y2)
    ⪯ max(s1, s2, s1 + s2) : (Y1 ∧ Y2) ⊕
      max(t1, t2, t1 + s2, s1 + t2, t1 + t2) : wait ,
again using φ ∧ wait ⪯ wait , x:φ ⊕ y:φ ⪯ max(x, y):φ, as well as associativity and commutativity
of ⊕. Here s =df max(s1, s2, s1 + s2) is the source WCRT of T∗1 ‖Y1,Y2 T∗2 and t =df
max(t1, t2, t1 + s2, s1 + t2, t1 + t2) the internal WCRT.
Regarding the source WCRT s, this is what we wanted: The WCRT s = max(s1, s2, s1 + s2)
for source paths of T is the sum of all source WCRTs of the sub-blocks T1 and T2 that are not −∞. If all
source paths are −∞ then the source WCRT of T is −∞, too, and T is not a source node. A
simple way to compute s is by iterative monotonic update: initialize x0 ← −∞ and then
for each source WCRT si of block Ti take xi ← max(xi−1, si + max(0, xi−1)), so that a block
with si = −∞ leaves the accumulated value unchanged.
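The disjunctive composition of source paths can be illustrated in Python as follows. The sketch assumes that every si is either −∞ or a non-negative number, models −∞ by `-math.inf`, and uses hypothetical function names. The reference function takes the maximum over all nonempty subsets of blocks, as described in the text; the iterative function keeps an outer `max` with the previous value so that a block with si = −∞ does not erase what has been accumulated so far.

```python
import math
from itertools import combinations

NEG_INF = -math.inf


def source_wcrt_subsets(s):
    """Reference: max over all nonempty subsets of blocks of the summed source WCRTs."""
    n = len(s)
    return max(
        (sum(s[i] for i in subset)
         for r in range(1, n + 1)
         for subset in combinations(range(n), r)),
        default=NEG_INF,
    )


def source_wcrt_iterative(s):
    """Linear-time update: x <- max(x, s_i + max(0, x)), starting from -∞."""
    x = NEG_INF
    for si in s:
        x = max(x, si + max(0, x))
    return x


both_sources = source_wcrt_iterative([3, 5])
only_second = source_wcrt_iterative([NEG_INF, 5])
only_first = source_wcrt_iterative([3, NEG_INF])
no_source = source_wcrt_iterative([NEG_INF, NEG_INF])
```

For two blocks the reference function is exactly max(s1, s2, s1 + s2) and reproduces the four rows of the table above; the subset enumeration is exponential in the number of blocks, whereas the iterative update is linear.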
Next, consider the internal WCRT t = max(t1, t2, t1 + s2, s1 + t2, t1 + t2). Let us see how this
could be transformed into a sum-of-maxs. Obviously, t = −∞ iff both t1 = −∞ and t2 = −∞,
i.e., the composition is an internal node iff at least one of the blocks is internal. Suppose one of the
blocks, say T1, is internal, whence t1 ≥ 0. By normalization we may assume s1 ≤ t1. We claim
that the internal WCRT t can also be determined by the sum-of-max expression
t′ =df max(0, s1, t1) + max(0, s2, t2)
if at least one of T1, T2 is an internal node and otherwise t = −∞. First observe that because
of t1 ≥ 0 and s1 ≤ t1 the first summand of t′ reduces to max(0, s1, t1) = t1, so that t′ =
t1 + max(0, s2, t2) = max(t1, t1 + s2, t1 + t2) invoking the distributive law x + max(y1, y2) =
max(x + y1, x + y2) of max-plus algebra. Now if t2 = −∞ then t′ = t is easily shown since
t′ = max(t1, t1 + s2, t1 + t2) = max(t1, t1 + s2) and also t = max(t1, t2, t1 + s2, s1 + t2, t1 + t2) =
max(t1, t1 + s2) due to x+(−∞) = −∞ and max(x, y,−∞) = max(x, y). In the other case, both
t1 ≥ 0 and t2 ≥ 0 so we have t2 ≤ t1 + t2 and s1 + t2 ≤ t1 + t2. Then both t2 and s1 + t2 can be
added to the arguments of the max in t′ without changing its value, whence
t′ = max(t1, t1+s2, t1+t2) = max(t1, t2, t1+s2, s1+t2, t1+t2) = t
as desired. The calculations are symmetric in ti which means that if one of the blocks is internal
then the internal WCRT can indeed be computed by t = max(0, s1, t1)+max(0, s2, t2) as claimed.
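The claimed equality between the general expression and the sum-of-max expression can be sanity-checked by brute force over small values. The Python sketch below is such a check under the normalization assumptions of the text (si is −∞ or non-negative; an internal block has ti ≥ 0 and si ≤ ti; a non-internal block has ti = −∞); the function names and the value grid are our own choices.

```python
import math

NEG_INF = -math.inf


def internal_wcrt_general(s1, t1, s2, t2):
    """t = max(t1, t2, t1 + s2, s1 + t2, t1 + t2)."""
    return max(t1, t2, t1 + s2, s1 + t2, t1 + t2)


def internal_wcrt_sum_of_max(s1, t1, s2, t2):
    """t' = max(0, s1, t1) + max(0, s2, t2) if some block is internal, else -∞."""
    if t1 == NEG_INF and t2 == NEG_INF:
        return NEG_INF
    return max(0, s1, t1) + max(0, s2, t2)


def normalized_blocks(values=(0, 1, 2, 3)):
    """Enumerate (s, t) pairs satisfying the normalization assumptions."""
    for s in (NEG_INF, *values):
        yield s, NEG_INF                      # not an internal node
        for t in values:
            if s <= t:
                yield s, t                    # internal node with s <= t


mismatches = [
    (s1, t1, s2, t2)
    for s1, t1 in normalized_blocks()
    for s2, t2 in normalized_blocks()
    if internal_wcrt_general(s1, t1, s2, t2) != internal_wcrt_sum_of_max(s1, t1, s2, t2)
]
```

On this grid the list `mismatches` stays empty, which agrees with the case analysis above. Such a check is no substitute for the proof, but it is a cheap guard against transcription errors in an implementation.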
Putting it Together: Adding Fork and Join Observe that the surface interface T1 ‖Y1,Y2 T2
does not change if we replace Ti by T∗i = Ti ∧ θi where θi =df (active ∧ ¬activei) ⊃ (Yi ∧ wait).
We have







: X1 ∧ X2 ⊃ (◦(Y1 ∧ Y2) ⊕ ◦wait) (34)
with d = d1 + d2 and e = max(d1, e1) + max(d2, e2) (subject to normalization). This is because
the conjunctive additions θi in the latter over the former do not affect the behavior under input




2 ) ⪯ ψ since (T1 ∧ θ1) ‖ (T2 ∧ θ2) is a
subset of executions of T1 ‖ T2. Also, we do not lose exactness since every longest path execution
of T1 ‖ T2 which violates (d′, e′)^T : ψ for d′ < d or e′ < e does not (need to) involve the active
control and thus is also an execution of T∗1 ‖ T∗2 .
On the other hand, we have the depth interface







: active ⊃ ◦(Y1 ∧ Y2) ⊕ ◦wait ,
with s = max(s1, s2, s1 + s2) and t = max(0, s1, t1) + max(0, s2, t2) (subject to normalization as
described above). Putting both interfaces together yields







: ((X1 ∧ X2) ∨ active) ⊃ ◦(Y1 ∧ Y2) ⊕ ◦wait .
Up to this point we have captured the interleaved execution of threads T1 and T2 without
the fork and join nodes. This is visualized in Fig. 10 (a) where the horizontal bars denote the
concurrent synchronization of T1 and T2 at entry and exit controls. In particular it indicates that
the combined entry and exit are X1 ∧ X2 and Y1 ∧ Y2, respectively, and that thread Ti yields when












: (X ∨ active(T )) ⊃ (◦(X1 ∧ X2) ⊕ ◦active)
The specification (4) : X ⊃ ◦(X1 ∧ X2) of fork includes the ics for two PAR, one PARE and one
JOIN. The entry (1) : active(T ) ⊃ ◦active takes care of the execution of join on all source and
internal paths of T . Since the join is always executed when at least one thread is active, we add



























( d + 4   s + 1 )
( e + 4   t + 1 )
as the WCRT for the composite fork-join block T with canonical interface : (X ∨ active) ⊃
(◦Y ⊕ ◦wait). Note that we do not need to re-normalize. If d ≤ e then d + 4 ≤ e + 4 and if s ≤ t
then s + 1 ≤ t + 1.
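As a small worked illustration of this last step, the following Python sketch adds the fork and join overheads to a composed interface. The constants 4 and 1 are the instruction-cycle costs quoted in the text; the function name, the tuple layout (d, e, s, t) and the example numbers are assumptions made only for this sketch, and −∞ is modelled by `-math.inf`.

```python
import math

NEG_INF = -math.inf


def add_fork_join(d, e, s, t, fork_cost=4, join_cost=1):
    """Shift surface bounds (d, e) by the fork cost and depth bounds (s, t) by the join cost."""
    return d + fork_cost, e + fork_cost, s + join_cost, t + join_cost


# Hypothetical composed interface: through 5, sink 8, source 3, no internal path.
d, e, s, t = add_fork_join(5, 8, 3, NEG_INF)
```

Adding a constant preserves the order of the entries, and −∞ stays −∞, so a block that is not an internal node before adding fork and join is still not one afterwards and no re-normalization is needed.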
