Calculating WCET Estimates from Timed Traces by Zolda, Michael & Kirner, Raimund
Citation for published version:
M. Zolda, and R. Kirner, “Calculating WCET estimates from 
timed traces”, Real Time Systems, Vol. 52(1), September 2015.
DOI:
10.1007/s11241-015-9240-1
Document Version:
This is the Published Version.
Copyright and Reuse:
© The Author(s) 2015. This article is published with open 
access at Springerlink.com.
This is an open-access article distributed under the terms of 
the Creative Commons Attribution License (CC BY). The use, 
distribution or reproduction in other forums is permitted, 
provided the original author(s) or licensor are credited and that 
the original publication in this journal is cited, in accordance 
with accepted academic practice. No use, distribution or 
reproduction is permitted which does not comply with these 
terms. 
Real-Time Syst (2016) 52:38–87
DOI 10.1007/s11241-015-9240-1
Calculating WCET estimates from timed traces
Michael Zolda1 · Raimund Kirner1
Published online: 1 September 2015
© The Author(s) 2015. This article is published with open access at Springerlink.com
Abstract Real-time systems engineers face a daunting duty: they must ensure that
each task in their system can always meet its deadline. To analyse schedulability
they must know the worst-case execution time (WCET) of each task. However, deter-
mining exact WCETs is practically infeasible in cost-constrained industrial settings
involving real-life code and COTS hardware. Static analysis tools that could yield
sufficiently tight WCET bounds are often unavailable. As a result, interest in portable
analysis approaches like measurement-based timing analysis is growing. We present
an approach based on integer linear programming (ILP) for calculating a WCET esti-
mate from a given database of timed execution traces. Unlike previous work, our
method specifically aims at reducing overestimation, by means of an automatic classi-
fication of code executions into scenarios with differing worst-case behaviour. To ease
the integration into existing analysis tool chains, our method is based on the implicit
path enumeration technique. It can thus reuse flow facts from other analysis tools and
produces ILP problems that can be solved by off-the-shelf solvers.
Keywords WCET · Worst-case execution time · IPET · Implicit path-enumeration
technique · ILP · Integer linear programming · Timing analysis · Measurement ·
Context-sensitive · Scenarios
Michael Zolda
michael@zolda.eu
Raimund Kirner
r.kirner@herts.ac.uk
1 University of Hertfordshire, Hatfield AL10 9AB, UK
1 Introduction
Modern software usually adopts a modular design with multiple interacting computa-
tional processes executing concurrently inside the system. These processes compete
for shared system resources like memory space, processing time, etc. In real-time
processor scheduling the demands of each process for processing time are specified
by a set of task parameters. Given all task parameters of the system, real-time sys-
tems engineers can easily check if a given scheduling policy (Liu and Layland 1973;
Dertouzos and Aloysius 1989; Xu and Parnas 1990; Neil et al. 1991) can guarantee to
arbitrate all resource demands in a timely manner during system operation.
We are concerned with the worst-case execution time (WCET)—a crucial task
parameter prescribing the maximal amount of time the corresponding process requires
a processor to complete.
We consider that processes are implemented in software. The problem we need to
solve then is: How can the prescribed WCET of a task (a mathematical abstraction
used in schedulability analysis) be linked to the actual behavior of executable code
on particular hardware? Specifically, how can we determine the WCET of a piece of
code on a given microprocessor?
Measurement-based timing analysis (MBTA) (Bernat et al. 2002, 2003; Kirner
et al. 2005; Wenzel et al. 2009; Stattelmann and Martin 2010) proposes the calculation
of WCET estimates from a database of timed execution traces of code runs on the
target hardware. Furthermore, many analysis tool chains rely on the Implicit Path
Enumeration Technique (IPET) (Li and Malik 1995; Puschner and Schedl 1997; Li
and Malik 1997) to calculate a global WCET estimate from WCET estimates of
individual code constituents.
WCET estimates obtained by MBTA are generally not guaranteed upper WCET
bounds, and care must be taken when they are used to approximate execution times.
For example, the schedulability tests of many classical hard real time scheduling
algorithms admit the approximation of execution times by upper bounds, whereas the
use of mere estimates could result in false positives. WCET estimates are more useful
in various other scenarios, like: soft real time systems, mixed-criticality systems, load
balancing, the fast evaluation of design options, and others.
We present an IPET-based approach for calculating a WCET estimate from a given
database of timed execution traces (Sect. 3). Extending standard IPET, our method
can reuse flow facts from other flow analysis tools. First, however, we discuss related
work (Sect. 2). We conclude with an experimental evaluation of a proof-of-concept
implementation (Sect. 4). In Appendix 1 we present proofs for all theorems in the
main part of the paper.
2 Related work
WCET analysis is usually performed in two stages: A low-level stage where local esti-
mates for individual code constituents—individual instructions, instruction sequences,
basic blocks, etc.—are determined, and a high-level stage, where local estimates are
combined into a global estimate for the complete code.
Early methods for high-level analysis were based on abstract syntax trees (Puschner
and Koza 1989; Puschner and Schedl 1993; Puschner 1998; Shaw 1989; Park and Shaw
1991), where leaf nodes corresponded to elementary statements (e.g. assignments) and
inner nodes corresponded to structured control constructs for sequences, selects, and
iterations (Dijkstra 1970). Determining the local estimates for elementary statements
from the execution times of individual machine instructions was considered easy, so
low-level analysis was not considered further.
Later it became apparent that the jitter in the execution time of individual instruc-
tions that is caused by certain performance-enhancing hardware features, e.g., caches,
instruction pipelines, and branch predictors, must be considered in order to obtain
practically usable WCET estimates. Consequently, various solutions for the low-level
analysis of individual path fragments in the presence of caches, pipelines, and branch
predictors were proposed (Li and Malik 1999; Lundqvist and Stenström 1998, 1999;
Stappert and Altenbernd 2000; Theiling and Ferdinand 1998; Schneider and Ferdinand
1999; Ferdinand et al. 1999; Colin and Puaut 2000). In contrast to these
model-based approaches, measurement-based timing analysis (MBTA) proposed the
use of timed execution traces of individual code constituents obtained from runs of the code
on the target hardware (Bernat et al. 2002, 2003; Kirner et al. 2005; Wenzel et al.
2009; Stattelmann and Martin 2010).
At the high-level stage, the early tree-based approaches were
soon found too inflexible: Firstly, tree-based approaches work only for structured pro-
grams, but not for programs that contain unstructured jumps. Although unstructured
code is sometimes dismissed as bad programming discipline, there exist cases that
justify the use of unstructured jumps, e.g., state machines. Moreover, language fea-
tures like exceptions contain implicit unstructured jumps. Secondly, the tree-based
approaches did not support the exclusion of infeasible execution paths. Experimen-
tation with various solutions eventually led to the wide adoption of the Implicit Path
Enumeration Technique (IPET) (Li and Malik 1995; Puschner and Schedl 1997; Li
and Malik 1997) for the high-level stage. We recap IPET in Sect. 3.1.
IPET uses a fixed WCET estimate for each code constituent. In Sect. 3.2 we argue
that this is a major limitation that can lead to overly pessimistic estimates by combining
mutually exclusive execution scenarios. Researchers have been aware of this and have
proposed various remedies:
Li et al. analyse the control-flow between code mapping to shared cache lines. This
analysis affords linear constraints on the number of cache misses, modeled by extra
variables. They have shown their approach for direct-mapped (Li and Malik 1995,
1999) and set-associative caches (Li and Malik 1996). In comparison our approach is
not limited to modeling specific hardware, but relies on the observed system behavior.
Still our approach is capable of handling cache-induced jitter, as demonstrated by our
experimental evaluation (cf. Sect. 4).
Ottoson and Sjödin (1997) introduce separate variables for pairs of adjacent program
constituents to model pipelines. Our concept of clips (cf. Sect. 3.4) allows a more
general separation of contexts, subsuming Ottoson’s and Sjödin’s case.
Engblom and Ermedahl (1999) use instruction sequences. Since instruction
sequences are one of many possible choices for code constituents, our approach can
reuse this idea. Theiling and Ferdinand (1998) have proposed virtual inlining/virtual
Fig. 1 Visualization of the CFG from Example 1
unrolling (VIVU), allowing the separation of certain forms of execution scenarios:
Firstly, they can separate the first iteration of loops from subsequent ones. However,
our approach provides a more general way of choosing contexts via clips. Secondly,
they can distinguish procedure invocations that originate from different call sites.
Although our approach currently works only on individual procedures, it could be
extended to integrate the idea of call contexts.
3 Calculating WCET estimates
3.1 The implicit path-enumeration technique
The Implicit Path Enumeration Technique (IPET) (Li and Malik 1995, 1997; Puschner
and Schedl 1997) is a widely-used method for calculating an upper WCET bound for
a piece of code from upper WCET bounds of the code’s constituents.
Standard IPET is based on integer linear programming: The execution count of
each code constituent is expressed as a non-negative integer variable, and the possible
control flow between code constituents is overapproximated by linear constraints. The
cost of each code constituent is its local upper WCET bound, and the objective function
is the cost-weighted sum of execution counts of all code constituents. Maximizing the
objective function yields a WCET bound for the complete code. If estimates rather
than bounds are used as costs, the method yields a WCET estimate for the code.
Definition 1 (Control-flow graph) A control-flow graph G of a program P is a quadruple
(V, E, vstart, vend) consisting of a set of nodes V, a set of edges E ⊆ V × V,
a start node vstart ∈ V, and an end node vend ∈ V, such that all other nodes
are reachable from vstart, and vend is reachable from all other nodes, i.e., if E+
designates the transitive closure of E, {(vstart, v) | v ∈ V \ {vstart}} ⊆ E+ and
{(v, vend) | v ∈ V \ {vend}} ⊆ E+. Moreover, vstart is unreachable from other nodes
and no node is reachable from vend, i.e., {(v, vstart), (vend, v) | v ∈ V} ∩ E+ = ∅.
Example 1 Figure 1 illustrates the CFG G = (V, E, vstart, vend) with nodes
V = {vstart, vend, v1, v2, v3} and edges
E = {(vstart, v1), (v1, v2), (v1, v3), (v2, v3), (v3, v3), (v3, vend)}.
Each node in a CFG G = (V, E, vstart, vend) of a program P is identified with some
atomically executable code constituent. Different choices, like instructions, instruction
sequences, arbitrary code blocks, etc., afford a lot of flexibility. The start node vstart
and the end node vend mark the entry and exit points of the code and thus correspond
to empty code.
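The conditions of Definition 1 can be checked mechanically. The following is a minimal sketch, assuming the CFG is given as Python sets of node names and edge pairs; the helper names `reachable` and `is_valid_cfg` are illustrative and not part of the original method.

```python
def reachable(edges, src):
    """Return all nodes reachable from src via one or more edges (E+)."""
    seen, stack = set(), [src]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_valid_cfg(V, E, vstart, vend):
    """Check the reachability conditions of Definition 1."""
    if vstart not in V or vend not in V:
        return False
    if any(a not in V or b not in V for a, b in E):
        return False
    # All other nodes must be reachable from vstart.
    if not (V - {vstart}) <= reachable(E, vstart):
        return False
    # vend must be reachable from all other nodes.
    if any(vend not in reachable(E, v) for v in V - {vend}):
        return False
    # vstart must be unreachable from any node.
    if any(vstart in reachable(E, v) for v in V):
        return False
    # No node may be reachable from vend.
    return not reachable(E, vend)


# The CFG of Example 1.
V = {"vstart", "vend", "v1", "v2", "v3"}
E = {("vstart", "v1"), ("v1", "v2"), ("v1", "v3"),
     ("v2", "v3"), ("v3", "v3"), ("v3", "vend")}
print(is_valid_cfg(V, E, "vstart", "vend"))
```

The quadratic edge scan is chosen for brevity; an adjacency map would be preferable for larger graphs.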
Following the modeling presented in Li and Malik (1997), we use an integer node
variable fv for each node v ∈ V , designating the execution count of v during a single
run of P and an integer edge variable f(v,w) for each edge (v,w) ∈ E , designating
the number of control transfers from node v to node w. The solution space is chosen
to overapproximate all feasible runs: For any feasible run π = v1, . . . vn of P , the
valuation
{ fv → cv | v ∈ V } ∪ { fe → ce | e ∈ E}
must be a solution, where cv is the number of occurrences of node v in π , and where
ce is the number of occurrences of edge e in π , i.e.,
cv = |{i | vi = v, 1 ≤ i ≤ n}| and ce = |{i | (vi , vi+1) = e, 1 ≤ i < n}|.
Because numbers of occurrences are cardinals, non-negativity constraints are added
fv ≥ 0, for all v ∈ V and fe ≥ 0, for all e ∈ E .
To model a single run of program P , exactly one occurrence of the start node vstart
and exactly one occurrence of the end node vend is enforced:
fvstart = 1 and fvend = 1.
Just as in Li and Malik (1995, 1997) and Puschner and Schedl (1997), we consider
sequential programs, i.e., the execution of any node v ≠ vend must be followed by the execution
of exactly one of its immediate successor nodes w. Likewise, the execution of
any node v ≠ vstart must always follow the execution of exactly one of its immediate
predecessor nodes w. In IPET, this can be expressed by linear constraints called
structural constraints:
$$f_v = \sum_{(w,v) \in E} f_{(w,v)} \quad \text{and} \quad f_v = \sum_{(v,w) \in E} f_{(v,w)}, \quad \text{for all } v \in V.$$
To obtain a bounded problem, additional information about cyclic regions is needed.
The notion of a natural loop is sufficient for cycles in structured code: A node v is said
to dominate another node w, if every path from the start node vstart to w must pass
through v. A natural loop is defined by its back edge (w, v), which is an edge, such
that its target node v dominates its source node w. Constraints for natural loops are
typically inequalities that bound the flow through the back edge relative to the total
flow f(w1,v) + · · · + f(wn ,v) into the loop header v, i.e.,
f(w,v) ≤ b · ( f(w1,v) + · · · + f(wn ,v)),
where b is a positive integer constant, where {w1, . . . , wn, w} is the set of immediate
predecessors of node v, and where v dominates w.
The objective function of a standard IPET problem is
$$\sum_{v \in V} \widetilde{wcet}_v \cdot f_v,$$
where $\widetilde{wcet}_v$ is the local WCET estimate of node v.
Additional linear constraints can be obtained by program analysis or expert insight
into the code (Puschner and Schedl 1997).
Example 2 Reconsider the CFG from Example 1 and assume that the loop can reit-
erate at most 7 times after entry. Also assume WCET estimates of 50, 20, and 30
microseconds for nodes v1, v2, and v3. We obtain the following IPET problem:
– Non-negativity constraints:
fvstart ≥ 0; fv1 ≥ 0; fv2 ≥ 0; fv3 ≥ 0; fvend ≥ 0.
– Structural and single run constraints:
fvstart = 1; fvend = 1;
fvstart = f(vstart,v1);
fv1 = f(vstart,v1); fv1 = f(v1,v2) + f(v1,v3);
fv2 = f(v1,v2); fv2 = f(v2,v3);
fv3 = f(v1,v3) + f(v2,v3) + f(v3,v3); fv3 = f(v3,v3) + fvend;
fvend = f(v3,vend).
– Iteration constraint:
f(v3,v3) ≤ 7 · ( f(v1,v3) + f(v2,v3)).
– Objective function:
0 · fvstart + 50 · fv1 + 20 · fv2 + 30 · fv3 + 0 · fvend .
– We may add additional linear constraints, if available. For example, if we under-
stood that the loop is constrained to four iterations for each entry through edge
(v2, v3), we would add an extra constraint
f(v3,v3) ≤ 7 · f(v1,v3) + 3 · f(v2,v3).
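The IPET problem of Example 2 is small enough to be solved by exhaustive enumeration, which makes the effect of the extra constraint visible. The sketch below is an illustration only: it does not use an ILP solver, and the function name `ipet_wcet` and the search ranges are assumptions chosen to fit this particular example.

```python
from itertools import product

# Local WCET estimates from Example 2 (microseconds).
COST = {"vstart": 0, "v1": 50, "v2": 20, "v3": 30, "vend": 0}


def ipet_wcet(extra_constraint=False):
    """Maximize the IPET objective of Example 2 by brute force."""
    best = None
    # Free edge variables: f(v1,v2), f(v1,v3), f(v3,v3).
    # The ranges suffice here because a single run enters v1 once
    # and the back edge is bounded by 7.
    for f12, f13, f33 in product(range(2), range(2), range(8)):
        f = {"vstart": 1, "vend": 1}     # single run constraints
        f["v1"] = f["vstart"]            # inflow of v1
        if f12 + f13 != f["v1"]:         # outflow of v1
            continue
        f["v2"] = f12                    # inflow of v2
        f23 = f["v2"]                    # outflow of v2
        f["v3"] = f13 + f23 + f33        # inflow of v3
        if f["v3"] != f33 + f["vend"]:   # outflow of v3
            continue
        if f33 > 7 * (f13 + f23):        # iteration constraint
            continue
        if extra_constraint and f33 > 7 * f13 + 3 * f23:
            continue
        value = sum(COST[v] * f[v] for v in f)
        if best is None or value > best:
            best = value
    return best


print(ipet_wcet())                       # 310
print(ipet_wcet(extra_constraint=True))  # 290
```

Without the extra constraint the maximum is reached on the path through v2 with eight executions of v3; with it, the path through v2 is limited to four executions of v3, so the direct path from v1 to v3 becomes the worst case.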
3.2 Pessimism and monotonicity
IPET overapproximates the global WCET, assuming that local WCET behaviors in
two or more different program constituents may coincide, even where such a scenario
cannot be exhibited by the system, as illustrated by the example in Fig. 2. This conservative
approximation is sometimes called pessimism. Given a suitable metric over
Fig. 2 Example for pessimistic estimate calculation: consider a program fragment consisting of two consecutive
constituents A and B. Assume that there are only two execution scenarios. Constituent A exhibits
its WCET in the first scenario; constituent B in the second scenario. Standard IPET yields the sum of the local
WCET estimates as global WCET estimate. This is pessimistic, because a run in which both local WCETs
occur does not exist. The global WCET estimate is higher than the actual global WCET, which occurs in
Scenario 2
WCET estimates, we can quantify the pessimism of a given method as the distance of
its estimate(s) from the actual WCET(s).
Note that this extremely simple example serves just to illustrate the concept of
pessimism. The method proposed in this paper requires the existence of different
paths between nodes, as in Example 1.
The IPET method is monotonic in the local WCET estimates, i.e., the global WCET
estimate cannot decrease, unless some local WCET estimate decreases by a positive
value δ.
Definition 2 (Monotonic estimate calculation method) Let f be a function that takes
as arguments local WCET estimates $\widetilde{wcet}_1, \ldots, \widetilde{wcet}_n$ and yields a global WCET
estimate $\widetilde{wcet} = f(\widetilde{wcet}_1, \ldots, \widetilde{wcet}_n)$. We say that f denotes a monotonic estimate
calculation method, iff, for any $\delta_1, \ldots, \delta_n \geq 0$,
$$f(\widetilde{wcet}_1 + \delta_1, \ldots, \widetilde{wcet}_n + \delta_n) \geq f(\widetilde{wcet}_1, \ldots, \widetilde{wcet}_n).$$
3.3 Context-sensitive IPET
Context-sensitive IPET introduces execution scenarios to standard IPET, to reduce
overestimation. The method reuses all variables, constraints, and extra flow facts from
the respective standard IPET problem. For each node v ∈ V we model a set of
execution scenarios
$$E_{v,1}, \ldots, E_{v,n(v)}$$
with separate WCET estimates
$$\widetilde{wcet}_{v,1}, \ldots, \widetilde{wcet}_{v,n(v)}.$$
The execution counts of these scenarios are modeled by scenario variables
$$f_{v,1}, \ldots, f_{v,n(v)}.$$
The scenarios of node v ∈ V are refinements of the unspecific scenario of v in
standard IPET. Hence, the estimates $\widetilde{wcet}_{v,1}, \ldots, \widetilde{wcet}_{v,n(v)}$ do not exceed $\widetilde{wcet}_v$.
Requirement 1 (Execution scenario specialization) The WCET estimate under a specific
execution scenario of a node v does not exceed the unspecific WCET estimate
$\widetilde{wcet}_v$ of v, i.e.,
$$\widetilde{wcet}_{v,i} \leq \widetilde{wcet}_v, \quad \text{for all } 1 \leq i \leq n(v).$$
The execution scenarios Ev,1, . . . , Ev,n(v) must classify the executions of v, i.e.,
each execution of node v during a run of program P is associated with exactly one
execution scenario of v.
Requirement 2 (Execution scenario classification) For each node v ∈ V , the associ-
ated execution scenarios Ev,1, . . . Ev,n(v) form a classification of the occurrences of
v in any end-to-end path through the CFG.
Requirement 2 yields new constraints
$$f_v = \sum_{i=1}^{n(v)} f_{v,i}, \quad \text{for all } v \in V.$$
Because numbers of occurrences are cardinals, we add constraints
$$f_{v,i} \geq 0, \quad \text{for all } v \in V,\ 1 \leq i \leq n(v).$$
Our objective function is a refinement of the objective function of the corresponding
standard IPET problem, obtained by replacing each summand
$\widetilde{wcet}_v \cdot f_v$ by the weighted sum
$$\sum_{i=1}^{n(v)} \widetilde{wcet}_{v,i} \cdot f_{v,i},$$
i.e., the new objective function has the form
$$\sum_{v \in V} \sum_{i=1}^{n(v)} \widetilde{wcet}_{v,i} \cdot f_{v,i}.$$
Theorem 1 The global WCET estimate obtained from a context-sensitive IPET prob-
lem never exceeds the global WCET estimate obtained from the respective standard
IPET problem.
A proof of this and all further theorems can be found in Appendix 1 of this paper.
Having several different execution scenarios is only useful, if we can establish
additional constraints that involve these scenarios. We therefore add the following
requirement:
Requirement 3 (Constraints) The classification of trace occurrences of a node v ∈ V
into scenarios Ev,1, . . . Ev,n(v) admits additional constraints over the corresponding
scenario variables fv,1, . . . , fv,n(v).
Example 3 Reconsider the IPET problem from Example 2. Assume that the WCET
of node v3 varies depending on which of the three nodes v1, v2, and v3 was executed
just before node v3, e.g., due to the caching of accesses to instruction and data memory.
Assume that the WCET estimates of node v3 right after executing nodes v1, v2,
and v3 are 30, 10, and 5 microseconds. We introduce different execution scenarios
Ev3,1, Ev3,2, and Ev3,3 for node v3 and add the following constraints to the original
IPET problem:
fv3 = fv3,1 + fv3,2 + fv3,3.
The objective function of our context-sensitive IPET problem is:
0 · fvstart + 50 · fv1 + 20 · fv2 + 30 · fv3,1 + 10 · fv3,2 + 5 · fv3,3 + 0 · fvend .
So far, the context-sensitive IPET problem does not improve on the standard IPET
problem. However, the execution scenario of node v3 depends on which out of nodes
v1, v2, or v3 was executed immediately before, so we add the following constraints:
fv3,1 = f(v1,v3); fv3,2 = f(v2,v3); fv3,3 = f(v3,v3).
The total number of occurrences of edges (v1, v3) and (v2, v3) in a single run must
be 1, due to the structural constraints of the IPET problem. These constraints tighten
the WCET estimate by enforcing a closer upper bound on fv3,3.
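The effect described in Example 3 can be reproduced numerically. The sketch below again uses exhaustive enumeration instead of an ILP solver; the function name `context_sensitive_wcet` and the flag `link_scenarios` are illustrative assumptions. With the flag off, only the classification constraint fv3 = fv3,1 + fv3,2 + fv3,3 is imposed; with the flag on, the scenario variables are tied to the incoming edges of v3.

```python
from itertools import product


def context_sensitive_wcet(link_scenarios):
    """Maximize the context-sensitive objective of Example 3 by brute force."""
    best = 0
    for f12, f33 in product(range(2), range(8)):
        f13 = 1 - f12              # outflow of v1 equals 1 in a single run
        f23 = f12                  # all flow through v2 continues to v3
        fv3 = f13 + f23 + f33      # inflow of v3
        if f33 > 7 * (f13 + f23):  # iteration constraint of Example 2
            continue
        if link_scenarios:
            # fv3,1 = f(v1,v3); fv3,2 = f(v2,v3); fv3,3 = f(v3,v3)
            splits = [(f13, f23, f33)]
        else:
            # Any non-negative split with fv3,1 + fv3,2 + fv3,3 = fv3
            splits = [(a, b, fv3 - a - b)
                      for a in range(fv3 + 1)
                      for b in range(fv3 + 1 - a)]
        for s1, s2, s3 in splits:
            value = 50 * 1 + 20 * f12 + 30 * s1 + 10 * s2 + 5 * s3
            best = max(best, value)
    return best


print(context_sensitive_wcet(link_scenarios=False))  # 310
print(context_sensitive_wcet(link_scenarios=True))   # 115
```

Without the linking constraints the result equals the standard IPET estimate of Example 2, because all executions of v3 can be assigned to the most expensive scenario. With them, at most one execution of v3 falls into the 30 or 10 microsecond scenarios and the remaining seven cost 5 microseconds each.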
Concrete methods with different properties can be created by instantiating context-
sensitive IPET, which is a generic method. In the following sections we develop one
such instantiation:
– In Sect. 3.4, we introduce the notion of a context of a node, to instantiate the notion
of an execution scenario, and two associated refinement operators.
– In Sect. 3.5, we show how to infer context-specific linear constraints over the
number of times the given node may appear on any end-to-end CFG paths, such
that Requirement 3 is fulfilled.
– In Sect. 3.6, we show how to infer WCET estimates for individual contexts from
a database of timed execution traces, such that Requirement 1 is fulfilled.
– In Sect. 3.7, we present an algorithm for obtaining contexts for a given node, such
that Requirements 2 and 3 are fulfilled.
– In Sect. 3.8, we put everything together, instantiating context-sensitive IPET.
Fig. 3 Visualization of the clip S0 from Example 4. The single entry edge of S0 is dashed, and its two
exit edges are dotted. There are two paths in S0
3.4 Contexts
We consider the CFG GP = (V, E, vstart , vend) of some fixed program P . We write
U to denote the set of end-to-end paths of GP , i.e.,
U = {v1 . . . vn | v1 = vstart , vn = vend , (vi , vi+1) ∈ E, 1 ≤ i < n, n ∈ N} .
In the following, R∗ denotes the reflexive, transitive closure of any binary relation
R. Hence, E∗ denotes the reachability relation in GP , and (E \ K )∗ with K ⊆ E
denotes reachability via paths that do not pass through any of the edges in the set K .
We first introduce the notion of a clip, which is a specification of a set of CFG
paths leading from a specific set of entry edges to a specific set of exit edges. It allows
us to describe sets of paths with similar control flows.
Definition 3 (Clip) A clip S is a pair ⟨A, B⟩ consisting of a set A ⊆ E of edges
called entry edges, and a set B ⊆ E of edges called exit edges.
Definition 4 (Paths in a clip) The set paths(S) of paths in a clip S = ⟨A, B⟩ is the
set of all CFG paths that start with some entry edge in A, that end with some exit edge
in B, and that do not contain any further entry or exit edges, i.e.,
paths(S) = {v1 . . . vn | (v1, v2) ∈ A, (vn−1, vn) ∈ B,
(vi , vi+1) ∈ E \ (A ∪ B), 1 < i < n − 1, n ≥ 3}.
Note that by this definition each path in a clip has at least two edges.
Example 4 Reconsider the CFG from Example 1. Consider the clip
S0 = ⟨{(v1, v2)}, {(v3, v3), (v3, vend)}⟩.
Figure 3 illustrates this clip. The set of paths in S0 is
paths(S0) = {v1v2v3v3, v1v2v3vend}.
Note that paths(S0) does not contain any longer paths, because the back edge of the
loop is an exit edge.
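Definition 4 translates directly into a depth-first enumeration. The following is a minimal sketch under stated assumptions: paths are represented as tuples of node names, the function name `clip_paths` is illustrative, and the bound `max_nodes` is an addition of this sketch that keeps the enumeration finite for clips whose inner edges contain a cycle.

```python
def clip_paths(E, A, B, max_nodes=6):
    """Enumerate paths(S) of the clip S = <A, B>, up to max_nodes nodes."""
    result = set()

    def extend(path):
        if len(path) >= max_nodes:
            return
        u = path[-1]
        for a, b in E:
            if a != u:
                continue
            if (a, b) in B:
                # An exit edge terminates the path.
                result.add(path + (b,))
            elif (a, b) not in A:
                # Inner edges must lie in E \ (A ∪ B).
                extend(path + (b,))

    for a, b in A:
        # Every path starts with an entry edge.
        extend((a, b))
    return result


# The CFG of Example 1 and the clip S0 of Example 4.
E = {("vstart", "v1"), ("v1", "v2"), ("v1", "v3"),
     ("v2", "v3"), ("v3", "v3"), ("v3", "vend")}
S0_entries = {("v1", "v2")}
S0_exits = {("v3", "v3"), ("v3", "vend")}
for p in sorted(clip_paths(E, S0_entries, S0_exits)):
    print(" ".join(p))
```

For S0 the enumeration yields exactly the two paths v1v2v3v3 and v1v2v3vend, independent of the bound, because the back edge is an exit edge.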
A context of a node v ∈ V is a clip S such that v has at most one inner occurrence
in any path of S. A context enables us to pinpoint a particular occurrence of a given
node. Later we will use contexts to represent individual execution scenarios.
Fig. 4 Visualization of the clips S1 and S2 from Example 5. Entry edges are dashed, exit edges are dotted,
and edges that are both entries and exits are dotdashed. The clip S1 is a context of node v3, because each of
its paths contains only a single inner occurrence of v3. For clip S2, the back edge (v3, v3) is neither an entry
nor an exit edge and may therefore occur arbitrarily often in the paths of S2. For example, v2v3v3vend is
a path in S2, hence S2 is not a context
By inner occurrence we mean any occurrence except at the first or last node. This is
a rather technical condition that is of no conceptual interest, but it eliminates the
unwanted special case of v sitting at the beginning of an entry edge or at the end of
an exit edge.
Definition 5 (Context) A context C of a node v ∈ V is a clip, such that any path
v1 . . . vn in C contains at most one inner occurrence of v, i.e.,
v1 . . . vn ∈ paths(C), vi = vj = v, 1 < i < n, 1 < j < n ⇒ i = j.
Example 5 Reconsider the CFG from Example 1. The clip S1 = ⟨A1, B1⟩ with
A1 = {(v3, v3)}; B1 = {(v3, v3), (v3, vend)}
is a context of node v3, because none of its paths paths(S1) = {v3v3v3, v3v3vend}
contains more than one inner occurrence of v3. The clip S2 = ⟨A2, B2⟩ with
A2 = {(v2, v3)}; B2 = {(v3, vend)}
is not a context, because the path v2v3v3vend ∈ paths(S2) has two inner occurrences
of v3. Figure 4 illustrates this example.
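Whether a clip is a context can be decided without enumerating its paths. The sketch below rests on a reachability reformulation that is our reading of Definition 5, not a procedure given in the text: a path of ⟨A, B⟩ has two inner occurrences of v exactly if v is reachable from the target of some entry edge via inner edges, v lies on a cycle of inner edges, and the source of some exit edge is reachable from v via inner edges. The names `reach` and `is_context` are illustrative.

```python
def reach(edges, sources):
    """Nodes reachable from `sources` via zero or more of the given edges."""
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_context(E, A, B, v):
    """Check whether the clip <A, B> is a context of node v."""
    F = E - (A | B)                       # inner edges
    entry_targets = {b for _, b in A}
    exit_sources = {a for a, _ in B}
    if v not in reach(F, entry_targets):
        return True                       # v never occurs as an inner node
    successors = {b for a, b in F if a == v}
    on_inner_cycle = v in reach(F, successors)
    can_exit = bool(reach(F, {v}) & exit_sources)
    return not (on_inner_cycle and can_exit)


# The CFG of Example 1 and the clips S1 and S2 of Example 5.
E = {("vstart", "v1"), ("v1", "v2"), ("v1", "v3"),
     ("v2", "v3"), ("v3", "v3"), ("v3", "vend")}
S1 = ({("v3", "v3")}, {("v3", "v3"), ("v3", "vend")})
S2 = ({("v2", "v3")}, {("v3", "vend")})
print(is_context(E, *S1, "v3"), is_context(E, *S2, "v3"))
```

For S2 the back edge (v3, v3) is an inner edge, so v3 lies on an inner cycle and the check fails, in line with the path v2v3v3vend named in the example.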
The following operators allow us to measure the length of paths and to concatenate
paths and edges:
Definition 6 (Simple path operators) The length |v1 . . . vn| of a path v1 . . . vn con-
sisting of nodes vi ∈ V for 1 ≤ i ≤ n is defined as
|v1 . . . vn| = n.
The concatenation v1 . . . vn ◦w1 . . . wm of a path v1 . . . vn consisting of nodes vi ∈ V
for 1 ≤ i ≤ n with a path w1 . . . wm consisting of nodes wi ∈ V for 1 ≤ i ≤ m is
defined as
v1 . . . vn ◦ w1 . . . wm = v1 . . . vnw1 . . . wm .
The concatenation v1 . . . vn ◦ (x, y) of a path v1 . . . vn consisting of nodes vi ∈ V for
1 ≤ i ≤ n with an edge (x, y) ∈ E is defined as
v1 . . . vn ◦ (x, y) = v1 . . . vnxy.
The concatenation (x, y) ◦ v1 . . . vn of an edge (x, y) ∈ E with a path v1 . . . vn
consisting of nodes vi ∈ V for 1 ≤ i ≤ n is defined as
(x, y) ◦ v1 . . . vn = xyv1 . . . vn .
We use contexts as execution scenarios. Requirement 2 asks for a classification
scheme, i.e., all situations must be covered and scenarios must not overlap. Hence, we
define appropriate notions of coverage and disjointness for contexts: Flow coverage
catches the idea of capturing all possible control flows that a given node can be involved
in; divergence captures the idea of contexts representing disjoint scenarios.
Definition 7 (Flow coverage) A set of paths X covers a node v ∈ V, iff, for all paths
ρ, σ with ρ ◦ v ◦ σ ∈ U (recall that U is the set of all end-to-end paths in the CFG), there
are subpaths ρ1, σ2 and non-empty subpaths ρ2, σ1, with ρ1 ◦ ρ2 = ρ and σ1 ◦ σ2 = σ,
such that ρ2 ◦ v ◦ σ1 ∈ X.
Informally speaking, this means that every occurrence of the node v in some end-
to-end path is located inside a subpath that is in X .
Example 6 Reconsider the CFG from Example 1. We have
U = {vstartv1πvend, vstartv1v2πvend | π ∈ {v3}+}.
Consider the two clips (cf. Fig. 5)
S3 = ⟨{(v1, v2), (v3, v3)}, {(v3, v3), (v3, vend)}⟩, and
S4 = ⟨{(v1, v3), (v1, v2), (v3, v3)}, {(v3, vend)}⟩.
The paths of these clips are
paths(S3) = {v1v2v3v3, v1v2v3vend, v3v3v3, v3v3vend}, and
paths(S4) = {v1πvend, v1v2πvend | π ∈ {v3}+}.
The paths of clip S4 cover node v3, i.e., every occurrence of v3 in some end-to-end
path is located inside a subpath that is an element of paths(S4). The paths of S3 do
not cover v3, e.g., none of the paths in paths(S3) is a subpath of vstartv1v3vend ∈ U .
Definition 8 (Divergent paths) Two paths π and σ are divergent, iff they neither
overlap on more than a single edge, nor is one a subpath of the other, i.e., none of the
following conditions apply:
Fig. 5 Visualization of the clips S3 and S4 from Example 6. The paths of S4 cover node v3, but the paths
of S3 do not
– there exist paths α, β, γ with α ◦ β = π , β ◦ γ = σ and |β| ≥ 2;
– there exist paths α, β, γ with α ◦ β = σ , β ◦ γ = π and |β| ≥ 2;
– σ is a subpath of π ;
– π is a subpath of σ .
Theorem 2 (Divergence of paths in clip) Let S = ⟨A, B⟩ be a clip with π, ρ ∈
paths(S) and π ≠ ρ. Then π and ρ are divergent.
Definition 9 (Divergent sets) Two sets of paths X and Y are divergent, iff the paths
π and σ are divergent, for any choice of π ∈ X and σ ∈ Y .
Example 7 Reconsider clip
S4 = ⟨{(v1, v3), (v1, v2), (v3, v3)}, {(v3, vend)}⟩
from Example 6. The paths of clip S4 and the paths of clip
S5 = ⟨{(v2, v3)}, {(v3, vend)}⟩
are not divergent, because the path v2v3vend ∈ paths(S5) is a subpath of the path
v1v2v3vend ∈ paths(S4). On the other hand, the paths of clip S4 and clip
S6 = ⟨{(v1, v3), (v1, v2), (v3, v3)}, {(v3, v3)}⟩
are divergent.
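The four conditions of Definition 8 can be tested directly on pairs of paths. The following sketch assumes that paths are tuples of node names and that |β| counts nodes, as in Definition 6; the function names `is_subpath`, `overlaps`, and `divergent` are illustrative.

```python
def is_subpath(p, q):
    """True if p occurs as a contiguous run of nodes inside q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def overlaps(p, q):
    """True if a suffix of p with at least two nodes is a prefix of q.

    This corresponds to the existence of paths α, β, γ with
    α ◦ β = p, β ◦ γ = q and |β| ≥ 2.
    """
    return any(p[-k:] == q[:k] for k in range(2, min(len(p), len(q)) + 1))


def divergent(p, q):
    """Two paths are divergent iff none of the four conditions applies."""
    return not (overlaps(p, q) or overlaps(q, p)
                or is_subpath(p, q) or is_subpath(q, p))


# Paths named in Example 7: v2v3vend is a subpath of v1v2v3vend.
print(divergent(("v2", "v3", "vend"), ("v1", "v2", "v3", "vend")))
# Two distinct paths of the clip S0 from Example 4 (cf. Theorem 2).
print(divergent(("v1", "v2", "v3", "v3"), ("v1", "v2", "v3", "vend")))
```

Divergence of two sets of paths (Definition 9) then amounts to applying `divergent` to every pair drawn from the two sets.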
A simple-history context is a context that can easily be constructed from a CFG,
fulfills flow coverage and divergence, and can serve as a starting point for subsequent
refinement through context splitting operators.
Definition 10 (Simple history clip) The simple-history clip of a node v ∈ V \
{vstart , vend} is the clip
S = ⟨(Q ∪ B) ∩ R, B⟩,
where Q is the set of all edges (vstart , q) ∈ E that start with the start node vstart ,
where B is the set of all outgoing edges (v, b) ∈ E of v, and where R is the set of all
edges (w, r) ∈ E , such that there is a path from r to v, i.e.,
Q = {(vstart , q) ∈ E}; B = {(v, b) ∈ E}; R = {(w, r) ∈ E | (r, v) ∈ E∗}.
Fig. 6 Visualization of the simple history context S of node v3 from Example 8. The paths of S cover v3,
i.e., every occurrence of node v3 in some path from vstart to vend is located inside a path of the context S
Theorem 3 (Simple history context) For any node v ∈ V \ {vstart , vend}, the simple
history clip S = ⟨(Q ∪ B) ∩ R, B⟩ with
Q = {(vstart , q) ∈ E}; B = {(v, b) ∈ E}; R = {(w, r) ∈ E | (r, v) ∈ E∗}
is a context, and paths(S) covers v.
Example 8 Reconsider the CFG from Example 1. The simple-history context of node
v3 is the clip (cf. Fig. 6)
S = ⟨{(vstart, v1), (v3, v3)}, {(v3, v3), (v3, vend)}⟩.
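The construction of Definition 10 needs only the edge set and a backward reachability computation. The sketch below is one possible reading; the function names `can_reach` and `simple_history_clip` are illustrative, and clips are returned as pairs of Python sets.

```python
def can_reach(E, v):
    """Nodes r with (r, v) ∈ E*, i.e., v is reachable from r (reflexive)."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for a, b in E:
            if b == u and a not in seen:   # follow edges backwards
                seen.add(a)
                stack.append(a)
    return seen


def simple_history_clip(E, vstart, v):
    """Build the simple-history clip <(Q ∪ B) ∩ R, B> of node v."""
    Q = {(a, b) for a, b in E if a == vstart}   # edges leaving vstart
    B = {(a, b) for a, b in E if a == v}        # outgoing edges of v
    sources = can_reach(E, v)
    R = {(w, r) for w, r in E if r in sources}  # edges leading towards v
    return (Q | B) & R, B


# The CFG of Example 1.
E = {("vstart", "v1"), ("v1", "v2"), ("v1", "v3"),
     ("v2", "v3"), ("v3", "v3"), ("v3", "vend")}
entries, exits = simple_history_clip(E, "vstart", "v3")
print(sorted(entries), sorted(exits))
```

For node v3 this reproduces the clip of Example 8: the exit edge (v3, vend) is dropped from the entry set because v3 is not reachable from vend.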
Contexts can be refined by splitting them into multiple separate “subcontexts”. We
present two fundamental splitting operators:
– Vertical context splitting affords the separation of a chosen set of subpaths of a
given context.
– Horizontal context splitting affords the separation of a chosen subset of paths of a
given context.
Recursive application of these two splitting operators on an initial covering
context, like the simple history context, allows us to obtain a set of fine-grained
contexts for any given node v ∈ V \ {vstart, vend}. Importantly, the splitting operators
must be designed to preserve coverage. Moreover, the resulting contexts must be divergent,
to allow the clean separation of scenarios demanded by Requirement 2. In the
following we present our operators and demonstrate that they fulfill these properties.
Definition 11 (Vertical context splitting) Let C = ⟨A, B⟩ be a context of node v ∈ V,
and let F be the set of all edges that are neither entry nor exit edges of C, i.e., F =
E \ (A ∪ B). Choose a set X of edges with
X ⊆ {(x1, x2) | (a, u) ∈ A, (u, x1) ∈ F∗, (x1, x2) ∈ F, (x2, w) ∈ F∗, (w, b) ∈ B}.
Vertical context splitting of C along X yields the two clips
C1 = ⟨A, (B ∪ X) ∩ Y⟩, and C2 = ⟨X, (B ∪ X) ∩ Z⟩, where
Y = {(u, w) | (x, y) ∈ A, (y, u) ∈ (F \ X)∗, (u, w) ∈ E}, and
Z = {(u, w) | (x, y) ∈ X, (y, u) ∈ (F \ X)∗, (u, w) ∈ E}.
The following theorem establishes that the resulting clips are contexts:
Theorem 4 (Vertical context splitting) Let C1 and C2 be the clips produced by splitting
a context C = ⟨A, B⟩ of some node v ∈ V vertically along a set of edges X, i.e.,
C1 = ⟨A, (B ∪ X) ∩ Y⟩, and C2 = ⟨X, (B ∪ X) ∩ Z⟩, where
Y = {(u, w) | (x, y) ∈ A, (y, u) ∈ (F \ X)∗, (u, w) ∈ E},
Z = {(u, w) | (x, y) ∈ X, (y, u) ∈ (F \ X)∗, (u, w) ∈ E}, and
F = E \ (A ∪ B).
Then all of the following assertions hold:
1. C1 = ⟨A, (B ∪ X) ∩ Y⟩ and C2 = ⟨X, (B ∪ X) ∩ Z⟩ are contexts of v;
2. if paths(C) covers v, then paths(C1) ∪ paths(C2) covers v;
3. paths(C1) and paths(C2) are divergent.
Example 9 Reconsider the simple-history context
Cv3 = S = ⟨{(vstart, v1), (v3, v3)}, {(v3, v3), (v3, vend)}⟩
of node v3 from Example 8. There is a path from node v1 ∈ A to node v1 that contains
only edges in E \ {(vstart , v1), (v3, v3), (v3, vend)}, and there is a path from node v2
to node v3 that contains only edges from E \ {(vstart , v1), (v3, v3), (v3, vend)}, so we
may choose
X = {(v1, v2)} .
Hence, we have
E \ (A ∪ B ∪ X) = {(v1, v3), (v2, v3)} ;
Y = {(v1, v2), (v1, v3), (v3, v3), (v3, vend)} ;
Z = {(v2, v3), (v3, v3), (v3, vend)} .
Therefore, we obtain subcontexts (cf. Fig. 7)
Cv3,1.0 = ⟨A, (B ∪ X) ∩ Y⟩
= ⟨{(vstart, v1), (v3, v3)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,2.0 = ⟨X, (B ∪ X) ∩ Z⟩ = ⟨{(v1, v2)}, {(v3, v3), (v3, vend)}⟩.
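The vertical split of Definition 11 can be replayed on this example with a short Python sketch. It is illustrative only: the edge set E is an assumption inferred from the examples, the function names are hypothetical, and the code does not check that the chosen X satisfies the admissibility condition of Definition 11.

```python
# Edge set of the running example CFG (assumed; inferred from the examples).
E = {("vstart", "v1"), ("v1", "v2"), ("v1", "v3"),
     ("v2", "v3"), ("v3", "v3"), ("v3", "vend")}


def reachable(sources, edges):
    """Nodes reachable from `sources` using only `edges` (zero or more steps)."""
    seen, stack = set(sources), list(sources)
    while stack:
        node = stack.pop()
        for (u, w) in edges:
            if u == node and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def vertical_split(edges, A, B, X):
    """Split context <A, B> along X into <A, (B|X)&Y> and <X, (B|X)&Z>."""
    F = edges - (A | B)   # edges that are neither entry nor exit edges
    inner = F - X         # edges that may be crossed freely after the split

    def frontier(entry):
        nodes = reachable({y for (_, y) in entry}, inner)
        return {(u, w) for (u, w) in edges if u in nodes}

    Y, Z = frontier(A), frontier(X)
    return (A, (B | X) & Y), (X, (B | X) & Z)


A = {("vstart", "v1"), ("v3", "v3")}
B = {("v3", "v3"), ("v3", "vend")}
X = {("v1", "v2")}
C1, C2 = vertical_split(E, A, B, X)
```

With the sets from Example 9, C1 and C2 coincide with the two subcontexts listed there.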
Definition 12 (Horizontal context splitting) Let C = ⟨A, B⟩ be a context of node
v ∈ V , and let 𝒟 be a partition of A. For each set D ∈ 𝒟, let ZD be the set of all edges
(u, w) ∈ E that are reachable from some edge in D without crossing any entry or exit
edge, i.e., there exists some edge (d, x) ∈ D with a (possibly empty) path from node
x to node u that contains only edges in E \ (A ∪ B).
Horizontal context splitting of C along 𝒟 yields the set of contexts {CD | D ∈ 𝒟}
of v, where
CD = ⟨D, B ∩ ZD⟩.
Fig. 7 Visualization of vertical context splitting from Example 9. The thick edge is the single split edge
(v1, v2)
The fact that CD is a context of v, for any D ∈ 𝒟, is shown in the following theorem:
Theorem 5 (Horizontal context splitting) Let C = ⟨A, B⟩ be a context of node v ∈ V ,
and let 𝒟 be a partition of A. For each set D ∈ 𝒟, let ZD be the set of all edges
(u, w) ∈ E, such that there exists some edge (d, x) ∈ D with a (possibly empty) path
from node x to node u that contains only edges in E \ (A ∪ B). Then CD = ⟨D, B ∩ ZD⟩
is a context of v. Moreover, the sets paths(CD1) and paths(CD2) are divergent for
any sets D1, D2 ∈ 𝒟 with D1 ≠ D2. Furthermore, if W ∪ paths(C) covers node v,
for any set of paths W, then W ∪ ⋃D∈𝒟 paths(CD) covers node v.
Example 10 Reconsider context
Cv3,1.0 = ⟨{(vstart, v1), (v3, v3)}, {(v1, v2), (v3, v3), (v3, vend)}⟩
from Example 9. Choose the following partition of A:
𝒟 = {{(vstart, v1)}, {(v3, v3)}} .
We obtain the two new contexts of v3 (cf. Fig. 8):
Cv3,1.1 = ⟨{(vstart, v1)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,1.2 = ⟨{(v3, v3)}, {(v3, v3), (v3, vend)}⟩.
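Horizontal splitting as in Definition 12 can be sketched in the same style. Again, this is only an illustration under assumptions: the edge set E is inferred from the examples, and `horizontal_split` is a hypothetical helper name.

```python
# Edge set of the running example CFG (assumed; inferred from the examples).
E = {("vstart", "v1"), ("v1", "v2"), ("v1", "v3"),
     ("v2", "v3"), ("v3", "v3"), ("v3", "vend")}


def reachable(sources, edges):
    """Nodes reachable from `sources` using only `edges` (zero or more steps)."""
    seen, stack = set(sources), list(sources)
    while stack:
        node = stack.pop()
        for (u, w) in edges:
            if u == node and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def horizontal_split(edges, A, B, partition):
    """Return one context <D, B & Z_D> per block D of the partition of A."""
    inner = edges - (A | B)  # edges that are neither entry nor exit edges
    contexts = []
    for D in partition:
        nodes = reachable({x for (_, x) in D}, inner)
        Z_D = {(u, w) for (u, w) in edges if u in nodes}
        contexts.append((D, B & Z_D))
    return contexts


A = {("vstart", "v1"), ("v3", "v3")}
B = {("v1", "v2"), ("v3", "v3"), ("v3", "vend")}
contexts = horizontal_split(E, A, B, [{("vstart", "v1")}, {("v3", "v3")}])
```

Applied to the context and partition of Example 10, the two returned pairs match Cv3,1.1 and Cv3,1.2.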
Fig. 8 Visualization of horizontal context splitting from Example 10
3.5 Context constraints
We first formalise the notion of occurrence. We start with rather straightforward def-
initions for occurrences of nodes and edges in a path.
Definition 13 (Occurrences of a node) The set occ(v, π) of occurrences of a node
v ∈ V in a path π is defined as
occ(v, π) = {(ρ, σ ) | ρ ◦ v ◦ σ = π} .
Example 11 Consider path π = vstartv1v3v3v3vend . We have
occ(v3, π) = {(vstartv1, v3v3vend), (vstartv1v3, v3vend), (vstartv1v3v3, vend)} .
Theorem 6 Let v ∈ V be a node, and let X be a set of paths. Then
\[ \left| \bigcup_{\pi \in X} \mathrm{occ}(v, \pi) \right| = \sum_{\pi \in X} \left| \mathrm{occ}(v, \pi) \right|. \]
Definition 14 (Occurrences of an edge) The set occ(e, π) of occurrences of an edge
e ∈ E in a path π is defined as
occ(e, π) = {(ρ, σ ) | ρ ◦ e ◦ σ = π} .
Example 12 Reconsider path π = vstartv1v3v3v3vend from Example 11. We have
occ((v3, v3), π) = {(vstartv1, v3vend), (vstartv1v3, vend)}.
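Definitions 13 and 14 translate directly into code if a path is represented as a tuple of node names. The following Python sketch is illustrative; the function names `occ_node` and `occ_edge` and the tuple representation are assumptions made for this example.

```python
def occ_node(v, path):
    """All splits (rho, sigma) with rho . v . sigma == path."""
    return {(path[:i], path[i + 1:])
            for i, node in enumerate(path) if node == v}


def occ_edge(e, path):
    """All splits (rho, sigma) with rho . u w . sigma == path, for e = (u, w)."""
    u, w = e
    return {(path[:i], path[i + 2:])
            for i in range(len(path) - 1)
            if path[i] == u and path[i + 1] == w}


pi = ("vstart", "v1", "v3", "v3", "v3", "vend")
node_occurrences = occ_node("v3", pi)
edge_occurrences = occ_edge(("v3", "v3"), pi)
```

For the path of Example 11, `node_occurrences` contains the three splits listed there, and `edge_occurrences` contains the two splits of Example 12.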
Theorem 7 Let e ∈ E be an edge, and let X be a set of paths. Then
\[ \left| \bigcup_{\pi \in X} \mathrm{occ}(e, \pi) \right| = \sum_{\pi \in X} \left| \mathrm{occ}(e, \pi) \right|. \]
The notion of a covered occurrence of a node considers inner occurrences in some
path of a given clip. Recall that our notion of an inner occurrence excludes the border
nodes of a path.
Definition 15 (Covered occurrence of a node) Let S be a clip. The set occ(v, π,S)
of S-covered occurrences of a node v ∈ V in a path π is defined as
occ(v, π, S) = {(ρ1, ρ2, σ1, σ2) | ρ1 ◦ ρ2 ◦ v ◦ σ1 ◦ σ2 = π,
ρ2 ≠ ε, σ1 ≠ ε, ρ2 ◦ v ◦ σ1 ∈ paths(S)} .
Example 13 Reconsider the contexts
Cv3,1.1 = ⟨{(vstart, v1)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,1.2 = ⟨{(v3, v3)}, {(v3, v3), (v3, vend)}⟩
from Example 10 and the path π = vstartv1v3v3v3vend from Example 12. We have
occ(v3, π, Cv3,1.1) = {(ε, vstartv1v2, v3, v3vend)} ;
occ(v3, π, Cv3,1.2) = {(vstartv1v2, v3, v3, vend), (vstartv1v2v3, v3, vend , ε)} .
Theorem 8 Let v ∈ V be a node, let X be a set of paths, and let S be a clip. Then
\[ \left| \bigcup_{\pi \in X} \mathrm{occ}(v, \pi, S) \right| = \sum_{\pi \in X} \left| \mathrm{occ}(v, \pi, S) \right|. \]
The occurrences of a node are related to the occurrences of its contexts via linear
constraints.
Theorem 9 (Relating nodes to context) Let C1, . . . , Cn be pairwise divergent contexts
of some node v ∈ V , such that ⋃1≤i≤n paths(Ci ) covers v. Then the following
constraint holds:
\[ |\mathrm{occ}(v, \pi)| = \sum_{1 \le i \le n} |\mathrm{occ}(v, \pi, C_i)|, \quad \text{for all } \pi \in U. \]
Lastly, the occurrences of a context are related to the occurrence of its entry and
exit edges, again via linear constraints.
Theorem 10 (Relating contexts to entries and exits) Let C = ⟨A, B⟩ be a context of
some node v ∈ V \ {vstart , vend} that is neither the start node nor the end node.
Let X be the set of all edges (x, z) ∈ E, such that there exists an edge (a, w) ∈ A
with a path from node w to node x that contains only edges in E \ (A ∪ B), such that
there exists an edge (u, v) ∈ E \ (A ∪ B) with a path from node x to node u that
contains only edges in E \ (A ∪ B), and such that there exists no path from node x to node
v that contains node z and only edges in E \ (A ∪ B).
Let Y be the set of all edges (z, y) ∈ E, such that there exists an edge (w, b) ∈ B
with a path from node y to node w that contains only edges in E \ (A ∪ B), such that
there exists an edge (v, u) ∈ E \ (A ∪ B) with a path from node u to node y that
contains only edges in E \ (A ∪ B), and such that there exists no path from node v to node
y that contains node z and only edges in E \ (A ∪ B).
Then the following constraints hold for every path π ∈ paths(C):
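\[ |\mathrm{occ}(v, \pi, C)| \le \sum_{a \in A} |\mathrm{occ}(a, \pi)| - \sum_{x \in X} |\mathrm{occ}(x, \pi)|; \]
\[ |\mathrm{occ}(v, \pi, C)| \le \sum_{b \in B} |\mathrm{occ}(b, \pi)| - \sum_{y \in Y} |\mathrm{occ}(y, \pi)|. \]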
Example 14 Reconsider contexts
Cv3,1.0 = ⟨{(vstart, v1), (v3, v3)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,2.0 = ⟨{(v1, v2)}, {(v3, v3), (v3, vend)}⟩;
Cv3,1.1 = ⟨{(vstart, v1)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,1.2 = ⟨{(v3, v3)}, {(v3, v3), (v3, vend)}⟩
of node v3 from Examples 9 and 10.
For context Cv3,1.0 we have X = {(v1, v2)} and Y = ∅, yielding
|occ(v3, π, Cv3,1.0)| ≤ |occ((vstart , v1), π)| + |occ((v3, v3), π)|
−|occ((v1, v2), π)|;
|occ(v3, π, Cv3,1.0)| ≤ |occ((v3, v3), π)| + |occ((v3, vend), π)|.
For context Cv3,2.0, we have X = ∅ and Y = ∅, yielding
|occ(v3, π, Cv3,2.0)| ≤ |occ((v1, v2), π)|;
|occ(v3, π, Cv3,2.0)| ≤ |occ((v3, v3), π)| + |occ((v3, vend), π)|.
For context Cv3,1.1 we have X = {(v1, v2)} and Y = ∅, yielding
|occ(v3, π, Cv3,1.1)| ≤ |occ((vstart , v1), π)| − |occ((v1, v2), π)|;
|occ(v3, π, Cv3,1.1)| ≤ |occ((v3, v3), π)| + |occ((v3, vend), π)|.
For context Cv3,1.2 we have X = ∅ and Y = ∅, yielding
|occ(v3, π, Cv3,1.2)| ≤ |occ((v3, v3), π)|;
|occ(v3, π, Cv3,1.2)| ≤ |occ((v3, v3), π)| + |occ((v3, vend), π)|.
3.6 Timed traces and clips
In this section we describe how execution times of nodes can be obtained from a
database of timed execution traces.
A timed trace indicates the execution sequence of nodes during a particular run of the
code on the target platform, together with the execution duration for each occurrence
of each node in the sequence.
Definition 16 (Timed trace) A timed trace of a program P is a finite sequence
π = (v1, t1) . . . (vn, tn), where v1 . . . vn is a path in the program’s CFG GP =
(V, E, vstart , vend), and where t1, . . . , tn are the associated observed execution times
of v1, . . . , vn .
The maximal observed execution time (MOET) of a node within a timed trace is
the maximal execution time that is associated with any occurrence of the node inside
the trace. By inside a trace we mean anything between the first and the last node, but
we do not include the border nodes.
Definition 17 (MOET of node in timed trace) The maximal observed execution time
(MOET) moetv,π of a node v ∈ V inside a timed trace π is defined as the maximum
over all associated execution times of v occurring inside π , i.e.,
moetv,π = max {ti | π = (v1, t1) . . . (vn, tn), vi = v, 1 < i < n} .
Note that moetv,π is undefined, if there is no occurrence of v in π .
Example 15 Consider the timed traces
π1 = (vstart , 0)(v1, 40)(v3, 20)(vend , 0); π2 = (v3, 5)(v3, 4)(v3, 4)(v3, 4);
π3 = (vstart , 0)(v1, 40)(v3, 25)(vend , 0); π4 = (vstart , 0)(v1, 40)(v2, 20);
π5 = (vstart , 0)(v1, 40)(v3, 30)(v3, 20)(vend , 0); π6 = (v3, 5);
π7 = (vstart , 0)(v1, 45)(v2, 15)(v3, 10)(vend , 0).
The MOETs of node v3 are as follows:
moetv3,π1 = 20; moetv3,π2 = 4;
moetv3,π3 = 25; moetv3,π4 undefined;
moetv3,π5 = 30; moetv3,π6 undefined;
moetv3,π7 = 10.
We lift the notion of the MOET of a node in a path to sets of timed traces:
Definition 18 (MOET of node in set of traces) The maximal observed execution time
(MOET) moetv,T of a node v ∈ V over a set of timed traces T is defined as the
maximum of all maximal observed execution times of v ∈ V in any timed trace
π ∈ T , i.e.,
\[ \mathrm{moet}_{v,T} = \max \left\{ \mathrm{moet}_{v,\pi} \mid \pi \in T \right\}. \]
Note that moetv,T is undefined, if none of the timed traces in T contains an occur-
rence of v.
Example 16 Reconsider the timed traces π1, . . . , π7 from Example 15. The MOET
of node v3 over the set T = {π1, . . . , π7} of timed traces is
moetv3,T = 30.
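Definitions 17 and 18 can be checked against Examples 15 and 16 with a small Python sketch. It represents a timed trace as a list of (node, time) pairs and uses None for "undefined"; both choices, and the function names, are assumptions made for illustration.

```python
def moet_in_trace(v, trace):
    """MOET of v inside one timed trace; border nodes are excluded."""
    inner = trace[1:-1]  # drop the first and the last (node, time) pair
    times = [t for (node, t) in inner if node == v]
    return max(times) if times else None  # None models "undefined"


def moet_over_traces(v, traces):
    """MOET of v over a set of timed traces, ignoring undefined entries."""
    values = [m for m in (moet_in_trace(v, tr) for tr in traces)
              if m is not None]
    return max(values) if values else None


# Timed traces pi_1 .. pi_7 from Example 15.
T = [
    [("vstart", 0), ("v1", 40), ("v3", 20), ("vend", 0)],
    [("v3", 5), ("v3", 4), ("v3", 4), ("v3", 4)],
    [("vstart", 0), ("v1", 40), ("v3", 25), ("vend", 0)],
    [("vstart", 0), ("v1", 40), ("v2", 20)],
    [("vstart", 0), ("v1", 40), ("v3", 30), ("v3", 20), ("vend", 0)],
    [("v3", 5)],
    [("vstart", 0), ("v1", 45), ("v2", 15), ("v3", 10), ("vend", 0)],
]

per_trace = [moet_in_trace("v3", tr) for tr in T]
overall = moet_over_traces("v3", T)
```

The per-trace values reproduce the list in Example 15 (with None in place of "undefined"), and the overall value is the MOET of 30 stated in Example 16.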
Definition 19 (Untimed trace) The corresponding untimed trace π̄ of a timed trace
π = (v1, t1) . . . (vn, tn) is the sequence of nodes occurring in π , i.e.,
π̄ = v1 . . . vn .
Example 17 Reconsider the timed traces π1, . . . , π7 from Example 15. The
corresponding untimed traces are
π̄1 = vstartv1v3vend; π̄2 = v3v3v3v3;
π̄3 = vstartv1v3vend; π̄4 = vstartv1v2;
π̄5 = vstartv1v3v3vend; π̄6 = v3;
π̄7 = vstartv1v2v3vend .
Definition 20 (MOET of node in clip) The maximal observed execution time (MOET)
moetv,S,T of a node v ∈ V in clip S over a set of timed traces T is the MOET of
v over the set of all timed subtraces in T with corresponding untimed traces that are
paths in S, i.e.,
moetv,S,T = max{moetv,π | π ∈ paths(S), σ ◦ π ◦ ρ ∈ T }.
Note that moetv,S,T is undefined, if none of the timed traces in T contains an
occurrence of v, or if none of the timed traces in T that contain an occurrence of v
has a corresponding untimed trace that is a path in clip S.
Example 18 Recall that a context C of a node v is a clip with a constraint on the
number of times that v may occur within the paths of C. So reconsider contexts
Cv3,1.0 = ⟨{(vstart, v1), (v3, v3)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,2.0 = ⟨{(v1, v2)}, {(v3, v3), (v3, vend)}⟩;
Cv3,1.1 = ⟨{(vstart, v1)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,1.2 = ⟨{(v3, v3)}, {(v3, v3), (v3, vend)}⟩
of node v3 from Examples 9 and 10. Also reconsider the timed traces π1, . . . , π7
from Example 15. The MOETs of node v3 within these contexts over the set T =
{π1, . . . , π7} of timed traces are
moetv3,Cv3,1.0,T = 30; moetv3,Cv3,2.0,T = 10;
moetv3,Cv3,1.1,T = 30; moetv3,Cv3,1.2,T = 4.
Theorem 11 (MOET reduction) Let S be a clip, and let T be a set of timed traces.
Then
moetv,S,T ≤ moetv,T .
3.7 Finding contexts for MBTA
In this section we describe an algorithm for obtaining, for any given node v ∈ V \
{vstart , vend}, a set {Cv,1, . . . Cv,n(v)} of contexts of v, with pairwise divergent sets of
paths that together cover v. The contexts are constructed in such a way that they are
associated with different maximal observed execution times.
To construct a set of contexts for some node v ∈ V \ {vstart , vend}, the algorithm
checks the MOET moetv,C,T of v over the given set T of timed traces in various
candidate contexts C.
We have noted before that moetv,C,T need not be defined under all circumstances:
moetv,C,T is undefined, if none of the timed traces in T contains an occurrence of v,
or if none of the timed traces in T that contain an occurrence of v has a corresponding
untimed trace that is a path in context C.
In MBTA the set T of timed traces is obtained by performing measurements. In
that case, moetv,C,T is undefined, if node v is unreachable, or if none of the paths in
C was exhibited by any measurement.
There are two basic strategies for handling missing measurements:
Conservative approach: In this approach, the algorithm by default attributes missing
measurements to insufficient coverage of the temporal behavior. It assumes that
suitable timed traces can, in principle, be found, and conservatively substitutes the
global MOET moetv,T for moetv,S,T .
Progressive approach: In this approach, the algorithm by default attributes missing
measurements to infeasible paths. It assumes that suitable timed traces cannot be found,
even in principle, substitutes 0 for moetv,S,T , and stipulates that the corresponding
clip is infeasible.
However, to simplify the presentation of the algorithm, we just assume that
moetv,S,T is always defined, i.e., there is always at least one matching measurement.
In terms of coverage, this assumption means that the database T contains, for each
node v and each segment S, at least one trace that first passes through any entry edge of
S, then passes through v, and eventually passes through any exit edge of S. Between
the entry and the exit edge, the path must not pass through any further entry or exit
edges.
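The two strategies for missing measurements described above can be summarised in a small Python sketch. The function name `resolve_moet`, the use of None for an undefined MOET, and the returned feasibility flag are assumptions made for illustration; they are not part of Algorithm 3.1.

```python
def resolve_moet(clip_moet, global_moet, strategy="conservative"):
    """Return (moet, feasible) for a clip whose MOET may be undefined (None).

    conservative: assume coverage was insufficient and fall back to the
                  global MOET of the node (which may itself be None if the
                  node was never observed).
    progressive:  assume the clip is infeasible and use 0.
    """
    if clip_moet is not None:
        return clip_moet, True
    if strategy == "conservative":
        return global_moet, True
    if strategy == "progressive":
        return 0, False
    raise ValueError("unknown strategy: {!r}".format(strategy))


conservative = resolve_moet(None, 30, "conservative")
progressive = resolve_moet(None, 30, "progressive")
measured = resolve_moet(4, 30, "progressive")
```

A measured value is always used as is; the strategy only matters when no matching measurement exists.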
Algorithm 3.1 is a formal description of our algorithm. More informally, our algo-
rithm proceeds as follows:
1. The algorithm initially finds the set Q of all edges (vstart , q) ∈ E , the set B of
all edges (v, b) ∈ E , and the set R of all edges (w, r) ∈ E , such that there is a
path from r to v. Set R can easily be found by performing a backward depth-first
search, starting from node v. Sets Q and B can be found from an adjacency list or
adjacency matrix of the CFG.
2. Let A = (Q ∪ B) ∩ R. Note that C = ⟨A, B⟩ is the simple-history context of v.
3. For each edge (u, w) ∈ E , the algorithm checks the condition
moetv,⟨{(u,w)},B⟩,T ≤ moetv,⟨Ou,B⟩,T ,
where Ou = {(u, x) ∈ E} is the set of all outgoing edges of node u. The condition
tests whether context ⟨{(u, w)}, B⟩ of v, which has edge (u, w) as its only entry
edge, provides a lower MOET for node v than context ⟨Ou, B⟩ of v, which
has all edges starting from node u as entry edges. If this is true, then the context
⟨{(u, w)}, B⟩ captures a case of executing v with a reduced execution time. Let X
be the set of all edges (u, w) ∈ E for which the condition holds. Consider ⟨X, B⟩
as a separate context of v.
4. The next step of the algorithm is a vertical context split: The algorithm finds the
set Y of all edges (u, w) ∈ E such that there exists a path from some edge in A
to node u that contains only edges in E \ (A ∪ B ∪ X). It also finds the set Z
of all edges (u, w) such that there exists a path from some edge in X to node u
that contains only edges in E \ (A ∪ B ∪ X). Let A1 = A, B1 = (B ∪ X) ∩ Y ,
A2 = X , and B2 = (B ∪ X) ∩ Z . Note that Cv,1 = ⟨A1, B1⟩ and Cv,2 = ⟨A2, B2⟩
are contexts of v with paths(Cv,1) ∩ paths(Cv,2) = ∅ that together cover node v.
5. The final step of the algorithm is a horizontal split of context Cv,i , for i ∈ {1, 2}:
In this step the algorithm creates a partition Di of set Ai by the MOET of node v,
i.e., Di = Ai/ ∼i , where ∼i is the following equivalence relation:
x ∼i y iff moetv,⟨{x},Bi⟩,T = moetv,⟨{y},Bi⟩,T for all x, y ∈ Ai .
For each set D ∈ Di , the algorithm finds the set ZD of all edges (u, w) ∈ E , such
that there exists a path from some edge in D to node u that contains only edges in
E \ (Ai ∪ Bi ).
6. The set of contexts produced by the algorithm is
M = {⟨D, B1 ∩ ZD⟩ | D ∈ D1} ∪ {⟨D, B2 ∩ ZD⟩ | D ∈ D2}.
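The partitioning in step 5 groups entry edges by equal MOET. A minimal Python sketch follows, assuming that the per-edge MOETs have already been determined and are passed in as a dictionary; the function name and the input format are hypothetical, and the numbers are the ones reported for node v3 in Example 19.

```python
from collections import defaultdict


def partition_by_moet(entry_edges, moet_of):
    """Group entry edges into equivalence classes of equal MOET.

    `moet_of` maps each entry edge x to the MOET of v in the clip <{x}, B_i>
    (assumed to be precomputed from the trace database).
    """
    classes = defaultdict(set)
    for edge in entry_edges:
        classes[moet_of[edge]].add(edge)
    return [frozenset(block) for block in classes.values()]


# Entry edges A1 and A2 of node v3 with the MOETs reported in Example 19.
A1 = {("vstart", "v1"), ("v3", "v3")}
A2 = {("v1", "v2")}
moets = {("vstart", "v1"): 30, ("v3", "v3"): 4, ("v1", "v2"): 10}

D1 = partition_by_moet(A1, moets)
D2 = partition_by_moet(A2, moets)
```

Because the two entry edges in A1 have different MOETs, they end up in separate blocks, whereas A2 yields a single block.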
Example 19 Reconsider CFG G from Example 1. Also, reconsider the timed traces
π1 = (vstart , 0)(v1, 40)(v3, 20)(vend , 0); π2 = (v3, 5)(v3, 4)(v3, 4)(v3, 4);
π3 = (vstart , 0)(v1, 40)(v3, 25)(vend , 0); π4 = (vstart , 0)(v1, 40)(v2, 20);
π5 = (vstart , 0)(v1, 40)(v3, 30)(v3, 20)(vend , 0); π6 = (v3, 5);
π7 = (vstart , 0)(v1, 45)(v2, 15)(v3, 10)(vend , 0)
from Example 15. We apply Algorithm 3.1 on CFG G, node v3, and the set of timed
traces T = {π1, . . . , π7}:
1. The algorithm initially finds
Q = {(vstart , v1)}; B = {(v3, v3), (v3, vend)} ;
R = {(vstart , v1), (v1, v2), (v1, v3), (v2, v3), (v3, v3)} .
2. The algorithm sets
A = (Q ∪ B) ∩ R = {(vstart , v1), (v3, v3)} ,
The clip Cv3 = ⟨A, B⟩ is indeed the simple-history context of v3.
3. The algorithm finds
moetv3,⟨{(v1,v2)},B⟩ = 10 ≤ moetv3,⟨{(v1,v2),(v1,v3)},B⟩ = 30;
X = {(v1, v2)}.
4. The algorithm performs a vertical context split along X :
Y = {(v1, v2), (v1, v3), (v3, v3), (v3, vend)};
Z = {(v2, v3), (v3, v3), (v3, vend)};
A1 = A = {(vstart , v1), (v3, v3)};
B1 = (B ∪ X) ∩ Y = {(v1, v2), (v3, v3), (v3, vend)};
A2 = X = {(v1, v2)};
B2 = (B ∪ X) ∩ Z = {(v3, v3), (v3, vend)}.
Cv,1 = ⟨A1, B1⟩
Cv,2 = ⟨A2, B2⟩
5. The algorithm finds
moetv3,⟨{(vstart ,v1)},B1⟩ = 30 ≠ moetv3,⟨{(v3,v3)},B1⟩ = 4;
D1 = {{(vstart , v1)}, {(v3, v3)}} ;
D2 = {{(v1, v2)}} ;
Z{(vstart ,v1)} = {(v1, v2), (v1, v3), (v3, v3), (v3, vend)} ;
Z{(v3,v3)} = {(v3, v3), (v3, vend)} ;
Z{(v1,v2)} = {(v2, v3), (v3, v3), (v3, vend)} ;
6. The algorithm produces the set of contexts
M = {⟨{(vstart, v1)}, B1 ∩ Z{(vstart ,v1)}⟩,
⟨{(v3, v3)}, B1 ∩ Z{(v3,v3)}⟩, ⟨{(v1, v2)}, B2 ∩ Z{(v1,v2)}⟩}
= {⟨{(vstart, v1)}, {(v1, v2), (v3, v3), (v3, vend)}⟩,
⟨{(v3, v3)}, {(v3, v3), (v3, vend)}⟩, ⟨{(v1, v2)}, {(v3, v3), (v3, vend)}⟩}.
Theorem 12 Given a CFG GP = (V, E, vstart , vend), a node v ∈ V \ {vstart , vend},
and a set of timed traces T , Algorithm 3.1 returns a set of contexts M =
{Cv,1, . . . , Cv,n} of node v, such that paths(Cv,i ) and paths(Cv, j ) are divergent, for
1 ≤ i ≤ n, 1 ≤ j ≤ n, and i ≠ j , and such that ⋃1≤i≤n paths(Cv,i ) covers node v.
3.8 Instantiating context-sensitive IPET
We are now able to put the results from Sects. 3.4 through 3.7 together, to obtain an
instantiation of context-sensitive IPET:
1. We use Algorithm 3.1 to generate a set Qv = {Cv,1, . . . Cv,n(v)} of suitable
contexts, for every node v ∈ V \ {vstart , vend}. By way of Theorem 12, paths(Cv,i )
and paths(Cv, j ) are divergent, for 1 ≤ i ≤ n(v), 1 ≤ j ≤ n(v), and i ≠ j ,
and ⋃1≤i≤n(v) paths(Cv,i ) covers node v. Therefore, Theorem 9 applies, hence
Requirement 2 is met.
2. For each node v ∈ V \ {vstart , vend}, we interpret the individual contexts
Cv,1, . . . Cv,n(v) as individual execution scenarios Ev,1, . . . Ev,n(v).
3. We use the MOETs moetv,Cv,1,T , . . . ,moetv,Cv,n(v),T of each context as WCET
estimates w˜cetv,1, . . . , w˜cetv,n(v).
4. We use the construction in Theorem 10 to infer ILP constraints over our execution
scenario variables. The translation of the linear constraints presented in the theorem
is straightforward: For example, the linear constraint
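\[ |\mathrm{occ}(v, \pi, C)| \le \sum_{e \in A} |\mathrm{occ}(e, \pi)| - \sum_{e \in X} |\mathrm{occ}(e, \pi)|, \quad \text{for any path } \pi \in \mathrm{paths}(C), \]
translates to a corresponding IPET constraint
\[ f_{v,i} \le \sum_{e \in A} f_e - \sum_{e \in X} f_e. \]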
By adding these constraints, we fulfill Requirement 3. Moreover, Requirement 1
is fulfilled as a consequence of Theorem 11.
Example 20 Reconsider CFG G from Example 1. Example 2 provides an IPET
problem for G, constructed for some hypothetical WCET estimates of the individual nodes.
We can reuse the constraints from that IPET problem to construct a context-sensitive
IPET problem for the set of timed traces T = {π1, . . . , π7} from Example 19.
1. The latter example has already illustrated the application of Algorithm 3.1, to
obtain a set
M = {Cv3,1.1, Cv3,1.2, Cv3,2.0}
of suitable contexts for node v3, where
Cv3,1.1 = ⟨{(vstart, v1)}, {(v1, v2), (v3, v3), (v3, vend)}⟩;
Cv3,1.2 = ⟨{(v3, v3)}, {(v3, v3), (v3, vend)}⟩;
Cv3,2.0 = ⟨{(v1, v2)}, {(v3, v3), (v3, vend)}⟩.
2. We interpret the individual contexts Cv3,1.1, Cv3,1.2, Cv3,2.0 as individual execution
scenarios Ev3,1, Ev3,2, Ev3,3 with associated variables fv3,1, fv3,2, fv3,3.
3. We use the MOETs
moetv3,Cv3,1.1,T = 30; moetv3,Cv3,1.2,T = 4; moetv3,Cv3,2.0,T = 10.
of contexts Cv3,1.1, Cv3,1.2, Cv3,2.0 (which we have calculated in Example 18) as
WCET estimates w˜cetv3,1, w˜cetv3,2, w˜cetv3,3.
4. We use the construction in Theorem 10 to infer the IPET constraints. We have
already calculated the linear constraints
|occ(v3, π, Cv3,1.1)| ≤ |occ((vstart , v1), π)| − |occ((v1, v2), π)|;
|occ(v3, π, Cv3,1.1)| ≤ |occ((v3, v3), π)| + |occ((v3, vend), π)|;
|occ(v3, π, Cv3,1.2)| ≤ |occ((v3, v3), π)|;
|occ(v3, π, Cv3,1.2)| ≤ |occ((v3, v3), π)| + |occ((v3, vend), π)|;
|occ(v3, π, Cv3,2.0)| ≤ |occ((v1, v2), π)|;
|occ(v3, π, Cv3,2.0)| ≤ |occ((v3, v3), π)| + |occ((v3, vend), π)|.
in Example 14. These translate to corresponding IPET constraints
fv3,1 ≤ f(vstart ,v1) − f(v1,v2); fv3,1 ≤ f(v3,v3) + f(v3,vend );
fv3,2 ≤ f(v3,v3); fv3,2 ≤ f(v3,v3) + f(v3,vend );
fv3,3 ≤ f(v1,v2); fv3,3 ≤ f(v3,v3) + f(v3,vend ).
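The translation from the sets A, B, X, and Y of Theorem 10 to IPET constraints is mechanical. The following Python sketch renders the two inequalities of a context as plain text; the variable naming scheme (`f_u_w` for edges) and the function names are assumptions made for illustration, and the output is not tied to any particular solver's input format.

```python
def edge_var(edge):
    """Name of the flow variable of an edge (hypothetical naming scheme)."""
    return "f_{}_{}".format(*edge)


def context_constraints(scenario_var, A, B, X=(), Y=()):
    """Render the two Theorem-10 style inequalities for one context <A, B>."""
    def side(plus, minus):
        text = " + ".join(edge_var(e) for e in sorted(plus))
        for e in sorted(minus):
            text += " - " + edge_var(e)
        return text

    return ["{} <= {}".format(scenario_var, side(A, X)),
            "{} <= {}".format(scenario_var, side(B, Y))]


exits = {("v3", "v3"), ("v3", "vend")}
# Contexts C_{v3,1.2} and C_{v3,2.0}; Example 14 gives X = Y = {} for both.
c12 = context_constraints("f_v3_2", {("v3", "v3")}, exits)
c20 = context_constraints("f_v3_3", {("v1", "v2")}, exits)
```

The resulting strings correspond to the constraints on fv3,2 and fv3,3 listed above.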
4 Experimental evaluation
4.1 The FORTAS high-precision MBTA framework
We have implemented our approach for calculating WCET estimates as a component
of the FORTAS framework (Zolda 2012). The FORTAS framework is a prototypical
implementation of a portable, high-precision MBTA toolchain.
We call the FORTAS framework portable, because it can easily be adapted for dif-
ferent target architectures, essentially by implementing a driver that can execute the
program under analysis on the desired target and collect a timed execution trace. We
call the FORTAS framework a high-precision analysis framework, because it
incorporates a method for reducing the effect of underestimation emerging from insufficient
measurement coverage (Bünte et al. 2011) as well as the method for reducing the effect
of overestimation presented in this paper.
Figure 9 illustrates how our approach fits into the analysis workflow.
Fig. 9 The FORTAS framework features an adaptive refinement loop, where automatically inferred con-
texts (cf. Sect. 3.4) guide the generation of additional input vectors for subsequent measurement runs. These
subsequent measurement runs yield additional timed traces
The framework currently provides a backend for the Infineon TriCore TC1796, a
fairly complex 32-bit microprocessor targeted at the automotive market that features
basic versions of many performance-enhancing features found in modern desktop and
server processors, like caching, pipelining, and branch prediction. For example, the
TC1796 features a simple static rather than a more sophisticated dynamic branch
predictor.
The TriCore TC1796 has a single processing core, yet allows parallel processing
of different types of instructions via three parallel instruction pipelines and a separate
floating point unit. The processing core includes a special debugging interface, which
allows us to non-intrusively capture cycle-accurate timed traces using a Lauterbach
PowerTrace device.1
As the processor platform we use a TriBoard TC179X evaluation board equipped
with 4 MiB of Burst Flash memory and 1 MiB of asynchronous SRAM, both connected
to the processing core via the processor’s External Bus Unit. In our experiments, the
Clock Generation Unit was driven by an external crystal oscillator, producing a CPU
clock at 150 MHz, and a system clock at 75 MHz.
Further details on the processor and on our hardware setup are provided in Appen-
dix 2.
As indicated in Fig. 9, the FORTAS framework uses a feedback-driven anytime
approach to calculate a sequence of increasingly fine-grained WCET estimates, as
new traces are continuously added to the database of timed execution traces.
Our framework employs a hybrid method for generating input data that is based on
genetic programming and source-code analysis (Bünte et al. 2011). The trace database
is iteratively refined with traces that correspond to the identified execution scenarios
by generating corresponding FQL queries (Holzer et al. 2011) that are subsequently fed to
the FShell test case generator (Holzer et al. 2008) to obtain the respective input vectors.
After executing the program under analysis with the new input vectors, the resulting
timed execution traces are added to the database. We did not make use of any input
vectors distributed as part of any benchmarks.
Thanks to the anytime approach, the framework can be used to quickly obtain a
rough WCET estimate and more refined estimates later on, all within a single analysis
run. Intermediate WCET estimates can be obtained repeatedly whenever desired, until
the desired precision has been reached.
Early intermediate WCET estimates are useful as a quick approximation of the
expected WCET, e.g., to obtain instant feedback on how a certain modification of the
program might affect its WCET.
4.2 Benchmarks
We used benchmarks from four different benchmark suites to evaluate our approach:
Industry Study (IS): A benchmark suite derived from the code of an engine controller,
provided by an industrial partner.
1 Lauterbach GmbH, Höhenkirchen-Siegertsbrunn, Germany. PowerTrace
Mälardalen WCET Benchmark Suite (MD): A collection of benchmark programs
from different research groups and tool vendors. We selected bs, an implemen-
tation of binary search over an array of 15 integer elements, and bsort100, an
implementation of bubble sort over an array of 100 integer elements. For the latter
benchmark we reduced the input vector to 10 integer elements.
PapaBench (PB): A benchmark suite originating from UAV software developed within
the Paparazzi project (Paparazzi 2012). We chose subproblems A1, A2, F1, and
F2 from the version used in the WCET Tool Challenge 2011 (von Hanxleden
et al. 2011).
Java Optimized Processor Benchmark Suite (JOP): A collection of programs that are
used for evaluating the Java Optimized Processor (JOP) (Schoeberl 2009). We used
the central control function of a C port of the lift control benchmark.
Due to technical limitations in some tools on which the FORTAS framework
depends—like the TriCore compiler tool chain—some source code transformations
must be performed on the benchmarks before analysis. For example, types that are
defined by typedef must be expanded, for loops must be transformed to equivalent
while loops, and the code must be formatted canonically, e.g., individual statements
must occur in separate lines. Although these transformations are rather trivial, there
is currently no tool to perform them automatically, which is the reason for our limited
range of benchmarks.
4.3 Experiments and results
We used two different memory setups: For the internal memory setup we placed the
executable code in the PMI’s scratchpad RAM (SPRAM), and the program data in
the DMI’s local data RAM (LDRAM). For the external memory setup we placed the
executable program code and the program data in the external SRAM, and enabled
the ICACHE.
These two setups represent extreme cases for temporal predictability: Memory
accesses have a constant penalty of 1 cycle for the internal memory setup, whereas
the external memory setup introduces a high access time jitter. Given the hardware
description in Appendix 2, sources of jitter can be found in: possible cache misses,
if instruction caching is used; mixed single/block transfers over the PLMB/DLMB;
possible occupation of the PLMB/DLMB by another bus master, like the PMU/DMU
or the LMI; occupation of the external memory-bus by another bus master; jitter in
DRAM accesses.
The two extreme cases for temporal predictability are therefore, on the one hand,
the use of the separate internal PMI and DMI memories, and, on the other hand, the
shared use of external memory for both program and data, with instruction caching.
For the internal memory setup, we also analyzed the benchmarks using the
industrial-strength static WCET analysis tool aiT (Thesing et al. 2003). It was not
possible to obtain comparative data for the external memory setup, because aiT does
not support such a more complex setup.
Table 1 Comparison of WCET estimates for internal memory setup
Benchmark                   MOET          FORTAS         FORTAS       aiT  Sens./std.  Contexts  Nodes
                            (μs)  sensitive (μs)  standard (μs)      (μs)         (%)
engine_control_cs1-Ak.     11.99           21.13          21.37     19.91          99       937    398
lift_control-ctrl_loop      7.39           10.45          11.00      9.91          95       240    119
binary_search-b._s.         1.68            2.23           2.24      2.10         100        15     14
bsort10-BubbleSort         29.34           40.39          42.44     37.60          95        28     15
a1-course_pid_run           0.91            1.35           1.41      1.89          96        26     17
a1-course_run               1.28            1.83           1.89      2.33          97        12     10
a2-atan2                    0.85            1.11           1.12      1.67          99        25     15
a2-compute_dist2_t._h.      0.41            0.56           0.56      0.97         100         6      6
a2-nav_home                 8.17           11.48          12.13     16.56          95        27     21
a2-navigation_update        8.23           11.78          12.43     16.75          95         4      5
a2-sin                      1.77            2.69           2.71      3.88          99        38     29
f1-check_mega128_v._t.      2.81            3.52           3.63      4.18          97        16     11
f1-servo_set                2.21            2.69           2.76      3.57          97        81     43
f2-vector_10                1.07            1.30           1.32      1.14          98        22     18
We performed the analyses on an Intel Core2 Quad Q9450 CPU running at 2.66 GHz
with 8 GiB of DRAM. For each benchmark, we generated at least 100,000 timed
traces. Table 1 summarizes our analysis results for the internal memory setup. For
each benchmark we list:
1. the observed end-to-end MOET;
2. the WCET estimate via FORTAS’s context-sensitive IPET;
3. the WCET estimate via FORTAS’s standard IPET;
4. the WCET bound via aiT’s static timing analysis;
5. the quotient between the two FORTAS WCET estimates;
6. the number of contexts produced by context-sensitive IPET for the final estimate;
7. the number of CFG nodes.
The latter two numbers are indicative of the quality of each benchmark: the number
of contexts indicates how many suitable execution scenarios occur. The number of
CFG nodes provides an estimate of the size of the analysis problem.
The observed end-to-end MOET is the maximal observed execution time for the
entire program. It is our best lower bound for the actual WCET. Indeed, both estimates
calculated by the FORTAS framework, as well as the upper WCET bound calculated by
aiT, are consistently higher.
Comparing the estimates of the FORTAS framework with the bounds of aiT, there
is no consistent ranking. Assuming that aiT produces safe upper bounds, we can
attribute any higher estimates of the FORTAS framework to higher pessimism, and
these estimates are also upper bounds. One possible reason for the lower pessimism of
aiT could be tighter loop iteration constraints.
Those cases where aiT produces larger numbers than the FORTAS framework
could indicate that the latter are less pessimistic, but without knowledge of the actual
WCET it is impossible to decide to which degree the FORTAS framework is affected
by underestimation. The magnitude of the deviation between the estimates of both
tools does not provide an indication about the precision of the analyses performed by
the FORTAS framework and aiT.
Table 2 Comparison of WCET estimates for external memory setup
Benchmark                   MOET          FORTAS         FORTAS  Sens./std.  Contexts  CFG nodes
                            (μs)  sensitive (μs)  standard (μs)         (%)
engine_control_cs1-Ak.    133.45          161.09         166.25          97       813        398
lift_control-ctrl_loop     64.95           73.59          87.29          84       319        119
binary_search-b._s.        17.76           18.32          22.61          81        17         14
bsort10-BubbleSort        287.89          316.23         392.93          80        32         15
a1-course_pid_run           8.78            9.71          10.83          90        25         17
a1-course_run              11.45           13.33          14.45          92        12         10
a2-atan2                    8.29            8.58           8.66          99        24         15
a2-compute_dist2_t._h.      4.02            4.16           4.16         100         6          6
a2-fly_to_xy                9.27           11.87          11.95          99         5          5
a2-nav_home                62.21           76.18          85.81          89        34         21
a2-navigation_update       63.67           77.79          87.41          89         4          5
a2-sin                     15.30           17.74          19.61          90        46         29
f1-check_mega128_v._t.     22.00           24.07          26.61          90        18         11
f1-servo_set               17.69           18.74          21.28          88        64         43
f2-vector_10                8.08            8.34           8.59          97        21         18
Comparing the WCET estimates produced by context-sensitive IPET to those produced
by standard IPET, we see that the former results are closer to the MOET than the
latter. This reduction can be a rough indicator of the achieved reduction in pessimism,
illustrating the effect of context-sensitive IPET.
Table 2 summarizes the analysis results for the external memory setup. As men-
tioned before, aiT does not support this configuration, so we can provide results for
the FORTAS framework only.
It can be seen that all MOETs and WCET estimates are considerably higher than
for the internal memory setup. This is unsurprising, as dynamic RAM has typically
a much higher access latency than static RAM. Moreover, the data path to external
memory is much longer than the data path to the SPRAM/LDRAM.
The context-sensitive WCET estimates are close to the respective MOETs. This
indicates that context-sensitive IPET can work well in scenarios with a high execution
time jitter for individual code constituents. Moreover, the distance between the WCET
estimates of context-sensitive IPET and standard IPET is much larger for the external
memory setup than for the internal memory setup, in both relative and absolute measures.
This meets our expectation, because the external memory datapath contains many
sources of temporal jitter, whereas the internal memory datapath is virtually free of
jitter, leaving less room for reducing pessimism.
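The closeness to the MOET discussed above can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not part of the FORTAS framework: it copies three rows from Table 2 and computes how far each estimate lies above the MOET, together with the Sens./std. ratio. The helper names `gap_over_moet` and `sens_to_std` are illustrative assumptions.

```python
# Three rows copied from Table 2 (external memory setup), values in microseconds:
# benchmark -> (MOET, context-sensitive estimate, standard estimate)
rows = {
    "lift_control-ctrl_loop": (64.95, 73.59, 87.29),
    "bsort10-BubbleSort": (287.89, 316.23, 392.93),
    "a2-compute_dist2_t._h.": (4.02, 4.16, 4.16),
}


def gap_over_moet(estimate, moet):
    """Relative distance of an estimate above the MOET, in percent (hypothetical helper)."""
    return 100.0 * (estimate - moet) / moet


def sens_to_std(sensitive, standard):
    """Context-sensitive estimate relative to the standard estimate, in percent."""
    return 100.0 * sensitive / standard


summary = {
    name: (
        gap_over_moet(sens, moet),   # gap of context-sensitive IPET
        gap_over_moet(std, moet),    # gap of standard IPET
        round(sens_to_std(sens, std)),
    )
    for name, (moet, sens, std) in rows.items()
}

for name, (gap_sens, gap_std, ratio) in summary.items():
    print(f"{name}: sensitive +{gap_sens:.1f}%, standard +{gap_std:.1f}%, ratio {ratio}%")
```

A smaller gap only means that the estimate is closer to the best known lower bound; as noted for the comparison with aiT, it says nothing about how far either value is from the actual WCET.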
Table 3 Mean runtime of lp_solve for internal memory setup
Benchmark                Sensitive (ms)  Standard (ms)  Sens./std. (%)
engine_control_cs1-Ak.   682.1           539.6          126
lift_control-ctrl_loop   76.4            70.3           109
binary_search-b._s.      1.7             1.2            142
bsort10-BubbleSort       1.8             3.0            60
a1-course_pid_run        1.6             2.0            80
a1-course_run            0.8             0.9            89
a2-atan2                 2.0             0.8            250
a2-compute_dist2_t._h.   0.3             0.3            100
a2-fly_to_xy             0.4             0.4            100
a2-nav_home              2.6             1.9            137
a2-navigation_update     0.5             0.6            83
a2-sin                   3.3             3.0            110
f1-check_mega128_v._t.   1.3             1.0            130
f1-servo_set             9.5             7.9            120
f2-vector_10             1.8             1.8            100
Concerning the computational cost of context-sensitive IPET, we have to consider
two things: The complexity of building the context-sensitive IPET model from the
trace database and the cost of solving that model.
The complexity of building the context-sensitive IPET model depends on the concrete
instantiation of context-sensitive IPET. Considering the instantiation that we have
presented in this paper, Algorithm 3.1 has linear time complexity in the number |E|
of CFG edges and linear time complexity in the maximal number of MOET classes.
The cost of solving the resulting context-sensitive IPET models depends on the used
LP solver. For our experiments we used lp_solve with standard settings. Tables 3
and 4 present a comparison of the mean solving times of the context-sensitive IPET
problems and their standard IPET counterparts. It can be seen that the solving times for
the two cases are quite similar. In one extreme case the context-sensitive problems took
about twice as long to solve as the corresponding standard IPET problems, whereas in
the other extreme case the context-sensitive problems took only half as long to solve
as the corresponding standard IPET problems. Overall, we observe no indication that
the context-sensitive IPET problems are significantly harder or easier compared to
standard IPET problems.
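The spread of the solving-time ratios can be summarized numerically. The sketch below takes the Sens./std. column of Table 3 and reports its extremes and geometric mean; using the geometric mean for ratios is our own choice here, not something the evaluation prescribes.

```python
from statistics import geometric_mean

# "Sens./std. (%)" column of Table 3 (internal memory setup), in table order.
ratios_percent = [126, 109, 142, 60, 80, 89, 250, 100, 100, 137, 83, 110, 130, 120, 100]

fastest = min(ratios_percent)   # context-sensitive problem solved fastest, relatively
slowest = max(ratios_percent)   # context-sensitive problem solved slowest, relatively
typical = geometric_mean(ratios_percent)

print(f"min {fastest}%, max {slowest}%, geometric mean {typical:.1f}%")
```

With solving times in the millisecond range for most benchmarks, individual ratios are sensitive to measurement noise, so the summary is best read as a rough indication only.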
Summarizing our experimental data, we observed that context-sensitive IPET works
well, especially in scenarios with a high execution-time jitter for individual code
constituents.
5 Conclusion
We have presented context-sensitive IPET, an ILP-based approach for calculating a
WCET estimate from a given database of timed execution traces.
Table 4 Mean runtime of lp_solve for external memory setup
Benchmark                Sensitive (ms)  Standard (ms)  Sens./std. (%)
engine_control_cs1-Ak.   626.7           510.3          123
lift_control-ctrl_loop   86.5            73.7           117
binary_search-b._s.      0.8             1.6            50
bsort10-BubbleSort       2.4             2.2            109
a1-course_pid_run        2.2             1.0            220
a1-course_run            0.9             0.9            100
a2-atan2                 1.6             2.1            76
a2-compute_dist2_t._h.   0.8             0.4            200
a2-fly_to_xy             0.5             0.4            125
a2-nav_home              2.4             2.4            100
a2-navigation_update     0.3             0.6            50
a2-sin                   3.9             3.1            125
f1-check_mega128_v._t.   1.5             1.5            100
f1-servo_set             8.4             6.9            122
f2-vector_10             1.9             1.8            106
Our method is based on standard IPET, a widely used method for calculating an
upper WCET bound for a piece of code from upper WCET bounds of its code con-
stituents. It can thus reuse flow facts from other analysis tools and produces ILP
problems that can be solved by off-the-shelf solvers. Unlike previous work, our method
specifically aims at reducing overestimation, by means of an automatic classification
of code executions into scenarios with differing worst-case behaviour.
Context-sensitive IPET is a generic method. To obtain a concrete method, context-
sensitive IPET must be instantiated with a concrete notion of an execution scenario.
We have presented such an instantiation, which is based on the notion of a context,
which captures control flows within a program. We have also presented an algorithm
for producing such contexts based on measured execution times.
We have implemented our method as a component of the FORTAS framework—
a prototypical implementation of a portable, high-precision MBTA toolchain—and
presented an experimental evaluation of our method. The results of our evaluation
indicate that context-sensitive IPET can yield closer WCET estimates than standard
IPET.
Our results also indicate that WCET estimates obtained by an MBTA toolchain that
harnesses context-sensitive IPET can be comparable to those obtained by toolchains
that are based on static analysis. However, it is important to bear in mind that MBTA
and static timing analysis have different use-cases and must therefore be understood
as complementary approaches.
Acknowledgments This research has been supported by the Austrian Science Fund (Fonds zur Förderung
der wissenschaftlichen Forschung) within the research project “Formal Timing Analysis Suite of Real-Time
Systems” (FORTAS-RT) under contract P19230-N13, by the EU FP-7 project “Asynchronous and Dynamic
Virtualisation through performance ANalysis to support Concurrency Engineering” (ADVANCE) under
contract no. 248828, by ARTEMIS-JU within the FP7 research project “ConstRaint and Application driven
Framework for Tailoring Embedded Real-time Systems” (CRAFTERS) under contract no. 295371, and by
the EU COST Action IC1202 “Timing Analysis On Code-Level” (TACLe).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna-
tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix 1: Proofs
Proof of Theorem 1 Let $f_{v,1}, \ldots, f_{v,n(v)}$ be a solution of the context-sensitive
IPET problem. The context-sensitive IPET problem contains more constraints, so
$f_{v,1}, \ldots, f_{v,n(v)}$ is also a solution of the respective standard IPET problem. Because
$f_v = \sum_{i=1}^{n(v)} f_{v,i}$ and $\widetilde{wcet}_{v,i} \le \widetilde{wcet}_v$ for $1 \le i \le n(v)$, we have
\[
\sum_{v \in V} \sum_{i=1}^{n(v)} \widetilde{wcet}_{v,i} \cdot f_{v,i}
\;\le\; \sum_{v \in V} \sum_{i=1}^{n(v)} \widetilde{wcet}_{v} \cdot f_{v,i}
\;=\; \sum_{v \in V} \widetilde{wcet}_{v} \cdot f_{v}.
\]
□
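The inequality in this proof can be checked numerically on a toy instance. The sketch below uses made-up node names, execution-time estimates, and frequencies (none of them taken from the paper) and only assumes, as in the proof, that every per-context estimate is at most the per-node estimate.

```python
# Hypothetical data for two CFG nodes; none of these numbers come from the paper.
# wcet_std[v]:    per-node estimate used by standard IPET.
# wcet_ctx[v][i]: per-context estimates, each assumed <= wcet_std[v].
# freq_ctx[v][i]: execution frequency f_{v,i} of node v in context i.
wcet_std = {"v1": 10, "v2": 7}
wcet_ctx = {"v1": [10, 6], "v2": [7, 5, 4]}
freq_ctx = {"v1": [2, 3], "v2": [1, 0, 4]}


def context_sensitive_objective(wcet_ctx, freq_ctx):
    """Sum over nodes and contexts of wcet_{v,i} * f_{v,i}."""
    return sum(w * f for v in wcet_ctx for w, f in zip(wcet_ctx[v], freq_ctx[v]))


def standard_objective(wcet_std, freq_ctx):
    """Sum over nodes of wcet_v * f_v, where f_v is the sum of the f_{v,i}."""
    return sum(wcet_std[v] * sum(freq_ctx[v]) for v in wcet_std)


sensitive = context_sensitive_objective(wcet_ctx, freq_ctx)
standard = standard_objective(wcet_std, freq_ctx)
print(sensitive, standard)  # the first value never exceeds the second
```

The gap between the two values comes entirely from contexts whose estimate is strictly below the per-node estimate and whose frequency is non-zero.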
Proof of Theorem 2 Let S = ⟨A, B⟩ be a clip. Choose any non-divergent paths π ≠ σ
with
π = u1 . . . un ∈ paths(S), and σ = w1 . . . wm ∈ paths(S).
Case 1: There exist paths α, β, γ with α ◦ β = π, β ◦ γ = σ and |β| ≥ 2. Because
π ≠ σ, we have |α| ≥ 1 or |γ| ≥ 1 (or both). If |α| ≥ 1, then π contains
some entry edge (ui, ui+1) ∈ A, where 2 ≤ i ≤ n − 2. This contradicts the
assumption that π ∈ paths(S). If |γ| ≥ 1, then σ contains some exit edge
(wi, wi+1) ∈ B, where 2 ≤ i ≤ m − 2. This contradicts the assumption that
σ ∈ paths(S).
Case 2: There exist paths α, β, γ with α ◦ β = σ, β ◦ γ = π and |β| ≥ 2. This case
is symmetric to Case 1.
Case 3: Path σ is a subpath of π. Since σ ≠ π, path σ must be a proper subpath of
π, hence π contains some edge (ui, ui+1) ∈ A ∪ B, where 2 ≤ i ≤ n − 2.
This contradicts the assumption that π ∈ paths(S).
Case 4: Path π is a subpath of σ. This case is symmetric to Case 3. □
Proof of Theorem 3 We first show that S is a context, and then show the coverage
property:
1. Consider any path π ∈ paths(S). Any edge (v, b) with source node v is an exit
edge of clip S. Therefore, π can contain at most one more node after the first
occurrence of v. Hence, there is at most one inner occurrence of v in π .
2. We show that S covers v. Consider any paths ρ, σ with ρ ◦ v ◦ σ ∈ U . Note that
path ρ starts with the start node vstart and that path σ ends with the end node vend ,
which implies that paths ρ and σ are not empty.
Case 1: Path ρ contains node v. Then choose paths τ1, τ2 with τ1 ◦ v ◦ τ2 = ρ,
such that τ2 does not contain node v. Path v ◦ τ2 ◦ v starts with an edge in
B ∩ R, path τ2 ◦ v contains no edge in Q ∪ B, and path v ◦ σ starts with
an edge (v, b) ∈ B. Hence, path v ◦ τ2 ◦ v ◦ b is in paths(S), and there are
subpaths ρ1, σ2 with ρ1 ◦ v ◦ τ2 = ρ and b ◦ σ2 = σ.
Case 2: Path ρ does not contain node v. Then path ρ ◦ v starts with an edge in
Q ∩ R, path ρ ◦ v contains no edge in B, except for its first edge (v, b) ∈ B,
and path v ◦ σ starts with an edge (v, b) ∈ B. Hence, path ρ ◦ v ◦ b is in
paths(S), and there are subpaths ρ1, σ2 with ρ1 ◦ ρ = ρ and b ◦ σ2 = σ. □
Proof of Theorem 4
1. We show that C1 and C2 are contexts of v. For every edge
(x1, x2) ∈ X, there exists an edge (a, u) ∈ A with a path from node u to node
x1 that contains only edges in E \ (A ∪ B) and an edge (w, b) ∈ B with a path
from node x2 to node w that contains only edges in E \ (A ∪ B). Therefore, every
path in paths(⟨A, X⟩) is a subpath of some path in paths(⟨A, B⟩), hence the clip
⟨A, X⟩ contains at most one occurrence of v along each path, i.e., it is a context
of v.
By a similar argument, every path in paths(⟨X, B⟩) is a subpath of some path in
paths(⟨A, B⟩), hence ⟨X, B⟩ is a context of v.
Lastly, every path in paths(⟨X, X ∩ Z⟩) is a subpath of some path in
paths(⟨A, B⟩), hence ⟨X, X ∩ Z⟩ is a context of v.
Hence ⟨A, (B ∪ X) ∩ Y⟩ and ⟨X, (B ∪ X) ∩ Z⟩ are contexts of v, because
\[
\begin{aligned}
\mathit{paths}(\langle A, (B \cup X) \cap Y \rangle) &\subseteq \mathit{paths}(\langle A, B \cup X \rangle) \subseteq \mathit{paths}(\langle A, B \rangle) \cup \mathit{paths}(\langle A, X \rangle); \\
\mathit{paths}(\langle X, (B \cup X) \cap Z \rangle) &\subseteq \mathit{paths}(\langle X, (B \cap Z) \cup (X \cap Z) \rangle) \subseteq \mathit{paths}(\langle X, B \rangle) \cup \mathit{paths}(\langle X, X \cap Z \rangle).
\end{aligned}
\]
2. We show that paths(C1) ∪ paths(C2) covers v. Choose any paths ρ, σ , with
ρ◦v◦σ ∈ U . By our initial assumption, paths(C) covers v, i.e., there are subpaths
ρ1, σ2 and non-empty subpaths ρ2, σ1, with ρ1 ◦ ρ2 = ρ and σ1 ◦ σ2 = σ , such
that ρ2 ◦ v ◦σ1 ∈ paths(C). Moreover, there is some entry edge (a, u) ∈ A, some
exit edge (w, b) ∈ B, and paths α, β with a ◦u ◦α = ρ2 ◦v and β ◦w ◦b = v ◦σ1.
We show by construction, that there are always subpaths ρ′1, σ ′2 and non-empty
subpaths ρ′2, σ ′1, with ρ′1 ◦ ρ′2 = ρ and σ ′1 ◦ σ ′2 = σ , such that ρ′2 ◦ v ◦ σ ′1 ∈
paths(C1) ∪ paths(C2).
Case 1: Path u ◦α ◦v ◦β ◦w does not contain any edge in X . Then u ◦α ◦v ◦β ◦w
contains only edges in E \ (A∪ B ∪ X), therefore edge (w, b) is in B ∩Y . Hence,
choose ρ′2 = ρ2 and σ ′1 = σ1. Path ρ′2 ◦ v ◦ σ ′1 is in paths(C1).
Case 2: Path u ◦ α ◦ v contains some edge in X , but path v ◦ β ◦ w does not. Then
there is some edge (x1, x2) ∈ X and paths α1, α2, with α1 ◦ x1 ◦ x2 ◦α2 = u ◦α ◦v,
such that path x2 ◦ α2 contains only edges in E \ X . Path x2 ◦ α2 ◦ β ◦ w then
contains only edges in E \ (A∪ B ∪ X), therefore edge (w, b) is in B ∩ Z . Hence,
choose ρ′2 ◦ v = x1 ◦ x2 ◦ α2 and σ ′1 = β ◦w ◦ b. Path ρ′2 ◦ v ◦ σ ′1 is in paths(C2).
Case 3: Path u ◦ α ◦ v does not contain any edge in X, but path v ◦ β ◦ w does. Then
there is some edge (y1, y2) ∈ X and paths β1, β2, with β1 ◦ y1 ◦ y2 ◦ β2 = v ◦ β ◦ w,
such that path β1 ◦ y1 contains only edges in E \ X. Path u ◦ α ◦ β1 ◦ y1 then
contains only edges in E \ (A ∪ B ∪ X), therefore edge (y1, y2) is in X ∩ Y. Hence,
choose ρ′2 ◦ v = a ◦ u ◦ α and v ◦ σ′1 = β1 ◦ y1 ◦ y2. Path ρ′2 ◦ v ◦ σ′1 is in paths(C1).
Case 4: Path u ◦ α ◦ v contains some edge in X , and so does path v ◦ β ◦ w. Then
there is some edge (x1, x2) ∈ X and paths α1, α2, with α1 ◦ x1 ◦ x2 ◦α2 = u ◦α ◦v,
such that path x2 ◦ α2 contains only edges in E \ X . Also, there is some edge
(y1, y2) ∈ X and paths β1, β2, with β1 ◦ y1 ◦ y2 ◦ β2 = v ◦ β ◦ w, such that path
β1 ◦ y1 contains only edges in E \ X . Now choose α3 with α3 ◦ v = x2 ◦ α2, and
choose β3 with v ◦ β3 = β2 ◦ y1. Path α3 ◦ v ◦ β3 then contains only edges in
E \ (A∪ B ∪ X), therefore edge (y1, y2) is in X ∩ Z . Choose ρ′2 ◦ v = x1 ◦ x2 ◦α2
and v ◦ σ ′1 = β2 ◦ y1 ◦ y2. Path ρ′2 ◦ v ◦ σ ′1 is in paths(C2).
3. We show that paths(C1) and paths(C2) are divergent.
Consider the paths
π = u1 . . . un ∈ paths(C1), and σ = w1 . . . wm ∈ paths(C2).
There are two cases how π and σ may overlap:
Case 1: π contains the first edge (w1, w2) ∈ X of σ. Since X ⊆ E \ (A ∪ B)
and (u1, u2) ∈ A, we have (ui, ui+1) ∈ X, for some i with 2 ≤ i ≤ (n − 1).
However, by the definition of Y, that means (un−1, un) ∉ Y, unless i =
n − 1. Hence, any occurrence of the first edge (w1, w2) of σ must be on the last
edge (un−1, un) of π. Since π is a path of a context, it contains at least two
edges: an entry edge and an exit edge. Therefore, π cannot be a subpath of
σ, and there are no paths α, β, γ with α ◦ β = π, β ◦ γ = σ, and |β| ≥ 2.
Case 2: σ contains the first edge (u1, u2) ∈ A of π. Since X ⊆ E \ (A ∪ B)
and (w1, w2) ∈ X, we have (wi, wi+1) ∈ A, for some i with 2 ≤ i ≤
(m − 1). However, by the definition of Z, that means (wm−1, wm) ∉ Z,
unless i = m − 1. Hence, any occurrence of the first edge (u1, u2) of π
must be on the last edge (wm−1, wm) of σ. Since σ is a path of a context, it
contains at least two edges: an entry edge and an exit edge. Therefore, σ
cannot be a subpath of π, and there are no paths α, β, γ with α ◦ β = σ,
β ◦ γ = π, and |β| ≥ 2. □
Proof of Theorem 5 First, we show that the clip C_D is a context of node v, for any
D ∈ 𝒟. Next, we show that $W \cup \bigcup_{D \in \mathcal{D}} \mathit{paths}(C_D)$ covers node v, if W ∪ paths(C)
covers v, for any set of paths W. Lastly, we show that paths(C_D1) and paths(C_D2)
are divergent, for any sets D1, D2 ∈ 𝒟 with D1 ≠ D2.
1. We show that clip C_D is a context of node v, for any D ∈ 𝒟. We have D ⊆ A,
because 𝒟 is a partition of A, and B ∩ Z_D ⊆ B. Therefore, paths(C_D) is a subset
of paths(C).
Clip C is a context of node v, i.e., all paths in paths(C) contain at most one
occurrence of v. Since paths(C_D) is a subset of paths(C), its paths also contain
at most one occurrence of v. It follows that C_D is a context of v.
2. We show that $W \cup \bigcup_{D \in \mathcal{D}} \mathit{paths}(C_D)$ covers node v, if W ∪ paths(C) covers v,
for any set of paths W. Choose any paths ρ, σ, with ρ ◦ v ◦ σ ∈ U. By assumption,
W ∪ paths(C) covers v, i.e., there are subpaths ρ1, σ2 and non-empty subpaths
ρ2, σ1, with ρ1 ◦ ρ2 = ρ and σ1 ◦ σ2 = σ, such that ρ2 ◦ v ◦ σ1 ∈ W ∪ paths(C).
If ρ2 ◦ v ◦ σ1 ∈ W, then $\rho_2 \circ v \circ \sigma_1 \in W \cup \bigcup_{D \in \mathcal{D}} \mathit{paths}(C_D)$, and we are done.
Otherwise, path ρ2 starts with some entry edge (a, x) ∈ A of context C, and since
𝒟 is a partition of A, there is some set D ∈ 𝒟, such that (a, x) ∈ D. Furthermore,
path σ1 ends with some exit edge (y, b) ∈ B of context C, and there are no further
occurrences of any edge from A ∪ B in ρ2 ◦ v ◦ σ1, therefore (y, b) is in B ∩ Z_D,
hence $\rho_2 \circ v \circ \sigma_1 \in \mathit{paths}(C_D) \subseteq W \cup \bigcup_{D \in \mathcal{D}} \mathit{paths}(C_D)$.
3. We show that paths(C_D1) and paths(C_D2) are divergent, for any sets D1, D2 ∈ 𝒟
with D1 ≠ D2. Choose any path π ∈ paths(C_D1), and any path σ ∈ paths(C_D2).
Since D1 ∩ D2 = ∅, paths π and σ must have a different entry edge.
Moreover, we have D1 ⊆ A and σ ∈ paths(C), therefore the entry edge of π can
only occur on the last edge of σ, hence π cannot be a subpath of σ, and there are
no paths α, β, γ with α ◦ β = π, β ◦ γ = σ, and |β| ≥ 2.
Likewise, we have D2 ⊆ A and π ∈ paths(C), therefore the entry edge of σ can
only occur on the last edge of π, hence σ cannot be a subpath of π, and there are
no paths α, β, γ with α ◦ β = σ, β ◦ γ = π, and |β| ≥ 2.
We conclude that C_D1 and C_D2 are divergent. □
Proof of Theorem 6 We have occ(v, π) ∩ occ(v, σ) = ∅ for any paths π, σ with
π ≠ σ. From this the theorem follows easily. □
Proof of Theorem 7 We have occ(e, π) ∩ occ(e, σ) = ∅ for any paths π, σ with
π ≠ σ. From this the theorem follows easily. □
Proof of Theorem 8 We have occ(v, π, S) ∩ occ(v, σ, S) = ∅ for any paths π, σ with
π ≠ σ. From this the theorem follows easily. □
Proof of Theorem 9 We separately show the ≤ and the ≥ part of the equality:
1. Since the contexts Ci are pairwise divergent, we have
\[
\sum_{1 \le i \le n} |\mathit{occ}(v, \pi, C_i)| = \Bigl| \bigcup_{1 \le i \le n} \mathit{occ}(v, \pi, C_i) \Bigr|, \quad \text{for all } \pi \in U.
\]
Now, consider any path π ∈ U and any paths ρ, σ, with ρ ◦ v ◦ σ = π. Since
$\bigcup_{1 \le i \le n} \mathit{paths}(C_i)$ covers node v, there is some index i with 1 ≤ i ≤ n, subpaths
ρ1, σ2, and non-empty subpaths ρ2, σ1 with ρ1 ◦ ρ2 = ρ and σ1 ◦ σ2 = σ, such
that ρ2 ◦ v ◦ σ1 ∈ paths(Ci).
Also, consider any paths ρ′, σ′, with ρ′ ◦ v ◦ σ′ = π and (ρ, σ) ≠ (ρ′, σ′). Again,
since $\bigcup_{1 \le i \le n} \mathit{paths}(C_i)$ covers node v, there is some index i′ with 1 ≤ i′ ≤ n,
subpaths ρ′1, σ′2, and non-empty subpaths ρ′2, σ′1 with ρ′1 ◦ ρ′2 = ρ′ and σ′1 ◦ σ′2 =
σ′, such that ρ′2 ◦ v ◦ σ′1 ∈ paths(Ci′). Since (ρ, σ) ≠ (ρ′, σ′), we also have
(ρ′1, ρ′2, σ′1, σ′2) ≠ (ρ1, ρ2, σ1, σ2).
We see that, for any two different elements (ρ, σ) and (ρ′, σ′) in occ(v, π), we get
different elements (ρ1, ρ2, σ1, σ2) and (ρ′1, ρ′2, σ′1, σ′2) in $\bigcup_{1 \le i \le n} \mathit{occ}(v, \pi, C_i)$,
hence
\[
|\mathit{occ}(v, \pi)| \le \Bigl| \bigcup_{1 \le i \le n} \mathit{occ}(v, \pi, C_i) \Bigr| = \sum_{1 \le i \le n} |\mathit{occ}(v, \pi, C_i)|, \quad \text{for all } \pi \in U.
\]
2. Since the contexts Ci are pairwise divergent, we have
\[
\sum_{1 \le i \le n} |\mathit{occ}(v, \pi, C_i)| = \Bigl| \bigcup_{1 \le i \le n} \mathit{occ}(v, \pi, C_i) \Bigr|, \quad \text{for all } \pi \in U.
\]
Now, consider any path π ∈ U. Choose any paths ρ1, σ2 and any non-empty paths ρ2,
σ1, such that ρ1 ◦ ρ2 = ρ, such that σ1 ◦ σ2 = σ, and such that ρ2 ◦ v ◦
σ1 ∈ paths(Ci) with 1 ≤ i ≤ n. Moreover, choose paths ρ′1, σ′2 and any non-empty
paths ρ′2, σ′1, such that ρ′1 ◦ ρ′2 = ρ′, such that σ′1 ◦ σ′2 = σ′, and such that
ρ′2 ◦ v ◦ σ′1 ∈ paths(Ci′) with 1 ≤ i′ ≤ n, and such that (ρ2, σ1) ≠ (ρ′2, σ′1). There
are two cases:
Case 1: i = i′, i.e., paths ρ2 ◦ v ◦ σ1 and ρ′2 ◦ v ◦ σ′1 are in the same context.
Then they are divergent, by Theorem 2.
Case 2: i ≠ i′, i.e., paths ρ2 ◦ v ◦ σ1 and ρ′2 ◦ v ◦ σ′1 are in different contexts.
Then they are divergent, by the original assumption that all contexts Ci are
pairwise divergent.
In both cases, paths ρ2 ◦ v ◦ σ1 and ρ′2 ◦ v ◦ σ′1 are divergent. Therefore, we
have (σ1 ◦ σ2, ρ1 ◦ ρ2) ≠ (σ′1 ◦ σ′2, ρ′1 ◦ ρ′2). We see that, for any two different
elements (ρ2, σ1) and (ρ′2, σ′1) in $\bigcup_{1 \le i \le n} \mathit{occ}(v, \pi, C_i)$, we get different elements
(ρ1 ◦ ρ2, σ1 ◦ σ2) and (ρ′1 ◦ ρ′2, σ′1 ◦ σ′2) in occ(v, π), hence
\[
|\mathit{occ}(v, \pi)| \ge \Bigl| \bigcup_{1 \le i \le n} \mathit{occ}(v, \pi, C_i) \Bigr| = \sum_{1 \le i \le n} |\mathit{occ}(v, \pi, C_i)|, \quad \text{for all } \pi \in U.
\]
□
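Proof of Theorem 10 We give a proof for the first inequality. The proof for the second
inequality is symmetric.
By the definition of occ and by the observation that
\[
\mathit{occ}(w, \pi) \cap \mathit{occ}(w', \pi) = \emptyset \quad \text{for all } w, w' \in V,\ w \neq w',\ \pi \in U,
\]
we have
\[
\begin{aligned}
\sum_{a \in A} |\mathit{occ}(a, \pi)| &= |\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ a \in A\}|; \\
\sum_{x \in X} |\mathit{occ}(x, \pi)| &= |\{(\rho', \sigma') \mid \rho' \circ x \circ \sigma' = \pi,\ x \in X\}|.
\end{aligned}
\]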
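If we consider the definition of X, we see that ρ′ contains an occurrence of some
edge a ∈ A, such that there is no subsequent occurrence of any edge y ∈ A ∪ B in ρ′,
i.e., we have
\[
\begin{aligned}
&\{(\rho', \sigma') \mid \rho' \circ x \circ \sigma' = \pi,\ x \in X\} \\
&\quad = \{(\rho', \sigma') \mid \rho' \circ x \circ \sigma' = \pi,\ \rho \circ a \circ \tau = \rho',\ a \in A,\ x \in X,\ (\nexists y \in A \cup B, \tau_1, \tau_2) : \tau_1 \circ y \circ \tau_2 = \tau\}.
\end{aligned}
\]
We are interested in the number of elements in the latter set, not in the elements
themselves. This allows us to make use of the following equality:
\[
\begin{aligned}
&|\{(\rho', \sigma') \mid \rho' \circ x \circ \sigma' = \pi,\ \rho \circ a \circ \tau = \rho',\ a \in A,\ x \in X,\ (\nexists y \in A \cup B, \tau_1, \tau_2) : \tau_1 \circ y \circ \tau_2 = \tau\}| \\
&\quad = |\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ \tau \circ x \circ \sigma' = \sigma,\ a \in A,\ x \in X,\ (\nexists y \in A \cup B, \tau_1, \tau_2) : \tau_1 \circ y \circ \tau_2 = \tau\}|.
\end{aligned}
\]
Since
\[
\begin{aligned}
&\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ \tau \circ x \circ \sigma' = \sigma,\ a \in A,\ x \in X,\ (\nexists y \in A \cup B, \tau_1, \tau_2) : \tau_1 \circ y \circ \tau_2 = \tau\} \\
&\quad \subseteq \{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ a \in A\},
\end{aligned}
\]
we have
\[
\begin{aligned}
\sum_{a \in A} |\mathit{occ}(a, \pi)| - \sum_{x \in X} |\mathit{occ}(x, \pi)| = \bigl|\, &\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ a \in A\} \setminus {} \\
&\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ \tau \circ x \circ \sigma' = \sigma,\ a \in A,\ x \in X,\ (\nexists y \in A \cup B, \tau_1, \tau_2) : \tau_1 \circ y \circ \tau_2 = \tau\} \,\bigr|.
\end{aligned}
\]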
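Expansion of the set subtraction yields
\[
\sum_{a \in A} |\mathit{occ}(a, \pi)| - \sum_{x \in X} |\mathit{occ}(x, \pi)| = \bigl|\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ a \in A,\ (\nexists x \in X, y \in A \cup B, \tau_1, \tau_2, \sigma') : \tau_1 \circ y \circ \tau_2 \circ x \circ \sigma' = \sigma\}\bigr|.
\]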
Now, choose any element from occ(v, π, ⟨A, B⟩), i.e., choose any paths ρ1, σ2,
and any non-empty paths ρ2, σ1, with ρ1 ◦ ρ2 ◦ v ◦ σ1 ◦ σ2 = π, and where ρ2 ◦ v ◦ σ1 ∈
paths(⟨A, B⟩). By the definition of a context, path ρ2 must start with an entry edge
a ∈ A, must end with an exit edge b ∈ B, and cannot contain any further occurrences
of any edge in A ∪ B. Moreover, path ρ2 ◦ v ◦ σ1 cannot contain any edge x ∈ X.
Therefore, (ρ1 ◦ ρ2, σ1 ◦ σ2) is an element of
\[
\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ a \in A,\ (\nexists x \in X, y \in A \cup B, \tau_1, \tau_2, \sigma') : \tau_1 \circ y \circ \tau_2 \circ x \circ \sigma' = \sigma\}.
\]
Next, choose another element from occ(v, π, ⟨A, B⟩), i.e., choose paths ρ′1, σ′2,
and any non-empty paths ρ′2, σ′1, with ρ′1 ◦ ρ′2 ◦ v ◦ σ′1 ◦ σ′2 = π, and where ρ′2 ◦ v ◦ σ′1 ∈
paths(⟨A, B⟩), with (ρ′1, ρ′2, σ′1, σ′2) ≠ (ρ1, ρ2, σ1, σ2). By the definition of a context,
path ρ′2 must start with an entry edge a ∈ A, must end with an exit edge b ∈ B, and
cannot contain any further occurrences of any edge in A ∪ B. Moreover, path ρ′2 ◦ v ◦ σ′1
cannot contain any edge x ∈ X. Therefore, (ρ′1 ◦ ρ′2, σ′1 ◦ σ′2) is an element of
\[
\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ a \in A,\ (\nexists x \in X, y \in A \cup B, \tau_1, \tau_2, \sigma') : \tau_1 \circ y \circ \tau_2 \circ x \circ \sigma' = \sigma\}.
\]
By Theorem 2, the paths ρ′2 ◦ v ◦ σ′1 and ρ2 ◦ v ◦ σ1 are divergent, and therefore we
have (ρ1 ◦ ρ2, σ1 ◦ σ2) ≠ (ρ′1 ◦ ρ′2, σ′1 ◦ σ′2). We see that, for any two different elements
(ρ1, ρ2, σ1, σ2) and (ρ′1, ρ′2, σ′1, σ′2) from occ(v, π, C), we get different elements (ρ1 ◦
ρ2, σ1 ◦ σ2) and (ρ′1 ◦ ρ′2, σ′1 ◦ σ′2) in
\[
\{(\rho, \sigma) \mid \rho \circ a \circ \sigma = \pi,\ a \in A,\ (\nexists x \in X, y \in A \cup B, \tau_1, \tau_2, \sigma') : \tau_1 \circ y \circ \tau_2 \circ x \circ \sigma' = \sigma\},
\]
hence
\[
|\mathit{occ}(v, \pi, C)| \le \sum_{a \in A} |\mathit{occ}(a, \pi)| - \sum_{x \in X} |\mathit{occ}(x, \pi)|.
\]
□
Proof of Theorem 11 It is easy to see that $\{\mathit{moet}_{v,\pi} \mid \pi \in \mathit{paths}(S),\ \sigma \circ \pi \circ \rho \in T\} \subseteq \{\mathit{moet}_{v,\pi} \mid \pi \in T\}$.
Hence the property follows immediately. □
Proof of Theorem 12 The algorithm starts by constructing the simple-history context
⟨A, B⟩ of node v. By Theorem 3, paths(⟨A, B⟩) covers node v. Next, it obtains contexts
⟨A1, B1⟩ and ⟨A2, B2⟩ of node v, by performing a vertical split of context ⟨A, B⟩.
By Theorem 4, paths(⟨A1, B1⟩) ∪ paths(⟨A2, B2⟩) covers node v, and ⟨A1, B1⟩ and
⟨A2, B2⟩ are divergent. The algorithm then performs a horizontal split of contexts
⟨A1, B1⟩ and ⟨A2, B2⟩, thus obtaining contexts
\[
M_1 = \{\langle D, B_1 \cap Z_D \rangle \mid D \in \mathcal{D}_1\}, \quad \text{and} \quad M_2 = \{\langle D, B_2 \cap Z_D \rangle \mid D \in \mathcal{D}_2\}.
\]
By Theorem 5,
\[
\mathit{paths}(\langle A_1, B_1 \rangle) \cup \bigcup_{D \in \mathcal{D}_2} \mathit{paths}(\langle D, B_2 \cap Z_D \rangle)
\]
covers node v. Again, by Theorem 5,
\[
\bigcup_{D \in \mathcal{D}_1} \mathit{paths}(\langle D, B_1 \cap Z_D \rangle) \cup \bigcup_{D \in \mathcal{D}_2} \mathit{paths}(\langle D, B_1 \cap Z_D \rangle)
= \bigcup_{D \in \mathcal{D}_1 \cup \mathcal{D}_2} \mathit{paths}(\langle D, B_1 \cap Z_D \rangle)
\]
covers node v.
By Theorem 5, paths(⟨D, B1 ∩ Z_D⟩) and paths(⟨D′, B1 ∩ Z_D′⟩) are divergent, for
D, D′ ∈ 𝒟1, with D ≠ D′. Also, paths(⟨D, B2 ∩ Z_D⟩) and paths(⟨D′, B2 ∩ Z_D′⟩)
are divergent, for D, D′ ∈ 𝒟2, with D ≠ D′.
Now choose any D ∈ 𝒟1 and D′ ∈ 𝒟2. Since the contexts ⟨A1, B1⟩ and ⟨A2, B2⟩
are divergent, and since
\[
\mathit{paths}(\langle D, B_1 \cap Z_D \rangle) \subseteq \mathit{paths}(\langle A_1, B_1 \rangle) \quad \text{and} \quad \mathit{paths}(\langle D', B_2 \cap Z_{D'} \rangle) \subseteq \mathit{paths}(\langle A_2, B_2 \rangle),
\]
it follows that paths(⟨D, B1 ∩ Z_D⟩) and paths(⟨D′, B2 ∩ Z_D′⟩) are also divergent.
□
Appendix 2: The TriCore TC1796 processor
The TriCore TC1796
The TC1796 is a fairly complex 32-bit microprocessor targeted at the automotive mar-
ket that offers simple versions of many of the performance-enhancing features found
in modern desktop and server processors, like caching, pipelining, and branch predic-
tion. Contrary to popular belief, the TriCore TC1796 has only a single processing core.
It, however, features three parallel instruction pipelines that allow parallel processing
of different types of instructions, as well as a separate floating point unit.
Here we provide an overview of three features that we consider particularly relevant
for WCET analysis: the memory subsystem, the instruction pipeline, and branch pre-
diction. More details about the TriCore architecture and the TC1796 microprocessor
can be found in the corresponding technical manuals (Infineon Technologies AG 2006,
2007).
Figure 10 provides a high-level view of the structure of the bus systems of the
TC1796 processor: The basic design is based on a Harvard architecture, with sepa-
rate interfaces for program and data memory (PMI, DMI). On the whole, the memory
subsystem is fairly complex and allows for an abundance of different memory configu-
rations that can be chosen by the system designer. The system consists of the following
components:
Program memory interface (PMI): The PMI (cf. Fig. 11) is directly connected to the
CPU and is responsible for all accesses to program memory. It is equipped with
64 KiB of RAM, of which 16 KiB can be used as instruction cache (ICACHE) and of
which 48 KiB can be used as scratchpad memory (SPRAM). The ICACHE is a two-way
set-associative LRU cache with a line size of 256 bits, and a validity granularity
of 4 double-words (64 bit). The ICACHE can be globally invalidated to provide
support for cache coherency. The ICACHE can be bypassed, to provide direct access
to the program local memory-bus (PLMB). The CPU interface supports unaligned,
i.e., 16-bit aligned, accesses with a penalty of one cycle for unaligned accesses that
cross cache lines.
Fig. 10 Block diagram of the bus systems of the TC1796. Illustration taken from Infineon Technologies
AG (2007)
Fig. 11 Block diagram of the TC1796 program memory interface (PMI). Illustration taken from Infineon
Technologies AG (2007)
Fig. 12 Block diagram of the TC1796 data-memory interface (DMI). Illustration taken from Infineon
Technologies AG (2007)
Fig. 13 Timing of a LMB basic transaction. Illustration taken from Infineon Technologies AG (2007)
Fig. 14 Timing of a LMB block transaction. Illustration taken from Infineon Technologies AG (2007)
Data memory interface (DMI): The DMI (cf. Fig. 12) is directly connected to the
CPU and is responsible for all accesses to data memory. It is equipped with 64 KiB
of RAM, 8 KiB of which is dual-port RAM (DPRAM) that is accessible from the
CPU and from the remote peripheral bus (RPB), and of which 56 KiB is local data
memory (LDRAM). The CPU interface supports unaligned, i.e., 16-bit aligned,
accesses with a minimum penalty of one cycle for unaligned accesses that cross
cache lines. There is a directly accessible interface to the data local memory-bus
(DLMB) that provides access to the rest of the system.
Program local memory-bus (PLMB): The PLMB is a synchronous, pipelined bus that
connects the PMI to the rest of the program-memory system. The bus protocol supports
single transfers of 8, 16, 32, and 64 bits (cf. Fig. 13), as well as block transfers
of 64 bits (cf. Fig. 14). The PLMB is managed by the program local memory-bus
control unit (PBCU), which handles requests from PLMB master devices, which
are the PMI and the program memory unit (PMU). Access arbitration takes place
in each cycle that precedes a possible address cycle, and is based on the priority
of the requesting master device. The PMI has priority over the PMU. Busy slave
devices can delay the start of a PLMB transaction.
Fig. 15 Block diagram of the TC1796 program memory unit (PMU). Illustration taken
from Infineon Technologies AG (2007)
Data local memory-bus (DLMB): The DLMB is a synchronous, pipelined bus that
connects the DMI to the rest of the data-memory system. The bus protocol supports
single transfers of 8, 16, 32, and 64 bits (cf. Fig. 13), as well as block transfers of 64
bits (cf. Fig. 14). The DLMB is managed by the data local memory-bus control unit
(DBCU), which handles requests from DLMB master devices, which are the DMI
and the data-memory unit (DMU). Access arbitration takes place in each cycle that
precedes a possible address cycle, and is based on the priority of the requesting
master device. The DMI has priority over the DMU. Busy slave devices can delay
the start of a DLMB transaction.
Program local memory-bus control unit (PBCU): The PBCU is responsible for man-
aging data transfers on the PLMB.
Data local memory-bus control unit (DBCU): The DBCU is responsible for managing
data transfers on the DLMB.
Program memory unit (PMU): The PMU (cf. Fig. 15) is connected to the PLMB. It
is equipped with 2 MiB of program flash memory (PFLASH), 128 KiB of data flash
memory (DFLASH), and 16 KiB of boot ROM (BROM).
Data memory unit (DMU): The DMU (cf. Fig. 16) is connected to the DLMB. It is
equipped with 64 KiB of SRAM and 16 KiB of standby memory (SBRAM).
Local memory interface (LMI): The LMI is a part of the DMU. It allows the DMI and
the DMU to access the PLMB, thereby enabling data transfers to and from other
PLMB devices, like the EBU.
Fig. 16 Block diagram of the TC1796 data-memory unit (DMU). Illustration taken from Infineon Tech-
nologies AG (2007)
Fig. 17 Block diagram of the TC1796 external bus unit (EBU). Illustration taken from Infineon Technolo-
gies AG (2005)
External bus unit (EBU): The EBU (cf. Fig. 17) is connected to the PLMB and serves
as an interface to external memory or peripheral units. It supports asynchronous
or burst-mode external accesses. The external bus may be shared with other bus
masters. Arbitration can be performed either by the EBU, or by an external bus
master.
Local Memory to FPI bridge (LFI bridge): The LFI forms a bi-directional bridge
between the DLMB and the peripheral FPI bus.
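The ICACHE parameters given for the PMI (16 KiB, two-way set-associative, 256-bit lines) determine the cache geometry. The following sketch derives the number of lines and sets and splits an address into tag, set index, and line offset. The address mapping shown is the usual textbook scheme for set-associative caches and is an assumption here; the actual mapping of the TC1796 should be taken from the technical manual.

```python
# Parameters taken from the PMI description above.
ICACHE_BYTES = 16 * 1024   # 16 KiB usable as instruction cache
LINE_BYTES = 256 // 8      # line size of 256 bits
WAYS = 2                   # two-way set-associative

lines = ICACHE_BYTES // LINE_BYTES
sets = lines // WAYS
offset_bits = LINE_BYTES.bit_length() - 1   # log2 of the line size in bytes
index_bits = sets.bit_length() - 1          # log2 of the number of sets


def split_address(addr):
    """Split an address into (tag, set index, line offset).

    Assumes the conventional layout tag | index | offset, which the paper
    does not state explicitly.
    """
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


print(lines, sets, offset_bits, index_bits)
print(split_address(0x80001234))
```

Under this assumption, two code addresses compete for the same set whenever they agree in the index bits, which is the kind of interaction that makes per-instruction timing depend on the execution history.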
Figure 18 provides a high-level view of the structure of the TC1796 CPU, which
consists of the following components:
Instruction fetch unit: The instruction-fetch unit pre-fetches and aligns incoming
instructions from the PMI and issues them to the appropriate instruction pipeline.
Fig. 18 Block diagram of the TC1796 central processing unit (CPU), indicating the superscalar pipeline
design. Illustration taken from Infineon Technologies AG (2005)
Execution unit: The execution unit consists of three parallel pipelines, each of which
can process a different type of instructions. The integer pipeline and the load/store
pipeline each consist of the following four stages: fetch, decode, execute, and write-
back. The loop pipeline consists of the two stages: decode and write-back. The
integer pipeline handles data arithmetic instructions, including data conditional
jumps. The load/store pipeline handles load/store memory accesses, address arith-
metic, unconditional jumps, calls, and context switches. The loop pipeline handles
loop instructions, providing zero-overhead loops. The execution unit also maintains
the program counter.
General purpose register file (GPR): The GPR provides 16 address registers and 16
data registers.
CPU slave interface (CPS): The CPS provides accesses to the interrupt service
requests registers.
Floating point unit (FPU): The FPU is an optional, partially IEEE-754 compatible
component for processing floating-point instructions.
Individual instructions may experience jitter in execution time due to pipeline
stalls. Figure 19 illustrates a pipeline hazard that is resolved by a pipeline
stall: the integer pipeline is processing a multiply-and-accumulate (MAC)
instruction, which requires two cycles in the execute stage. At the same time, the
load/store pipeline is processing a load instruction that targets the destination
register of the MAC instruction, which results in a write-after-write hazard.
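The write-after-write hazard described above can be illustrated with a toy model. The following sketch is not taken from the Infineon documentation: the Instr record, the register name d0, the mnemonics, the one-cycle execute stage of the load, and the rule that the stall length equals the difference in execute cycles are all simplifying assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical, minimal description of an instruction in flight."""
    name: str         # mnemonic, for readability only
    dest: str         # destination (write) register
    exec_cycles: int  # cycles spent in the execute stage


def waw_stall_cycles(int_instr: Instr, ls_instr: Instr) -> int:
    """Stall cycles imposed on ls_instr when issued alongside int_instr.

    Assumed toy rule: if both instructions write the same register and the
    integer instruction stays longer in the execute stage, the load/store
    instruction is held back until the writes can retire in program order.
    """
    if int_instr.dest != ls_instr.dest:
        return 0  # different destination registers: no WAW hazard
    return max(0, int_instr.exec_cycles - ls_instr.exec_cycles)


# Scenario from the text: a two-cycle MAC and a load to the same register.
mac = Instr(name="madd", dest="d0", exec_cycles=2)
load = Instr(name="ld.w", dest="d0", exec_cycles=1)  # 1 cycle is an assumption
stall = waw_stall_cycles(mac, load)
```

Under these assumptions, the load is delayed whenever it shares its destination register with a longer-running integer instruction. This is one way in which the execution time of a single instruction can vary with its context.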
For conditional branch instructions, the TC1796 uses a simple, static predictor
that implements the following rules: Backward and short forward branches (16-bit
branches with positive displacement) are predicted taken. Long forward branches are
predicted not taken. Table 5 summarizes the cycle penalties for each combination of
predicted and actual behavior.
Fig. 19 Example of a pipeline hazard in the TC1796 CPU. Illustration taken from Infineon Technologies
AG (2000)
Table 5 Branch penalties of the TC1796 processor, for all combinations of prediction and actual outcome

Prediction   Outcome     Penalty (cycles)
Not taken    Not taken   1
Not taken    Taken       3
Taken        Not taken   3
Taken        Taken       2
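The static prediction rules and the penalties of Table 5 can be combined into a small lookup. The following is a minimal sketch, assuming that a branch is described by a signed displacement (negative for backward branches) and a flag indicating a 16-bit encoding. The function names and the treatment of a zero displacement as a forward branch are illustrative assumptions, not part of the processor documentation.

```python
# Cycle penalties from Table 5, keyed by (predicted_taken, actually_taken).
BRANCH_PENALTY = {
    (False, False): 1,
    (False, True): 3,
    (True, False): 3,
    (True, True): 2,
}


def predict_taken(displacement: int, is_16bit: bool) -> bool:
    """Static prediction: backward and short forward branches are taken."""
    if displacement < 0:
        return True   # backward branch
    return is_16bit   # forward: taken only for short (16-bit) branches


def branch_penalty(displacement: int, is_16bit: bool, taken: bool) -> int:
    """Penalty in cycles for one execution of a conditional branch."""
    return BRANCH_PENALTY[(predict_taken(displacement, is_16bit), taken)]


# A backward loop branch that is taken on every iteration except the last.
loop_penalties = [branch_penalty(-12, False, t) for t in (True, True, False)]
```

In this example the backward branch is predicted taken, so the two taken iterations cost 2 cycles each and the final fall-through costs 3 cycles.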
Fig. 20 Block schematics of the TriBoard-T179X, indicating the connection to external memory, as well
as the OCDS debugging interface. Illustration taken from Infineon Technologies AG (2005)
The TriBoard TC179X evaluation board
The TriBoard is equipped with 4 MB of Burst Flash memory and 1 MB of asynchronous
SRAM. Both are connected to the processing core via the External Bus Unit (EBU) of
the processor, and they are the only devices connected to the EBU (cf. Fig. 20).
The Clock Generation Unit, which is controlled by an external crystal oscillator,
produces a clock signal fOSC at 20 MHz. The CPU clock runs at 150 MHz, and the
system clock at 75 MHz. More details can be found in the board manual (Infineon
Technologies AG 2005).
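The clock frequencies above determine how cycle counts translate into time. The following sketch converts cycles to nanoseconds using the stated CPU and system clock rates. The constant and function names are illustrative assumptions.

```python
CPU_CLOCK_HZ = 150_000_000     # CPU clock, as stated above
SYSTEM_CLOCK_HZ = 75_000_000   # system clock, as stated above


def cycles_to_ns(cycles: int, clock_hz: int = CPU_CLOCK_HZ) -> float:
    """Duration in nanoseconds of the given number of clock cycles."""
    return cycles * 1e9 / clock_hz


# A 3-cycle branch misprediction penalty at the CPU clock rate.
mispredict_ns = cycles_to_ns(3)

# One system clock cycle spans two CPU clock cycles.
system_cycle_ns = cycles_to_ns(1, SYSTEM_CLOCK_HZ)
```

At 150 MHz one CPU cycle lasts about 6.67 ns, so a 3-cycle penalty corresponds to 20 ns.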
References
Audsley NC, Burns A, Richardson MF, Wellings AJ (1991) Hard real-time scheduling: the deadline-
monotonic approach. In: Halang WA, Ramamritham K (eds) Proceedings of 8th IEEE workshop
on real-time operating systems and software (RTOSS’91), pp. 127–132. Pergamon Press
Bernat G, Colin A, Petters S (2002) Wcet analysis of probabilistic hard real-time systems. In: Proceedings
of 23rd real-time systems symposium (RTSS’02), pp. 279–288, Austin, Texas, USA
Bernat G, Colin A, Petters S (2003) pwcet: a tool for probabilistic worst-case execution time analysis of
real-time systems. In: 3rd international workshop on worst-case execution time analysis (WCET’03),
pp. 21–38, Porto, Portugal
Bünte S, Zolda M, Kirner R (2011) Let’s get less optimistic in measurement-based timing analysis. In: 6th
IEEE international symposium on industrial embedded systems (SIES’11), Los Alamitos, CA. IEEE
Colin A, Puaut I (2000) Worst case execution time analysis for a processor with branch prediction. Real-Time
Syst 18(2):249–274
Dijkstra E (1970) Notes on structured programming. Circulated privately
Dertouzos M, Mok AK (1989) Multiprocessor online scheduling of hard-real-time tasks. IEEE Trans Softw
Eng 15(12):1497–1506
Engblom J, Ermedahl A (1999) Pipeline timing analysis using a trace-driven simulator. In: Proceedings
of 6th international conference on real-time computing systems and applications (RTCSA ’99), pp.
88–95, Hong Kong, China
Ferdinand C, Martin F, Wilhelm R, Alt M (1999) Cache behavior prediction by abstract interpretation. Sci
Comput Progr 35(2–3):163–189
Holzer A, Schallhart C, Tautschnig M, Veith H (2008) Fshell: systematic test case generation for dynamic
analysis and measurement. In: Proceedings of 20th international conference on computer aided veri-
fication (CAV’08). LNCS, vol. 5123, pp. 209–213. Springer, Princeton
Holzer A, Schallhart C, Tautschnig M, Veith H (2011) An introduction to test specification in fql. In: Sharon
B, Daniel K, Orna R (eds) Proceedings of Haifa Verification Conference (HVC’10), Lecture Notes in
Computer Science, vol. 6504, pp. 9–22. Springer, Haifa
Infineon Technologies AG (2000) St.-Martin-Strasse 53, D-81541 München, Germany. TriCore(TM) 1
Pipeline Behaviour and Instruction Execution Timing
Infineon Technologies AG (2005) St.-Martin-Strasse 53, D-81541 München, Germany. TriBoard TC179X
Hardware Manual
Infineon Technologies AG (2006) St.-Martin-Strasse 53, D-81541 München, Germany. TriCore 1 32-bit
Unified Processor Core
Infineon Technologies AG (2007) St.-Martin-Strasse 53, D-81541 München, Germany. TC1796 32-Bit
Single-Chip Microcontroller
Kirner R, Wenzel I, Rieder B, Puschner P (2005) Intelligent systems at the service of mankind. Chapter
using measurements as a complement to static worst-case execution time analysis, vol 2. UBooks,
Augsburg
Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment.
J ACM 20(1):46–61
Li Y-TS, Malik S (1997) Performance analysis of embedded software using implicit path enumeration.
IEEE Trans Comput-Aided Des Integr Circuits Syst 16(12):1477–1487
Li Y-TS, Malik S, Wolfe A (1995) Efficient microarchitecture modeling and path analysis for real-time
software (rtss’95). In: 16th IEEE real-time systems symposium (RTSS’95), pp. 298–307. Pisa
Li Y-TS, Malik S, Wolfe A (1996) Cache modeling for real-time software: beyond direct mapped instruction
caches. In: Proceedings of 17th IEEE real-time systems symposium (RTSS’96), pp. 254–263. IEEE,
Washington
Li Y-TS, Malik S, Wolfe A (1999) Performance estimation of embedded software with instruction cache
modeling. ACM Trans Des Autom Electron Syst 4(3):257–279
Lundqvist T, Stenström P (1998) Integrating path and timing analysis using instruction-level simulation
techniques. In: Proceedings of ACM SIGPLAN workshop on languages, compilers, and tools for
embedded systems (LCTES’98), pp. 1–15. Springer, New York
Lundqvist T, Stenström P (1999) An integrated path and timing analysis method based on cycle-level
symbolic execution. Real-Time Syst 17(2–3):183–207
Nassi I, Shneiderman B (1973) Flowchart techniques for structured programming. SIGPLAN Not 8(8):12–
26
Ottosson G, Sjödin M (1997) Worst case execution time analysis for modern hardware architectures. In:
ACM SIGPLAN workshop on languages, compilers and tools for real-time systems, pp. 47–55. ACM,
Las Vegas
Paparazzi (2012) The free autopilot. http://paparazzi.enac.fr/, August
Park CY, Shaw A (1991) Experiments with a program timing tool based on source-level timing schema.
IEEE Comput 24:48–57
Puschner P, Koza C (1989) Calculating the maximum execution time of real-time programs. Real-Time
Syst 1(2):159–176
Puschner P, Schedl A (1993) A tool for the computation of worst case task execution times. In: Proceedings
of 5th Euromicro workshop on real-time systems (EURO-RTS’93), pp. 224–229. IEEE, Oulu
Puschner P, Schedl A (1997) Computing maximum task execution time—a graph-based approach. J Real-
Time Syst 13(1):67–91
Puschner P (1998) A tool for high-level language analysis of worst-case execution times. In: Proceedings
of 10th Euromicro workshop on real-time systems (Euro-Rts’98), pp. 130–137. Berlin
Schoeberl M (2009) JOP reference handbook: building embedded systems with a java processor. Cre-
ateSpace
Schneider J, Ferdinand C (1999) Pipeline behavior prediction for superscalar processors by abstract inter-
pretation. SIGPLAN Not 34(7):35–44
Shaw A (1989) Reasoning about time in higher-level language software. IEEE Trans Softw Eng 15(7):875–
889
Stappert F, Altenbernd P (2000) Complete worst-case execution time analysis of straight-line hard real-time
programs. J Syst Architect 46(4):339–355
Stattelmann S, Martin F (2010) On the use of context information for precise measurement-based
execution-time estimation. In: Björn L (ed) Proceedings of 10th international workshop on worst-case execution
time (WCET) Analysis (WCET’10). OpenAccess Series in Informatics (OASIcs), vol. 15, pp. 64–76.
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Brussels
Theiling H, Ferdinand C (1998) Combining abstract interpretation and ilp for microarchitecture modelling
and program path analysis. In: Proceedings of 19th IEEE real-time systems symposium (RTSS’98),
pp. 144–153. Madrid
Thesing S, Souyris J, Heckmann R, Randimbivololona F, Langenbach M, Wilhelm R, Ferdinand C (2003)
An abstract interpretation-based timing validation of hard real-time avionics software. In: Proceedings
of 2003 international conference on dependable systems and networks (DSN’03). IEEE, San Francisco
von Hanxleden R, Holsti N, Lisper B, Ploedereder E, Wilhelm R, Bonenfant A, Cassé H, Bünte S, Fellger W,
Gepperth S, Gustafsson J, Huber B, Islam NM, Kästner D, Kirner R, Kovacs L, Krause F, de Michiel
M, Olesen MC, Prantl A, Puffitsch W, Rochange C, Schoeberl M, Wegener S, Zolda M, Zwirchmayr
J (2011) Wcet tool challenge 2011: Report. In: Proceedings of 11th international workshop on worst-
case execution time (WCET) analysis (WCET’11), Porto, Portugal. The analysis problems for the
WCET Tool Challenge can be found at http://www.mrtc.mdh.se/projects/WCC/2011
Wenzel I, Kirner R, Rieder B, Puschner P (2009) Measurement-based timing analysis. In: Proceedings
of the 3rd international symposium on leveraging applications of formal methods, verification and
validation (ISoLA’08). Communications in computer and information science, vol. 17, pp. 430–444.
Springer, Porto Sani
Xu J, Parnas D (1990) Scheduling processes with release times, deadlines, precedence and exclusion rela-
tions. IEEE Trans Softw Eng 16(3):360–369
Zolda M (2012) Precise measurement-based worst-case execution time estimation. PhD thesis, Vienna
University of Technology, Karlsplatz 13, Vienna
Michael Zolda is a research fellow at the University of Hertford-
shire, UK. He received his doctoral degree from Vienna Univer-
sity of Technology in 2012. From 2007 to 2011 he worked on
the FWF/DFG Research Project FORTAS-RT on execution time
analysis of real-time systems. He has published multiple papers at
acclaimed international conferences and workshops. Currently he
is working on dependable stream processing systems within the
EC/transnational research project CRAFTERS. He is also taking
part in the European ICT COST Action TACLe (Timing Analysis on
Code-Level).
Raimund Kirner is a Reader in Cyberphysical Systems at the Uni-
versity of Hertfordshire. He has published more than 90 refereed
journal and conference papers and received two patents. He received
his PhD in 2003 from the TU Vienna and his Habilitation in 2010.
His research focus is on embedded computing, parallel computing,
and system reliability. He currently works on adequate hardware
and software architectures to bridge the gap between many-core
computing and embedded computing. He has also published extensively
on worst-case execution time analysis and served as PC chair of
WDES’06, WCET’08, and SEUS’13. He is the local principal inves-
tigator of the Artemis-JU Project CRAFTERS and was local co-
investigator of the FP7 Project ADVANCE. Further, he has been
the principal investigator of three research projects funded by the
Austrian Science Foundation (COSTA, FORTAS, SECCO). He is a
member of the IEEE, the ACM, and the IFIP Working Group 10.4
(Embedded Systems).
