How to Compute Worst-Case Execution Time by Optimization Modulo Theory
  and a Clever Encoding of Program Semantics by Henry, Julien et al.
arXiv:1405.7962v1 [cs.PL] 30 May 2014
How to Compute Worst-Case Execution Time by Optimization
Modulo Theory and a Clever Encoding of Program Semantics∗
Julien Henry, Mihail Asavoae, David Monniaux, Claire Maïza
October 17, 2018
Abstract
In systems with hard real-time constraints, it is
necessary to compute upper bounds on the worst-
case execution time (WCET) of programs; the
closer the bound to the real WCET, the better.
This is especially the case of synchronous reactive
control loops with a fixed clock; the WCET of the
loop body must not exceed the clock period.
We compute the WCET (or at least a close up-
per bound thereof) as the solution of an optimiza-
tion modulo theory problem that takes into ac-
count the semantics of the program, in contrast
to other methods that compute the longest path
whether or not it is feasible according to these
semantics. Optimization modulo theory extends
satisfiability modulo theory (SMT) to maximiza-
tion problems.
Immediate encodings of WCET problems into
SMT yield formulas intractable for all current
production-grade solvers — this is inherent to the
DPLL(T) approach to SMT implemented in these
solvers. By conjoining some appropriate “cuts” to
these formulas, we considerably reduce the computation
time of the SMT-solver.
We evaluated our approach on a variety of control
programs and benchmarks, using the OTAWA analyzer
both as baseline and as the underlying microarchitectural
analysis, and show notable improvements in the
WCET bound.
∗The research leading to these results has received funding
from the French Agence nationale de la recherche,
grant W-SEPT (ANR-12-INSE-0001), and from the
European Research Council under the European Union’s
Seventh Framework Programme (FP7/2007–2013) / ERC
grant agreement 306595 “STATOR”.
This article was also published in the proceedings of the
2014 ACM SIGPLAN Conference on Languages, Compilers
and Tools for Embedded Systems (LCTES).
1 Introduction
In embedded systems, it is often necessary to
ascertain that the worst-case execution time
(WCET) of a program is less than a certain
threshold. This is in particular the case for syn-
chronous reactive control loops (infinite loops that
acquire sensor values, compute appropriate ac-
tions and update, write them to actuators, and
wait for the next clock tick) [7]: the WCET of the
loop body (“step”) must be less than the period
of the clock.
Computing the WCET of a program on a
modern architecture requires a combination of
low-level, microarchitectural reasoning (regarding
pipeline and cache states, busses, cycle-accurate
timing) and higher-level reasoning (program con-
trol flow, loop counts, variable pointers). A com-
mon approach is to apply a form of abstract inter-
pretation to the microarchitecture, deduce worst-
case timings for elementary blocks, and reassem-
ble these into the global WCET according to the
control flow and maximal iteration counts using
integer linear programming (ILP) [38, 40].
One pitfall of this approach is that the reassem-
bly may take into account paths that cannot actu-
ally occur in the real program, possibly overesti-
mating the WCET. This is because this reassem-
bly is mostly driven by the control-flow structure
of the program, and (in most approaches) ignores
semantic conditions. For instance, a control pro-
gram may (clock-)enable certain parts of the pro-
gram according to modular arithmetic with re-
spect to time:
    if (clock % 4 == 0) { /* A */ }
    /* unrelated code */
    if (clock % 12 == 1) { /* B */ }
These arithmetic constraints entail that certain
combinations of parts cannot be active simulta-
neously (sections A and B are mutually incom-
patible). If such constraints are not taken into
account (as in most approaches), the WCET will
be grossly over-estimated.
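The incompatibility of A and B can be checked by brute force. The following is a minimal sketch in Python (the function name both_active is ours, not the paper's). It assumes a non-negative clock; since both tests depend only on the clock value modulo 12, one full period of 12 values covers every case.

```python
def both_active(clock: int) -> bool:
    """True if sections A and B would both run at this clock value."""
    runs_a = clock % 4 == 0
    runs_b = clock % 12 == 1
    return runs_a and runs_b


# Both tests depend only on clock mod 12, so one period is enough
# (assumption: clock is non-negative).
conflicts = [c for c in range(12) if both_active(c)]
print(conflicts)  # prints []
```

The reason is that clock % 12 == 1 forces clock % 4 == 1, which contradicts clock % 4 == 0. A purely syntactic path analysis does not see this and adds the costs of A and B.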
The purpose of this article is to take such se-
mantic constraints into account, in a fully auto-
mated and very precise fashion. Specifically, we
consider the case where the program for which
WCET is to be determined contains only loops
for which small static bounds can be determined
(but our approach can also be applied to general
programs through summarization, see section 8).
This is very commonly the case for synchronous
control programs, such as those found in aircraft
fly-by-wire controls [37]. Programs of this form
are typically compiled into C from high-level data-
flow synchronous programming languages such as
Simulink¹, Lustre or Scade² [7].
¹ Simulink™ is a block diagram environment for multidomain
simulation and model-based design from The Mathworks.
² Scade™ is a model-based development environment
dedicated to critical embedded software, from Esterel
Technologies, derived from the academic language Lustre.
We compute the WCET of such programs by expressing
it as the solution of an optimization modulo theory
problem. Optimization modulo theory is an extension
of satisfiability modulo theory
(SMT) where the returned solution is not just any
solution, but one maximizing some objective; in
our case, solutions define execution traces of the
program, and the objective is their execution time.
Expressing execution traces of programs as solutions
to an SMT problem is a classical approach in
bounded model checking; typically, the SMT problem
includes a constraint stating that the execution
trace reaches some failure point, and an “unsatisfiable”
answer means that this failure point
is unreachable. In the case of optimization, the
SMT solver has to disprove the existence of solutions
greater than the maximum to be returned:
in our case, to disprove the existence of traces of
execution time greater than the WCET. Unfortunately,
all currently available SMT solvers take an
unacceptably long time to conclude on naive encodings
of WCET problems. This is because all these
solvers implement variants of the DPLL(T) ap-
proach [25], which has exponential behavior on so-
called “diamond formulas”, which appear in naive
encodings of WCET on sequences of if-then-elses.
Computing or proving the WCET by direct,
naive encoding into SMT therefore leads to intractable
problems, which is probably the reason
why, to the best of our knowledge, it has not been
proposed in the literature. We however show how an
alternate encoding, including “cuts”, makes such
computations tractable. Our contributions are:
1. The computation of worst-case execution
time (WCET), or an over-approximation
thereof, by optimization modulo theory. The
same idea may also be applicable to other
similar problems (e.g. number of calls to
a memory allocator). Our approach ex-
hibits a worst-case path, which may be use-
ful for targeting optimizations so as to lower
WCET [41].
2. The introduction of “cuts” into the encoding
so as to make SMT-solving tractable, without
any change in the code of the SMT solver.
The same idea may extend to other problems
with an additive or modular structure.
Figure 1: WCET analysis workflow. (Diagram with the boxes Input
Binary, CFG Reconstruction, Control-flow Graph, Loop Bound Analysis,
Value Analysis, Control-flow Analysis, Annotated CFG, Basic Block
Timing Info, Micro-architectural Analysis and Path Analysis; the
legend distinguishes data from phases.)
In section 2, we recall the usual approach for
the computation of an upper bound on WCET.
In section 3, we recall the general framework of
bounded model checking using SMT-solving. In
section 4, we explain how we improve upon the
“normal” SMT encoding of programs so as to
make WCET problems tractable, and in section 5
we explain (both theoretically and practically)
why the normal encoding results in intractable
problems. In section 6 we describe our implementation
and experimental results. We present
related work in section 7, discuss possible extensions
and future work in section 8, and conclude
in section 9.
2 Worst-Case Execution Time
Let us first summarize the classical approach
to static timing analysis (for more detail, read
e.g. [38, 40]). Figure 1 shows the general tim-
ing analysis workflow used in a large part of
WCET tools including industrial ones such as
AiT³ or academic ones such as OTAWA⁴ [2] or
Chronos⁵ [27]. For the sake of simplicity, we
shall restrict ourselves to mono-processor platforms
with no bus-master devices except for the
CPU.
³ http://www.absint.com/ait/
⁴ http://www.otawa.fr
The analysis considers the object code. The
control flow graph is first reconstructed from the
binary. Then, a value analysis (e.g. abstract in-
terpretation for interval analysis) extracts mem-
ory addresses, loop bounds and simple infeasible
paths [17]; such an analysis may be performed
on the binary or the source files (in the latter
case, it is necessary to trace object code and low-level
variables to the source code, perhaps using
the debugging information provided by the compiler).
This semantic and addressing information
helps the micro-architectural analysis, which
bounds the execution time of basic blocks taking
into account the whole architecture of the platform
(pipeline, caches, buses, ...) [16, 35]. The most
popular method to carry out this architecture analysis
is abstract interpretation with specific abstract
domains. For instance, a pipeline abstraction represents
sets of detailed pipeline states, including
values for registers or buffers [16], while a cache
abstraction typically tracks which value may or
must be in each cache line [35].
The last step of the analysis uses the basic block
execution time and the semantic information to
derive the WCET, usually, in the “implicit path
enumeration technique” (IPET) approach, as the
solution of an integer linear program (ILP) [28].
The ILP variables represent the execution counts
(along a given trace) of each basic block in the pro-
gram. The ILP constraints describe the structure
of the control flow graph (e.g. the number of times
a given block is entered equals the number of times
it is exited), as well as maximal iteration counts
for loops, obtained by value analysis or provided
by the user. Finally, the execution time to be max-
imized is the sum of the basic blocks weighted by
their local worst-case execution time computed by
the microarchitectural analysis.
⁵ http://www.comp.nus.edu.sg/~rpembed/chronos/
The obtained worst-case path may however be
infeasible semantically, for instance, if a condition
tests x < 10 and later the unmodified value of x
is again tested in a condition x > 20 along that
path. This is because the ILP represents mostly
syntactic information from the control-flow graph.
This weakness has long been recognized within
the WCET community, which has devised schemes
for eliminating infeasible worst-case paths, for in-
stance, by modifying the control-flow graph be-
fore the architecture analysis [33], or by adding
ILP constraints [19, 17]. Infeasible paths are found
via pattern matching of conditions [19] or apply-
ing abstract execution [17]; these methods focus
on paths made infeasible by numeric constraints.
These approaches are limited by the expressive-
ness of ILP constraints as used in IPET: they
consider only “conflict conditions” (exclusive con-
ditional statements: “if condition a is true then
condition b must be false”).
On a loop-free program, the ILP approach
is equivalent to finding the longest path in the
control-flow graph, weighted according to the lo-
cal WCET of the basic blocks. Yet, again, this
syntactic longest path may be infeasible. Instead
of piecemeal elimination of infeasible paths, we
propose encoding the set of feasible paths into
an SMT formula, as done in bounded model-
checking; the success of SMT-solving is based on
the ability of SMT solvers to exclude whole groups
of spurious solutions by learning lemmas.
Loop-free programs without recursion may seem
a very restricted class, but in safety-critical control
systems, it is common that the program consists of
one big infinite control loop whose body must ver-
ify a WCET constraint, and this body itself does
not contain loops, or only loops with small static
bounds (say, for retrieving a value from an interpo-
lation table of known static size), which can be un-
rolled. Such programs typically eschew more com-
plicated algorithms, if only because arguing for
their termination or functional correctness would
be onerous with respect to the stringent requirements
imposed by the authorities. Complicated or
dynamic data structures are usually avoided [28,
ch. II]. This is the class of programs targeted by
e.g. the Astrée static analyzer [14].
    /* S */
    if (b) {
      x = x + 2; /* C */
    } else {
      x = x + 3; /* D */
    }
    assert(x >= 10);
First-order encoding:
((b ∧ x2 = x1 + 2) ∨ (¬b ∧ x2 = x1 + 3)) ∧ x2 ≥ 10
Or, if the logic language comprises the “if then
else” operator: ite(b, x1 + 2, x1 + 3) ≥ 10
If one wants to record the execution trace finely:
(C ⇔ S ∧ b) ∧ (D ⇔ S ∧ ¬b) ∧
(C ⇒ x2 = x1 + 2) ∧ (D ⇒ x2 = x1 + 3) ∧ x2 ≥ 10
Figure 2: Encoding of a simple program into a
first-order logic formula
Our approach replaces the path analysis by ILP
(and possibly refinement for infeasible paths) by
optimization modulo theory. The control-flow ex-
traction and micro-architectural analysis are left
unchanged, and one may thus use existing WCET
tools. In this paper we consider a simple architec-
ture (ARMv7), though we plan to look into more
complicated ones and address, for example, persistency
analyses for caches, as in [23].
3 Using Bounded Model Checking to Measure Worst-Case Execution Time
Bounded model checking is an approach for find-
ing software bugs, where traces of length at most
n are exhaustively explored. In most current ap-
proaches, the set of feasible traces of length n is
defined using a first-order logic formula, where,
roughly speaking, arithmetic constraints correspond
to tests and assignments, control flow is
encoded using Booleans, and disjunctions correspond
to multiple control edges. The source program
may be written in a high-level language, in an
intermediate code (e.g. Java bytecode, LLVM bitcode
[26, 20], Common Intermediate Language, ...)
or even, with some added difficulty, in binary
executable code [8].
The first step is to unroll all loops up to stati-
cally determined bounds. Program variables and
registers are then mapped to formula variables
(implicitly existentially quantified). In an impera-
tive language, but not in first-order logic, the same
variable may be assigned several times: there-
fore, as in compilation to static single assignment
(SSA) form, different names have to be introduced
for the same program variable, one for each update
and others for variables whose value differs accord-
ing to where control flows from (Fig. 2). If the
source program uses arrays or pointers to mem-
ory, the formula may need to refer not only to
scalar variables, but also to uninterpreted func-
tions and functional arrays [25]. Modern SMT-
solvers support these datatypes and others suit-
able for the analysis of low-level programs, such
as bit-vectors (fixed-width binary arithmetic). If
constructs occur in the source program that can-
not be translated exactly into the target logic (e.g.
the program has nonlinear arithmetic but the logic
does not), they may be safely over-approximated
by nondeterministic choice. Details on “conven-
tional” first-order encodings for program traces
are given in the literature on bounded model
checking [11] and are beyond the scope of this ar-
ticle.
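To make the encoding of Fig. 2 concrete, here is a minimal sketch in Python that enumerates the models of the trace-recording formula over a small integer range and checks that each model corresponds to an actual execution of the program. The function names and the finite range are our own assumptions for illustration; an SMT solver reasons symbolically rather than by enumeration.

```python
from itertools import product


def program(b: bool, x1: int) -> int:
    """Run the program of Fig. 2 and return the final value x2."""
    return x1 + 2 if b else x1 + 3


def formula(s, b, c, d, x1, x2) -> bool:
    """Trace-recording encoding of Fig. 2, without the final x2 >= 10."""
    return (
        c == (s and b)                   # C <=> S and b
        and d == (s and not b)           # D <=> S and not b
        and ((not c) or x2 == x1 + 2)    # C => x2 = x1 + 2
        and ((not d) or x2 == x1 + 3)    # D => x2 = x1 + 3
    )


def models(x_range):
    """All satisfying assignments in which the start block S is reached."""
    found = []
    for b, c, d in product([False, True], repeat=3):
        for x1, x2 in product(x_range, repeat=2):
            if formula(True, b, c, d, x1, x2):
                found.append((b, c, d, x1, x2))
    return found


all_models = models(range(12))
# Every model is an execution: C and D record the branch, x2 is the result.
consistent = all(
    c == b and d == (not b) and x2 == program(b, x1)
    for (b, c, d, x1, x2) in all_models
)
print(len(all_models), consistent)
```

Adding the conjunct x2 ≥ 10 (or its negation, to search for assertion failures) then simply filters this set of models.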
Let us now see how to encode a WCET problem
into SMT. In a simple model (which can be made
more complex and realistic, see section 8), each
program block i has a fixed execution time ti ∈ N,
and the total execution time T is the sum of the
execution times of the blocks encountered in the
trace. This execution time can be incorporated
into a “conventional” encoding for program traces
in two ways:
Sum encoding If Booleans χi ∈ {0, 1} record
which blocks i were reached by the execution
trace τ , then
$$T(\tau) = \sum_{i \,\mid\, \chi_i = \mathrm{true}} t_i = \sum_{i} \chi_i t_i \tag{1}$$
Counter encoding Alternatively, the program
may be modified by adding a time counter as
an ordinary variable, which is incremented in
each block. The resulting program then un-
dergoes the “conventional” encoding: the fi-
nal value of the counter is the execution time.
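The two encodings compute the same quantity on a loop-free trace, where each block is executed at most once. A minimal sketch in Python, with hypothetical block names and costs chosen only for illustration:

```python
# Hypothetical per-block costs t_i, in cycles (not taken from the paper).
COSTS = {"entry": 4, "then": 7, "else": 3, "exit": 2}


def time_sum_encoding(trace):
    """Sum encoding: T = sum_i chi_i * t_i, with chi_i = 1 iff block i is reached."""
    chi = {block: int(block in trace) for block in COSTS}
    return sum(chi[block] * COSTS[block] for block in COSTS)


def time_counter_encoding(trace):
    """Counter encoding: a time counter is incremented in each executed block."""
    counter = 0
    for block in trace:
        counter += COSTS[block]
    return counter


trace = ["entry", "then", "exit"]
print(time_sum_encoding(trace), time_counter_encoding(trace))  # prints 13 13
```

The difference lies in the shape of the resulting formula: the sum encoding yields one large linear expression over Booleans, whereas the counter encoding threads an extra integer variable through the program.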
An alternative is to attach a cost to transitions
instead of program blocks. The sum encoding is
then done similarly, with Booleans χi,j ∈ {0, 1}
recording which of the transitions have been taken
by an execution trace τ .
$$T(\tau) = \sum_{(i,j) \,\mid\, \chi_{i,j} = \mathrm{true}} t_{i,j} = \sum_{(i,j)} \chi_{i,j}\, t_{i,j} \tag{2}$$
The problem is now how to determine the
WCET β = max_τ T(τ). An obvious approach is
binary search [36], maintaining an interval [l, h]
containing β: take a midpoint m := ⌈(l + h)/2⌉, test
whether there exists a trace τ such that T(τ) ≥ m;
if so, then set l := m (or set l := T(τ), if available)
and restart, else set h := m − 1 and restart;
stop when the integer interval [l, h] is reduced to a
singleton. The bounds l and h may be respectively
initialized to zero and a safe upper bound on worst-case
execution time, for instance one obtained by a simple
“longest path in the acyclic graph” algorithm.
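The binary search just described can be sketched as follows in Python. The oracle exists_trace_at_least stands in for one SMT query ("is there a trace τ with T(τ) ≥ m?"); here it is simulated by a hypothetical list of trace times, which is an assumption made purely for illustration.

```python
def wcet_binary_search(exists_trace_at_least, upper_bound):
    """Return (max T(tau), number of oracle queries).

    Assumes at least one trace exists and that no trace exceeds upper_bound.
    """
    low, high = 0, upper_bound
    queries = 0
    while low < high:
        mid = (low + high + 1) // 2  # ceiling of (low + high) / 2
        queries += 1
        if exists_trace_at_least(mid):
            low = mid        # some trace reaches mid: the WCET is >= mid
        else:
            high = mid - 1   # no trace reaches mid: the WCET is < mid
    return low, queries


# Stand-in for the SMT solver: hypothetical execution times of all traces.
TRACE_TIMES = [25, 32, 32]


def oracle(m):
    return any(t >= m for t in TRACE_TIMES)


wcet, queries = wcet_binary_search(oracle, upper_bound=43)
print(wcet)  # prints 32
```

Rounding the midpoint up matters: with l := m on success, rounding down could leave the interval unchanged when h = l + 1. The final query, which fails for every m above the WCET, is the unsatisfiability proof that the rest of the paper is concerned with.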
4 Adding Cuts
Experiments with both sum encoding and counter
encoding applied to the “conventional” encoding
of programs into SMT were disappointing: the
SMT solver was taking far too much time. In par-
ticular, the last step of computing WCET, that
is, running the SMT-solver in order to disprove
the existence of traces longer than the computed
WCET, was agonizingly slow even for very small
programs. In section 5 we shall see how this is in-
herent to how SMT-solvers based on DPLL(T) —
that is, all current production-grade SMT-solvers
— handle the kind of formulas generated from
WCET constraints; but let us first see how we
worked around this problem so as to make WCET
computations tractable.
4.1 Rationale
A key insight is that the SMT-solver, applied to
such a naive encoding, explores a very large num-
ber of combinations of branches (exponential with
respect to the number of tests), thus a very large
number of partial traces τ1, . . . , τn, even though
the execution time of these partial traces is insuf-
ficient to change the overall WCET (section 5 will
explain this insight in more detail, both theoreti-
cally and experimentally).
Consider the control-flow graph in Fig. 3; let
t1, . . . , t7 be the WCET of blocks 1 . . . 7 estab-
lished by microarchitectural analysis (for the sake
of simplicity, we neglect the time taken for deci-
sions). Assume we have already found a path from
start to end going through block 6, taking β time
units; also assume that t1+ t2+max(t3, t4)+ t5+
t7 ≤ β. Then it is useless for the SMT-solver to
search for paths going through decision 2, because
none of them can have execution time longer than
β; yet that is what happens if using a naive en-
coding with all current production SMT-solvers
(see section 5). If instead of 1 decision we have
42, then the solver may explore 2^42 paths even
though there is a simple reason why none of them
will increase the WCET.
Our idea is simple: to the original SMT formula
(from “counter encoding” or “sum encoding”),
conjoin constraints expressing that the total execution
time of some portions of the program is less
than some upper bound (depending on the portion).
This upper bound acts as an “abstraction”
or “summary” of the portion (e.g. here we say that
the time taken in P2 is at most max(t3, t4) + t5),
and the hope is that this summary is sufficient for
the SMT-solver in many cases. There remain two
problems: how to select such portions, and how to
compute this upper bound.
Figure 3: Two portions P1 and P2 of a program
obtained as the range between a node with several
incoming edges and its immediate dominator.
(Control-flow graph with nodes block 1 (start),
Decision 1, block 2, Decision 2, block 3, block 4,
block 5, block 6 and block 7 (end).)
Note that these extra constraints are implied
by the original formula, and thus that conjoin-
ing them to it does not change the solution set or
the WCET obtained, but only the execution time
of the analysis. Such constraints are often called
“cuts” in operations research, hence our terminology.
4.2 Selecting portions
The choice of a portion of code to summarize fol-
lows source-level criteria: for instance, a proce-
dure, a block, a macro expansion. If operating
on a control-flow graph, a candidate portion can
be between a node with several incoming edges
and its immediate dominator, if there is nontrivial
control flow between the two (Fig. 3).⁶ On structured
languages, this means that we add one constraint
for the total timing of every “if-then-else”
or “switch” statement (recall that loops are un-
rolled, if needed into a cascade of “if-then-else”).
This is the approach that we followed in our ex-
perimental evaluation (section 6).
Let us however remark that these portions of
code need not be contiguous: with the sum encod-
ing, it is straightforward to encode the fact that
the total time of a number of instruction blocks is
less than a bound, even though these instruction
blocks are distributed throughout the code. This
is also possible, but less easy, with the counter
encoding (one has to encode an upper bound on
the sum of differences between starting and ending
times over all contiguous subsets of the portion).
This means that it is possible to consider por-
tions that are semantically, but not syntactically
related. For instance, one can consider for each
Boolean, or other variable used in a test, a kind
of “slice” of the program that is directly affected
by this variable (e.g. all contents of if-then-elses
testing on this variable) and compute an upper
bound for the total execution time of this slice —
in the example in the introduction where the execution
of two portions A and B depends on a variable
clock, we could compute an upper bound on
the total time of the program sliced with respect
to clock, that only contains the portions A and
B. Implementing this “slicing” approach is part of
our future work.
⁶ A dominator D of a block B is a block such that any
path reaching B must go through D. The immediate dominator
of a block B is the unique dominator I ≠ B of B
such that I does not dominate any other dominator D ≠ B
of B. For instance, the immediate dominator of the end of
a cascade of if-then-else statements is the beginning of the
cascade.
4.3 Obtaining upper bounds on the WCET of portions
Let us now consider the problem of, given a por-
tion, computing an upper bound on its WCET. In
the case of a contiguous portion, an upper bound
may be obtained by a simple syntactic analysis:
the longest syntactic path is used as a bound
(even though it might be infeasible). This approach
may be extended to non-contiguous portions.
Let us denote by P the portion. For each
block b, let tb be the upper bound on the time
spent in block b (obtained from microarchitec-
tural analysis), and let wb be an unknown de-
noting the worst time spent inside P in paths
from the start of the program to the beginning
of b. If b1, ..., bk are the predecessors of b, then
$$w_b = \max\big(w_{b_1} + t_{b_1}\,\chi_P(b_1),\ \dots,\ w_{b_k} + t_{b_k}\,\chi_P(b_k)\big)$$
where χP(x) is 1 if x ∈ P, 0 otherwise. This system
of equations can be easily solved in (quasi)
linear time by considering the wb in a topolog-
ical order of the blocks (recall that we consider
loop-free programs). Another approach would be
to recursively call the complete WCET procedure
on the program portion, and use its output as a
bound.
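The syntactic bound for a portion P can be sketched in Python as below. The graph is a small hypothetical control-flow graph in the spirit of Fig. 3 (the exact edges and all costs are our assumptions), and we take w_b = 0 for the entry block, which the recurrence leaves implicit. The sketch relies on graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical CFG: block -> list of predecessors (decisions are folded
# into the blocks that precede them).
PREDS = {2: [1], 6: [1], 3: [2], 4: [2], 5: [3, 4], 7: [5, 6]}
# Hypothetical per-block upper bounds t_b, in cycles.
COST = {1: 5, 2: 3, 3: 8, 4: 6, 5: 2, 6: 20, 7: 4}


def portion_bound(preds, cost, portion, exit_block):
    """Upper bound on the time spent inside `portion` along any syntactic
    path from the entry to the end of `exit_block`."""
    w = {}  # w[b]: worst time spent inside the portion before entering b
    for b in TopologicalSorter(preds).static_order():
        w[b] = max(
            (w[p] + (cost[p] if p in portion else 0) for p in preds.get(b, [])),
            default=0,  # the entry block has no predecessor
        )
    return w[exit_block] + (cost[exit_block] if exit_block in portion else 0)


p2 = {3, 4, 5}
print(portion_bound(PREDS, COST, p2, exit_block=5))          # prints 10
print(portion_bound(PREDS, COST, set(COST), exit_block=7))   # prints 29
```

The first value is max(t3, t4) + t5, the summary of P2 used in section 4.1; the second is the longest syntactic path of the whole graph, which can serve as the initial upper bound h of the binary search.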
The simpler approach described above gave ex-
cellent results in most benchmarks, and we had to
refine the cuts with the SMT-based procedure for
only one benchmark (see section 6).
4.4 Example
Let us now see a short, but complete example, ex-
tracted from a control program composed of an
initialization phase followed by an infinite loop
clocked at a precise frequency. The goal of the
analysis is to show that the WCET of the loop
body never exceeds the clocking period. For the
sake of brevity, we consider only a very short ex-
tract of the control program, implementing a “rate
limiter”; in the real program its input is the re-
sult of previous computation steps, but here we
consider that the input is nondeterministic within
[−10000,+10000]. The code run at every clock
tick is:
    // returns a value between min and max
    extern int input(int min, int max);
    void rate_limiter_step() {
      int x_old = input(-10000, 10000);
      int x = input(-10000, 10000);
      if (x > x_old + 10)
        x = x_old + 10;
      if (x < x_old - 10)
        x = x_old - 10;
      x_old = x;
    }
This program is compiled to LLVM bitcode,⁷
then bitcode-level optimizations are applied, resulting
in an LLVM control-flow graph (Fig. 4).
From this graph we generate a first-order formula
including cuts (Fig. 5). Its models describe
execution traces along with the corresponding execution
time cost given by the “sum encoding”.
Here, costs are attached to the transitions between
each pair of blocks. These costs are assumed to
be given; section 6.3 will describe in full detail
how we use the OTAWA tool to derive such precise
costs for each transition.
The SMT encoding of the program semantics
(Fig. 5) is relatively simple since the bitcode
is in SSA form. The ite(b, x, y) construct
is an if-then-else expression: it is equal to x if
b is true, and to y otherwise. In our encoding,
SMT variables starting with the letter x refer to
the LLVM SSA-variables, there is one Boolean b_i
for each LLVM BasicBlock, and one Boolean t_i_j
for each transition. Each transition t_i_j has a
cost c_i_j given by OTAWA. For instance, the
block entry is given the Boolean b_0, the block
if.then is given the Boolean b_1, and the transition
from entry to if.then is given the Boolean
t_0_1 and has a cost of 15 clock cycles. The cuts
are derived as follows: if.end has several incoming
transitions and its immediate dominator is
entry. The longest syntactic path between these
two blocks has cost 21. The cut will then be
c_0_1 + c_1_2 + c_0_2 ≤ 21. There is a similar cut
for the portion between if.end and if.end6. Finally,
we can also add the constraint cost ≤ 43
since it is the cost of the longest syntactic path.
While this longest syntactic path has cost 43 (it
goes both through if.then and if.then4), our
SMT-based approach shows there is no semantically
feasible path longer than 36 clock cycles.
⁷ LLVM (http://www.llvm.org/) [26] is a compilation
framework with a standardized intermediate representation
(bitcode), into which one can compile with a variety of
compilers including GCC (C, C++, Ada, ...) and Clang (C,
C++).
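The gap between the syntactic and the semantically feasible bound can be reproduced by brute force. The sketch below, in Python, replays the rate limiter on a reduced input range and sums the transition costs as printed in Fig. 4. The reduced range [−30, 30] is our assumption: branch outcomes depend only on x − x_old, so a small range exercises all cases. Note that with the costs exactly as printed in Fig. 4 this sketch yields 39 and 32; the totals 43 and 36 quoted above correspond to a longest path of cost 22 through if.then4 (as in the second cut of Fig. 5), so the printed costs and the quoted totals do not match exactly. The sketch only illustrates that the path taking both branches is infeasible.

```python
# Transition costs as printed in Fig. 4: (source block, target block) -> cycles.
COST = {(0, 1): 15, (0, 2): 14, (1, 2): 6, (2, 3): 12, (2, 4): 11, (3, 4): 6}


def trace(x_old, x):
    """Transitions taken by rate_limiter_step for the given inputs."""
    taken = []
    if x > x_old + 10:
        taken += [(0, 1), (1, 2)]   # entry -> if.then -> if.end
        x = x_old + 10
    else:
        taken.append((0, 2))        # entry -> if.end
    if x < x_old - 10:
        taken += [(2, 3), (3, 4)]   # if.end -> if.then4 -> if.end6
    else:
        taken.append((2, 4))        # if.end -> if.end6
    return taken


def cost(transitions):
    return sum(COST[t] for t in transitions)


# Longest syntactic path: worst choice in each portion, feasible or not.
syntactic = max(15 + 6, 14) + max(12 + 6, 11)
# Longest feasible path over a reduced input range (assumption, see above).
inputs = [(x_old, x) for x_old in range(-30, 31) for x in range(-30, 31)]
feasible = max(cost(trace(x_old, x)) for x_old, x in inputs)
print(syntactic, feasible)
```

An SMT solver reaches the same conclusion symbolically: once x is clamped to x_old + 10, the test x < x_old − 10 cannot hold.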
4.5 Relationship with Craig inter-
polants
A Craig interpolant for an unsatisfiable conjunc-
tion F1 ∧ F2 is a formula I such that F1 ⇒ I and
I ∧F2 is unsatisfiable, whose free variables are in-
cluded in the intersection of those of F1 and F2.
In the case of a program A;B consisting of
two portions A and B executed in sequence, the
usual way of encoding the program yields φA∧φB
where φA and φB are, respectively, the encodings
of A and B. The free variables of this formula
are the inputs i1, . . . , im and outputs o1, . . . , on of
the program, as well as all temporaries and lo-
cal variables. Let t1, . . . , tp be the variables live
at the edge from A to B; then the input-output
relationship of the program, with free variables
i1, . . . , im, o1, . . . , on is F :
∃t1, . . . , tp(∃ . . . φA) ∧ (∃ . . . φB)
Let us now assume additionally that o1 is the
final time and t1 is the time when control flows
from A to B (counter encoding). The SMT formulas
used in our optimization process are of the
form F ∧ t1 ≥ β. The cut for portion A is of
the form t1 ≤ βA, that for portion B of the form
o1 − t1 ≤ βB. Then, if the cut for portion A is used
to prove that F ∧ t1 ≥ β is unsatisfiable, then this
cut is a Craig interpolant for the unsatisfiable formula
(φA) ∧ (φB ∧ t1 ≥ β) (similarly, if the cut
for portion B is used, then it is an interpolant for
φB ∧ (φA ∧ t1 ≥ β)). Our approach may thus be understood
as preventively computing possible Craig
interpolants so as to speed up solving. The same
intuition applies to the sum encoding (up to the
creation of supplementary variables).

    entry:                                  ; b_0
      %call = call i32 @input(...)
      %call1 = call i32 @input(...)
      %add = add nsw i32 %call, 10
      %cmp = icmp sgt i32 %call1, %add
      br i1 %cmp, label %if.then, label %if.end
    if.then:                                ; b_1
      %add2 = add nsw i32 %call, 10
      br label %if.end
    if.end:                                 ; b_2
      %x.0 = phi i32 [%add2, %if.then], [%call1, %entry]
      %sub = sub nsw i32 %call, 10
      %cmp3 = icmp slt i32 %x.0, %sub
      br i1 %cmp3, label %if.then4, label %if.end6
    if.then4:                               ; b_3
      %sub5 = sub nsw i32 %call, 10
      br label %if.end6
    if.end6:                                ; b_4
      %x.1 = phi i32 [%sub5, %if.then4], [%x.0, %if.end]
      ret void
Transition costs:
    t_0_1, cost = 15
    t_0_2, cost = 14
    t_1_2, cost = 6
    t_2_3, cost = 12
    t_2_4, cost = 11
    t_3_4, cost = 6
Figure 4: LLVM control-flow graph of the rate_limiter_step function.

    −10000 ≤ x_call ≤ 10000
    ∧ −10000 ≤ x_call1 ≤ 10000
    ∧ x_add = (x_call + 10)
    ∧ t_0_1 = (b_0 ∧ (x_call1 > x_add))
    ∧ t_0_2 = (b_0 ∧ ¬(x_call1 > x_add))
    ∧ b_1 = t_0_1
    ∧ x_add2 = (x_call + 10)
    ∧ t_1_2 = b_1
    ∧ b_2 = (t_0_2 ∨ t_1_2)
    ∧ b_2 ⇒ (x_x.0 = ite(t_1_2, x_add2, x_call1))
    ∧ x_sub = (x_call − 10)
    ∧ t_2_3 = (b_2 ∧ (x_x.0 < x_sub))
    ∧ t_2_4 = (b_2 ∧ ¬(x_x.0 < x_sub))
    ∧ b_3 = t_2_3
    ∧ x_sub5 = (x_call − 10)
    ∧ t_3_4 = b_3
    ∧ b_4 = (t_2_4 ∨ t_3_4)
    ∧ b_4 ⇒ (x_x.1 = ite(t_3_4, x_sub5, x_x.0))
    ∧ b_0 = b_4 = true               ; search for a trace from entry to if.end6
  timing:
    ∧ c_0_1 = ite(t_0_1, 15, 0)      ; t_0_1 has cost 15 if taken, else 0
    ∧ c_0_2 = ite(t_0_2, 14, 0)
    ∧ c_1_2 = ite(t_1_2, 6, 0)
    ∧ c_2_3 = ite(t_2_3, 12, 0)
    ∧ c_2_4 = ite(t_2_4, 11, 0)
    ∧ c_3_4 = ite(t_3_4, 6, 0)
    ∧ cost = (c_0_1 + c_0_2 + c_1_2 + c_2_3 + c_2_4 + c_3_4)
  cuts:
    ∧ (c_0_1 + c_1_2 + c_0_2) ≤ 21   ; between entry and if.end
    ∧ (c_3_4 + c_2_4 + c_2_3) ≤ 22   ; between if.end and if.end6
    ∧ cost ≤ 43
Figure 5: Encoding of the rate_limiter_step function (control-flow graph on Fig. 4) as an SMT formula
with cuts.

Figure 6: Intractability of diamond formulas obtained
from timing analysis of a family of programs
with very simple functional semantics. Execution
times of various state-of-the-art SMT-solvers
on Formula 4, for m = 5n (the hardest),
showing exponential behavior in the formula
size n. The CPU is a 2 GHz Intel Core 2 Q8400.
(Plot: time in seconds on a logarithmic axis from
0.01 to 10000, against n from 10 to 22, with one
curve each for Z3 3.2, Z3 4.3.1, MathSAT 5.2.6
and SMTInterpol.)
5 Intractability: Diamond Formulas
Let us now explain why the formulas without
cuts result in unacceptable execution times in the
SMT-solvers.
Consider a program consisting of a sequence of
n fragments where the i-th fragment is of the form:
    if (b_i) { /* block of cost x_i */
      /* time cost 2, not changing b_i */
    } else {
      /* time cost 3, not changing b_i */
    }
    if (b_i) { /* block of cost y_i */
      /* time cost 3 */
    } else {
      /* time cost 2 */
    }
The (bi)1≤i≤n are Booleans. A human observer
easily concludes that the worst-case execution
time is 5n, by analyzing each fragment separately.
Using the “sum encoding”, the timing analysis
is expressed as
$$T = \max\left\{ \sum_{i=1}^{n} (x_i + y_i) \;\middle|\; \bigwedge_{i=1}^{n} \big(x_i = \mathrm{ite}(b_i, 2, 3)\big) \wedge \big(y_i = \mathrm{ite}(b_i, 3, 2)\big) \right\} \tag{3}$$
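Given a bound m, an SMT-solver will have to
solve for the unknowns (bi), (xi), (yi), 1 ≤ i ≤ n, the
constraint
$$\big((b_1 \wedge x_1 = 2 \wedge y_1 = 3) \vee (\neg b_1 \wedge x_1 = 3 \wedge y_1 = 2)\big) \wedge \dots \wedge \big((b_n \wedge x_n = 2 \wedge y_n = 3) \vee (\neg b_n \wedge x_n = 3 \wedge y_n = 2)\big) \wedge x_1 + y_1 + \dots + x_n + y_n \geq m \tag{4}$$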
In the “DPLL(T)” approach (see e.g. Kroening
and Strichman [25] for an introduction), SMT is
implemented as a combination of a SAT solver,8
which searches within a Boolean state space (here,
amounting to (b1, . . . , bn) ∈ {0, 1}^n, but in general
arithmetic or other theory predicates are also
taken into account) and a decision procedure for
conjunctions of atomic formulas from a theory T.9
8Almost all current SAT solvers implement variants
of conflict-driven clause learning (CDCL), a major improvement
over DPLL (Davis, Putnam, Logemann, Loveland),
thus the terminology. None of what we say here,
however, is specific to CDCL: our remarks stay valid as
long as the combination of propositional and theory rea-
soning proceeds by sending clauses constructed from the
predicates syntactically present in the original formula to
the propositional solver.
9We leave out improvements such as theory propagation
for the sake of simplicity. See Kroening and Strichman [25]
for more details.
Once b1, . . . , bn have been picked, Formula 4 sim-
plifies to a conjunction
\[
x_1 = \alpha_1 \wedge y_1 = \beta_1 \wedge \dots \wedge x_n = \alpha_n \wedge y_n = \beta_n
\;\wedge\; x_1 + y_1 + \dots + x_n + y_n \ge m
\tag{5}
\]
where the αi, βi are constants in {2, 3} such that
for each i, αi+βi = 5. Such a formula is satisfiable
if and only if m ≤ 5n.
Assume now m > 5n. All combinations of
b1, . . . , bn lead to unsatisfiable constraints, thus
Formula 4 is unsatisfiable. Such an exhaustive exploration
is equivalent to exploring 2^n paths in the
control flow graph, computing the execution time
for each and comparing it to the bound. Could
an SMT-solver do better? SMT-solvers, when ex-
ploring the Boolean state space, may detect that
the current Boolean choices (say, b3 ∧ ¬b5 ∧ b7)
lead to an arithmetic contradiction, without pick-
ing a value for all the Booleans. The SMT-solver
extracts a (possibly smaller) contradiction (say,
b3 ∧ ¬b5), adds the negation of this contradic-
tion to the Boolean constraints as a theory clause,
and restarts Boolean solving. The hope is that
there exist short contradictions that enable the
SMT-solver to prune the Boolean search space.
Yet, in our case, there are no such short contra-
dictions: if one leaves out any of the conjuncts
in Formula 5, the conjunction becomes satisfiable.
Note the asymmetry between proving satisfiabil-
ity and unsatisfiability: for satisfiability, one can
always hope that clever heuristics will lead to one
solution, while for unsatisfiability, the prover has
to close all branches in the search.
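The exhaustive exploration described above can be sketched as follows. This is only an illustration of the argument, not the internals of any SMT-solver; the function names path_time and refute_by_enumeration are ours. Every Boolean assignment yields a path of time exactly 5n, so a bound m = 5n + 1 can only be refuted by closing all 2^n branches.

```python
from itertools import product


def path_time(bits):
    """Execution time of the path selected by the Boolean choices `bits`."""
    total = 0
    for b in bits:
        x = 2 if b else 3  # first test of the fragment: x_i = ite(b_i, 2, 3)
        y = 3 if b else 2  # second test of the fragment: y_i = ite(b_i, 3, 2)
        total += x + y
    return total


def refute_by_enumeration(n, m):
    """Return (satisfiable, branches_checked) for Formula 4 with bound m."""
    checked = 0
    for bits in product([False, True], repeat=n):
        checked += 1
        if path_time(bits) >= m:
            return True, checked  # a witness path reaches the bound
    return False, checked  # every one of the 2^n branches was closed


if __name__ == "__main__":
    for n in (4, 8, 12):
        sat, checked = refute_by_enumeration(n, 5 * n + 1)
        print(n, sat, checked)  # checked doubles with each extra fragment
```

For m ≤ 5n the very first assignment is already a witness, which reflects the asymmetry between proving satisfiability and unsatisfiability.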
The difficulty of Formula 4 or similar “diamond
formulas” is well-known in SMT circles. It boils
down to the SMT-solver working exclusively with
the predicates found in the original formulas, with-
out deriving new useful ones such as xi + yi ≤
5. All state-of-the-art solvers that we have tried
have exponential running time in n when solving
Formula 4 for m = 5n (Fig. 6)10; the difficulty
increases exponentially as the upper bound on
the WCET to be proved becomes closer to the actual
WCET.
10A special version of MathSAT 5, which was kindly
There have been proposals of alternative ap-
proaches to DPLL(T), where one would directly
solve for the numeric values instead of solving
for Booleans then turning theory lemmas into
Boolean constraints [12, 13, 29, 5, 32]; but no pro-
duction solver implements them.11 This is the rea-
son why we turned to incorporating cuts into the
encoding.
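As a worked illustration on the diamond family (our own restatement, using the predicate x_i + y_i ≤ 5 mentioned above as the cut for fragment i), conjoining one such cut per fragment lets linear arithmetic refute any bound m > 5n without deciding a single b_i:

\[
\bigwedge_{i=1}^{n} (x_i + y_i \le 5)
\;\Longrightarrow\;
\sum_{i=1}^{n} (x_i + y_i) \le 5n < m ,
\]

which contradicts the conjunct x_1 + y_1 + ... + x_n + y_n ≥ m of Formula 4. Each cut is redundant, since both branches of fragment i give x_i + y_i = 2 + 3, so the set of solutions is unchanged.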
6 Implementation and Experi-
mental Results
We evaluated our approach for computing the
worst-case execution time on benchmarks from
several sources, referenced in Table 1. nsichneu
and statemate belong to the Mälardalen WCET
benchmark set [18]12, being the largest of
the set (w.r.t. code size). cruise-control
and digital-stopwatch are generated from
Scade™ designs. autopilot and fly-by-wire
come from the Papabench benchmark [34] derived
from the Paparazzi free software suite for pilot-
ing UAVs (http://paparazzi.enac.fr/). tdf and
miniflight are industrial avionic case-studies.
6.1 Description of the Implementation
We use the infrastructure of the Pagai static an-
alyzer [20]13 to produce an SMT formula corre-
sponding to the semantics of a program expressed
in LLVM bitcode.
10 (continued) made available to us by the authors [36], implements the
binary search approach internally. It suffers from the same
exponential behavior as noted in the figure: in its last step,
it has to prove that the maximum obtained truly is maximum.
11Dejan Jovanovic was kind enough to experiment with
some of our formulas in his experimental solver [32], but
the execution time was unacceptably high. We stress that
this field of workable alternatives to DPLL(T) is still new
and it is too early to draw conclusions.
12http://www.mrtc.mdh.se/projects/wcet/benchmarks.html
13http://pagai.forge.imag.fr
A limitation is that, at present, Pagai consid-
ers that floating-point variables are real numbers
and that integers are unbounded mathematical in-
tegers, as opposed to finite bit-vectors; certainly
an industrial tool meant to provide sound bounds
should have accurate semantics, but this limita-
tion is irrelevant to our proof-of-concept (note
how the bitvectors from functional semantics and
the timing variables are fully separated — their
combination would therefore not pose a prob-
lem to any SMT-solver implementing a variant of
the Nelson-Oppen combination of procedures [25,
ch. 10]).
Using the LLVM optimization facilities, we first
apply some standard transformations to the program
(loop unrolling, function inlining, SSA) so
as to obtain a single loop-free function, in a manner
reminiscent of bounded model checking. Once
the SMT formula is constructed, we enrich it with
an upper timing bound for each basic block.
Finally, we conjoin to our formula the cuts for
the “sum encoding”, i.e., constraints of the form
\(\sum_{i \in S} c_i \le B\), where the \(c_i\)'s are the cost variables
attached to the basic blocks. There is one such
“cut” for every basic block with several incoming
edges: the constraint expresses an upper bound
on the total timing of the program portion com-
prised between the block and its immediate dom-
inator (Fig. 3). The bound B is the weight of the
maximal path through the range, be it feasible or
infeasible (a more expensive method is to call the
WCET computation recursively on the range).
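A minimal sketch of how such a bound B could be obtained for one range, assuming the range is given as an acyclic successor map with one cost per block. The function name, the data layout and the choice to count both end blocks in B are assumptions made for this illustration; they are not taken from the Pagai implementation.

```python
def max_path_weight(succ, cost, source, sink):
    """Weight of the heaviest source-to-sink path in an acyclic graph.

    `succ` maps a block to its successors and `cost` maps a block to the
    upper bound on its execution time. Feasibility is deliberately ignored.
    """
    memo = {}

    def best(block):
        # Heaviest path from `block` to `sink`, or None if sink is unreachable.
        if block == sink:
            return cost[block]
        if block not in memo:
            tails = [best(s) for s in succ.get(block, ())]
            tails = [t for t in tails if t is not None]
            memo[block] = cost[block] + max(tails) if tails else None
        return memo[block]

    return best(source)


if __name__ == "__main__":
    # Hypothetical diamond between an immediate dominator and a merge block.
    succ = {"dom": ["then", "else"], "then": ["merge"], "else": ["merge"]}
    cost = {"dom": 1, "then": 2, "else": 3, "merge": 1}
    print(max_path_weight(succ, cost, "dom", "merge"))  # 1 + 3 + 1 = 5
```

Memoization keeps the computation linear in the size of the range, whereas the recursive WCET computation mentioned above would require an SMT query per range.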
We use Z3 [31] as an SMT solver and a bi-
nary search strategy to maximize the cost variable
modulo SMT.
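The binary search strategy can be sketched as below. The solver is abstracted as a callable is_reachable(t), standing for one satisfiability query on "formula ∧ cost ≥ t"; this callable, the function name maximize and the integer bounds are assumptions of the sketch, not the interface of Z3.

```python
def maximize(is_reachable, lo, hi):
    """Largest integer t in [lo, hi] such that is_reachable(t) holds.

    is_reachable must be monotone (true up to the optimum, false above it),
    and is_reachable(lo) is assumed to hold.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so that the interval always shrinks
        if is_reachable(mid):
            lo = mid  # some execution costs at least mid
        else:
            hi = mid - 1  # no execution costs mid or more
    return lo


if __name__ == "__main__":
    n = 7
    queries = []

    def oracle(t):
        # Stand-in for an SMT query on the diamond formula: satisfiable iff t <= 5n.
        queries.append(t)
        return t <= 5 * n

    print(maximize(oracle, 0, 6 * n), len(queries))
```

The number of queries is logarithmic in hi − lo, but the last queries, which must prove that no longer path exists, are the expensive ones.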
The encoding of program semantics into SMT
may not be fully precise in some cases. Whenever
we cannot precisely translate a construct from the
LLVM bitcode, we abstract it by nondetermin-
istic choices into all the variables possibly writ-
ten to by the construct (an operation referred to
as havoc in certain systems); for instance, this
is the case for loads from memory locations that
we cannot trace to a specific variable. We relied
on the LLVM mem2reg optimization phase to lift
memory accesses into SSA (static single assignment)
variables; all accesses that it could not lift
were thus abstracted as nondeterministic choice.
We realized that, due to being limited to local,
stack-allocated variables, this phase missed some
possible liftings, e.g. those of global variables.
This resulted in the same variable from the pro-
gram to be analyzed being considered as several
unrelated nondeterministic loads from memory,
thereby breaking dependencies between tests and
preventing infeasible paths from being discarded.
We thus implemented a supplemental lifting phase
for global variables. It is however possible that
our analysis still misses infeasible paths because
of badly abstracted constructs (for instance, ar-
rays), and that further improvements could bring
even better results (that is, upper bounds on the
WCET that would be closer to the real WCET).
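The effect of an unlifted global variable can be illustrated on a toy example. The two-test program in the comment, the block costs and all names below are hypothetical; the point is only that treating two loads of the same variable g as unrelated nondeterministic values admits a path that no real execution takes.

```python
from itertools import product

COST_CHEAP, COST_EXPENSIVE = 1, 10  # hypothetical block costs


def path_cost(first_load, second_load):
    # Models: if (g) { cheap } else { expensive }
    #         if (g) { expensive } else { cheap }
    a = COST_CHEAP if first_load else COST_EXPENSIVE
    b = COST_EXPENSIVE if second_load else COST_CHEAP
    return a + b


def wcet_lifted():
    # g lifted to a single SSA variable: both tests read the same value.
    return max(path_cost(g, g) for g in (False, True))


def wcet_havoc():
    # Each load abstracted as an independent nondeterministic choice.
    return max(path_cost(g1, g2) for g1, g2 in product((False, True), repeat=2))


if __name__ == "__main__":
    print(wcet_lifted(), wcet_havoc())  # 11 versus 20
```

The havoc abstraction is still sound, since it only adds behaviors, but the bound it yields here counts both expensive blocks on the same path.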
Furthermore, some paths are infeasible because
of a global invariant of the control loop (e.g.
some Booleans a and b activate mutually exclusive
modes of operations, and ¬a∨¬b is an invariant);
we have not yet integrated such invariants, which
could be obtained either by static analysis of the
program or by analysis of the high-level specification
from which the program is extracted [1].
Our current implementation keeps inside the
resulting formulas the program statements and
variables that have no effect on control flow and
thus on WCET. Better performance could probably
be obtained by slicing away such irrelevant
statements.
6.2 Results with bitcode-based timing
The problem addressed in this article is not ar-
chitectural modeling and low-level timing analy-
sis: we assume that worst-case timings for basic
blocks are given by an external analysis. Here we
report on results with a simple timing basis: the
time taken by an LLVM bitcode block is its number
of instructions; our goal here is to check whether
improvements to WCET can be obtained by our
analysis with reasonable computation costs, independently
of the architecture.

Benchmark name       LLVM #lines   LLVM #BB
statemate            2885          632
nsichneu             12453         1374
cruise-control       234           43
digital-stopwatch    1085          188
autopilot            8805          1191
fly-by-wire          5498          609
miniflight           5860          745
tdf                  2689          533

Table 1: Table referencing the various benchmarks.
LLVM #lines is the number of lines in
the LLVM bitcode, and LLVM #BB is its number
of basic blocks.
As expected, the naive approach (without
adding cuts to the formula) does not scale at all,
and the computation has reached our timeout in
all of our largest benchmarks. Once the cuts are
conjoined to the formula, the WCET is computed
considerably faster, with some benchmarks need-
ing less than a minute while they timed out with
the naive approach.
Our results (Table 2, first part) show that the
use of bounded model checking by SMT solv-
ing improves the precision of the computed upper
bound on the worst-case execution time, since the
longest syntactic path is in most cases not feasible
due to the semantics of the instructions. As usual
with WCET analyses, it is difficult to estimate the
absolute quality of the resulting bound, because
the exact WCET is unknown (perhaps what we
obtain is actually the WCET, perhaps it overesti-
mates it somewhat).
On the autopilot software, our analysis re-
duces the WCET bound by 69.7%. This software
has multiple clock domains, statically scheduled
by the periodic_task() function using switches
and arithmetic constraints. Approaches that do
not take functional semantics into account there-
fore consider activation patterns that cannot occur
in the real system, leading to a huge overestima-
tion compared to our semantic-sensitive approach.
6.3 Results with realistic timing
The timing model used in the preceding subsection
is not meant to be realistic. We therefore experi-
mented with realistic timings for the basic blocks,
obtained by the OTAWA tool [2] for an ARMv7
architecture. The results are given in Table 2 (sec-
ond half).
[Figure 7 diagram, with boxes labelled: llvm compiler; LLVM IR CFG; ARM CFG; Otawa; costs (ARM CFG); Traceability: match blocks; costs (LLVM-IR CFG); Encode into SMT and maximise; Final WCET. Legend: Data, Phase.]
Figure 7: General workflow for deriving timings
using OTAWA.
The difficulty here is that OTAWA considers the
basic blocks occurring in binary code, while our
analysis considers the basic blocks in the LLVM
bitcode. The LLVM blocks are close to those
in the binary code, but code generation slightly
changes the block structure in some cases. The
matching of binary code to LLVM bitcode is thus
sometimes imperfect and we had to resort to
one that safely overestimates the execution time.
Fig. 7 gives an overview of the general workflow
for deriving the appropriate costs of LLVM ba-
sic blocks. The alternative would be to generate
the SMT formulas not from LLVM bitcode, but
directly from the binary code; unfortunately a reliable
implementation needs to address a lot of
open questions, and as such, it falls into our future
plans.

                          WCET bounds                          Analysis time (in seconds)
Benchmark name            syntactic/OTAWA   max-SMT     diff     with cuts   without cuts   #cuts
Bitcode-based timings (in number of LLVM instructions)
statemate                 997               951         4.6%     118.3       +∞             143
nsichneu                  9693              5998        38.1%    131.4       +∞             252
cruise-control            123               121         1.6%     0.1         0.1            13
digital-stopwatch         332               302         9.0%     1.0         35.5           53
autopilot                 4198              1271        69.7%    782.0       +∞             498
fly-by-wire               2932              2792        4.7%     7.6         +∞             163
miniflight                4015              3428        14.6%    35.8        +∞             251
tdf                       1583              1569        0.8%     5.4         343.8          254
Realistic timings (in cycles) for an ARMv7 architecture
statemate                 3297              3211        2.6%     943.5       +∞             143
nsichneu* (1 iteration)   17242             <13332**    22.7%    3600**      +∞             378
cruise-control            881               873         0.9%     0.1         0.2            13
digital-stopwatch         1012              954         5.7%     0.6         2104.2         53
autopilot                 12663             5734        54.7%    1808.8      +∞             498
fly-by-wire               6361              5848        8.0%     10.8        +∞             163
miniflight                17980             14752       18.0%    40.9        +∞             251
tdf                       5789              5727        1.0%     13.0        +∞             254

Table 2: max-SMT is the upper bound on WCET reported by our analysis based on optimization
modulo theory, while syntactic/OTAWA is the execution time of the longest syntactic path (provided by
Otawa when using realistic timings). diff is the improvement brought by our method. The analysis
time for max-SMT is reported with and without added cuts; +∞ indicates timeout (1 hour). #cuts
is the number of added cuts. In the second part, *) nsichneu has been simplified to one main-loop
iteration (instead of 2), and has been computed with cuts refinement as described in subsection 6.3.
**) Computation takes longer than 1 hour. A safe bound of 13332 is already known after this time.
While the nsichneu benchmark is fully han-
dled by our approach when using bitcode-based
timing, it is much harder when using the realistic
metric. We had to improve our implementation
in two ways: 1. We extract cuts for larger por-
tions of the program: we take the portions from
our previous cuts (between merge points and their
immediate dominators) and derive new cuts by re-
cursively grouping these portions by two. We then
have cuts for one half, one quarter, etc. of the
program. 2. Instead of directly optimising the to-
tal cost variable of the program, we successively
optimize the variables expressing the “cuts” (in
order of portion size). This allows us to strengthen
the cuts with smaller upper bounds, and helps the
analysis of the bigger portions. In this benchmark,
all the longest paths are infeasible because of inconsistent
semantic constraints over the variables
involved in the tests. Better cuts could be derived
if we were not restricted to contiguous portions
in the implementation. The computation time is
around 6.5 hours to get the exact WCET (13298
cycles), but we could have stopped after one hour
and obtained a correct upper bound of 13332 cycles,
which is already very close to the final result.
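One possible reading of the two refinements above is sketched here: neighbouring portions are merged pairwise, level by level, and the resulting groups are then ordered by size so that the smaller cost sums are optimized first. The list-of-block-names representation and both function names are assumptions of this sketch, and the actual implementation may group portions differently.

```python
def group_by_two(portions):
    """Merge neighbouring portions pairwise, level by level.

    `portions` is a list of contiguous program portions, each given as a
    list of basic-block names. Returns the list of levels; the last level
    holds a single group covering every block.
    """
    level = [list(p) for p in portions]
    levels = []
    while len(level) > 1:
        merged = []
        for k in range(0, len(level), 2):
            pair = level[k:k + 2]  # two neighbours, or one leftover group
            merged.append([block for group in pair for block in group])
        level = merged
        levels.append(level)
    return levels


def optimization_order(portions):
    """All groups (original portions included), smallest first."""
    groups = [list(p) for p in portions]
    for level in group_by_two(portions):
        groups.extend(level)
    return sorted(groups, key=len)


if __name__ == "__main__":
    demo = [["b1", "b2"], ["b3"], ["b4", "b5"], ["b6"]]
    for group in optimization_order(demo):
        print(group)
```

Each group S would then receive a cut of the form ∑_{i∈S} c_i ≤ B, whose bound B is tightened by maximizing that sum before moving on to the larger groups.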
7 Related Work
The work closest to ours is from Chu and Jaf-
far [10]. They perform symbolic execution on the
program, thereby unrolling an exponentially-sized
execution tree (each if-then-else construct doubles
the number of branches). This would be intolera-
bly expensive if not for the very astute subsump-
tion criterion used to fold some of the branches
into others already computed. More specifically,
their symbolic execution generalizes each explored
state S to a first-order formula defining states
from which the feasible paths are included in those
starting from S; these formulas are obtained from
Craig interpolants extracted from the proofs of infeasibility.
In our approach, we also learn formulas that
block infeasible paths or paths that cannot lead
to paths longer than the WCET obtained, in two
ways: the SMT-solver learns blocking clauses by
itself, and we feed “cuts” to it. Let us now at-
tempt to give a high-level view of the difference
between our approach and theirs. Symbolic ex-
ecution [6] (in depth-first fashion) can be simu-
lated by SMT-solving by having the SMT-solver
select decision literals [25] in the order of execution
of the program encoded into the formula; in con-
trast, general bounded model checking by SMT-
solving will assert predicates in an arbitrary order,
which may be preferable in some cases (e.g.
if x ≤ 0 is asserted early in the program and
x + y ≥ 0 very late, after multiple if-then-elses,
it is useful to be able to derive y ≥ 0 immedi-
ately without having to wait until the end of each
path). Yet, an SMT-solver based on DPLL(T)
does not learn lemmas constructed from new pred-
icates, while the approach in [10] learns new pred-
icates on-the-fly from Craig interpolants. In our
approach, we help the SMT-solver by preventively
feeding “candidate lemmas”, which, if used in a
proof that there is no path longer than a certain
bound, act as Craig interpolants, as explained in
subsection 4.5. Our approach therefore leverages
both out-of-order predicate selection and interpo-
lation, and, as a consequence, it seems to scale
better.
Two recent works, Biere et al. [4] and its
follow-up by Knoop, Kovács, and Zwirchmayr [24],
integrate the WCET path analysis into a coun-
terexample guided abstraction refinement loop.
As such, the IPET approach using ILP is refined
by extracting a witness path for the maximal time,
and testing its feasibility by SMT-solving; if the
path is infeasible, an additional ILP constraint is
generated, to exclude the spurious path. Because
this ILP constraint relates all the conditionals cor-
responding to the spurious witness path, excluding
infeasible paths in this way exhibits an exponential
behavior we strove to avoid. Moreover, our
approach is more flexible with respect to (1) the
class of properties which can be expressed, as it is
not limited by the ILP semantics and (2) the ability
to incorporate non-functional semantics (it
is unclear whether [4] or [24] can do so).
Metzner [30] proposed an approach where the
program control flow is encoded into a model
along with either the concrete semantics of a sim-
ple model of instruction cache, or an abstrac-
tion thereof. The WCET bound is obtained by
binary search, with each test performed using
the VIS model-checker14. Huber and Schoeberl
[22] proposed a similar approach with the model-
checker UPPAAL.15 In both cases, the functional
semantics are however not encoded, save for loop
bounds: branches are chosen nondeterministi-
cally, and thus the analysis may consider infeasible
paths. Dalsgaard et al. [15] encode into UPPAAL
precise models of a pipeline, instruction cache and
data cache, but again the program is modeled as
“data insensitive”, meaning that infeasible paths
are not discarded except when exceeding a loop
bound.
Holsti [21] considers a loop (though the same
approach can also be applied to loop-free code):
the loop is sliced, keeping only instructions and
variables that affect control flow, and a global
“timing” counter T is added; the input-output re-
lation of the loop body is obtained as a formula in
linear integer arithmetic (Presburger arithmetic);
some form of acceleration is used to establish a
relation between T , some induction variables and
some inputs to the program. Applied to loop-
free programs, this method should give exactly the
same result as our approach. Its main weakness is
that representations of Presburger sets are notori-
ously expensive, whereas SMT scales up (the ex-
amples given in the cited article seem very small,
taking only a few lines and at most 1500 clock cycles
for the entire loop execution); also, the restriction
to Presburger arithmetic may exclude many
programs, though one can model constructs outside
of Presburger arithmetic by nondeterministic
choices. Its strong point is the ability to precisely
deal with loops, including those where the iteration
count affects which program fragments are
active.
14http://vlsi.colorado.edu/~vis/
15http://www.uppaal.org/
8 Extensions and Future Work
The “counter encoding” is best suited for code
portions that have a single entry and exit point
(in which case they express the timing difference
between entry and exit). In contrast, the “sum
encoding” may be applied to arbitrary subsets of
the code, which do not in fact need to be con-
nected in the control-flow graph. One may thus
use other heuristic criteria, such as usage of re-
lated variables.
A model based on worst-case execution times
for every block, to be reassembled into a global
worst-case execution time, may be too simplistic:
indeed, the execution time of a block may depend
on which blocks were executed beforehand, or, for
finer modeling, on the value of pointer variables
(for determining cache status).
A very general and tempting idea, as suggested
earlier in MDD-based model-checking [30], in symbolic
execution and bounded model checking by [9,
10], and in combined abstract interpretation and SAT-
solving [3], is to integrate in the same analysis both
the non-functional semantics (e.g. caches) and the
functional semantics; in our case, we would replace
both the micro-architectural analysis (or part of
it) and the path analysis by a single pass of opti-
mization modulo SMT. Because merely encoding
the functional semantics and a simplistic timing
model already led to intractable formulas, we de-
cided to postpone such micro-architectural mod-
eling until we had solved scalability issues. We in-
tend to integrate such non-functional aspects into
the SMT problem in future work.
Detailed modeling of the cache, pipeline, etc.
may be too expensive to compute beforehand and
encode into SMT. One alternative is to itera-
tively refine the model with respect to the cur-
rent “worst-case trace”: to each basic block one
attaches an upper bound on the worst-case execu-
tion time, and once a worst-case trace is obtained,
a trace analysis is run over it to derive stronger
constraints. These constraints can then be incor-
porated in the SMT encoding before searching for
a new longest path.
We have discussed obtaining a tight upper
bound on the worst-case execution time of the program
from upper bounds on the execution times
of the basic blocks. If using lower bounds on the
worst-case execution times of the basic blocks, one
may obtain a lower bound on the worst-case execu-
tion time of the program. Having both is useful to
gauge the amount of over-approximation incurred.
Also, by applying minimization instead of maxi-
mization, one gets bounds on best-case execution
time, which is useful for some scheduling applica-
tions [39].
On a more practical angle, our analysis is to
be connected to analyses both on the high level
specification (e.g. providing invariants) and on the
object code (micro-architectural timing analysis);
this poses engineering difficulties, because typical
compilation frameworks may not provide sufficient
traceability information.
Our requirement that the program should be
loop-free, or at least contain loops with small con-
stant bounds, can be relaxed through an approach
similar to that of Chu and Jaffar [10]: the body
of a loop can be summarized by its WCET, or
more precisely by some summaries involving the
cost variables and the scalar variables of the pro-
gram. Then, this entire loop can be considered as
a single block in an analysis of a larger program,
possibly with overapproximations in the WCET,
depending on how the summaries are produced.
9 Conclusion
We have shown that optimization using satisfi-
ability modulo theory (SMT) is a workable ap-
proach for bounding the worst-case execution time
of loop-free programs (or programs where loops
can be unrolled). To our knowledge, this is the
first time that such an approach was successfully
applied.
Our approach computes an upper bound on
the WCET, which may or may not be the ac-
tual WCET. The sources of discrepancy are 1) the
microarchitectural analysis (e.g. the cache analy-
sis does not know whether an access is a hit or
a miss), 2) the composition of WCET for basic
blocks into WCET for the program, which may
lose dependencies on execution history16, 3) the
encoding of the program into SMT, which may be
imprecise (e.g. unsupported constructs replaced
by nondeterministic choices).
We showed that straightforward encodings of
WCET problems into SMT yield problems in-
tractable by all current production-grade SMT-
solvers (“diamond formulas”), and how to work
around this issue using a clever encoding. We
believe this approach can be generalized to other
properties, and lead to fruitful interaction between
modular abstraction and SMT-solving.
From a practical point of view, our approach
integrates with any SMT solver without any mod-
ification, which makes it convenient for efficient
and robust implementation. It could also inte-
grate various simple analyses for introducing other
relevant cuts.
While our redundant encoding brings stagger-
ing improvements in analysis time, allowing for-
merly intractable problems to be solved under one
minute, the improvements in the WCET upper
bound brought by the elimination of infeasible
paths depend on the structure of the program being
analyzed. The improvement on the WCET
bound of some industrial examples (18%, 55%, ...)
is impressive, in a field where improvements are
often of a few percent. This means that, at least
for certain classes of programs, it is necessary to
take infeasible paths into account. At present,
certain industries avoid using formal verification
for WCET because it has a reputation for giving
overly pessimistic over-estimates; it seems likely
that some of this over-estimation arises from infeasible
paths.
16This does not apply to some simple microcontroller architectures,
without cache or pipeline states, e.g. Atmel
AVR™ and Freescale™ HCS12.
Our approach to improving bounds on WCET
blends well with other WCET analyses. It can be
coupled with an existing micro-architectural anal-
ysis, or part of that analysis may be integrated
into our approach. It can be combined with precise,
yet less scalable analyses [24, 21] to summarize
inner loops; it may also itself be used as a way
to summarize the WCET of a portion of a larger
program.
References
[1] Mihail Asavoae, Claire Maiza, and Pascal
Raymond. “Program Semantics in Model-
Based WCET Analysis: A State of the Art
Perspective”. In: WCET 2013. Ed. by Claire
Maiza. Vol. 30. OASICS. Schloss Dagstuhl
- Leibniz-Zentrum fuer Informatik, 2013,
pp. 32–41.
[2] Clément Ballabriga et al. “OTAWA: An
Open Toolbox for Adaptive WCET Anal-
ysis”. In: SEUS. Vol. 6399. LNCS. Springer,
2010, pp. 35–46.
[3] Abhijeet Banerjee, Sudipta Chattopadhyay,
and Abhik Roychoudhury. “Precise micro-
architectural modeling for WCET analysis
via AI+SAT”. In: IEEE Real-Time and Em-
bedded Technology and Applications Sym-
posium (RTAS). IEEE Computer Society,
2013, pp. 87–96.
[4] Armin Biere et al. “The Auspicious Couple:
Symbolic Execution and WCET Analy-
sis”. In: WCET. Vol. 30. OASIcs. IBFI
Schloss Dagstuhl, 2013, pp. 53–63. url:
http://drops.dagstuhl.de/opus/volltexte/2013/4122.
[5] Nikolaj Bjørner, Bruno Dutertre, and
Leonardo de Moura. Accelerating lemma
learning using joins - DPLL(⊔). Appeared
as short paper in LPAR 2008, outside of pro-
ceedings. 2008.
[6] Cristian Cadar and Koushik Sen. “Sym-
bolic Execution for Software Testing: Three
Decades Later”. In: Commun. ACM 56.2
(Feb. 2013), pp. 82–90.
[7] Paul Caspi, Pascal Raymond, and Stavros
Tripakis. “Synchronous Programming”. In:
Handbook of Real-Time and Embedded Sys-
tems. Chapman & Hall / CRC, 2008.
Chap. 14.
[8] Sagar Chaki and James Ivers. “Software
model checking without source code”.
English. In: Innovations in Systems
and Software Engineering 6.3 (2010),
pp. 233–242. issn: 1614-5046. doi:
10.1007/s11334-010-0125-0.
[9] Sudipta Chattopadhyay and Abhik Roy-
choudhury. “Scalable and precise refinement
of cache timing analysis via path-sensitive
verification”. In: Real-Time Systems 49.4
(2013), pp. 517–562.
[10] Duc-Hiep Chu and Joxan Jaffar. “Sym-
bolic simulation on complicated loops for
WCET Path Analysis”. In: EMSOFT. 2011,
pp. 319–328. isbn: 978-1-4503-0714-7.
doi: 10.1145/2038642.2038692.
[11] Lucas Cordeiro, Bernd Fischer, and João
Marques-Silva. “SMT-Based Bounded
Model Checking for Embedded ANSI-C
Software”. In: IEEE Trans. Software Eng.
38.4 (2012), pp. 957–974.
[12] Scott Cotton. “On Some Problems in Satisfiability
Solving”. PhD thesis. Grenoble: Université
Joseph Fourier, 2009.
[13] Scott Cotton. “Natural Domain SMT: A
Preliminary Assessment”. In: FORMATS.
Vol. 6246. LNCS. Springer, 2010, pp. 77–91.
[14] Patrick Cousot et al. “The Astrée Analyzer”.
In: ESOP. Vol. 3444. LNCS.
Springer, 2005, pp. 21–30.
[15] Andreas Dalsgaard et al. “METAMOC:
Modular Execution Time Analysis us-
ing Model Checking”. In: WCET. 2010,
pp. 113–123.
[16] Jakob Engblom and Bengt Jonsson. “Pro-
cessor Pipelines and Their Properties for
Static WCET Analysis”. In: EMSOFT.
Vol. 2491. LNCS. Springer, 2002, pp. 334–
348.
[17] Jan Gustafsson et al. “Automatic Deriva-
tion of Loop Bounds and Infeasible Paths
for WCET Analysis Using Abstract Execu-
tion”. In: RTSS. 2006.
[18] Jan Gustafsson et al. “The Mälardalen
WCET Benchmarks – Past, Present and Fu-
ture”. In: WCET. Vol. 15. OASICS. IBFI
Schloss Dagstuhl, 2010, pp. 136–146.
[19] Christopher Healy and David Whalley. “Au-
tomatic detection and exploitation of branch
constraints for timing analysis”. In: IEEE
Trans. on Software Engineering 28.8 (Aug.
2002).
[20] Julien Henry, David Monniaux, and
Matthieu Moy. “PAGAI: A Path Sensitive
Static Analyser”. In: Electr. Notes Theor.
Comput. Sci. 289 (2012), pp. 15–25.
[21] Niklas Holsti. “Computing time as a
program variable: a way around infea-
sible paths”. In: WCET. Vol. 08003.
Dagstuhl Seminar Proceedings. IBFI Schloss
Dagstuhl, 2008.
[22] Benedikt Huber and Martin Schoeberl.
“Comparison of Implicit Path Enumera-
tion and Model Checking Based WCET
Analysis”. In: WCET. Vol. 10. OA-
SICS. IBFI Schloss Dagstuhl, 2009. url:
http://drops.dagstuhl.de/opus/volltexte/2009/2281.
[23] Bach Khoa Huynh, Lei Ju, and Abhik Roy-
choudhury. “Scope-Aware Data Cache Anal-
ysis for WCET Estimation”. In: IEEE Real-
Time and Embedded Technology and Appli-
cations Symposium. 2011, pp. 203–212.
[24] Jens Knoop, Laura Kovács, and Jakob
Zwirchmayr. “WCET squeezing: on-demand
feasibility refinement for proven precise
WCET-bounds”. In: RTNS. 2013, pp. 161–
170.
[25] Daniel Kroening and Ofer Strichman. Deci-
sion Procedures. Springer, 2008.
[26] Chris Lattner and Vikram S. Adve. “LLVM:
A Compilation Framework for Lifelong Pro-
gram Analysis & Transformation”. In: CGO.
IEEE Computer Society, 2004, pp. 75–88.
[27] Xianfeng Li et al. “Chronos: A timing an-
alyzer for embedded software”. In: Science
of Computer Programming 69.1–3 (2007),
pp. 56–67.
[28] Yau-Tsun Steven Li and Sharad Malik.
“Performance analysis of embedded software
using implicit path enumeration”. In: IEEE
Trans. on Computer-Aided Design of Inte-
grated Circuits and Systems 16.12 (1997),
pp. 1477–1487.
[29] Kenneth L. McMillan, Andreas Kuehlmann,
and Mooly Sagiv. “Generalizing DPLL to
Richer Logics”. In: CAV. Vol. 5643. LNCS.
Springer, 2009, pp. 462–476.
[30] Alexander Metzner. “Why Model Checking
Can Improve WCET Analysis”. In: CAV.
2004, pp. 334–347.
[31] Leonardo Mendonça de Moura and Nikolaj
Bjørner. “Z3: An Efficient SMT Solver”. In:
TACAS. Vol. 4963. LNCS. Springer, 2008,
pp. 337–340.
[32] Leonardo Mendonça de Moura and Dejan
Jovanovic. “A Model-Constructing Satisfi-
ability Calculus”. In: VMCAI. Vol. 7737.
LNCS. Springer, 2013, pp. 1–12.
[33] Hemendra Negi, Abhik Roychoudhury, and
Tulika Mitra. “Simplifying WCET Analy-
sis By Code Transformations”. In: WCET.
2004.
[34] Fadia Nemer et al. “PapaBench: a Free Real-
Time Benchmark”. In: WCET. Vol. 4. OA-
SICS. IBFI Schloss Dagstuhl, 2006.
[35] Jan Reineke. “Caches in WCET Analysis:
Predictability - Competitiveness - Sensitiv-
ity”. PhD thesis. University of Saarland,
2009.
[36] Roberto Sebastiani and Silvia Tomasi.
“Optimization in SMT with LA(Q) Cost
Functions”. In: IJCAR. Vol. 7364. LNCS.
Springer, 2012, pp. 484–498.
[37] Jean Souyris et al. “Formal Verifica-
tion of Avionics Software Products”.
In: Formal Methods (FM). Ed. by
Ana Cavalcanti and Dennis Dams.
Vol. 5850. LNCS. Springer, 2009, pp. 532–
546. isbn: 978-3-642-05088-6. doi:
10.1007/978-3-642-05089-3_34.
[38] Henrik Theiling, Christian Ferdinand, and
Reinhard Wilhelm. “Fast and Precise
WCET Prediction by Separated Cache and
Path Analyses”. In: Int. J. of Time-Critical
Computing Systems 18 (2000), pp. 157–179.
[39] Reinhard Wilhelm. “Determining Bounds
on Execution Times”. In: Handbook on Em-
bedded Systems. CRC Press, 2006. Chap. 14.
[40] Reinhard Wilhelm et al. “The worst-case
execution-time problem - overview of meth-
ods and survey of tools”. In: ACM Trans.
Embedded Comput. Syst. 7.3 (2008).
[41] Wankang Zhao et al. “Improving WCET by
applying worst-case path optimizations”. In:
Real-Time Systems 34.2 (2006), pp. 129–
152.
