Sound and Efficient WCET Analysis in the Presence of Timing Anomalies by Reineke, Jan & Sen, Rathijit
SOUND AND EFFICIENT WCET ANALYSIS
IN THE PRESENCE OF TIMING ANOMALIES¹
Jan Reineke² and Rathijit Sen³
Abstract
Worst-Case-Execution-Time (WCET) analysis computes upper bounds on the execution time of a pro-
gram on a given hardware platform. Abstractions employed for static timing analysis can lead to
non-determinism that may require the analyzer to evaluate an exponential number of choices even
for straight-line code. Pruning the search space is potentially unsafe because of “timing anomalies”
where local worst-case choices may not lead to the global worst-case scenario. In this paper we
present an approach towards more efficient WCET analysis that uses precomputed information to
safely discard analysis states.
1. Introduction
Embedded systems as they occur in application domains such as automotive, aeronautics, and indus-
trial automation often have to satisfy hard real-time constraints. Timeliness of reactions is absolutely
necessary. Off-line guarantees on the worst-case execution time of each task have to be derived using
safe methods.
Static worst-case execution time (WCET) tools employ abstract models of the underlying hardware
platforms. Such models abstract away from parts of the hardware that do not influence the timing
of instructions, such as the values of registers. In addition to the abstraction of data, such models
further abstract components such as branch predictors and caches, which have very large state spaces.
Good abstractions of hardware components can predict their precise concrete behavior most of the
time. However, in general, abstractions introduce non-determinism: whenever the abstract model
cannot determine the value of a condition, it has to consider all possibilities, to cover any possible
concrete behavior.
Considering all possibilities can make timing analysis very expensive. Even on straight-line code the
timing analyzer might have to consider an exponential number of possibilities in the length of the
path. Often, the different possibilities can be intuitively classified as local worst- or local best-cases:
e.g. cache miss vs. cache hit, pipeline stall or not, branch misprediction or not, etc. In such cases, it
is tempting to only follow the local worst-case if one is interested in the WCET. Unfortunately, due
to timing anomalies [12, 15] this is not always sound. A timing anomaly is a situation where the
local worst-case does not entail the global worst-case. For instance, a cache miss (the local
worst-case) may result in a shorter execution time than a cache hit because of scheduling effects. See
Figure 1 for an example. Shortening task A leads to a longer overall schedule, because task B can
now block the “more” important task C, which may only run on Resource 2. Analogously, there are
cases where shortening a task leads to an even greater decrease in the overall schedule. Reineke et
al. [15] formally define timing anomalies and give a set of examples for timing anomalies involving
caches and speculation.
¹ This work has profited from discussions within the ARTIST2 Network of Excellence. It is supported by the German
Research Foundation (DFG) as part of SFB/TR AVACS and by the European Community’s Seventh Framework Programme
FP7/2007-2013 under grant agreement n° 216008 (Predator).
² Universität des Saarlandes, Saarbrücken, Germany, reineke@cs.uni-saarland.de
³ University of Wisconsin, Madison, USA, rathijit@cs.wisc.edu
ECRTS 2009
9th International Workshop on Worst-Case Execution Time (WCET) Analysis
http://drops.dagstuhl.de/opus/volltexte/2009/2289
[Figure 1. Scheduling Anomaly. Two schedules of tasks A, B, C, D, and E on Resource 1 and Resource 2.]
In the presence of timing anomalies it is unsound to simply discard non-local-worst-cases. We assume
that most if not all modern processors exhibit timing anomalies and that there is little hope of
constructing processors with good performance that do not exhibit them. To obtain an efficient timing
analysis, we would still like to discard non-local-worst-case states, yet in a sound way. Our idea is to
perform a precomputation on the abstract model of the hardware that is used for timing analysis. The
precomputation computes for each pair of analysis states a bound on the future difference in timing
between the two states. This precomputation might be very expensive, but it only has to be done once
for a particular abstract hardware model. Timing analyses using this hardware model can then safely
discard analysis states based on the precomputed differences, even in the presence of timing anomalies.
Assuming that timing anomalies occur rarely and can be excluded using the precomputation most of the
time, the proposed analysis will be much more efficient than previous exhaustive methods.
2. Related Work
Graham [7] shows that a greedy scheduler can produce a longer schedule, if provided with shorter
tasks, fewer dependencies, more processors, etc. Graham also gives bounds on these effects, which
are known as scheduling anomalies today.
Lundqvist & Stenström first introduced timing anomalies in the sense relevant for timing analysis. In
[12] they give an example of a cache miss resulting in a shorter execution time than a cache hit. A
timing anomaly is characterized as a situation where a positive (negative) change of the latency of the
first instruction results in a global decrease (increase) of the execution time of a sequence of instruc-
tions. Situations where the local effect is even accelerated are also considered timing anomalies, i.e.
the global increase (decrease) of the execution time is greater than the local change.
In his PhD thesis [3] and a paper with Jonsson [4], Engblom also discusses timing anomalies. He
translates the notion of timing anomalies of the Lundqvist/Stenström paper [12] to his model by
assuming that single pipeline stages take longer, in contrast to whole instructions. Both Lundqvist
and Engblom claim that, in processors that contain in-order resources only, no timing anomalies can
occur. This is not always true unfortunately, as corrected in Lundqvist’s thesis [11]. Schneider [17]
and Wenzel et al. [21] note that if there exist several resources that have overlapping, but unequal
capabilities, timing anomalies can also occur.
Reineke et al. [15] provide the first formal definition of timing anomalies. The definition of timing
anomalies in this paper is a slight relaxation of that of [15]. In recent work, Kirner et al. [9] introduce
the notion of parallel timing anomalies. Such anomalies arise if a WCET analysis is split into the
analysis of several hardware components that operate in parallel. Depending on how the analysis
results are combined the resulting WCET estimate may be unsafe. The authors identify conditions
that guarantee the safety of two combination methods. This work is orthogonal to ours.
3. Static WCET Analysis Framework
Over the last several years, a more or less standard architecture for static timing-analysis tools has
emerged [8, 19, 5]. Figure 2 gives a general view on this architecture. One can distinguish three
major building blocks:
1. Control-flow reconstruction and static analyses for control and data flow.
2. Micro-architectural analysis, which computes upper and lower bounds on execution times of
basic blocks.
3. Global bound analysis, which computes upper and lower bounds for the whole program.
[Figure 2. Main components of a timing-analysis framework and their interaction. Boxes shown: Binary
Executable, CFG Reconstruction, Control-flow Graph, Loop Bound Analysis, Value Analysis, Control-flow
Analysis, Annotated CFG, Micro-architectural Analysis, Basic Block Timing Info, Global Bound Analysis.
The legend distinguishes data from phases.]
Micro-architectural analysis [6, 11, 3, 20, 10, 16] determines bounds on the execution time of basic
blocks, taking into account the processor’s pipeline, caches, and speculation concepts. Static cache
analyses determine safe approximations to the contents of caches at each program point. Pipeline
analysis analyzes how instructions pass through the pipeline accounting for occupancy of shared
resources like queues, functional units, etc. Ignoring these average-case-enhancing features would
result in very imprecise bounds.
In this paper, we consider micro-architectural analyses that are based on abstractions of the concrete
hardware and that propagate abstract states through the control-flow-graph [3, 20, 6]. In such analyses,
the micro-architectural analysis is the most expensive part of the WCET analysis. Due to the
introduction of non-determinism by abstraction and the possibility of timing anomalies it is necessary
to consider very many cases. For complex architectures, this may yield hundreds of millions of
analysis states, and may thus be very memory and time intensive.
In the following, we will describe a simple, formal model of such a micro-architectural analysis.
Later, we will describe how this model can be extended to enable more efficient analyses.
3.1. A formal model of micro-architectural analysis
Micro-architectural analysis determines bounds on the execution times of basic blocks. Using an
abstract model of the hardware, it “simulates” the possible behavior of the hardware in each cycle.
Due to uncertainty about the concrete state of the hardware or about its inputs, abstract models are—
unlike usual cycle-accurate simulators—non-deterministic. Therefore, an abstract state may have
several possible successor states.
[Figure 3. Pipelined execution of several instructions and association of instructions with execution
cycles (bottom). Instructions I1 to I5 pass through the pipeline stages IF, ID, EX, MEM, and WB over
cycles 0 to 10.]
Cycle semantics. Let State be the set of states of the abstract hardware model, and let Prog be
the set of programs that can be executed on the hardware. Then the cycle semantics can be formally
modeled by the following function:
cycle : State× Prog → P(State).
It takes an abstract state and a program and computes the set of possible successor states. Such an
abstract hardware model can be proved correct by showing that it is an abstract interpretation [2] of
the concrete hardware model. To bound the execution time of a basic block, one needs to associate
execution cycles with instructions in the program. In pipelined processors several instructions are
executed simultaneously, see Figure 3. There is no canonical association of execution cycles to in-
structions. One of several possibilities is to associate a cycle with the last instruction that was fetched.
This is exemplified in the lower part of Figure 3.
Instruction semantics. Based on such a connection between execution cycles and instructions,
one can lift the cycle semantics to an instruction semantics. This instruction semantics takes an
abstract hardware state and an instruction that is to be fetched and computes the set of possible abstract
hardware states until the next instruction can be fetched. Each of the resulting abstract hardware states
is associated with the number of cycles to reach this state:
execI : State× I → P(State× N).
For a formalization of the connection of the two semantics, see [20]. As an example of the two
semantics in the analysis of the execution of a basic block, see Figure 4.
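Instruction semantics can be easily lifted to the execution of basic blocks, which are simply sequences
of instructions in $I^*$:
$$\mathit{exec}_{BB} : \mathit{State} \times I^* \to \mathcal{P}(\mathit{State} \times \mathbb{N})$$
$$\mathit{exec}_{BB}(s, \epsilon) := \{(s, 0)\}$$
$$\mathit{exec}_{BB}(s, \iota_0 \ldots \iota_n) := \{(s'', t' + t'') \mid (s', t') \in \mathit{exec}_I(s, \iota_0) \wedge (s'', t'') \in \mathit{exec}_{BB}(s', \iota_1 \ldots \iota_n)\}$$
As a shortcut, we will also write $s \xrightarrow[\iota]{t'} s'$ for $(s', t') \in \mathit{exec}_I(s, \iota)$ and similarly
$s \xrightarrow[\iota_0 \ldots \iota_n]{t'} s'$ for $(s', t') \in \mathit{exec}_{BB}(s, \iota_0 \ldots \iota_n)$.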
[Figure 4. Cycle semantics and instruction semantics of a basic block.]
Bounds on the execution times of basic blocks. Given an instruction semantics on a finite set of
states State, one can compute bounds on the execution times of the basic blocks of a program. To that
end, one can compute the set of abstract states that might reach the start of each basic block through
a fixed-point iteration. For each of the states that might reach a basic block, the instruction semantics
can then be used to compute the maximum execution time of the block starting in that state:
$$\max(s, \iota_0 \ldots \iota_n) := \max\{t \mid s \xrightarrow[\iota_0 \ldots \iota_n]{t} s'\}. \qquad (1)$$
Finally, the maximum of these times for each state reaching a basic block is an upper bound on the
execution time of that basic block in the given program. Analogously to the computation of upper
bounds one can also compute lower bounds on the execution time of a basic block by the min function:
$$\min(s, \iota_0 \ldots \iota_n) := \min\{t \mid s \xrightarrow[\iota_0 \ldots \iota_n]{t} s'\}. \qquad (2)$$
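
The lifting from instructions to basic blocks and the bounds of Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the three abstract states, the two instructions, and all latencies in `EXEC_I` are made up for the example and do not come from any real hardware model.

```python
from typing import Dict, Sequence, Set, Tuple

State = str
Instr = str

# Hypothetical toy instruction semantics exec_I : State x I -> P(State x N).
# In state "unknown" the analysis cannot tell whether a load hits the cache,
# so it has to follow both successors.
EXEC_I: Dict[Tuple[State, Instr], Set[Tuple[State, int]]] = {
    ("unknown", "load"): {("hit", 1), ("miss", 3)},
    ("hit", "load"): {("hit", 1)},
    ("miss", "load"): {("hit", 1)},
    ("unknown", "add"): {("unknown", 1)},
    ("hit", "add"): {("hit", 1)},
    ("miss", "add"): {("miss", 2)},
}


def exec_bb(s: State, block: Sequence[Instr]) -> Set[Tuple[State, int]]:
    """Lift exec_I to a sequence of instructions (exec_BB in the text)."""
    if not block:
        # The empty sequence leaves the state unchanged and takes 0 cycles.
        return {(s, 0)}
    return {
        (s2, t1 + t2)
        for s1, t1 in EXEC_I[(s, block[0])]
        for s2, t2 in exec_bb(s1, block[1:])
    }


def max_time(s: State, block: Sequence[Instr]) -> int:
    """Equation (1): largest execution time of the block starting in s."""
    return max(t for _, t in exec_bb(s, block))


def min_time(s: State, block: Sequence[Instr]) -> int:
    """Equation (2): smallest execution time of the block starting in s."""
    return min(t for _, t in exec_bb(s, block))


if __name__ == "__main__":
    block = ["load", "add"]
    print(exec_bb("unknown", block))
    print(max_time("unknown", block), min_time("unknown", block))
```

The recursion enumerates every path, which is exactly the exponential blow-up that motivates discarding states.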
4. Timing Anomalies, Domino Effects, and How to Safely Discard States
The approach described in the previous section can be very expensive. Due to non-determinism
introduced by abstraction, the set of states to be considered can be extremely large. It is therefore
tempting to discard states that are non local-worst-case, e.g., states resulting from a cache hit, if both
a cache hit and a cache miss may happen.
4.1. Timing anomalies
Unfortunately, due to so-called timing anomalies [12] this is not always sound. The following is a
definition of timing anomalies that is slightly relaxed compared with that of [15].
Definition 1 (Timing anomaly). An instruction semantics has a timing anomaly if there exists a
sequence of instructions $\iota_0 \iota_1 \ldots \iota_n \in I^*$, and an abstract state $s \in \mathit{State}$, such that
• there are states $s_1, s_2 \in \mathit{State}$, with $s \xrightarrow[\iota_0]{t_1} s_1$ and $s \xrightarrow[\iota_0]{t_2} s_2$, and $t_1 < t_2$, such that
• $t_1 + \max(s_1, \iota_1 \ldots \iota_n) > t_2 + \max(s_2, \iota_1 \ldots \iota_n)$.
In an instruction semantics with timing anomalies it is unsound to discard a non-local-worst-case
state (like s1 in the definition). The execution of non-local-worst-case states may be so much slower
than the execution of local-worst-case states, that the non-local-worst-case state yields the global
worst-case timing. Experience shows that reasonable abstract instruction semantics for most modern
processors exhibit timing anomalies.
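
To make Definition 1 concrete, the following Python sketch checks the two conditions for one given start state and one given instruction sequence. The toy semantics in `EXEC_I` (states `s`, `s1`, `s2`, `end` and their latencies) is invented for illustration. A full check of the definition would have to quantify over all states and all sequences, which this sketch does not attempt.

```python
from itertools import product
from typing import Dict, Sequence, Set, Tuple

# Hypothetical instruction semantics: the first instruction i0 has a fast
# successor s1 (1 cycle) and a slow successor s2 (3 cycles), but the fast
# successor is much slower on the following instruction i1.
EXEC_I: Dict[Tuple[str, str], Set[Tuple[str, int]]] = {
    ("s", "i0"): {("s1", 1), ("s2", 3)},
    ("s1", "i1"): {("end", 6)},
    ("s2", "i1"): {("end", 2)},
}


def max_time(s: str, seq: Sequence[str]) -> int:
    """Worst-case time of seq starting in s (Equation (1))."""
    if not seq:
        return 0
    return max(t + max_time(s2, seq[1:]) for s2, t in EXEC_I[(s, seq[0])])


def is_anomalous(s: str, seq: Sequence[str]) -> bool:
    """Check the two conditions of Definition 1 for this s and this seq only."""
    first, rest = seq[0], seq[1:]
    successors = EXEC_I[(s, first)]
    return any(
        t1 < t2 and t1 + max_time(s1, rest) > t2 + max_time(s2, rest)
        for (s1, t1), (s2, t2) in product(successors, repeat=2)
    )


if __name__ == "__main__":
    print(is_anomalous("s", ["i0", "i1"]))
```

In this toy model the locally faster successor `s1` ends up with 1 + 6 = 7 cycles, while the locally slower `s2` ends up with 3 + 2 = 5 cycles, so discarding `s1` would underestimate the worst case.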
4.2. How to safely discard analysis states
Our approach is to precompute a function ∆ : State × State → N∞, where N∞ = N ∪ {∞}, that
bounds the maximal difference in worst-case timing between the two states on any possible instruction
sequence. Sometimes the difference in timing between two states cannot be bounded by any constant.
That is why we augment N with ∞. The order of natural numbers is lifted to N∞ by adding ∞ ≥ n
for all n ∈ N∞.
Definition 2 (Valid ∆). A ∆ function is valid, if for all pairs of states s1, s2 ∈ State and for all
instruction sequences ι0 . . . ιn ∈ I∗:
∆(s1, s2) ≥ max(s1, ι0 . . . ιn)−max(s2, ι0 . . . ιn).
Given such a ∆ function it is possible to safely discard analysis states. If the analysis encounters two
states s1 and s2 with execution times t1 and t2, respectively, it may discard s2 if t1 − t2 ≥ ∆(s2, s1)
and s1 if t2 − t1 ≥ ∆(s1, s2). In these cases, the discarded state can never overtake the other state.
So ∆ can be used to locally exclude the occurrence of a timing anomaly. It is expected that this is
often the case, as timing anomalies are not the common case. This optimization does not influence
the precision of the analysis.
Even if t1 − t2 < ∆(s2, s1), one can safely discard s2, by adding the penalty ∆(s2, s1) − (t1 − t2)
to t1. For the resulting t′1 = t1 + ∆(s2, s1) − (t1 − t2), it holds that t′1 − t2 ≥ ∆(s2, s1). In contrast
to the first optimization, this of course comes at the price of decreased precision. This optimization
offers a way of trading precision for efficiency.
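
The two discarding rules can be sketched as a small Python helper. The sketch assumes that a valid ∆ function is already available as a lookup. The table `DELTA`, the state names, and the function names are hypothetical and only serve the example.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical precomputed bounds. Pairs missing from the table are treated
# as unbounded (infinity), and a state never overtakes itself.
DELTA: Dict[Tuple[str, str], float] = {("a", "b"): 2, ("b", "a"): 0}


def delta(x: str, y: str) -> float:
    """Bound on max(x, seq) - max(y, seq) over all instruction sequences."""
    if x == y:
        return 0
    return DELTA.get((x, y), math.inf)


def prune_pair(
    s1: str, t1: float, s2: str, t2: float,
    delta_fn: Callable[[str, str], float],
    allow_penalty: bool = False,
) -> List[Tuple[str, float]]:
    """Return the (state, time) pairs the analysis still has to follow."""
    if t1 - t2 >= delta_fn(s2, s1):
        return [(s1, t1)]  # s2 can never overtake s1
    if t2 - t1 >= delta_fn(s1, s2):
        return [(s2, t2)]  # s1 can never overtake s2
    if allow_penalty and delta_fn(s2, s1) != math.inf:
        # Trade precision for efficiency: raise t1 until s2 can be dropped.
        penalty = delta_fn(s2, s1) - (t1 - t2)
        return [(s1, t1 + penalty)]
    return [(s1, t1), (s2, t2)]


if __name__ == "__main__":
    print(prune_pair("a", 10, "b", 9, delta))
    print(prune_pair("a", 5, "b", 6, delta))
    print(prune_pair("a", 5, "b", 6, delta, allow_penalty=True))
```

The first two branches lose no precision. The penalty branch always keeps `s1`, which is an arbitrary choice made here for brevity; an analysis could equally compare both directions and keep the state with the smaller penalty.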
4.3. Domino effects
Unfortunately, the difference in timing between two states cannot always be bounded by a constant.
This situation is known as a domino effect¹.
Definition 3 (Domino effect). An instruction semantics has a domino effect if there are two states
s1, s2 ∈ State, such that for each ∆ ∈ N there is a sequence of instructions ι0 . . . ιn ∈ I∗, such that
max(s1, ι0 . . . ιn)−max(s2, ι0 . . . ιn) ≥ ∆.
In other words, there are two states whose timing may arbitrarily diverge. Such effects are known to
exist in pipelines [17] and caches [1, 14]. If such domino effects exist for many pairs of states, valid
∆ functions will often have value ∞. In that case, the ∆ function is rather useless; it can rarely be
used to discard states.
¹ Domino effects are also known as unbounded timing effects [11].
Although the difference in execution times may not be bounded by a constant, the ratio between the
two execution times is always bounded, i.e., $\frac{\max(s_1, \iota_0 \ldots \iota_n)}{\max(s_2, \iota_0 \ldots \iota_n)} < \rho$ for some constant $\rho$. An alternative to
computing ∆ functions is to compute two functions $\rho : \mathit{State} \times \mathit{State} \to \mathbb{Q}$ and $\delta : \mathit{State} \times \mathit{State} \to \mathbb{Q}$
that together bound the maximal difference between the two states in the following way:
Definition 4 (Valid ρ and δ functions). A pair of functions ρ and δ is valid, if for all pairs of states
s1, s2 ∈ State and instruction sequences ι0 . . . ιn ∈ I∗ :
max(s1, ι0 . . . ιn) ≤ ρ(s1, s2) ·max(s2, ι0 . . . ιn) + δ(s1, s2).
If there are no domino effects, there are valid ρ and δ functions in which ρ is 1 everywhere. In that
case, δ is valid in the sense of Definition 2. Otherwise, ρ and δ can still be used to safely discard
states: Say the analysis wants to discard a state s1 with execution time t1, but keep another state s2
with execution time t2, s.t. ρ(s1, s2) = 1.05 and δ(s1, s2) = 5. Then, the analysis could discard s1,
but remember to multiply the future execution time of s2 by 1.05. Also, similarly to the case of ∆
functions, if t2 − t1 < 5, it would have to add 5− (t2 − t1) to t2.
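
The bookkeeping in this example can be written down as a short Python sketch. The function names and the sample times (100 and 102 cycles so far, 40 cycles of future execution) are assumptions made for illustration; only ρ = 1.05 and δ = 5 are taken from the text. The sketch also assumes ρ ≥ 1, so that scaling never lowers the bound of the state that is kept.

```python
from fractions import Fraction
from typing import Tuple, Union

Number = Union[int, Fraction]


def discard_with_ratio(
    t1: Number, t2: Number, rho: Fraction, delta: Number
) -> Tuple[Number, Fraction]:
    """Discard s1 and keep s2.

    Returns the adjusted time of s2 and the factor by which the future
    execution time of s2 has to be multiplied.
    """
    # If t2 - t1 < delta, add delta - (t2 - t1) to t2, i.e. lift t2 to t1 + delta.
    adjusted_t2 = max(t2, t1 + delta)
    return adjusted_t2, rho


def bound_after_discard(adjusted_t2: Number, rho: Fraction, future_s2: Number) -> Number:
    """Upper bound covering both states once the future time of s2 is known."""
    return adjusted_t2 + rho * future_s2


if __name__ == "__main__":
    rho = Fraction(21, 20)  # 1.05 as an exact rational
    t2_adj, factor = discard_with_ratio(t1=100, t2=102, rho=rho, delta=5)
    print(t2_adj, factor, bound_after_discard(t2_adj, factor, future_s2=40))
```

Exact rationals are used instead of floating point so that the bound is not affected by rounding, which matches the choice of Q as the codomain of ρ and δ.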
5. Computation of Valid ∆ Functions
Given an instruction semantics, how can one compute a valid ∆ function? Of course, one cannot simply
enumerate all sequences of instructions. However, it is possible to define a system of recursive
constraints whose solutions are valid ∆ functions. These recursive constraints correspond to the
execution of no instruction at all or to the execution of a single instruction. Therefore, a finite
number of constraints will suffice.
As ∆ needs to bound the difference in timing for all instruction sequences, it must do so in particular
for the empty sequence. This implies the constraints
∆(s1, s2) ≥ 0 (3)
for all s1, s2 ∈ State. Longer sequences can be covered by the recursive constraints
$$\Delta(s_1, s_2) \ge t'_1 - t'_2 + \Delta(s'_1, s'_2) \qquad (4)$$
for all $s_1, s_2, s'_1, s'_2 \in \mathit{State}$ such that $s_1 \xrightarrow[\iota]{t'_1} s'_1 \wedge s_2 \xrightarrow[\iota]{t'_2} s'_2$ for some $\iota$.
Theorem 1. Any solution ∆ to this set of constraints is a valid ∆ function.
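Proof. A ∆ function that satisfies the above constraints actually fulfills a stronger condition than
validity. The constraints guarantee that $\Delta(s_1, s_2) \ge \max(s_1, \iota_0 \ldots \iota_n) - \min(s_2, \iota_0 \ldots \iota_n)$ for all
$s_1, s_2 \in \mathit{State}$ and $\iota_0 \ldots \iota_n \in I^*$. Since $\min(s_2, \iota_0 \ldots \iota_n) \le \max(s_2, \iota_0 \ldots \iota_n)$, this implies validity.
Proof by induction over the length of the sequence $\iota_0 \ldots \iota_n$:
Base case: We have to show that $\Delta(s_1, s_2) \ge \max(s_1, \epsilon) - \min(s_2, \epsilon)$. Since $\max(s_1, \epsilon) - \min(s_2, \epsilon) = 0$,
this is trivially fulfilled by satisfaction of the $\Delta(s_1, s_2) \ge 0$ constraints.
Inductive step: We have to show that $\Delta(s_1, s_2) \ge \max(s_1, \iota_0 \ldots \iota_{n+1}) - \min(s_2, \iota_0 \ldots \iota_{n+1})$ given
that $\Delta(s_1, s_2) \ge \max(s_1, \iota_1 \ldots \iota_{n+1}) - \min(s_2, \iota_1 \ldots \iota_{n+1})$ for all $s_1, s_2 \in \mathit{State}$. The recursive constraints guarantee that
$$\Delta(s_1, s_2) \ge \max\{t'_1 - t'_2 + \Delta(s'_1, s'_2) \mid s_1 \xrightarrow[\iota_0]{t'_1} s'_1 \wedge s_2 \xrightarrow[\iota_0]{t'_2} s'_2\}.$$
Plugging the induction hypothesis (I.H.) for $\Delta(s'_1, s'_2)$ into this yields
$$\begin{aligned}
\Delta(s_1, s_2) &\overset{\text{I.H.}}{\ge} \max\{t'_1 - t'_2 + \max(s'_1, \iota_1 \ldots \iota_{n+1}) - \min(s'_2, \iota_1 \ldots \iota_{n+1}) \mid s_1 \xrightarrow[\iota_0]{t'_1} s'_1 \wedge s_2 \xrightarrow[\iota_0]{t'_2} s'_2\} \\
&\overset{\text{Eq. 1,2}}{=} \max\{t'_1 - t'_2 + t''_1 - t''_2 \mid s_1 \xrightarrow[\iota_0]{t'_1} s'_1 \wedge s'_1 \xrightarrow[\iota_1 \ldots \iota_{n+1}]{t''_1} s''_1 \wedge s_2 \xrightarrow[\iota_0]{t'_2} s'_2 \wedge s'_2 \xrightarrow[\iota_1 \ldots \iota_{n+1}]{t''_2} s''_2\} \\
&= \max\{t'_1 + t''_1 \mid s_1 \xrightarrow[\iota_0]{t'_1} s'_1 \wedge s'_1 \xrightarrow[\iota_1 \ldots \iota_{n+1}]{t''_1} s''_1\} - \min\{t'_2 + t''_2 \mid s_2 \xrightarrow[\iota_0]{t'_2} s'_2 \wedge s'_2 \xrightarrow[\iota_1 \ldots \iota_{n+1}]{t''_2} s''_2\} \\
&\overset{\text{Eq. 1,2}}{=} \max(s_1, \iota_0 \ldots \iota_{n+1}) - \min(s_2, \iota_0 \ldots \iota_{n+1})
\end{aligned}$$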
The lower the function value of ∆, the more states can be discarded using ∆. We are therefore
computing the least solution to the set of constraints.
Our constraints fall into the class of difference constraints. The least solution of a system of difference
constraints can be found by solving a shortest path problem [13]. To compute this efficiently, one can
use Tarjan’s algorithm with subtree disassembly and the FIFO selection rule [18]. Negative cycles in the
constraint graph correspond to domino effects. Tarjan’s algorithm allows for an efficient detection
and elimination of these cycles. The least solution for pairs of states on these cycles is ∞.
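
As an illustration of how constraints (3) and (4) determine a least solution, the following Python sketch builds the constraints from a toy instruction semantics and solves them by plain repeated relaxation in the style of Bellman-Ford. This is deliberately not Tarjan's algorithm with subtree disassembly; it is a naive stand-in that is easier to read. The function name `least_delta` and the toy semantics are assumptions of the example. Pairs whose bound keeps growing along a cycle of constraints are assigned infinity.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Semantics = Dict[Tuple[str, str], Set[Tuple[str, int]]]


def least_delta(
    states: Iterable[str], instrs: Iterable[str], exec_i: Semantics
) -> Dict[Pair, float]:
    """Least solution of constraints (3) and (4) by naive relaxation."""
    states, instrs = list(states), list(instrs)
    pairs: List[Pair] = [(a, b) for a in states for b in states]
    # An edge (p, w, q) encodes the constraint Delta(p) >= w + Delta(q).
    edges: List[Tuple[Pair, int, Pair]] = []
    for a, b in pairs:
        for i in instrs:
            for a2, ta in exec_i.get((a, i), ()):
                for b2, tb in exec_i.get((b, i), ()):
                    edges.append(((a, b), ta - tb, (a2, b2)))

    delta: Dict[Pair, float] = {p: 0 for p in pairs}  # constraint (3)
    for _ in range(len(pairs)):
        changed = False
        for p, w, q in edges:  # constraint (4)
            if w + delta[q] > delta[p]:
                delta[p] = w + delta[q]
                changed = True
        if not changed:
            return delta  # fixed point reached: all constraints hold

    # Still changing after |pairs| rounds: some bound grows along a cycle.
    # Mark every pair that can still be raised as unbounded and let the
    # marking spread to all pairs whose constraints depend on it.
    for _ in range(len(pairs)):
        for p, w, q in edges:
            if w + delta[q] > delta[p]:
                delta[p] = math.inf
    return delta


if __name__ == "__main__":
    bounded: Semantics = {("a", "i"): {("a", 2)}, ("b", "i"): {("a", 1)}}
    print(least_delta(["a", "b"], ["i"], bounded))
    diverging: Semantics = {("x", "i"): {("x", 4)}, ("y", "i"): {("y", 3)}}
    print(least_delta(["x", "y"], ["i"], diverging))
```

In the second toy model, state `x` always needs one cycle more per instruction than `y`, so no finite bound exists for the pair (`x`, `y`). The relaxation runs in time proportional to the number of pairs times the number of constraints, which is why a more refined shortest-path algorithm is preferable for models of realistic size.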
We have also implemented a method to compute valid ρ and δ functions, which makes it possible to bound
the impact of domino effects. However, due to space constraints, we cannot present the theoretical
background and the results of these computations here.
6. Case Study
We used the analysis technique described above on a processor model that includes a decoupled
fetch pipeline, multi-cycle functional units and variable memory access latencies. Figure 5 shows
a diagrammatic representation of this model. E0 through Ek represent functional units that execute
instructions. A particular instruction may be scheduled to be executed on any one of a subset of
the available functional units. The L/S unit is for load-store instructions only. Our model currently
does not consider delays due to reorder-buffer unavailability or bus contention cycles. However, stalls
due to data dependencies, functional unit occupancy, and limited fetch buffer space are considered.
Speculative execution is not modeled.
Specifications of instances of this model are used as inputs to our analysis engine that computes the
∆ function as described in Section 5. A simple architecture description language is used to describe
the processors. The size of the state space is reduced by exploiting symmetries in the specification.
For example, instructions are classified into types according to their execution latencies. The analyzer
first builds a transition system corresponding to this specification. Each state in the transition system
comprises the states of the functional units, fetch buffers, and fetch and data accesses in transit.
The edge between any two adjacent states S1 and S2 in this transition system is annotated with the
pair (t, I), which indicates a transition from state S1 to S2 after t cycles due to execution of instruction
I. Once we have built this transition system, we can construct the constraint system described in the
previous section in a straightforward way.
[Figure 5. Processor Architecture]
Figure 6 shows part of the transition system for a processor with a domino effect. In this processor,
functional unit E0 is optimized for instruction type I1 whereas E1 is optimized for I2. I1 needs 2
cycles to execute on E0, but 4 cycles on E1. The timing requirements for I2 are opposite. There are
2 fetch buffers that are drained in order, and the Icache hit time is 1 cycle. In the figure, Ij:m indicates
that instruction type Ij is in its mth cycle of execution on that functional unit. A shaded functional
unit indicates a new instruction dispatch (m=1) in that cycle. The figure shows that the timing difference
between paths starting from states S0 and S4 is 1 clock cycle per loop. The timing difference is thus
unbounded on the instruction sequence (I1.I2)∗. It is useful to use the cycle ratio (4/3) in this case.
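
A few lines of Python illustrate why the ratio remains informative where the difference does not. The per-loop costs of 4 and 3 cycles are an assumption made here: they are one reading that is consistent with the stated difference of 1 cycle per loop and the stated ratio of 4/3, and they are not taken from the figure itself.

```python
from fractions import Fraction
from typing import Tuple

SLOW_CYCLES_PER_LOOP = 4  # assumed cost of one (I1.I2) iteration on the slower path
FAST_CYCLES_PER_LOOP = 3  # assumed cost of one (I1.I2) iteration on the faster path


def path_times(iterations: int) -> Tuple[int, int]:
    """Execution times of both paths after the given number of loop iterations."""
    return SLOW_CYCLES_PER_LOOP * iterations, FAST_CYCLES_PER_LOOP * iterations


def difference_and_ratio(iterations: int) -> Tuple[int, Fraction]:
    """Timing difference and timing ratio between the two paths (iterations >= 1)."""
    slow, fast = path_times(iterations)
    return slow - fast, Fraction(slow, fast)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, difference_and_ratio(n))
```

Under this assumption the difference grows linearly with the number of iterations, so no constant ∆ can bound it, while the ratio stays fixed at 4/3 for every iteration count.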
The transition system for a simple processor example with 2 instruction types, 2 functional units,
execution times ranging from 2 to 6 cycles, 4 fetch buffers and no domino effects had 555 states and
generated 97340 constraints. The ∆ function ranged from 0 through 7, with 0s forming 88.1% of
the distribution. We have not yet experimented with full specifications of real processors. Our
preliminary investigations with scaled-down models of processors suggest that the constraint graphs
are usually sparse: the number of inequations for most of the toy models typically lies well within 0.1%
of the theoretical bound of |S|^4, where S is the set of states in which a new instruction is fetched.
7. Conclusions and Future Work
We presented a simple model of micro-architectural analysis which resembles several existing analyses.
Timing anomalies in abstract hardware models prevent sound and efficient micro-architectural
analysis. To enable efficient and yet sound analysis, we introduced ∆ functions, which make it possible
to safely prune analysis states even for architectures that exhibit timing anomalies. We have shown how
to compute valid ∆ functions for a hardware model and evaluated our approach on two example
architectures. We have also introduced valid ρ and δ functions, which make it possible to prune states
even in the presence of domino effects. Along the way, we arrived at a slightly simpler definition of
timing anomalies than in [15] and the first formal definition of domino effects.
In future work, we plan to apply our approach to real architectures, like the MOTOROLA POWERPC
75X or the ARM9, and evaluate the improvement in analysis efficiency. We have seen that valid ∆
functions also make it possible to trade precision for additional efficiency. It will be interesting to
study this trade-off on real architectures.
[Figure 6. Example with Domino Effects]
Acknowledgements. The authors would like to thank Daniel Grund and Claire Burguière for valuable
comments on drafts of this paper.
References
[1] BERG, C. PLRU cache domino effects. In WCET ’06 (2006), Schloss Dagstuhl, Germany.
[2] COUSOT, P., AND COUSOT, R. Building the Information Society. Kluwer Academic Publish-
ers, 2004, ch. Basic Concepts of Abstract Interpretation, pp. 359–366.
[3] ENGBLOM, J. Processor Pipelines and Static Worst-Case Execution Time Analysis. PhD thesis,
Uppsala University, 2002.
[4] ENGBLOM, J., AND JONSSON, B. Processor pipelines and their properties for static WCET
analysis. In EMSOFT’02 (London, UK, 2002), Springer-Verlag, pp. 334–348.
[5] ERMEDAHL, A. A Modular Tool Architecture for Worst-Case Execution Time Analysis. PhD
thesis, Uppsala University, 2003.
[6] FERDINAND, C., AND WILHELM, R. Efficient and precise cache behavior prediction for
real-time systems. Real-Time Systems 17, 2-3 (1999), 131–181.
[7] GRAHAM, R. L. Bounds on multiprocessing timing anomalies. SIAM Journal of Applied
Mathematics 17, 2 (1969), 416–429.
[8] HEALY, C. A., WHALLEY, D. B., AND HARMON, M. G. Integrating the timing analysis of
pipelining and instruction caching. In RTSS’95 (Dec. 1995), pp. 288–297.
[9] KIRNER, R., KADLEC, A., AND PUSCHNER, P. Precise worst-case execution time analysis
for processors with timing anomalies. In ECRTS’09 (July 2009).
[10] LI, X., ROYCHOUDHURY, A., AND MITRA, T. Modeling out-of-order processors for WCET
analysis. Real-Time Systems 34, 3 (November 2006), 195–227.
[11] LUNDQVIST, T. A WCET Analysis Method for Pipelined Microprocessors with Cache Memo-
ries. PhD thesis, Chalmers University of Technology, Sweden, June 2002.
[12] LUNDQVIST, T., AND STENSTRÖM, P. Timing anomalies in dynamically scheduled
microprocessors. In RTSS’99 (Washington, DC, USA, 1999), IEEE Computer Society.
[13] PRATT, V. R. Two easy theories whose combination is hard. Tech. rep., Massachusetts Institute
of Technology, 1977.
[14] REINEKE, J. Caches in WCET Analysis. PhD thesis, Universität des Saarlandes, Saarbrücken,
Germany, November 2008.
[15] REINEKE, J., WACHTER, B., THESING, S., WILHELM, R., POLIAN, I., EISINGER, J.,
AND BECKER, B. A definition and classification of timing anomalies. In WCET’06 (July
2006), Schloss Dagstuhl, Germany.
[16] ROCHANGE, C., AND SAINRAT, P. A context-parameterized model for static analysis of
execution times. Trans. on HiPEAC 2, 3 (2007), 109–128.
[17] SCHNEIDER, J. Combined Schedulability and WCET Analysis for Real-Time Operating
Systems. PhD thesis, Saarland University, Saarbrücken, Germany, December 2002.
[18] TARJAN, R. E. Shortest paths. Tech. rep., AT&T Bell Laboratories, Murray Hill, NJ, 1981.
[19] THEILING, H., FERDINAND, C., AND WILHELM, R. Fast and precise WCET prediction by
separated cache and path analyses. Real-Time Systems 18, 2/3 (May 2000), 157–179.
[20] THESING, S. Safe and Precise WCET Determination by Abstract Interpretation of Pipeline
Models. PhD thesis, Saarland University, Saarbrücken, Germany, 2004.
[21] WENZEL, I., KIRNER, R., PUSCHNER, P., AND RIEDER, B. Principles of timing anomalies
in superscalar processors. In Proc. 5th International Conference on Quality Software (Sep.
2005).
