Safe measurement-based WCET estimation
Jean-François Deverge and Isabelle Puaut
Université de Rennes 1 - IRISA
Campus Universitaire de Beaulieu
35042 Rennes Cedex, France
{Jean-Francois.Deverge|Isabelle.Puaut}@irisa.fr
Abstract
This paper explores the issues to be addressed to provide
safe worst-case execution time (WCET) estimation methods
based on measurements. We suggest using structural testing
for the exhaustive exploration of paths in a program.
Since test data generation is in general too complex to be
used in practice for most real-size programs, we propose
to generate test data for program segments only, using program
clustering. Moreover, to combine the execution times of program
segments into the WCET of the whole program, we advocate the
use of compiler techniques to reduce (ideally eliminate) the
timing variability of program segments and to make the execution
times of program segments independent of one another.
1. Motivation
Computing the WCET is an important issue for hard real-time
systems. Common approaches to WCET computation rely on static
analysis of program structures and on hardware models to produce
execution time estimates. Recent processors have
performance-enhancing features such as caches, branch predictors
or multiple-issue pipelines that maintain an internal state that
is difficult to predict. As a consequence, the hardware models are
becoming harder and harder to design [7], and the resulting
estimates are safe but pessimistic.
An alternative approach is to use measurements on real
hardware (or a cycle accurate simulator) to obtain WCET
estimates. However, exhaustive enumeration of all program
inputs is intractable for most programs. Heuristics such as
evolutionary algorithms [16] can be used to generate input test
data that may cover the worst-case path of the program. Although
they yield realistic WCET estimates, there is no guarantee that
the worst-case execution path of the program is actually measured.
Therefore, these methods have mostly been used only to increase
confidence in the results of static WCET analysis methods [13].
On the one hand, program testing may produce unsafe but
realistic results. On the other hand, static WCET analysis
approaches produce safe but pessimistic WCET estimates. Estimates
that are both safe and tight are highly desirable. Ideally, WCET
tools would produce such results without requiring the laborious
development of timing models for each new generation of processors.
This paper explores the issues to be addressed to design a
measurement-based method that produces safe results. We
propose to rely on structural testing [20] methods to gen-
erate input test data and to exhaustively measure the ex-
ecution time of program paths. We advocate the use of
compiler techniques to reduce (ideally eliminate) the timing
variability of program measurements. In Section 2, we outline our
method for WCET timing analysis and state the properties of
hardware measurements that it relies on. Section 3 describes how
these properties can be met by controlling the unpredictability of
some hardware mechanisms, and presents preliminary results of path
measurements on a PowerPC 7450. Related work, concluding remarks
and directions for our ongoing work are given in Section 4.
2. Method outline
The WCET of a program could be obtained by measuring its
execution with every possible input. However, exhaustive
enumeration of program inputs is infeasible for most programs.
Another approach is to measure all paths of the program. This
reduces the number of measurements because many inputs may
activate the same program path. However, path coverage is
impracticable for programs with unbounded loops, which yield an
infinite number of paths [20].
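To illustrate why path coverage needs fewer measurements than input enumeration, the following minimal Python sketch groups the inputs of a small function by the path they activate. The function `classify` and its input range are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def classify(x: int) -> tuple:
    """Hypothetical program under test; returns the branch outcomes taken."""
    path = []
    if x < 0:
        path.append("neg")
        x = -x
    else:
        path.append("nonneg")
    if x % 2 == 0:
        path.append("even")
    else:
        path.append("odd")
    return tuple(path)


def group_inputs_by_path(inputs):
    """Map each activated path to the list of inputs that activate it."""
    groups = defaultdict(list)
    for x in inputs:
        groups[classify(x)].append(x)
    return dict(groups)


groups = group_inputs_by_path(range(-128, 128))
total_inputs = sum(len(v) for v in groups.values())
print(len(groups), "paths cover", total_inputs, "inputs")
```

Here 256 inputs collapse to 4 paths, so one measurement per path would suffice under the assumption that timing depends only on the path.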
In this paper, we propose to employ structural testing
methods [8, 18, 20] to automatically generate input data. A
key assumption we make is that measurements of executions of the
same program path with different data values yield the same
timing results. Meeting this assumption requires controlling the
hardware; this issue is discussed in Section 3.
Program clustering. Test data generation methods are mostly
based on equations [8] or constraint solving techniques [18].
Because solvers may scale poorly, analysing complete paths of the
whole program may be unachievable in practice. Moreover, the
number of paths can be exponential even for small programs. As a
consequence, we suggest splitting paths into segments to lower
the complexity of test data generation.
Proceedings of the 5th Intl Workshop on Worst-Case Execution Time (WCET) Analysis Page 13 of 49
ECRTS 2005
5th Intl. Workshop on Worst-Case Execution Time (WCET) Analysis
http://drops.dagstuhl.de/opus/volltexte/2007/808
As an example, the program of Figure 1 contains
paths. For small values of (for example












Figure 1. Path clustering.
However, for greater values of , it would be suit-
able to split the program and apply measurements on seg-
ments. This process is called program clustering. An in-
tuitive solution to tackle test data generation complexity of
the code of Figure 1 is to build two clusters and . In
this configuration, there are only two paths in cluster and
a single path on cluster .
We propose to cluster the program as follows. Automated test
data generation is first applied to the whole program. If it
produces too many results or does not terminate within a bounded
amount of time, we stop it. We then launch test data generation
on smaller parts of the program (e.g. subtrees of the program
syntax tree). This iterative process is repeated until segments
are small enough to make exhaustive path enumeration inside a
segment tractable. We
obtain leaf cluster like .
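The iterative clustering described above can be sketched as a recursive descent over a syntax tree. This is only an illustration under stated assumptions: the `Node` type, the path-counting rule and the `max_paths` threshold are hypothetical, and a static path count stands in for the paper's criterion (test data generation producing too many results or exceeding a time limit).

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    kind: str                      # "leaf", "seq" or "if" (hypothetical node kinds)
    name: str
    children: list = field(default_factory=list)


def count_paths(node: Node) -> int:
    """Paths multiply across a sequence and add across alternatives."""
    if node.kind == "leaf":
        return 1
    counts = [count_paths(c) for c in node.children]
    return prod(counts) if node.kind == "seq" else sum(counts)


def cluster(node: Node, max_paths: int) -> list:
    """Return names of leaf clusters whose path count fits the budget."""
    if count_paths(node) <= max_paths or not node.children:
        return [node.name]
    clusters = []
    for child in node.children:    # too many paths: retry on each subtree
        clusters.extend(cluster(child, max_paths))
    return clusters


def two_way(i: int) -> Node:
    """A conditional with two single-path branches."""
    return Node("if", f"if{i}", [Node("leaf", f"then{i}"), Node("leaf", f"else{i}")])


program = Node("seq", "main", [two_way(i) for i in range(10)])
print(count_paths(program))        # 1024 paths for ten conditionals in sequence
print(cluster(program, 4))         # each conditional becomes its own cluster
```

A real implementation would likely merge adjacent small subtrees into larger clusters, since longer segments give tighter estimates; this sketch does not attempt that.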
Program measurement. We focus on the exploitation of program
measurements and do not address how program execution times are
obtained: multiple hardware and software methods are described
in [11, 14]. Observation points provide execution traces and give
the execution times of the observed program units [11]. In our
approach, observation points have to be placed at the cluster
boundaries.
For example, there are four observation points , ,
and on Figure 1.
We first measure the two paths of cluster and we ob-
tain values and cycles for instance. The worst value
is the WCET of the cluster and is . Then,
we execute and we measure the single path of cluster .
Table 1 contains the measurement trace of this execution.
The value of is not the WCET for the whole
program, because this execution could have covered the
shortest path of cluster . Consequently, we have to add the
difference between and each measured
during the execution of cluster . In this way, we obtain an
upper bound of the global WCET. Program clustering enables
automated test data generation on subprogram paths or program
segments. The longer the program segments are, the tighter the
WCET estimate is.
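The combination step can be written down as a small computation. The sketch below assumes an outer cluster whose single measured execution contains several executions of an inner cluster; the function names and all cycle counts are invented for illustration and are not the values of Table 1.

```python
def cluster_wcet(path_times):
    """WCET of a cluster whose paths were all measured: the worst value."""
    return max(path_times)


def upper_bound(outer_time, inner_times, inner_wcet):
    """Pad one outer measurement so every inner execution counts as worst case."""
    for t in inner_times:
        if t > inner_wcet:
            raise ValueError("inner measurement exceeds the inner cluster WCET")
    return outer_time + sum(inner_wcet - t for t in inner_times)


# Hypothetical numbers: the inner cluster has two paths of 40 and 55 cycles.
inner_wcet = cluster_wcet([40, 55])

# One execution of the outer cluster took 500 cycles and went through the
# inner cluster three times, taking the short path twice.
bound = upper_bound(500, [40, 55, 40], inner_wcet)
print(bound)  # 500 + 15 + 0 + 15 = 530
```

The padding is zero whenever the measured execution already took the worst path of the inner cluster, which is why larger clusters tend to give tighter bounds.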
Observation point    Time stamp    Observation interval
. . . . . .
Table 1. Measurement trace of the path execution of the cluster .
In this section, we have proposed to assemble the WCETs of
leaf clusters using measurements. We could also investigate a
hybrid approach that couples testing and static WCET analysis.
In such an approach, program segments would be measured using
testing methods, and static methods would be used to combine
these context-independent segment timings.
3. Obtaining safe program segment measurements
In the previous section, we assumed that measurements of
different executions of the same program path give the same
results. In this section, we focus on obtaining such safe and
context-independent measurements.
There are three main sources of unpredictability in complex
processor architectures:
1. Global mechanisms, such as caches, virtual memory translation
(TLB) or branch predictors. Their internal state and the contents
of their tables have a direct impact on the execution time of
future instructions of the whole program [5, 9].
2. Variable latency instructions. Some operations, such as the
integer multiplication instruction, may have variable timing
behaviour because the result may be computed faster for
small-valued input operands.
Processors may also implement some operations only partially,
such as floating-point division or the square root instruction.
In that case, an operation that is not implemented in hardware
raises an exception and is computed by an exception handler
provided by the operating system.
3. Statistical execution interference phenomenon [12], due to the
unpredictability introduced by DRAM refresh. As with variable
latency instructions, load/store operations to main memory may
have varying timing behaviour. Moreover, processors have a
built-in multi-level cache hierarchy, and some caches may be
clocked at a speed different from that of the processor core. A
tiny timing deviation may occur depending on whether a load
request is received immediately or on the next clock cycle of the
slower cache level.
Gaining control of processor unpredictability. Obtaining safe
and context-independent measurements requires eliminating
(or at least drastically decreasing) the sources of
timing variability. For that purpose, we are currently con-
sidering a few approaches relying on hardware control and
compilation methods.
Regarding the first source of unpredictability, global
mechanisms can be disabled, or their history tables can be
cleared before the execution of each program segment.
Cache-conscious data placement [4] and cache locking [15] reduce
the timing variation of memory accesses. Likewise, static branch
prediction makes it possible to fix the behaviour of speculative
execution at compile time [3].
To handle variable latency operations, [19] proposes adding to
the measurement the difference between the WCET and the BCET of
each such operation on the program path. Another approach
consists in avoiding these operations and replacing them with
predictable instructions.
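The padding for variable latency operations can be sketched as follows. The latency table, the operation names and the cycle counts are hypothetical; real values would come from the processor manual, and the exact procedure of [19] may differ from this simplification.

```python
# Hypothetical (BCET, WCET) latencies in cycles for variable latency operations.
LATENCY = {"mul": (2, 5), "div": (10, 40)}


def padded_time(measured_cycles, path_ops, latency=LATENCY):
    """Add (WCET - BCET) for each variable latency operation on the path."""
    padding = sum(
        latency[op][1] - latency[op][0] for op in path_ops if op in latency
    )
    return measured_cycles + padding


# A path measured at 120 cycles that contains two multiplications and one
# division: 120 + 3 + 3 + 30 = 156.
safe_time = padded_time(120, ["add", "mul", "load", "mul", "div"])
print(safe_time)
```

The result is safe even if the measurement happened to observe the best-case latency of every such operation, at the price of pessimism when it did not.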
We could prevent the varying timing behaviour of partially
implemented instructions by disabling the execution of the
exception handler. However, this may affect the operational
semantics of the instructions [10]. It would be preferable to
write a temporally predictable exception handler and to apply
the same strategies as for variable latency operations.
It is not possible to control the variability of memory access
latency. However, we believe that such fluctuations in
measurements follow a statistical distribution. Models that
quantify the pessimism to apply to measurement results are
described in [2]. In addition, the variability of load/store
latency may be due to input-dependent memory accesses of the
program.
Figure 2. Single path program with unpredictable timings of data access.
The sample code from Figure 2 is a single path program.
Nevertheless, the number of cache misses on array de-
pends on the contents of . To make this code temporally
deterministic, we could disable the cache feature before the
memory access to contents of [15]. We could also set the
whole array as non-cacheable, at the cost of a performance
penalty. To improve data access latency, we could instead employ
data cache locking [15] or scratchpad memory allocation [1] for
data subject to unknown memory access patterns.
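The effect of input-dependent accesses on a single path can be reproduced with a toy cache model. The sketch below is not the code of Figure 2; it assumes a hypothetical direct-mapped cache (4 lines of 16 bytes) and an array of 4-byte elements accessed through an index array.

```python
def count_misses(addresses, num_lines=4, line_size=16):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    cached_block = [None] * num_lines   # memory block currently held per line
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if cached_block[index] != block:
            misses += 1
            cached_block[index] = block
    return misses


def access_trace(index_array, element_size=4, base=0):
    """Addresses touched when reading a[i] for each i in index_array."""
    return [base + i * element_size for i in index_array]


sequential = list(range(16))   # a[0], a[1], ..., a[15]
conflicting = [0, 16] * 8      # a[0] and a[16] map to the same cache line

print(count_misses(access_trace(sequential)))    # 4 misses
print(count_misses(access_trace(conflicting)))   # 16 misses
```

Both traces perform 16 reads along the same program path, yet the miss count differs by a factor of four, which is the kind of variability that cache locking or scratchpad allocation is meant to remove.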
Preliminary experiments. To evaluate whether the timing
variability of program segments can be controlled by software,
we conducted experiments on a PowerPC 7450 processor [10]. This
32-bit processor is able to dispatch 3 instructions per cycle on
an in-order, seven-stage pipeline. It features two dynamic branch
prediction mechanisms: a 2-bit prediction scheme with a branch
target buffer, and a return stack predictor. Our chip has a
64-Kbyte level-one (L1) cache and a 256-Kbyte L2 cache. A load
takes 3 cycles if the data is in the L1 cache. An L1 data cache
miss that hits in the L2 has a maximum latency of 9 processor
cycles.
For our preliminary evaluation, we assessed the impact of
hardware control on the execution time of a program segment
(SNU-RT jfdctint) made of a single path. We ran these
experiments in isolation from asynchronous activities by
disabling the operating system’s context switches and external
interrupts.
Figure 3 shows the timings of two sets of 25 measurements.
Before each jfdctint execution measurement, we first executed
one of twenty-five pollution codes: the program itself, a random
generator, loads and writes of large amounts of memory,
intensive control code, and some
Figure 3. Measurements of jfdctint execution times (x-axis: time-ordered measures of 25 executions).
The first set of measurements was made with hardware control:
after execution of the pollution code, we cleared the branch
predictor buffers and flushed the caches (TLB, L1 and L2). The
second set of measurements was made without any hardware control.
We can note that the variability of program running times
is largely reduced with hardware control. We observe that
measurement variability is decreased from cycles to
cycles. Without hardware control, the best case exe-
cution time is obtained after the execution of the program
itself (warm caches effect). The worst case execution time
is due to a pollution code that fills the entire data cache with
dirty lines. Consequently, for many data accesses of the
measured program, the processor had to write the contents of the
victim cache line back to memory before replacing it with
program data.
We have investigated the sources of the remaining measurement
variability with hardware control. The memory
performance-monitoring counters of the PowerPC 7450 reveal that
it is mainly due to the variable access latency of main memory
and of the L2 cache.
Note that average program running times are almost the same
with or without hardware control. There is no performance
degradation in this specific experiment because we deactivated
neither the cache nor the
branch predictor. This suggests that long program segments can
still take advantage of dynamic mechanisms, provided that their
history tables or related internal state are cleared before
execution.
4. Conclusion and future work
In this paper, we have proposed to compute the WCET
from execution measurements. We advocate the use of
structural testing methods and program clustering to en-
able measurements of the worst case execution path. This
measurement-based approach would produce safe and tight
results.
Recently, another approach based on software unit testing has
been proposed in [17]. Model checking is used to produce input
data that exhaustively cover the paths of programs automatically
generated from MatLab/Simulink specifications. This approach
makes it possible to measure the WCET of straight-line C
programs with no loops.
Previously, [19] used data flow analysis to detect program
segments with a single feasible path. In their approach, only
single-path segments are measured, and static WCET analysis is
employed on the rest of the program. [19] also gives conditions
for obtaining safe measurements on processors with caches.
Clustering techniques have been applied to static WCET
analysis methods to enhance their scalability [6]. The clus-
tering is applied on the syntax tree of the program and the
main criterion used is a limit on the number of generated
constraints. We propose to apply a similar strategy in our
approach, our objective being to reduce the complexity of
test data generation.
Traditional static WCET analysis and measurement are
combined in [2]. There is no control of the hardware and
statistical models are applied, thus providing a probabilistic
safety on the global WCET [2]. The combination of test
data generation methods and these techniques would repre-
sent a fruitful area of study.
Our method has to control processor features such as caches or
branch prediction to reduce the unpredictability of these
advanced processor mechanisms. We plan to further study the
trade-off between hardware control, which necessarily has a
negative impact on execution time, and its benefit with respect
to measurement variability.
References
[1] O. Avissar, R. Barua, and D. Stewart. An optimal mem-
ory allocation scheme for scratch-pad-based embedded sys-
tems. ACM Transactions on Embedded Computing Systems,
1(1):6–26, Nov. 2002.
[2] G. Bernat, A. Colin, and S. M. Petters. WCET analysis of
probabilistic hard real-time systems. In Proceedings of the
23rd IEEE Real-Time Systems Symposium, pages 279–288,
Austin, TX, Dec. 2002.
[3] F. Bodin and I. Puaut. A WCET-oriented static branch pre-
diction scheme for real time systems. In Proceedings of the
17th Euromicro Conference on Real-Time Systems, Palma de
Mallorca, Spain, July 2005. To appear.
[4] B. Calder, C. Krintz, S. John, and T. Austin. Cache-
conscious data placement. In Proceedings of the 8th Inter-
national Conference on Architectural Support for Program-
ming Languages and Operating Systems, pages 139–149,
San Jose, CA, Oct. 1998.
[5] J. Engblom. Analysis of the execution time unpredictability
caused by dynamic branch prediction. In Proceedings of the
9th IEEE Real-Time and Embedded Technology and Appli-
cations Symposium, pages 152–159, Toronto, Canada, May
2003.
[6] A. Ermedahl, F. Stappert, and J. Engblom. Clustered calcu-
lation of worst-case execution times. In Proceedings of the
International Conference on Compilers, Architectures and
Synthesis for Embedded Systems, pages 51–62, San Jose,
CA, Oct. 2003.
[7] R. Heckmann, M. Langenbach, S. Thesing, and R. Wil-
helm. The influence of processor architecture on the design
and the results of WCET tools. Proceedings of the IEEE,
91(7):1038–1054, 2003.
[8] G. Lee, J. Morris, K. Parker, G. A. Bundell, and P. Lam.
Using symbolic execution to guide test generation. Soft-
ware Testing, Verification and Reliability, 15(1):41–61, Mar.
2005.
[9] T. Lundqvist and P. Stenström. Timing anomalies in dy-
namically scheduled microprocessors. In Proceedings of
the 20th IEEE Real-Time Systems Symposium, pages 12–21,
Phoenix, AZ, Dec. 1999.
[10] MPC7450 RISC microprocessor family processor manual
revision 5. Freescale Semiconductor, Jan. 2005.
[11] S. M. Petters. Comparison of trace generation methods for
measurement based WCET analysis. In Proceedings of the
3rd International Workshop on Worst Case Execution Time
Analysis, pages 61–64, Porto, Portugal, June 2003.
[12] S. M. Petters and G. Färber. Making worst case execution
time analysis for hard real-time tasks on state of the art pro-
cessors feasible. In Proceedings of the 6th International
Workshop on Real-Time Computing and Applications Sym-
posium, pages 442–449, Hong Kong, China, Dec. 1999.
[13] P. Puschner and R. Nossal. Testing the results of static
worst-case execution-time analysis. In Proceedings of the
19th IEEE Real-Time Systems Symposium, pages 134–143,
Madrid, Spain, Dec. 1998.
[14] B. Sprunt. The basics of performance-monitoring hardware.
IEEE Micro, 22(4):64–71, July 2002.
[15] X. Vera, B. Lisper, and J. Xue. Data cache locking for higher
program predictability. In Proceedings of the ACM SIG-
METRICS International Conference on Measurement and
Modeling of Computer Systems, pages 272–282, San Diego,
CA, 2003.
[16] J. Wegener and M. Grochtmann. Verifying timing con-
straints of real-time systems by means of evolutionary test-
ing. Real-Time Systems, 15(3):275–298, Nov. 1998.
[17] I. Wenzel, B. Rieder, R. Kirner, and P. Puschner. Automatic
timing model generation by CFG partitioning and model
checking. In Proceedings of the Conference on Design, Au-
tomation, and Test in Europe, pages 606–611, Munich, Ger-
many, Mar. 2005.
[18] N. Williams, B. Marre, and P. Mouy. On-the-fly generation
of k-path tests for C functions. In Proceedings of the 19th
IEEE International Conference on Automated Software En-
gineering, pages 290–293, Linz, Austria, Sept. 2004.
[19] F. Wolf, R. Ernst, and W. Ye. Path clustering in software
timing analysis. IEEE Transactions on Very Large Scale In-
tegration Systems, 9(6):773–782, Dec. 2001.
[20] H. Zhu, P. A. V. Hall, and J. H. R. May. Software unit
test coverage and adequacy. ACM Computing Surveys,
29(4):366–427, Dec. 1997.
