Ascertaining Uncertainty for Efficient Exact Cache Analysis by Maiza, Claire et al.
Ascertaining Uncertainty
for Efficient Exact Cache Analysis*
Valentin Touzeau1, Claire Maïza1, David Monniaux1, and Jan Reineke2
1 Univ. Grenoble Alpes, VERIMAG, F-38000 Grenoble, France
CNRS, VERIMAG, F-38000 Grenoble, France
firstname.lastname@univ-grenoble-alpes.fr
2 Saarland University, Saarland Informatics Campus
Saarbrücken, Germany
reineke@cs.uni-saarland.de
Abstract. Static cache analysis characterizes a program’s cache behavior
by determining in a sound but approximate manner which memory
accesses result in cache hits and which result in cache misses. Such
information is valuable in optimizing compilers, worst-case execution
time analysis, and side-channel attack quantification and mitigation.
Cache analysis is usually performed as a combination of “must” and
“may” abstract interpretations, classifying instructions as either “always
hit”, “always miss”, or “unknown”. Instructions classified as “unknown”
might result in a hit or a miss depending on program inputs or the initial
cache state. It is equally possible that they do in fact always hit or always
miss, but the cache analysis is too coarse to see it.
Our approach to eliminate this uncertainty consists in (i) a novel abstract
interpretation able to ascertain that a particular instruction may definitely
cause a hit and a miss on different paths, and (ii) an exact analysis,
removing all remaining uncertainty, based on model checking, using
abstract-interpretation results to prune down the model for scalability.
We evaluated our approach on a variety of examples; it notably improves
precision upon classical abstract interpretation at reasonable cost.
1 Introduction
There is a large gap between processor and memory speeds termed the “memory
wall” [21]. To bridge this gap, processors are commonly equipped with caches,
i.e., small but fast on-chip memories that hold recently-accessed data, in the
hope that most memory accesses can be served at a low latency by the cache
instead of being served by the slow main memory. Due to temporal and spatial
locality in memory access patterns, caches are often highly effective.
In hard real-time applications, it is important to bound a program’s worst-
case execution time (WCET). For instance, if a control loop runs at 100 Hz,
* This work was partially supported by the European Research Council under the
European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant
Agreement nr. 306595 “STATOR”.
arXiv:1709.10008v2 [cs.PL] 20 Dec 2018
[Figure 1: classification diagram with nodes “unknown”; “∃Miss” and “∃Hit”;
“∀Miss”, “∃Hit ∧ ∃Miss”, and “∀Hit”. Legend: Classical AI, Our new AI,
Result after MC.]
Fig. 1. Possible classifications of classical abstract-interpretation-based cache analysis,
our new abstract interpretation, and after refinement by model checking.
one must show that its WCET is less than 0.01 s. In some cases, measuring the
program’s execution time on representative inputs and adding a safety margin
may be enough, but in safety-critical systems one may wish for a higher degree
of assurance and use static analysis to cover all cases. On processors with caches,
such a static analysis involves classifying memory accesses into cache hits, cache
misses, and unclassified [20]. Unclassified memory accesses that in reality result
in cache hits may lead to gross overestimation of the WCET.
Tools such as Otawa3 and aiT4 compute an upper bound on the WCET of
programs after first running a static analysis based on abstract interpretation [11]
to classify memory accesses. Our aim, in this article, is to improve upon that
approach with a refined abstract interpretation and a novel encoding into finite-
state model checking.
Caches may also leak secret information [2] to other programs running on the
same machine—through the shared cache state—or even to external devices—due
to cache-induced timing variations. For instance, cache timing attacks on software
implementations of the Advanced Encryption Standard [1] were one motivation
for adding specific hardware support for that cipher to the x86 instruction set [15].
Cache analysis may help identify possibilities for such side-channel attacks and
quantify the amount of information leakage [7]; improved precision in cache
analysis then translates into fewer false alarms and tighter bounds on leakage.
An ideal cache analysis would statically classify every memory access at every
machine-code instruction in a program into one of three cases: (i) the access is a
cache hit in all possible executions of the program; (ii) the access is a cache miss
in all possible executions of the program; (iii) in some executions the access is a
hit and in others it is a miss. However, no cache analysis can perfectly classify
all accesses into these three categories.
A first reason is that perfect cache analysis would involve testing the
reachability of individual program statements, which is undecidable.5 A simplifying
3 http://www.otawa.fr/: an academic tool developed at IRIT, Toulouse.
4 https://www.absint.com/ait/: a commercial tool developed by AbsInt GmbH.
5 One may object that given that we consider machine-level aspects, memory is
bounded and thus properties are decidable. The time and space complexity is however
prohibitive.
assumption often used, including in this article, is that all program paths are
feasible—this is safe, since it overapproximates possible program behaviors. Even
with this assumption, analysis is usually performed using sound but incomplete
abstractions that can safely determine that some accesses always hit (“∀Hit” in
Figure 1) or always miss (“∀Miss” in Fig. 1). The corresponding analyses are
called may and must analysis and referred to as “classical AI” in Fig. 1. Due to
incompleteness, however, the status of other accesses remains “unknown” (Fig. 1).
Contributions In this article, we propose an approach to eliminate this uncer-
tainty, with two main contributions (colored red and green in Figure 1):
1. A novel abstract interpretation that safely concludes that certain accesses
are hits in some executions (“∃Hit”), misses in some executions (“∃Miss”),
or hits in some and misses in other executions (“∃Hit ∧ ∃Miss” in Fig. 1).
Using this analysis together with prior must and may cache analyses, most accesses
are precisely classified.
2. The classification of accesses with remaining uncertainty (“unknown”, “∃Hit”,
and “∃Miss”) is refined by model checking using an exact abstraction of
the behavior of the cache replacement policy. The results from the abstract
interpretation in the first analysis phase are used to dramatically reduce the
complexity of the model.
Because the model-checking phase is based on an exact abstraction of the
cache replacement policy, our method, overall, is optimally precise: it answers
precisely whether a given access is always a hit, always a miss, or a hit in some
executions and a miss in others (see “Result after MC” in Fig. 1).6 This precision
improvement in access classifications can be beneficial for tools built on top of
the cache analysis: in the case of WCET analysis for example, a precise cache
analysis not only improves the computed WCET bound; it can also lead to a
faster analysis. Indeed, in case of an unclassified access, both possibilities (cache
hit and cache miss) have to be considered [10,17].
The model-checking phase would be sufficient to resolve all accesses, but
our experiments show this does not scale; it is necessary to combine it with the
abstract-interpretation phase for tractability, thereby reducing (a) the number of
model-checker calls, and (b) the size of each model-checking problem.
2 Background: Caches and Static Cache Analysis
Caches Caches are fast but small memories that store a subset of the main
memory’s contents to bridge the latency gap between the CPU and main memory.
To profit from spatial locality and to reduce management overhead, main memory
is logically partitioned into a set of memory blocks M. Each block is cached as a
whole in a cache line of the same size.
6 This completeness is relative to an execution model where all control paths are
feasible, disregarding the functional semantics of the edges.
When accessing a memory block, the cache logic has to determine whether
the block is stored in the cache (“cache hit”) or not (“cache miss”). For efficient
look up, each block can only be stored in a small number of cache lines known as
a cache set. Which cache set a memory block maps to is determined by a subset
of the bits of its address. The cache is partitioned into equally-sized cache sets.
The size k of a cache set in blocks is called the associativity of the cache.
Since the cache is much smaller than main memory, a replacement policy
must decide which memory block to replace upon a cache miss. Importantly,
replacement policies treat sets independently7, so that accesses to one set do not
influence replacement decisions in other sets. Well-known replacement policies
are least-recently-used (LRU), used, e.g., in the Freescale MPC603E and the
Infineon TriCore17xx; pseudo-LRU (PLRU), a cost-efficient
variant of LRU; and first-in first-out (FIFO). In this article we focus exclusively
on LRU. The application of our ideas to other policies is left as future work.
LRU naturally gives rise to a notion of ages for memory blocks: The age of a
block b is the number of pairwise different blocks that map to the same cache set
as b that have been accessed since the last access to b. If a block has never been
accessed, its age is ∞. Then, a block is cached if and only if its age is less than
the cache’s associativity k.
Given this notion of ages, the state of an LRU cache can be modeled by a
mapping that assigns to each memory block its age, where ages are truncated at
k, i.e., we do not distinguish ages of uncached blocks. We denote the set of cache
states by C = M → {0, . . . , k}. Then, the effect of an access to memory block b
under LRU replacement can be formalized as follows8:
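\[
\mathit{update} : C \times M \to C, \qquad
(q, b) \mapsto \lambda b'.\,
\begin{cases}
0 & \text{if } b' = b \\
q(b') & \text{if } q(b') \ge q(b) \\
q(b') + 1 & \text{if } q(b') < q(b) \wedge q(b') < k \\
k & \text{if } q(b') < q(b) \wedge q(b') = k
\end{cases}
\tag{1}
\]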
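As an illustration of update (1), here is a minimal Python sketch. It assumes that a cache state is a dictionary from block names to ages, that blocks absent from the dictionary have age k, and that all blocks map to the same cache set; the names `lru_update` and `is_cached` are hypothetical and not part of the original formalization.

```python
def lru_update(q, b, k):
    """Concrete LRU update (1) on an age map q (block -> age in 0..k)."""
    age_b = q.get(b, k)  # assumption: unseen blocks are uncached (age k)
    new_q = {}
    for other, age in q.items():
        if other == b:
            continue
        if age >= age_b:
            new_q[other] = age              # not younger than b: age unchanged
        else:
            new_q[other] = min(age + 1, k)  # younger than b: one step older
    new_q[b] = 0                            # b becomes most recently used
    return new_q


def is_cached(q, b, k):
    """A block is cached iff its age is below the associativity k."""
    return q.get(b, k) < k


K = 2
state = {}   # empty cache
trace = []   # hit/miss outcome of each access
for block in ["x", "y", "a", "v", "w"]:
    trace.append(is_cached(state, block, K))
    state = lru_update(state, block, K)
```

With associativity 2, every access in this sequence misses, and only the two most recently accessed blocks remain cached afterwards.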
Programs as Control-flow Graphs As is common in program analysis and
in particular in work on cache analysis, we abstract the program under analysis
by its control-flow graph: vertices represent control locations and edges represent
the possible flow of control through the program. In order to analyze the cache
behavior, edges are adorned with the addresses of the memory blocks that are
accessed by the instruction, including the instruction being fetched.
For instruction fetches in a program without function pointers or computed
jumps, this just entails knowing the address of every instruction—thus the
program must be linked with absolute addresses, as common in embedded code.
For data accesses, a pointer analysis is required to compute a set of possible
7 To the best of our knowledge, the only exception to this rule is the pseudo
round-robin policy, found, e.g., in the ARM Cortex-A9.
8 Assuming for simplicity that all cache blocks map to the same cache set.
addresses for every access. If several memory blocks may be alternatively accessed
by an instruction, multiple edges may be inserted; so there may be multiple
edges between two nodes. We therefore represent a control-flow graph by a tuple
G = (V,E), where V is the set of vertices and E ⊆ V × (M ∪ {⊥})× V is the
set of edges, where ⊥ is used to label edges that do not cause a memory access.
The resulting control-flow graph G does not include information on the
functional semantics of the instructions, e.g. whether they compute an addition.
All paths in that graph are considered feasible, even if, taking into account the
instruction semantics, they are not—e.g. a path including the tests x ≤ 4 and
x ≥ 5 in immediate succession is considered feasible even though the two tests
are mutually exclusive. All our claims of completeness are relative to this model.
As discussed above, replacement decisions for a given cache set are usually
independent of memory accesses to other cache sets. Thus, analyzing the behavior
of G on all cache sets is equivalent to separately analyzing its projections onto
individual cache sets: a projection of G on a cache set S is G where only blocks
mapping to S are kept. Projected control-flow graphs may be simplified, e.g. a
self-looping edge labeled with no cache block may be removed. Thus, we assume
in the following that the analyzed cache is fully associative, i.e. of a single cache
set.
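The projection onto a cache set can be sketched as follows. This is only an illustration under stated assumptions: edges are triples (source, block, target), `None` plays the role of ⊥, and the mapping `set_of` (block number modulo the number of sets) is a hypothetical stand-in for the address bits that select the cache set.

```python
def project(edges, cache_set, set_of):
    """Keep accesses mapping to cache_set; other edges become access-free."""
    projected = set()
    for (src, block, dst) in edges:
        keep = block is not None and set_of(block) == cache_set
        label = block if keep else None
        if label is None and src == dst:
            continue  # an access-free self-loop cannot change the cache state
        projected.add((src, label, dst))
    return projected


NUM_SETS = 2  # hypothetical cache geometry


def set_of(block):
    """Hypothetical mapping: blocks are integers, set = block mod NUM_SETS."""
    return block % NUM_SETS


edges = {(0, 4, 1), (1, 7, 2), (2, 6, 2), (2, None, 0)}
set0 = project(edges, 0, set_of)
set1 = project(edges, 1, set_of)
```

Each projected graph can then be analyzed as if the cache consisted of a single set.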
Collecting Semantics In order to classify memory accesses as “always hit” or
“always miss”, cache analysis needs to characterize for each control location in a
program all cache states that may reach that location in any execution of the
program. This is commonly called the collecting semantics.
Given a control-flow graph G = (V,E), the collecting semantics is defined as
the least solution to the following set of equations, where R^C : V → P(C) denotes
the set of reachable concrete cache configurations at each program location, and
R^C_0(v) denotes the set of possible initial cache configurations:
\[
\forall v' \in V :\; R^C(v') = R^C_0(v') \cup \bigcup_{(v,b,v') \in E} \mathit{update}^C(R^C(v), b),
\tag{2}
\]
where update^C denotes the cache update function lifted to sets of states, i.e.,
update^C(Q, b) = {update(q, b) | q ∈ Q}.
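For very small examples the collecting semantics can be computed explicitly. The following Python sketch iterates the equations above until nothing changes. The graph encoding (edge triples with `None` for ⊥), the helper names, and the example loop over blocks v and w are assumptions made for illustration.

```python
def lru_update(q, b, k):
    """Concrete LRU update (1); q maps every block to its age, b may be None."""
    if b is None:
        return dict(q)
    age_b = q[b]
    return {o: 0 if o == b else (a if a >= age_b else min(a + 1, k))
            for o, a in q.items()}


def freeze(q):
    """Hashable representation of a cache state."""
    return tuple(sorted(q.items()))


def collecting_semantics(vertices, edges, initial, k):
    """Least solution of R(v') = R0(v') U union of update(R(v), b)."""
    reach = {v: {freeze(q) for q in initial.get(v, [])} for v in vertices}
    changed = True
    while changed:
        changed = False
        for (src, block, dst) in edges:
            for frozen in list(reach[src]):
                succ = freeze(lru_update(dict(frozen), block, k))
                if succ not in reach[dst]:
                    reach[dst].add(succ)
                    changed = True
    return reach


K = 2
VERTICES = [0, 1, 2, 3]
# Entry 0, loop head 1, access to v, access to w, back edge to the loop head.
EDGES = [(0, None, 1), (1, "v", 2), (2, "w", 3), (3, None, 1)]
INITIAL = {0: [{"v": K, "w": K}]}  # empty cache at the entry
reach = collecting_semantics(VERTICES, EDGES, INITIAL, K)
ages_of_v_at_loop_head = {dict(s)["v"] for s in reach[1]}
```

At the loop head, block v is uncached on the first iteration and has age 1 afterwards, so both outcomes are reachable for the access to v.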
Explicitly computing the collecting semantics is practically infeasible. For
a tractable analysis, it is necessary to operate in an abstract domain whose
elements compactly represent large sets of concrete cache states.
Classical Abstract Interpretation of LRU Caches To this end, the classical
abstract interpretation of LRU caches [9] assigns to every memory block at every
program location an interval of ages enclosing the possible ages of the block
during any program execution. The analysis for upper bounds, or must analysis,
can prove that a block must be in the cache; conversely, the one for lower bounds,
or may analysis, can prove that a block may not be in the cache.
[Figure 2: flow diagram with boxes “Control-flow graph”, “Cache configuration”,
“Abstract Interpretation” (may/must analysis, ∃hit/∃miss analysis),
“Simplified program model”, “Focused cache model”, and “Model checker”; the
abstract-interpretation stage is labeled Section 3 and the model-checking stage
Section 4.]
Fig. 2. Overall analysis flow.
The domains for abstract cache states under may and must analysis are
A_May = A_Must = C = M → {0, ..., k}, where ages greater than or equal to the
cache’s associativity k are truncated at k as in the concrete domain. For reasons
of brevity, we here limit our exposition to the must analysis. The set of concrete
cache states represented by abstract cache states is given by the concretization
function: γ_Must(q̂_Must) = {q ∈ C | ∀m ∈ M : q(m) ≤ q̂_Must(m)}. Abstract cache
states can be joined by taking their pointwise maxima: q̂_M1 ⊔_Must q̂_M2 = λm ∈
M : max{q̂_M1(m), q̂_M2(m)}. For reasons of brevity, we also omit the definition
of the abstract transformer update_Must, which closely resembles its concrete
counterpart given in (1), and which can be found e.g. in [16].
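The must join and concretization can be sketched in a few lines of Python. The dictionary representation and the function names are assumptions for illustration; the hit criterion simply restates that a block is cached when its age is below k.

```python
def must_join(m1, m2):
    """Join of two must states: pointwise maximum of the age upper bounds."""
    return {b: max(m1[b], m2[b]) for b in m1}


def in_gamma_must(q, m):
    """True if concrete state q respects every upper bound of must state m."""
    return all(q[b] <= m[b] for b in m)


def always_hit(m, b, k):
    """An access to b is a guaranteed hit if its age upper bound is below k."""
    return m[b] < k


K = 2
after_then = {"v": 0, "w": 1}  # must state along one incoming path
after_else = {"v": 1, "w": K}  # must state along the other incoming path
joined = must_join(after_then, after_else)
```

After the join, v is still guaranteed to be cached, whereas nothing can be guaranteed about w.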
Suitably defined abstract semantics R_Must and R_May can be shown to
overapproximate their concrete counterpart:
Theorem 1 (Analysis Soundness [9]). The may and the must abstract
semantics are safe approximations of the collecting semantics:
\[
\forall v \in V :\; R^C(v) \subseteq \gamma_{Must}(R_{Must}(v)), \quad
R^C(v) \subseteq \gamma_{May}(R_{May}(v)).
\tag{3}
\]
3 Abstract Interpretation for Definitely Unknown
All proofs can be found in Appendix A. Together, may and must analysis can
classify accesses as “always hit”, “always miss” or “unknown”. An access classified
as “unknown” may still be “always hit” or “always miss” but not detected as
such due to the imprecision of the abstract analysis; otherwise it is “definitely
unknown”. Properly classifying “unknown” blocks into “definitely unknown”,
“always hit”, or “always miss” using a model checker is costly. We thus propose an
abstract analysis that safely establishes that some blocks are “definitely unknown”
under LRU replacement.
Our analysis steps are summarized in Figure 2. Based on the control-flow
graph and on an initial cache configuration, the abstract-interpretation phase
classifies some of the accesses as “always hit”, “always miss” and “definitely
unknown”. Those accesses are already precisely classified and thus do not require
a model-checking phase. The AI phase thus reduces the number of accesses to
be classified by the model checker. In addition, the results of the AI phase are
used to simplify the model-checking phase, which will be discussed in detail in
Section 4.
[Figure 3: control-flow graph of a loop that accesses v and then w. Analysis
results annotated at the four control locations, in the order they appear in
the figure:
  Must: v ↦ k, w ↦ k   May: v ↦ k, w ↦ k   EH: v ↦ k, w ↦ k   EM: v ↦ k, w ↦ k
  Must: v ↦ k, w ↦ k   May: v ↦ 1, w ↦ 0   EH: v ↦ 1, w ↦ 0   EM: v ↦ k, w ↦ k
  Must: v ↦ 0, w ↦ k   May: v ↦ 0, w ↦ 1   EH: v ↦ 0, w ↦ 1   EM: v ↦ 0, w ↦ k
  Must: v ↦ 1, w ↦ 0   May: v ↦ 1, w ↦ 0   EH: v ↦ 1, w ↦ 0   EM: v ↦ 1, w ↦ 0]
Fig. 3. Example of two accesses in a loop that are definitely unknown. May/Must and
EH/EM analysis results are given next to the respective control locations.
An access is “definitely unknown” if there is a concrete execution in which
the access misses and another in which it hits. The aim of our analysis is to prove
the existence of such executions to classify an access as “definitely unknown”.
Note the difference with classical may/must analysis and most other abstract
interpretations, which compute properties that hold for all executions, while here
we seek to prove that there exist two executions with suitable properties.
An access to a block a results in a hit if a has been accessed recently, i.e., a’s
age is low. Thus we would like to determine the minimal age that a may have in
a reachable cache state immediately prior to the access in question. The access
can be a hit if and only if this minimal age is lower than the cache’s associativity.
Because we cannot efficiently compute exact minimal ages, we devise an Exists
Hit (EH) analysis to compute safe upper bounds on minimal ages. Similarly, to
be sure there is an execution in which accessing a results in a miss, we compute
a safe lower bound on the maximal age of a using the Exists Miss (EM) analysis.
Example. Let us now consider a small example. In Figure 3, we see a small
control-flow graph corresponding to a loop that repeatedly accesses memory
blocks v and w. Assume the cache is empty before entering the loop. Then,
the accesses to v and w are definitely unknown in fully-associative caches of
associativity 2 or greater: they both miss in the first loop iteration, while they
hit in all subsequent iterations. Applying standard may and must analysis, both
accesses are soundly classified as “unknown”. On the other hand, applying the EH
analysis, we can determine that there are cases where v and w hit. Similarly, the
EM analysis derives that there exist executions in which they miss. Combining
those two results, the two accesses can safely be classified as definitely unknown.
We will now define these analyses and their underlying domains more formally.
The EH analysis maintains upper bounds on the minimal ages of blocks. In
addition, it includes a must analysis to obtain upper bounds on all possible ages
of blocks, which are required for precise updates. Thus the domain for abstract
cache states under the EH analysis is AEH = (M → {0, . . . , k − 1, k})×AMust .
Similarly, the EM analysis maintains lower bounds on the maximal ages of blocks
and includes a regular may analysis: AEM = (M → {0, . . . , k− 1, k})×AMay . In
the following, for reasons of brevity, we limit our exposition to the EH analysis.
The EM formalization is analogous and can be found in the appendix.
The properties we wish to establish, i.e. bounds on minimal and maximal ages,
are actually hyperproperties [6]: they are not properties of individual reachable
states but rather of the entire set of reachable states. Thus, the conventional
approach in which abstract states concretize to sets of concrete states that are
a superset of the actual set of reachable states is not applicable. Instead, we
express the meaning, γEH , of abstract states by sets of sets of concrete states.
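A set of states Q is represented by an abstract EH state (q̂, q̂_Must), if for each
block b, q̂(b) is an upper bound on b’s minimal age in Q, min_{q∈Q} q(b):
\[
\gamma_{EH} : A_{EH} \to \mathcal{P}(\mathcal{P}(C)), \qquad
(\hat{q}, \hat{q}_{Must}) \mapsto
\Bigl\{ Q \subseteq \gamma_{Must}(\hat{q}_{Must}) \;\Big|\;
\forall b \in M : \min_{q \in Q} q(b) \le \hat{q}(b) \Bigr\}
\tag{4}
\]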
The actual set of reachable states is an element rather than a subset of this
concretization. The concretization for the must analysis, γMust , is simply lifted to
this setting. Also note that—possibly contrary to initial intuition—our abstraction
cannot be expressed as an underapproximation, as different blocks’ minimal ages
may be attained in different concrete states.
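The following Python sketch checks membership of a set of states in the concretization (4). The representation (dictionaries of ages) and the function names are assumptions made for illustration. The example shows two blocks whose minimal ages are attained in different concrete states.

```python
def in_gamma_must(q, must):
    """True if concrete state q respects every upper bound of must."""
    return all(q[b] <= must[b] for b in must)


def in_gamma_eh(states, eh, must):
    """True if the SET of states is represented by the EH state (eh, must)."""
    if not all(in_gamma_must(q, must) for q in states):
        return False
    # For every block, the minimal age over the whole set must meet the bound.
    return all(min((q[b] for q in states), default=float("inf")) <= eh[b]
               for b in eh)


K = 2
q1 = {"v": 0, "w": K}   # v is most recently used, w is not cached
q2 = {"v": K, "w": 0}   # w is most recently used, v is not cached
eh = {"v": 0, "w": 0}   # claimed upper bounds on the minimal ages
must = {"v": K, "w": K}

both = in_gamma_eh([q1, q2], eh, must)
only_q1 = in_gamma_eh([q1], eh, must)
```

The set {q1, q2} is represented by the abstract state although no single state in it has both ages equal to 0, which is why the abstraction speaks about sets of states rather than individual states.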
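The abstract transformer update_EH((q̂, q̂_Must), b) corresponding to an
access to block b is the pair (q̂′, update_Must(q̂_Must, b)), where
\[
\hat{q}' = \lambda b'.\,
\begin{cases}
0 & \text{if } b' = b \\
\hat{q}(b') & \text{if } \hat{q}_{Must}(b) \le \hat{q}(b') \\
\hat{q}(b') + 1 & \text{if } \hat{q}_{Must}(b) > \hat{q}(b') \wedge \hat{q}(b') < k \\
k & \text{if } \hat{q}_{Must}(b) > \hat{q}(b') \wedge \hat{q}(b') = k
\end{cases}
\tag{5}
\]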
Let us explain the four cases in the transformer above. After an access to b,
b’s age is 0 in all possible executions. Thus, 0 is also a safe upper bound on its
minimal age (case 1). The access to b may only increase the ages of younger
blocks (because of the LRU replacement policy). In the cache state in which b′
attains its minimal age, it is either younger or older than b. If it is younger, then
the access to b may increase b′’s actual minimal age, but not beyond qˆMust(b),
which is a bound on b’s age in every cache state, and in particular in the one
where b′ attains its minimal age. Otherwise, if b′ is older, its minimal age remains
the same and so may its bound. This explains why the bound on b′’s minimal age
does not increase in case 2. Otherwise, for safe upper bounds, in cases 3 and 4,
the bound needs to be increased by one, unless it has already reached k.
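The four cases can be written down directly. In this Python sketch the EH component follows the case distinction explained above. The must transformer, which the text omits, is assumed to have the same shape as the concrete update (1), and all names are hypothetical.

```python
def update_must(must, b, k):
    """ASSUMED must transformer, same shape as the concrete update (1)."""
    return {o: 0 if o == b else (a if a >= must[b] else min(a + 1, k))
            for o, a in must.items()}


def update_eh(eh, must, b, k):
    """EH transformer: returns the new pair (eh', must')."""
    new_eh = {}
    for other, bound in eh.items():
        if other == b:
            new_eh[other] = 0                  # case 1: b itself
        elif must[b] <= bound:
            new_eh[other] = bound              # case 2: bound stays valid
        else:
            new_eh[other] = min(bound + 1, k)  # cases 3 and 4: capped at k
    return new_eh, update_must(must, b, k)


K = 2
eh, must = {"v": 1, "w": 0}, {"v": K, "w": K}
eh_after_v, must_after_v = update_eh(eh, must, "v", K)
```

Starting from bounds v ↦ 1, w ↦ 0 with no must information, an access to v resets the bound of v and increases the bound of w by one.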
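Lemma 1 (Local Consistency). The abstract transformer update_EH soundly
approximates its concrete counterpart update^C:
\[
\forall (\hat{q}, \hat{q}_{Must}) \in A_{EH},\; \forall b \in M,\;
\forall Q \in \gamma_{EH}(\hat{q}, \hat{q}_{Must}) :\;
\mathit{update}^C(Q, b) \in \gamma_{EH}(\mathit{update}_{EH}((\hat{q}, \hat{q}_{Must}), b)).
\tag{6}
\]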
How are EH states combined at control-flow joins? The standard must join
can be applied for the must analysis component. In the concrete, the union of
the states reachable along all incoming control paths is reachable after the join.
It is thus safe to take the minimum of the upper bounds on minimal ages:
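\[
(\hat{q}_1, \hat{q}_{Must1}) \sqcup_{EH} (\hat{q}_2, \hat{q}_{Must2}) =
(\lambda b.\, \min(\hat{q}_1(b), \hat{q}_2(b)),\; \hat{q}_{Must1} \sqcup_{Must} \hat{q}_{Must2})
\tag{7}
\]
Lemma 2 (Join Consistency). The join operator ⊔_EH is correct:
\[
\forall ((\hat{q}_1, \hat{q}_{M1}), (\hat{q}_2, \hat{q}_{M2})) \in A_{EH}^2,\;
Q_1 \in \gamma_{EH}(\hat{q}_1, \hat{q}_{M1}),\; Q_2 \in \gamma_{EH}(\hat{q}_2, \hat{q}_{M2}) :\;
Q_1 \cup Q_2 \in \gamma_{EH}((\hat{q}_1, \hat{q}_{M1}) \sqcup_{EH} (\hat{q}_2, \hat{q}_{M2})).
\tag{8}
\]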
Given a control-flow graph G = (V,E), the abstract EH semantics is defined
as the least solution to the following set of equations, where R_EH : V → A_EH
denotes the abstract cache configuration associated with each program location,
and R_EH,0(v) denotes the initial abstract cache configuration, which must
satisfy R^C_0(v) ∈ γ_EH(R_EH,0(v)):
\[
\forall v' \in V :\; R_{EH}(v') = R_{EH,0}(v') \sqcup_{EH}
\bigsqcup_{(v,b,v') \in E} \mathit{update}_{EH}(R_{EH}(v), b).
\tag{9}
\]
It follows from Lemmas 1 and 2 that the abstract EH semantics includes the
actual set of reachable concrete states:
Theorem 2 (Analysis Soundness). The abstract EH semantics includes the
collecting semantics: ∀v ∈ V : R^C(v) ∈ γ_EH(R_EH(v)).
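To see the EH analysis at work, the following Python sketch solves the equations on a loop that accesses v and then w, starting from an empty cache with k = 2. Several points are assumptions of the sketch rather than statements of the text: the must transformer is taken to have the shape of (1), the must component is solved to its fixpoint first and the EH component afterwards, `None` marks both ⊥ edges and locations not yet reached, and all names are hypothetical.

```python
def step(bound, accessed_bound, k):
    """Shared case distinction: unchanged, or one step older (capped at k)."""
    return bound if bound >= accessed_bound else min(bound + 1, k)


def update_must(must, b, k):
    # ASSUMPTION: must transformer with the same shape as the concrete update.
    if b is None:
        return dict(must)
    return {o: 0 if o == b else step(a, must[b], k) for o, a in must.items()}


def update_eh(eh, must, b, k):
    """EH component of transformer (5), given the must state before the access."""
    if b is None:
        return dict(eh)
    return {o: 0 if o == b else step(a, must[b], k) for o, a in eh.items()}


def join(s1, s2, pick):
    """Pointwise join; None (unreached) is the neutral element."""
    if s1 is None:
        return s2
    if s2 is None:
        return s1
    return {b: pick(s1[b], s2[b]) for b in s1}


def solve(vertices, edges, initial, transfer, pick):
    """Round-robin iteration of the equations until nothing changes."""
    result = {v: initial.get(v) for v in vertices}
    changed = True
    while changed:
        changed = False
        for v in vertices:
            new = initial.get(v)
            for (src, block, dst) in edges:
                if dst == v and result[src] is not None:
                    new = join(new, transfer(src, result[src], block), pick)
            if new != result[v]:
                result[v], changed = new, True
    return result


K = 2
VERTICES = [0, 1, 2, 3]
EDGES = [(0, None, 1), (1, "v", 2), (2, "w", 3), (3, None, 1)]
EMPTY = {0: {"v": K, "w": K}}  # empty cache at the entry

must = solve(VERTICES, EDGES, EMPTY, lambda src, m, b: update_must(m, b, K), max)
eh = solve(VERTICES, EDGES, EMPTY,
           lambda src, e, b: update_eh(e, must[src], b, K), min)
exists_hit_v = eh[1]["v"] < K  # access to v happens at location 1
exists_hit_w = eh[2]["w"] < K  # access to w happens at location 2
```

In this run the must analysis cannot guarantee a hit for either access, while the EH bounds at the loop head (v ↦ 1, w ↦ 0) show that both accesses hit in some execution.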
We can use the results of the EH analysis to determine that an access results
in a hit in at least some of all possible executions. This is the case if the minimum
age of the block prior to the access is guaranteed to be less than the cache’s
associativity. Similarly, the EM analysis can be used to determine that an access
results in a miss in at least some of the possible executions.
Combining the results of the two analyses, some accesses can be classified
as “definitely unknown”. Then, further refinement by model checking is provably
impossible. Classifications as “exists hit” or “exists miss”, which occur if either
the EH or the EM analysis is successful but not both, are also useful to reduce
further model-checking efforts: e.g. in case of “exists hit” it suffices to determine
by model checking whether a miss is possible to fully classify the access.
4 Cache Analysis by Model Checking
All proofs can be found in Appendix B. We have seen a new abstract analysis
capable of classifying certain cache accesses as “definitely unknown”. The clas-
sical “may” and “must” analyses and this new analysis classify a (hopefully
large) portion of all accesses as “always hit”, “always miss”, or “definitely un-
known”. But, due to the incomplete nature of the analysis the exact status of
some blocks remains unknown. Our approach is summarized at a high level in
Listing 1.1. Functions May, Must, ExistsHit and ExistsMiss return the result
of the corresponding analysis, whereas CheckModel invokes the model checker
(see Listing 1.2). Note that a block that is not fully classified as “definitely
unknown” can still benefit from the Exists Hit and Exists Miss analysis during
the model-checking phase. If the AI phase shows that there exists a path on
which the block is a hit (respectively a miss), then the model checker does not
have to check the “always miss” (respectively “always hit”) property.
function ClassifyBlock(block) {
  if (Must(block))              // Must analysis classifies the block
    return AlwaysHit;
  else if (!May(block))         // May analysis classifies the block
    return AlwaysMiss;
  else if (ExistsHit(block) && ExistsMiss(block))
    return DefinitelyUnknown;   // DU analysis classifies the block
  else                          // Otherwise, we call the model checker
    return CheckModel(block, ExistsHit(block), ExistsMiss(block));
}
Listing 1.1. Abstract-interpretation phase
function CheckModel(block, exist_hit, exist_miss) {
  if (exist_hit) {              // block cannot always miss
    if (CheckAH(block)) return AlwaysHit;
  } else if (exist_miss) {      // block cannot always hit
    if (CheckAM(block)) return AlwaysMiss;
  } else {                      // AI phase did not provide any information
    if (CheckAH(block)) return AlwaysHit;
    else if (CheckAM(block)) return AlwaysMiss;
  }
  return DefinitelyUnknown;
}
Listing 1.2. Model-checking phase
We shall now see how to classify these remaining blocks using model checking.
Not only is the model-checking phase sound, i.e. its classifications are correct, it
is also complete relative to our control-flow-graph model, i.e. there remain no
unclassified accesses: each access is classified as “always hit”, “always miss” or
“definitely unknown”. Remember that our analysis is based on the assumption
that each path is semantically feasible.
In order to classify the remaining unclassified accesses, we feed the model
checker a finite-state machine modeling the cache behavior of the program,
composed of (i) a model of the program, yielding the possible sequences of
memory accesses, and (ii) a model of the cache. In this section, we introduce a new
cache model, focusing on the state of a particular memory block to be classified,
which we further simplify using the results of abstract interpretation.
As explained in the introduction, it would be possible to directly encode the
control-flow graph of the program, adorned with memory accesses, as one big
finite-state system. A first step is obviously to slice that system per cache set to
make it smaller. Here we take this approach further by defining a model sound
and complete with respect to a given memory block a: parts of the model that
have no impact on the caching status of a are discarded, which greatly reduces
the model’s size. For each unclassified access, the analysis constructs a model
focused on the memory block accessed, and queries the model checker. Both the
simplified program model and the focused cache model are derived automatically,
and do not require any manual interaction.
The focused cache model is based on the following simple property of LRU: a
memory block is cached if and only if its age is less than the associativity k, or
in other words, if there are fewer than k younger blocks. In the following, w.l.o.g.,
let a ∈M be the memory block we want to focus the cache model on. If we are
only interested in whether a is cached or not, it suffices to track the set of blocks
younger than a. Without any loss in precision concerning a, we can abstract from
the relative ages of the blocks younger than a and of those older than a.
Thus, the domain of the focused cache model is C_⊙ = P(M) ∪ {ε}. Here, ε is
used to represent those cache states in which a is not cached. If a is cached, the
analysis tracks the set of blocks younger than a. We can relate the focused cache
model to the concrete cache model defined in Section 2 using an abstraction
function mapping concrete cache states to focused ones:
\[
\alpha_\odot : C \to C_\odot, \qquad
q \mapsto
\begin{cases}
\varepsilon & \text{if } q(a) \ge k \\
\{ b \in M \mid q(b) < q(a) \} & \text{if } q(a) < k
\end{cases}
\tag{10}
\]
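The focused cache update update_⊙ models a memory access as follows:
\[
\mathit{update}_\odot : C_\odot \times M \to C_\odot, \qquad
(\hat{Q}, b) \mapsto
\begin{cases}
\emptyset & \text{if } b = a \\
\varepsilon & \text{if } b \ne a \wedge \hat{Q} = \varepsilon \\
\hat{Q} \cup \{b\} & \text{if } b \ne a \wedge \hat{Q} \ne \varepsilon \wedge |\hat{Q} \cup \{b\}| < k \\
\varepsilon & \text{if } b \ne a \wedge \hat{Q} \ne \varepsilon \wedge |\hat{Q} \cup \{b\}| = k
\end{cases}
\tag{11}
\]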
Let us briefly explain the four cases above. If b = a (case 1), a becomes the
most-recently-used block and thus no other blocks are younger. If a is not in
the cache and it is not accessed (case 2), then a remains outside of the cache. If
another block is accessed, it is added to a’s younger set (case 3) unless the access
causes a’s eviction, because it is the kth distinct younger block (case 4).
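A minimal Python sketch of the focused model follows, assuming that ε is encoded by the marker string `"eps"`, that cached focused states are frozensets of younger blocks, and that concrete states are total age maps. The function names are hypothetical. The loop replays the access sequence x, y, a, v, w on both models and checks that the abstraction of the concrete state always coincides with the focused state.

```python
EVICTED = "eps"  # stands for ε: the focused block is not cached


def alpha(q, a, k):
    """Abstraction of a concrete age map q to the focused state for block a."""
    if q[a] >= k:
        return EVICTED
    return frozenset(b for b in q if q[b] < q[a])


def focused_update(state, b, a, k):
    """Focused update for block a on an access to block b."""
    if b == a:
        return frozenset()        # case 1: a becomes most recently used
    if state == EVICTED:
        return EVICTED            # case 2: a stays outside the cache
    younger = state | {b}
    return younger if len(younger) < k else EVICTED  # cases 3 and 4


def lru_update(q, b, k):
    """Concrete LRU update on a total age map."""
    return {o: 0 if o == b else (age if age >= q[b] else min(age + 1, k))
            for o, age in q.items()}


K, A = 2, "a"
concrete = {blk: K for blk in "xyavw"}  # empty cache
focused = alpha(concrete, A, K)
history = [focused]
agree = True
for blk in ["x", "y", "a", "v", "w"]:
    concrete = lru_update(concrete, blk, K)
    focused = focused_update(focused, blk, A, K)
    agree = agree and focused == alpha(concrete, A, K)
    history.append(focused)
```

Block a enters the cache when it is accessed, survives the access to v, and is evicted by the access to w, which is the second distinct younger block.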
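[Figure 4: access sequence x, y, a, v, w and the cache states it produces.]
Concrete cache model: [−,−] --x--> [x,−] --y--> [y,x] --a--> [a,y] --v--> [v,a] --w--> [w,v]
Focused cache model:    ε   --x-->   ε   --y-->   ε   --a-->   ∅   --v-->  {v}  --w-->   ε
Fig. 4. Example: concrete vs. focused cache model.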
Example. Figure 4 depicts a sequence of memory accesses and the resulting
concrete and focused cache states (with a focus on block a) starting from an
empty cache of associativity 2. We represent concrete cache states by showing
the two blocks of age 0 and 1. The example illustrates that many concrete cache
states may collapse to the same focused one. At the same time, the focused cache
model does not lose any information about the caching status of the focused
block, which is captured by the following lemma and theorem.
Lemma 3 (Local Soundness and Completeness). The focused cache
update abstracts the concrete cache update exactly:
\[
\forall q \in C,\; \forall b \in M :\;
\alpha_\odot(\mathit{update}(q, b)) = \mathit{update}_\odot(\alpha_\odot(q), b).
\tag{12}
\]
The focused collecting semantics is defined analogously to the collecting
semantics as the least solution to the following set of equations, where R^C_⊙(v)
denotes the set of reachable focused cache configurations at each program location,
and R^C_⊙,0(v) = α^C_⊙(R^C_0(v)) for all v ∈ V:
\[
\forall v' \in V :\; R^C_\odot(v') = R^C_{\odot,0}(v') \cup
\bigcup_{(v,b,v') \in E} \mathit{update}^C_\odot(R^C_\odot(v), b),
\tag{13}
\]
where update^C_⊙ denotes the focused cache update function lifted to sets of focused
cache states, i.e., update^C_⊙(Q, b) = {update_⊙(q, b) | q ∈ Q}, and α^C_⊙ denotes the
abstraction function lifted to sets of states, i.e., α^C_⊙(Q) = {α_⊙(q) | q ∈ Q}.
Theorem 3 (Analysis Soundness and Completeness). The focused
collecting semantics is exactly the abstraction of the collecting semantics:
\[
\forall v \in V :\; \alpha^C_\odot(R^C(v)) = R^C_\odot(v).
\tag{14}
\]
Proof. From Lemma 3 it immediately follows that the lifted focused update
update^C_⊙ exactly corresponds to the lifted concrete cache update update^C.
Since the concrete domain is finite, the least fixed point of the system of
equations of Def. 2 is reached after a bounded number of Kleene iterations. One
then just applies the consistency lemmas in an induction proof. □
Thus we can employ the focused cache model in place of the concrete cache
model without any loss in precision to classify accesses to the focused block as
“always hit”, “always miss”, or “definitely unknown”.
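The exact classification enabled by the focused model can be illustrated by exhaustively exploring pairs of control location and focused cache state. In this Python sketch a plain explicit-state search stands in for the model checker; this substitution, the graph encoding (`None` for ⊥), the marker `"eps"` for ε, and all names are assumptions of the sketch, not the tool chain described in the article.

```python
EVICTED = "eps"  # stands for ε: the focused block is not cached


def focused_update(state, b, a, k):
    """Focused update for block a; b is None on edges without an access."""
    if b is None:
        return state
    if b == a:
        return frozenset()
    if state == EVICTED:
        return EVICTED
    younger = state | {b}
    return younger if len(younger) < k else EVICTED


def classify(edges, entry, initial_states, access_edge, k):
    """Exact classification of the access on access_edge = (src, a, dst)."""
    src, a, _ = access_edge
    seen = {(entry, s) for s in initial_states}
    worklist = list(seen)
    while worklist:
        loc, state = worklist.pop()
        for (u, b, v) in edges:
            if u == loc:
                nxt = (v, focused_update(state, b, a, k))
                if nxt not in seen:
                    seen.add(nxt)
                    worklist.append(nxt)
    states_before = {s for (loc, s) in seen if loc == src}
    hit = any(s != EVICTED for s in states_before)
    miss = any(s == EVICTED for s in states_before)
    if hit and miss:
        return "definitely unknown"
    # Note: an unreachable access is reported as "always miss" by this sketch.
    return "always hit" if hit else "always miss"


LOOP = [(0, None, 1), (1, "v", 2), (2, "w", 3), (3, None, 1)]
LINE = [(0, "v", 1), (1, "w", 2), (2, "v", 3)]

loop_k2 = classify(LOOP, 0, {EVICTED}, (1, "v", 2), 2)
loop_k1 = classify(LOOP, 0, {EVICTED}, (1, "v", 2), 1)
line_k2 = classify(LINE, 0, {EVICTED}, (2, "v", 3), 2)
```

For the loop, the access to v is definitely unknown with associativity 2 and always misses with associativity 1; in the straight-line program the second access to v always hits.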
For the program model, we simplify the CFG without affecting either the
correctness or the precision of the analysis: i) If the may analysis shows that a
is never in the cache at a given program instruction, then this instruction cannot
affect a’s eviction: thus we simplify the program model by not including this
instruction. ii) When we encode the set of blocks younger than a as a bit vector,
we do not include blocks that the may analysis proved not to be in the cache at
that location: these bits would anyway always be 0.
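The two simplifications can be pictured with the following sketch. It rests on assumptions of ours: the CFG is a list of `(source, block, target)` edges, `may_cached[v]` is the set of blocks the may analysis allows in the cache at location `v`, and a dropped instruction is represented as an edge labelled `None`. The paper does not specify this encoding, and accesses to `a` itself are kept because they load `a`.

```python
def simplify_program_model(edges, may_cached, a):
    """Turn accesses that cannot influence a's eviction into no-op edges."""
    simplified = []
    for src, block, dst in edges:
        if block != a and a not in may_cached[src]:
            # a cannot be cached here, so this access cannot evict it.
            simplified.append((src, None, dst))
        else:
            simplified.append((src, block, dst))
    return simplified


def tracked_blocks(may_cached, location, a):
    """Blocks that need a bit in the 'younger than a' bit vector at a location."""
    return sorted(may_cached[location] - {a})
```

With these helpers, the width of the bit vector at a location is `len(tracked_blocks(...))` rather than the total number of memory blocks.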
5 Related Work
Earlier work by Chattopadhyay and Roychoudhury [4] refines memory accesses
classified as “unknown” by AI using a software model-checking step: when abstract
interpretation cannot classify an access, the source program is enriched with
annotations for counting conflicting accesses and run through a software model
checker (actually, a bounded model checker). Their approach, in contrast to ours,
takes into account program semantics during the refinement step; it is thus likely
to be more precise on programs where many paths are infeasible for semantic
reasons. Our approach however scales considerably better, as shown in Section 6:
not only do we not keep the program semantics in the problem instance passed to
the model checker, which thus has finite state as opposed to being an arbitrarily
complex program verification instance, we also strive to minimize that instance
by the methods discussed in Section 4.
Chu et al. [5] also refine cache analysis results based on program semantics,
but by symbolic execution, where an SMT solver is used to prune infeasible paths.
We also compare the scalability of their approach to ours.
Our work complements [12], which uses the classification obtained by classical
abstract interpretation of the cache as a basis for WCET analysis on timed
automata: our refined classification would increase precision in that analysis.
Metta et al. [13] also employ model checking to increase the precision of WCET
analysis. However, they do not take into account low-level features such as caches.
6 Experimental Evaluation
In industrial use for worst-case execution time, cache analysis targets a specific
processor, specific cache settings, specific binary code loaded at a specific address.
The processor may have a hierarchy of caches and other peculiarities. Loading
object code and reconstructing a control-flow graph involves dedicated tools. For
data caches, a pointer value analysis must be run. Implementing an
industrial-strength analyzer including a pointer value analysis, or even interfacing with an
existing complex analyzer, would greatly exceed the scope of this article. For
these reasons, our analysis applies to a single-level LRU instruction cache, and
operates at LLVM bitcode level, each LLVM opcode considered as an elementary
instruction. This should be representative of analysis of machine code over LRU
caches at a fraction of the engineering cost.
[Figure 5: plot with log-scale y-axis. x-axis: Program; y-axis: Number of accesses; legend: block_size 4, 8. Benchmarks on the x-axis: fac, recursion, binarysearch, prime, insertsort, bsort, duff, countnegative, bitonic, iir, matrix1, complex_updates, crc, lms, filterbank, cjpeg_wrbmp, dijkstra, st, jfdctint, fir2dim, fft, bitcount, ludcmp, huff_dec, minver, ndes, adpcm_dec, adpcm_enc, sha, cover, huff_enc, anagram, petrinet, fmref, epic, statemate, g723_enc, md5, cjpeg_transupp, pm, powerwindow, quicksort, basicmath, h264_dec, gsm_dec, mpeg2, gsm_encode, ammunition, test3.]
Fig. 5. Size of benchmarks in CFG blocks of 4 and 8 LLVM instructions.
We implemented the classical may and must analyses, as well as our new
definitely-unknown analysis and our conversion to model checking. The
model-checking problems are produced in the NuSMV format, then fed to nuXmv [3]
(see footnote 9). We used an Intel Core i3-2120 processor (3.30 GHz) with 8 GiB RAM.
Our experimental evaluation is intended to show i) precision gains by model
checking (number of unknowns at the may/must stage vs. after the full
analysis); ii) the usefulness of the definitely-unknown analysis (number of
definitely-unknown accesses, which corresponds to the reduced number of MC calls,
and reduced cumulative MC execution time); iii) the global analysis efficiency
(impact on analysis execution time, reduced number of MC calls).
As analysis target we use the TACLeBench benchmark suite [8] (see footnote 10),
the successor of the Mälardalen benchmark suite, which is commonly used in experimental
evaluations of WCET analysis techniques. Figure 5 (log. scale) gives the number
of blocks in the control flow graph where a block is a sequence of instructions that
are mapped to the same memory block. In all experiments, we assume the cache to
be initially empty and we chose the following cache configuration: 8 instructions
per block, 4 ways, 8 cache sets. More details on the sizes of the benchmarks and
further experimental results (varying cache configuration, detailed numbers for
each benchmark,...) may be found in the technical report [19].
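To illustrate what this configuration means for individual instructions, the sketch below maps an instruction index to its memory block and cache set. It assumes instructions are numbered consecutively from a block-aligned start and that blocks are placed in sets by the usual modulo rule; neither assumption is stated in the text, and the function names are hypothetical.

```python
def memory_block(instr_index, instrs_per_block=8):
    """Memory block containing the instruction with the given index."""
    return instr_index // instrs_per_block


def cache_set(block, num_sets=8):
    """Cache set of a memory block under modulo placement (an assumption)."""
    return block % num_sets


def conflict(block_x, block_y, num_sets=8):
    """Two distinct blocks compete for cache lines only if they share a set."""
    return block_x != block_y and cache_set(block_x, num_sets) == cache_set(block_y, num_sets)
```

Under these assumptions, instructions 16 to 23 share memory block 2, and blocks 2 and 10 compete for the 4 ways of set 2.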
9 https://nuxmv.fbk.eu/: nuXmv checks for reachability using Kleene iterations over
sets of states implicitly represented by binary decision diagrams (BDDs). We also
tried nuXmv’s implementation of the IC3 algorithm with no speed improvement.
10 http://www.tacle.eu/index.php/activities/taclebench
[Figure 6: per-benchmark plot. x-axis: Program; y-axis: Share of accesses refined by MC (% of blocks classified by AI phase), with ticks from 0% to 30%.]
Fig. 6. Increase in hit/miss classifications due to MC relative to pure AI-based analysis.
6.1 Effect of Model Checking on Cache Analysis Precision
Here we evaluate the improvement in the number of accesses classified as “always
hit” or “always miss”. In Figure 6 we show by what percentage the number of
such classifications increased from the pure AI phase due to model checking.
As can be observed in the figure, more than 60% of the benchmarks show an
improvement and this improvement is greater than 5% for 45% of them.
We performed the same experiment under varying cache configurations (number
of ways, number of sets, memory-block size) with similar outcomes.
6.2 Effect of the Definitely-Unknown Analysis on Analysis Efficiency
We introduced the definitely-unknown analysis to reduce the number of MC
calls: the MC is called only for accesses that the classical static analysis
classifies as neither always hit nor always miss and that are not proven
definitely unknown. Figure 7(a) shows the number of MC calls with and
without the definitely-unknown analysis. The two lines parallel to the diagonal
correspond to reductions in the number of calls by a factor of 10 and 100. The
definitely-unknown analysis significantly reduces the number of MC calls: for
some of the larger benchmarks by around a factor of 100. For the three smallest
benchmarks, the number of calls is even reduced to zero: the definitely-unknown
analysis perfectly completes the may/must analysis and no more blocks need to
be classified by model checking. For 28 of the 46 benchmarks, fewer than 10
calls to the model checker are necessary after the definitely-unknown analysis.
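The selection of accesses that still require a model-checker call can be summarized as follows. This is a sketch under our own assumptions: each access is described by four hypothetical booleans standing for the outcomes of the must analysis, the may analysis, and the two halves of the definitely-unknown analysis (a hit exists on some path, a miss exists on some path).

```python
def classify(in_must, in_may, exists_hit, exists_miss):
    """Combine the abstract-interpretation results for one access."""
    if in_must:
        return "always hit"
    if not in_may:
        return "always miss"
    if exists_hit and exists_miss:
        return "definitely unknown"
    return "unknown"  # not settled by abstract interpretation


def needs_model_checker(label):
    """Only accesses left 'unknown' are passed on to the model checker."""
    return label == "unknown"
```

Counting the accesses for which `needs_model_checker` returns true corresponds to the quantity plotted on the "with DU" axis of Figure 7(a).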
[Figure 7: two scatter plots. (a) Number of calls to the MC: x-axis "#MC calls with DU", y-axis "#MC calls without DU". (b) Total MC time: x-axis "Execution time of MC with DU (in s)", y-axis "Execution time of MC without DU (in s)".]
Fig. 7. Analysis efficiency improvements due to the definitely-unknown analysis.
This reduction of the number of calls to the model checker also results in
significant improvements of the whole execution time of the analysis, which is
dominated by the time spent in the model checker: see Figure 7(b). On average
(geometric mean) the total MC execution time is reduced by a factor of 3.7
compared with an approach where only the may and must analysis results are
used to reduce the number of MC calls.
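For readers unfamiliar with averaging ratios, the geometric mean of per-benchmark speedup factors can be computed as sketched below. The helper name and any timings passed to it are hypothetical; the per-benchmark data behind the factor of 3.7 are not reproduced here.

```python
import math


def geometric_mean(factors):
    """Geometric mean of positive ratios, computed via logarithms."""
    return math.exp(sum(math.log(f) for f in factors) / len(factors))


def speedup_factors(times_without, times_with):
    """Per-benchmark ratio of MC time without the analysis to MC time with it."""
    return [wo / w for wo, w in zip(times_without, times_with)]
```

The geometric mean is the usual choice for ratios because a speedup of 2 on one benchmark and a slowdown of 2 on another average to a factor of 1.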
Note that the definitely-unknown analysis itself is very fast: it takes less than
one second on all benchmarks.
6.3 Effect of Cache and Program Model Simplifications on Model-Checking Efficiency
In all experiments we used the focused cache model: without this focused model,
the model is so large that a timeout of one hour is reached for all but the 6
smallest benchmarks. This shows a huge scalability improvement due to the
focused cache model. It also demonstrates that building a single model to classify
all the accesses at once is practically infeasible.
Figure 8 shows the execution time of individual MC calls (on a log. scale)
with and without program-model simplifications based on abstract-interpretation
results. For each benchmark, the figure shows the maximum, minimum, and
mean execution time of all MC calls for that benchmark. We observe that the
maximum execution time is always smaller with the use of the AI phase due to
the simplification of program models. Using AI results, there are fewer MC calls
and many of the suppressed MC calls are “cheap” calls: this explains why the
average may be larger with AI phase. Some benchmarks are missing the “without
AI phase” result: this is the case for benchmarks for which the analysis did not
terminate within one hour.
[Figure 8: per-benchmark plot with log-scale y-axis. x-axis: Program; y-axis: Time for a call to MC (s); legend: Min-Mean-Max, with AI phase and without AI phase.]
Fig. 8. MC execution time for individual call: min, mean, and max.
6.4 Efficiency of the Full Analysis
First, we compare our approach to that of the related work [4,5]. Both tools from
the related work operate at C level, while our analysis operates at LLVM IR level.
Thus it is hard to reasonably compare analysis precision. To compare scalability
we focus on total tool execution time, as this is available. In the experimental
evaluation of [4] we see that it takes 395 seconds to analyze statemate (they stop
the analysis at 100 MC calls). With a similar configuration (64 sets, 4 ways, and 4
instructions per block, resp. 8 instructions per block), our analysis makes 3 calls
(resp. 0) to the model checker, compared with 832 (resp. 259) MC calls without
the AI phase, and spends less than 3 seconds (resp. 1.5 seconds) on the entire analysis.
Unfortunately, among all TACLeBench benchmarks [4] gives scalability results
only for statemate, and thus no further comparison is possible. The analysis
from [5] also spends more than 350 seconds to analyze statemate; for ndes it takes
38 seconds whereas our approach makes only 3 calls to the model checker and
requires less than one second for the entire analysis. This shows that our analysis
scales better than the two related approaches. However, a careful comparison of
analysis precision remains to be done.
To see more generally how well our approach scales, we compare the total
analysis time with and without the AI phase. The AI phase is composed of the
may, must and definitely-unknown analyses: without the AI phase, the model
checker is called for each memory access and the program model is not simplified.
On all benchmarks the number of MC calls is reduced by a factor of at least
10, sometimes exceeding a factor of 100 (see Figure 9(a)). This is unsurprising
given the strong effect of the definitely-unknown analysis, which we observed
in the previous section. Additional reductions compared with those seen in
Figure 7(a) result from the classical may and must analysis. Interestingly, the
reduction in total MC time appears to increase with increasing benchmark sizes:
see Figure 9(b). While the improvement is moderate for small benchmarks that
can be handled in a few seconds with and without the AI phase, it increases to
much larger factors for the larger benchmarks.
[Figure 9: two scatter plots. (a) Number of calls to the MC: x-axis "#MC calls with AI phase", y-axis "#MC calls without AI phase". (b) Total MC time: x-axis "Execution time of MC with AI phase (in s)", y-axis "Execution time of MC without AI phase (in s)".]
Fig. 9. Analysis efficiency improvements due to the entire AI phase.
It is difficult to ascertain the influence our approach would have on a full
WCET analysis, with respect to both execution time and precision. In particular,
WCET analyses that precisely simulate the microarchitecture need to explore
fewer pipeline states if fewer cache accesses are classified as “unknown”. Thus
a costlier cache analysis does not necessarily translate into a costlier analysis
overall. We consider a tight integration with a state-of-the-art WCET analyzer
as interesting future work, which is beyond the scope of this paper.
7 Conclusion and Perspectives
We have demonstrated that it is possible to precisely classify all accesses to an
LRU cache at reasonable cost by a combination of abstract interpretation, which
classifies most accesses, and model checking, which classifies the remaining ones.
Like all other abstract-interpretation-based cache analyses, at least those
known to us, ours considers all paths within a control-flow graph to be feasible
regardless of functional semantics. Possible improvements over this include:
i) encoding some of the functional semantics of the program into the
model-checking problem [13,4]; ii) using “trace partitioning” [18] or “path focusing” [14]
in the abstract-interpretation phase.
References
1. Bernstein, D.J.: Cache-timing attacks on AES (2005), https://cr.yp.to/antiforgery/cachetiming-20050414.pdf
2. Canteaut, A., Lauradoux, C., Seznec, A.: Understanding cache attacks. Tech. Rep.
5881, INRIA (Apr 2006), https://hal.inria.fr/inria-00071387/en/
3. Cavada, R., Cimatti, A., Dorigatti, M., Griggio, A., Mariotti, A., Micheli, A., Mover,
S., Roveri, M., Tonetta, S.: The nuXmv symbolic model checker. In: Computer-aided
verification (CAV). LNCS, vol. 8559, pp. 334–342. Springer (2014)
4. Chattopadhyay, S., Roychoudhury, A.: Scalable and precise refinement of cache
timing analysis via path-sensitive verification. Real-Time Systems 49(4), 517–562
(2013), http://dx.doi.org/10.1007/s11241-013-9178-0
5. Chu, D., Jaffar, J., Maghareh, R.: Precise cache timing analysis via symbolic
execution. In: 2016 IEEE Real-Time and Embedded Technology and Applications
Symposium (RTAS), Vienna, Austria, April 11-14, 2016. pp. 293–304. IEEE
Computer Society (2016), http://dx.doi.org/10.1109/RTAS.2016.7461358
6. Clarkson, M.R., Schneider, F.B.: Hyperproperties. In: Proceedings of the 21st IEEE
Computer Security Foundations Symposium, CSF 2008, Pittsburgh, Pennsylvania,
23-25 June 2008. pp. 51–65 (2008), http://dx.doi.org/10.1109/CSF.2008.7
7. Doychev, G., Köpf, B., Mauborgne, L., Reineke, J.: CacheAudit: A tool for the
static analysis of cache side channels. ACM Trans. Inf. Syst. Secur. 18(1), 4:1–4:32
(Jun 2015), http://doi.acm.org/10.1145/2756550
8. Falk, H., Altmeyer, S., Hellinckx, P., Lisper, B., Puffitsch, W., Rochange, C.,
Schoeberl, M., Sorensen, R.B., Wägemann, P., Wegener, S.: TACLeBench: A
benchmark collection to support worst-case execution time research. In: 16th
International Workshop on Worst-Case Execution Time Analysis, WCET 2016,
July 5, 2016, Toulouse, France. pp. 2:1–2:10 (2016), http://dx.doi.org/10.4230/OASIcs.WCET.2016.2
9. Ferdinand, C., Wilhelm, R.: Efficient and precise cache behavior prediction for
real-time systems. Real-Time Systems 17(2–3), 131–181 (Dec 1999)
10. Lundqvist, T., Stenström, P.: Timing anomalies in dynamically scheduled
microprocessors. In: 20th IEEE Real-Time Systems Symposium (RTSS) (1999)
11. Lv, M., Guan, N., Reineke, J., Wilhelm, R., Yi, W.: A survey on static cache analysis
for real-time systems. Leibniz Transactions on Embedded Systems 3(1), 05:1–05:48
(2016), http://ojs.dagstuhl.de/index.php/lites/article/view/LITES-v003-i001-a005
12. Lv, M., Yi, W., Guan, N., Yu, G.: Combining abstract interpretation with model
checking for timing analysis of multicore software. In: Proceedings of the 31st
IEEE Real-Time Systems Symposium, RTSS 2010, San Diego, California, USA,
November 30 - December 3, 2010. pp. 339–349. IEEE Computer Society (2010),
http://dx.doi.org/10.1109/RTSS.2010.30
13. Metta, R., Becker, M., Bokil, P., Chakraborty, S., Venkatesh, R.: TIC: a scalable
model checking based approach to WCET estimation. In: Kuo, T., Whalley, D.B.
(eds.) Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages,
Compilers, Tools, and Theory for Embedded Systems, LCTES 2016, Santa Barbara,
CA, USA, June 13 - 14, 2016. pp. 72–81. ACM (2016), http://doi.acm.org/10.1145/2907950.2907961
14. Monniaux, D., Gonnord, L.: Using bounded model checking to focus fixpoint
iterations. In: Yahav, E. (ed.) Static analysis (SAS). LNCS, vol. 6887, pp. 369–385.
Springer (2011)
15. Mowery, K., Keelveedhi, S., Shacham, H.: Are AES x86 cache timing attacks still
feasible? In: Cloud Computing Security Workshop. pp. 19–24. ACM, New York,
NY, USA (2012)
16. Reineke, J.: Caches in WCET analysis: predictability, competitiveness, sensitivity.
Ph.D. thesis, Universität des Saarlandes (2008)
17. Reineke, J., et al.: A definition and classification of timing anomalies. In: 6th
International Workshop on Worst-Case Execution Time Analysis (WCET) (July
2006)
18. Rival, X., Mauborgne, L.: The trace partitioning abstract domain. ACM
Transactions on Programming Languages and Systems (TOPLAS) 29(5) (2007)
19. Touzeau, V., Maiza, C., Monniaux, D., Reineke, J.: Ascertaining uncertainty for
efficient exact cache analysis. Tech. Rep. TR-2017-2, VERIMAG (2017)
20. Wilhelm, R., Engblom, J., Ermedahl, A., Holsti, N., Thesing, S., Whalley, D.B.,
Bernat, G., Ferdinand, C., Heckmann, R., Mitra, T., Mueller, F., Puaut, I., Puschner,
P.P., Staschulat, J., Stenström, P.: The worst-case execution-time problem - overview
of methods and survey of tools. ACM Trans. Embedded Comput. Syst. 7(3) (2008)
21. Wulf, W.A., McKee, S.A.: Hitting the memory wall: Implications of the obvious.
SIGARCH Comput. Archit. News 23(1), 20–24 (Mar 1995), http://doi.acm.org/10.1145/216585.216588
A Proofs for the Definitely-Unknown Abstract
Interpretation
As mentioned in the main part of this report, the soundness of the EH analysis
comes from the consistency of the abstract transformers with respect to the
concrete transformers. Here are the detailed proofs for the EH analysis.
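Lemma 1 (Local Consistency). The abstract transformer $\mathit{update}_{EH}$ soundly
approximates its concrete counterpart $\mathit{update}^C$:
\[ \forall (\hat{q}, \hat{q}_{Must}) \in A_{EH}, \forall b \in M, \forall Q \in \gamma_{EH}(\hat{q}, \hat{q}_{Must}) : \mathit{update}^C(Q, b) \in \gamma_{EH}(\mathit{update}_{EH}((\hat{q}, \hat{q}_{Must}), b)). \tag{6} \]
Proof (consistency of the EH analysis).
Let $(\hat{q}_0, \hat{q}_{Must0}) \in A_{EH}$ and $b \in M$. We use the additional notations:
– $(\hat{q}_1, \hat{q}_{Must1}) = \mathit{update}_{EH}((\hat{q}_0, \hat{q}_{Must0}), b)$
– $Q_1 = \gamma_{EH}(\hat{q}_1, \hat{q}_{Must1})$
– $Q_0 = \gamma_{EH}(\hat{q}_0, \hat{q}_{Must0})$
– $Q_2 = \{\mathit{update}^C(Q, b) \mid Q \in Q_0\}$
We want to prove that $Q_2 \subseteq Q_1$.
Let $\tilde{Q}_2 \in Q_2$ and $\tilde{Q}_0 \in Q_0$ such that $\tilde{Q}_2 = \mathit{update}^C(\tilde{Q}_0, b)$.
Then:
\[ \forall \tilde{q}_0 \in \tilde{Q}_0 : \mathit{update}(\tilde{q}_0, b) \in \{\mathit{update}(q, b) \mid q \in \gamma_{Must}(\hat{q}_{Must0})\} \]
Thus, the consistency of the must analysis gives us:
\[ \forall \tilde{q}_0 \in \tilde{Q}_0 : \mathit{update}(\tilde{q}_0, b) \in \gamma_{Must}(\mathit{update}_{Must}(\hat{q}_{Must0}, b)) = \gamma_{Must}(\hat{q}_{Must1}) \tag{15} \]
Then: $\tilde{Q}_2 \subseteq \gamma_{Must}(\hat{q}_{Must1})$.
To complete the proof that $Q_2 \subseteq Q_1$, it remains to prove that:
\[ \forall b' \in M, \exists q \in \tilde{Q}_2 : q(b') \le \hat{q}_1(b') \]
Let $b' \in M$. We have:
\[ \hat{q}_1(b') = \begin{cases} 0 & \text{if } b = b' \\ \hat{q}_0(b') & \text{if } \hat{q}_{Must0}(b) \le \hat{q}_0(b') \\ \hat{q}_0(b') + 1 & \text{if } \hat{q}_{Must0}(b) > \hat{q}_0(b') \land \hat{q}_0(b') < k \\ k & \text{if } \hat{q}_{Must0}(b) > \hat{q}_0(b') \land \hat{q}_0(b') = k \end{cases} \]
Let $\tilde{q}_0 \in \tilde{Q}_0$ such that $\tilde{q}_0(b') \le \hat{q}_0(b')$. Let $\tilde{q}_2 = \mathit{update}(\tilde{q}_0, b) \in \tilde{Q}_2$. We show
that $\tilde{q}_2$ is a good candidate, i.e. $\tilde{q}_2(b') \le \hat{q}_1(b')$.
\[ \tilde{q}_2(b') = \begin{cases} 0 & \text{if } b = b' \\ \tilde{q}_0(b') & \text{if } \tilde{q}_0(b') \ge \tilde{q}_0(b) \\ \tilde{q}_0(b') + 1 & \text{if } \tilde{q}_0(b') < \tilde{q}_0(b) \land \tilde{q}_0(b') < k \\ k & \text{if } \tilde{q}_0(b') < \tilde{q}_0(b) \land \tilde{q}_0(b') = k \end{cases} \]
– If $b = b'$: $\tilde{q}_2(b') = 0 = \hat{q}_1(b')$.
– If $b \neq b' \land \hat{q}_{Must0}(b) \le \hat{q}_0(b')$, then $\hat{q}_1(b') = \hat{q}_0(b')$.
  • If $\tilde{q}_0(b') \ge \hat{q}_{Must0}(b)$, then: $\tilde{q}_0(b') \ge \tilde{q}_0(b)$.
    Thus: $\tilde{q}_2(b') = \tilde{q}_0(b')$.
    Finally: $\hat{q}_1(b') = \hat{q}_0(b') \ge \tilde{q}_0(b') = \tilde{q}_2(b')$
  • Otherwise, $\tilde{q}_0(b') < \hat{q}_{Must0}(b)$ and thus: $\tilde{q}_0(b') < \hat{q}_0(b')$.
    ∗ if $\tilde{q}_0(b') \ge \tilde{q}_0(b)$, then: $\tilde{q}_2(b') = \tilde{q}_0(b') < \hat{q}_0(b') \le \hat{q}_1(b')$
    ∗ if $\tilde{q}_0(b') < \tilde{q}_0(b)$, then: $\tilde{q}_0(b') < k$ and thus: $\tilde{q}_2(b') = \tilde{q}_0(b') + 1 \le \hat{q}_0(b') \le \hat{q}_1(b')$
– If $b \neq b' \land \hat{q}_{Must0}(b) > \hat{q}_0(b') \land \hat{q}_0(b') < k$, then $\hat{q}_1(b') = \hat{q}_0(b') + 1$. Moreover,
  $\tilde{q}_0(b') \le \hat{q}_0(b') < k$.
  Thus: $\tilde{q}_2(b') \le \tilde{q}_0(b') + 1 \le \hat{q}_0(b') + 1 = \hat{q}_1(b')$
– Otherwise, $b \neq b' \land \hat{q}_{Must0}(b) > \hat{q}_0(b') \land \hat{q}_0(b') = k$. Then: $\hat{q}_1(b') = k$ and
  trivially: $\tilde{q}_2(b') \le \hat{q}_1(b')$
In every case, we have: $\tilde{q}_2(b') \le \hat{q}_1(b')$.
Thus, $\tilde{Q}_2 \in Q_1$, proving that $Q_2 \subseteq Q_1$. □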
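Lemma 2 (Join Consistency). The join operator $\sqcup_{EH}$ is correct:
\[ \forall ((\hat{q}_1, \hat{q}_{M1}), (\hat{q}_2, \hat{q}_{M2})) \in A_{EH}^2, Q_1 \in \gamma_{EH}(\hat{q}_1, \hat{q}_{M1}), Q_2 \in \gamma_{EH}(\hat{q}_2, \hat{q}_{M2}) : Q_1 \cup Q_2 \in \gamma_{EH}((\hat{q}_1, \hat{q}_{M1}) \sqcup_{EH} (\hat{q}_2, \hat{q}_{M2})). \tag{8} \]
Proof. Let $((\hat{q}_1, \hat{q}_{M1}), (\hat{q}_2, \hat{q}_{M2})) \in A_{EH}^2$, $Q_1 \in \gamma_{EH}(\hat{q}_1, \hat{q}_{M1})$, $Q_2 \in \gamma_{EH}(\hat{q}_2, \hat{q}_{M2})$.
We use the additional notation:
– $Q_3 = Q_1 \cup Q_2$
– $(\hat{q}_3, \hat{q}_{M3}) = (\hat{q}_1, \hat{q}_{M1}) \sqcup_{EH} (\hat{q}_2, \hat{q}_{M2})$
We want to prove that: $Q_3 \in \gamma_{EH}(\hat{q}_3, \hat{q}_{M3})$.
Let $b \in M$. Then $\min_{q \in Q_3} q(b) \le \min_{q \in Q_1} q(b) \le \hat{q}_1(b)$. Similarly, $\min_{q \in Q_3} q(b) \le \hat{q}_2(b)$. Thus,
$\min_{q \in Q_3} q(b) \le \hat{q}_3(b)$.
Then, using the consistency of the must join, we have: $Q_3 \in \gamma_{EH}(\hat{q}_3, \hat{q}_{M3})$. □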
Theorem 2 (Analysis Soundness). The abstract EH semantics includes the
collecting semantics: $\forall v \in V : R^C(v) \in \gamma_{EH}(R_{EH}(v))$.
Proof. Both the collecting semantics and the abstract EH semantics are defined
as least solutions to sets of equations, i.e., least fixed points of functions
corresponding to these equations. The two domains are both finite for a given
program, as the number of memory blocks is finite. Thus, both domains have
finite ascending chains, and so the least fixed points can be obtained in a finite
number of Kleene iterations.
Let $R^C_i : V \to \mathcal{P}(C)$ and $R_{EH,i} : V \to A_{EH}$ denote the values reached in the
$i$th Kleene iteration:
\[ \forall v' \in V : R^C_{i+1}(v') = R^C_0(v') \cup \bigcup_{(v,b,v') \in E} \mathit{update}^C(R^C_i(v), b), \tag{16} \]
\[ \forall v' \in V : R_{EH,i+1}(v') = R_{EH,0}(v') \sqcup \bigsqcup_{(v,b,v') \in E} \mathit{update}_{EH}(R_{EH,i}(v), b). \tag{17} \]
We will prove by induction that for all $i \in \mathbb{N}$, we have
\[ \forall v' \in V : R^C_i(v') \in \gamma_{EH}(R_{EH,i}(v')). \]
This then implies the theorem, as due to finite ascending chains, there is a $j \in \mathbb{N}$
such that the least solutions $R^C$ and $R_{EH}$ are $R^C_j$ and $R_{EH,j}$.
Induction base ($i = 0$): This follows immediately from the assumption that
$R^C_0(v) \in \gamma_{EH}(R_{EH,0}(v))$.
Induction step ($i \to i + 1$): Let $v' \in V$ be arbitrary. By induction hypothesis, we
have $R^C_i(v) \in \gamma_{EH}(R_{EH,i}(v))$ for all $v \in V$ s.t. $(v, b, v') \in E$. By Lemma 1 (local
consistency) this implies $\mathit{update}^C(R^C_i(v), b) \in \gamma_{EH}(\mathit{update}_{EH}(R_{EH,i}(v), b))$ for
all $b \in M$ and $v \in V$ s.t. $(v, b, v') \in E$. Applying Lemma 2 (join consistency) this
in turn implies:
\[ \bigcup_{(v,b,v') \in E} \mathit{update}^C(R^C_i(v), b) \in \gamma_{EH}\Big(\bigsqcup_{(v,b,v') \in E} \mathit{update}_{EH}(R_{EH,i}(v), b)\Big). \]
Applying Lemma 2 again, as by assumption $R^C_0(v') \in \gamma_{EH}(R_{EH,0}(v'))$, yields:
\[ R^C_0(v') \cup \bigcup_{(v,b,v') \in E} \mathit{update}^C(R^C_i(v), b) \in \gamma_{EH}\Big(R_{EH,0}(v') \sqcup \bigsqcup_{(v,b,v') \in E} \mathit{update}_{EH}(R_{EH,i}(v), b)\Big), \tag{18} \]
which, by (16) and (17), is equivalent to $R^C_{i+1}(v') \in \gamma_{EH}(R_{EH,i+1}(v'))$. □
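The Kleene iteration of equation (16) can be made concrete on a toy control-flow graph. The following sketch is an illustration under our own assumptions, not the authors' tool: cache states are tuples of blocks ordered from youngest to oldest, `initial` maps each location to its set of initial states, and the names `kleene_collecting` and `lru_update` are hypothetical.

```python
def lru_update(q, b, k):
    """Concrete LRU update; q is a tuple of distinct blocks, youngest first."""
    rest = tuple(x for x in q if x != b)
    return ((b,) + rest)[:k]


def kleene_collecting(vertices, edges, initial, update):
    """Iterate R_{i+1}(v') = R_0(v') ∪ ⋃ update(R_i(v), b) until stable."""
    current = {v: frozenset(initial.get(v, ())) for v in vertices}
    while True:
        nxt = {}
        for target in vertices:
            states = set(initial.get(target, ()))
            for src, block, dst in edges:
                if dst == target:
                    states |= {update(q, block) for q in current[src]}
            nxt[target] = frozenset(states)
        if nxt == current:  # least fixed point reached
            return current
        current = nxt
```

Termination relies on the same argument as the proof: the set of cache states over finitely many blocks is finite, so the ascending chain of iterates stabilizes.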
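In the case of the EM analysis, we are computing a safe lower bound on the
maximal age of a block, thus the concretization is:
\[ \gamma_{EM} : A_{EM} \to \mathcal{P}(\mathcal{P}(C)), \quad (\hat{q}, \hat{q}_{May}) \mapsto \{Q \subseteq \gamma_{May}(\hat{q}_{May}) \mid \forall b \in M : \max_{q \in Q} q(b) \ge \hat{q}(b)\} \tag{19} \]
Similarly to $\mathit{update}_{EH}$, the abstract transformer $\mathit{update}_{EM}((\hat{q}_{EM}, \hat{q}_{May}), b)$
corresponding to an access to block $b$ is defined by the pair $(\hat{q}'_{EM}, \mathit{update}_{May}(\hat{q}_{May}, b))$,
where
\[ \hat{q}'_{EM} = \lambda b'. \begin{cases} 0 & \text{if } b' = b \\ \hat{q}_{EM}(b') & \text{if } \hat{q}_{May}(b) < \hat{q}_{EM}(b') \\ \hat{q}_{EM}(b') + 1 & \text{if } \hat{q}_{May}(b) \ge \hat{q}_{EM}(b') \land \hat{q}_{EM}(b') < k \\ k & \text{if } \hat{q}_{May}(b) \ge \hat{q}_{EM}(b') \land \hat{q}_{EM}(b') = k \end{cases} \tag{20} \]
When joining two analysis states during the EM analysis, it is safe to take
the maximum of the lower bounds on maximal ages. Indeed, as mentioned for the
EH analysis, the union of reachable states over all incoming paths is reachable.
\[ (\hat{q}_1, \hat{q}_{May1}) \sqcup_{EM} (\hat{q}_2, \hat{q}_{May2}) = (\lambda b. \max(\hat{q}_1(b), \hat{q}_2(b)), \hat{q}_{May1} \sqcup_{May} \hat{q}_{May2}) \tag{21} \]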
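The age component of the EM transformer (20) and of the join (21) can be sketched as below. This is a partial illustration under stated assumptions: only the lower bounds on maximal ages are modelled, as a dictionary from blocks to ages in 0..k, and the may-analysis bound for the accessed block is passed in as a plain number `may_age_of_b` rather than computed. The may component of the pair and all function names are hypothetical simplifications.

```python
def update_em_ages(q_hat, may_age_of_b, b, k):
    """Age part of the EM transformer for an access to block b."""
    new_ages = {}
    for block, age in q_hat.items():
        if block == b:
            new_ages[block] = 0                # accessed block becomes youngest
        elif may_age_of_b < age:
            new_ages[block] = age              # b may already be younger: no ageing
        else:
            new_ages[block] = min(age + 1, k)  # ageing, saturating at k
    return new_ages


def join_em_ages(q1, q2):
    """Age part of the EM join: pointwise maximum of the lower bounds."""
    return {block: max(q1[block], q2[block]) for block in q1}
```

The saturation `min(age + 1, k)` merges the last two cases of (20), since an age of k denotes a block that is not cached.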
The proof of EM analysis soundness is similar to the proof of EH analysis
soundness and can be obtained by substituting the name of the transformers and
the join. Thus, in the following, we only prove the consistency of the transformers
and the join.
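Lemma 4 (Local Consistency). The abstract transformer $\mathit{update}_{EM}$ soundly
approximates its concrete counterpart $\mathit{update}^C$:
\[ \forall (\hat{q}, \hat{q}_{May}) \in A_{EM}, \forall b \in M, \forall Q \in \gamma_{EM}(\hat{q}, \hat{q}_{May}) : \mathit{update}^C(Q, b) \in \gamma_{EM}(\mathit{update}_{EM}((\hat{q}, \hat{q}_{May}), b)). \tag{22} \]
Proof (consistency of the EM analysis).
Let $(\hat{q}_0, \hat{q}_{May0}) \in A_{EM}$ and $b \in M$. We use the additional notations:
– $(\hat{q}_1, \hat{q}_{May1}) = \mathit{update}_{EM}((\hat{q}_0, \hat{q}_{May0}), b)$
– $Q_1 = \gamma_{EM}(\hat{q}_1, \hat{q}_{May1})$
– $Q_0 = \gamma_{EM}(\hat{q}_0, \hat{q}_{May0})$
– $Q_2 = \{\mathit{update}^C(Q, b) \mid Q \in Q_0\}$
We want to prove that $Q_2 \subseteq Q_1$.
Let $\tilde{Q}_2 \in Q_2$ and $\tilde{Q}_0 \in Q_0$ such that $\tilde{Q}_2 = \mathit{update}^C(\tilde{Q}_0, b)$.
Then:
\[ \forall \tilde{q}_0 \in \tilde{Q}_0 : \mathit{update}(\tilde{q}_0, b) \in \{\mathit{update}(q, b) \mid q \in \gamma_{May}(\hat{q}_{May0})\} \]
Thus, the consistency of the may analysis gives us:
\[ \forall \tilde{q}_0 \in \tilde{Q}_0 : \mathit{update}(\tilde{q}_0, b) \in \gamma_{May}(\mathit{update}_{May}(\hat{q}_{May0}, b)) = \gamma_{May}(\hat{q}_{May1}) \tag{23} \]
Then: $\tilde{Q}_2 \subseteq \gamma_{May}(\hat{q}_{May1})$.
To complete the proof that $Q_2 \subseteq Q_1$, it remains to prove that:
\[ \forall b' \in M, \exists q \in \tilde{Q}_2 : q(b') \ge \hat{q}_1(b') \]
Let $b' \in M$. We have:
\[ \hat{q}_1(b') = \begin{cases} 0 & \text{if } b = b' \\ \hat{q}_0(b') & \text{if } \hat{q}_{May0}(b) < \hat{q}_0(b') \\ \hat{q}_0(b') + 1 & \text{if } \hat{q}_{May0}(b) \ge \hat{q}_0(b') \land \hat{q}_0(b') < k \\ k & \text{if } \hat{q}_{May0}(b) \ge \hat{q}_0(b') \land \hat{q}_0(b') = k \end{cases} \]
Let $\tilde{q}_0 \in \tilde{Q}_0$ such that $\tilde{q}_0(b') \ge \hat{q}_0(b')$. Let $\tilde{q}_2 = \mathit{update}(\tilde{q}_0, b) \in \tilde{Q}_2$. We show
that $\tilde{q}_2$ is a good candidate, i.e. $\tilde{q}_2(b') \ge \hat{q}_1(b')$.
\[ \tilde{q}_2(b') = \begin{cases} 0 & \text{if } b = b' \\ \tilde{q}_0(b') & \text{if } \tilde{q}_0(b') \ge \tilde{q}_0(b) \\ \tilde{q}_0(b') + 1 & \text{if } \tilde{q}_0(b') < \tilde{q}_0(b) \land \tilde{q}_0(b') < k \\ k & \text{if } \tilde{q}_0(b') < \tilde{q}_0(b) \land \tilde{q}_0(b') = k \end{cases} \]
– If $b = b'$: $\tilde{q}_2(b') = 0 = \hat{q}_1(b')$.
– If $b \neq b' \land \hat{q}_{May0}(b) < \hat{q}_0(b')$, then $\hat{q}_1(b') = \hat{q}_0(b')$.
  Moreover, $b \neq b' \Rightarrow \tilde{q}_2(b') \ge \tilde{q}_0(b') \ge \hat{q}_0(b') = \hat{q}_1(b')$.
– If $b \neq b' \land \hat{q}_{May0}(b) \ge \hat{q}_0(b') \land \hat{q}_0(b') < k$, then $\hat{q}_1(b') = \hat{q}_0(b') + 1$.
  • If $\tilde{q}_0(b') \le \hat{q}_{May0}(b)$, then: $\tilde{q}_0(b') \le \tilde{q}_0(b)$.
    Moreover, $b \neq b' \Rightarrow \tilde{q}_0(b') < \tilde{q}_0(b)$, and thus: $\tilde{q}_2(b') \ge \hat{q}_0(b') + 1 = \hat{q}_1(b')$.
  • Otherwise, $\tilde{q}_0(b') > \hat{q}_{May0}(b)$ and thus: $\tilde{q}_0(b') > \hat{q}_0(b')$.
    ∗ if $\tilde{q}_0(b') \ge \tilde{q}_0(b)$, then: $\tilde{q}_2(b') = \tilde{q}_0(b') \ge \hat{q}_0(b') + 1 = \hat{q}_1(b')$
    ∗ if $\tilde{q}_0(b') < \tilde{q}_0(b) \land \tilde{q}_0(b') < k$, then: $\tilde{q}_2(b') = \tilde{q}_0(b') + 1 > \hat{q}_0(b') + 1 = \hat{q}_1(b')$
    ∗ if $\tilde{q}_0(b') < \tilde{q}_0(b) \land \tilde{q}_0(b') = k$, then: $\tilde{q}_2(b') = k \ge \hat{q}_1(b')$
– If $b \neq b' \land \hat{q}_{May0}(b) \ge \hat{q}_0(b') \land \hat{q}_0(b') = k$, then $\hat{q}_1(b') = k$.
  Thus: $b \neq b' \Rightarrow \tilde{q}_2(b') \ge \tilde{q}_0(b') \ge \hat{q}_0(b') = k \ge \hat{q}_1(b')$
In every case, we have: $\tilde{q}_2(b') \ge \hat{q}_1(b')$.
Thus, $\tilde{Q}_2 \in Q_1$, proving that $Q_2 \subseteq Q_1$. □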
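Lemma 5 (Join Consistency). The join operator $\sqcup_{EM}$ is correct:
\[ \forall ((\hat{q}_1, \hat{q}_{M1}), (\hat{q}_2, \hat{q}_{M2})) \in A_{EM}^2, Q_1 \in \gamma_{EM}(\hat{q}_1, \hat{q}_{M1}), Q_2 \in \gamma_{EM}(\hat{q}_2, \hat{q}_{M2}) : Q_1 \cup Q_2 \in \gamma_{EM}((\hat{q}_1, \hat{q}_{M1}) \sqcup_{EM} (\hat{q}_2, \hat{q}_{M2})). \tag{24} \]
Proof. Let $((\hat{q}_1, \hat{q}_{M1}), (\hat{q}_2, \hat{q}_{M2})) \in A_{EM}^2$, $Q_1 \in \gamma_{EM}(\hat{q}_1, \hat{q}_{M1})$, $Q_2 \in \gamma_{EM}(\hat{q}_2, \hat{q}_{M2})$.
We use the additional notation:
– $Q_3 = Q_1 \cup Q_2$
– $(\hat{q}_3, \hat{q}_{M3}) = (\hat{q}_1, \hat{q}_{M1}) \sqcup_{EM} (\hat{q}_2, \hat{q}_{M2})$
We want to prove that: $Q_3 \in \gamma_{EM}(\hat{q}_3, \hat{q}_{M3})$.
Let $b \in M$. Then $\max_{q \in Q_3} q(b) \ge \max_{q \in Q_1} q(b) \ge \hat{q}_1(b)$. Similarly, $\max_{q \in Q_3} q(b) \ge \hat{q}_2(b)$. Thus,
$\max_{q \in Q_3} q(b) \ge \hat{q}_3(b)$.
Then, using the consistency of the may join, we have: $Q_3 \in \gamma_{EM}(\hat{q}_3, \hat{q}_{M3})$. □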
B Proofs for Reduced Models in Model Checking
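Lemma 3 (Local Soundness and Completeness). The focused cache update
abstracts the concrete cache update exactly:
\[ \forall q \in C, \forall b \in M : \alpha_\odot(\mathit{update}(q, b)) = \mathit{update}_\odot(\alpha_\odot(q), b). \tag{12} \]
Proof. Let $q = [b_1, \dots, b_k] \in C$ be a reachable cache state and $b \in M$ a memory
block.
We prove consistency by inspection of the different possible cases. The first half
of the proof deals with cache states that do not contain the interesting block
$a$ ($\forall i \in \llbracket 1, k \rrbracket, b_i \neq a$). The second half deals with cache states that contain it
($\exists i \in \llbracket 1, k \rrbracket, b_i = a$). In both parts, we treat the cases where the accessed block
$b$ is $a$ or not. Moreover, to treat the eviction of $a$, the second half of the proof adds
sub-cases distinguishing states that contain $a$ “at the end” of the cache (near
eviction).
– if $\forall i \in \llbracket 1, k \rrbracket, b_i \neq a$ (i.e. $a$ is not in $q$), then we can distinguish two cases:
  • if $b = a$ then:
\[ \begin{aligned} \alpha_\odot(\mathit{update}(q, b)) &= \alpha_\odot(\mathit{update}([\overset{\neq a}{b_1}, \overset{\neq a}{b_2}, \dots, \overset{\neq a}{b_k}], a)) \\ &= \alpha_\odot([a, b_1, \dots, b_{k-1}]) \\ &= \{\} \\ \mathit{update}_\odot(\alpha_\odot(q), b) &= \mathit{update}_\odot(\alpha_\odot([\overset{\neq a}{b_1}, \overset{\neq a}{b_2}, \dots, \overset{\neq a}{b_k}]), a) \\ &= \mathit{update}_\odot(\varepsilon, a) \\ &= \{\} \end{aligned} \]
    So consistency holds.
  • if $b \neq a$ then:
\[ \begin{aligned} \alpha_\odot(\mathit{update}(q, b)) &= \alpha_\odot(\mathit{update}([\overset{\neq a}{b_1}, \overset{\neq a}{b_2}, \dots, \overset{\neq a}{b_k}], \overset{\neq a}{b})) \\ &= \alpha_\odot([\overset{\neq a}{b'_1}, \overset{\neq a}{b'_2}, \dots, \overset{\neq a}{b'_k}]) \quad \text{where } \forall i \in \llbracket 1, k \rrbracket, b'_i \in \{b_1, \dots, b_k, b\} \\ &= \varepsilon \\ \mathit{update}_\odot(\alpha_\odot(q), b) &= \mathit{update}_\odot(\alpha_\odot([\overset{\neq a}{b_1}, \overset{\neq a}{b_2}, \dots, \overset{\neq a}{b_k}]), b) \\ &= \mathit{update}_\odot(\varepsilon, \overset{\neq a}{b}) \\ &= \varepsilon \end{aligned} \]
    So consistency holds.
– if $\exists i \in \llbracket 1, k \rrbracket$ such that $b_i = a$ ($a$ is in the cache), we also distinguish between
  the cases $b = a$ and $b \neq a$:
  • if $b = a$ then:
\[ \begin{aligned} \alpha_\odot(\mathit{update}(q, b)) &= \alpha_\odot(\mathit{update}([\overset{\neq a}{b_1}, \dots, \overset{\neq a}{b_{i-1}}, a, b_{i+1}, \dots, b_k], a)) \\ &= \alpha_\odot([a, b_1, \dots, b_{i-1}, b_{i+1}, \dots, b_k]) \\ &= \{\} \\ \mathit{update}_\odot(\alpha_\odot(q), b) &= \mathit{update}_\odot(\alpha_\odot([\overset{\neq a}{b_1}, \dots, \overset{\neq a}{b_{i-1}}, a, b_{i+1}, \dots, b_k]), a) \\ &= \mathit{update}_\odot(\{b_1, \dots, b_{i-1}\}, a) \\ &= \{\} \end{aligned} \]
    So consistency holds.
  • if $b \neq a$, there are different cases depending on whether $b$ is in the cache before
    or after $a$ and on whether $a$ is the least recently used block:
    ∗ if there exists $j < i$ such that $b_j = b$ ($b$ is in the cache and is younger
      than $a$):
\[ \begin{aligned} \alpha_\odot(\mathit{update}(q, b)) &= \alpha_\odot(\mathit{update}([b_1, \dots, b_{j-1}, b, b_{j+1}, \dots, b_{i-1}, a, b_{i+1}, \dots, b_k], b)) \\ &= \alpha_\odot([b, b_1, \dots, b_{j-1}, b_{j+1}, \dots, b_{i-1}, a, b_{i+1}, \dots, b_k]) \\ &= \{b, b_1, \dots, b_{j-1}, b_{j+1}, \dots, b_{i-1}\} \\ &= \{b_1, \dots, b_{i-1}\} \\ \mathit{update}_\odot(\alpha_\odot(q), b) &= \mathit{update}_\odot(\alpha_\odot([b_1, \dots, b_{j-1}, b, b_{j+1}, \dots, b_{i-1}, a, b_{i+1}, \dots, b_k]), b) \\ &= \mathit{update}_\odot(\{b_1, \dots, b_{j-1}, b, b_{j+1}, \dots, b_{i-1}\}, b) \\ &= \mathit{update}_\odot(\{b_1, \dots, b_{i-1}\}, b) \\ &= \{b_1, \dots, b_{i-1}\} \end{aligned} \]
      So consistency holds.
    ∗ if there exists $j > i$ such that $b_j = b$ ($b$ is in the cache and is older
      than $a$):
\[ \begin{aligned} \alpha_\odot(\mathit{update}(q, b)) &= \alpha_\odot(\mathit{update}([b_1, \dots, b_{i-1}, a, b_{i+1}, \dots, b_{j-1}, b, b_{j+1}, \dots, b_k], b)) \\ &= \alpha_\odot([b, b_1, \dots, b_{i-1}, a, b_{i+1}, \dots, b_{j-1}, b_{j+1}, \dots, b_k]) \\ &= \{b, b_1, \dots, b_{i-1}\} \\ \mathit{update}_\odot(\alpha_\odot(q), b) &= \mathit{update}_\odot(\alpha_\odot([b_1, \dots, b_{i-1}, a, b_{i+1}, \dots, b_{j-1}, b, b_{j+1}, \dots, b_k]), b) \\ &= \mathit{update}_\odot(\{b_1, \dots, b_{i-1}\}, b) \\ &= \{b, b_1, \dots, b_{i-1}\} \end{aligned} \]
      So consistency holds.
    ∗ if $\forall j, b_j \neq b$ and $i \neq k$ (i.e. $b$ is not in the cache and $a$ is not the least
      recently used block):
\[ \begin{aligned} \alpha_\odot(\mathit{update}(q, b)) &= \alpha_\odot(\mathit{update}([b_1, \dots, b_{i-1}, a, b_{i+1}, \dots, b_k], b)) \\ &= \alpha_\odot([b, b_1, \dots, b_{i-1}, a, b_{i+1}, \dots, b_{k-1}]) \\ &= \{b, b_1, \dots, b_{i-1}\} \\ \mathit{update}_\odot(\alpha_\odot(q), b) &= \mathit{update}_\odot(\alpha_\odot([b_1, \dots, b_{i-1}, a, b_{i+1}, \dots, b_k]), b) \\ &= \mathit{update}_\odot(\{b_1, \dots, b_{i-1}\}, b) \\ &= \{b, b_1, \dots, b_{i-1}\} \end{aligned} \]
      So consistency holds.
    ∗ if $\forall j, b_j \neq b$ and $i = k$ (i.e. $b$ is not in the cache and $a$ is the least
      recently used block):
\[ \begin{aligned} \alpha_\odot(\mathit{update}(q, b)) &= \alpha_\odot(\mathit{update}([b_1, \dots, b_{k-1}, a], b)) \\ &= \alpha_\odot([b, b_1, \dots, b_{k-1}]) \\ &= \varepsilon \\ \mathit{update}_\odot(\alpha_\odot(q), b) &= \mathit{update}_\odot(\alpha_\odot([b_1, \dots, b_{k-1}, a]), b) \\ &= \mathit{update}_\odot(\{b_1, \dots, b_{k-1}\}, b) \\ &= \varepsilon \end{aligned} \]
      So consistency holds.
In all cases, consistency holds. □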
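To make the case analysis concrete, the following worked instance evaluates both sides of equation (12) for the last two sub-cases. It is an illustration of ours, assuming associativity $k = 3$, focused block $a$, and hypothetical blocks $x$, $y$, $z$; the subscript $\odot$ marks the focused abstraction and update.

```latex
% b = z is not cached and a is not the least recently used block (i = 2, k = 3)
\begin{align*}
\alpha_\odot(\mathit{update}([x, a, y], z)) &= \alpha_\odot([z, x, a]) = \{z, x\} \\
\mathit{update}_\odot(\alpha_\odot([x, a, y]), z) &= \mathit{update}_\odot(\{x\}, z) = \{z, x\}
\end{align*}
% b = z is not cached and a is the least recently used block (i = k = 3)
\begin{align*}
\alpha_\odot(\mathit{update}([x, y, a], z)) &= \alpha_\odot([z, x, y]) = \varepsilon \\
\mathit{update}_\odot(\alpha_\odot([x, y, a]), z) &= \mathit{update}_\odot(\{x, y\}, z) = \varepsilon
\end{align*}
```

In the second instance the focused update yields $\varepsilon$ because adding $z$ to $\{x, y\}$ would make $k = 3$ blocks younger than $a$, which means $a$ has been evicted.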
C Additional Results from Experimental Evaluation
In this section, we give further experimental results and more details on the
results found in the main part of the paper. Please note that for all experiments
any missing plot means that a timeout of one hour has been reached.
Figures 10, 11, and 12 show the number of accesses whose classification was
refined to hit or miss by model checking under different cache configurations:
Figure 10 varies the number of ways, Figure 11 the number of cache sets, and
Figure 12 the memory block size. Figures 13, 14, and 15 show, for each
benchmark, the share of the total number of accesses whose classification was
refined by model checking, when varying the number of sets, the number of ways,
and the block size, respectively.
Figures 16 and 17 show the usefulness of the definitely-unknown analysis in
terms of the number of MC calls and the cumulative MC execution time.
Figure 18 shows the effect of the AI phase on the number of MC calls.
Figure 19 shows the size of the MC model with and without the AI phase. The
maximum size is nearly the same in both cases. The average, however, may be
larger with the AI phase, because it is taken over fewer MC calls.
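The effect on the average can be illustrated with a small arithmetic sketch. The sizes below are made-up numbers chosen for illustration, not measurements from our experiments, and the assumption that the AI phase removes exactly the calls with small models is hypothetical.

```python
# Hypothetical model sizes (in BDD nodes), one entry per MC call.
# These are NOT measured values; they only illustrate the arithmetic.
sizes_without_ai = [10, 20, 30, 900, 1000]

# Assumption: the AI phase already classifies the three accesses whose
# models are small, so only two MC calls remain.
sizes_with_ai = [900, 1000]


def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def summarize(xs):
    """Number of MC calls, largest model, and mean model size."""
    return {"calls": len(xs), "max": max(xs), "mean": mean(xs)}


if __name__ == "__main__":
    print("without AI phase:", summarize(sizes_without_ai))
    print("with AI phase:   ", summarize(sizes_with_ai))
```

In this toy setting the maximum is identical in both runs, while the mean rises once the small models no longer contribute to it.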
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: Number of accesses refined by MC (ticks at 1, 10, 100); legend (Ways): 4 ways, 8 ways, 16 ways.]
Fig. 10. Number of accesses whose classification was refined to hit or miss by model
checking, depending on the number of ways, with 16 sets and 4 instructions per block.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: Number of accesses refined by MC (ticks at 1, 10, 100); legend (Sets): 1 set, 8 sets, 16 sets, 32 sets, 64 sets.]
Fig. 11. Number of accesses whose classification was refined to hit or miss by model
checking, depending on the number of sets, with 8 ways and 4 instructions per block.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: Number of accesses refined by MC (ticks at 1, 10, 100); legend (Block size): 4 instructions, 8 instructions.]
Fig. 12. Number of accesses whose classification was refined to hit or miss by model
checking, depending on the number of instructions per block, with 8 ways and 16 sets.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: Share of accesses refined by MC (% of total accesses; ticks at 0%, 5%, 10%, 15%); legend (Sets): 1 set, 8 sets, 16 sets, 32 sets, 64 sets.]
Fig. 13. Percentage of accesses whose classification was refined to hit or miss by model
checking, depending on the number of sets, with 8 ways and 4 instructions per block.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: Share of accesses refined by MC (% of total accesses; ticks at 0%, 5%, 10%, 15%); legend (Ways): 4 ways, 8 ways, 16 ways.]
Fig. 14. Percentage of accesses whose classification was refined to hit or miss by model
checking, depending on the number of ways, with 16 sets and 4 instructions per block.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: Share of accesses refined by MC (% of total accesses; ticks at 0%, 2%, 4%, 6%); legend (Block size): 4 instructions, 8 instructions.]
Fig. 15. Percentage of accesses whose classification was refined to hit or miss by model
checking, depending on the number of instructions per block, with 8 ways and 16 sets.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: #MC calls (ticks at 10, 1000); legend (Definitely Unknown Analysis): with, without.]
Fig. 16. Number of calls to the MC with and without the definitely-unknown analysis.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: Execution time of MC (s) (ticks at 1, 100); legend (Definitely Unknown Analysis): with, without.]
Fig. 17. Total MC time with and without the definitely-unknown analysis.
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: #MC calls (ticks at 0, 250, 500, 750, 1000); legend (AI phase): with, without.]
Fig. 18. Calls to the MC with and without AI phase (must/may + definitely unknown).
[Plot omitted. x-axis: Program (one entry per benchmark); y-axis: MC model size (in BDD nodes; ticks at 1e+01, 1e+03, 1e+05); legend: Mean (with AI phase, without AI phase), Min−Max (with AI phase, without AI phase).]
Fig. 19. Size of the MC model with and without the AI phase (must/may + definitely
unknown).
