Impact of DM-LRU on WCET: a static analysis approach by Mancuso, Renato et al.
Boston University
OpenBU http://open.bu.edu
Computer Science BU Open Access Articles
2019-07-09
Version Published version
Citation (published version): Renato Mancuso, Heechul Yun, Isabelle Puaut. 2019. "Impact of
DM-LRU on WCET: a Static Analysis Approach." Euromicro Conference
on Real-Time Systems (ECRTS). Stuttgart, Germany, 2019-07-09 -
2019-07-12. https://doi.org/10.4230/LIPIcs.ECRTS.2019.17
https://hdl.handle.net/2144/40665
Impact of DM-LRU on WCET: a Static Analysis Approach
Renato Mancuso
Boston University, USA
rmancuso@bu.edu
Heechul Yun
University of Kansas, USA
heechul.yun@ku.edu
Isabelle Puaut
University of Rennes 1/IRISA, France
isabelle.puaut@irisa.fr
Abstract
Cache memories in modern embedded processors are known to improve average memory access performance. Unfortunately, they are also known to represent a major source of unpredictability for hard real-time workloads. One of the main limitations of typical caches is that content selection and replacement are performed entirely in hardware. As such, it is hard to control the cache behavior in software to favor caching of blocks that are known to have an impact on an application’s worst-case execution time (WCET).
In this paper, we consider a cache replacement policy, namely DM-LRU, that allows system designers to prioritize caching of memory blocks that are known to have an important impact on an application’s WCET. Considering a single-core, single-level cache hierarchy, we describe an abstract interpretation-based timing analysis for DM-LRU. We implement the proposed analysis in a self-contained toolkit and study its qualitative properties on a set of representative benchmarks. Apart from being useful to compute the WCET when DM-LRU or similar policies are used, the proposed analysis can allow designers to perform WCET impact-aware selection of content to be retained in cache.
2012 ACM Subject Classification Computer systems organization → Real-time systems; Theory of computation → Caching and paging algorithms
Keywords and phrases real-time, static cache analysis, abstract interpretation, LRU, deterministic memory, static cache locking, dynamic cache locking, cache profiling, WCET analysis
Digital Object Identifier 10.4230/LIPIcs.ECRTS.2019.20
1 Introduction
Most modern embedded processors include one or more caches to improve average performance by reducing average memory access cost. However, a well-known downside of caches is that they make timing analysis difficult: software has little, if any, control over whether a certain memory block is in the cache, because this is determined by the hardware, namely by the cache replacement policy and the state of the cache. This is problematic because deriving precise and tight worst-case timing bounds is necessary for real-time systems. While there are timing analysis techniques for well-known cache replacement policies [42], they cannot take advantage of the programmer’s insights (e.g., important data used in time-critical loops), potentially resulting in pessimistic timing.
On the other hand, a scratchpad memory is similar to a cache in that it offers high-speed temporary storage for a processor, but the key difference is that it is entirely managed by software. For real-time systems, the fact that software, not hardware, has full control over its management is highly beneficial because accurate timing analysis is possible. However, the downside of a scratchpad is that it is generally more difficult to use than a cache due to its high programming complexity [3]. Alternatively, some cache designs support selective cache locking, which enables programmers to lock content in the cache at a fine granularity (typically a cache line) [2, 7, 13]. A locked cache line stays in the cache until it is explicitly unlocked by the programmer, which guarantees predictable timing. However, because the cache size is limited, the programmer must carefully select which cache lines to lock [5, 40]. Dynamic cache-locking techniques [39, 48] can help alleviate the size limitation of static cache locking, but at the cost of increased complexity (for selecting locked cache lines) and overhead (to change cache contents dynamically).
© Renato Mancuso, Heechul Yun and Isabelle Puaut;
licensed under Creative Commons License CC-BY
31st Euromicro Conference on Real-Time Systems (ECRTS 2019).
Editor: Sophie Quinton; Article No. 20; pp. 20:1–20:23
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
In this paper, we consider a new cache architecture that can leverage programmers’ high-level insights on the access frequency of memory blocks, and propose an abstract interpretation-based static analysis method to reason about the worst-case execution time (WCET) of applications. Our approach is based on a new memory abstraction, called Deterministic Memory (DM). Deterministic Memory enables classification of a program’s address space into two distinct memory types, DM and non-DM [10], where the DM type indicates that predictability is more important, while the non-DM type indicates that average performance is more important. The DM abstraction allows effective and extensible software/hardware co-designs, some of which are demonstrated in the context of providing efficient hardware isolation in multicore systems [10]. In this work, we instead focus on a single core with a private cache, and study how static guarantees on cache hits/misses can be derived for a DM-aware LRU cache replacement policy, which we call DM-LRU.
We first describe the DM-LRU cache replacement algorithm, which is a single-core adaptation of the DM-aware cache initially proposed in [10]. Next, we generalize an abstract interpretation-based analysis for LRU caches to reason about the worst-case behavior of DM-LRU. We integrated DM-LRU support in Heptane [23], an academic static WCET analysis tool, in order to evaluate the effectiveness of DM-LRU in lowering tasks’ WCET. Our results show that DM-LRU can achieve WCET improvements of up to 23.7% compared to vanilla LRU. The WCET improvements are comparable to those of static and dynamic cache locking techniques, while significantly lowering programming complexity. Our contributions are as follows:
- We extend LRU abstract interpretation-based analysis to perform static WCET timing analysis for DM-LRU.
- We implement DM-LRU support in the Heptane static WCET analysis tool.
- We provide experimental evaluation results showing the WCET benefits and complexity reduction of the DM-LRU based approach.
- We propose a WCET-driven heuristic approach to select content to be preferentially cached using DM-LRU.
The remainder of the paper is organized as follows. Section 2 introduces necessary background on caches and the deterministic memory abstraction. Next, the DM-LRU policy is described in Section 3 and the proposed static timing analysis is described in Section 4. A comprehensive example of how to apply the proposed analysis is presented in Section 5. Comparison and differences with cache locking techniques are briefly highlighted in Section 6, while the WCET of a set of representative benchmarks is evaluated in Section 7. Section 8 discusses related work and we conclude in Section 9.
2 Background
In this section, we provide necessary background on memory abstractions, cache replacement algorithms, and cache timing analysis.
Figure 1 High-level application’s memory view, where deterministic memory (DM) and best-effort memory (BE) coexist.
2.1 Deterministic Memory Abstraction
Traditionally, operating systems and hardware have provided a simple uniform memory abstraction to applications. While this simple abstraction is convenient for programmability, its downside is that the programmer’s insights on memory characteristics (e.g., time-criticality of certain data structures) cannot be explicitly expressed to enable better resource management.
Recently, a new memory abstraction, called the Deterministic Memory abstraction, was proposed to explore the possibilities of more expressive memory abstractions [10]. In essence, the abstraction allows a programmer to associate (tag) a single bit of information to each memory block in the system, which classifies the memory block as either “deterministic memory” (DM) or “best-effort memory” (BE). Figure 1 shows an example address space of a task using both deterministic and best-effort memory. In [10], the memory tagging is implemented at page granularity, although finer-grained tagging is also possible (e.g., [45]).
Once a task’s memory blocks are tagged, the information can then be used by the operating system and the hardware to apply different resource management policies depending on the memory tag. In [10], the DM abstraction is used to achieve hardware isolation among the cores of a multicore system, focusing on effective isolation of the shared cache and DRAM.
2.2 DM-LRU Cache Replacement Policy
In this paper, we consider a deterministic memory-aware private cache design and show how such a design enables tighter static WCET cache timing analysis. We assume the cache controller has a means to distinguish whether a certain cache line corresponds to deterministic memory or to best-effort memory. This can be implemented as an additional bit in the auxiliary tag store of each cache line, as in [10], or as a set of separately located architectural hardware range registers, as in [27]. The cache implements an extended least recently used (LRU) cache replacement algorithm, which defines two eviction classes using the DM/BE abstractions and applies LRU-based replacement to DM lines and to BE lines separately. Allocation of a DM line can cause eviction of a BE line, but the opposite is not allowed. Note that prior work that implements a similar cache replacement policy exists [27]. In this paper, we call the extended LRU policy Deterministic Memory-aware Least Recently Used, or DM-LRU for short. A more formal definition of DM-LRU is given in Section 3.
Figure 2 illustrates the difference between traditional LRU, DM-LRU, and static locking. For simplicity, the example considers a single set of a 4-way set-associative cache. In the first step, only a and d are cached, and 0 lines are allocated for DM blocks under DM-LRU. Moreover, blocks b and e are set as DM blocks under DM-LRU, and pre-allocated in cache in the case of static locking. The figure tracks the evolution of the cache state for the same access sequence b, c, d, a, e, b. A miss for a DM block triggers an increase of the number of ways allocated for the DM class. This is depicted in steps 2 (miss on b) and 6 (miss on e). Traditional LRU simply ignores the DM/BE tag of the considered memory blocks. Note first that DM-LRU results in fewer misses compared to LRU, as the DM-marked memory block b is not evicted by the best-effort memory accesses. Also note that, while DM-LRU realizes the same number of hits as static locking in the example, two important remarks are in order. First, the figure does not include the time spent to prefetch and lock the b and e blocks. Second, static locking causes additional misses for non-locked blocks compared to DM-LRU. This exemplifies the on-demand nature of DM-LRU, which is able to retain blocks in cache as they become needed during a task’s execution. We discuss the analogies and differences between DM-LRU and static/dynamic locking more extensively in Section 6.
Figure 2 Comparison between traditional LRU, DM-LRU, and a statically locked LRU cache over the same access pattern b, c, d, a, e, b, where b and e are DM memory blocks, or statically locked.
Intuitively, it is thanks to the on-demand allocation and differential treatment of DM memory blocks that DM-LRU enables tighter worst-case cache timing analysis, as we show in the rest of the paper.
2.3 Cache Analysis via Abstract Interpretation
In this work, we extend abstract interpretation-based analysis to reason about the hit/miss classification of memory accesses when a DM-LRU cache controller is implemented in hardware. Analysis via abstract interpretation was originally proposed for LRU caches [11] and later better formalized and extended to FIFO and Pseudo-LRU in [41, 14]. An excellent survey on the topic is provided in [32]. We reuse the notation in [14, 32], while some details are omitted due to space constraints. Since this work focuses on a DM-aware extension of LRU, we introduce some of the background related to abstract interpretation-based LRU analysis.
Imagine taking a snapshot of the cache state at a given point in time. In this case, one could describe the state of the cache in terms of: (i) which blocks are currently in cache, and (ii) what is the age of each block. In LRU, the age of a block, say block a, captures the number of accesses to distinct blocks other than a that were performed since the last access to a. For instance, over the initial state and the six accesses in Figure 2, the LRU age of a follows the sequence 0, 1, 2, 3, 0, 1, and 2. If a has an LRU age greater than or equal to the number of ways (4 in our example), then a is not cached.
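The age sequence quoted above can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' tooling: it models one LRU set as a list ordered from most to least recently used, so that the index of a block is its age. The helper names `lru_access` and `age_trace` are hypothetical, and the initial ordering (a more recent than d) is an assumption chosen to match the quoted sequence.

```python
def lru_access(stack, block, ways):
    """Return the LRU stack (most recently used first) after accessing `block`."""
    rest = [b for b in stack if b != block]
    return ([block] + rest)[:ways]  # the block falling off the end is evicted


def age_trace(initial, accesses, tracked, ways):
    """LRU age of `tracked` in the initial state and after each access.

    An uncached block is reported with age `ways`.
    """
    stack = list(initial)
    ages = [stack.index(tracked) if tracked in stack else ways]
    for block in accesses:
        stack = lru_access(stack, block, ways)
        ages.append(stack.index(tracked) if tracked in stack else ways)
    return ages


# Assumed initial state: a (age 0) and d (age 1) cached in a 4-way set.
trace = age_trace(["a", "d"], ["b", "c", "d", "a", "e", "b"], "a", ways=4)
print(trace)
```

Under these assumptions the printed list has seven entries: one for the initial state and one after each of the six accesses.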
If the ages of all the cached blocks are known, the cache is in a concrete state. From a concrete state, it is possible to produce a new concrete state that follows each new memory access (state update), as shown in Figure 2. In a typical program, however, execution may follow different paths. This means that at a given point in time, multiple concrete states are possible, depending on the execution path taken by the program in its control-flow graph (CFG).
Instead of keeping track of all the possible concrete states at any point of the CFG, abstract interpretation keeps track of two main pieces of information: (i) the upper-bound and (ii) the lower-bound on the age of any memory block among all the possible concrete states. Analysis of the age upper-bound and lower-bound is carried out separately. The former is referred to as must-analysis, while the latter goes under the name of may-analysis. A state that summarizes the upper-bound (resp., lower-bound) of each block in a set of possible concrete states is called an abstract state. For instance, consider a must-analysis abstract state of the form $\bar{q} = [\{\}, \{a, b\}, \{\}, \{d, e\}]$. This corresponds to all the concrete states where blocks a, b have age at most 1, and d, e at most 3. The full concretization of $\bar{q}$ is the set $\{[a, b, d, e], [b, a, d, e], [a, b, e, d], [b, a, e, d]\}$. Similarly, consider the may-analysis abstract state $q = [\{\}, \{\}, \{a, b\}, \{\}]$. A concretization of $q$ is the set $\{[\bot, \bot, \bot, \bot], [\bot, \bot, a, \bot], [\bot, \bot, \bot, a], [\bot, \bot, b, \bot], [\bot, \bot, \bot, b], [\bot, \bot, a, b], [\bot, \bot, b, a]\}$, where $\bot$ is a generic unknown block.
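The concretization of the must-analysis state above can be enumerated mechanically. The sketch below is illustrative only and makes a simplifying assumption that is not part of the formal definition: it enumerates only full concrete states whose lines are exactly the blocks tracked by the abstract state (so the number of tracked blocks must equal the associativity). The function name `concretize_full` is hypothetical.

```python
from itertools import permutations


def concretize_full(abstract):
    """Enumerate concrete LRU states compatible with a must-analysis state.

    `abstract` is a list of sets, where index i holds the blocks whose age
    upper-bound is i. A concrete state is a list ordered by age; it is
    compatible if every block sits at a position no larger than its bound.
    """
    bound = {blk: age for age, blks in enumerate(abstract) for blk in blks}
    return [
        list(order)
        for order in permutations(sorted(bound))
        if all(pos <= bound[blk] for pos, blk in enumerate(order))
    ]


# Must-analysis state [{}, {a, b}, {}, {d, e}] from the text.
states = concretize_full([set(), {"a", "b"}, set(), {"d", "e"}])
for state in states:
    print(state)
```

Each printed list is one concrete state, with position in the list equal to the LRU age of the block.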
Given a must-analysis abstract state, it is possible to classify a memory access as always-hit (H). These are accesses that result in hits regardless of the path taken in the CFG. Similarly, given a may-analysis abstract state, it is possible to classify a memory access as always-miss (M). If neither classification applies, the access is simply non-classified (NC). NC, often indicated as $\top$, represents the case in which some execution paths lead to a miss while others lead to a hit for the same memory access.
Note that for architectures without timing anomalies [31, 20], must-analysis is sufficient to safely compute the WCET of an application. In fact, in this case NC accesses can simply be treated as misses. We developed and implemented both must- and may-analysis for DM-LRU, but we focus here in greater detail on must-analysis. Additional details about may-analysis are provided in the appendix.
3 Cache Model and Terminology
In this section we discuss the cache model adopted to represent the behavior of DM-LRU, and we introduce key concepts required to follow the proposed abstract interpretation analysis.
3.1 DM-LRU Model
Algorithm 1 shows the full pseudo-code of the DM-LRU cache replacement algorithm. The algorithm is defined for a generic A-way set-associative cache with S sets. The index of a set is indicated with $s \in \{0, \ldots, S - 1\}$. In the algorithm, DetMask_s denotes the bitmask of the cache lines of set s that contain deterministic memory. Consider a DM request (DM = 1) that resulted in a cache miss (see step 1 or 6 in Figure 2). The algorithm first tries to evict a BE cache line, if such a line exists (Lines 3-4). This also causes an additional bit to be asserted in the DetMask_s bitmap. If no BE line can be evicted (i.e., all lines are deterministic ones), it chooses one of the deterministic lines (the oldest one in the LRU stack) as the victim (Line 6). On the other hand, consider the case where a BE memory block is requested (DM ≠ 1), resulting in a miss (steps 1 and 2 in Figure 2). DM-LRU evicts one of the best-effort cache lines, but not any of the deterministic cache lines (Lines 9-10).
Algorithm 1: Deterministic memory-aware cache line replacement algorithm.
Input: DetMask_s - deterministic ways of set s
Input: A - cache associativity
Output: victim - the victim way to be replaced, or NULL if no replacement is possible
 1  if DM == 1 then
 2      if (¬DetMask_s) ≠ NULL then
            // evict a best-effort line first
 3          victim = LRU(¬DetMask_s)
 4          DetMask_s |= 1 << victim
 5      else
            // evict a deterministic line
 6          victim = LRU(DetMask_s)
 7      end
 8  else
 9      if (¬DetMask_s) ≠ NULL then
            // evict a best-effort line
10          victim = LRU(¬DetMask_s)
11      else
            // no BE line can be allocated
12          victim = NULL
13      end
14  end
15  return victim
We assume a single-core, single-level set-associative cache. We indicate with A the associativity of the cache. Since DM-LRU operates independently on each set, it is possible to describe our analysis on a single set without loss of generality. Hereafter, we consider a single cache set. At any point in time, D is the number of cache lines allocated to DM memory blocks for the considered cache set, i.e., the number of bits set to “1” in DetMask_s for the set under analysis. We indicate with B the number of lines that have not been allocated for DM memory. It holds that D + B = A. Note that if D < A and a DM line that is currently not cached as a DM line is accessed, then the new DM line is allocated and D is increased by one. This may trigger the eviction of the least recently used BE block, as per Algorithm 1.
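To make the replacement rules concrete, the following is a minimal sketch of a single DM-LRU cache set, written under stated assumptions rather than as the authors' implementation. The class name `DMLRUSet` and its fields are hypothetical. Instead of a bitmask, the sketch keeps two LRU stacks (one per eviction class), so that D is simply the length of the DM stack. Unlike Algorithm 1, which only selects a victim, the sketch also fills empty ways before evicting, and it assumes that each block keeps a fixed DM/BE tag. The replayed sequence is the one of Figure 2, assuming that a is more recent than d in the initial state.

```python
class DMLRUSet:
    """One cache set under DM-LRU; stacks are ordered most recently used first."""

    def __init__(self, ways, be_blocks=()):
        self.ways = ways
        self.dm = []                 # lines allocated to the DM class
        self.be = list(be_blocks)    # lines allocated to the BE class

    @property
    def D(self):
        """Number of ways currently allocated to DM blocks."""
        return len(self.dm)

    def access(self, block, is_dm):
        """Access `block`; return True on a hit and False on a miss."""
        stack = self.dm if is_dm else self.be
        if block in stack:
            stack.remove(block)
            stack.insert(0, block)
            return True
        full = self.D + len(self.be) == self.ways
        if is_dm:
            if self.D == self.ways:
                self.dm.pop()        # all ways are DM: evict the LRU DM line
            elif full:
                self.be.pop()        # evict the LRU BE line, D grows by one
        else:
            if self.D == self.ways:
                return False         # no BE line can be allocated
            if full:
                self.be.pop()        # evict the LRU BE line
        stack.insert(0, block)
        return False


cache = DMLRUSet(ways=4, be_blocks=["a", "d"])
dm_blocks = {"b", "e"}
hits = [cache.access(blk, blk in dm_blocks) for blk in "bcdaeb"]
print(hits, cache.D, cache.dm, cache.be)
```

In this replay the second access to b hits because the intervening BE accesses can only evict BE lines, which is the behavior described for DM-LRU in Section 2.2.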
3.2 Terminology and notations
We indicate with $\mathcal{B}$ the set of memory blocks that map to the cache set under analysis. A generic memory block $b^{CL} \in \mathcal{B}$ is comprised of an address $b$ and an eviction class $CL \in \{DM, BE\}$. The set of all the possible concrete states of a DM-LRU cache is denoted as $Q^{DM\text{-}LRU}_A$, where each state $q \in Q^{DM\text{-}LRU}_A$ is defined as follows:
$$q := \{D, [b^{DM}_0, \ldots, b^{DM}_{D-1}], [b^{CL}_D, \ldots, b^{CL}_{A-1}]\}, \tag{1}$$
where $D \in [0, A]$ and $b^{CL}_i \in \mathcal{B}$. Note that the first $D$ cache lines are allocated as DM cache lines, hence these are necessarily DM memory blocks. The remaining $A - D$ blocks are currently allocated as BE memory blocks. Throughout this paper we will use the shorthand notation $b_i \in \mathcal{B}$ for blocks whose eviction class is obvious from context or unimportant. For blocks allocated as BE, we assume BE class unless specified otherwise.
An important concept is the age of a memory block under DM-LRU, defined as follows.
▶ Definition 1 (DM-LRU Age). The age of a DM memory block $a^{DM}$ is defined as the number of distinct DM blocks accessed since the last access to $a^{DM}$. The age of a BE memory block $b^{BE}$ is set to the current value of $D$ whenever $b^{BE}$ is accessed; it is then defined as $D + K$, where $K$ is the number of misses to DM blocks or accesses to distinct BE blocks since the last access to $b^{BE}$.
Following Definition 1, the index of a given block $b^{CL}_i \in q$ is also the age of the block in DM-LRU. The age of a block $b^{DM}_i$ allocated as DM can increase if: (1) a new DM line is allocated (with age 0); or (2) a line $b^{DM}_j$ already allocated as DM with age greater than that of $b_i$ is accessed. Conversely, the age of a BE block $b^{BE}_i$ can increase if: (1) a new DM line is allocated (with age 0); (2) a new BE line is allocated (with age $D$); or (3) a line $b^{BE}_j$ already in cache with age greater than that of $b^{BE}_i$ is accessed.
Also note that Definition 1 remains consistent for the case in which a block $b^{BE}$ is accessed but cannot be allocated because all the ways of the set have been reserved for DM lines. This phenomenon goes under the name of DM takeover, and can be resolved by imposing a hard cap on the maximum number of DM lines that can be allocated. The analysis for a DM-LRU with an allocation cap is almost identical to that for an unrestricted DM-LRU, and only introduces uninteresting subcases. For simplicity, we focus here on the analysis for unrestricted DM-LRU. We demonstrate in Section 7 that preventing DM takeover is indeed necessary and beneficial.
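DM takeover can be illustrated on a toy access sequence. The sketch below is an assumption-laden illustration, not the policy evaluated in the paper: the function `replay`, its `dm_cap` parameter, and in particular the behavior at the cap (a DM miss recycles the least recently used DM way once the cap is reached) are hypothetical choices made for the example.

```python
def replay(accesses, ways, dm_blocks, dm_cap=None):
    """Replay `accesses` on one DM-LRU set; return a list of hit flags.

    `dm_cap` is a hypothetical hard cap on the number of DM ways;
    by default there is no cap (DM lines may take over the whole set).
    """
    dm_cap = ways if dm_cap is None else dm_cap
    dm, be = [], []                      # LRU stacks, most recently used first
    outcome = []
    for blk in accesses:
        stack = dm if blk in dm_blocks else be
        hit = blk in stack
        outcome.append(hit)
        if hit:
            stack.remove(blk)
        elif blk in dm_blocks:
            if len(dm) >= dm_cap:
                dm.pop()                 # cap reached: recycle the LRU DM way
            elif len(dm) + len(be) >= ways:
                be.pop()                 # take a way away from the BE class
        else:
            if len(dm) >= ways:
                continue                 # DM takeover: BE block cannot be cached
            if len(dm) + len(be) >= ways:
                be.pop()
        stack.insert(0, blk)
    return outcome


accesses = ["x", "y", "z", "z"]          # x and y are DM blocks, z is BE
uncapped = replay(accesses, ways=2, dm_blocks={"x", "y"})
capped = replay(accesses, ways=2, dm_blocks={"x", "y"}, dm_cap=1)
print(uncapped, capped)
```

Without a cap, the two DM blocks occupy both ways and the repeated BE access to z never hits; with a cap of one DM way, z can be retained between its two accesses.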
4 DM-LRU Analysis
In this section we detail our abstract interpretation-based analysis [14, 32] for DM-LRU, i.e., when the cache controller implements the policy defined in Algorithm 1. We discuss must-analysis in detail. As previously mentioned, may-analysis is not strictly required for architectures without timing anomalies. As such, we only provide the intuition behind it and defer the details to the appendix. We do not provide a persistence analysis for DM-LRU. Persistence analysis is useful to determine if memory accesses inside loops can result in hits after the first iteration. Instead, for our evaluations, we unroll the first iteration of each loop, i.e., we perform virtual inlining, virtual unrolling (VIVU) [34, 32].
4.1 Must-analysis
Must-analysis is performed considering abstract cache states. In this case, must-analysis keeps track of the upper bound on the number of allocated DM blocks, indicated with $D \in \{0, \ldots, A\}$, and the upper-bound on the DM-LRU age of each addressable memory block $b \in \mathcal{B}$. The abstract domain $DMLru^{\sqsubseteq}_A$ is defined as:
$$DMLru^{\sqsubseteq}_A := \{0, \ldots, A\} \times \mathcal{B} \to \{0, \ldots, A - 1, \infty\}. \tag{2}$$
Intuitively, the domain associates a current eviction class (DM or BE) and an age upper bound ($0, \ldots, A - 1$ or $\infty$) to a memory block $b \in \mathcal{B}$ mapping to the set under analysis. We use the notation $\bar{q}(b)$ to indicate the upper-bound on the age of $b$ in $\bar{q}$. To represent a generic abstract state $\bar{q} \in DMLru^{\sqsubseteq}_A$ we use a compact notation that highlights the distinction between DM and BE allocations. For instance, the notation
$$\bar{q} = [\{\}, \{a, b\}], [\{c\}, \{d\}] \in DMLru^{\sqsubseteq}_A \tag{3}$$
denotes an abstract state $\bar{q}$ where $D \le 2$, $B \ge 2$, $A = 4$. Hence, blocks a and b have upper-bound $\bar{q}(a) = \bar{q}(b) = 1$ on their DM-LRU age. Similarly, c, d are BE blocks with $\bar{q}(c) = 2$ and $\bar{q}(d) = 3$, respectively.
Given an abstract state $\bar{q} \in DMLru^{\sqsubseteq}_A$, the Boolean operator $DM^{\sqsubseteq}(\bar{q}, b)$ returns true only if the block $b \in \mathcal{B}$ must exist as a DM-allocated block in $\bar{q}$. Formally,
$$DM^{\sqsubseteq}(\bar{q}, b^{CL}) := \begin{cases} \text{true} & \text{if } CL = DM \wedge \bar{q}(b) < \infty \\ \text{false} & \text{otherwise.} \end{cases} \tag{4}$$
For instance, considering $\bar{q}$ defined as in Equation 3, we obtain $DM^{\sqsubseteq}(\bar{q}, a) = \text{true}$, $DM^{\sqsubseteq}(\bar{q}, d) = \text{false}$, and so on. We use the simpler notation $DM^{\sqsubseteq}(b)$ when the state is implicit. The operator $BE^{\sqsubseteq}(\bar{q}, b)$ is simply defined as $BE^{\sqsubseteq}(\bar{q}, b) := \neg DM^{\sqsubseteq}(\bar{q}, b)$. To prevent additional clutter in our notation, $DM^{\sqsubseteq}(\bar{q}, b^{DM})$ evaluates to true if and only if the DM block $b^{DM}$ must be allocated in cache in $\bar{q}$. As such, if the generic DM block $b^{DM}$ has an upper-bound on its DM-LRU age greater than $A - 1$, then $BE^{\sqsubseteq}(\bar{q}, b^{DM}) = \text{true}$.
An abstract state transformer for the $DMLru^{\sqsubseteq}_A$ domain is an operator that takes as input an abstract state $\bar{q} \in DMLru^{\sqsubseteq}_A$ and any number of additional parameters, and returns as output a transformed state $\bar{q}' \in DMLru^{\sqsubseteq}_A$. We consider and define two abstract transformers for $DMLru^{\sqsubseteq}_A$: an update transformer $U^{\sqsubseteq}(\bar{q}, a)$, and a join transformer $J^{\sqsubseteq}(\bar{q}, \bar{p})$. We use the operator $\lambda b.$ to represent an age update operation carried out on each $b \in \mathcal{B}$ when considering a transformation from state $\bar{q}$ to $\bar{q}'$. This operator can be formally defined as:
$$\lambda b.\, f(\bar{q}(b)) := \forall b \in \mathcal{B},\ \bar{q}'(b) \leftarrow f(\bar{q}(b)) \tag{5}$$
Must-analysis Update
The update abstract transformer for the must-analysis $U^{\sqsubseteq}(\bar{q}, a)$ is used to go from an initial abstract state to a new abstract state after a new memory access has been performed. $U^{\sqsubseteq}(\bar{q}, a)$ takes as input an initial abstract state $\bar{q}$ and a memory block $a \in \mathcal{B}$, and returns the abstract state that results from accessing $a$. For ease of notation, we split the definition of $U^{\sqsubseteq}$ in two parts: the logic that corresponds to the update operation when a DM block $a^{DM}$ is accessed, indicated with $U^{\sqsubseteq}_D$; and the update transformation when a BE block $a^{BE}$ is accessed, namely $U^{\sqsubseteq}_B$. $U^{\sqsubseteq}_D$ is defined in Equation 6.
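$$
\begin{aligned}
U^{\sqsubseteq}_D(\bar{q}, a^{DM}) :=\ & D' \leftarrow
\begin{cases}
D + 1 & \text{if } D < A \wedge BE^{\sqsubseteq}(a) \quad \text{(a.1)} \\
D & \text{if } D = A \vee DM^{\sqsubseteq}(a) \quad \text{(a.2)}
\end{cases} \\
& \lambda b.
\begin{cases}
0 & \text{if } b = a \quad \text{(b)} \\
\bar{q}(b) & \text{if } b \neq a \wedge \left\| \begin{array}{ll}
BE^{\sqsubseteq}(b) \wedge DM^{\sqsubseteq}(a) & \text{(c.1)} \\
DM^{\sqsubseteq}(b) \wedge \bar{q}(a) \le \bar{q}(b) & \text{(c.2)} \\
BE^{\sqsubseteq}(b) \wedge BE^{\sqsubseteq}(a) \wedge \bar{q}(a) \le \bar{q}(b) & \text{(c.3)}
\end{array} \right. \\
\bar{q}(b) + 1 & \text{if } b \neq a \wedge \bar{q}(a) > \bar{q}(b) \wedge \left\| \begin{array}{ll}
DM^{\sqsubseteq}(b) \wedge \bar{q}(b) < D' - 1 & \text{(d.1)} \\
BE^{\sqsubseteq}(b) \wedge BE^{\sqsubseteq}(a) \wedge \bar{q}(b) < A - 1 & \text{(d.2)}
\end{array} \right. \\
\infty & \text{if } b \neq a \wedge \bar{q}(a) > \bar{q}(b) \wedge \left\| \begin{array}{ll}
DM^{\sqsubseteq}(b) \wedge \bar{q}(b) \ge D' - 1 & \text{(e.1)} \\
BE^{\sqsubseteq}(b) \wedge BE^{\sqsubseteq}(a) \wedge \bar{q}(b) \ge A - 1 & \text{(e.2)}
\end{array} \right.
\end{cases}
\end{aligned}
\tag{6}
$$
Here, $D'$ ($B'$, resp.) is the new value of $D$ ($B$, resp.) after the update. The conditions following the $\|$ operator are to be considered in logical “or” with each other.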
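The update abstract transformer $U^{\sqsubseteq}_B$ for a best-effort memory access $a$ can be defined as follows:
$$
U^{\sqsubseteq}_B(\bar{q}, a^{BE}) := \lambda b.
\begin{cases}
D & \text{if } b = a \wedge D < A \quad \text{(a)} \\
\bar{q}(b) & \text{if } b \neq a \wedge \left\| \begin{array}{l}
DM^{\sqsubseteq}(b) \\
BE^{\sqsubseteq}(b) \wedge \bar{q}(a) \le \bar{q}(b)
\end{array} \right. \quad \text{(b)} \\
\bar{q}(b) + 1 & \text{if } b \neq a \wedge BE^{\sqsubseteq}(b) \wedge \bar{q}(a) > \bar{q}(b) \wedge \bar{q}(b) < A - 1 \quad \text{(c)} \\
\infty & \text{if } \left\| \begin{array}{l}
b = a \wedge D \ge A \\
b \neq a \wedge BE^{\sqsubseteq}(b) \wedge \bar{q}(a) > \bar{q}(b) \wedge \bar{q}(b) \ge A - 1
\end{array} \right. \quad \text{(d)}
\end{cases}
\tag{7}
$$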
To clarify the update operation, consider the abstract state $\bar{q} = [\{\}, \{b, f\}], [\{c\}, \{d\}]$, where $D = 2$. Assume that the deterministic block $a^{DM}$, which has age upper-bound $\infty$ in $\bar{q}$, is accessed to obtain $\bar{q}' = U^{\sqsubseteq}(\bar{q}, a) = U^{\sqsubseteq}_D(\bar{q}, a)$. First, the value of $D'$ is computed as $D' = D + 1 = 3$ (a.1). Next, b and f both satisfy the condition $\bar{q}(a) > \bar{q}(b) = \bar{q}(f) = 1$. Moreover, we have that $DM^{\sqsubseteq}(b) = DM^{\sqsubseteq}(f) = \text{true}$, and that $\bar{q}(b) = \bar{q}(f) = 1 < D' - 1 = 2$. This corresponds to condition (d.1) in Equation 6. Hence, the age of b, f in the resulting state is $\bar{q}'(b) = \bar{q}'(f) = 2$. Similarly, blocks c and d satisfy conditions (d.2) and (e.2), respectively. The resulting updated abstract state is $\bar{q}' = [\{a\}, \{\}, \{b, f\}], [\{c\}]$.
An example for the abstract transformer $U^{\sqsubseteq}_B$ defined in Equation 7 is provided in Section 5.
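The two update transformers can be transcribed almost literally into code. The sketch below is one possible reading of Equations 6 and 7, not the Heptane implementation: an abstract state is represented (as an assumption of this sketch) by the bound D plus a dictionary from block names to age upper-bounds, with `math.inf` standing for ∞, and a separate dictionary `cls` holding the DM/BE tag of each block. The function names are hypothetical.

```python
import math

INF = math.inf  # age upper-bound of a block that is not guaranteed to be cached


def dm_must(q, cls, b):
    """DM operator of Equation 4: b is tagged DM and has a finite age bound."""
    return cls[b] == "DM" and q[b] < INF


def update_dm(D, q, cls, a, A):
    """Equation 6: must-analysis update for an access to the DM block a."""
    a_is_be = not dm_must(q, cls, a)
    D_new = D + 1 if (D < A and a_is_be) else D           # (a.1) / (a.2)
    out = {}
    for b, age in q.items():
        if b == a:
            out[b] = 0                                     # (b)
        elif q[a] > age and dm_must(q, cls, b):
            out[b] = age + 1 if age < D_new - 1 else INF   # (d.1) / (e.1)
        elif q[a] > age and a_is_be:
            # here b is not must-DM, so this is the BE(b) and BE(a) case
            out[b] = age + 1 if age < A - 1 else INF       # (d.2) / (e.2)
        else:
            out[b] = age                                   # (c.1) to (c.3)
    return D_new, out


def update_be(D, q, cls, a, A):
    """Equation 7: must-analysis update for an access to the BE block a."""
    out = {}
    for b, age in q.items():
        if b == a:
            out[b] = D if D < A else INF                   # (a) / (d)
        elif not dm_must(q, cls, b) and q[a] > age:
            out[b] = age + 1 if age < A - 1 else INF       # (c) / (d)
        else:
            out[b] = age                                   # (b)
    return D, out


# Example from the text: q = [{}, {b, f}], [{c}, {d}] with D = 2, A = 4.
cls = {"a": "DM", "b": "DM", "f": "DM", "c": "BE", "d": "BE"}
q = {"a": INF, "b": 1, "f": 1, "c": 2, "d": 3}
D_new, q_new = update_dm(2, q, cls, "a", A=4)
print(D_new, q_new)
```

With these inputs the sketch yields D' = 3, with a at age 0, b and f at age 2, c at age 3, and d at ∞, which corresponds to the state $[\{a\}, \{\}, \{b, f\}], [\{c\}]$ derived in the text.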
▶ Theorem 2 (Correctness of must-analysis update). Consider a generic abstract state $\bar{p} = U^{\sqsubseteq}(\bar{q}, a^{CL})$ obtained from the must-analysis update state transformer when accessing a generic block $a^{CL}$ from an initial abstract state $\bar{q}$. Then for any block $b \in \mathcal{B}$, $\bar{p}(b)$ is an upper-bound on the DM-LRU age of $b$.
Proof Sketch. A proof can be constructed by considering two main sub-cases: (1) the case when $CL = DM$ for the block being accessed; and (2) the case when $CL = BE$. Due to space constraints, we provide an intuition for the former case, as the latter follows from the same reasoning. When considering $CL = DM$, the new state $\bar{p}$ is obtained as $\bar{p} = U^{\sqsubseteq}_D(\bar{q}, a^{DM})$, as per Equation 6.
First, let us consider the rule on the update of $D$. If $\bar{q}(a) = \infty$ then $a$ is not necessarily in cache and accessing $a$ increases the upper-bound on the number of allocated DM blocks, as long as the associativity $A$ has not been reached, i.e., $D < A$. In this case, note that $BE^{\sqsubseteq}(\bar{q}, a) = \text{true}$ and condition (a.1) of Equation 6 applies. $D$ does not change in any other case (a.2). After the update, block $a$ will have age upper-bound equal to 0 (b).
Next, consider all the blocks $b \neq a$ that had an age upper-bound of infinity in $\bar{q}$, i.e., $\bar{q}(b) = \infty$ and $BE^{\sqsubseteq}(\bar{q}, b) = \text{true}$. When $a$ is accessed, their age upper-bound should not change. If $\bar{q}(a) = \infty$ then condition (c.3) applies. If $\bar{q}(a) \neq \infty$ then $DM^{\sqsubseteq}(\bar{q}, a) = \text{true}$ and condition (c.1) applies.
Furthermore, consider all the blocks $b^{DM}$, $b \neq a$ that must be allocated as DM blocks in $\bar{q}$, i.e., such that $DM^{\sqsubseteq}(\bar{q}, b) = \text{true}$. If $\bar{q}(a) = \infty$, the upper-bound on their DM-LRU age will have to increase by 1 (d.1). If however the value of $\bar{q}(b) + 1$ exceeds $D' - 1$, the largest age available to a DM block given the updated value $D'$ of $D$, then the block may be evicted and the new upper-bound on its DM-LRU age is $\bar{p}(b) = \infty$ (e.1). The same cases apply when $\bar{q}(a) < \infty$ and $\bar{q}(a) > \bar{q}(b)$.
On the other hand, if $a$ has an age upper-bound that is the same as or lower than that of $b$, i.e., $\bar{q}(a) \le \bar{q}(b)$, then a concrete state where the DM-LRU age of $a$ is strictly larger than that of $b$ cannot exist. As such, the upper-bound on the DM-LRU age of $b$ will not change, as per condition (c.2).
Lastly, consider all the blocks $b^{BE}$, $b \neq a$ that must be allocated as BE blocks in $\bar{q}$, i.e., such that $BE^{\sqsubseteq}(\bar{q}, b) = \text{true}$ and $\bar{q}(b) < \infty$. The only case in which $\bar{q}(a) > \bar{q}(b)$ is if $\bar{q}(a) = \infty$. When $a$ is accessed, the upper-bound on the age of $b$ will have to increase by 1 (d.2), unless by doing so the associativity $A$ is exceeded. In the latter case, $\bar{p}(b) = \infty$ (e.2). ◀
Must-analysis Join
The join abstract transformer $J^{\sqsubseteq}(\bar{q}, \bar{p})$ is used to compute a new abstract state at the merging point of two or more execution paths. There are strong similarities between the transformer defined here and what is used in traditional LRU must-analysis [14]. At a high level, the joined state will consider as must-cached only those blocks in the intersection of the joining states, each with the maximum age it has in either of the two states. For the new state, $D$ is taken as the maximum between the values of $D$ in the joining states. Equation 8 formalizes the $J^{\sqsubseteq}(\bar{q}, \bar{p})$ abstract transformer:
$$J^{\sqsubseteq}(\bar{q}, \bar{p}) := D \leftarrow \max\{D_{\bar{q}}, D_{\bar{p}}\},\ \lambda b.\, \max\{\bar{q}(b), \bar{p}(b)\} \tag{8}$$
If we were to join $\bar{q} = [\{\}, \{b, f\}], [\{c\}, \{d\}]$ with $\bar{q}' = [\{a\}, \{\}, \{b, f\}], [\{c\}]$, the resulting state would be $\bar{q}'' = J^{\sqsubseteq}(\bar{q}, \bar{q}') = [\{\}, \{\}, \{b, f\}], [\{c\}]$.
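The join is a pointwise maximum and is short enough to sketch directly. As in the rest of these illustrations, the representation is an assumption: an abstract state is the bound D plus a dictionary from block names to age upper-bounds, with `math.inf` standing for ∞; a block missing from one of the dictionaries is treated as having bound ∞ there. The function name `join_must` is hypothetical.

```python
import math

INF = math.inf  # age upper-bound of a block that is not guaranteed to be cached


def join_must(D_q, q, D_p, p):
    """Equation 8: keep the larger D and, per block, the larger age bound."""
    blocks = set(q) | set(p)
    joined = {b: max(q.get(b, INF), p.get(b, INF)) for b in blocks}
    return max(D_q, D_p), joined


# q = [{}, {b, f}], [{c}, {d}] with D = 2; q' = [{a}, {}, {b, f}], [{c}] with D = 3.
q = {"a": INF, "b": 1, "f": 1, "c": 2, "d": 3}
q_prime = {"a": 0, "b": 2, "f": 2, "c": 3, "d": INF}
D_join, q_join = join_must(2, q, 3, q_prime)
print(D_join, q_join)
```

Blocks a and d end up with bound ∞ because each is guaranteed to be cached in only one of the two joined states, which matches the state $[\{\}, \{\}, \{b, f\}], [\{c\}]$ given in the text.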
Figure 3 Fragment of process CFG. At the end of the fragment, all the cache blocks in the figure
may be cached.
▶ Theorem 3 (Correctness of must-analysis join). Consider an abstract state $\bar{s} = J^{\sqsubseteq}(\bar{q}, \bar{p})$ obtained from the must-analysis join state transformer from two initial abstract states $\bar{q}$ and $\bar{p}$. Then for any block $b \in \mathcal{B}$, $\bar{s}(b)$ is an upper-bound on the DM-LRU age of $b$.
Proof Sketch. A proof follows directly from the definition of the $J^{\sqsubseteq}$ operator in Equation 8. By hypothesis, $\bar{q}$ and $\bar{p}$ carry the upper-bound on the age of a generic block $b$ along two disjoint execution sub-paths. After the two sub-paths join, the maximum between $\bar{q}(b)$ and $\bar{p}(b)$ is a safe upper-bound on the DM-LRU age of $b$ in the resulting abstract state $\bar{s}$. Moreover, an upper-bound on the number of allocated DM blocks in $\bar{s}$ is the maximum between $D_{\bar{q}}$ and $D_{\bar{p}}$. ◀
Must-analysis Classification
Every time an access is performed, it can be classified using a
classification function that returns M for cache miss, H for cache hit, or ⊤ in case
neither an M nor an H classification can be made given the current abstract state. In order to
classify memory accesses, for a given abstract state q¯ we define two helper sets D¯ and B¯
representing the deterministic and best-effort memory blocks that have a finite upper bound
on their DM-LRU age:
\[ \begin{aligned}
\bar{D} &:= \{ b_{CL} \in \mathcal{B} \mid CL = DM \land \bar{q}(b) < \infty \} \\
\bar{B} &:= \{ b_{CL} \in \mathcal{B} \mid CL = BE \land \bar{q}(b) < \infty \}
\end{aligned} \tag{9} \]
The classification function of the must analysis is defined as:
\[ C^{v}(\bar{q}, a_{CL}) := \begin{cases}
H & \text{if } \bar{q}(a) < \infty \quad \text{(a)} \\
M & \text{if } \left( CL = DM \land a \notin \bar{D} \land |\bar{D}| = D \right) \lor \left( CL = BE \land a \notin \bar{B} \land |\bar{B}| = B \right) \quad \text{(b)} \\
\top & \text{otherwise} \quad \text{(c)}
\end{cases} \tag{10} \]
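As a rough illustration of Equations 9 and 10, the classifier can be sketched as follows. The encoding is an assumption made for this example only: ages are kept in a dictionary (absent blocks have age ∞), the DM/BE class of each block is given by a second dictionary, and the bounds D and B are passed in as plain integers. The value used for B in the usage lines is illustrative.

```python
import math

INF = math.inf


def classify_must(ages, classes, d_bound, b_bound, accessed):
    """Return "H", "M" or "TOP" for an access, following Equation 10.

    ages:     {block: finite age upper-bound}; missing blocks have age infinity
    classes:  {block: "DM" or "BE"}
    d_bound:  the value D of the abstract state
    b_bound:  the value B of the abstract state (taken as given here)
    """
    # Case (a): a finite upper-bound on the age means the block must be cached.
    if ages.get(accessed, INF) < INF:
        return "H"
    # Helper sets of Equation 9: blocks with a finite age, split by class.
    cached_dm = {b for b, age in ages.items() if classes[b] == "DM" and age < INF}
    cached_be = {b for b, age in ages.items() if classes[b] == "BE" and age < INF}
    # Case (b): the accessed block is not in its helper set and the set is full.
    if classes[accessed] == "DM" and len(cached_dm) == d_bound:
        return "M"
    if classes[accessed] == "BE" and len(cached_be) == b_bound:
        return "M"
    # Case (c): no classification is possible.
    return "TOP"


# A state of the form [{f}, {a}], [{}, {}] with D = 2 (B = 2 is an assumption).
ages = {"f": 0, "a": 1}
classes = {"a": "DM", "f": "DM", "h": "DM", "b": "BE"}
hit_a = classify_must(ages, classes, 2, 2, "a")
unknown_b = classify_must(ages, classes, 2, 2, "b")
miss_h = classify_must(ages, classes, 2, 2, "h")
```

Because an accessed block with infinite age can never belong to either helper set, the membership tests of case (b) are implied once case (a) has failed.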
We provide a complete step-by-step example of how must-analysis can be applied to an
application’s CFG in Section 5.
4.2 May-analysis
The complete may-analysis is provided in the appendix (Section 10). Here we provide a
sketch of the approach followed in the analysis.
The goal of may-analysis is to track the lower-bound on the age of memory blocks. Given
a may-analysis abstract state it is possible to classify a memory access as always leading to a
miss. Let us consider the example in Figure 3 and reason on the lower bound on the age of
each block for a 4-way fully associative cache. For block aDM, the best case is represented by
the execution pattern 1-5-4. In this case, the block has DM-LRU age 0. A similar situation
occurs for block bDM and path 1-3-4. For blocks f and e, the best case is represented by the
paths 2-6-8 and 2-7-8, respectively. This leads the two blocks to have a lower-bound of 2 on
their DM-LRU age. Similarly, blocks c, d, and g have lower-bound 0, 1, and 3, respectively.
We can represent the resulting may-analysis state obtained following the derivation above
as: [{a, b}], [{c}, {d}, {f, e}, {g}]. What happens if another access to a occurs after paths 4
and 8 join? Then the best case for block b is still 1-3-4, but its age lower-bound will be 1.
At the same time, because at least one DM block was allocated regardless of the taken path,
the minimum lower-bound on the age of any BE block has to be 1. Also note that regardless
of the execution path taken, block g will be evicted. The result is the following may-analysis
abstract state: [{a}, {b}], [{}, {c}, {d}, {f, e}].
5 Analysis Example
In this section we describe how DM-LRU must-analysis can be applied to a
CFG once the target of each memory access is known. The original CFG of the considered C
program, as generated by the Heptane tool, is shown in Figure 4. The program consists of
a single loop with four iterations, where the first iteration has been unrolled. The program
accesses 7 memory locations. These are B = {a, b, c, d, e, f, g} and are visible in the various
basic blocks as operands of load/store instructions.
Figure 5 shows the same CFG as in Figure 4, but where only basic blocks in which
memory accesses are performed are kept. Moreover, basic blocks with multiple memory
accesses are depicted as sequences of blocks, each with a single memory access. The nodes
are annotated with their corresponding abstract states. We apply must-analysis starting
from the entry node a. We compare the behavior of traditional LRU analysis and DM-LRU
when blocks a and f have been declared as DM. We consider a fully-associative cache with
4 ways. For DM-LRU analysis, the cache state before the first access is q¯0 = [], [{}, {}, {}, {}]
(D0 = 0); for LRU analysis it is [{}, {}, {}, {}]. Under DM-LRU, when block aDM is accessed,
the performed operation is q¯1 = Uv(q¯0, aDM) = UvD(q¯0, a). Following Equation 6, we have
D1 = D0 + 1; condition (a) is then satisfied by a, while all the other blocks b ∈ B satisfy
condition (d.2). As such, we have q¯1 = [{a}], [{}, {}, {}], as reported in the figure.
Let us now follow the upper branch with access sequence d → e → g (all of them are
best-effort memory accesses). For each memory access, we apply Equation 7 to obtain a new
abstract state. After accessing e, the resulting abstract state is: q¯2 = [{a}], [{e}, {d}, {}]. Let
us now show in more detail how we obtain q¯3 = Uv(q¯2, gBE) = UvB(q¯2, g) when we next access
g. Considering all blocks in B and using Equation 7 we know: block a satisfies condition (b.1)
and its age remains the same; b and c satisfy (d.2) and their age remains ∞; block e satisfies
(c) and its age increases by 1, from 1 to 2; the age of block d increases from 2 to 3; and
finally, block g (being accessed) satisfies condition (a) and its age is set to D2 = 1. The final
state is q¯3 = [{a}], [{g}, {e}, {d}], as shown in the figure above node g. The same procedure
applies to the lower branch of the CFG, and we obtain the state q¯4 = [{a}], [{c}, {b}, {}]
after we access c.
Before accessing f, we need to join states q¯3 and q¯4 derived above. In this case, we apply
Equation 8 to obtain q¯5 = Jv(q¯3, q¯4). It follows that D5 = 1. Moreover, the only block
present in both states q¯3 and q¯4 is a. All other blocks in B will have age ∞ in q¯5. As such we
have q¯5 = [{a}], [{}, {}, {}]. Next, accessing fDM yields q¯6 = UvD(q¯5, f) = [{f}, {a}], [{}, {}],
as shown in the figure. This is because D6 = D5 + 1, and because a satisfies condition (c.1)
in Equation 6. The same reasoning can be applied to obtain the remaining states depicted in
the figure.
Consider now the state q¯6 and apply the must-analysis classifier before accessing aDM,
i.e. compute Cv(q¯6, a) as in Equation 10. First, the sets D¯6 and B¯6 can be computed using
Equation 9 as D¯6 = {a, f} and B¯6 = {}. Hence condition (a) is satisfied and the access to a is
classified as H (hit). Conversely, no access can be classified as hit under LRU.
[Figure 4 graphic not reproduced. Node labels recoverable from the drawing: main; loop [3];
basic blocks @0x80e8 (LOAD a (DM)), @0x8138 (LOAD f (DM)), @0x80f8 (LOAD b, LOAD c) and
@0x8120 (LOAD d, LOAD e, LOAD g), each appearing twice; and basic blocks @0x8150, @0x815c
and @0x80d8.]
Figure 4 Original CFG of considered example as rendered by the Heptane tool, with annotated
memory accesses (LOAD). Note that VIVU has been performed on the loop.
Figure 5 An example of must analysis under DM-LRU (orange states), compared to traditional
LRU (blue states). If a and f are marked as DM, their accesses inside the loop can be classified as
always hits.
6 Analogies and Differences with Cache Locking
Cache locking refers to a technique where cache blocks that are deemed important for
an application’s timing are pinned (locked) in cache. Similar to DM-LRU, cache locking
represents a way to partially override the best-effort replacement strategy offered by the
hardware. And like DM-LRU, specialized hardware support is required to perform locking.
With respect to WCET analysis, the big advantage provided by cache locking is that all
accesses to locked cache blocks can be immediately classified as hits. While cache locking
was commonly supported in previous-generation embedded systems, the current trend in
embedded SoCs is toward cache controllers that offer few or no management primitives.
Despite the strong similarities, some profound differences exist between cache locking
and DM-LRU. Leveraging cache locking implies injection of additional logic (in the
application, the compiler, and/or the OS) to perform a series of prefetch&lock operations.
On the contrary, a system featuring a DM-LRU cache only requires that memory blocks are
tagged appropriately at task load time.
In case of static locking, prefetch&lock can be performed at initialization time. As such,
the extra logic required to perform locking does not impact the task’s WCET. Conversely,
with dynamic cache locking, the locked cache content is modified at runtime. Depending on
the available hardware support, this operation may not be directly possible in user-space,
requiring instead a costly switch to kernel-space. Regardless, an online prefetch&lock routine
can pollute the rest of the cache, resulting in overheads that may largely offset any benefit.
In other words, additional system-level assumptions are required to make a meaningful
comparison with dynamic locking. For this reason, we do not compare DM-LRU against
dynamic locking.
Interestingly enough, however, the proposed DM-LRU analysis can be re-used to analyze
dynamic locking schemes if additional system parameters are available. In a nutshell, consider
a 4-way fully associative cache. Next, assume that the locked content is switched whenever
a given branch in the CFG is taken. Then, consider the case where the new content to
be locked comprises blocks a, b, c. A special node can be added on the branch, with
an associated modified update abstract transformer Lockv, such that the resulting
must-analysis abstract state q¯ after the update is: q¯ = Lockv({a, b, c}) = [{a}, {b}, {c}], [{}].
7 Evaluation
The DM-LRU analysis presented in the previous sections provides a way to understand how
the WCET of applications varies as memory blocks addressed in applications are declared
as DM. We now apply DM-LRU analysis to a set of realistic embedded benchmarks. In
this section, we first briefly describe our implementation. Next, we investigate three main
aspects: (1) What degree of WCET improvement can be achieved via DM-LRU
when compared to LRU? (2) Is there an advantage in imposing a limit on the number of DM
lines that can be simultaneously allocated, i.e. in preventing DM takeover? (3) How does
DM-LRU compare to static cache locking?
In our evaluation we focus on the degree of WCET improvement that DM-LRU can
provide compared to LRU. However, because supporting DM-LRU implies changes to the
hardware cache memory and controller, it is also important to determine whether a DM-LRU
implementation can be efficiently carried out. In short, only one additional bit per cache
line is required to distinguish between DM and BE lines. Additionally, compact changes
(see footnote 1) are required to the cache controller to restrict victim selection based on the
classification (DM or BE) of a new line being allocated. No additional logic is required to
appropriately set the DM bits at line fetch. Additional considerations on the incurred
hardware costs to support DM memory are also provided in [10].
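The victim-selection rule for best-effort accesses, as summarized in the footnote on controller changes, can be illustrated with a short software sketch. It is only a model of the idea, not a hardware description: the function name `select_be_victim`, the representation of LRU order as a list of way indices, and the behavior when every way holds a DM line (returning `None`) are assumptions made for this example.

```python
def select_be_victim(ways_lru_first, dm_bitmask):
    """Pick the victim way for a BE allocation within one cache set.

    ways_lru_first: way indices ordered from least to most recently used
    dm_bitmask:     integer whose bit i is 1 when way i holds a DM line
    Returns the least recently used way whose DM bit is 0, or None if
    every way in the set currently holds a DM line.
    """
    for way in ways_lru_first:
        # Ways marked as DM are excluded from victim selection.
        if not (dm_bitmask >> way) & 1:
            return way
    return None


# 4-way set: ways 0 and 2 hold DM lines; way 2 is the overall LRU line.
victim = select_be_victim([2, 0, 3, 1], 0b0101)
no_victim = select_be_victim([2, 0, 3, 1], 0b1111)
```

In this example the two least recently used ways are skipped because they hold DM lines, so way 3 is selected.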
7.1 Implementation
We have implemented support for DM-LRU inside a state-of-the-art static WCET analysis tool,
namely Heptane [23]. Heptane implements the Implicit Path Enumeration Technique (IPET) [29]
and performs analysis for many cache architectures: e.g., LRU, FIFO, Pseudo-LRU,
multi-level non-inclusive caches, and shared caches. In our setup, we consider a single level of
cache, divided into an instruction (I) cache and a data (D) cache. For simplicity, we assume
in all our experiments that both caches have the same number of sets and
ways. Heptane supports two architectures: ARMv7 and MIPS. The ARMv7 target was used
for this paper.
We have modified the Heptane tool to support two variants of DM-LRU, as well as
static locking. Most importantly, we have extended the support for abstract
interpretation-based cache analysis to implement the must- and may-analysis presented in the
previous sections. The performed changes preserve backward compatibility with the original
set of policies supported by the tool. Next, we have integrated the logic to differentiate
between BE memory and DM memory. For this purpose, we have added a table of DM addresses
(the DM Table) that can be specified by an external tool, mimicking the selection of DM blocks
by the OS at binary load time. Furthermore, we have added appropriate logic in Heptane
to output per-memory-block statistics in terms of references, hits, and misses, as computed
during WCET analysis. These statistics are then used to build a DM-block selection heuristic.
Finally, we have modified Heptane’s CFG extraction routines to perform VIVU, i.e., to
recursively peel the first iteration of every loop.
We have developed and employed a simple heuristic to determine which memory
blocks/addresses should be marked as DM and inserted into the DM Table. The heuristic initially
performs WCET analysis without selecting any DM line. Next, it analyzes the
per-memory-block statistics and selects as DM the block with the largest number of misses. At
this point, WCET analysis is performed again with the new DM Table containing a single entry.
Using the per-memory-block statistics of the latest run, a new DM block is selected in addition
to the previously selected block. The same steps are performed until no more addresses can be
selected as DM. Note that when no lines are selected as DM, the behavior of the cache is
indistinguishable from vanilla LRU. Similarly, when the entire memory of an application is
selected as DM, no differences exist with LRU. In practice, however, we saw no differences
between DM-LRU and LRU when more than 3 × S × A lines are selected as DM, where S
and A are the number of sets and ways of the cache, respectively. In our experiments, we
acquired analytical results for a number of DM lines in the range [1, 3 × S × A].
Footnote 1: Whenever a line eviction has to occur, the DM/BE bits of all the lines in the
considered set form a bitmask. Victim selection for a BE access is then performed by excluding
all those lines that have a bit set to 1 in the DM bitmask.
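The greedy selection loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `analyze` is a hypothetical callback standing in for one run of the WCET analysis (it is not a Heptane API), assumed to return the WCET together with the per-block miss counts, and `toy_analyze` is a made-up stand-in used only to make the example runnable.

```python
def select_dm_blocks(analyze, max_dm_lines):
    """Greedily grow the DM Table one block at a time.

    analyze(dm_table) is assumed to return (wcet, {block: miss_count}).
    Returns the final DM Table and the list of results of every run,
    starting with the baseline run that uses an empty DM Table.
    """
    dm_table = []
    results = [analyze(list(dm_table))]  # baseline: no DM line selected
    while len(dm_table) < max_dm_lines:
        _, misses = results[-1]
        candidates = {b: m for b, m in misses.items() if b not in dm_table}
        if not candidates:
            break  # no more addresses can be selected as DM
        # Select the block with the largest number of misses in the latest run.
        worst = max(candidates, key=candidates.get)
        dm_table.append(worst)
        results.append(analyze(list(dm_table)))
    return dm_table, results


def toy_analyze(dm_table):
    """Made-up analysis: DM blocks never miss, each miss costs 10 cycles."""
    base_misses = {"x": 5, "y": 3, "z": 1}
    misses = {b: (0 if b in dm_table else m) for b, m in base_misses.items()}
    return 100 + 10 * sum(misses.values()), misses


table, runs = select_dm_blocks(toy_analyze, max_dm_lines=2)
wcets = [wcet for wcet, _ in runs]
```

In the experiments described in the text, the bound on the number of selected lines would be 3 × S × A, and the best WCET over all runs is the one of interest.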
7.2 Setup
We compare two variants of DM-LRU and static cache locking against LRU. A description
of the three scenarios follows.
1. Unrestricted DM-LRU ("DM-nolim"): in this variant, no restriction is imposed on
the maximum number of cache sets that can be reserved for DM lines. It follows that
the only constraint for the allocation of DM lines is the cache associativity itself. The
analysis for unrestricted DM-LRU is the one presented in the previous sections.
2. Limited DM-LRU ("DM-cap"): in this variant, a hard cap on the maximum number
of ways is imposed on the expansion of DM lines. This represents a solution to the
aforementioned problem of DM takeover. Imposing a cap of 0 makes DM-cap
identical to vanilla LRU. Similarly, imposing a cap of A makes DM-cap identical
to DM-nolim. In our experiments, we explore all the possible values of cap in the range
[1, A].
3. Static locking ("Static"): this case is used to draw a comparison between the considered
DM-LRU variants and static locking. In case of static locking, selection of lines to statically
allocate is performed following the same heuristic used for DM line selection. Similar to
DM-cap, we impose a limit on how many ways can be dedicated to statically locked content
(locked ways). The maximum number of allocated lines is then S × locked. Note that the main
performance difference between DM-cap and Static lies in the additional flexibility that
DM-cap provides. In DM-cap, in fact, more lines than S × cap can be selected, while this is
not allowed in static locking.
For all the considered variants, we explore a number of cache configurations. Specifically,
we vary the associativity A of the I/D caches in the set {2, 4, 8, 16}. We vary the number
of cache sets S such that S ∈ {2, 4, 8, 16, 32}. As previously mentioned, for DM-nolim and
DM-cap, we progressively select up to 3 × S × A DM lines following the heuristic described
above. In each system instance, we perform WCET analysis using the modified Heptane
tool. Then, we keep track of which configuration (S, A, DM-cap cap, locked ways, number
of DM lines) for each of the three scenarios leads to the best reduction in WCET compared
to the vanilla LRU case.
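The configuration space explored for DM-cap can be enumerated with a few lines of Python. This is only a sketch of the sweep described above: the generator name and the tuple layout are assumptions made for this example, and the analysis run itself is not shown.

```python
from itertools import product

ASSOCIATIVITIES = (2, 4, 8, 16)    # values of A explored in the setup
SET_COUNTS = (2, 4, 8, 16, 32)     # values of S explored in the setup


def dm_cap_configurations():
    """Yield (S, A, cap, max_dm_lines) for every DM-cap configuration.

    cap ranges over [1, A]; max_dm_lines is the upper end 3 * S * A of the
    range of DM lines progressively selected by the heuristic.
    """
    for sets, ways in product(SET_COUNTS, ASSOCIATIVITIES):
        max_dm_lines = 3 * sets * ways
        for cap in range(1, ways + 1):
            yield sets, ways, cap, max_dm_lines


configurations = list(dm_cap_configurations())
```

For each tuple, WCET analysis would be repeated for every number of DM lines from 1 up to the yielded bound, keeping the configuration with the best reduction over LRU.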
For our benchmarks, we use a subset of realistic benchmarks from the Mälardalen suite [19].
Unfortunately, vanilla Heptane is not able to perform WCET analysis for some of the
benchmarks in the suite. As such, our evaluation only includes those benchmarks that
are correctly handled by Heptane. Notably, the aforementioned changes to implement
DM-LRU analysis did not impact the set of benchmarks that can be correctly analyzed by
the tool.
7.3 Results
Figure 6 provides an overview of the obtained results. A cluster of bars is provided for each of
the considered benchmarks. Reading the plot from top to bottom, the first bar corresponds
to the WCET under LRU. All the results in the figure are normalized to the LRU case. The
second bar represents the best WCET improvement that was observed under DM-nolim. The
WCET improvement is calculated as $\mathrm{WCET}_{\text{DM-nolim}} / \mathrm{WCET}_{\text{LRU}}$, where the WCETs under DM-nolim
and under LRU are obtained in the same system configuration. A similar calculation was
performed to derive the remaining two bars, i.e. for the DM-cap and Static cases.
[Figure 6 plot not reproduced: normalized WCETs (axis from 0.0 to 1.0) for the benchmarks bs,
crc, fibcall, lcdnum, minver, prime, sqrt, bsort100, expint, ludcmp, qurt, statemate, insertsort,
matmult, ns, select, ud, jfdctint, minmax and fft; series: LRU, DM-nolim, DM-cap, Static.]
Figure 6 Computed WCETs for vanilla LRU (LRU), unrestricted DM (DM-nolim), DM limited
to a subset of ways (DM-cap), and static locking (Static).
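As a small worked example of the normalization, the snippet below computes the normalized WCET and the corresponding percentage reduction for the prime benchmark, using the LRU and DM-nolim values reported for the 2×4 configuration in Table 1. The function names are illustrative only.

```python
def normalized_wcet(wcet_variant, wcet_lru):
    """WCET of a variant divided by the WCET under LRU, same configuration."""
    return wcet_variant / wcet_lru


def reduction_percent(wcet_variant, wcet_lru):
    """Relative WCET reduction over LRU, expressed as a percentage."""
    return 100.0 * (1.0 - wcet_variant / wcet_lru)


# prime, S = 2, A = 4: LRU = 611425 cycles, DM-nolim = 467925 cycles.
prime_ratio = normalized_wcet(467925, 611425)
prime_reduction = reduction_percent(467925, 611425)
```

The ratio is the height of the bar in the normalized plot, while the reduction is the percentage reported in parentheses in the table.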
What emerges from the plot is that in 16 out of 20 cases, DM-nolim is able to achieve
WCET reductions compared to vanilla LRU. Notably, in the case of bsort100 and prime, it is
possible to achieve a WCET reduction of around 23.72% and 23.47%, respectively. It can
also be noted that DM-cap outperforms DM-nolim. Moreover, DM-cap generally performs
better than static locking. For instance, the best WCET reduction achieved
for the jfdctint benchmark is 26.09% under DM-cap (with S = 4, A = 8, 19 DM lines,
cap = 1). But the best WCET reduction under static locking is only 14.64%, which is
achieved for a cache with parameters S = 2, A = 16 and 15 ways occupied by statically
locked lines. Similar results can be observed for the benchmarks matmult and fft.
The reason for the performance improvement that can be obtained with DM-cap is
twofold. On the one hand, the problem of DM takeover is solved. This prevents the case in
which all the accesses to BE lines result in misses. On the other hand, for applications that
exhibit changes in working sets, static locking can be sub-optimal. Conversely, under DM-cap,
it is possible to mark lines belonging to different working sets as DM. In this case, as the
working set changes over time, those DM lines belonging to a previous working set will be
naturally evicted, without suffering pollution from BE lines.
A more detailed overview of the obtained experimental results is provided in Table 1. In
the table, the first column reports the name of the benchmark under analysis. If multiple
configurations are of interest, multiple rows are shown for a given benchmark. The second
column reports the cache configuration in terms of sets S and ways A for the results on
each row. Next, the WCET obtained with LRU is reported in the following column, followed
by the best WCET obtained for the same configuration under DM-nolim and the relative
improvement (due to space limitations, the number of DM lines that were selected has
been omitted from the table). Similarly, the best result obtained under DM-cap is reported
next, and the value of cap under which the result was achieved is reported in the adjacent
column. Finally, the last two columns report the WCET (and the relative improvement) for
static locking with the given cache configuration, and the number of locked ways.
Benchmark   S×A   LRU       DM-nolim            DM-cap              cap  Static              locked
bs          2×2   6613      5513 (-16.63%)      5513 (-16.63%)      2    5513 (-16.63%)      2
crc         4×2   2492320   2425920 (-2.66%)    2330620 (-6.49%)    1    2330620 (-6.49%)    1
fibcall     2×2   14191     14191 (-0.00%)      14191 (-0.00%)      1    14191 (-0.00%)      1
lcdnum      4×2   16291     14791 (-9.21%)      14791 (-9.21%)      2    14791 (-9.21%)      2
            2×4   16191     16191 (-0.00%)      14391 (-11.12%)     2    15291 (-5.56%)      2
minver      4×2   126558    115758 (-8.53%)     109958 (-13.12%)    1    109958 (-13.12%)    1
prime       2×4   611425    467925 (-23.47%)    467925 (-23.47%)    3    467925 (-23.47%)    3
sqrt        2×4   54983     47552 (-13.52%)     47252 (-14.06%)     3    52552 (-4.42%)      2
            2×4   54983     47552 (-13.52%)     47252 (-14.06%)     3    47583 (-13.30%)     6
bsort100    2×2   12434700  9484580 (-23.72%)   9484580 (-23.72%)   1    9484580 (-23.72%)   1
expint      2×4   759551    709651 (-6.57%)     709651 (-6.57%)     4    709651 (-6.57%)     4
ludcmp      16×2  638233    564633 (-11.53%)    564633 (-11.53%)    2    564633 (-11.53%)    2
qurt        2×8   217555    212160 (-2.48%)     173755 (-20.13%)    6    173755 (-20.13%)    6
            4×4   217555    220355 (+1.29%)     171155 (-21.33%)    3    171155 (-21.33%)    3
statemate   2×2   616218    612918 (-0.54%)     576718 (-6.41%)     1    576718 (-6.41%)     1
            8×8   383718    382818 (-0.23%)     359118 (-6.41%)     6    359118 (-6.41%)     6
insertsort  2×2   80126     70126 (-12.48%)     70126 (-12.48%)     1    70126 (-12.48%)     1
matmult     2×2   7191620   6568220 (-8.67%)    5555520 (-22.75%)   1    6391620 (-11.12%)   1
ns          4×2   193481    193481 (-0.00%)     193481 (-0.00%)     1    193481 (-0.00%)     1
            2×2   530781    534781 (+0.75%)     406681 (-23.38%)    1    406681 (-23.38%)    1
select      4×2   170766    162266 (-4.98%)     157966 (-7.50%)     1    157966 (-7.50%)     1
            2×4   170766    162866 (-4.63%)     150966 (-11.59%)    3    150966 (-11.59%)    3
ud          4×2   226843    223243 (-1.59%)     223243 (-1.59%)     2    225243 (-0.71%)     2
            2×2   302443    354143 (+17.09%)    283843 (-6.15%)     1    283843 (-6.15%)     1
jfdctint    2×16  150234    128734 (-14.31%)    111034 (-26.09%)    2    128234 (-14.64%)    15
            4×8   150234    130334 (-13.25%)    111134 (-26.03%)    1    147134 (-2.06%)     1
            4×8   150234    130334 (-13.25%)    111134 (-26.03%)    1    130334 (-13.25%)    7
minmax      2×2   4034      4034 (-0.00%)       4034 (-0.00%)       1    4034 (-0.00%)       1
            2×4   4034      4034 (-0.00%)       3934 (-2.48%)       1    4034 (-0.00%)       1
fft         32×2  1683830   1623930 (-3.56%)    1623430 (-3.59%)    1    1623430 (-3.59%)    1
            4×4   2488230   2494360 (+0.25%)    2140830 (-13.96%)   1    2443230 (-1.81%)    1
            4×4   2488230   2494360 (+0.25%)    2140830 (-13.96%)   1    1716630 (-4.62%)    2
Table 1 Summary of notable experimental results under four strategies: (1) vanilla LRU
("LRU"); (2) unrestricted DM-LRU ("DM-nolim"); (3) restricted DM-LRU ("DM-cap"); and (4)
static locking ("Static"). Percentages are relative to the LRU WCET of the same row; a negative
value is a WCET reduction and a positive value is a WCET increase.
8 Related Work
Memory Tagging and Hardware Support. In this work, we assume that hardware
allows us to encode (tag) extra information (e.g., importance) on memory locations at a fine
granularity. The basic idea of memory tagging was first explored in the security community, to
prevent memory corruption (e.g., buffer overflow) [6, 38] and to enforce data flow integrity [45]
and capability protection [51]. Efficient hardware designs for word-granularity single-bit and
multi-bit memory tagging have been investigated [24] and several real SoC prototypes have
been built [45, 4], demonstrating the feasibility. In the real-time systems community, several
works explored the use of physical memory address based differentiated hardware designs (mostly
cache) in a more coarse-grained manner (i.e., memory segments, page, and task granularity).
Kumar et al. proposed a criticality-aware cache design, called Least Critical (LC), which
includes a memory criticality-aware extension to the LRU replacement policy [27]. The LC
cache’s replacement policy is similar to the replacement policy we assumed in this work
(Algorithm 1), while its memory tagging mechanism, which uses a fixed number of specialized
range registers, does not allow flexible and fine-grained memory tagging. Therefore, our
static analysis method can be directly applied to analyze the LC cache. PRETI [28] also
proposes a criticality-aware cache design but it focuses on shared caches for SMT hardware,
while we focus on private caches. More recently, OS-level page-granularity memory tagging
and supporting multicore architecture designs (including a new cache design) have been
explored to provide efficient hardware isolation (incl. cache isolation) in multicore [10].
Static Cache Analysis. There exists a broad literature on static cache analysis [32, 50].
With respect to the existing literature, this work is closely related to approaches that propose
abstract interpretation-based cache analysis. This approach was initially proposed in [1, 12].
These works illustrate LRU analysis and hit/miss classification using may- and must-analysis.
The work in [12] also proposes a persistence analysis based on abstract states, which was found
to be unsafe and for which a fix was proposed in [8, 25]. We base our DM-LRU extension on
the may- and must-analysis proposed in [12], but use the improved formalization in [14]. In
order to perform access classification in the case of loops, we use an approach similar to virtual
inlining & virtual unrolling (VIVU), originally proposed in [34]. A large body of work has
considered cache replacement policies other than LRU. These include FIFO [14, 15, 18],
MRU [17], and Pseudo-LRU [16]. Comparatively less work has been produced to analyze
non-inclusive [36, 21] as well as inclusive [22] multi-level caches. With respect to these works,
the proposed methodology sets itself apart because it focuses on the impact on the WCET
of designer-driven selection of frequently accessed memory blocks. In this sense, the proposed
approach can be used to analyze caches that support the definition of touch-and-lock cache
lines, under the assumption that no more than A blocks are simultaneously locked in any
set, where A is the associativity of the cache.
Cache Locking and Scratchpad Memory. Some COTS cache designs [2, 7, 13]
support selective cache locking, which prevents evictions for certain programmer-selected
cache lines. Exploiting this feature, various static and dynamic cache locking schemes for both
instruction and data caches have been investigated [5, 40, 39, 48, 35]. In [47, 48], for instance,
cache locking statements are inserted in the task’s execution flow at compilation time, when
the uncertainty about the memory locations being accessed negatively impacts the static
WCET analysis. Some recent works combined cache locking with cache partitioning to
improve task WCET in multicore [30, 43, 33]. As an alternative to cache, scratchpad memory
has received significant attention in the real-time systems community for its predictability
benefits [46, 9, 49, 44]. More recently, a technique called invalidation-driven allocation
(IDA) [26] was proposed to achieve the same level of determinism as a locked cache in
spite of the lack of hardware-assisted locking primitives. IDA can be used as long as precise
invariants on the size of an application’s working set hold. To overcome the high programming
complexity of scratchpad memory, however, many researchers proposed various compiler-based
techniques. In [44], for instance, a sophisticated compiler-based technique is proposed to break
each task into intervals; at the beginning of each interval, the memory blocks required by the
interval are prefetched onto a scratchpad memory via a DMA controller without blocking the task
execution. Dividing a task into a sequence of well-defined memory and computation phases
was originally proposed in [37, 52]. In both cache locking and scratchpad memory based
techniques, a common limitation is the overhead of explicitly executing additional instructions
(prefetch, lock/unlock, or data movement to/from scratchpad). Furthermore, these additional
instructions are context sensitive in the sense that they must be executed before actual accesses
occur, and if they are executed too early, they can negatively impact both performance and
WCET. In contrast, our approach is context insensitive in the sense that, once DM blocks
are flagged, actual allocation and replacement are automatically performed by the hardware
(cache controller) without additional instruction execution overhead.
9 Conclusion
In this paper, we presented the DM-LRU cache replacement policy and proposed an abstract
interpretation-based analysis for DM-LRU. We implemented the proposed analysis and
DM-LRU support in the Heptane static WCET analysis tool. Using Heptane, we evaluated
the WCET impact of our DM-LRU based approach on a number of benchmarks. The results
show that our DM-LRU approach can provide lower task WCETs with less performance
overhead and programming complexity, compared to the standard LRU and cache locking
based approaches.
Figure 7 Fragment of process CFG that leads to abstract DM-LRU state q =
[{a, b}], [{c}, {d}, {e, f}, {g}].
Acknowledgements
We are especially grateful to Daniel Grund for making his research thesis [14] promptly
available to us. This research is supported in part by NSF CNS 1718880. Any opinions,
findings, and conclusions or recommendations expressed in this publication are those of the
authors and do not necessarily reflect the views of the NSF.
10 Appendix: May Analysis
In the DM-LRU analysis framework, may-analysis is once again performed by considering
abstract cache states. Recall that may-analysis keeps track of the lower-bound on the age
of each addressable memory block. There are a number of differences compared to the
analytical tools used for must analysis. In may-analysis it is necessary to keep track of both
D ∈ {0, …, A} and B ∈ {0, …, A}. Here, the meaning of D and B changes. In this case,
D represents the maximum lower-bound of any possibly cached DM block. Conversely, B
captures the minimum lower-bound on the DM-LRU age of any BE block. It may be the case
that B + D > A in order to correctly abstract the age lower-bound resulting from multiple
execution paths. It must hold however that A ≤ D + B ≤ 2A. It follows that the abstract
domain for may-analysis DMLruwA is defined as:
\[ DMLru^{w}_{A} := \{0, \dots, A\} \times \{0, \dots, A\} \times \mathcal{B} \to \{0, \dots, A-1, A\}. \tag{11} \]
An abstract state q ∈ DMLruwA is then represented as two sets of memory blocks, for instance:
q = [{a, b}], [{c}, {d}, {e, f}, {g}] ∈ DMLruwA. In this example, we have D = 1, B = 4, A = 4.
It follows that the upper-bound on the number of DM memory blocks is 1, and that blocks a
and b have at least DM-LRU age 0, and may be marked as deterministic blocks. On the
other hand, c is a best-effort memory block with DM-LRU age at least 0. It should not come
as a surprise that in some states D + B > A. Consider the execution depicted in Figure 7
that produces q. When execution reaches the end of the figure, there could be 0 or 1 DM
blocks allocated in cache. Hence the upper-bound on the number of DM blocks has to be
D = 1. On the other hand, the upper bound on the number of BE blocks is B = 4.
The operator DMw(q, a) takes an abstract state q and a block a, and returns true if a
may be allocated as a DM block in q. For ease of notation, we simply use DMw(a) when
the considered abstract state is obvious. We define the operator DMw(q, a) as follows:
DMw(q, bCL) :=
{
true if CL = DM ∧ q(b) < A
false otherwise.
(12)693
694
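As a minimal sketch, Equation 12 translates into a one-line predicate. The function name `dm_w`, the dictionary `age` (block to age lower bound), and the set `dm_class` (blocks whose class is DM) are hypothetical names chosen for illustration; blocks missing from `age` are assumed to have lower bound A.

```python
def dm_w(age, dm_class, block, assoc):
    """DM^w(q, b_CL): true iff CL = DM and q(b) < A (Equation 12)."""
    # Assumption: a block absent from `age` has age lower bound A.
    return block in dm_class and age.get(block, assoc) < assoc


A = 4
age = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 2, "f": 2, "g": 3}
dm_class = {"a", "b", "h"}  # h has class DM but is not cached in q

for block in ("a", "c", "h"):
    print(block, dm_w(age, dm_class, block, A))
```

On the example state, the predicate holds for `a`, fails for `c` because its class is BE, and fails for `h` because its age lower bound equals A.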
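The update abstract transformer $U^{w}_{D}$ for a DM memory access $a$ can be defined as follows:

\[
U^{w}_{D}(q, a) :=
\]
\[
D' \leftarrow
\begin{cases}
D + 1 & \text{if } D < A \wedge BE^{w}(a) \\
D + 1 & \text{if } D < A \wedge \exists x_{DM} \neq a : q(x) = q(a) = D - 1 \\
D & \text{otherwise}
\end{cases},
\tag{13}
\]
\[
B' \leftarrow
\begin{cases}
B - 1 & \text{if } B > 0 \wedge q(a) \ge A - B \\
B & \text{if } B = 0 \vee q(a) < A - B
\end{cases},
\tag{14}
\]
\[
\lambda b.
\left\{
\begin{array}{lll}
0 & \text{if } b = a & \text{(a)} \\[4pt]
q(b) & \text{if } b \neq a \wedge
  \left|\begin{array}{l}
  DM^{w}(b) \wedge q(a) < q(b) \\
  BE^{w}(b) \wedge q(a) < A - B
  \end{array}\right. & \text{(b)} \\[10pt]
q(b) + 1 & \text{if } b \neq a \wedge
  \left|\begin{array}{l}
  DM^{w}(b) \wedge q(a) \ge q(b) \wedge q(b) < D' - 1 \\
  BE^{w}(b) \wedge q(a) \ge A - B \wedge q(b) < A - 1
  \end{array}\right. & \text{(c)} \\[10pt]
A & \text{if } b \neq a \wedge
  \left|\begin{array}{l}
  DM^{w}(b) \wedge q(a) \ge q(b) \wedge q(b) \ge D' - 1 \\
  BE^{w}(b) \wedge q(a) \ge A - B \wedge q(b) \ge A - 1
  \end{array}\right. & \text{(d)}
\end{array}
\right.
\tag{15}
\]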
701
Where D′ (B′, resp.) is the new value of D (B, resp.) after the update. Similarly, the702
update abstract transformer UwB for a best-effort memory access a can be defined as follows:703
U
w
B
(q, a) := (16)704
λb.

A− B if b = a (a)
q(b) if b 6= a ∧
∣∣∣∣∣∣DMw(b)
BEw(b) ∧ q(a) < q(b) (b)
q(b) + 1 if b 6= a ∧ BEw(b) ∧ q(a) ≥ q(b) ∧ q(b) < A− 1 (c)
A if b 6= a ∧ BEw(b) ∧ q(a) ≥ q(b) ∧ q(b) ≥ A− 1 (d)
(17)705
706
707
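The best-effort update can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `update_be` and its parameters are hypothetical names, $BE^{w}(b)$ is assumed to be the negation of $DM^{w}(b)$ because this section does not define it, and $D$ and $B$ are returned unchanged because the definition of $U^{w}_{B}$ given here only specifies the age map.

```python
def update_be(A, D, B, age, dm_class, a):
    """Sketch of U^w_B(q, a) for a best-effort access to block a."""
    def q(block):
        # Assumption: blocks absent from the map have lower bound A.
        return age.get(block, A)

    qa = q(a)
    new_age = {}
    for b in set(age) | {a}:
        qb = q(b)
        if b == a:
            new = A - B                  # case (a)
        elif b in dm_class and qb < A:   # DM^w(b)
            new = qb                     # case (b), first alternative
        elif qa < qb:                    # BE^w(b), assumed to be not DM^w(b)
            new = qb                     # case (b), second alternative
        elif qb < A - 1:
            new = qb + 1                 # case (c)
        else:
            new = A                      # case (d)
        if new < A:
            new_age[b] = new             # age A is represented by absence
    # Assumption: D and B are unchanged by a BE access.
    return D, B, new_age


q_age = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 2, "f": 2, "g": 3}
print(update_be(4, 1, 4, q_age, {"a", "b"}, "g"))
```

Accessing the BE block `g` in the example state moves `g` to age $A - B = 0$, ages `c`, `d`, `e`, and `f` by one, and leaves the DM blocks `a` and `b` untouched.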
To clarify how the $U^{w}$ operation transforms a given state, consider the abstract state $q = [\{a, b\}], [\{c\}, \{d\}, \{e, f\}, \{g\}]$, where $D = 1$ and $B = 4$. Assume that DM block $h$ is accessed, whose DM-LRU age is currently $A$ or higher. First, the value of $D'$ ($B'$, resp.) is computed as $D' = D + 1$ ($B' = B - 1$, resp.). Next, $a$ and $b$ both satisfy the fourth condition in $U^{w}_{D}$ (Equation 15, first case of (c)); blocks $c$, $d$, $e$, and $f$ satisfy the fifth condition; and $g$ satisfies the seventh. The resulting updated abstract state is $q' = [\{h\}, \{a, b\}], [\{\}, \{c\}, \{d\}, \{e, f\}]$. Note that in the resulting state $B = 3$, hence the least lower bound on the age of any BE block is $A - B = 1$.
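The worked example can be replayed with a short sketch of Equations 13 to 15. The names `update_dm`, `age`, and `dm_class` are hypothetical, and $BE^{w}(b)$ is assumed to be the negation of $DM^{w}(b)$, since this section does not define it; under that assumption the sketch reproduces $q'$.

```python
def update_dm(A, D, B, age, dm_class, a):
    """Sketch of U^w_D(q, a) for an access to DM block a."""
    def q(block):
        # Assumption: blocks absent from the map have lower bound A.
        return age.get(block, A)

    def dm_w(block):                     # Equation 12
        return block in dm_class and q(block) < A

    def be_w(block):                     # assumption: negation of DM^w
        return not dm_w(block)

    qa = q(a)

    # Equation 13: new bound D'.
    same_slot = any(x != a and x in dm_class and q(x) == qa == D - 1
                    for x in age)
    D2 = D + 1 if D < A and (be_w(a) or same_slot) else D

    # Equation 14: new bound B'.
    B2 = B - 1 if B > 0 and qa >= A - B else B

    # Equation 15: new age lower bounds.
    new_age = {a: 0}                     # case (a)
    for b in age:
        if b == a:
            continue
        qb = q(b)
        if dm_w(b):
            keep, limit = qa < qb, D2 - 1
        else:
            keep, limit = qa < A - B, A - 1
        if keep:
            new = qb                     # case (b)
        elif qb < limit:
            new = qb + 1                 # case (c)
        else:
            new = A                      # case (d)
        if new < A:
            new_age[b] = new             # age A is represented by absence
    return D2, B2, new_age


q_age = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 2, "f": 2, "g": 3}
D2, B2, q2_age = update_dm(4, 1, 4, q_age, {"a", "b", "h"}, "h")
print(D2, B2, sorted(q2_age.items()))
```

The call returns $D' = 2$ and $B' = 3$, with `h` at age 0, `a`, `b`, and `c` at age 1, `d` at age 2, `e` and `f` at age 3, and `g` dropped from the map because its lower bound reaches A.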
May-analysis Join: The join abstract transformer for DM-LRU may-analysis is symmetric to the join abstract transformer used for DM-LRU must-analysis. The joined state contains all the blocks in the union of the joining states, each with the minimum age it has in either of the two states. Furthermore, $D$ is taken as the maximum of the values of $D$ in the joining states. Similarly, $B$ is taken as the maximum of the values of $B$ in the joining states. As such, after a join, it always holds that $D + B \le 2A$. Equation 18 formalizes the $J^{w}(\bar{q}, \bar{p})$ abstract transformer:

\[
J^{w}(\bar{q}, \bar{p}) := D \leftarrow \max\{D_{\bar{q}}, D_{\bar{p}}\},\; B \leftarrow \max\{B_{\bar{q}}, B_{\bar{p}}\},\; \lambda b.\, \min\{\bar{q}(b), \bar{p}(b)\}. \tag{18}
\]
To clarify the join operation, consider the state $q = [\{a, b\}], [\{c\}, \{d\}, \{e, f\}, \{g\}]$ obtained in Figure 7, and the state $q' = [\{h\}, \{a, b\}], [\{\}, \{c\}, \{d\}, \{e, f\}]$ obtained as $q' = U^{w}(q, h_{DM})$ (i.e., by accessing the DM block $h$). If we were to join $q$ with $q'$, the resulting state would be $q'' = [\{a, b, h\}, \{\}], [\{c\}, \{d\}, \{e, f\}, \{g\}]$.
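A minimal sketch of the join in Equation 18, applied to the two states of this example. The function name `join_may` and the encoding of a state as a tuple `(D, B, age)` are assumptions made for illustration; a block missing from an age map is taken to have lower bound A.

```python
def join_may(A, q, p):
    """Sketch of J^w: maximum of D and B, pointwise minimum of ages."""
    (Dq, Bq, age_q), (Dp, Bp, age_p) = q, p
    blocks = set(age_q) | set(age_p)
    # Assumption: a block absent from a map has age lower bound A.
    age = {b: min(age_q.get(b, A), age_p.get(b, A)) for b in blocks}
    return max(Dq, Dp), max(Bq, Bp), age


A = 4
q = (1, 4, {"a": 0, "b": 0, "c": 0, "d": 1, "e": 2, "f": 2, "g": 3})
q_prime = (2, 3, {"h": 0, "a": 1, "b": 1, "c": 1, "d": 2, "e": 3, "f": 3})

D, B, age = join_may(A, q, q_prime)
print(D, B, sorted(age.items()))
```

The joined state has $D = 2$ and $B = 4$, so $D + B = 6 \le 2A$, and `a`, `b`, and `h` all end up at age 0, which corresponds to $q''$.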
May-analysis Classification: A memory access can be classified using a classification function that returns either $M$ for a cache miss, or $\top$ when the access to a memory block cannot be guaranteed to be a miss given the current abstract state. The classification function of the may-analysis is defined as:

\[
C^{w}(q, a_{CL}) :=
\begin{cases}
M & \text{if } q(a) = A \\
\top & \text{otherwise.}
\end{cases}
\tag{19}
\]
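Equation 19 amounts to a single comparison, as the following sketch shows. The names `classify_may`, `MISS`, and `TOP` are hypothetical, and blocks missing from the age map are assumed to have lower bound A.

```python
MISS = "M"   # guaranteed cache miss
TOP = "⊤"    # a miss cannot be guaranteed


def classify_may(A, age, block):
    """Sketch of C^w(q, a): M iff the age lower bound of the block is A."""
    # Assumption: a block absent from the map has age lower bound A.
    return MISS if age.get(block, A) == A else TOP


A = 4
q_prime_age = {"h": 0, "a": 1, "b": 1, "c": 1, "d": 2, "e": 3, "f": 3}
for block in ("c", "g"):
    print(block, classify_may(A, q_prime_age, block))
```

In the state $q'$ of the update example, an access to `g` is classified as a guaranteed miss, whereas an access to `c` is classified as $\top$.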
