Cache-Persistence-Aware Response-Time Analysis for Fixed-Priority Preemptive Systems by Rashid, Syed Aftab et al.
  
 
 
 
 
 
 
 
 
Conference Paper 
CISTER-TR-160503 
 
 
Syed Aftab Rashid 
Geoffrey Nelissen 
Damien Hardy 
Benny Akesson 
Isabelle Puaut 
Eduardo Tovar 
 
 
CISTER Research Center 
Polytechnic Institute of Porto (ISEP-IPP) 
Rua Dr. António Bernardino de Almeida, 431 
4200-072 Porto 
Portugal 
Tel.: +351.22.8340509, Fax: +351.22.8321159 
E-mail:  
http://www.cister.isep.ipp.pt 
 
 
Cache-Persistence-Aware Response-Time Analysis
for Fixed-Priority Preemptive Systems
Syed Aftab Rashid∗, Geoffrey Nelissen∗, Damien Hardy‡, Benny Akesson∗, Isabelle Puaut‡, Eduardo Tovar∗
∗ CISTER/INESC TEC, ISEP, Polytechnic Institute of Porto, Portugal
‡ University of Rennes 1/IRISA, France
Abstract—A task can be preempted by several jobs of higher
priority tasks during its response time. Assuming the worst-case
memory demand for each of these jobs leads to pessimistic worst-case
response time (WCRT) estimations. Indeed, a large portion of the
instructions and data associated with a preempting task τj is likely
to still be available in the cache when τj releases its next jobs.
Accounting for this observation, which is not considered by existing
work, allows the pessimism of WCRT analysis to be significantly reduced.
The four main contributions of this paper are: 1) the concept
of persistent cache blocks (PCBs) is introduced in the context of WCRT
analysis, which allows re-use of cache blocks to be captured,
2) a cache-persistence-aware WCRT analysis for fixed-priority
preemptive systems that exploits the PCBs to reduce the WCRT
bound, 3) a multi-set extension of the analysis that further
improves the WCRT bound, and 4) an evaluation showing that
our cache-persistence-aware WCRT analysis results in up to 10%
higher schedulability than state-of-the-art approaches.
I. INTRODUCTION
The existing gap between the processor and main memory
operating speeds necessitates the use of intermediate cache
memories to accelerate the average access time to instructions
and data required by the processor. The introduction of cache
memories in modern computing platforms causes large variations
in the execution time of each instruction, depending on
whether the instruction and data it requires are already loaded
in the cache (cache hit) or not (cache miss).
Recent research has focused on analyzing the impact of
preemptions on the worst-case execution time (WCET) and
worst-case response time (WCRT) of tasks in preemptive systems.
Indeed, a preempted task may suffer additional cache
misses if its memory blocks are evicted from the cache during
the execution of preempting tasks. These evictions cause extra
accesses to main memory, which result in additional delays in
the task execution. This extra cost is usually referred to as
cache-related preemption delay (CRPD).
Many different approaches have been proposed to counter
the effect of preemptions. Some (e.g., [1], [2]) use non-
preemptive or limited-preemption scheduling schemes to elim-
inate or reduce the number of preemptions. Others [3]–[9] use
information about the tasks’ memory access patterns to bound
and incorporate preemption costs into the WCRT analysis.
These approaches differ in whether they consider the memory
access patterns of the preempted task [4], the preempting
tasks [3], [5], or both [6]–[9]. Regardless of this distinction,
they all still result in pessimistic WCRT bounds due to the
fact that they only consider the effect of preemptions on the
memory demand of the preempted task, but not the variation
in memory demand of the preempting tasks. Instead, they all
assume that every job of a high priority task τj preempting a
low priority task τi will ask for its maximum memory demand,
i.e., its worst-case memory demand in isolation. Although
this may be true for the first job released by the preempting
task τj , subsequent jobs of τj may re-use most of the data
and instructions that were already loaded in the cache during
the execution of its previous jobs. Analyses that exploit this
observation have been proposed for both direct-mapped [10]
and set-associative caches [11]. However, these are limited to
non-preemptive task sets under static scheduling and do not
apply to preemptive systems with commonly used priority-
based scheduling schemes.
This work addresses this issue by proposing a novel anal-
ysis that captures the re-use of cache blocks between job
executions, to reduce the negative impact of preemptions on
the WCRT bound. The approach presented in this work is
orthogonal to the state-of-the-art methods used for CRPD
calculations and can be integrated with any of these methods.
The four main contributions of the paper are as follows: 1)
We introduce the concept of persistent cache blocks (PCBs)
in the context of WCRT analysis. PCBs are cache blocks that,
once loaded into the cache by a task τi will never be evicted
when τi runs in isolation. This concept allows us to capture
the re-use of cache blocks between executions of the same task
and reduce the memory demand for subsequent jobs of a task,
making its memory demand variable, 2) A cache-persistence-
aware WCRT analysis for fixed-priority preemptive systems
that exploits the variable memory demand of preempting tasks
to reduce the WCRT bound, 3) An extension of the proposed
WCRT analysis to a multi-set approach that further improves
the WCRT bound by considering the total memory demand
of the preempting tasks over a task response time rather
than the worst-case memory demand of each independent job,
and 4) An experimental evaluation showing that our cache-
persistence-aware WCRT analysis results in up to 10% higher
schedulability than state-of-the-art approaches.
The rest of the paper is organized as follows. Section II
presents the system model, while Section III discusses the
state-of-the-art in CRPD calculation. Section IV then motivates
our approach and introduces the key concept of persistent
cache blocks. The basic cache-persistence-aware WCRT anal-
ysis is presented in Section V and is then extended to a multi-
set approach in Section VI. Section VII explains how the
inputs for our approach are obtained using static analysis, before
experimental results are presented in Section VIII. Lastly,
conclusions are drawn in Section IX.
II. SYSTEM MODEL
This work targets single-core platforms with a single level
(L1) data/instruction cache. The cache is assumed to be direct-
mapped, which means that each memory block in the main
memory can be mapped to only one block in the cache.
We consider sporadic tasks with fixed priorities. Any priority
assignment scheme (e.g., Rate Monotonic [12]) is acceptable. We also
assume that the tasks are independent and do not suspend
themselves during their execution. A task τi is defined by a
triplet (Ci, Ti, Di), where Ci is the WCET of τi, Ti is its
minimum inter-arrival time and Di is the relative deadline of
each instance (or job) of τi. We assume that the tasks have
constrained deadlines, i.e., Di ≤ Ti. Similarly to [13], we
further decompose each task WCET into separate terms for
processing demand and memory demand, namely, the worst-case
processing demand Pi and the worst-case memory demand MDi.
Pi denotes the worst-case execution time of τi considering that
every memory access is a cache hit. Consequently, it only accounts
for the execution requirements of the task and does not include
the time needed to fetch data and instructions from the main
memory. MDi is the worst-case memory demand of any job of
task τi, that is, the maximum time during which any job
of τi is performing memory operations. The values for Ci, Pi
and MDi are calculated assuming τi executes in isolation. It is
also important to note that the worst-case processing demand
and the worst-case memory demand may not necessarily be
experienced on the same execution path of τi. As a result, it
holds that Ci ≤ Pi + MDi.
The WCRT of task τi is denoted by Ri and is defined as
the longest time between the arrival and the completion of
any of its jobs. A task τi is said to be schedulable if Ri ≤
Di. Similarly, a task set is schedulable if all of its tasks are
schedulable.
In this work, we consider that preemption costs only refer
to additional cache reloads due to those preemptions. Other
overheads, e.g. due to context switches, scheduler invocations
and pipeline flushes are assumed to be included in the WCET.
The worst-case reload time of a cache block is denoted by dmem.
We define the following task sets:
• hp(i): the set of tasks with higher priority than τi.
• hep(i): the set of tasks with priorities higher than or equal
to τi.
• aff(i, j): the set of tasks with priorities higher than or equal
to the priority of τi (including τi), but strictly lower than
that of τj . This set contains the intermediate priority tasks,
which may affect the response time of τi, but may also be
preempted by τj .
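The three task sets above can be sketched in a few lines of Python. This is only an illustration: the `Task` class, the field names and the convention that a smaller number means a higher priority are assumptions made for the example and are not part of the system model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int  # assumption: a smaller number means a higher priority


def hp(tasks, i):
    """Tasks with strictly higher priority than task i."""
    return {t for t in tasks if t.priority < i.priority}


def hep(tasks, i):
    """Tasks with priority higher than or equal to that of task i (includes i)."""
    return {t for t in tasks if t.priority <= i.priority}


def aff(tasks, i, j):
    """Tasks with priority higher than or equal to i, but strictly lower than j."""
    return {t for t in tasks if j.priority < t.priority <= i.priority}


# Hypothetical three-task system, tau1 having the highest priority.
t1, t2, t3 = Task("tau1", 1), Task("tau2", 2), Task("tau3", 3)
tasks = [t1, t2, t3]
print(sorted(t.name for t in aff(tasks, t3, t1)))  # ['tau2', 'tau3']
```

In this sketch, aff(3, 1) contains τ2 and τ3: both can delay τ3 and both can be preempted by τ1.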
III. BACKGROUND
This section discusses state-of-the-art methods in more
detail and establishes the formal framework on which we later
build our analysis. As previously mentioned, when a task τi is
preempted by a higher priority task τj, it is likely that τj will
evict memory blocks of τi from the cache. On resumption, τi
might consequently have to reload cache blocks from the main
memory in addition to its normal memory requirements. This
CRPD caused by τj on τi is denoted by γi,j. Several methods
have been proposed in the literature to compute γi,j. In one
of the earlier works, Lee et al. [4] introduced the concept
of useful cache block (UCB), defined as follows: “a memory
block m is called a useful cache block (UCB) at program
point P, if it is cached at P and will be reused at program
point Q that may be reached from P without eviction of m”.
This definition was later improved by Altmeyer et al. [14];
however, in this work we only use the basic concept provided
in [4]. Lee et al. [4] used the maximum number of UCBs
among all the tasks in aff(i, j) to upper bound the preemption
cost γi,j. Busquets et al. [3] and Tomiyama et al. [5] instead
used the notion of evicting cache block (ECB), i.e., any cache
block accessed during the execution of a task, which can
therefore evict a memory block cached by another task, to
upper bound the preemption cost that can be caused by each
preempting task. Other approaches by Tan and Mooney [7],
Staschulat et al. [6] and Altmeyer et al. [8] used both the
UCBs of the preempted tasks and the ECBs of the preempting
tasks to derive more precise bounds on the preemption cost.
Notably, the ECB-union and UCB-union approaches and their
multi-set variants, presented in [8] and [9], dominate all the
existing approaches for CRPD calculation. We first detail the
ECB-union approach and then the UCB-union multi-set approach.
The formulations of the UCB-union and ECB-union multi-set
approaches can be found in [9].
The ECB-union approach [8] uses the ECBs of all tasks
in hep(j) maximized over the UCBs of tasks in aff(i, j) to
calculate the preemption cost γi,j. The resulting value for the
preemption cost, denoted as $\gamma^{ecb}_{i,j}$, is given by
\[
\gamma^{ecb}_{i,j} = d_{mem} \times \max_{\forall k \in \mathrm{aff}(i,j)} \left( \left| UCB_k \cap \Big( \bigcup_{\forall l \in \mathrm{hep}(j)} ECB_l \Big) \right| \right) \tag{1}
\]
where dmem is the time required to reload one memory block
from the main memory to the cache, and UCBk and ECBl
are the sets of UCBs and ECBs of tasks τk and τl, respectively.
The preemption cost can then be accounted for in the WCRT
analysis using the following formulation:
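\[
R^{ecb}_i = C_i + \sum_{\forall j \in \mathrm{hp}(i)} \left\lceil \frac{R^{ecb}_i}{T_j} \right\rceil \times \left( C_j + \gamma^{ecb}_{i,j} \right) \tag{2}
\]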
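As an illustration, Equation (1) can be evaluated with plain set operations. The sketch below is not taken from [8]; the function name and the dictionary-based inputs are assumptions, and the block sets and dmem = 10 are those of the two-task example of Figure 1 (Section IV).

```python
def gamma_ecb(d_mem, ucb, ecb, aff_ij, hep_j):
    """ECB-union bound of Equation (1).

    ucb, ecb: dicts mapping a task index to its set of cache blocks.
    aff_ij:   task indices in aff(i, j); hep_j: task indices in hep(j).
    """
    # Union of the ECBs of all tasks that may execute when tau_j preempts.
    evicting = set().union(*(ecb[l] for l in hep_j))
    # Worst case over all tasks that tau_j may preempt.
    return d_mem * max((len(ucb[k] & evicting) for k in aff_ij), default=0)


# Block sets of Figure 1; tau1 has the higher priority.
ecb = {1: {5, 6, 7, 8, 9, 10}, 2: {1, 2, 3, 4, 5, 6}}
ucb = {1: {6, 7}, 2: {5, 6}}

# tau2 preempted by tau1: aff(2, 1) = {tau2}, hep(1) = {tau1}.
gamma_21 = gamma_ecb(10, ucb, ecb, aff_ij=[2], hep_j=[1])
print(gamma_21)  # 20
```

The result matches the value γ2,1 = 2 × dmem = 20 used in Example 1, since only blocks 5 and 6 of τ2 can be evicted by τ1.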
When combined, the ECB-union and UCB-union approaches
provide a reasonably precise upper bound on the preemption
cost. However, they can also lead to over-estimations in some
situations, as shown in [9]. This is because neither approach
takes into account the actual number of preemptions of each
low and intermediate priority task. For instance, these
approaches assume that a high priority task τj can preempt any
task τk ∈ aff(i, j) the same number of times it can preempt τi.
This can only be true if τk = τi, and will lead to over-estimation
in all other cases where the cost of τj preempting τk is higher
than the preemption cost of τj on τi.
To reduce the pessimism associated with the ECB-union and
UCB-union approaches, Altmeyer et al. [9] proposed two new
solutions, namely, the UCB-union multi-set and the ECB-union
multi-set approaches. These multi-set versions of the UCB-union
and ECB-union approaches additionally take into account the
maximum number of jobs $E_j(R_i) \stackrel{def}{=} \lceil R_i / T_j \rceil$ that each
higher priority task τj can release during the response time of
τi and the number of preemptions of each low and intermediate
priority task by τj, i.e., $E_j(R_k) E_k(R_i) \stackrel{def}{=} \lceil R_k / T_j \rceil \times \lceil R_i / T_k \rceil$.
Under this framework, the WCRT equation becomes:
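\[
R^{mul}_i = C_i + \sum_{\forall j \in \mathrm{hp}(i)} \left\lceil \frac{R^{mul}_i}{T_j} \right\rceil \times C_j + \sum_{\forall j \in \mathrm{hp}(i)} \gamma^{mul}_{i,j} \tag{3}
\]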
where $\gamma^{mul}_{i,j}$ accounts for the total preemption cost that can be
caused by all jobs of τj released during the response time of
τi. We detail the UCB-union multi-set approach and refer the
reader to [9] for the ECB-union multi-set formulation. Using
the UCB-union multi-set approach, $\gamma^{mul}_{i,j}$ is upper bounded by
$\gamma^{ucb-m}_{i,j}$ defined as follows:
\[
\gamma^{ucb-m}_{i,j} = d_{mem} \times \left| M^{ucb}_{i,j} \cap M^{ecb}_{i,j} \right| \tag{4}
\]
where $M^{ucb}_{i,j}$ and $M^{ecb}_{i,j}$ are multi-sets defined as
\[
M^{ucb}_{i,j} = \bigcup_{\forall k \in \mathrm{aff}(i,j)} \left( \bigcup_{E_j(R_k) E_k(R_i)} UCB_k \right) \tag{5}
\]
and
\[
M^{ecb}_{i,j} = \bigcup_{E_j(R_i)} ECB_j \tag{6}
\]
Here, $M^{ucb}_{i,j}$ is a multi-set comprising the sets of UCBs of
all low and intermediate priority tasks in aff(i, j), each added
$E_j(R_k) E_k(R_i)$ times, i.e., the maximum number of times τj
can preempt each τk during the response time of τi. Similarly,
$M^{ecb}_{i,j}$ is a multi-set comprising the set of ECBs of all jobs of
τj executing within the response time of τi. The final value
of the preemption cost $\gamma^{ucb-m}_{i,j}$ comes from the intersection of
both these multi-sets.
The construction of the ECB-union multi-set approach is
analogous to the UCB-union multi-set approach. Note that the
ECB-union multi-set approach dominates the ECB-union ap-
proach [8], while the UCB-union multi-set approach dominates
the UCB-union approach [7]. Yet, it is shown in [9] that the
ECB-union and UCB-union multi-set approaches are incom-
parable. For a more detailed description of the formulation of
Equations (2) to (6), the reader is referred to [9].
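A minimal sketch of Equations (4) to (6) is given below, using Python's `collections.Counter` as the multi-set type (its `&` operator keeps the minimum multiplicity of each element). The function and parameter names are assumptions, as are the periods and response times in the usage example, which Figure 1 does not specify; the response times R would normally come from the fixed-point iteration of Equation (3).

```python
from collections import Counter


def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def gamma_ucb_multiset(d_mem, i, j, aff_ij, ucb, ecb, R, T):
    """UCB-union multi-set bound of Equation (4)."""
    # Equation (5): UCBs of each task k in aff(i, j), added E_j(R_k) * E_k(R_i) times.
    m_ucb = Counter()
    for k in aff_ij:
        times = ceil_div(R[k], T[j]) * ceil_div(R[i], T[k])
        for block in ucb[k]:
            m_ucb[block] += times
    # Equation (6): ECBs of tau_j, added E_j(R_i) times.
    e_j = ceil_div(R[i], T[j])
    m_ecb = Counter({block: e_j for block in ecb[j]})
    # Multi-set intersection keeps the minimum multiplicity of each block.
    return d_mem * sum((m_ucb & m_ecb).values())


# Block sets of Figure 1; T and R are hypothetical values chosen so that
# tau1 releases three jobs during the response time of tau2.
ecb = {1: {5, 6, 7, 8, 9, 10}, 2: {1, 2, 3, 4, 5, 6}}
ucb = {1: {6, 7}, 2: {5, 6}}
T = {1: 300, 2: 2000}
R = {1: 100, 2: 800}

total = gamma_ucb_multiset(10, i=2, j=1, aff_ij=[2], ucb=ucb, ecb=ecb, R=R, T=T)
print(total)  # 60
```

With these assumed values, the bound equals three preemptions at a cost of 20 each, i.e., the 3 × γ2,1 term of Example 1.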
IV. PROBLEM DEFINITION
In this section, we first provide a basic example that motivates
this work. Using this example as a basis, we then provide
some useful definitions that are used in the rest of the paper.
A. Motivational Example
Fig. 1: Schedule and cache contents for a taskset {τ1, τ2} with
C1 = 100, C2 = 400, MD1 = 60, MD2 = 80, ECB1 =
{5, 6, 7, 8, 9, 10}, ECB2 = {1, 2, 3, 4, 5, 6}, UCB1 = {6, 7},
UCB2 = {5, 6}, PCB1 = {5, 6, 7, 8, 10} and PCB2 =
{1, 2}. The schedule assumes that τ1 releases its first job with
an offset of 100 time units.
As presented in the previous section, the impact of a high
priority task τj on the WCRT of any lower priority task τi
can be estimated in a fairly accurate manner by analyzing
the mapping of UCBs and ECBs in the cache. However, the
impact of τi on the memory demand of τj is ignored during
the WCRT analysis of τi. Yet, high priority tasks may often
execute more than one job during the response time of a lower
priority task. Therefore, to accurately estimate the WCRT of
a low priority task τi, one must consider the impact of the
preempted tasks on the memory demand of each job released
by the preempting tasks. In the literature, this is dealt with
by assuming that the memory demand for each job of a high
priority task τj executing within the response time of a low
priority task τi is always maximum, i.e., equal to the maximum
memory demand MDj. Following that assumption, the total
memory overhead MOi that must be accounted for by τi during
its worst-case response time is upper bounded by the following
equation derived in [15].
\[
MO_i = MD_i + \sum_{\forall j \in \mathrm{hp}(i)} \left\lceil \frac{R_i}{T_j} \right\rceil \times \left( MD_j + \gamma_{i,j} \right) \tag{7}
\]
There is a significant level of pessimism involved in Equa-
tion (7), as we will demonstrate using the example below.
Example 1. Consider the two tasks τ1 and τ2 (where τ1 has
a higher priority than τ2) presented in Figure 1. We assume
that the time dmem needed to access the main memory and
load a memory block to the cache is equal to 10 time units
and that the memory demands of τ1 and τ2 are MD1 = 60
and MD2 = 80¹, respectively. We also assume that memory
block {9} accessed by τ1 contains data that must be updated
at the beginning of the execution of each of its jobs. Figure 1
depicts a possible schedule together with the evolution of the
cache contents over time. The memory blocks that must be
loaded/reloaded from the main memory after each preemption
or resumption are shown in bold with a bigger font size in
Figure 1.
¹ Note that because the same cache block may be used by several memory
blocks of the same task τi, the worst-case memory demand MDi of τi may
be larger than the number of ECBs of τi multiplied by dmem.
Initially, the cache is empty and τ2 loads all its ECBs from
the main memory as soon as it starts to execute. When τ1
preempts τ2 for the first time, it also loads all its ECBs into the
cache with a memory demand of MD1 = 60. Since there is an
overlap between the mapping of ECBs of τ1 and the mapping
of UCBs of τ2 in the cache, τ1 evicts some of the useful cache
blocks of τ2. In turn, when τ2 resumes its execution, it has to
account for γ2,1 = 2 × dmem = 20, in order to load cache
blocks {5, 6} again from main memory. However, when the
second job of τ1 preempts τ2, one can notice that it no longer
needs to reload all of its ECBs. In fact, most of the memory
blocks needed by τ1 are still in the cache. As a consequence,
τ1 must only reload memory blocks {5, 6}, which have been
evicted by τ2, as well as memory block {9} that must be
reloaded for each new job execution of τ1. The same scenario
happens for all jobs released by τ1, except the first one. The
actual memory demand of the second and third jobs of τ1
(i.e., 30) is hence much lower than MD1 = 60, illustrating that
the memory demand is not constant across job executions.
In the presented example, memory blocks {5, 6, 7, 8, 10} are
called persistent cache blocks (PCBs), as they are never evicted
from the cache once loaded when τ1 executes in isolation.
In contrast, cache block {9} is a non-persistent cache block
(nPCB). nPCBs may be cache blocks that are shared by several
memory blocks of the same task, or simply data (e.g., sensor
readings, value on an input port, global shared data) that must
be reloaded before each access. One must note that PCBs and
nPCBs are different from the notions of UCBs and ECBs in
the sense that it does not matter if they are referenced more
than once during a single execution of a task. However, a PCB
must never be evicted from the cache by the task itself once
it is fetched from main memory.
The state-of-the-art does not consider PCBs while calcu-
lating the memory overhead suffered by a task τi in case
of preemptions. This results in pessimistic memory overhead
evaluations and hence pessimistic WCRT computations. This
can easily be shown using the example in Figure 1. If τ2’s
memory overhead is computed using Equation (7), one would
get:
\[
MO_2 = MD_2 + 3 \times MD_1 + 3 \times \gamma_{2,1} = 80 + 3 \times 60 + 3 \times 20 = 320
\]
Equation (7) considers the worst-case memory demand, i.e.,
MD1 for each job of τ1 that executes during the response time
of τ2. As we have shown in Example 1, the actual memory
demand of the second and third job of τ1 is in fact much less.
Considering the PCBs of τ1 while calculating the memory
overhead MO2, the resulting value is given as:
\[
\begin{aligned}
MO_2 &= MD_2 + MD_1 + 2 \times (MD_1 - |PCB_1| \times d_{mem}) + 3 \times \gamma_{2,1} \\
     &= 80 + 60 + 2 \times (60 - 5 \times 10) + 3 \times 20 = 220
\end{aligned}
\]
This simple example highlights the necessity to consider
PCBs when calculating the memory demand and hence the
WCRT of a task.
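The two computations of MO2 above can be checked with a few lines of arithmetic. The variable names below are illustrative only; all numeric values are those of Figure 1 and Example 1.

```python
# Values from Figure 1 and Example 1.
d_mem = 10
MD = {1: 60, 2: 80}
PCB1 = {5, 6, 7, 8, 10}
gamma_21 = 20
jobs_of_tau1 = 3  # jobs of tau1 within the response time of tau2

# Equation (7): every job of tau1 is charged its full memory demand MD1.
mo_without_pcbs = MD[2] + jobs_of_tau1 * (MD[1] + gamma_21)

# PCB-aware estimate: only the first job of tau1 is charged MD1; later jobs
# are charged MD1 minus the cost of loading the persistent blocks.
reduced_demand = MD[1] - len(PCB1) * d_mem
mo_with_pcbs = (MD[2] + MD[1]
                + (jobs_of_tau1 - 1) * reduced_demand
                + jobs_of_tau1 * gamma_21)

print(mo_without_pcbs, mo_with_pcbs)  # 320 220
```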
B. Problem Formalization
The previous example informally introduced the notions of
PCB and nPCB. We now formally define those two types of
cache blocks associated with the execution of a task τi.
Definition 1 (Persistent cache block). A memory block of a
task τi is persistent if once loaded by τi, it will never be
invalidated or evicted from the cache when τi executes in
isolation.
Definition 2 (Non-persistent cache block). A non-persistent
cache block (nPCB) of task τi is an ECB that is not a PCB.
That is, it is a memory block that may need to be reloaded at
some point during the execution of τi (in the same or different
jobs), even when τi executes in isolation.
The sets of PCBs and nPCBs associated with a task τi are
denoted by PCBi and nPCBi, respectively. It follows from
the two previous definitions that each cache block associated
with a task τi (i.e., each block in ECBi) is either a PCB or an
nPCB, hence the following two relations:
\[
PCB_i \cup nPCB_i = ECB_i \tag{8}
\]
\[
PCB_i \cap nPCB_i = \emptyset \tag{9}
\]
By Definition 1, if τi executes in isolation, a PCB is loaded
only once from the main memory and hence contributes only
once to the total memory demand of τi. Even though all the
ECBs of τi (i.e., PCBs and nPCBs) contribute to its worst-case
memory demand in isolation (i.e., MDi), only the nPCBs,
a subset of ECBi, must be loaded by more than one job
of τi. Considering the worst-case memory demand for each
job released by higher priority tasks than τi when computing
the WCRT of τi, as is implicitly the case in Equations (2)
and (3), is thus pessimistic. Therefore, we define the residual
memory demand of a task τi as the worst-case memory demand
of τi assuming that all the PCBs of τi are already in the
cache memory and therefore result in cache hits when being
accessed.
Definition 3 (Residual memory demand). The residual memory
demand $MD^r_i$ of task τi is the worst-case memory demand
over all the jobs of τi when all its PCBs are already loaded
in the cache memory. Therefore, $MD^r_i$ only accounts for the
accesses to the nPCBs of τi and can occur during any job
execution of τi.
An upper bound on the total memory demand MDi(t) of a
task τi within a time window of length t when τi executes in
isolation is proven in the following lemma.
Lemma 1. If a task τi executes in isolation, then its total
memory demand MDi(t) within a time window of length t is
upper bounded by $\widehat{MD}_i(t)$ where
\[
\widehat{MD}_i(t) \stackrel{def}{=} \min\left\{ \left\lceil \frac{t}{T_i} \right\rceil MD_i \; ; \; \left\lceil \frac{t}{T_i} \right\rceil MD^r_i + |PCB_i| \times d_{mem} \right\} \tag{10}
\]
Proof. We prove that $\lceil t/T_i \rceil MD_i$ and
$\lceil t/T_i \rceil MD^r_i + |PCB_i| \times d_{mem}$ are both upper bounds
on the total memory demand MDi(t) of τi. Thus, the minimum of
those bounds is also an upper bound on MDi(t).
1) τi can release at most $\lceil t/T_i \rceil$ jobs in a time window of
length t. By definition of MDi, each of these jobs has a
worst-case memory demand MDi. Therefore, $\lceil t/T_i \rceil MD_i$ is
an upper bound on the total memory demand of τi.
2) Recall from Equations (8) and (9) that PCBi ∪ nPCBi =
ECBi and PCBi ∩ nPCBi = ∅. Characterizing the worst-case
contribution of the PCBs and nPCBs to the total
memory demand is therefore sufficient to quantify the
worst-case contribution of all the cache blocks of τi to
MDi(t). Since by Definition 1, the persistent cache blocks
must be loaded only once, the maximum contribution of the
cache blocks in PCBi to MDi(t) is $|PCB_i| \times d_{mem}$ (i.e.,
the total number of PCBs times the worst-case memory
access time). By Definition 3, the worst-case contribution
of nPCBs to the memory demand of each job released by
τi is $MD^r_i$. Since a maximum of $\lceil t/T_i \rceil$ jobs are released
by τi in a time window of length t, an upper bound on
the total contribution of the nPCBs to MDi(t) is given by
$\lceil t/T_i \rceil MD^r_i$. Adding the contributions of nPCBs and PCBs,
we get that $\lceil t/T_i \rceil MD^r_i + |PCB_i| \times d_{mem}$ is an upper
bound on the total memory demand of τi.
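The bound of Equation (10) is straightforward to evaluate. The sketch below uses illustrative names; in the usage example, MD1, |PCB1| and dmem come from Figure 1, whereas the period of 300 and the residual demand of 10 (one reload of block 9 per job) are assumptions made for the illustration.

```python
def md_hat(t, T_i, MD_i, MD_r_i, n_pcb_i, d_mem):
    """Upper bound of Equation (10) on the memory demand of tau_i in isolation."""
    jobs = -(-t // T_i)  # ceil(t / T_i) for positive integers
    return min(jobs * MD_i, jobs * MD_r_i + n_pcb_i * d_mem)


# tau1 of Figure 1: MD1 = 60, |PCB1| = 5, dmem = 10.
# Assumed for illustration: T1 = 300 and residual demand MD1^r = 10.
print(md_hat(100, 300, 60, 10, 5, 10))  # 60 (one job)
print(md_hat(800, 300, 60, 10, 5, 10))  # 80 (three jobs)
```

For three jobs, the bound of 80 replaces the 3 × MD1 = 180 that would be charged if every job were assumed to reload all its blocks.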
Although Equation (10) provides an upper bound on the
total memory demand of τi in isolation, the total memory
demand of τi when executing concurrently with other tasks
can be much larger. Indeed, as can be observed in Example 1,
the PCBs of a task τj can be evicted due to the execution of
any other task (i.e., any task in hep(i) \ τj) between the
execution of two successive jobs of τj. This requires the effect
of all tasks in hep(i) \ τj on the memory demand of τj ∈ hp(i)
during the WCRT of τi to be taken into account. We refer to
this extra memory demand caused by the eviction of PCBs of τj
by the tasks in hep(i) \ τj as cache-persistence reload overhead
(CPRO) and denote it by ρj,i. CPRO is formally defined as:
Definition 4 (Cache-persistence reload overhead). Cache-
persistence reload overhead, denoted by ρj,i, is the maximum
memory overhead of any task τj due to eviction of its PCBs
resulting from the execution of all tasks in hep(i) \ τj , while
τj is executing during the response time of τi.
V. CPRO UNION APPROACH
In this section, we present a simple approach similar to
the ECB-union to calculate the CPRO (i.e. ρj,i). We further
demonstrate how ρj,i can be incorporated in the WCRT
analysis of a task τi. Later, in Section VI, we extend this
simple union approach into a multi-set variant to remove some
of the pessimism associated with this analysis.
A. Computation of Cache-Persistence Reload Overhead
As discussed in Section IV-B, ρj,i accounts for the extra
memory demand of each job of τj ∈ hp(i) due to evictions of
its persistent cache blocks by other tasks running concurrently
on the processor.
As one can see in Figure 1, the PCBs of a task τj ∈ hp(i)
can be evicted by the ECBs of any other task running on
the platform between two successive jobs of τj. The memory
demand overhead ρj,i can thus be upper bounded by the
intersection of the set PCBj of all PCBs of τj with all
cache blocks (i.e., ECBs) that can be loaded by any other
task between two executions of τj. This observation leads to
the following theorem.
Theorem 1. The cache-persistence reload overhead imposed
by the eviction of PCBs of a job of task τj ∈ hp(i) on the
worst-case response time of a task τi is upper bounded by
\[
\rho_{j,i} = d_{mem} \times \left| PCB_j \cap \Big( \bigcup_{\forall \tau_k \in \mathrm{hep}(i) \setminus \tau_j} ECB_k \Big) \right| \tag{11}
\]
Proof. Since a fixed-priority scheduling algorithm is used,
only tasks with priorities higher than or equal to the priority
of τi (i.e., tasks in hep(i)) can execute during the response
time of τi. Therefore, any task τk ∈ hep(i) \ τj can execute
between two subsequent jobs of τj and hence evict some or
all the PCBs of τj.
The worst-case memory interference of any task τk ∈
hep(i) \ τj on τj is when it reloads all its cache blocks (i.e.,
its ECBs) between two subsequent jobs of τj. Therefore, the
largest set of memory blocks loaded by tasks in hep(i) \ τj
between two jobs of τj is given by
$\bigcup_{\forall \tau_k \in \mathrm{hep}(i) \setminus \tau_j} ECB_k$.
The set of persistent cache blocks that must be reloaded
by τj during each job execution is thus upper bounded
by the intersection between τj’s PCBs (i.e., PCBj) and
$\bigcup_{\forall \tau_k \in \mathrm{hep}(i) \setminus \tau_j} ECB_k$.
Since each cache block reload takes at most dmem time
units, the CPRO due to the eviction of PCBs of τj by tasks
in hep(i) \ τj is upper bounded by
\[
d_{mem} \times \left| PCB_j \cap \Big( \bigcup_{\forall \tau_k \in \mathrm{hep}(i) \setminus \tau_j} ECB_k \Big) \right|
\]
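Equation (11) can be illustrated on the task set of Figure 1 with a short sketch. The function name and argument layout are assumptions; the block sets and dmem = 10 are those of the example.

```python
def cpro(d_mem, pcb_j, ecbs_of_others):
    """CPRO bound of Equation (11).

    pcb_j:          set of PCBs of the preempting task tau_j.
    ecbs_of_others: iterable with the ECB set of every task in hep(i) \\ tau_j.
    """
    # All blocks that other tasks may load between two jobs of tau_j.
    evicting = set().union(*ecbs_of_others)
    return d_mem * len(pcb_j & evicting)


# Figure 1 with i = 2 and j = 1: hep(2) \ tau1 = {tau2}.
PCB1 = {5, 6, 7, 8, 10}
ECB2 = {1, 2, 3, 4, 5, 6}
rho_12 = cpro(10, PCB1, [ECB2])
print(rho_12)  # 20
```

The value of 20 corresponds to blocks 5 and 6, which Example 1 identifies as the persistent blocks of τ1 that τ2 evicts between two jobs of τ1.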
Having defined an expression to calculate ρj,i, we now
define ρj,i(t), i.e., the total cache-persistence reload overhead
on τj in a time window of length t due to the eviction of its
PCBs by tasks in hep(i)\ τj . ρj,i(t) tells us by how much the
memory demand of τj can vary in comparison to its memory
demand in isolation (i.e., MDj(t)) due to the interference
generated by the other tasks executing concurrently with τj .
Using Theorem 1, ρj,i(t) can be easily computed as stated in
Lemma 2 below.
Lemma 2. The total CPRO ρj,i(t) on the execution time of
τj due to the eviction of its PCBs by tasks in hep(i) \ τj in a
time interval of length t is upper bounded by $\hat{\rho}_{j,i}(t)$ where
\[
\hat{\rho}_{j,i}(t) \stackrel{def}{=} \left( \left\lceil \frac{t}{T_j} \right\rceil - 1 \right) \times \rho_{j,i} \tag{12}
\]
Proof. It directly follows from the fact that τj releases at most
$\lceil t/T_j \rceil$ jobs in a time interval of length t. As a result, at most
$(\lceil t/T_j \rceil - 1)$ evictions can happen between two subsequent
jobs of τj. Since by Theorem 1, the CPRO suffered by a job
of τj is upper bounded by ρj,i, the total overhead ρj,i(t) is
upper bounded by $(\lceil t/T_j \rceil - 1) \times \rho_{j,i}$.
B. WCRT Analysis
After showing how the extra memory demand overhead
ρj,i of a high priority task τj can be computed, we now
describe how it can be integrated into the WCRT analysis
of any lower priority task τi. As mentioned in Section III, the
WCRT analysis for fixed-priority preemptive systems was first
presented in [16], [17] without considering memory overheads
due to preemptions. It was then extended in several works
(e.g., [3], [8], [9]) to account for the cache-related preemption
delays. Some of the most prominent approaches resulted in
Equations (2) and (3), previously presented in Section III.
Although these approaches are beneficial, their WCRT
analyses still rely exclusively on the WCET Cj of high priority
tasks when computing the worst-case response time of a low
priority task τi. That is, they assume that each job of a task
τj ∈ hp(i) executing within the response time of τi asks for its
worst-case memory demand MDj. As discussed in Section IV,
this assumption is pessimistic. In fact, due to the existence of
persistent cache blocks, once τj has loaded all its ECBs (i.e.,
PCBs and nPCBs), subsequent jobs of τj will only need to reload
nPCBs and those PCBs that may have been evicted
due to the execution of tasks in hep(i) \ τj. As a result, for
subsequent jobs of τj the memory demand can be significantly
lower than MDj. To exploit this variable memory demand, we
present a more elaborate formulation of the WCRT analysis.
We propose that the WCRT of any task τi is upper
bounded by the smallest positive value Ri such that
\[
R_i = C_i + \sum_{\forall j \in \mathrm{hp}(i)} \Big( P_j(R_i) + MD_j(R_i) + \rho_{j,i}(R_i) + \gamma_{i,j}(R_i) \Big) \tag{13}
\]
In this WCRT formulation, we separately account for the
maximum processing demand Pj(Ri) and memory demand
MDj(Ri) (in isolation) that can be claimed by each higher
priority task τj within the response time Ri of τi. The
terms ρj,i(Ri) and γi,j(Ri) denote the total cache-persistence
reload overhead due to the eviction of PCBs of τj by tasks
in hep(i) \ τj , and the total cache-related preemption delay
due to the preemptions caused by τj within the response
time of τi, respectively. The terms (Pj(Ri) + MDj(Ri))
assume values obtained in isolation, while the two last terms
(ρj,i(Ri) + γi,j(Ri)) account for the overheads introduced by
the eviction of cache blocks by other tasks sharing the cache.
As already discussed in Section III, γi,j(Ri) is upper
bounded by $\gamma^{mul}_{i,j}$. Furthermore, as proven in Lemmas 1 and 2,
MDj(Ri) and ρj,i(Ri) are upper bounded by Equations (10)
and (12), respectively. Finally, because each task τj releases
at most $\lceil t / T_j \rceil$ jobs in a time window of length t, Pj(Ri) is
smaller than or equal to $\lceil R_i / T_j \rceil P_j$.
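Replacing each term with its given bound, we get that
$$R_i \le C_i + \sum_{\forall j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil P_j + \sum_{\forall j \in hp(i)} \widehat{MD}_j(R_i) + \sum_{\forall j \in hp(i)} \hat{\rho}_{j,i}(R_i) + \sum_{\forall j \in hp(i)} \gamma^{mul}_{i,j} \qquad (14)$$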
In systems where the number of PCBs is high and the cache
interference is low, the value provided by $\widehat{MD}_j(R_i) + \hat{\rho}_{j,i}(R_i)$
should be smaller than $\lceil R_i / T_j \rceil MD_j$, and therefore we
should often have $\lceil R_i / T_j \rceil P_j + \widehat{MD}_j(R_i) + \hat{\rho}_{j,i}(R_i)$ smaller than
$\lceil R_i / T_j \rceil C_j$. In this case, Equation (14) will result in a tighter
WCRT bound than Equation (3). However, in some situations,
since $\widehat{MD}_j(R_i)$ and $\hat{\rho}_{j,i}(R_i)$ are upper bounds and not exact
values, this formulation can result in an over-estimation of
the interference generated by τj on τi. In order to counter
this effect, and knowing that Equation (3) is already an
upper bound on the WCRT of τi, we further improve Equation (14)
by always taking the minimum between $\lceil R_i / T_j \rceil C_j$
and $\lceil R_i / T_j \rceil P_j + \widehat{MD}_j(R_i) + \hat{\rho}_{j,i}(R_i)$ as the total interference
caused by τj on τi (see Equation (15) below). Following
this simple modification to Equation (14), Equation (15) will
always return a value that is smaller than or equal to the
solution to Equation (3). Our approach hence dominates the
UCB union multi-set approach defined in [9].
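$$R^{un}_i = C_i + \sum_{\forall j \in hp(i)} \min\left\{ \left\lceil \frac{R^{un}_i}{T_j} \right\rceil C_j \; ; \; \left\lceil \frac{R^{un}_i}{T_j} \right\rceil P_j + \widehat{MD}_j(R^{un}_i) + \hat{\rho}_{j,i}(R^{un}_i) \right\} + \sum_{\forall j \in hp(i)} \gamma^{mul}_{i,j} \qquad (15)$$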
Note that Equation (15) is recursive. However, a solution can
be found using simple fixed-point iteration on $R^{un}_i$, initializing
$R^{un}_i$ to Ci. The iteration stops as soon as $R^{un}_i$ no longer
changes, or as soon as $R^{un}_i > D_i$, in which case the task is deemed
unschedulable.
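The fixed-point iteration on Equation (15) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bounds of Equations (10) and (12) and the CRPD term are passed in as callables (`md_hat`, `rho_hat`, `gamma_mul`), the task record with keys "C", "P" and "T" is a hypothetical representation, and the iteration cap `max_iter` is an added safeguard that is not part of the analysis.

```python
import math
from typing import Callable, Dict, List, Optional

Task = Dict[str, float]  # hypothetical record with keys "C", "P", "T"


def wcrt_union(
    C_i: float,
    D_i: float,
    hp: List[Task],
    md_hat: Callable[[Task, float], float],
    rho_hat: Callable[[Task, float], float],
    gamma_mul: Callable[[Task, float], float],
    max_iter: int = 100_000,
) -> Optional[float]:
    """Iterate Equation (15) starting from C_i.

    Returns the fixed point, or None if the value exceeds D_i
    (task deemed unschedulable) or the iteration cap is reached.
    """
    R = C_i
    for _ in range(max_iter):
        if R > D_i:
            return None
        total = C_i
        for tj in hp:
            jobs = math.ceil(R / tj["T"])
            # Classic interference: every job pays its full WCET.
            classic = jobs * tj["C"]
            # Persistence-aware interference: processing demand plus the
            # bounded memory demand and bounded reload overhead.
            split = jobs * tj["P"] + md_hat(tj, R) + rho_hat(tj, R)
            total += min(classic, split) + gamma_mul(tj, R)
        if total == R:
            return R
        R = total
    return None
```

With all cache-related callables returning zero and P equal to C, the function reduces to the classic response-time recurrence, which is a convenient sanity check.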
VI. CPRO MULTI-SET APPROACH
The formulation in Equations (11) and (12) considers the
ECBs of all tasks τk ∈ hep(i) \ τj as interfering with every
job of τj released within the response time of τi. This is
pessimistic. Indeed, considering two different tasks τk and τl
pertaining to hep(i) \ τj , the number of times τl can execute
between two successive jobs of τj is not necessarily equal to
the number of times τk can execute between two successive
jobs of τj . This situation is discussed in Example 2.
Example 2. Let τ1 = (1, 4, 4), τ2 = (4, 30, 30) and τ3 =
(10, 50, 50), where τ1 has the highest priority and τ3 the
lowest. Figure 2 presents a possible schedule that generates
the worst-case response time of τ3. As one can see, τ1
releases 5 jobs during the response time of τ3. As a result,
Equation (15) upper bounds the total cache overheads on the
PCBs of τ1 with 4 times ρ1,3. That is, it assumes that both
τ2 and τ3 execute and reload all their ECBs between every
two successive jobs of τ1. As can be seen in Figure 2, this
is pessimistic. In fact, τ2 executes only twice between jobs of
τ1! Its impact on the total CPRO of τ1 is therefore clearly
overestimated.

Fig. 2: Illustration of the pessimism associated with Equation (12)
using the task set {τ1, τ2, τ3} when τ1 and τ2 release
their first jobs with an offset.
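The job count used in Example 2 can be checked with a short worked iteration. The derivation below is only a sketch under two assumptions that are not stated in this section: each triple is read as (Ci, Ti, Di), and all cache-related overheads are ignored.

```latex
\begin{align*}
R_3^{(0)} &= C_3 = 10 \\
R_3^{(1)} &= 10 + \lceil 10/4 \rceil \cdot 1 + \lceil 10/30 \rceil \cdot 4 = 10 + 3 + 4 = 17 \\
R_3^{(2)} &= 10 + \lceil 17/4 \rceil \cdot 1 + \lceil 17/30 \rceil \cdot 4 = 10 + 5 + 4 = 19 \\
R_3^{(3)} &= 10 + \lceil 19/4 \rceil \cdot 1 + \lceil 19/30 \rceil \cdot 4 = 10 + 5 + 4 = 19 \\
E_1(R_3) &= \lceil 19/4 \rceil = 5, \qquad E_1(R_3) - 1 = 4
\end{align*}
```

Under these assumptions τ1 releases 5 jobs within the response time of τ3, which leaves 4 intervals between successive jobs of τ1 and matches the factor of 4 mentioned in the example.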
In order to reduce the pessimism associated with the computation
of ρj,i, we must consider the actual number of times
each task τk ∈ hep(i)\τj can execute between two successive
jobs of τj. For this reason, this section presents a multi-set
variant of Equation (12). The resulting quantity is an upper
bound on ρj,i(t) denoted by $\rho^{mul}_{j,i}(t)$.
A. Computation of $\rho^{mul}_{j,i}(t)$
In this section, we first characterize the maximum number
of times a task τk ∈ hep(i) \ τj can execute between two
successive jobs of τj . To do so, we separately analyze the tasks
in hep(j) \ τj (Lemma 3) and aff(i, j) (Lemma 4). We then
use this information to upper bound the total cache-persistence
reload overhead ρj,i(t) in Theorem 2.
Lemma 3. The maximum number of times a task τk ∈
hep(j) \ τj can execute between two successive jobs of τj
within the response time Ri of τi is upper bounded by Ek(Ri).
Proof. Remember that the maximum number of jobs that each
higher priority task τk can release during the response time of
a task τi is given by $E_k(R_i) \overset{def}{=} \lceil R_i / T_k \rceil$. Furthermore, because
τk has a higher or equal priority than τj, τj cannot preempt τk.
Hence, the maximum number of times τk can execute between
two successive jobs of τj within a time window of length Ri
is upper bounded by its number of released jobs Ek(Ri) (see
Figure 3 for an example).
Lemma 4. The maximum number of times a task τk ∈
aff(i, j) can execute between two successive jobs of τj within
the response time Ri of τi is upper bounded by
(Ej(Rk) + 1)× Ek(Ri) (16)
Proof. $E_j(R_k) \overset{def}{=} \lceil R_k / T_j \rceil$ provides the maximum number of
jobs that τj can release during the response time of a task
τk. Each of these released jobs may preempt the execution
of τk. Considering an arrival pattern such that τk started to
execute just before the first arrival of τj preempting τk (see
Figure 3), the maximum number of times a job of τk may
execute between two successive jobs of τj is then given by
(Ej(Rk)+1). Since Ek(Ri) jobs of τk are released within the
response time of τi, the maximum number of times τk may
execute between two successive jobs of τj within the response
time of τi is upper bounded by (Ej(Rk) + 1) × Ek(Ri).

Fig. 3: Illustration of the maximum number of times the tasks
in aff(i, j) and hep(j)\τj can execute between two successive
jobs of τj. When calculating ρ2,3, τ1 ∈ hep(2)\τ2 can release
at most 3 jobs (with each job loading all its ECBs in the
worst case). In contrast, the one job released by τ3 ∈ aff(3, 2)
can execute and load its ECBs at most 4 times.
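The two bounds of Lemmas 3 and 4 are simple enough to state as code. The sketch below is illustrative only; the function names are hypothetical, and the numeric values in the usage lines are invented so that the results reproduce the counts quoted in the caption of Figure 3 (3 and 4), since the task parameters behind that figure are not given here.

```python
import math


def releases(window: float, period: float) -> int:
    """E(window) = ceil(window / period): jobs released in a window."""
    return math.ceil(window / period)


def max_executions_hep(R_i: float, T_k: float) -> int:
    """Lemma 3: bound for a task tau_k in hep(j) \\ tau_j."""
    return releases(R_i, T_k)


def max_executions_aff(R_i: float, R_k: float, T_j: float, T_k: float) -> int:
    """Lemma 4, Equation (16): bound for a task tau_k in aff(i, j)."""
    return (releases(R_k, T_j) + 1) * releases(R_i, T_k)


# Hypothetical values: R_3 = 30, T_1 = 12, T_2 = 10, T_3 = 50.
hep_bound = max_executions_hep(R_i=30, T_k=12)                    # tau_1
aff_bound = max_executions_aff(R_i=30, R_k=30, T_j=10, T_k=50)    # tau_3
```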
Using Lemmas 3 and 4, one can derive an upper bound on
ρj,i(t). This upper bound is denoted by $\rho^{mul}_{j,i}(t)$ and is defined
in the following theorem.
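Theorem 2. The total cache-persistence reload overhead
ρj,i(Ri) on τj due to the eviction of its PCBs by tasks in
hep(i)\τj during the response time Ri of τi is upper bounded
by
$$\rho^{mul}_{j,i} \overset{def}{=} d_{mem} \times \left| M^{ecb}_{j,i} \cap M^{pcb}_{j,i} \right| \qquad (17)$$
where $M^{ecb}_{j,i}$ and $M^{pcb}_{j,i}$ are multi-sets defined as
$$M^{pcb}_{j,i} = \bigcup_{E_j(R_i)-1} PCB_j \qquad (18)$$
and
$$M^{ecb}_{j,i} = M^{\text{ecb-aff}}_{j,i} \cup M^{\text{ecb-hp}}_{j,i} \qquad (19)$$
with
$$M^{\text{ecb-aff}}_{j,i} = \bigcup_{\forall k \in aff(i,j)} \left( \bigcup_{(E_j(R_k)+1) E_k(R_i)} ECB_k \right) \qquad (20)$$
and
$$M^{\text{ecb-hp}}_{j,i} = \bigcup_{\forall l \in hep(j) \setminus \tau_j} \left( \bigcup_{E_l(R_i)} ECB_l \right) \qquad (21)$$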
Proof. The proof is based on the three following facts:
1. τj releases at most $\lceil t / T_j \rceil$ jobs in a time window of length t.
At most $(\lceil t / T_j \rceil - 1)$ evictions can therefore happen between
two subsequent jobs of τj. The largest set of PCBs of τj that
can be evicted between successive jobs of τj released during
the response time of τi is therefore given by the multi-set
$M^{pcb}_{j,i} = \bigcup_{E_j(R_i)-1} PCB_j$.
2. By Lemma 3, the maximum number of times a task
τl ∈ hep(j) \ τj can execute between two successive jobs
of τj during the response time of τi is upper bounded by
El(Ri). Hence, the largest set of ECBs that can be loaded by
τl and interfere with the PCBs of τj is given by $\bigcup_{E_l(R_i)} ECB_l$
(assuming that τl reloads all its ECBs at each of its executions).
As a result, the largest set of ECBs loaded by the tasks
in hep(j) \ τj between successive executions of τj is upper
bounded by $M^{\text{ecb-hp}}_{j,i} = \bigcup_{\forall l \in hep(j) \setminus \tau_j} \left( \bigcup_{E_l(R_i)} ECB_l \right)$.
3. By Lemma 4, the maximum number of times a task τk ∈
aff(i, j) can execute between two successive jobs of τj during
the response time of τi is upper bounded by (Ej(Rk) + 1) ×
Ek(Ri). Hence, the largest set of ECBs that can be loaded
by τk between successive jobs of τj during the response time
of τi is given by $\bigcup_{(E_j(R_k)+1) E_k(R_i)} ECB_k$ (assuming that τk
reloads all its ECBs whenever it resumes its execution). As a
result, the largest set of ECBs loaded by the tasks in
aff(i, j) between successive executions of τj is upper bounded
by $M^{\text{ecb-aff}}_{j,i} = \bigcup_{\forall k \in aff(i,j)} \left( \bigcup_{(E_j(R_k)+1) E_k(R_i)} ECB_k \right)$.
Therefore, by 2. and 3. the largest set of ECBs that can
interfere with the PCBs of τj during the response time of τi
is upper bounded by $M^{ecb}_{j,i} = M^{\text{ecb-aff}}_{j,i} \cup M^{\text{ecb-hp}}_{j,i}$.
Finally, the largest set of PCBs of τj that can be evicted
by the tasks in hep(i) \ τj within the response time of τi
is upper bounded by the intersection of $M^{pcb}_{j,i}$ with $M^{ecb}_{j,i}$.
Since reloading a cache block takes at most dmem time units,
the total cache-persistence reload overhead ρj,i(Ri) is upper
bounded by $d_{mem} \times \left| M^{ecb}_{j,i} \cap M^{pcb}_{j,i} \right|$.
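The multi-set computation of Theorem 2 can be sketched with Python's `collections.Counter`. This is a hedged illustration rather than the authors' code: it assumes that multi-set union adds multiplicities and that multi-set intersection takes the minimum multiplicity per element, it represents cache blocks as integer indices, and the function names and argument layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def repeat(blocks: Iterable[int], times: int) -> Counter:
    """Multi-set holding each block `times` times (empty if times <= 0)."""
    return Counter({b: times for b in blocks}) if times > 0 else Counter()


def rho_mul(
    d_mem: int,
    pcb_j: Set[int],
    E_j_Ri: int,
    aff: List[Tuple[Set[int], int, int]],   # (ECB_k, E_j(R_k), E_k(R_i))
    hep_j: List[Tuple[Set[int], int]],      # (ECB_l, E_l(R_i))
) -> int:
    """Equation (17): d_mem * |M_ecb & M_pcb|, with Equations (18)-(21)."""
    m_pcb = repeat(pcb_j, E_j_Ri - 1)                       # Eq. (18)
    m_ecb: Counter = Counter()
    for ecb_k, e_j_rk, e_k_ri in aff:                       # Eq. (20)
        m_ecb += repeat(ecb_k, (e_j_rk + 1) * e_k_ri)
    for ecb_l, e_l_ri in hep_j:                             # Eq. (21)
        m_ecb += repeat(ecb_l, e_l_ri)
    overlap = m_ecb & m_pcb      # per-element minimum multiplicity
    return d_mem * sum(overlap.values())


# Toy data (invented): PCBs {1, 2, 3}, five jobs of tau_j in the window.
example = rho_mul(
    d_mem=100,
    pcb_j={1, 2, 3},
    E_j_Ri=5,
    aff=[({2, 3, 4}, 1, 1)],
    hep_j=[({3, 5}, 3)],
)
```

In this toy case block 2 can be evicted at most twice and block 3 at most four times, so six reloads are charged instead of charging every conflicting PCB in each of the four intervals.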
B. Improving the Accuracy of $M^{ecb}_{j,i}$
Theorem 2 provides a good upper bound on the total cache-persistence
reload overhead ρj,i(Ri) during the response time
of τi. However, Equations (20) and (21) still consider that each
job released by a task τk ∈ hep(i)\τj reloads all its ECBs
(i.e., PCBs and nPCBs) whenever it resumes its execution.
Even though this assumption may be valid for the tasks τl ∈
hep(j) \ τj, since each of their jobs contributes only once to
$M^{ecb}_{j,i}$ (hence assuming that each job of τl accesses all its cache
blocks during its execution), it is quite pessimistic for the tasks
τk ∈ aff(i, j). Indeed, by Lemma 4 and Equation (20), each
job of a task τk ∈ aff(i, j) is assumed to contribute (Ej(Rk)+1)
times to $M^{ecb}_{j,i}$. However, a PCB of task τk will be accessed
at most once during each job execution unless this PCB is
also a UCB (in which case it may be used at several program
points of the task). The nPCBs, in contrast, must always be
considered to be loaded several times during each job execution.
Indeed, since they are not persistent, several
memory blocks of τk are mapped to that same cache block,
which can therefore be accessed more than once during each
job execution.
It results from this discussion that $M^{ecb}_{j,i}$ can be more
accurately modeled by the following equation:
$$M^{ecb}_{j,i} = M^{\text{ecb-aff}'}_{j,i} \cup M^{\text{ecb-hp}}_{j,i} \qquad (22)$$
with
$$M^{\text{ecb-aff}'}_{j,i} = \bigcup_{\forall k \in aff(i,j)} \left( \left( \bigcup_{E_k(R_i)} (PCB_k \setminus UCB_k) \right) \cup \left( \bigcup_{(E_j(R_k)+1) E_k(R_i)} \left( nPCB_k \cup (PCB_k \cap UCB_k) \right) \right) \right) \qquad (23)$$
where (PCBk ∩ UCBk) is the set of PCBs of τk that are
also UCBs, and (nPCBk ∪ (PCBk ∩ UCBk)) is therefore
the set of ECBs that may be loaded more than once by
each job of τk. All the other ECBs (those that are not
in (nPCBk ∪ (PCBk ∩ UCBk)) and are thus in (PCBk \
UCBk)) are loaded at most once per job of τk and are therefore
accounted for separately in the first term of Equation (23).
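The difference between Equations (20) and (23) is easy to see on a small instance. The sketch below is an illustration under the same assumptions as before (multi-set union adds multiplicities, cache blocks are integer indices); the dictionary layout of each task in aff(i, j) and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def repeat(blocks: Iterable[int], times: int) -> Counter:
    """Multi-set holding each block `times` times (empty if times <= 0)."""
    return Counter({b: times for b in blocks}) if times > 0 else Counter()


def m_ecb_aff(aff: List[Dict]) -> Counter:
    """Equation (20): every ECB counted (E_j(R_k)+1) * E_k(R_i) times."""
    m: Counter = Counter()
    for t in aff:
        ecb = t["pcb"] | t["npcb"]
        m += repeat(ecb, (t["E_j_Rk"] + 1) * t["E_k_Ri"])
    return m


def m_ecb_aff_refined(aff: List[Dict]) -> Counter:
    """Equation (23): PCBs that are not UCBs are counted once per job."""
    m: Counter = Counter()
    for t in aff:
        once_per_job = t["pcb"] - t["ucb"]
        many_per_job = t["npcb"] | (t["pcb"] & t["ucb"])
        m += repeat(once_per_job, t["E_k_Ri"])
        m += repeat(many_per_job, (t["E_j_Rk"] + 1) * t["E_k_Ri"])
    return m


# Invented task: PCBs {1, 2}, nPCBs {3}, UCBs {2, 9}.
tau_k = {"pcb": {1, 2}, "npcb": {3}, "ucb": {2, 9}, "E_j_Rk": 2, "E_k_Ri": 1}
coarse = m_ecb_aff([tau_k])
refined = m_ecb_aff_refined([tau_k])
```

Here block 1 is a PCB that is not a UCB, so its multiplicity drops from 3 to 1, while blocks 2 and 3 keep the multiplicity given by Lemma 4.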
C. WCRT Analysis
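Using the exact same argumentation as in Section V-B, the
worst-case response time of task τi can be upper bounded by
the smallest positive value $R^{mul}_i$ such that:
$$R^{mul}_i = C_i + \sum_{\forall j \in hp(i)} \min\left\{ \left\lceil \frac{R^{mul}_i}{T_j} \right\rceil C_j \; ; \; \left\lceil \frac{R^{mul}_i}{T_j} \right\rceil P_j + \widehat{MD}_j(R^{mul}_i) + \rho^{mul}_{j,i}(R^{mul}_i) \right\} + \sum_{\forall j \in hp(i)} \gamma^{mul}_{i,j} \qquad (24)$$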
It is important to note that, by construction, the WCRT
formulation of Eq. (24) using the improved variant of the
multi-set approach dominates the WCRT given by the standard
multi-set approach (Eq. (15)), which in turn dominates the
simple union approach presented in Section V-A.
VII. STATIC ANALYSIS
Having presented our proposed cache-persistence-aware
WCRT analysis, we proceed by explaining how the required
input quantities, defined in Section IV-B, are obtained using
standard static analysis techniques integrated in WCET esti-
mation tools.
Static cache analysis techniques use abstract interpretation
to determine the worst-case behavior with respect to caches
for each memory reference. The outcome of such techniques
is a classification of references (e.g., always-hit when the
reference will always result in a cache hit, always-miss when
the reference will always result in a cache miss, first-miss
when all successive occurrences of a reference but the first
one will result in hits). The classification of each reference
makes it possible to determine whether a reference will never
require a memory access (always-hit) or may require an access
to memory. For the purpose of this paper, the method presented
in [18] is used.
Most WCET estimation tools use IPET (Implicit Path
Enumeration Technique) for WCET calculation. IPET is based
on an Integer Linear Programming (ILP) formulation of the
WCET calculation problem [19]. This formulation reflects the
program structure and the possible execution flows using a set
of linear constraints. The WCET estimate for a task is obtained
by maximizing the following objective function:
$$\sum_{b \in BasicBlocks} E_b \times f_b \qquad (25)$$
Eb (a constant in the ILP problem) is the timing information
of basic block b. fb (a variable in the ILP system, to be
instantiated by the ILP solver) corresponds to the number of
times basic block b is executed.
For a task τi, quantities Pi and MD i are calculated using
IPET by setting constant Eb accordingly for all basic blocks
of τi. For the computation of Pi, only the execution time
of instructions is included in Eb, ignoring memory accesses.
Conversely, when computing MD i, only memory accesses (as
detected by static cache analysis) are included in Eb and the
execution times of instructions are ignored.
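The effect of changing Eb can be illustrated on a toy control-flow graph. The sketch below does not use an ILP solver: it replaces IPET by explicit enumeration of two hand-written paths, and the block costs are invented. It only shows how the same maximization yields a WCET, a processing demand and a memory demand depending on what is counted in Eb.

```python
from typing import Callable, Dict, List, Tuple

D_MEM = 100  # memory access penalty in cycles (value used in Section VIII)

# Hypothetical basic blocks: name -> (instruction cycles, cache misses).
BLOCKS: Dict[str, Tuple[int, int]] = {
    "A": (5, 1),
    "B": (40, 0),   # compute-heavy branch
    "C": (4, 2),    # memory-heavy branch
    "D": (3, 0),
}

# Explicit paths stand in for the flow constraints of the ILP system.
PATHS: List[Tuple[str, ...]] = [("A", "B", "D"), ("A", "C", "D")]


def worst_case(block_cost: Callable[[int, int], int]) -> int:
    """Maximize the sum of E_b over the enumerated paths (f_b is 0 or 1)."""
    return max(sum(block_cost(*BLOCKS[b]) for b in path) for path in PATHS)


wcet = worst_case(lambda cycles, misses: cycles + D_MEM * misses)  # C_i
proc = worst_case(lambda cycles, misses: cycles)                   # P_i
mem = worst_case(lambda cycles, misses: D_MEM * misses)            # MD_i
```

Because the three maxima may be reached on different paths, `proc + mem` can exceed `wcet`, which is consistent with rows of Table I such as lcdnum (984 + 2740 > 3440).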
For the particular case of direct-mapped caches, determining
PCBi and ECBi is straightforward. A memory block of task
τi belongs to PCBi if it is the only one to map to a given
cache block. ECBi is simply the set of memory blocks of task
τi. Determining UCBi is achieved using the method presented
in [4]. Finally, determining $MD^r_i$ is very similar to determining
MDi. IPET is applied with an execution cost of 0 and considering
memory accesses, but in contrast to the computation of MDi, only
memory accesses for cache blocks in nPCBi are considered.
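For a direct-mapped cache, the classification described above can be sketched in a few lines. This is an illustrative sketch, not Heptane's implementation: memory blocks are represented by their block index (address divided by the line size), the set index is assumed to be the block index modulo the number of cache sets, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_blocks(
    memory_blocks: Iterable[int], num_sets: int
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (ECB, PCB, nPCB) as sets of memory block indices.

    A memory block is persistent if no other block of the same task
    maps to its cache set; all remaining blocks are non-persistent.
    """
    by_set: Dict[int, Set[int]] = defaultdict(set)
    for block in memory_blocks:
        by_set[block % num_sets].add(block)

    ecb = set().union(*by_set.values()) if by_set else set()
    pcb = {b for blocks in by_set.values() if len(blocks) == 1 for b in blocks}
    npcb = ecb - pcb
    return ecb, pcb, npcb


# A task occupying 70 consecutive memory blocks in a 64-set cache:
# blocks 0..5 collide with blocks 64..69, the other 58 are persistent.
ecb, pcb, npcb = classify_blocks(range(70), num_sets=64)
```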
VIII. EXPERIMENTS
In this section, we evaluate the effectiveness of our proposed
approaches in comparison to state-of-the-art techniques. We
conducted different experiments by varying the task utiliza-
tions, number of tasks and the size of cache.
The different inputs previously defined in Section VII were
computed using the Heptane² static WCET estimation tool.
Heptane produces upper bounds on the execution times of
hard real-time applications. It computes WCETs using static
analysis at the binary code level. In this paper, all experiments
were conducted on C code compiled with gcc 4.1 with no
optimization for MIPS R2000/R3000. The default linker memory
layout is used, i.e., functions are represented sequentially in
memory, and unless explicitly stated, no alignment directive
is used. Without loss of generality, all instructions are assumed
to execute in 1 cycle (cache access included). Each memory
access, regardless of its source, results in a penalty of dmem =
100 cycles. By default a direct-mapped instruction cache of
size 2 KB with a line size of 32 B is considered.
We have integrated the results obtained from Heptane using
static analysis with the MRTA framework developed by Altmeyer
et al. [15] for multi-core response time analysis. The
MRTA tool provides a compositional framework for timing
verification in multi-core systems by explicitly modeling the
interferences of the different components. We modified the
MRTA tool to account for the new task parameters introduced
in this paper. We have added a module in the MRTA framework
that enables the calculation of the total CPRO ρj,i(Ri)
using the multi-set approaches detailed in Section VI-B. Also,
as we only consider a single-core system, the preemption
overhead calculation and the WCRT analysis are altered
accordingly. All the experiments were performed using the
Mälardalen benchmark suite [20].

² https://team.inria.fr/alf/software/heptane/

TABLE I: Task parameters for a selection of benchmarks from
the Mälardalen Benchmark Suite [20]

Name        C_i     P_i     MD_i    MD^r_i  ECB_i  PCB_i  UCB_i  nPCB_i
lcdnum      3440    984     2740    192     20     20     20     0
insertsort  7574    5974    2343    752     16     16     10     0
bs          1399    203     1223    34      11     11     9      0
bsort100    712289  710289  90893   88907   20     20     15     0
ludcmp      45135   27036   21511   11629   98     30     43     68
fdct        17350   6550    11525   11525   106    22     58     84
ud          28427   20627   10415   10415   75     53     31     22
nsichneu    316409  22009   294400  294400  1377   0      110    1377
statemate   190496  10586   180110  180110  275    0      81     275
All the experiments are performed by randomly generating
a large number of task sets and determining the schedulability
of those task sets using Equations (2) (denoted by ECB-Union
in the plots), (3) (denoted by UCB-Union Multiset) and (24)
(denoted by CPRO). Each task within the task set is randomly
assigned parameters from the Mälardalen benchmarks. A
subset of them is shown in Table I. Note that due to space
limitations, it is not possible to show the details of all the
benchmarks in Table I.
It should also be clear from the numbers in Table I that
the benchmark suite comprises tasks with both small and big
memory footprints (the latter filling the entire cache), which
reduces the risk of bias in the results.
In addition to the parameters defined in Table I, we
used the following parameters in our experiments:
• The default number of tasks in each task set is 10.
• Task utilizations were generated using UUnifast [21].
• Each task was randomly assigned one benchmark from
the Mälardalen benchmark suite [20] with values of Ci,
Pi, MDi, $MD^r_i$ along with sets of UCB, ECB, PCB
and nPCB obtained from the values given in Table I.
• Task periods are set according to the WCET assigned to
each task from the benchmarks and the randomly generated
utilization, i.e., Ti = Ci/Ui.
• Task deadlines are implicit with priorities assigned in
deadline monotonic order.
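The task set generation steps listed above can be sketched as follows. This is a minimal sketch and not the MRTA tool's code: `uunifast` follows the published UUniFast algorithm [21], while `make_task_set`, the dictionary layout of a task, and the short `BENCHMARK_WCETS` list (three Ci values copied from Table I) are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

# (name, C_i) pairs taken from Table I; the full suite is larger.
BENCHMARK_WCETS: List[Tuple[str, int]] = [
    ("lcdnum", 3440),
    ("insertsort", 7574),
    ("bs", 1399),
]


def uunifast(n: int, total_u: float, rng: random.Random) -> List[float]:
    """Draw n task utilizations that sum to total_u (UUniFast)."""
    utils = []
    remaining = total_u
    for i in range(1, n):
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - next_remaining)
        remaining = next_remaining
    utils.append(remaining)
    return utils


def make_task_set(
    n: int, total_u: float, benchmarks: List[Tuple[str, int]], rng: random.Random
) -> List[Dict]:
    """Assign a random benchmark to each utilization and derive T_i = C_i / U_i."""
    tasks = []
    for u in uunifast(n, total_u, rng):
        name, wcet = rng.choice(benchmarks)
        period = wcet / u
        # Implicit deadlines: D_i = T_i.
        tasks.append({"name": name, "C": wcet, "U": u, "T": period, "D": period})
    # Deadline monotonic order: shortest deadline gets the highest priority.
    tasks.sort(key=lambda t: t["D"])
    return tasks
```

A call such as `make_task_set(10, 0.8, BENCHMARK_WCETS, random.Random(0))` returns ten tasks ordered from highest to lowest priority; passing an explicit `random.Random` instance keeps the generation reproducible.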
1) Total Utilization: To evaluate how our proposed CPRO-based
WCRT analysis (i.e., Eq. (24)) performs in terms of
schedulability in comparison to the ECB-union [8] and UCB-union
multi-set approaches [9], we generated 100 task sets at
each utilization, with task set utilizations varied from 0.1 to
1 in steps of 0.05. Each task set comprised 10 tasks, with
benchmark parameters generated for a 2 KB cache with 64
cache sets. The WCRT analysis is performed for all three
approaches using the same task sets. A task set is deemed
unschedulable if the calculated WCRT for any task within the
task set is greater than its deadline.
Figure 4a shows the number of task sets that
were schedulable using the three analyzed approaches. It is
important to note that we only show a cropped version of
the plot starting from a utilization of 0.5, because for
task set utilizations less than 0.5 all approaches produced
identical results. The ECB-union approach of Altmeyer et
al. [8] performs the worst. This is mainly due to the fact
that it only uses the WCET (effectively the worst-case memory
demand) of tasks during the WCRT analysis along with the
CRPD cost defined by Equation (1), which is very pessimistic.
As a result, a high number of tasks tend to be unschedulable,
especially at higher utilizations. The UCB-union
multi-set approach [9] performs better in comparison to the
ECB-union approach.
This is clearly due to the fact that the UCB-union multi-set
approach also takes into account the actual number of
preemptions of each task when calculating the CRPD. Yet,
it can be seen from the presented results that our proposed
WCRT analysis with CPRO (Eq. (24)) outperforms the other
approaches. In fact, we can have substantial gains in terms
of schedulability in comparison to the UCB-union multi-set
approach; for example, at a utilization of 0.85, we gain around
10% of schedulability.

Fig. 4: Schedulability ratio on randomly generated task sets based on the Mälardalen Benchmark.
(a) Varying total utilizations (x-axis: Average Utilization, 0.5 to 1; y-axis: Schedulable Tasksets, 0 to 100).
(b) Varying the number of tasks (x-axis: Number of Tasks; y-axis: Average Schedulability).
(c) Varying the number of cache sets (x-axis: Number of Cache Sets, 32 to 512; y-axis: Average Schedulability).
Each panel plots three curves: WCRT analysis with CPRO, UCB-Union Multiset, and ECB-Union Approach.
2) Number of Tasks: In preemptive systems, the number
of tasks adversely affects the schedulability of the task set.
Increasing the number of tasks will lead to more preemptions,
resulting in increased memory overhead due to cache evictions.
We varied the number of tasks from 5 to 25 in steps of 5.
All the parameters other than the number
of tasks have the same values as used in the previous section.
Figure 4b shows the results of our experiment. We can see that
the average schedulability (over task set utilizations varying
from 0.1 to 1 in steps of 0.05) for all approaches decreases when
the number of tasks increases. This is due to an increasing
number of cache evictions and reloads. On the other hand, we
also observe that our CPRO-based WCRT analysis performs
significantly better in comparison to the other two approaches.
The average schedulability for our approach at each point in
Figure 4b is up to 10% higher than that of the UCB-union
multi-set and the ECB-union approaches. This indicates the
robustness of our approach against the number of tasks.
3) Cache Size: The cache size is an important factor that
can affect the schedulability of tasks. If the cache is large
enough to accommodate all the tasks without any cache
reuse, no additional memory accesses are required. In fact,
in this case all the ECBs of a task will be PCBs and will
never be evicted from the cache. Another case is when the
cache is very small and each task can fill the entire cache
during its execution. Consequently, this will result in higher
memory demand for each job of the task. To evaluate the
impact of cache size on the performance of the analyses, we
varied the number of cache sets from 32 to 512, keeping
all other task parameters constant as in the
schedulability analysis described in Section VIII-1. Figure 4c
shows the resulting average schedulability for each approach
as a function of the number of cache sets. As the cache line
size is kept constant (i.e., 32 B), increasing the number of
cache sets effectively increases the cache size. We can see
that our proposed CPRO-based WCRT analysis dominates the other
two approaches. In fact, by increasing the cache size, the overall
schedulability also increased from 0.76 (with 32 cache sets) to
0.81 (with 512 cache sets) with our approach. This is due to
the fact that with a bigger cache the number of PCBs for each
task will also increase (hence reducing the residual memory
demand). In contrast, for the other two approaches (consistently
with [9]), the schedulability decreases due to an increase in the
number of ECBs resulting in higher preemption overheads.
IX. CONCLUSION
This paper builds upon the observation that a task can re-use
cache contents between different jobs. A method is presented
to capture these persistent cache blocks (PCBs), resulting
in variable memory demand for different jobs of a task.
The notion of cache-persistence reload overhead (CPRO) is
introduced and different approaches are presented to calculate
CPRO. These approaches are orthogonal to the state-of-the-art
methods used for CRPD calculation and can be
integrated with any of these methods. A WCRT analysis is
then presented that exploits this variable memory demand to
reduce the preemption cost of higher priority tasks under
fixed-priority preemptive scheduling, thereby reducing the WCRT
and improving schedulability.
We evaluated the performance of our approach against
two prominent approaches from the state-of-the-art in terms
of schedulability. Experiments were performed by varying
different parameters with most of the values taken from the
Mälardalen benchmarks. Experimental results show that our
proposed WCRT analysis with CPRO dominates the UCB-union
multi-set and the ECB-union approaches, with an
improvement of up to 10% in terms of schedulability.
In future work, we aim to extend this approach to multi-level
set associative caches. We would like to evaluate our
approach against methods such as cache coloring and cache
locking. We also plan to extend our analysis to multicore
platforms.
Acknowledgments. This work was partially supported by
National Funds through FCT/MEC (Portuguese Foundation
for Science and Technology) and co-financed by ERDF (European
Regional Development Fund) under the PT2020 Partnership,
within project UID/CEC/04234/2013 (CISTER); also
by FCT/MEC and the EU ARTEMIS JU within project(s)
ARTEMIS/0003/2012 - JU grant nr. 333053 (CONCERTO)
and ARTEMIS/0001/2013 - JU grant nr. 621429 (EMC2).
REFERENCES
[1] G. C. Buttazzo, M. Bertogna, and G. Yao, “Limited preemptive scheduling
for real-time systems. A survey,” Industrial Informatics, IEEE
Transactions on, vol. 9, no. 1, pp. 3–15, 2013.
[2] K. Jeffay, D. F. Stanat, and C. U. Martel, “On non-preemptive scheduling
of period and sporadic tasks,” in RTSS’91. IEEE, 1991, pp. 129–139.
[3] J. V. Busquets-Mataix, J. J. Serrano, R. Ors, P. Gil, and A. Wellings,
“Adding instruction cache effect to schedulability analysis of preemptive
real-time systems,” in RTAS’96. IEEE, 1996, pp. 204–212.
[4] C.-G. Lee, J. Hahn, Y.-M. Seo, S. L. Min, R. Ha, S. Hong, C. Y. Park,
M. Lee, and C. S. Kim, “Analysis of cache-related preemption delay
in fixed-priority preemptive scheduling,” Computers, IEEE Transactions
on, vol. 47, no. 6, pp. 700–713, 1998.
[5] H. Tomiyama and N. D. Dutt, “Program path analysis to bound
cache-related preemption delay in preemptive real-time systems,” in
Proceedings of the eighth international workshop on Hardware/software
codesign. ACM, 2000, pp. 67–71.
[6] J. Staschulat, S. Schliecker, and R. Ernst, “Scheduling analysis of real-
time systems with precise modeling of cache related preemption delay,”
in ECRTS’05. IEEE, 2005, pp. 41–48.
[7] Y. Tan and V. Mooney, “Timing analysis for preemptive multitasking
real-time systems with caches,” ACM (TECS), vol. 6, no. 1, p. 7, 2007.
[8] S. Altmeyer, R. Davis, C. Maiza et al., “Cache related pre-emption delay
aware response time analysis for fixed priority pre-emptive systems,” in
RTSS’11. IEEE, 2011, pp. 261–271.
[9] S. Altmeyer, R. I. Davis, and C. Maiza, “Improved cache related pre-
emption delay aware response time analysis for fixed priority pre-
emptive systems,” Real-Time Systems, vol. 48, no. 5, pp. 499–526, 2012.
[10] F. Nemer, H. Cassé, P. Sainrat, and A. Awada, “Improving the worst-case
execution time accuracy by inter-task instruction cache analysis,” in
Industrial Embedded Systems, 2007. SIES’07. International Symposium
on. IEEE, 2007, pp. 25–32.
[11] F. Nemer, H. Cassé, P. Sainrat, and J. Bahsoun, “Inter-task WCET
computation for a-way instruction caches,” in Industrial Embedded
Systems, 2008. SIES 2008. International Symposium on, June 2008, pp.
169–176.
[12] C. L. Liu and J. W. Layland, “Scheduling algorithms for multiprogramming
in a hard-real-time environment,” JACM, vol. 20, no. 1, pp. 46–61,
1973.
[13] R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo, and
R. Kegley, “A predictable execution model for COTS-based embedded
systems,” in Real-Time and Embedded Technology and Applications
Symposium (RTAS), 2011 17th IEEE, April 2011, pp. 269–279.
[14] S. Altmeyer and C. M. Burguière, “Cache-related preemption delay
via useful cache blocks: Survey and redefinition,” Journal of Systems
Architecture, vol. 57, no. 7, pp. 707–719, 2011.
[15] S. Altmeyer, R. I. Davis, L. Indrusiak, C. Maiza, V. Nelis, and J. Reineke,
“A generic and compositional framework for multicore response time
analysis,” in RTNS’15. ACM, 2015, pp. 129–138.
[16] M. Joseph and P. Pandya, “Finding response times in a real-time system,”
The Computer Journal, vol. 29, no. 5, pp. 390–395, 1986.
[17] N. Audsley, A. Burns, M. Richardson, K. Tindell, and A. J. Wellings,
“Applying new scheduling theory to static priority pre-emptive scheduling,”
Software Engineering Journal, vol. 8, no. 5, pp. 284–292, 1993.
[18] H. Theiling, C. Ferdinand, and R. Wilhelm, “Fast and precise WCET
prediction by separated cache and path analyses,” Real-Time Systems,
vol. 18, no. 2-3, pp. 157–179, 2000.
[19] Y.-T. S. Li and S. Malik, “Performance analysis of embedded software
using implicit path enumeration,” in LCTES ’95: Proceedings of the
ACM SIGPLAN 1995 workshop on Languages, compilers, & tools for
real-time systems, R. Gerber and T. Marlowe, Eds., vol. 30, no. 11, New
York, NY, USA, Nov. 1995, pp. 88–98.
[20] J. Gustafsson, A. Betts, A. Ermedahl, and B. Lisper, “The Mälardalen
WCET benchmarks: Past, present and future,” in OASIcs-OpenAccess
Series in Informatics, vol. 15. Schloss Dagstuhl-Leibniz-Zentrum fuer
Informatik, 2010.
[21] E. Bini and G. C. Buttazzo, “Measuring the performance of schedulability
tests,” Real-Time Systems, vol. 30, no. 1-2, pp. 129–154, 2005.
