Abstract: Caches are widely used in modern computer systems to bridge the increasing gap between processor speed and memory access time. On the other hand, the presence of caches, especially data caches, complicates static worst-case execution time (WCET) analysis. Access pattern analysis (e.g., cache miss equations) is applicable only to a specific class of programs, in which all array accesses have predictable access patterns. Abstract interpretation-based methods (must/persistence analysis) determine possible cache conflicts based on coarse-grained memory access information from address analysis, which usually leads to significantly pessimistic estimates. In this paper, we first present a refined persistence analysis method which fixes the potential underestimation problem in the original persistence analysis. Based on our new persistence analysis, we propose a framework that combines access pattern analysis and abstract interpretation for accurate data cache analysis. We capture the dynamic behavior of a memory access by computing its temporal scope (the loop iterations in which a given memory block is accessed by a given data reference) during address analysis. Temporal scopes and the loop hierarchy structure (the static scopes) are integrated and utilized to achieve more precise abstract cache state modeling. Experimental results show that our proposed analysis obtains up to 74% reduction in the WCET estimates compared to existing data cache analysis.
I. INTRODUCTION
Worst-case Execution Time (WCET) is a key metric for real-time embedded software. Static WCET analysis provides a safe bound on the maximum execution time of a program on a target platform over all possible program inputs. For cost-sensitive domains like automotive electronics, the WCET estimation must be tight for cost-effective design and resource dimensioning. However, modern processors contain performance-enhancing features such as caches and pipelines whose run-time timing behavior is hard to predict statically. This makes micro-architectural modeling (building timing models for micro-architectural features such as caches) a key component of WCET analysis.
Timing models of instruction caches for WCET analysis have been well studied [18]. On the other hand, static timing analysis of data cache behavior remains a major challenge for WCET analysis methods and tools. Accurate data cache modeling is of paramount importance for tight WCET analysis of data-intensive routines. However, run-time computed access addresses (which data locations are accessed by different instances of an instruction) and dynamic cache behavior make it difficult to develop a tight yet flexible and scalable static analysis. Conservatively assuming that every memory access results in a cache miss yields a safe but pessimistic WCET estimate.
Different static data cache analysis techniques have been developed so far. Access pattern-based techniques (e.g., the cache miss equation framework in [10]) achieve tight estimation, but are applicable only to programs that contain regular accesses with predictable patterns. On the other hand, abstract interpretation-based data cache analysis techniques ([9], [16]) work on general programs but suffer from large over-estimation. In this paper, we seek to combine the strengths of these two approaches. We observe that the over-estimation in existing abstract interpretation-based data cache analysis stems from the globally defined abstract domain. In particular, a coarse-grained address analysis is adopted to compute the set of memory blocks possibly referenced by a memory access, while the temporal property of the access is ignored (e.g., a memory block may be accessed in only certain iterations of a loop execution). This approximation in the address analysis causes substantial over-estimation in WCET estimates. Furthermore, abstract interpretation traditionally computes the fixed point of the abstract cache state conservatively for the entire program execution (disregarding cache behavior in specific program scopes), leading to large over-estimation.
In this work, we propose a general and accurate static data cache analysis method by combining access pattern analysis and abstract interpretation. For abstract cache state computation, we extend the cache behavior categorization of "persistence" as in the persistence analysis of [9] to capture the access pattern information. In our new persistence analysis framework, we also fix an error in the original persistence analysis which may result in underestimation of the cache misses. Our contributions include the following.
First of all, given a data reference D and its access pattern, we derive not only the set of possible accessed memory blocks, but also their temporal scopes. The temporal scope of a memory block m captures the loop iterations in the program where m may get accessed. Our proposed data cache analysis decides whether a memory block is persistent within its temporal scope. In particular, two memory blocks accessed in mutually exclusive temporal scopes do not conflict with each other within their scopes, even though they are mapped to the same cache set.
Secondly, we also consider the static scopes in our analysis. Similar to the multi-level analysis proposed in [2] for instruction caches, we maintain a copy of the abstract data cache states for each loop nesting level of the program execution. As a result, certain memory blocks can be classified as persistent within a local scope of program execution (even though they cannot be guaranteed to be persistent globally).
Thirdly, we utilize scope-aware persistence while computing the number of data cache misses. In original persistence analysis, a data reference is classified as globally persistent throughout the program execution. However, our persistence analysis framework can guarantee that a data reference is persistent within certain temporal and static scopes.
Last but not least, we have integrated our proposed framework into the open-source Chronos WCET analyzer ([13]). The experimental results show that our proposed scope-aware persistence analysis produces up to 74% tighter WCET estimates compared to the multi-level persistence analysis framework without temporal scope information [2].
II. RELATED WORK
Abstract interpretation methods have been successfully applied to instruction cache analysis for WCET estimation [18], [2]. A globally defined abstract cache state (ACS) is calculated via fixed-point computation, which conservatively captures the worst-case cache behavior at each program point (e.g., basic block boundary). However, existing approaches using abstract interpretation for data cache analysis (e.g., must analysis [16] and persistence analysis [9]) suffer from significant over-estimation. The major source of the over-estimation is that the definition and computation of the ACS are insensitive to local program behavior. In particular, an array reference may access different memory blocks in different loop iterations, which must be captured in the analysis for a tight estimation. To overcome this problem, Sen and Srikant [16] propose virtual loop unrolling, which makes the analysis computationally expensive. Moreover, in the presence of input-dependent branches, even with loop unrolling, must analysis cannot guarantee that any memory block is loaded into the cache for later reuse. Lesage, Hardy and Puaut [12] apply persistence analysis to multi-level data caches.
In many real programs the access pattern of an array follows a uniform affine pattern. The cache miss equation (CME) framework [10] and the Presburger Arithmetic formulation [4] have been applied to analyze array access patterns for data cache analysis. The CME framework computes the reuse vector of affine accesses and generates a set of Diophantine equations to characterize whether a reuse can be realized or is interfered with due to cache conflict. The solutions of this equation set are the possible conflict points. White et al. [19] propose a framework to detect loop-affine array accesses at the binary code level. Ramaprasad and Mueller [15] extend the CME framework to analyze scalar accesses and more general loop nests. Data cache analysis with the Presburger Arithmetic framework is exact and can handle certain non-linear access patterns; however, it has superexponential complexity in the worst case. Furthermore, these approaches cannot handle programs with input-dependent branches and unpredictable data accesses. It is also hard to combine such frameworks into a comprehensive WCET analysis that considers other micro-architectural features, such as instruction cache [18] or unified cache analysis [5].
Staschulat and Ernst [17] identify single data sequences (SDS), where both control flow and accessed memory blocks are input independent. In such cases, cache performance can be determined by simple simulation and no analysis is needed. For non-SDS data references, persistence analysis is used to bound the worst-case cache conflicts. Similar to [9], this persistence analysis does not capture array access patterns and leads to very pessimistic analysis results.
III. NOTATIONS AND ASSUMPTIONS
In our cache analysis, we consider a memory hierarchy containing separate L1 instruction and data caches. We use the following notations to represent the instruction/data cache configuration and accessibility.
• Capacity C: size of the cache in number of bytes.
• Block (line) size B: number of contiguous bytes to be loaded from memory to cache on each memory access.
• Associativity A: A-way set associative cache means that information stored at some addresses in memory could be loaded into any of A locations in the cache (depends on the cache replacement policy).
• Cache set $F = f_1, \ldots, f_{(C/B)/A}$: A cache set $f_i$ is a sequence of cache blocks (lines) $CL = l_1, \ldots, l_A$ which contains all the A ways that can be addressed with the same index. set(m) returns the cache set that memory block m maps to.
We assume that the LRU (Least Recently Used) replacement policy is used to determine the relative age of a memory block in the A-way associative cache set. Among common cache replacement policies, LRU is the most predictable and is thus more suitable for safety-critical real-time systems [6]. Given a concrete cache state c at a program point p, the concrete set state $s_i$ describes the state of cache set c[
We assume a write-through, no-write-allocate policy for memory store instructions in our discussion of data cache analysis. However, our data cache analysis framework is applicable to different write policies with minor amendments to the analysis (discussed in Section VI-B). We consider the static and temporal scope information of data references at the assembly code level in our analysis. Finally, we would like to clarify that our proposed persistence analysis (Section VI) is "multi-level" in the sense that an independent analysis is performed at each loop nesting level (also referred to as the static scope), which should not be confused with the analysis of multi-level caches (e.g., L2 and L3 caches).
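As a small illustration of the notation in this section, the following sketch derives the number of cache sets from C, B and A and maps an address to its memory block and cache set. The names `CacheConfig`, `memory_block` and `cache_set` are hypothetical, and the modulo placement used for set(m) is an assumption (it is the conventional mapping, but it is not spelled out in the text above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    capacity: int       # C: cache size in bytes
    block_size: int     # B: bytes per cache block (line)
    associativity: int  # A: number of ways per cache set

    @property
    def num_sets(self) -> int:
        # Number of cache sets: (C / B) / A
        return self.capacity // self.block_size // self.associativity


def memory_block(address: int, cfg: CacheConfig) -> int:
    # A memory block groups B contiguous bytes.
    return address // cfg.block_size


def cache_set(block: int, cfg: CacheConfig) -> int:
    # set(m): assumed modulo placement of memory blocks onto cache sets.
    return block % cfg.num_sets


# Configuration matching the one reported in the experimental section:
# 2 KB capacity, 32-byte blocks, 2-way set associative.
cfg = CacheConfig(capacity=2048, block_size=32, associativity=2)
```

With this configuration the cache has 32 sets, so two memory blocks whose indices differ by a multiple of 32 compete for the same two ways.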
IV. PERSISTENCE ANALYSIS IN [8] AND [9]
In this section, we briefly discuss the safety issue and pessimism of the original persistence analysis in [8], [9]. The detailed description and proofs on how to fix the underestimation error in the original persistence analysis can be found in the technical report [11].
A. Overview
In persistence analysis, a memory block m is guaranteed to be persistent if no other memory reference can evict m from the cache during program execution. Therefore, m incurs one cache miss when it is first accessed, and all future accesses to it are guaranteed cache hits. In comparison with reuse-based approaches such as CME [10] and must analysis [16], persistence analysis does not require first bringing memory blocks into the cache for subsequent reuse. Hence, it does not require a detailed access sequence analysis. Moreover, it can guarantee cache hits in the presence of input-dependent branches and unpredictable access addresses.
Persistence analysis is based on a fixed-point computation of the abstract cache state (ACS) $\hat{c} = \hat{s}_1, \ldots, \hat{s}_{n/A}$ for each program execution point, where $\hat{s}_i$ is the abstract set state for cache set $f_i$. Under the LRU replacement policy, the abstract set state captures the upper bound of the positions (the relative ages) of the memory blocks that can possibly reside in the corresponding concrete cache set. An abstract line state $\hat{s}_i(l_a)$ contains the memory blocks that have a maximal relative age of a in the abstract set state $\hat{s}_i$, where 1 ≤ a ≤ A. For example, $\hat{s}_i(l_2) = \{m_a\}$ denotes that memory block $m_a$ could reside in cache set $f_i$ with a maximal relative age of 2 at a program execution point. Furthermore, an additional abstract line state $\hat{s}_i(l_\top)$ is introduced in each abstract set state to keep track of memory blocks that have been referenced before but evicted from the cache by later memory references.
The analysis traverses the program's control flow graph (CFG) and manipulates the ACSs via update and join functions. The update function takes an input ACS $\hat{c}_{in}$ and a set of memory blocks M possibly accessed at the current program location, and produces an output ACS $\hat{c}_{out}$ which captures the worst-case cache behavior (maximal relative ages if LRU is used) due to the accesses in M. If a program point has more than one incoming edge in the CFG, the join function is applied to compute the input ACS of this point by combining the output ACSs of all its predecessors. In the original persistence analysis, the relative age of a memory block in the joined ACS is set to the maximum relative age of all its occurrences in the predecessors' ACSs.
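To make the notion of relative age concrete, the sketch below models one concrete LRU cache set and the max-age join described above. It is a simplified illustration rather than the update function of [8], [9]: the representation (a dictionary from block name to age) and the function names `lru_access` and `join_max_age` are assumptions made for this example.

```python
def lru_access(ages: dict, block: str, assoc: int) -> dict:
    """Access `block` in a concrete LRU set.

    `ages` maps each cached block to its relative age (1 = most recent).
    Blocks whose age would exceed `assoc` are evicted.
    """
    # A block that is not cached behaves as if it were older than every line.
    old_age = ages.get(block, assoc + 1)
    updated = {}
    for other, age in ages.items():
        if other == block:
            continue
        # Only blocks younger than the accessed block grow older by one.
        new_age = age + 1 if age < old_age else age
        if new_age <= assoc:
            updated[other] = new_age
    updated[block] = 1
    return updated


def join_max_age(acs1: dict, acs2: dict) -> dict:
    """Join two abstract set states by keeping the maximum age bound per block."""
    joined = dict(acs1)
    for block, age in acs2.items():
        joined[block] = max(joined.get(block, 0), age)
    return joined


# Access sequence a, b, a, c on a 2-way set: c evicts b, the least recently used.
state = {}
for blk in ["a", "b", "a", "c"]:
    state = lru_access(state, blk, assoc=2)
```

The join keeps, for every block, the oldest age seen on any incoming path, which is the behavior the text attributes to the original persistence analysis.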
B. Safety Issue
We first discuss and fix the safety issue of the original persistence analysis in Sections IV-B and IV-C. ... Therefore, the update function will not age memory blocks whose maximum relative age is equal to or older than that of b, such as a and c in the ACS. However, when b is in $\hat{s}^{in}_{B5}$, there may exist concrete set states that do not contain b (e.g., only a and c are in the concrete cache state corresponding to the path B0 → B2 → B4). In these concrete set states, an access to b will increase the relative ages of a and c. Therefore, the original persistence analysis may underestimate the relative ages of a and c. A more detailed discussion is presented in the technical report [11].
C. Fixing the Persistence Analysis
To fix the underestimation in the original persistence analysis, we propose to keep track of the memory blocks that may be younger (more recently accessed) than m for each memory block m during the analysis. We define the Younger Set (YS) as follows.
Definition 1: (Younger Set): For an abstract set state $\hat{s}$ at program point p, the younger set $YS(\hat{s}, m)$ of m captures a superset of all memory blocks that may have smaller relative ages (i.e., may be younger) than m at p in some possible program execution that reaches p.
In our revised persistence analysis, we maintain YS for all memory blocks at each program execution point. For an access of memory block m and corresponding abstract , c) = {a, b}, since both a and b are accessed (in basic block B1 and B3, respectively) after an access of c (in B2 of the previous iteration(s)). As a result, our revised persistence analysis correctly captures the scenario that memory block c could be possibly evicted during the program execution.
The details of our revised persistence analysis, and its safety proofs are presented in the technical report [11] .
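The younger-set idea of Definition 1 can be sketched as follows. This is a deliberately simplified model, not the update and join functions of the technical report [11]: it assumes every access touches exactly one known block, it assumes that an access to a block makes it potentially younger than every other tracked block, and the names `ys_update`, `ys_join` and `may_be_evicted` are hypothetical.

```python
def ys_update(ys: dict, accessed: str) -> dict:
    """Record an access: `accessed` may now be younger than every other block."""
    out = {blk: set(younger) for blk, younger in ys.items()}
    for blk in out:
        if blk != accessed:
            out[blk].add(accessed)
    out[accessed] = set()  # nothing is younger than the block just accessed
    return out


def ys_join(ys1: dict, ys2: dict) -> dict:
    """Join two states by taking the union of the younger sets per block."""
    return {blk: ys1.get(blk, set()) | ys2.get(blk, set())
            for blk in ys1.keys() | ys2.keys()}


def may_be_evicted(ys: dict, block: str, assoc: int) -> bool:
    # The maximal relative age is bounded by |YS| + 1.
    return len(ys[block]) + 1 > assoc


# c is accessed, then a on one path and b on another; the paths are joined.
base = ys_update({}, "c")
joined = ys_join(ys_update(base, "a"), ys_update(base, "b"))
```

In this toy scenario the younger set of c after the join is {a, b}, so with associativity 2 the sketch reports that c may be evicted, even though each individual path contains only one conflicting access.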
D. Pessimism in Data Cache Persistence Analysis
While our revised persistence analysis fixes the underestimation error, it still suffers from the same over-estimation issues as the original persistence analysis in the context of data cache analysis. Figure 2(a) presents our motivating example, which has four array references in two nested loops L1 (induction variable i) and L2 (induction variable j). We assume a data cache with a block size of 32 bytes (each block contains 8 'int' elements or 16 'short int' elements), four cache sets $f_0 \ldots f_3$, and associativity A = 2. Neither the CME frameworks [10], [15] nor must analysis [16] works well for this example, due to the input-dependent accesses (in basic block B2) and branches (in basic block B5). Furthermore, traditional abstract interpretation-based analysis techniques ([9], [16]) capture only global properties of memory accesses, which may lead to significant over-estimation. In the given example, since more than 2 (the associativity) memory blocks are mapped to each cache set within the outer loop L1 (Figure 2(d)), the original persistence analysis cannot guarantee any cache hits, leading to a large over-estimation.
V. SCOPE-AWARE ADDRESS ANALYSIS
Central to our scope-aware data cache analysis is the notion of temporal scope that characterizes the behavior of a data reference over different loop iterations. Furthermore, we parameterize the definition and operations of temporal scopes with the static scope information on loop nesting. We will discuss how our proposed persistence analysis can utilize such information for more accurate abstract domain construction in Section VI.
Definition 2: (Temporal scope) A temporal scope for memory block m which is possibly accessed by a data reference D is defined as
where reside(D) is the set of loops in which D resides. For each of such loops
For a data reference D, address analysis calculates the set of memory blocks possibly accessed by D. We follow the register expansion framework in [19] to identify the address expression for each data reference at the binary-code level. For each register used to specify the address of a load/store instruction, we recursively perform register expansion to trace the source registers and the computation performed, until it traces back to a defined constant c, an unpredictable value ⊥, or a loop induction variable V. Readers are referred to [19] for details of address expression detection.
Given the address expression of a data reference D, the set of possibly accessed memory blocks and their corresponding temporal scopes are automatically derived as follows.
• In case the address expression is a constant, it corresponds to a scalar access to a fixed memory block. The same memory block is accessed in every loop iteration, so its temporal scope covers all iterations. In Figure 3
where outer(L) is the immediate outer loop of L. Thus, two temporal scopes overlap at loop level L only if the access intervals for L and all outer loops containing L are not mutually exclusive.
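A minimal sketch of temporal scopes and of the overlap test described above, under several assumptions that are not taken from the paper: a temporal scope is represented as a dictionary from loop name to an inclusive iteration interval (lw, up), iterations are counted from 0, a loop missing from a scope is conservatively treated as overlapping, and the helper names `affine_scopes` and `overlap` are hypothetical. The first function only handles a single loop with an affine address `base + stride * i`.

```python
def affine_scopes(base, stride, n_iters, block_size, loop="L1"):
    """Temporal scopes of the blocks touched by address base + stride * i."""
    intervals = {}
    for i in range(n_iters):
        block = (base + stride * i) // block_size
        lw, up = intervals.get(block, (i, i))
        intervals[block] = (min(lw, i), max(up, i))
    return {block: {loop: iv} for block, iv in intervals.items()}


def overlap(scope_a, scope_b, loop, outer):
    """True unless some interval at `loop` or an enclosing loop is disjoint.

    `outer` maps each loop to its immediate outer loop (None for the outermost).
    """
    level = loop
    while level is not None:
        a, b = scope_a.get(level), scope_b.get(level)
        # Missing interval: assume the block may be accessed in any iteration.
        if a is not None and b is not None:
            if a[1] < b[0] or b[1] < a[0]:
                return False  # mutually exclusive iteration intervals
        level = outer.get(level)
    return True


# 16 iterations over 4-byte elements with 32-byte blocks:
# block 0 is accessed in iterations 0..7, block 1 in iterations 8..15.
scopes = affine_scopes(base=0, stride=4, n_iters=16, block_size=32)
nesting = {"L2": "L1", "L1": None}
m1 = {"L1": (0, 3), "L2": (0, 7)}
m2 = {"L1": (4, 7), "L2": (0, 7)}
m3 = {"L1": (2, 5), "L2": (8, 15)}
```

Here `m1` and `m2` never overlap because their L1 intervals are disjoint, while `m1` and `m3` overlap at level L1 but not at level L2.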
In Figure 3 
VI. PROPOSED DATA CACHE ANALYSIS
To reduce the pessimism of the original data cache persistence analysis as discussed in Section IV-D, we integrate access pattern analysis into the abstract interpretation framework for accurate WCET analysis. We extend the definition of memory block persistence in [9]. In our analysis, we capture memory block persistence at different loop nesting levels of the program execution (the static scopes), and utilize the computed temporal scope information for a scope-aware analysis. Our framework is built on our corrected version of persistence analysis as described in Section IV-C. The soundness proofs are presented in the technical report [11].
A. Scope-aware Persistence Analysis
The core idea of our scope-aware persistence analysis is to categorize the persistence of memory blocks within the calculated temporal scopes (Section V), instead of the globally defined persistence in [9]. For a data reference D, the temporal scope $m_D$ identifies a set of loops (in which D resides) and, for each of these loops, a loop iteration interval in which D may access m. The scope-aware analysis approach allows us to integrate access patterns into the abstract interpretation framework and determine the local behavior of the data cache. In particular, our scope-aware persistence analysis computes memory block persistence within its temporal scope for each static scope (loop hierarchy) in which it may get accessed.
Given the above definition of scope persistence, for a memory block m possibly referenced by data access D to be persistent within loop L, it does not need to stay in the cache for all iterations of L. If m is not evicted from the cache during the iterations $[m_D[L].lw, m_D[L].up]$, all accesses to m from D cause at most one cache miss (the cold miss) within one complete execution of L. To capture scope persistence in the abstract domain of the persistence analysis framework, we define our scope-aware abstract set state and abstract cache state as follows.
Definition 4: (Scope-aware abstract set state) An abstract set state $\hat{s}: \{l_1, \ldots, l_A\} \cup \{l_\top\} \to 2^{TS}$ maps cache lines (including the evicted line $l_\top$) to sets of temporal scopes, where $TS$ is the set of all temporal scopes (refer to Figure 4(c) for an example). $\hat{S}$ denotes the set of all abstract set states.
We have re-designed the persistence analysis framework to utilize the scope information. By capturing fine-grained persistence properties, our analysis can accurately model the local behavior of the data cache for WCET estimation.
B. Overall Framework
As described in Section IV-D, a memory block m could be persistent in the inner loop but not in the outer loop (e.g., $m_5$ is persistent in L2 but non-persistent in L1 in the example given in Figure 2). We adopt the multi-level persistence framework from [2] for instruction cache analysis, and extend it for our data cache analysis. As shown in Figure 4(a), for each loop L, we perform a separate persistence analysis on the CFG fragment within L, with the empty initial ACS $\hat{c}^{in}_{L_{entry}}[L] = \bot$ as the input ACS of L's entry node $L_{entry}$. Consequently, the analysis considers only paths and data accesses within L. As a result, we can determine the local persistence of a memory block at different loop levels. In Figure 4 we show the estimation results of our analysis for the motivating example presented in Figure 2; a detailed discussion is given in Section VI-D.
Algorithm 1 MPA(L): Multi-level Persistence Analysis Algorithm. L denotes a loop (or the main procedure) under analysis.
 ...
 8:   for each data reference D in n do
 ...
10:   end for
11:   Queue.insert({n' | n' ∈ Succ(n) ∧ n' ∈ L});
12: end while
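The worklist iteration of Algorithm 1 can be sketched as follows. This is a hedged reconstruction of the general shape only, since several lines of the listing are not available here: it assumes that successors are re-queued only when a node's output ACS changes, it reuses a simplified younger-set state in which every data reference accesses one known block (no temporal scopes), and the names `mpa`, `ys_update` and `ys_join` as well as the three-block loop are hypothetical.

```python
from collections import deque


def ys_update(state, block):
    # An access makes `block` potentially younger than every other block.
    out = {b: set(s) | ({block} if b != block else set())
           for b, s in state.items()}
    out[block] = set()
    return out


def ys_join(s1, s2):
    # Union of younger sets per block.
    return {b: s1.get(b, set()) | s2.get(b, set())
            for b in s1.keys() | s2.keys()}


def mpa(entry, nodes, succ, accesses):
    """Fixed-point computation of input/output ACSs for the nodes of one loop."""
    pred = {n: [p for p in nodes if n in succ.get(p, [])] for n in nodes}
    acs_in, acs_out = {}, {}
    acs_in[entry] = {}            # empty initial ACS at the loop entry
    queue = deque([entry])
    while queue:
        n = queue.popleft()
        state = {}                # the empty state is the identity of the join
        for p in pred[n]:
            if p in acs_out:
                state = ys_join(state, acs_out[p])
        acs_in[n] = state
        for block in accesses.get(n, []):   # one update per data reference
            state = ys_update(state, block)
        if acs_out.get(n) != state:         # assumed: re-queue only on change
            acs_out[n] = state
            queue.extend(s for s in succ.get(n, []) if s in nodes)
    return acs_in, acs_out


# Hypothetical loop B1 -> B2 -> B3 -> B1 accessing blocks a, c, b in turn.
nodes = {"B1", "B2", "B3"}
succ = {"B1": ["B2"], "B2": ["B3"], "B3": ["B1"]}
accesses = {"B1": ["a"], "B2": ["c"], "B3": ["b"]}
acs_in, acs_out = mpa("B1", nodes, succ, accesses)
```

Restricting `pred` and `succ` to the nodes of the loop under analysis is what makes the result local to that static scope: paths and accesses outside the loop never enter the state.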
Algorithm 1 describes the multi-level persistence analysis framework, which captures the static scope (loop nesting level). $\hat{c}^{in}_n[L]$ and $\hat{c}^{out}_n[L]$ denote the input and output ACSs of a node n for analysis at loop level L. Pred(n) and Succ(n) refer to the sets of predecessors and successors of n within the CFG of the loop L currently being analyzed. We perform a standard fixed-point computation of the ACSs. The analysis initializes the input ACS of the loop entry node $L_{entry}$ to empty (line 1) and inserts it into the processing queue Queue (line 2). For each node n, we compute the input ACS $\hat{c}^{in}_n[L]$ from the output ACSs of its predecessors using the join function (line 5), and then apply the update function for each data reference D in n.
In case no-write-allocate is used (with either a write-through or a write-back policy), a store instruction does not modify the cache state, and we consider only load instructions in the cache analysis. Otherwise, for a write-allocate policy, all load and store instructions are considered in the ACS calculation. Finally, all successors of n within L are inserted into Queue to capture the possible changes in $\hat{c}^{out}_n[L]$.
At loop level L, the scope-aware update function for a given input ACS $\hat{c}$ and set of temporal scopes $TS_D$ accessed by D can be defined as:
The scope-aware update function $\hat{U}_{\hat{C}}$ divides the accessed temporal scopes $\{m_1, \ldots, m_k\}$ into $X_{f_i}$, the set of accessed temporal scopes for each cache set $f_i$. For each input abstract set state $\hat{s}_{in}$, the set update function $\hat{U}_{\hat{S}}$ computes the output abstract set state $\hat{s}_{out}$ by updating the younger set and the maximal relative age of each temporal scope $m \in (\hat{s}_{in} \cup X_{f_i})$
where $overlap(m, m_a, L)$ is true when the temporal scopes $m$ and $m_a$ overlap at loop level L according to Equation 1. In our set update function, the maximal relative age of a memory block in the output abstract set state is set to one more than the number of its possible younger memory blocks, i.e., $|YS(\hat{s}_{out}, m)| + 1$. To find the younger set $YS(\hat{s}_{out}, m)$, we distinguish the following situations.
• If temporal scope m is not in $\hat{s}_{in}$ and m is newly accessed in $X_{f_i}$, m has no younger memory block and its maximal relative age is set to 1.
• Else, if m is in $\hat{s}_{in}$ and is also accessed in $X_{f_i}$: if no other temporal scope in $TS_D$ overlaps with m, then the data reference D accesses only m in the temporal scope defined by m. As a result, data reference D must renew the relative age of m in $\hat{s}_{in}$, and we can set its younger set to be empty.
• Otherwise, the relative age of a memory block m can be interfered with by any memory block $m_a$ accessed by D that maps to the same cache set, where the temporal scopes of m and $m_a$ overlap at loop level L (according to Equation 1). We add all such memory blocks $m_a$ to the younger set $YS(\hat{s}_{out}, m)$.
At any program point p in loop level L, the join function $J_{\hat{C}}$ (line 5 in Algorithm 1) computes an ACS from the output ACSs of all of p's control flow predecessors. For each temporal scope m in $\hat{c}$, the scope-aware join function takes the union of the younger sets of m in the output ACSs of the control flow predecessors to form the younger set of m at p. Therefore, $YS(\hat{s}, m)$ always contains all possible younger memory blocks of m in scope m at p. Formally, our scope-aware join function is defined as follows.
$J_{\hat{S}}(\hat{s}_1, \hat{s}_2) = \hat{s}$ with:
D. ACS Computation of the Motivating Example
Figure 4(b), (c) and (d) show the fixed-point ACSs computed by the original persistence analysis (at basic block B4, exit of L1) and by the proposed scope-aware multi-level analysis for loop L1 (at basic block B4) and L2 (at basic block B8) for the motivating example in Figure 2.
In the presence of a data cache, different executions of the same data reference may access various memory blocks and result in different cache behavior. In our motivating example shown in Figure 2
blockM iss(D, m) =            (m[L i ].up − m[L i ].lw + 1) ∀L i ∈ reside(D), if L ps == ∅ 1 if outer(L ps ) == ∅ (m[L i ].up − m[L i ].lw + 1) ∀L i ∈ outer(L ps ),
VIII. EXPERIMENTAL RESULTS
In this section, we evaluate the performance of our proposed scope-based persistence analysis using the data-intensive routines taken from the WCET Benchmarks ([1]). We assume the benchmarks are executed on a processor architecture with a 5-stage pipeline, in-order execution, perfect branch prediction, and separate L1 instruction and data caches. Both the instruction and data caches have a cache size of 2 KB, a block size of 32 B, a cache associativity of 2, and a perfect LRU replacement policy. Cache hit latency is 1 cycle, and cache miss latency is 6 cycles. We use the SimpleScalar tool ([3]) to obtain simulation results. We extend the SimpleScalar simulation to make it consistent with the assumptions made in our analysis. The cache analysis results on the maximum number of data cache misses for each data reference are integrated as linear constraints into Chronos ([13]), an ILP-based tool for static WCET estimation. In our current implementation, we assume a processor architecture without timing anomalies [7]. However, the resulting cache model can be integrated with pipeline analysis as presented in [14] for architectures with timing anomalies.
Table I shows the set of benchmarks used in our evaluation. We have enlarged the array sizes (and corresponding loop bounds) to introduce more data cache conflicts and amplify the effect of data cache performance on overall program execution time. Array Size shows the array size used in our simulation and analysis for each of the benchmarks. Simulation shows the observed WCET from SimpleScalar simulation in CPU clock cycles. Note that the simulation results may be smaller than the actual WCET values for benchmarks with input-dependent branches/accesses (e.g., Cnt, Bsort100, InsertSort and Adpcm). Finally, we report the WCET results obtained with our scope-aware persistence data cache analysis, as well as the time spent on the analysis (on an Intel(R) Xeon(TM) 2.20 GHz with 2.5 GB RAM).
We have implemented the revised persistence analysis (Section IV-C), the multi-level persistence framework [2] (using the revised persistence analysis), and the must analysis with loop unrolling as proposed in [16] to compare with our proposed scope-aware analysis. Figure 7 shows the percentage of over-estimation from various data cache analysis approaches, compared to the normalized observed WCET results from SimpleScalar simulation (shown in Table I). Given the array sizes in our experiment, since the entire array does not fit into the data cache for any of the benchmarks, no memory block can be categorized as persistent in the persistence analysis. Without the temporal scope information, multi-level persistence analysis [2] cannot give tighter estimates, except for the Lms benchmark, where only small arrays are accessed in different loop nesting levels. As a result, the estimated WCET results without temporal scope are up to 83% higher than the observed WCET (for InsertSort). We also compare the estimated WCET results using must analysis with 20% and 50% virtual unrolling of the loop nest ([16]), where the analysis is repeatedly performed for each unrolled loop iteration. As shown in Figure 7, even when 50% of the loop nest is unrolled, must analysis [16] still reports up to 65% higher WCET estimates compared to the observed simulation time (for Adpcm). Must analysis requires loop unrolling to bring memory blocks into the data cache and to capture subsequent reuse. Therefore, in the remaining portion of the loop nest where unrolling is not applied, it cannot capture any cache reuse.
On the other hand, our proposed analysis always obtains tighter WCET estimates compared to existing approaches. In most of the benchmarks, our WCET estimates are less than 10% higher than the simulation results (except for Matmult and Adpcm). We observe that many data references in these benchmarks have sequential array access patterns. They traverse array elements in sequential order, according to the row-major arrangement of array in the memory. Our scope-aware approach fully captures the temporal locality of such data accesses to bound the worst-case data cache performance. Our proposed analysis achieves 5% to 74% tighter WCET estimates compared to the original persistence analysis without temporal scope information, and 5% to 35% compared to must analysis with 50% unrolling.
Matmult contains a column array access in addition to sequential array accesses. In our analysis, a temporal scope captures the lower and upper bound of the loop iterations in which a memory block may get accessed. For column array access, array elements contained in a single memory block are usually accessed in non-contiguous loop iterations, which leads to over-estimation in the computed temporal scopes. However, as shown in Figure 7, our estimated WCET is only 18% higher than the observed WCET, and is 17% to 46% tighter than other approaches.
Figure 7. WCET estimation results from different analyses (legend: Simulation Result; Persistence Analysis [9]; Multi-level PS Analysis [2]; Must Analysis [14] (20% unrolling); Must Analysis [14] (50% unrolling); Our Analysis).
Adpcm is a complex benchmark with input-dependent branches and accesses, so our simulation result may underestimate the real WCET. Due to the presence of input-dependent branches and accesses, must analysis cannot guarantee that a memory block is loaded into the cache for subsequent reuse, even with unrolling. In our scope-aware persistence analysis, by guaranteeing the scope persistence of memory blocks, we can achieve a 30% tighter WCET estimate compared to must analysis (with 50% loop unrolling).
IX. CONCLUDING REMARKS
In this paper, we have presented a novel data cache modeling approach for static WCET analysis. Our analysis effectively exploits regular data access patterns, while retaining the strength and applicability of the abstract interpretation approach. We define temporal scopes to capture the local behavior of memory references (when a particular memory block is accessed). These temporal scopes are automatically calculated during address analysis.
Our scope-aware multi-level data cache analysis extends the cache persistence analysis framework to compute fine-grained scope-based persistence information, which leads to substantially tighter worst-case cache miss estimation. While we have presented our analysis for the LRU cache replacement policy, it can also be extended to handle other deterministic cache replacement policies like FIFO and MRU. In particular, the abstract cache update function has to be changed to cope with the chosen replacement policy. Finally, the proposed analysis has been integrated into the open-source Chronos WCET analyzer ([13], version 4.1).
