Cache memory is used in almost all computer systems today to bridge the ever-increasing speed gap between the processor and main memory. However, its use in multitasking computer systems introduces additional preemption delay due to the reloading of memory blocks that were replaced during preemption. This cache-related preemption delay poses a serious problem in real-time computing systems, where predictability is of utmost importance. In this paper, we propose an enhanced technique for analyzing, and thus bounding, the cache-related preemption delay in fixed-priority preemptive scheduling, focusing on instruction caching. The proposed technique improves upon previous techniques in two important ways. First, the technique takes into account the relationship between a preempted task and the set of tasks that execute during the preemption when calculating the cache-related preemption delay. Second, the technique considers the phasing of tasks to eliminate many infeasible task interactions. These two features are expressed as constraints of a linear programming problem whose solution gives a guaranteed upper bound on the cache-related preemption delay. This paper also compares the proposed technique with previous techniques. The results show that the proposed technique gives up to 60% tighter prediction of the worst case response time than the previous techniques.
I. Introduction
In a real-time computing system, tasks have timing constraints in terms of deadlines that must be met for correct operation. To guarantee such timing constraints, extensive research has been performed on schedulability analysis [1, 2, 3, 4, 5, 6]. In these studies, various assumptions are usually made to simplify the analysis. One such simplifying assumption is that the cost of task preemption is zero. This assumption, however, does not hold in general in actual systems, which invalidates the result of the schedulability analysis. For example, task preemption incurs costs to process interrupts [7, 8, 9, 10], to manipulate task queues [7, 8, 10], and to actually perform context switches [8, 10]. Many of these direct costs are addressed in a number of recent studies on schedulability analysis that focus on practical aspects of task scheduling [7, 8, 9, 10].
In addition to the direct costs, task preemption introduces a form of indirect cost due to cache memory, which is used in almost all computer systems today. In computer systems with cache memory, when a task is preempted, a large number of memory blocks¹ belonging to the task are displaced from the cache memory between the time the task is preempted and the time the task resumes execution. When the task resumes execution, it spends a substantial amount of its execution time reloading the cache with the memory blocks that were displaced during preemption. Such cache reloading greatly increases preemption delay, which may invalidate the result of a schedulability analysis that overlooks this indirect cost.
There are two ways to address the unpredictability resulting from the above cache-related preemption delay. The first way is to use cache partitioning, where cache memory is divided into disjoint partitions and one or more partitions are dedicated to each real-time task [12, 13, 14, 15]. In the cache partitioning techniques, each task is allowed to access only its own partition, and thus cache-related preemption delay is avoided. However, cache partitioning has a number of drawbacks. One drawback is that it requires modification of existing hardware, software, or both. Another drawback is that it limits the amount of cache memory that can be used by individual tasks.

¹ A block is the minimum unit of information that can be either present or not present in the cache-main memory hierarchy [11]. We assume without loss of generality that memory references are made in block units.
The second way to address the unpredictability resulting from the cache-related preemption delay is to take its effects into account in the schedulability analysis. In [16], Basumallick and Nilsen propose one such technique. The technique uses the following schedulability condition for a set of $n$ tasks, which extends Liu and Layland's well-known schedulability condition [4]:
$$U = \sum_{i=1}^{n} \frac{C_i + \gamma_i}{T_i} \le n\left(2^{1/n} - 1\right)$$

In the condition, $U$ is the total utilization of the task set, and $C_i$ and $T_i$ are the worst case execution time (WCET) and period of $\tau_i$, respectively². The additional term $\gamma_i$ is an upper bound on the cache-related preemption cost that $\tau_i$ imposes on preempted tasks.
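As an illustration, the condition can be evaluated in a few lines of Python. This is a minimal sketch, assuming each task is given as a tuple $(C_i, T_i, \gamma_i)$ with $\gamma_i$ already computed; the function name `utilization_test` and the $\gamma_i = 2$ values in the usage example are hypothetical, chosen only for illustration.

```python
def utilization_test(tasks):
    """Utilization test: sum((C_i + gamma_i) / T_i) <= n * (2^(1/n) - 1).

    tasks: list of (C_i, T_i, gamma_i) tuples.
    Returns (U, bound, passes).
    """
    n = len(tasks)
    utilization = sum((c + gamma) / t for c, t, gamma in tasks)
    bound = n * (2 ** (1 / n) - 1)  # Liu and Layland bound; tends to ln 2 ~ 0.693
    return utilization, bound, utilization <= bound


if __name__ == "__main__":
    # C_i and T_i from the Fig. 1 example; gamma_i = 2 is a hypothetical cost.
    tasks = [(10, 40, 2), (20, 60, 2), (20, 120, 2)]
    u, bound, ok = utilization_test(tasks)
    print(f"U = {u:.3f}, bound = {bound:.3f}, schedulable = {ok}")
```

With these hypothetical costs, the test reports $U = 0.850$ against a bound of about $0.780$ and rejects the task set; with every $\gamma_i = 0$, the same tasks have $U = 0.75$ and pass.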
One drawback of this technique is that it suffers from a pessimistic utilization bound, which approaches 0.693 for large $n$ [4]. Many task sets that have total utilization higher than this bound can be successfully […]

$$R_i = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil (C_j + \gamma_j)$$

where $R_i$ is the worst case response time of $\tau_i$ and $hp(i)$ the set of tasks whose priorities are higher than that of $\tau_i$. This recursive equation can be solved iteratively, and the resulting worst case response time $R_i$ of task $\tau_i$ is compared against its deadline $D_i$ to determine schedulability.

² These notations will be used throughout this paper along with $D_i$, which denotes the deadline of $\tau_i$, where $D_i \le T_i$. We assume without loss of generality that $\tau_i$ has higher priority than $\tau_j$ if $i < j$.

The $\gamma_i$ term used in both techniques is computed by multiplying the number of cache blocks used by task $\tau_i$ by the time needed to refill a cache block. This estimation is based on the pessimistic assumption that each cache block used by $\tau_i$ replaces from the cache a memory block that is needed by a preempted task. This assumption leads to overestimation of the cache-related preemption delay, since the replaced memory block may be one that is no longer needed, or one that would be replaced without being re-referenced even if there were no preemptions.
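The iterative solution mentioned above can be sketched as a fixed-point loop. This is a minimal sketch, assuming the recurrence $R_i = C_i + \sum_{j \in hp(i)} \lceil R_i/T_j \rceil (C_j + \gamma_j)$ and tasks listed in priority order as hypothetical `(C, T, D, gamma)` tuples; the function name is illustrative, not taken from the cited work.

```python
from math import ceil


def response_time(i, tasks):
    """Iteratively solve R_i = C_i + sum_{j in hp(i)} ceil(R_i / T_j) * (C_j + gamma_j).

    tasks: list of (C, T, D, gamma) tuples in priority order (index 0 = highest).
    Returns the converged R_i, or None as soon as an iterate exceeds D_i.
    """
    c_i, _, d_i, _ = tasks[i]
    higher = tasks[:i]
    # Start from C_i plus one release of every higher priority task (a safe lower bound).
    r = c_i + sum(c + gamma for c, _, _, gamma in higher)
    while True:
        nxt = c_i + sum(ceil(r / t) * (c + gamma) for c, t, _, gamma in higher)
        if nxt > d_i:
            return None  # deadline miss: not schedulable by this test
        if nxt == r:
            return r  # fixed point reached
        r = nxt


if __name__ == "__main__":
    # Fig. 1 task parameters with a hypothetical gamma_j = 2 for every task.
    tasks = [(10, 40, 40, 2), (20, 60, 60, 2), (20, 120, 120, 2)]
    print([response_time(i, tasks) for i in range(len(tasks))])
```

For these inputs, the lowest priority task converges through the iterates 54, 66, 88, and 100, giving $R_3 = 100 \le D_3 = 120$. Because each iterate is at least as large as the previous one, checking the deadline before the equality test also catches a fixed point that lies beyond $D_i$.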
The above overestimation is addressed by Lee et al. in [18]. They use the concept of useful cache blocks in computing the cache-related preemption delay, where a useful cache block is defined as a cache block that contains a memory block that may be re-referenced before being replaced by another memory block. Their technique consists of two steps. The first step analyzes each task to estimate the maximum number of useful cache blocks in the task. Based on the results of the first step, the second step computes an upper bound on the cache-related preemption delay using a linear programming technique. As in Busquets-Mataix et al.'s technique, this upper bound is incorporated into the response time equation to compute the worst case response time.
Although Lee et al.'s technique is more accurate than the techniques that do not consider the usefulness of cache blocks, it is still subject to a number of sources of overestimation. We explain these sources using the example in Fig. 1. In the example, there are three tasks, $\tau_1 = (C_1 = 10, T_1 = D_1 = 40)$, $\tau_2 = (C_2 = 20, T_2 = D_2 = 60)$, and $\tau_3 = (C_3 = 20, T_3 = D_3 = 120)$. Since we assume without loss of generality that $\tau_i$ has higher priority than $\tau_j$ if $i < j$, task $\tau_1$ has the highest priority and task $\tau_3$ the lowest priority. Suppose that the main memory regions used by the three tasks are mapped to the cache as in Fig. 1-(a). Also, suppose that the maximum numbers of useful cache blocks of $\tau_2$ and $\tau_3$ are 5 and 2, respectively, and that the time needed to refill a cache block is a single cycle. […] The above solution, however, suffers from two types of overestimation. First, when a task is preempted, not all of its useful cache blocks are replaced from the cache. For example, when $\tau_2$ is preempted by $\tau_1$, only a small portion of $\tau_2$'s useful cache blocks can be replaced from the cache, namely those that conflict with cache blocks used by $\tau_1$, i.e., the cache blocks framed by thick borders in Fig. 1-(a). Second, the worst case preemption scenario given by the solution may not be feasible in the actual execution. For example, […]
To rectify these problems, this paper proposes a novel technique that incorporates the following two important features. First, the proposed technique takes into account the relationship between a preempted task and the set of tasks that execute during the preemption when calculating the maximum number of useful cache blocks that should be reloaded after the preemption. Second, the technique considers the phasing of tasks to eliminate many infeasible task interactions. These two features are expressed as constraints of the linear programming problem whose solution bounds the cache-related preemption delay. In this paper, we focus on the cache-related preemption delay resulting from instruction caching.

[…] To compute $PC_i(R_i^k)$ at each iteration, the technique uses a two-step approach. In the first step, each task is analyzed to estimate the maximum number of useful cache blocks that the task may have during its execution. The estimation uses a data-flow analysis technique [19] that generates the following two types of information for each execution point $p$ and for each cache block $c$:
1. the set of memory blocks that may reside in the cache block $c$ at the execution point $p$, and
2. the set of memory blocks that may be the first reference to the cache block $c$ after the execution point $p$.
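The two kinds of information can be combined into per-point sets of useful cache blocks. The sketch below assumes that the combination is a set intersection, following the definition of a useful cache block given earlier (a block that may be re-referenced before being replaced). The dictionary layout and the function names are hypothetical, and the data-flow analysis of [19] that produces the inputs is not shown.

```python
def useful_cache_blocks(reaching, first_refs):
    """Return, for each execution point p, the set of cache blocks useful at p.

    reaching[p][c]:   memory blocks that may reside in cache block c at point p
    first_refs[p][c]: memory blocks that may be the first reference to c after p
    A cache block is counted as useful when some memory block is in both sets,
    i.e. it may be cached at p and re-referenced before being replaced.
    """
    return {
        p: {c for c, resident in reaching[p].items()
            if resident & first_refs[p].get(c, set())}
        for p in reaching
    }


def max_useful(reaching, first_refs):
    """Maximum number of useful cache blocks over all execution points."""
    per_point = useful_cache_blocks(reaching, first_refs)
    return max((len(blocks) for blocks in per_point.values()), default=0)


if __name__ == "__main__":
    # Hypothetical data-flow results for two execution points and two cache blocks.
    reaching = {"p0": {0: {"a"}, 1: {"b"}}, "p1": {0: {"a", "d"}, 1: {"b"}}}
    first_refs = {"p0": {0: {"a"}, 1: {"c"}}, "p1": {0: {"d"}, 1: {"b"}}}
    print(useful_cache_blocks(reaching, first_refs))
    print(max_useful(reaching, first_refs))
```

Here cache block 1 is not useful at `p0` because the block it holds (`b`) is not the next one referenced through it, so the maximum of two useful blocks occurs at `p1`.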
This total cache-related preemption delay of $\tau_i$ includes all the delay due to the preemptions of $\tau_i$ and those of higher priority tasks. Note that the highest priority task $\tau_1$ is not included in the summation, since it can never be preempted.
In general, however, the exact $g_j$ values that give the worst case preemption delay of $\tau_i$ cannot be determined. Thus, for the analysis to be safe, a scenario that is guaranteed to be worse than any actual preemption scenario should be assumed. Such a conservative scenario can be derived from the following two constraints, which any valid combination of $g_j$ values should satisfy.

First, the total number of preemptions of $\tau_2, \tau_3, \ldots, \tau_j$ during $R_i$ cannot be larger than the total number of invocations of $\tau_1, \tau_2, \ldots, \tau_{j-1}$ during $R_i$, i.e.,

$$\sum_{k=2}^{j} g_k \le \sum_{k=1}^{j-1} \left\lceil \frac{R_i}{T_k} \right\rceil, \quad j = 2, 3, \ldots, i. \qquad (1)$$

Second, the total number of preemptions of $\tau_j$ during $R_i$ cannot be larger than the number of invocations of $\tau_j$ during $R_i$ multiplied by the maximum number of times that any single invocation of $\tau_j$ can be preempted by the higher priority tasks $\tau_1, \tau_2, \ldots, \tau_{j-1}$, i.e.,

$$g_j \le \left\lceil \frac{R_i}{T_j} \right\rceil \sum_{k=1}^{j-1} \left\lceil \frac{R_j}{T_k} \right\rceil, \quad j = 2, 3, \ldots, i. \qquad (2)$$

Note that since the technique computes the worst case response times from the highest priority task to the lowest priority task, the worst case response times of $\tau_1, \tau_2, \ldots, \tau_{i-1}$, that is, $R_1, R_2, \ldots, R_{i-1}$, are available when $R_i$ is computed.

³ The technique defines a more general preemption cost $f_{i,j}$, which is the cost task $\tau_i$ pays in the worst case for its $j$-th preemption over the $(j-1)$-th preemption. However, since in most cases the execution point with the maximum total number of useful cache blocks is contained within a loop nest, the generalized preemption cost has little effect, because $f_{i,1} = f_{i,2} = \cdots = f_{i,L}$, where $L$ is the product of the iteration bounds of all the containing loops.
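To make Constraints (1) and (2) concrete, the sketch below maximizes $\sum_j g_j f_j$ under them by exhaustive search over integer $g_j$ values. This is only a sketch for tiny inputs: a real analysis would hand the same constraints to a linear programming solver such as lp_solve. The periods and the costs $f_2 = 5$ and $f_3 = 2$ follow the Fig. 1 example (useful-block counts times a one-cycle refill), while $R_1 = 10$, $R_2 = 35$, and the current estimate $R_3 = 100$ are hypothetical values; the function name is also hypothetical.

```python
from itertools import product
from math import ceil


def lee_bound(periods, resp, cost, r_i):
    """Maximize sum_j g_j * f_j subject to Constraints (1) and (2) by exhaustive search.

    Tasks are 0-indexed in priority order; the last task is the one under analysis.
    periods: period T of every task (length i)
    resp:    worst case response times R of the higher priority tasks (length i - 1)
    cost:    preemption costs f of the preemptable tasks; cost[m] belongs to task m + 1
    r_i:     current response time estimate of the task under analysis
    Returns (bound, g) where g[m] is the number of preemptions of task m + 1.
    """
    i = len(periods)
    resp_all = list(resp) + [r_i]  # R_j for every task, with R_i last
    # Constraint (2): g_j <= ceil(R_i / T_j) * sum_{k < j} ceil(R_j / T_k)
    caps = [
        ceil(r_i / periods[j]) * sum(ceil(resp_all[j] / periods[k]) for k in range(j))
        for j in range(1, i)
    ]
    # Constraint (1): preemptions of tasks 1..j bounded by invocations of tasks 0..j-1
    invocations = [sum(ceil(r_i / periods[k]) for k in range(j)) for j in range(1, i)]
    best_value, best_g = 0, (0,) * len(caps)
    for g in product(*(range(cap + 1) for cap in caps)):
        if any(sum(g[: m + 1]) > invocations[m] for m in range(len(g))):
            continue  # infeasible under Constraint (1)
        value = sum(n * f for n, f in zip(g, cost))
        if value > best_value:
            best_value, best_g = value, g
    return best_value, best_g


if __name__ == "__main__":
    # Fig. 1 periods and costs; the response times are hypothetical.
    print(lee_bound([40, 60, 120], [10, 35], [5, 2], 100))
```

With these inputs, Constraint (2) caps $g_2$ at 2 and $g_3$ at 5, Constraint (1) requires $g_2 + g_3 \le 5$, and the search returns a bound of 16 cycles at $g_2 = 2$, $g_3 = 3$: the solver prefers preemptions of $\tau_2$ because each one is charged the larger cost.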
III. Overall Approach
One problem with Lee et al.'s technique is that the preemption cost of a task is fixed regardless of which tasks execute during the task's preemption. This may result in severe overestimation of the cache-related preemption delay when only a few cache blocks are shared among tasks. For example, if the cache blocks used by a preempted task and those used by the tasks that execute during the preemption are disjoint, the cost of this particular preemption is zero. Nevertheless, Lee et al.'s technique assumes that the preemption cost is still the time needed to reload all the useful cache blocks of the preempted task.
To address this problem, the technique proposed in this paper takes into account the relationship between a preempted task and the set of tasks that execute during the preemption in computing the preemption cost. For this purpose, the proposed technique categorizes the preemptions of a task into a number of disjoint groups according to which tasks execute during the preemption. The number of such disjoint groups is $2^k - 1$ when there are $k$ higher priority tasks. For example, if there are three higher priority tasks $\tau_1$, $\tau_2$, and $\tau_3$ for a lower priority task $\tau_4$, the number of possible preemption scenarios of $\tau_4$ is 7 ($= 2^3 - 1$), corresponding to $\{\tau_1\}$, $\{\tau_2\}$, $\{\tau_3\}$, $\{\tau_1, \tau_2\}$, $\{\tau_2, \tau_3\}$, $\{\tau_1, \tau_3\}$, and $\{\tau_1, \tau_2, \tau_3\}$ according to the set of tasks that execute during $\tau_4$'s preemption. For task $\tau_j$, we denote by $P_{j-1}$ the set of all of its possible preemption scenarios by the higher priority tasks $\tau_1, \tau_2, \ldots, \tau_{j-1}$. Note that the set $P_{j-1}$ is equal to the power set [23] of the set $\{\tau_1, \tau_2, \ldots, \tau_{j-1}\}$ excluding the empty set, since for task $\tau_j$ to be preempted, at least one higher priority task must be involved. In addition, we denote by $p_j(H)$ the preemption of task $\tau_j$ during which the tasks in set $H$ execute. For example, $p_4(\{\tau_1, \tau_3\})$ denotes the preemption of $\tau_4$ during which tasks $\tau_1$ and $\tau_3$ execute. The preemption costs of tasks for different preemption scenarios are given by the following augmented preemption cost table for this example.
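The set $P_{j-1}$ can be enumerated directly as the non-empty subsets of the higher priority task indices. The sketch below is a minimal illustration that represents each scenario $H$ as a `frozenset` of task indices; the function name is hypothetical.

```python
from itertools import combinations


def preemption_scenarios(j):
    """P_{j-1}: every non-empty subset of the higher priority task indices {1, ..., j-1}.

    Each subset H identifies one preemption scenario p_j(H) of task tau_j.
    """
    higher = range(1, j)
    return [frozenset(h) for size in range(1, j) for h in combinations(higher, size)]


if __name__ == "__main__":
    for h in preemption_scenarios(4):
        print(sorted(h))
```

For $j = 4$ this lists the seven scenarios of $\tau_4$ given above, and in general it returns $2^{j-1} - 1$ subsets, which is why the number of variables grows exponentially with the number of higher priority tasks.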
To compute $f_j(H)$, the preemption cost of scenario $p_j(H)$, the following three steps are taken based on the information about the set of useful cache blocks, which can be obtained through the analysis explained in [18]. First, for each execution point in task $\tau_j$, we compute the intersection of the set of useful cache blocks of $\tau_j$ at the execution point and the set of cache blocks used by the tasks in $H$. Second, we determine the execution point in $\tau_j$ with the largest number of elements (i.e., useful cache blocks) in the intersection. Finally, we compute the (worst case) preemption cost of this preemption scenario by multiplying the number of useful cache blocks in that intersection by the cache refill time.
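The three steps map directly onto set operations. The sketch below assumes that the useful cache blocks of $\tau_j$ are given as one set per execution point and that the cache blocks of each higher priority task are given as a set; the function name and the small data layout in the usage example are hypothetical and are not the data of Fig. 2.

```python
def scenario_cost(useful_per_point, blocks_used, H, refill_time):
    """f_j(H): worst case reload cost of tau_j when exactly the tasks in H run.

    useful_per_point: sets of useful cache blocks of tau_j, one per execution point
    blocks_used:      task index -> set of cache blocks used by that task
    H:                set of higher priority task indices executing during the preemption
    """
    evicted = set().union(*(blocks_used[k] for k in H))  # cache blocks touched by H
    # Steps 1 and 2: intersect at every execution point, keep the largest intersection.
    worst = max((len(useful & evicted) for useful in useful_per_point), default=0)
    return worst * refill_time  # Step 3: scale by the cache refill time


if __name__ == "__main__":
    # Hypothetical layout for tau_4 and three higher priority tasks.
    useful_tau4 = [{0, 1, 2}, {1, 2, 3, 4}, {5}]
    used = {1: {1, 2, 8}, 2: {9, 10}, 3: {3, 5}}
    print(scenario_cost(useful_tau4, used, {1, 3}, refill_time=1))
    print(scenario_cost(useful_tau4, used, {2}, refill_time=1))
```

In this hypothetical layout, the second execution point loses three useful blocks when $\tau_1$ and $\tau_3$ run, so $f_4(\{\tau_1, \tau_3\})$ is three refill times, while $\tau_2$ alone evicts nothing useful and costs zero.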
As an example, consider Fig. 2, which shows the set of useful cache blocks (denoted by U's in the figure) for all the execution points of a lower priority task $\tau_4$ and the sets of cache blocks used by the higher priority tasks $\tau_1$, $\tau_2$, and $\tau_3$. In this example, the worst case preemption cost of $\tau_4$ for the case where tasks $\tau_1$ and $\tau_3$ execute during the preemption (i.e., $f_4(\{\tau_1, \tau_3\})$) is three multiplied by the cache refill time. This preemption cost is determined by the execution point shaded in the figure, which has the largest number of useful cache blocks that conflict with the cache blocks used by $\tau_1$ and $\tau_3$.
Since there are $2^k - 1$ possible preemption scenarios for $k$ higher priority tasks, we need to compute the same number of preemption costs in the worst case. This may require an enormous amount of computation when $k$ is large. The computational requirement can be reduced substantially by noting that we do not need to consider the higher priority tasks whose cache blocks do not conflict with the cache blocks used by the task for which the preemption cost is computed. For example, in Fig. 2, since none of the cache blocks used by $\tau_2$ conflict with those used by $\tau_4$, we do not need to consider the preemption scenarios that include $\tau_2$ when computing the preemption costs of $\tau_4$. Instead, the preemption costs for scenarios that include $\tau_2$ can be derived from those that do not include $\tau_2$ by noting that $f_4(H \cup \{\tau_2\}) = f_4(H)$ for all $H$ in $\{S \in P_3 \mid \tau_2 \notin S\}$.
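The reduction can be sketched as follows: compute costs explicitly only for subsets of the conflicting higher priority tasks, then fill in every other scenario by dropping its non-conflicting members. This is a minimal sketch under the assumption that the set of all cache blocks used by $\tau_j$ contains its useful cache blocks; the function name and data layout are hypothetical.

```python
from itertools import combinations


def all_scenario_costs(task_blocks, useful_per_point, blocks_used, refill_time):
    """Return (costs, n_computed): f_j(H) for every non-empty H, and how many
    of them were computed explicitly (only subsets of conflicting tasks).

    task_blocks:      all cache blocks used by tau_j (a superset of its useful blocks)
    useful_per_point: sets of useful cache blocks of tau_j, one per execution point
    blocks_used:      higher priority task index -> set of cache blocks it uses
    """
    higher = sorted(blocks_used)
    conflicting = frozenset(k for k in higher if blocks_used[k] & task_blocks)
    reduced = {}  # 2^c - 1 explicit computations for c conflicting tasks
    for size in range(1, len(conflicting) + 1):
        for H in combinations(sorted(conflicting), size):
            evicted = set().union(*(blocks_used[k] for k in H))
            worst = max((len(u & evicted) for u in useful_per_point), default=0)
            reduced[frozenset(H)] = worst * refill_time
    # Every other scenario reuses the cost of its conflicting part:
    # f_j(H union S) = f_j(H) when no task in S conflicts with tau_j.
    full = {}
    for size in range(1, len(higher) + 1):
        for H in combinations(higher, size):
            full[frozenset(H)] = reduced.get(frozenset(H) & conflicting, 0)
    return full, len(reduced)


if __name__ == "__main__":
    # Hypothetical layout: task 2 shares no cache block with tau_j.
    full, computed = all_scenario_costs(
        {0, 1, 2, 3, 4, 5}, [{0, 1, 2}, {1, 2, 3, 4}, {5}],
        {1: {1, 2, 8}, 2: {9, 10}, 3: {3, 5}}, refill_time=1)
    print(computed, len(full))
```

In this example only three of the seven costs are computed explicitly; scenarios such as $\{1, 2, 3\}$ reuse the cost of $\{1, 3\}$, and $\{2\}$ alone costs zero.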
A. Problem formulation
To formulate the problem of computing a safe upper bound of $PC_i(R_i)$ as a linear programming problem based on the augmented preemption costs $f_j(H)$, we define a new variable $g_j(H)$ that denotes the number of preemptions of $\tau_j$ by task set $H$, that is, the number of preemptions of scenario $p_j(H)$. The corresponding objective function is

$$\text{maximize} \quad PC_i(R_i) = \sum_{j=2}^{i} \sum_{H \in P_{j-1}} g_j(H) \cdot f_j(H)$$
This objective function states that the cache-related preemption delay of $\tau_i$ during $R_i$ is the sum of the delay due to preemptions of $\tau_i$ and those of higher priority tasks during $R_i$, where the delay due to the preemptions of a task is defined as the sum, over the task's mutually disjoint preemption scenarios, of the count of each scenario multiplied by the corresponding preemption cost.
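For a given assignment of the $g_j(H)$ variables, the objective $PC_i(R_i) = \sum_{j=2}^{i} \sum_{H \in P_{j-1}} g_j(H) \cdot f_j(H)$ is a plain weighted sum. The sketch below only evaluates it (a solver would maximize it); the nested-dictionary layout, the function name, and the example numbers are hypothetical.

```python
def preemption_delay(g, f):
    """Evaluate PC_i(R_i) = sum over j and H in P_{j-1} of g_j(H) * f_j(H).

    g[j][H]: number of preemptions of tau_j by exactly the task set H (a frozenset)
    f[j][H]: preemption cost of scenario p_j(H)
    """
    return sum(count * f[j][H] for j, per_task in g.items() for H, count in per_task.items())


if __name__ == "__main__":
    # Hypothetical costs and counts for tau_2 and tau_3.
    f = {2: {frozenset({1}): 5},
         3: {frozenset({1}): 1, frozenset({2}): 2, frozenset({1, 2}): 2}}
    g = {2: {frozenset({1}): 2},
         3: {frozenset({1}): 1, frozenset({1, 2}): 2}}
    print(preemption_delay(g, f))
```

For these hypothetical values the delay is $2 \cdot 5 + 1 \cdot 1 + 2 \cdot 2 = 15$; scenarios with no entry in `g` contribute nothing.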
As in Lee et al.'s technique, we cannot determine the exact $g_j(H)$ values that give the worst case preemption delay, and thus we use various constraints on the $g_j(H)$'s to bound the objective function value. In the next subsection, we give two such constraints that are extensions of Lee et al.'s original constraints. Then, in Subsection C, we discuss an advanced constraint that relates the invocations of a higher priority task to the preemptions of lower priority tasks in which the higher priority task is involved. In Section IV, we give more advanced constraints that consider the phasing of tasks to eliminate many infeasible task preemption scenarios. Finally, in Section V, we discuss an optimization that reduces the computational requirement of the proposed technique.
B. Extensions of Lee et al.'s constraints
This constraint states that the number of preemptions of $\tau_j$ during which a higher priority task $\tau_k$ executes is bounded by the number of invocations of $\tau_j$ multiplied by the maximum number of times that any single invocation of $\tau_j$ can be preempted by $\tau_k$. To show that this constraint subsumes the second constraint of Lee […] technique, yielding a tighter prediction of the cache-related preemption delay.
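Reading the prose above as the inequality $\sum_{H \in P_{j-1},\ \tau_k \in H} g_j(H) \le \lceil R_i/T_j \rceil \lceil R_j/T_k \rceil$ for every $k < j$ (our transcription of the sentence, not a reproduction of the numbered constraint), a feasibility check for a candidate assignment can be sketched as follows. The function name, the dictionary layout, and the example response times are hypothetical.

```python
from math import ceil


def per_task_caps_hold(g_j, j, periods, resp, r_i):
    """Check sum_{H containing k} g_j(H) <= ceil(R_i/T_j) * ceil(R_j/T_k) for all k < j.

    g_j:     frozenset H -> number of preemptions of tau_j by exactly the tasks in H
    periods: task index (1-based) -> period T
    resp:    task index (1-based) -> worst case response time R
    r_i:     response time of the task under analysis
    """
    invocations_of_j = ceil(r_i / periods[j])
    for k in range(1, j):
        # Preemptions of tau_j in which tau_k takes part, across all scenarios.
        involving_k = sum(count for H, count in g_j.items() if k in H)
        if involving_k > invocations_of_j * ceil(resp[j] / periods[k]):
            return False
    return True


if __name__ == "__main__":
    periods = {1: 40, 2: 60, 3: 120}
    resp = {1: 10, 2: 35, 3: 100}  # hypothetical response times
    ok = {frozenset({1}): 2, frozenset({1, 2}): 1, frozenset({2}): 1}
    bad = {frozenset({2}): 2, frozenset({1, 2}): 1}
    print(per_task_caps_hold(ok, 3, periods, resp, 100))
    print(per_task_caps_hold(bad, 3, periods, resp, 100))
```

The second assignment is rejected because $\tau_2$ takes part in three preemptions of $\tau_3$, while a single invocation of $\tau_3$ with $R_3 = 100$ can overlap at most $\lceil 100/60 \rceil = 2$ invocations of $\tau_2$.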
As an example, consider the task set in Fig. 3, which consists of four tasks $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$, where $\tau_1$ is the highest priority task and $\tau_4$ the lowest. Assume that the tasks are mapped to cache memory as shown in Fig. 3-(a), where the useful cache blocks of tasks $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$ are denoted by the numbers 1, 2, 3, and 4, respectively. The cache mapping and the distribution of useful cache blocks of the tasks give the preemption cost table in Fig. 3-(b), assuming that the cache refill time is a single cycle⁴.

⁴ In this example, to simplify the explanation, we assume that the set of useful cache blocks of each task shown in Fig. 3-(a) includes the set of useful cache blocks of any other execution point in the task. This assumption does not hold in general, as the example in Fig. 2 illustrates.

Assume that we are interested in computing the cache-related preemption delay during the response time of task $\tau_4$, denoted by $R_4$ in Fig. 3-(c). Also assume that during $R_4$ there are four invocations of $\tau_1$, three invocations of $\tau_2$, and two invocations of $\tau_3$, whose response times are denoted in the figure by $R_1$, $R_2$, and $R_3$, respectively. Note that these response times are available when we compute $R_4$, since we calculate the response times from the highest priority task to the lowest priority task. The maximum objective function value that satisfies the above two sets of constraints is 32
and the values of the $g_j(H)$'s that give the maximum are as follows.
$H$: $\{\tau_1\}$, $\{\tau_2\}$, $\{\tau_1, \tau_2\}$, $\{\tau_3\}$, $\{\tau_1, \tau_3\}$, $\{\tau_2, \tau_3\}$, $\{\tau_1, \tau_2, \tau_3\}$ […]

The reason why our earlier constraints cannot eliminate the above infeasible preemption scenario is that they cannot relate the invocations of a higher priority task to the preemptions of lower priority tasks in which that higher priority task is involved. For the example in […]

At first sight, it appears that the above problem can be solved by bounding the number of preemptions of lower priority tasks $\tau_{j+1}, \tau_{j+2}, \ldots, \tau_i$ during which a higher priority task $\tau_j$ executes by the number of invocations of that higher priority task $\tau_j$, which can be expressed by the following constraint:
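$$\sum_{k=j+1}^{i} \sum_{H \in P_{k-1},\ \tau_j \in H} g_k(H) \le \left\lceil \frac{R_i}{T_j} \right\rceil, \quad j = 1, 2, \ldots, i-1. \qquad (7)$$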
This constraint, when cast into our example in Fig. 3, is translated […] This particular constraint, and Constraint (7) in general, however, is not safe, meaning that a valid preemption scenario may not satisfy it, because a single invocation of a higher priority task can be involved in more than one preemption of lower priority tasks and thus can be counted multiple times in the summation on the left-hand side of Constraint (7). For example, when $\tau_3$ is preempted by $\tau_2$ […] The set of all possible safe constraints that can be derived by the above rule is as follows when the higher priority task involved is $\tau_1$: […] The other constraints, for the cases where the higher priority task involved is $\tau_2$ or $\tau_3$, can be derived similarly.
IV. Advanced Constraints on Task Phasing
Of the two problems with Lee et al.'s technique explained in the introduction, the first was addressed in the previous section by introducing a scenario-sensitive preemption cost. This section addresses the second problem, namely, that the technique does not consider the phasing among tasks and thus may allow many infeasible preemption scenarios. For example, the technique assumes that the number of preemptions of a lower priority task in which a higher priority task is involved can potentially range from zero to the number of invocations of the higher priority task.
However, as Fig. 4-(a) illustrates, some of the invocations of a higher priority task (denoted by $\tau_j$ in the figure) cannot be involved in any preemption of a lower priority task (denoted by $\tau_k$ in the figure), even when we assume the worst case response time (denoted by $R_k$ in the figure) for the lower priority task. Similarly, as Fig. 4-(b) illustrates, some of the invocations of the higher priority task will inevitably be involved in preemptions of the lower priority task, even when we assume the best case response time $B_k$ for the lower priority task. In this section, we incorporate these constraints and others about task phasing into the framework developed in the previous section. First, we define the following four numbers between two tasks $\tau_j$ and $\tau_k$ ($j < k$) whose priorities are higher than that of $\tau_i$, for which the worst case response time $R_i$ is being computed: $M_{jk}$, $N_{jk}$, $M'_{jk}$, and $N'_{jk}$. Let $I$ be the set of all intervals of length $R_i$ in the hyperperiod formed by $\tau_j$ and $\tau_k$, that is, in $\mathrm{LCM}(T_j, T_k)$ (the least common multiple of $T_j$ and $T_k$). The number $M_{jk}$ is the maximum number of preemptions of the lower priority task $\tau_k$ during which the higher priority task $\tau_j$ executes, over all intervals in $I$. Similarly, $N_{jk}$ is the minimum number of preemptions of the lower priority task $\tau_k$ during which the higher priority task $\tau_j$ executes, over the same set of intervals. On the other hand, $M'_{jk}$ is the maximum number of times that instances of the lower priority task $\tau_k$ overlap with an instance of the higher priority task $\tau_j$. More precisely, it is the maximum number of level-$k$ busy periods [24] that contain both $\tau_j$ and $\tau_k$, over all intervals in $I$. Finally, $N'_{jk}$ is the minimum number of times that instances of the lower priority task $\tau_k$ overlap with an instance of the higher priority task $\tau_j$, i.e., the minimum number of level-$k$ busy periods that contain both $\tau_j$ and $\tau_k$, over all intervals in $I$.
Assume that the worst case response times of $\tau_j$ and $\tau_k$ are $R_j$ and $R_k$, respectively, both of which are available when we compute $R_i$. Likewise, assume that the best case response times of $\tau_j$ and $\tau_k$ are $B_j$ and $B_k$, respectively, for which the best case execution times of […] In the following, we give examples of constraints that use $M'_{jk}$ and $N'_{jk}$. Assume that there are four tasks $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$ and that $\tau_1$ and $\tau_2$ correspond to $\tau_j$ and $\tau_k$ in the above constraints, respectively.
In our example, there are ten possible preemption scenarios of $\tau_3$ and $\tau_4$: $p_3(\{\tau_1\})$, $p_3(\{\tau_2\})$, $p_3(\{\tau_1, \tau_2\})$, $p_4(\{\tau_1\})$, $p_4(\{\tau_2\})$, $p_4(\{\tau_3\})$, $p_4(\{\tau_1, \tau_2\})$, $p_4(\{\tau_1, \tau_3\})$, $p_4(\{\tau_2, \tau_3\})$, and $p_4(\{\tau_1, \tau_2, \tau_3\})$.
Among them, there are three preemption scenarios during which both $\tau_1$ and $\tau_2$ execute […]

V. Optimization Based on Task Set Decomposition
One potential problem of the proposed technique is that it requires a large amount of computation when there are a large number of tasks, since the number of variables used is $O(2^n)$, where $n$ is the number of tasks in the task set. This section discusses a simple optimization based on task set decomposition that can drastically reduce the amount of computation.

Consider the example in Fig. 5, which shows the cache blocks used by four tasks $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$. In the figure, we notice that although cache blocks are shared between $\tau_1$ and $\tau_2$ and also between $\tau_3$ and $\tau_4$, there is no overlap between the cache blocks used by $\tau_1$ and $\tau_2$ and those used by $\tau_3$ and $\tau_4$. This means that neither $\tau_1$ nor $\tau_2$ affects the cache-related preemption delay of either $\tau_3$ or $\tau_4$, and vice versa. Based on this observation, we can decompose a given task set into a collection of subsets in such a way that no two tasks from two different subsets share a cache block. Then the tasks in each subset can be analyzed independently of the tasks in other subsets using the constraints given in the previous two sections.
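Finding such subsets amounts to computing connected components of the "shares a cache block" relation between tasks. The sketch below does this with a small union-find structure; it assumes each task's cache blocks are available as a set, and the function name and the example layout are hypothetical.

```python
def decompose(blocks_used):
    """Partition tasks so that tasks in different groups share no cache block.

    blocks_used: task index -> set of cache blocks used by that task.
    Returns the groups as sorted lists of task indices.
    """
    parent = {t: t for t in blocks_used}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    owner = {}  # cache block -> first task seen using it
    for t, blocks in blocks_used.items():
        for b in blocks:
            if b in owner:
                parent[find(t)] = find(owner[b])  # shared block: merge the two groups
            else:
                owner[b] = t

    groups = {}
    for t in blocks_used:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical layout in the spirit of Fig. 5.
    print(decompose({1: {0, 1, 2}, 2: {2, 3}, 3: {10, 11}, 4: {11, 12}}))
```

With this layout the tasks split into `[[1, 2], [3, 4]]`. Sharing is treated transitively, so a chain of pairwise overlaps places all the tasks in the chain into one subset.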
For our example in Fig. 5, the given task set is decomposed into two subsets: $\{\tau_1, \tau_2\}$ and $\{\tau_3, \tau_4\}$. When we calculate the worst case response time of the lowest priority task $\tau_4$ using the iterative procedure explained in Section II, the tasks in one subset can be analyzed independently of the tasks in the other subset, and the two results can be combined as follows: […]

To maximize the benefit of the optimization explained above, the number of subsets that can be analyzed independently should be large. An interesting topic for future research is to devise a scheme that allocates main memory to tasks so that the resulting cache mapping gives a large number of such subsets.
VI. Experimental Results
In this section, we compare the worst case response time prediction by the proposed technique with those by previous techniques using a sample task set. Our target machine is an IDT7RS383 board that has a 20 MHz R3000 RISC CPU, an R3010 FPA (Floating Point Accelerator), and an instruction cache and a data cache of 16 Kbytes each. Both caches are direct mapped and have block sizes of 4 bytes. SRAM (static RAM) is used as the target machine's main memory, and the cache refill time is 4 cycles. In our experiment, we used a sample task set whose specification is given in […] The table also gives the period and WCET of each task in the second and third columns, respectively.

Since our target machine uses SRAM as its main memory, its cache refill time (4 cycles) is much smaller than those of most current computer systems, which range from 8 cycles to more than 100 cycles when DRAM is used as main memory [11]. To obtain the WCET of each task for more realistic cache refill times, we divide the WCET into two components. The first component is the execution time of the task when all memory references are cache hits, and it is independent of the cache refill time. It was measured on our target machine by executing the task with its code and data preloaded in the cache. The second component is the time needed to service the cache misses that occur during the task's execution, and it depends on the cache refill time. This component is computed by multiplying the total number of cache misses by the cache refill time $t_{refill}$. In our experiment, the total number of cache misses was obtained by the following procedure:
1. Two different execution times were measured for each task: one with its code and data preloaded in the cache and the other without such preloading, denoted by $T_1$ and $T_2$, respectively.
2. By dividing the difference between $T_1$ and $T_2$ by the 4-cycle cache refill time of the target machine, we computed the total number of cache misses during the task's execution.
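The procedure and the two-component WCET can be written out as simple arithmetic. This is a sketch assuming that the preloaded measurement $T_1$ serves as the all-hit component, as described above; the function names and the cycle counts in the usage example are hypothetical.

```python
def cache_misses(t_preloaded, t_cold, measured_refill=4):
    """Step 2: misses = (T_2 - T_1) / refill time of the measurement board (4 cycles)."""
    return (t_cold - t_preloaded) / measured_refill


def scaled_wcet(t_preloaded, t_cold, t_refill, measured_refill=4):
    """WCET for another refill time: all-hit execution time + misses * t_refill."""
    return t_preloaded + cache_misses(t_preloaded, t_cold, measured_refill) * t_refill


if __name__ == "__main__":
    # Hypothetical measurements in cycles.
    t1, t2 = 10_000, 14_000
    print(cache_misses(t1, t2))
    print(scaled_wcet(t1, t2, 100))
```

With these hypothetical measurements, the task incurs 1,000 misses, so a 100-cycle refill time gives a WCET of 110,000 cycles; plugging the board's own 4-cycle refill time back in reproduces the measured $T_2$.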
We used three different cache mappings for the code used by the four tasks, as shown in […]. In the first mapping, the code used by each task is mapped to the same cache region. In the second mapping, on the other hand, the cache regions used by the tasks overlap each other by about 70%. Finally, in the third mapping, the code used by each task is mapped into a disjoint region in the cache. We expect that these three mappings reasonably represent the spectrum of possible overlap among the cache regions used by tasks.
TABLE II gives the preemption cost tables for the three mappings. Note that the preemption costs of the tasks decrease as the overlap among the cache regions decreases. This is because fewer useful cache blocks are displaced during preemption; eventually, when the cache regions are disjoint, all the preemption costs are zero.
We used a public-domain linear programming tool called lp_solve by Michel Berkelaar (URL: ftp://ftp.es.ele.tue.nl/pub/lp_solve) to solve the linear programming problem posed by the proposed technique. The total number of constraints for our task set is 63, and it took less than 3 minutes of user CPU time and 5 minutes of system CPU time to compute all the data points presented in this section for the proposed technique on an Axil […] $C_{ql}$ is the time needed to move the first task from the delay queue to the run queue (it measured 142 cycles on our target machine), and $C_{qs}$ is the time needed to move each additional task from the delay queue to the run queue (it measured 132 cycles on our target machine).
A detailed explanation of this equation is beyond the scope of this paper, and interested readers are referred to [7]. Note that, unlike the proposed technique, the worst case response time predictions by C and P are insensitive to cache mapping, since the preemption costs they assume are independent of cache mapping. The results in Fig. 7-(a) show that the proposed technique gives significantly tighter prediction of the worst case response time than the previous techniques. For example, when the cache refill time is 100 cycles and the second cache mapping is used, the proposed technique gives a worst case response time prediction that is 60% tighter than the best of the previous approaches (5,323,620 cycles in $M_2$ vs. 13,411,402 cycles in P). This advantage of the proposed technique becomes more evident as the cache regions used by the tasks become less overlapped, that is, as we move from $M_1$ to $M_3$.
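The 60% figure follows from the two cycle counts quoted above, taken as the relative reduction with respect to P's prediction:

$$\frac{13{,}411{,}402 - 5{,}323{,}620}{13{,}411{,}402} = \frac{8{,}087{,}782}{13{,}411{,}402} \approx 0.603$$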
In Fig. 7-(a), there are a few jumps in the worst case response time predictions of all three techniques. These jumps occur when the increase in the worst case response time caused by a longer cache refill time triggers additional invocations of higher priority tasks, which also results in a number of bumps in Fig. 7-(b).
The results in Fig. 7-(a) also show that as the cache refill time increases, the gap between the worst case response time prediction by M and those by the other two techniques increases. Eventually, the task set is deemed unschedulable by C and P when the cache refill time is more than 90 and 100 cycles, respectively. On the other hand, the task set is schedulable by M even when the cache refill time is more than 200 cycles if cache mapping 3 is used.
Finally, the results in Fig. 7-(b) show that as the cache refill time increases, the cache-related preemption delay takes up a proportionally larger percentage of the worst case response time.

As a result, even for method M, the cache-related preemption delay takes up more than 30% of the worst case response time when the cache refill time is 100 cycles and cache mapping 2 is used. This indicates that accurate prediction of the cache-related preemption delay becomes increasingly important as the cache refill time increases, that is, if the current trend of a widening speed gap between the processor and main memory continues [11].
To assess the impact of the various constraints used in the proposed technique on the accuracy of the resulting worst case response time prediction, we classified the constraints into two groups and calculated the reduction of the worst case response time prediction by each group. The constraints were classified as follows: the three constraints in Section III that deal with the scenario-sensitive preemption cost were classified as Group 1, whereas those in Section IV that eliminate infeasible task phasing were classified as Group 2.
Figs. 8-(a) and (b) show the reduction of the worst case response time prediction as the two constraint groups are applied, for cache refill times of 60 cycles and 80 cycles, respectively. For comparison purposes, we also give the worst case response time prediction by technique P. The results show that, for both cache refill times, when the cache regions used by the tasks completely overlap (i.e., cache mapping 1), most of the reduction comes from the constraints in Group 2, since in this case the scenario-sensitive preemption cost degenerates to the preemption cost used by technique P. However, as the cache regions used by the tasks become less overlapped, the impact of the constraints in Group 1 becomes more significant; eventually, when the cache regions are disjoint, all the reduction comes from the constraints in Group 1 alone, since in this case all the scenario-sensitive preemption costs are zero.
We performed experiments using a number of other task sets, and the results were very similar to those given in this section. Interested readers are referred to [25], where the results for the other task sets are presented.
VII. Conclusion
In this paper, we have proposed an enhanced schedulability analysis technique for analyzing the cache-related preemption delay, which must be accounted for if cache memory is to be used in multitasking real-time systems. The proposed technique uses linear programming and has the following two novel features, expressed as constraints of the linear programming problem. First, the technique takes into account the relationship between a preempted task and the set of tasks that execute during the preemption when calculating the number of memory blocks that should be reloaded into the cache after the preempted task resumes execution. Second, the technique considers the phasing of tasks to eliminate many infeasible task interactions.
Our experimental results showed that incorporating the two features yields up to 60% more accurate prediction of the worst case response time compared with the prediction made by previous techniques. The results also showed that as the cache refill time increases, the gap between the worst case response time prediction by the proposed technique and those by the previous techniques increases. Finally, the results showed that as the cache refill time increases, the cache-related preemption delay takes up a proportionally larger percentage of the worst case response time, which indicates that accurate prediction of the cache-related preemption delay becomes increasingly important if the current trend of a widening speed gap between the processor and main memory continues.
