Time-Predictable Task Preemption for Real-Time Systems with
Direct-Mapped Instruction Cache ∗
Raimund Kirner, Peter Puschner
Institut für Technische Informatik
Technische Universität Wien
Treitlstraße 3/182/1
A-1040 Wien, Austria
{raimund,peter}@vmars.tuwien.ac.at
Abstract
Modern processors used in embedded systems are becoming increasingly powerful, with features like caches and pipelines to speed up execution. While the execution speed of embedded software is generally increasing, it becomes more and more complex to verify the correct temporal behavior of software running on these high-end embedded computer systems.
To achieve time-predictability, the authors previously introduced a very rigid software execution model, with distribution realized on the basis of the time-triggered communication model. In this paper we analyze the time-predictability of preemptive task activation on hardware with direct-mapped instruction caches. As a first result, we show why task preemption driven by a clock interrupt is not suitable to guarantee time-predictability. As a second result, we present a time-predictable task preemption driven by an instruction counter.
1 Introduction
The number of embedded systems where the correct real-time behavior is of utmost importance is steadily increasing. Embedded systems also perform safety-critical tasks in application domains where human operators would be unable to provide the same system dependability. Human society gets more and more dependent on the correct operation of embedded systems. Thus, it is important to have verification methods that allow us to make sure that a computer system behaves correctly, also in the temporal domain.
∗ This work has been supported in part by the European IST Network of Excellence ARTIST2 under project No. IST-004527 and the European IST project DECOS under project No. IST-511764.
Today, embedded systems are becoming increasingly complex. Embedded processors are often equipped with performance-enhancing features like caches or execution pipelines. As a result, the state space of the overall system becomes enormous, which makes it extremely difficult in general to analyze all possible behaviors of a piece of software running on such a complex computer system. Consequently, worst-case execution time (WCET) analysis of a piece of code running on such computer systems is already challenging, and the overall timing analysis of the system, including timing effects due to task preemptions, requires unmanageably high effort. This complexity problem applies not only to highly dynamic systems where most behavior is event-driven. It arises even in relatively simple systems with time-triggered task activation and scheduling. To avoid missing a deadline, systems have to be over-dimensioned to reduce the risk of an unexpected temporal resource shortage during operation.
In the light of this unsatisfactory situation we started to think about simpler execution models for real-time systems. In recent work we proposed a system architecture that is time-predictable [7]. As the timing of the actions performed by a computer system depends on both the software running on the computer and the properties of the hardware executing the software [5], we list software as well as hardware features that in combination allow us to make a computer system time-predictable. In this paper we provide evidence that the proposed architecture, which is summarized in Section 2, is time-predictable. A central point of the proposed architecture is the task activation. In Section 3 we describe our model-checking analysis of our first attempt, a tasking model that allows for offline-scheduled, clock-driven task preemption. Finally, in Section 4 we analyze a more robust tasking model where preemption points are controlled by an instruction counter, not a clock. For the latter approach we show that it is fully time-predictable even when running on hardware with direct-mapped instruction caches.
2 A Time-Predictable Application Computer
In this section we review the basic ideas behind our aim of defining a time-predictable computer system. Further details of the proposed time-predictable architecture, including the communication subsystem used to develop distributed time-predictable computer systems, can be found in [7].
2.1 Hardware Architecture
A central idea of our approach is to obtain time predictability by using a software architecture that has an invariable control flow (see below). As a consequence of using this restrictive software model, we can allow the use of hardware features that are otherwise considered “unpredictable” (e.g., instruction caches) and yet build systems whose timing is invariable. The idea is thus to keep hardware restrictions and modifications within limits (e.g., we restrict caches to direct-mapped caches but do not demand special hardware modifications as, for example, needed for the SMART cache [2]). To support our execution model, the following hardware properties have to be fulfilled:
• The execution times of instructions do not depend on the values of the operands.
• The CPU supports a conditional move instruction or a set of predicated instructions that have invariable execution times.
• Instruction caches are direct-mapped.
• Memory access times for data are invariable for all data items. (In our view, this is the strongest limitation at the moment. We will try to relax this in future work.)
• The CPU has a counter that counts the number of instructions executed. The counter can be reset and used to generate an interrupt when a given number of instructions has been completed. If such a counter is not available, one could, as a work-around, instrument the program code with software traps to trigger task preemption.
2.2 The Software Architecture
To construct a time-predictable computer system while being no more restrictive about the hardware than explained above, we need to be very strict about the software structure. In fact, the proposed software architecture does not allow for any decisions in the control flow whose outcome has not already been determined before the start of the system. This property holds for both the application tasks and the operating system. Even task preemptions are implemented in a way that does not allow for any timing variation between different task invocations.
2.2.1 Task Model
The structure of all tasks follows the simple-task model found in [3]. Tasks never have to wait for the completion of an input/output operation and never block. There are no statements for explicit input/output or synchronization within a task. It is assumed that the static schedule of application tasks and kernel routines ensures that all inputs for a task are available when the task starts and that outputs are ready in the output variables when the task completes. The actual data transfers for input and output are under control of the operating system and are scheduled before and after the task execution, respectively.
An important and unique property of our task model is that all tasks have only a single possible execution path. By translating the code of all real-time tasks into single-path code we ensure that all tasks follow the only possible, pre-determined control flow during execution and have invariable timing. For more details about the single-path translation see Section 2.3.
2.2.2 Operating System Structure
If not properly designed, the activities of the operating
system can create a lot of indeterminism in the timing
of a computer system. We have therefore been very
restrictive in the design of the operating system and
its mechanisms.
Predictability in the code execution of the operating
system is achieved by two mechanisms. First, single-
path coding is used wherever possible. Second, all data
that are relevant for run-time decisions of the operat-
ing system are computed at compile time. These data
include the pre-determined times for I/O, task com-
munication, task activation, and task switching. They
are stored in static decision tables that the operating
system interprets at runtime.
Task communication and I/O are implemented by simple read and write operations to specific memory locations. As these memory accesses are pre-scheduled together with the application tasks, no synchronization and no waiting is necessary at run time.
The two greatest challenges in building a fully pre-
dictable operating system were in maintaining time-
predictability in case of task preemptions and keeping
the activities of the application computer in synchrony
with its environment (the rest of the system).
• To maintain the deterministic timing in the presence of preemptions it was necessary to introduce a mechanism that allows for a precise preemption when a given number of instructions have finished execution, i.e., planning preemptions at specific times of the CPU clock turned out to be insufficient (see Section 3).
• The programmable time interrupt provided by the
communication system is used to synchronize the
operation of the application computer with the
global time base [7].
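As an illustration of a static decision table that the operating system merely interprets at run time, the following Python sketch replays an offline-computed schedule whose entries are instruction counts. It is a hedged sketch, not the authors' operating system: the names `Slot`, `SCHEDULE`, and `run_cycle`, and the table contents, are hypothetical.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    task: str          # task to run in this slot
    instructions: int  # number of instructions to execute before switching


# Hypothetical table, computed offline: T2 runs one instruction,
# is preempted by T1 for one instruction, then T2 resumes.
SCHEDULE = (
    Slot("T2", 1),
    Slot("T1", 1),
    Slot("T2", 1),
)


def run_cycle(schedule):
    """Replay one schedule cycle; return the (task, instruction index) trace.

    The dispatcher takes no data-dependent decisions: everything it does
    is read from the table.
    """
    progress = {}  # instructions already executed per task in this cycle
    trace = []
    for slot in schedule:
        start = progress.get(slot.task, 0)
        for index in range(start, start + slot.instructions):
            trace.append((slot.task, index))
        progress[slot.task] = start + slot.instructions
    return trace
```

Because the table is fixed before the system starts, every cycle yields the same trace, which is the property the operating system design relies on.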
2.3 Deterministic Single-Path Task Execution
As all branches in the control flow of a task may potentially cause variable timing, we translate the code of all tasks into so-called single-path code [6]. This translation can be done automatically. The code resulting from the single-path translation has only a single execution trace, hence the name single-path translation.
The strategy of the single-path translation is to remove input-data dependencies in the control flow. To achieve this, the single-path translation replaces all input-data dependent branching operations in the code by predicated code. It serializes the input-dependent alternatives of the code and uses predicates (instead of branches) and, if necessary, speculative execution to select the right code to be executed at runtime.
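The idea of replacing a branch by predicated code can be sketched as follows. This Python fragment only illustrates the structure of the transformation; Python itself gives no constant-time guarantees, and the function names are hypothetical. On the hardware assumed in Section 2.1 the selection would be done by a conditional move or predicated instructions.

```python
def clamp_branching(x, limit):
    """Original code: the executed path depends on the input data."""
    if x > limit:
        return limit
    return x


def clamp_single_path(x, limit):
    """Single-path variant: both alternatives are evaluated and a
    predicate selects the result, so there is only one execution path."""
    p = x > limit                    # predicate, computed unconditionally
    return p * limit + (not p) * x   # arithmetic select instead of a branch
```

Both functions compute the same result for integer inputs; only the second one executes the same statement sequence for every input.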
For pieces of code with an if-then-else semantics, a
similar transformation, called if-conversion, has been
used before to avoid pipeline stalls in processors with
deep pipelines [1]. In addition to code with if-then-
else semantics the single-path translation transforms
loops with input-data dependent control conditions.
This transformation yields loops with constant itera-
tion counts, again with a single execution path [4].
As a prerequisite for the single-path translation of
a piece of code, the upper bounds for the number of
iterations of all loops have to be available. These num-
bers can either be computed by a semantic analysis of
the code or can be provided by the programmer in the
form of annotations, in case an automated analysis is
not possible or available.
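For loops, the transformation can be sketched in the same spirit. In the following Python fragment, a search loop with an input-dependent exit becomes a loop that always runs for a fixed bound. The bound `MAX_LEN`, the padding convention, and the function names are assumptions made for this example, not part of the translation described in [4].

```python
MAX_LEN = 8  # assumed upper bound on the loop iterations (e.g., from an annotation)


def find_first_branching(data, n, key):
    """Original code: the iteration count depends on the input data."""
    for i in range(n):
        if data[i] == key:
            return i
    return -1


def find_first_single_path(data, n, key):
    """Single-path variant: always MAX_LEN iterations, no early exit.

    `data` must hold at least MAX_LEN entries; only the first n are valid.
    """
    result = -1
    for i in range(MAX_LEN):
        # Predicate: valid index, matching value, and nothing found so far.
        hit = (i < n) & (data[i] == key) & (result == -1)
        result = hit * i + (1 - hit) * result  # predicated update
    return result
```

For example, with `data = [5, 3, 7, 3, 0, 0, 0, 0]` and `n = 4`, both variants return 1 for key 3 and -1 for key 9, but the single-path variant always performs eight iterations.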
3 Analysis of Clock-Driven Task Preemption
To complete our proposed predictable software ar-
chitecture, we need a time-predictable task preemption
model. The idea of the predictable preemption is to
preempt each task that needs to be preempted at the
same points in time in each execution cycle of the static
schedule. By doing so, the overall timing of all repet-
itive executions of the cyclic schedule would also be
invariable.
Our original plan was to implement the predictable task preemption by using the CPU clock, i.e., to preempt tasks whenever the CPU clock reaches one of the values given in the preemption-time tables of the operating system. The motivation was that with exact clock-driven task preemptions the cache behavior would eventually stabilize, resulting in constant code locations where task preemption occurs and thus in constant execution times of each execution cycle.
To get a time-predictable system it is important to avoid, by design, interrelations between task activation and the other components that may cause non-predictable effects. In the following we describe our analysis of whether task activation by clock-driven task preemption, in combination with the hardware and software patterns described in Section 2, ensures time-predictable operation.
We required that the execution times of instructions do not depend on the operand values. However, a potential source of timing interrelation with the task activation is the instruction cache, which we assume to be direct-mapped.
3.1 Model-Checking of the Task Preemption
We analyzed the clock-driven task preemption by model checking. The model checker we used was SAL (http://sal.csl.sri.com/) from SRI International.
We built a model of the task preemption and the other hardware and software mechanisms. The basic strategy in developing the model was to keep it flexible enough that the model checker can automatically find and test different system configurations. For example, the model checker analyzes the cache behavior for varying numbers of tasks, instructions per task, task activation times, cache sizes, and instruction-to-memory mappings.
Applying the model checker to the model yielded the interesting result that, for certain system configurations, the timing behavior of the system never stabilizes. A useful feature of model checking is that, in case a property to be checked is violated, a concrete counterexample is provided.
Figure 1. Oscillating Clock-Driven Activation (calculated by the Model Checker)
[Timeline of tasks T1 and T2 in Scenario A and Scenario B, showing instructions I1, I2, and I3.]
The counterexample found to violate the stabilization property is given in Figure 1. It is a system configuration consisting of two tasks, T1 and T2. T1 preempts T2 at the pre-scheduled time marked by the dashed line on the left. T2 resumes after T1 has completed its execution. Task T1 consists of only one instruction, I3, while task T2 consists of two instructions, I1 and I2. Further, the model checker calculated the instruction-cache conflict relations for a direct-mapped cache shown in Figure 2. A solid line between two instructions denotes a cache conflict, i.e., the two instructions reside in different memory locations that map to the same cache line of the instruction cache. A dotted line between two instructions denotes that the two instructions map to the same memory location, i.e., if one of these instructions is executed after the other without a conflicting instruction being executed in between, the access is a cache hit. For example, instructions executed in different iterations of a loop behave like this. Fractions of neighboring basic blocks that map to the same cache entry can also show this behavior. In Figure 1, the basic execution time of the instructions is marked by dark boxes. Extra time spent waiting due to an instruction cache miss is marked by striped boxes.
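The two relations can be made concrete with a small Python sketch of direct-mapped address mapping. The line size of 16 bytes and the 64 cache lines are assumed values chosen for illustration, and the helper names are hypothetical; the paper does not fix a cache geometry.

```python
LINE_SIZE = 16  # assumed bytes per cache line
NUM_LINES = 64  # assumed number of lines in the direct-mapped cache


def block(addr):
    """Memory block (cache-line-sized chunk) that contains `addr`."""
    return addr // LINE_SIZE


def line_index(addr):
    """Cache line to which `addr` maps in a direct-mapped cache."""
    return block(addr) % NUM_LINES


def same_location(a, b):
    """Relation drawn as a dotted line: both instructions lie in the same block."""
    return block(a) == block(b)


def conflict(a, b):
    """Relation drawn as a solid line: different blocks, same cache line."""
    return line_index(a) == line_index(b) and block(a) != block(b)
```

With these parameters, addresses 0x0000 and 0x0004 share a block, while 0x0000 and 0x0400 lie `LINE_SIZE * NUM_LINES` bytes apart and therefore evict each other.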
Figure 2. Cache Conflict Pattern
[Conflict graph over the instructions I1 and I2 of task T2 and I3 of task T1. Legend: dotted line = code of same memory location; solid line = cache conflict.]
Let us assume the very first activation of our schedule leads to the execution shown in Scenario A. The first access of T2 to the conflicting cache line leads to a cache miss, and so do the subsequent accesses by T1 and T2 (the latter misses are due to the order in which the tasks access memory).
When the schedule is repeated, T2 has a cache hit on the first memory access. T2 therefore makes faster progress, and its second access to the conflicting address occurs before T2 is preempted, resulting in a hit, too. T1 then executes with a cache miss, and as T2 has already completed its two critical memory accesses, the instruction of T1 remains in the cache (see Scenario B).
On the next execution of the schedule, T2 has a cache miss when accessing the conflicting address, and so Scenario A is repeated. Scenario A is again followed by Scenario B, and so on. The timing does not stabilize. Instead, the task execution times oscillate. Currently we do not know, in general, the least upper bound on the length of this repetitive pattern of cache behavior.
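The oscillation can be reproduced with a small simulation. The following Python sketch is not the authors' SAL model. It assumes a single contended cache line, a hit cost of 1 cycle, a miss cost of 3 cycles, a release time of 2 for T1, and preemption that takes effect at the next instruction boundary of T2. All names and numbers are illustrative.

```python
HIT_COST = 1   # assumed cycles for an instruction fetch that hits
MISS_COST = 3  # assumed cycles for an instruction fetch that misses

T2_BLOCKS = ("X", "X")  # I1 and I2: same memory block X
T1_BLOCKS = ("Y",)      # I3: block Y, maps to the same cache line as X


def fetch(line, blk):
    """Return (cost, new cache line content) for fetching block `blk`."""
    cost = HIT_COST if line == blk else MISS_COST
    return cost, blk


def run_cycle_clock(line, preempt_time):
    """Simulate one schedule cycle with clock-driven preemption.

    T2 starts at time 0. T1 is released at `preempt_time` and runs at the
    next instruction boundary of T2. Returns (cycle length, cache line).
    """
    t = 0
    t1_done = False
    for blk in T2_BLOCKS:
        if not t1_done and t >= preempt_time:
            for b in T1_BLOCKS:          # T1 preempts T2 here
                cost, line = fetch(line, b)
                t += cost
            t1_done = True
        cost, line = fetch(line, blk)
        t += cost
    if not t1_done:                      # T2 finished before T1 was released
        t = max(t, preempt_time)
        for b in T1_BLOCKS:
            cost, line = fetch(line, b)
            t += cost
    return t, line


def simulate_clock(cycles, preempt_time=2):
    """Run several cycles; the cache content carries over between cycles."""
    line, lengths = None, []
    for _ in range(cycles):
        length, line = run_cycle_clock(line, preempt_time)
        lengths.append(length)
    return lengths


print(simulate_clock(6))  # [9, 5, 9, 5, 9, 5]
```

Under these assumptions, cycles of length 9 (Scenario A, three misses) alternate with cycles of length 5 (Scenario B, two hits and one miss), so the cycle length never settles.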
It is a convenient feature of model checking that it typically produces short counterexamples. In our case, however, this does not mean that these short counterexamples show unrealistic behavior that cannot occur in systems of real size. For example, the code pattern of the counterexample given in Figure 1 can cause the same oscillation effect in tasks consisting of many more instructions. Figure 3 shows a possible instantiation of this counterexample. For the oscillating effect to occur there, it is sufficient that no other instruction of task T2 between Ii and Ij is in conflict with them and that the task activations occur at critical time instants.
As a consequence, clock-driven task preemption is not suitable to provide robust and predictable task execution times.
Figure 3. Task Preemption by Clock Interrupt (instantiation of the counter-example given in Figure 1)
[Timeline of tasks T1 and T2 in Scenario A and Scenario B, with the instructions Ii, Ik, and Ij marked. Legend: code sequence; basic execution time of instruction; extended execution time of instruction (incl. cache miss delay).]
4 Predictable Task Preemption
We found that preempting tasks based on the num-
ber of instructions executed yields the desired time-
predictable behavior [7]. So instead of using a clock we
count the number of instructions completed. Preemp-
tions happen when the value of this counter matches
an entry in the scheduling table.
Figure 4 shows the schedule for our example, using an instruction counter. Still, the timing of the second execution of the schedule differs from the first, in which we have an initial cache miss. From the second execution on, however, the execution always starts from the same cache state and has a constant execution time.
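A minimal Python sketch of this behavior follows. It uses an assumed cost model (hit 1 cycle, miss 3 cycles, one contended cache line) and hypothetical names. T2 is preempted after a fixed number of completed instructions, so every cycle executes the same instruction sequence.

```python
HIT_COST = 1   # assumed cycles for an instruction fetch that hits
MISS_COST = 3  # assumed cycles for an instruction fetch that misses

T2_BLOCKS = ("X", "X")  # I1 and I2: same memory block X
T1_BLOCKS = ("Y",)      # I3: block Y, maps to the same cache line as X


def cycle_sequence(preempt_after):
    """Fixed fetch sequence of one cycle: T2 is preempted by T1 after
    `preempt_after` completed instructions, then resumes."""
    return T2_BLOCKS[:preempt_after] + T1_BLOCKS + T2_BLOCKS[preempt_after:]


def run_sequence(line, blocks):
    """Execute the fetch sequence; return (cycle length, cache line content)."""
    t = 0
    for blk in blocks:
        t += HIT_COST if line == blk else MISS_COST
        line = blk  # direct-mapped: the fetched block now occupies the line
    return t, line


def simulate_counter(cycles, preempt_after=1):
    """Run several cycles; the cache content carries over between cycles."""
    line, lengths = None, []
    for _ in range(cycles):
        length, line = run_sequence(line, cycle_sequence(preempt_after))
        lengths.append(length)
    return lengths


print(simulate_counter(5))  # [9, 7, 7, 7, 7]
```

Only the first cycle, which starts from an empty cache, differs; under these assumptions every later cycle takes 7 cycles.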
This stabilization of execution time is guaranteed in systems with direct-mapped instruction caches. Since in direct-mapped caches each memory reference maps to a single cache line, the cache state after applying a sequence of memory references once is the same as the state after applying this sequence multiple times. As a result, from the second execution of an instruction sequence on, the instruction cache behavior is constant.
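This property can be checked on a simple model. The Python sketch below assumes a direct-mapped cache with four lines, indexed by block address modulo the number of lines; the names are hypothetical. After one pass, each touched line holds the last block of the sequence that maps to it and untouched lines are unchanged, so a second pass ends in the same state.

```python
NUM_LINES = 4  # assumed number of cache lines


def apply_refs(cache, blocks):
    """Apply a sequence of block references to a direct-mapped cache.

    `cache` is a tuple with one entry per line (block address or None).
    Returns (new cache state, number of misses).
    """
    state = list(cache)
    misses = 0
    for blk in blocks:
        idx = blk % NUM_LINES  # each block maps to exactly one line
        if state[idx] != blk:
            misses += 1
            state[idx] = blk   # the fetched block replaces the old content
    return tuple(state), misses


def stabilizes(initial, blocks):
    """Check that the state is a fixed point after the first pass and that
    the miss count is the same for the second and third pass."""
    s1, _ = apply_refs(initial, blocks)
    s2, m2 = apply_refs(s1, blocks)
    s3, m3 = apply_refs(s2, blocks)
    return s1 == s2 == s3 and m2 == m3
```

For instance, `stabilizes((None,) * NUM_LINES, [0, 4, 1, 0, 5])` holds: blocks 0 and 4 keep evicting each other, but they do so identically in every pass after the first.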
Thus, direct-mapped instruction caches can be used
in a time-predictable computer system as described in
Section 2 in combination with task-preemption driven
by instruction counting to ensure time-stability.
5 Summary and Conclusion
In this paper we analyzed the task activation mech-
anism of a software architecture for safety-critical hard
real-time systems. This software architecture uses a
static task activation scheme of a cyclic executive. The
operating system design and the single-path translation
of code support the construction of time-predictable
computer systems based on modern, powerful state-of-
the-art hardware.
Within this paper we first analyzed why static, offline-scheduled task preemption driven by a clock interrupt is not guaranteed to be time-predictable on a computer architecture with an instruction cache. We then demonstrated that time-predictable task preemption can be achieved if the clock-based preemption is replaced by a preemption model that uses an instruction counter to control preemption points. Using this new model, all task instances are always preempted after the same number of executed instructions and their execution times stabilize, i.e., after the very first execution the execution times of the tasks become and remain invariable.
Figure 4. Task Preemption by Instruction Counter (with constant instruction count at each preemption point)
[Timeline of tasks T1 and T2 in the first cycle and in all other cycles. Legend: code sequence of constant number of instructions; basic execution time of instruction; extended execution time of instruction (incl. cache miss delay).]
References
[1] J. R. Allen, K. Kennedy, C. Porterfield, and J. D. Warren. Conversion of control dependence to data dependence. In Proc. 10th ACM Symposium on Principles of Programming Languages, pages 177–189, 1983.
[2] D. B. Kirk. SMART (strategic memory allocation for real-time) cache design. In Proc. 10th Real-Time Systems Symposium, pages 229–237, Santa Monica, CA, USA, Dec. 1989.
[3] H. Kopetz. Real-Time Systems - Design Principles for Distributed Embedded Applications. Kluwer, 1997. ISBN: 0-7923-9894-7.
[4] P. Puschner. Transforming execution-time boundable code into temporally predictable code. In B. Kleinjohann, K. K. Kim, L. Kleinjohann, and A. Rettberg, editors, Design and Analysis of Distributed Embedded Systems, pages 163–172. Kluwer Academic Publishers, 2002. IFIP 17th World Computer Congress - TC10 Stream on Distributed and Parallel Embedded Systems (DIPES 2002).
[5] P. Puschner and A. Burns. A review of worst-case execution-time analysis. Journal of Real-Time Systems, 18(2/3):115–128, May 2000.
[6] P. Puschner and A. Burns. Writing Temporally Predictable Code. In Proc. 7th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems, pages 85–91, Jan. 2002.
[7] P. Puschner and R. Kirner. From time-triggered to time-deterministic real-time systems. In Proc. 5th IFIP Working Conference on Distributed and Parallel Embedded Systems, pages 115–124, Braga, Portugal, Oct. 2006.
