Cache behavior prediction by abstract interpretation by Ferdinand, Christian et al.
Cache Behavior Prediction by Abstract Interpretation
Christian Ferdinand, Florian Martin, Reinhard Wilhelm, Martin Alt
Universität des Saarlandes, Fachbereich Informatik, Saarbrücken, Germany
{ferdi,florian,wilhelm,alt}@cs.uni-sb.de
Abstract
Abstract interpretation is a technique for the static detection of dynamic properties of programs. It is semantics based, that is, it computes approximative properties of the semantics of programs. On this basis, it allows for correctness proofs of analyses. It replaces commonly used ad hoc techniques by systematic, provable ones, and it allows the automatic generation of analyzers from specifications, as in the Program Analyzer Generator, PAG.
In this paper, abstract interpretation is applied to the problem of predicting the cache behavior of programs. Abstract semantics of machine programs are defined which determine the contents of caches. For interprocedural analysis, existing methods are examined and a new approach that is especially tailored for the cache analysis is presented. This allows for a static classification of the cache behavior of memory references of programs. The calculated information can be used to sharpen worst case execution time estimations. It is possible to analyze instruction, data, and combined instruction/data caches for common replacement and write strategies. Experimental results are presented that demonstrate the applicability of the analysis.
Keywords: abstract interpretation, program analysis, cache memories, real time applications, worst case execution time prediction
 Cache Memories and RealTime Applications
Caches are used to improve the access times of fast microprocessors to rela
tively slow main memories They can reduce the number of cycles a processor
is waiting for data by providing faster access to recently referenced regions
of memory

 Caching is more or less used for all general purpose processors

Hennessy and Patterson 	
 describe typical values for caches in 	
 worksta
tions and minicomputers Hit time 	 clock cycles normally 	 Miss penalty 
clock cycles
Preprint  March 
and with increasing application sizes it becomes more and more relevant and
used for high performance microcontrollers and DSPs
Programs with hard realtime constraints have to be subjected to a schedu
lability analysis eg by the compiler  This should determine whether
all timing constraints can be satis	ed WCET 
Worst Case Execution Time
extimations for processes have to be used for this The degree of success for
such a timing validation  depends on sharp WCET estimations There are
two components to the prediction of WCETS

i architecture modeling the determination of how much time it will take
to execute an execution path on the target system and

ii program path analysis the determination of a worst case execution path
Here we focus on the 	rst point
For hardware with caches the typical worst case assumption is that all accesses
miss the cache This is an overly pessimistic assumption which leads to a waste
of hardware resources
Overview
In the following section we briefly sketch the underlying theory of abstract interpretation and present the program analyzer generator PAG. Cache memories are briefly described in the section after that. We then give a semantics for programs that reflects only memory accesses (to fixed addresses) and their effects on cache memories, and we present the must analysis, which computes for all program points a set of memory blocks that must be in the cache whenever control reaches this point, and the may analysis, which computes a set of memory blocks that may be in the cache. The behavior of memory references within loops and recursive procedures can be analyzed with interprocedural analysis methods. Existing approaches are discussed and a new approach is presented in the section on loops and recursive procedures. This is followed by an example and by extensions to data and combined caches. Finally, we present and discuss the results of practical experiments from an implementation of the analyses and describe related work.
Program Analysis by Abstract Interpretation
Program analysis is a widely used technique to determine runtime properties of a given program without actually executing it. Such information is used, for example, in optimizing compilers to enable code improving transformations. A program analyzer takes a program as input and computes some interesting properties. Most of these properties are undecidable. Hence, correctness and completeness of the computed information are not achievable together. Program analysis makes no compromise on the correctness side: the computed information has to be reliable, since it is used, for example, to enable optimizing transformations. It therefore cannot guarantee completeness. The quality of the computed information, usually called its precision, should be as good as possible.
There is a well developed theory of static program analysis called abstract interpretation. With this theory, correctness of a program analysis can be easily derived. According to this theory, a program analysis is determined by an abstract semantics. Usually, the meaning of a language is given as functions for the statements of the language computing over a concrete domain. A domain is a complete partially ordered set of values. For such a semantics, an abstract version consists of a new, simpler abstract domain and simpler abstract functions which define the abstract meaning for every program statement.
For an abstract semantics and an input program, a system of recursive equations can be constructed. The variables in this system stand for the values of the abstract domain at every program point. In this equation system, the value at a program point depends on the values at all program points which can directly precede the execution of this program point. For example, the value after the exit of a loop depends on the value at the end of the loop body and on the value before the loop, because it is possible that the loop is never executed. The control flow graph of a program describes every possible flow of control and therefore all dependencies between the variables of the equation system. Lattice theory underlying abstract interpretation states that the recursive equation system can be solved by fixpoint iteration if the abstract domain has only finite ascending chains, i.e. every chain of values $v_1 \sqsubseteq v_2 \sqsubseteq \dots$ has only finite length, and if in addition every semantic function is monotonic.
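The fixpoint iteration described above can be sketched with a small worklist solver. This is only an illustration under stated assumptions, not the algorithm used by PAG: the names `solve_fixpoint`, `transfer`, and `join` are hypothetical, and the toy domain (sets of node names ordered by inclusion) was chosen only because it is finite and its transfer function is monotonic.

```python
def solve_fixpoint(nodes, edges, entry, entry_value, transfer, join):
    """Return the abstract value holding immediately *before* each node.

    transfer(node, value) is the abstract semantic function of a node;
    join(a, b) combines the values arriving from two predecessors.
    Termination relies on a finite-height domain and monotonic functions.
    """
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)

    before = {n: None for n in nodes}   # None marks "not reached yet"
    before[entry] = entry_value
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        out = transfer(node, before[node])
        for succ in succs[node]:
            old = before[succ]
            new = out if old is None else join(old, out)
            if new != old:              # value changed: re-examine successor
                before[succ] = new
                worklist.append(succ)
    return before


# Toy instance: a loop "loop -> body -> loop" that may also be skipped.
NODES = ["a", "loop", "body", "exit"]
EDGES = [("a", "loop"), ("loop", "body"), ("body", "loop"), ("loop", "exit")]

result = solve_fixpoint(
    NODES, EDGES, "a", frozenset(),
    transfer=lambda node, value: value | {node},  # record the executed node
    join=lambda x, y: x | y,                      # set union
)
```

In this toy run, the value before `exit` depends both on the path that skips the loop body and on the path through it, which mirrors the loop-exit example in the text.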
The program analyzer generator PAG offers the possibility to generate a program analyzer from a description of the abstract domain and of the abstract semantic functions in two high level languages, one for the domains and the other for the semantic functions. Domains can be constructed inductively, starting from simple domains, using operators such as the construction of power sets and function domains. The semantic functions are described in a functional language which combines high expressiveness with efficient implementation. Additionally, the user has to supply a join function combining two domain values into one. This function is applied whenever a point in the program has two (or more) possible execution predecessors.

Cache Memories
A cache can be characterized by three major parameters:
- capacity is the number of bytes it may contain.
- line size (also called block size) is the number of contiguous bytes that are transferred from memory on a cache miss. The cache can hold at most n = capacity / line size blocks.
- associativity is the number of cache locations where a particular block may reside. n / associativity is the number of sets of a cache. A set can be considered as a fully associative subcache.
If a block can reside in any cache location, then the cache is called fully associative. If a block can reside in exactly one location, then it is called direct mapped. If a block can reside in exactly A locations, then the cache is called A-way set associative.
In the case of an associative cache, a memory block has to be selected for replacement when the cache is full and the processor requests further data. This is done according to a replacement strategy. Common strategies are LRU (Least Recently Used), FIFO (First In First Out), and random.
We restrict our description to the semantics of A-way set associative caches with LRU replacement strategy. The fully associative and the direct mapped caches are special cases of the A-way set associative cache where A = n and A = 1, respectively.
Cache Semantics
In the following, we consider an A-way set associative cache as a sequence of (fully associative) sets $F = \langle f_1, \dots, f_{n/A} \rangle$, a set $f_i$ as a sequence of set lines $L = \langle l_1, \dots, l_A \rangle$, and the store as a set of memory blocks $M = \{m_1, \dots, m_s\}$.
The function $adr : M \to \mathbb{N}_0$ gives the address of each memory block. The function $set : M \to F$ gives the set where a memory block would be stored ($\%$ denotes the modulo division):
\[ set(m) = f_i, \quad \text{where } i = adr(m) \,\%\, (n/A) + 1 \]
To indicate the absence of any memory block in a set line, we introduce a new element $I$; $M' = M \cup \{I\}$.
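The cache parameters and the set mapping can be illustrated with a few lines of Python. This is a sketch under two assumptions that are not stated in the text: `block_address` plays the role of $adr(m)$ and is taken to be a block number (not a byte address), and sets are numbered from 1, matching the reading $i = adr(m) \,\%\, (n/A) + 1$. The function names are hypothetical.

```python
def num_blocks(capacity, line_size):
    """n = capacity / line size: the number of blocks the cache can hold."""
    return capacity // line_size


def num_sets(capacity, line_size, associativity):
    """n / A: the number of sets of an A-way set associative cache."""
    return num_blocks(capacity, line_size) // associativity


def set_index(block_address, capacity, line_size, associativity):
    """1-based index i of the set f_i in which the block would be stored."""
    return block_address % num_sets(capacity, line_size, associativity) + 1
```

For a 1024-byte cache with 16-byte lines, `num_blocks` gives 64. With A = 64 (fully associative) every block maps to set 1, and with A = 1 (direct mapped) there are 64 sets, which matches the two special cases named in the text.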

Our cache semantics separates two key aspects:
- The set where a memory block is stored. This can be determined statically, as it depends only on the address of the memory block. The dynamic distribution of memory blocks into sets is modeled with the cache states.
- The aspect of associativity and the replacement strategy within one set of the cache. Here, the history of memory reference executions is relevant. This is modeled with the set states.
Definition (concrete set state). A concrete set state is a function $s : L \to M'$. $S$ denotes the set of all concrete set states.
Definition (concrete cache state). A concrete cache state is a function $c : F \to S$. $C$ denotes the set of all concrete cache states.
If $s(l_x) = m$ for a concrete set state $s$, then $x$ describes the relative age of the memory block according to the LRU replacement strategy and not the physical position in the cache hardware.
The update function describes the side effects on the set (cache) of referencing the memory:
- The set where a memory block may reside in the cache is uniquely determined by the address of the memory block, i.e. the behavior of the sets is independent of each other.
- The LRU replacement strategy is modeled by using the positions of memory blocks within a set to indicate their relative age. The order of the memory blocks reflects the history of memory references.
  The most recently referenced memory block is put in the first position $l_1$ of the set. If the referenced memory block $m$ is in the set already, then all memory blocks in the set that have been more recently used than $m$ are shifted by one position to the next set line, i.e. they increase their relative age by one. If the memory block $m$ is not yet in the set, then all memory blocks in the set are shifted and the oldest, i.e. least recently used, memory block is removed from the set.
Definition (set update). A set update function $U_S : S \times M \to S$ describes the new set state for a given set state and a referenced memory block.
Definition (cache update). A cache update function $U_C : C \times M \to C$ describes the new cache state for a given cache state and a referenced memory block.
Updates of fully associative sets with LRU replacement strategy are modeled in the following way:
\[
U_S(s, m) =
\begin{cases}
[\, l_1 \mapsto m,\; l_i \mapsto s(l_{i-1}) \mid i = 2 \dots h,\; l_i \mapsto s(l_i) \mid i = h+1 \dots A \,] & \text{if } \exists\, l_h : s(l_h) = m \\
[\, l_1 \mapsto m,\; l_i \mapsto s(l_{i-1}) \text{ for } i = 2 \dots A \,] & \text{otherwise}
\end{cases}
\]
Notation: $[y \mapsto z]$ denotes a function that maps $y$ to $z$. $f[y \mapsto z]$ denotes a function that maps $y$ to $z$ and all $x \neq y$ to $f(x)$.
Updates of A-way set associative caches are modeled in the following way:
\[ U_C(c, m) = c[\, set(m) \mapsto U_S(c(set(m)), m) \,] \]
Control Flow Representation
We represent programs by control flow graphs consisting of nodes and typed edges. The nodes represent basic blocks (see the first footnote below). For each basic block, the sequence of references to memory is known (see the second footnote below), i.e. there exists a mapping from control flow nodes to sequences of memory blocks: $\mathcal{L} : V \to M^*$.
We can describe the working of a cache with the help of the update function $U_C$. Therefore, we extend $U_C$ to sequences of memory references:
\[ U_C(c, \langle m_1, \dots, m_y \rangle) = U_C(\dots U_C(c, m_1) \dots, m_y) \]
The cache state for a path $(k_1, \dots, k_p)$ in the control flow graph is given by applying $U_C$ to the initial cache state $c_I$, which maps all set lines in all sets to $I$, and the concatenation of all sequences of memory references along the path:
\[ U_C(c_I, \mathcal{L}(k_1) \cdot \ldots \cdot \mathcal{L}(k_p)) \]
Footnote: A basic block is a sequence (of fragments) of instructions in which control flow enters at the beginning and leaves at the end without halt or possibility of branching except at the end. For our cache analysis, it is most convenient to have one memory reference per control flow node. Therefore, our nodes may represent the different fragments of machine instructions that access memory.
Footnote: This is appropriate for instruction caches and can be too restrictive for data caches and combined caches. See the section on data caches and combined caches for weaker restrictions.

Abstract Semantics
The domain for our abstract interpretation consists of abstract cache states that are constructed from abstract set states.
Definition (abstract set state). An abstract set state $\hat{s} : L \to 2^{M}$ maps set lines to sets of memory blocks. $\hat{S}$ denotes the set of all abstract set states.
Definition (abstract cache state). An abstract cache state $\hat{c} : F \to \hat{S}$ maps sets to abstract set states. $\hat{C}$ denotes the set of all abstract cache states.
We will present two analyses. The must analysis determines a set of memory blocks that are definitely in the cache whenever control reaches a given program point. The may analysis determines all memory blocks that may be in the cache at a given program point. The latter analysis is used to guarantee the absence of a memory block in the cache.
The analyses are used to compute a categorization for each memory reference that describes its cache behavior. The categories are described in Table 1.
Table 1
Categorizations of memory references
Category        Abb.  Meaning
always hit      ah    The memory reference will always result in a cache hit.
always miss     am    The memory reference will always result in a cache miss.
not classified  nc    The memory reference could neither be classified as ah nor am.
The abstract semantic functions describe the effect of a memory reference on an element of the abstract domain. The abstract set (cache) update function $\hat{U}$ for abstract set (cache) states is an extension of the set (cache) update function $U$ to abstract set (cache) states.
On control flow nodes with at least two predecessors (see the footnote below), join functions are used to combine the abstract cache states.
Definition (join function). A join function $\hat{J} : \hat{C} \times \hat{C} \to \hat{C}$ combines two abstract cache states.
Footnote: Our join functions are associative. On nodes with more than two predecessors, the join function is used iteratively.
Must Analysis
An abstract cache state $\hat{c}$ describes a set of concrete cache states $c$, and an abstract set state $\hat{s}$ describes a set of concrete set states $s$.
To determine if a memory block is definitely in the cache, we use abstract set states where the position (the relative age) of a memory block in the abstract set state $\hat{s}$ is an upper bound of the positions (the relative ages) of the memory block in the concrete set states that $\hat{s}$ represents.
$m_a \in \hat{s}(l_x)$ means that the memory block $m_a$ is in the cache. The position (relative age) of a memory block $m_a$ in a set can only be changed by references to memory blocks $m_b$ with $set(m_a) = set(m_b)$, i.e. by memory references that go into the same set. Other memory references do not change the position of $m_a$. The position is also not changed by references to memory blocks $m_b \in \hat{s}(l_y)$ where $y \le x$, i.e. memory blocks that are already in the cache and are younger than or the same age as $m_a$.
$m_a$ will stay in the cache at least for the next $A - x$ references that go to the same set and are not yet in the cache or are older than $m_a$.
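The meaning of an abstract cache state is given by a concretization function $conc_{\hat{C}} : \hat{C} \to 2^{C}$. The concretization function for the must analysis, $conc^{\cap}_{\hat{C}}$, is given by:
\[ conc^{\cap}_{\hat{C}}(\hat{c}) = \{\, c \mid \forall\, 1 \le i \le n/A : c(f_i) \in conc^{\cap}_{\hat{S}}(\hat{c}(f_i)) \,\} \]
\[ conc^{\cap}_{\hat{S}}(\hat{s}) = \{\, s \mid \forall\, 1 \le a \le A : \forall m \in \hat{s}(l_a) : \exists\, b : s(l_b) = m \text{ and } b \le a \,\} \]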
We use the following abstract set update function:
\[
\hat{U}^{\cap}_{\hat{S}}(\hat{s}, m) =
\begin{cases}
[\, l_1 \mapsto \{m\},\; l_i \mapsto \hat{s}(l_{i-1}) \mid i = 2 \dots h-1,\; l_h \mapsto \hat{s}(l_{h-1}) \cup (\hat{s}(l_h) - \{m\}),\; l_i \mapsto \hat{s}(l_i) \mid i = h+1 \dots A \,] & \text{if } \exists\, l_h : m \in \hat{s}(l_h) \\
[\, l_1 \mapsto \{m\},\; l_i \mapsto \hat{s}(l_{i-1}) \mid i = 2 \dots A \,] & \text{otherwise}
\end{cases}
\]
Example (must update).
                                            l_1     l_2     l_3         l_4
$\hat{s}$                                   {m_a}   {}      {m_b, m_c}  {m_d}
$\hat{U}^{\cap}_{\hat{S}}(\hat{s}, m_c)$    {m_c}   {m_a}   {m_b}       {m_d}
The address of a memory block determines the set in which it is stored. This is reflected in the abstract cache update function in the following way:
\[ \hat{U}^{\cap}_{\hat{C}}(\hat{c}, m) = \hat{c}[\, set(m) \mapsto \hat{U}^{\cap}_{\hat{S}}(\hat{c}(set(m)), m) \,] \]
The join function for abstract set states is similar to set intersection. A memory block only stays in the abstract set state if it is in both operand abstract set states. It gets the oldest age if it has two different ages.
\[ \hat{J}^{\cap}_{\hat{S}}(\hat{s}_1, \hat{s}_2) = \hat{s}, \text{ where } \hat{s}(l_x) = \{\, m \mid \exists\, l_a, l_b \text{ with } m \in \hat{s}_1(l_a),\; m \in \hat{s}_2(l_b) \text{ and } x = \max(a, b) \,\} \]
Example (must join).
                                                  l_1     l_2     l_3         l_4
$\hat{s}_1$                                       {m_a}   {m_b}   {m_c}       {m_d}
$\hat{s}_2$                                       {m_c}   {m_e}   {m_a}       {m_d}
$\hat{J}^{\cap}_{\hat{S}}(\hat{s}_1, \hat{s}_2)$  {}      {}      {m_a, m_c}  {m_d}
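The must update and must join can be sketched in Python and checked against the two examples above. This is an illustration only, not the PAG specification: an abstract set state is represented as a list of `frozenset`s with index 0 standing for $l_1$, and the names `must_update`, `must_join`, and `fs` are hypothetical. One case is an assumption: when the referenced block is already in $l_1$, other blocks sharing that line are dropped, which is safe for a must analysis because it only loses information.

```python
def fs(*blocks):
    """Shorthand for building one abstract set line."""
    return frozenset(blocks)


def must_update(s, m):
    """Abstract must update: ages are upper bounds on the concrete ages."""
    h = next((i for i, line in enumerate(s) if m in line), None)
    if h is None:
        # m not known to be cached: all blocks age, the oldest line drops out.
        return [fs(m)] + list(s[:-1])
    new = list(s)
    for i in range(1, h):                 # lines l_2 .. l_{h-1} shift by one
        new[i] = s[i - 1]
    if h >= 1:                            # l_h absorbs the line before it
        new[h] = s[h - 1] | (s[h] - {m})
    new[0] = fs(m)                        # m becomes the youngest block
    return new


def must_join(s1, s2):
    """Keep blocks present in both states, at the older of their two ages."""
    age1 = {m: i for i, line in enumerate(s1) for m in line}
    age2 = {m: i for i, line in enumerate(s2) for m in line}
    joined = [set() for _ in s1]
    for m in age1.keys() & age2.keys():
        joined[max(age1[m], age2[m])].add(m)
    return [frozenset(line) for line in joined]
```

Applied to the states of the must update example, `must_update([fs("a"), fs(), fs("b", "c"), fs("d")], "c")` reproduces the second row, and `must_join` reproduces the result row of the must join example.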
The join function for abstract cache states applies the join function for abstract set states to all its abstract set states:
\[ \hat{J}^{\cap}_{\hat{C}}(\hat{c}_1, \hat{c}_2) = [\, f_i \mapsto \hat{J}^{\cap}_{\hat{S}}(\hat{c}_1(f_i), \hat{c}_2(f_i)) \text{ for all } 1 \le i \le n/A \,] \]
An abstract cache state $\hat{c}$ at a control flow node $k$ is interpreted in the following way: Let $m$ be a memory block and $\hat{s} = \hat{c}(set(m))$. If $m \in \hat{s}(l_y)$ for a set line $l_y$, then $m$ is definitely in the cache every time control reaches $k$. Therefore, a reference to $m$ is categorized as always hit (ah).

May Analysis
To determine if a memory block is never in the cache, we compute the set of all memory blocks that may be in the cache. We use abstract set states $\hat{s}$ where the position (the relative age) of a memory block in the abstract set state is a lower bound of the positions (the relative ages) of the memory block in the concrete set states that $\hat{s}$ represents.
$m_a \in \hat{s}(l_x)$ means that the memory block $m_a$ may be in the cache. The position (relative age) of a memory block $m_a$ in a set can only be changed by references to memory blocks $m_b$ with $set(m_a) = set(m_b)$, i.e. by memory references that go into the same set. Other memory references do not change the position of $m_a$. The position is also not changed by references to memory blocks $m_b \in \hat{s}(l_y)$ where $x > y$, i.e. memory blocks that are already in the cache and are younger than $m_a$.
If there are no memory references to $m_a$, then $m_a$ will be removed from the cache after at most $A - x + 1$ references to memory blocks that go into the same set and are not yet in the cache or are older than or the same age as $m_a$.
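The concretization function for the may analysis, $conc^{\cup}_{\hat{C}}$, is given by:
\[ conc^{\cup}_{\hat{C}}(\hat{c}) = \{\, c \mid \forall\, 1 \le i \le n/A : c(f_i) \in conc^{\cup}_{\hat{S}}(\hat{c}(f_i)) \,\} \]
\[ conc^{\cup}_{\hat{S}}(\hat{s}) = \{\, s \mid \forall\, 1 \le a \le A : \forall m = s(l_a) : \exists\, b : \hat{s}(l_b) \ni m \text{ and } b \le a \,\} \]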
We use the following abstract set update function:
\[
\hat{U}^{\cup}_{\hat{S}}(\hat{s}, m) =
\begin{cases}
[\, l_1 \mapsto \{m\},\; l_i \mapsto \hat{s}(l_{i-1}) \mid i = 2 \dots h,\; l_{h+1} \mapsto \hat{s}(l_{h+1}) \cup (\hat{s}(l_h) - \{m\}),\; l_i \mapsto \hat{s}(l_i) \mid i = h+2 \dots A \,] & \text{if } \exists\, l_h : m \in \hat{s}(l_h) \\
[\, l_1 \mapsto \{m\},\; l_i \mapsto \hat{s}(l_{i-1}) \mid i = 2 \dots A \,] & \text{otherwise}
\end{cases}
\]
Example (may update).
                                            l_1     l_2         l_3     l_4
$\hat{s}$                                   {m_a}   {m_b, m_c}  {}      {m_d}
$\hat{U}^{\cup}_{\hat{S}}(\hat{s}, m_c)$    {m_c}   {m_a}       {m_b}   {m_d}
The abstract cache update function for the may analysis has the same structure as the one for the must analysis:
\[ \hat{U}^{\cup}_{\hat{C}}(\hat{c}, m) = \hat{c}[\, set(m) \mapsto \hat{U}^{\cup}_{\hat{S}}(\hat{c}(set(m)), m) \,] \]

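The join function is similar to set union. If a memory block $m$ has two different ages in two abstract cache states, then the join function takes the youngest age.
\[
\begin{aligned}
\hat{J}^{\cup}_{\hat{S}}(\hat{s}_1, \hat{s}_2) = \hat{s}, \text{ where } \hat{s}(l_x) =\; & \{\, m \mid \exists\, l_a, l_b \text{ with } m \in \hat{s}_1(l_a),\; m \in \hat{s}_2(l_b) \text{ and } x = \min(a, b) \,\} \\
\cup\; & \{\, m \mid m \in \hat{s}_1(l_x) \text{ and } \nexists\, l_a \text{ with } m \in \hat{s}_2(l_a) \,\} \\
\cup\; & \{\, m \mid m \in \hat{s}_2(l_x) \text{ and } \nexists\, l_a \text{ with } m \in \hat{s}_1(l_a) \,\}
\end{aligned}
\]
Example (may join).
                                                  l_1         l_2              l_3     l_4
$\hat{s}_1$                                       {m_a}       {m_b}            {m_c}   {m_d}
$\hat{s}_2$                                       {m_c}       {m_e, m_f}       {m_a}   {m_d}
$\hat{J}^{\cup}_{\hat{S}}(\hat{s}_1, \hat{s}_2)$  {m_a, m_c}  {m_b, m_e, m_f}  {}      {m_d}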
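The may update and may join can be sketched in the same style and checked against the two may examples. As before, this is an illustration under assumptions: abstract set states are lists of `frozenset`s with index 0 standing for $l_1$, and `may_update`, `may_join`, and `fs` are hypothetical names. When the referenced block sits in the last line, the blocks sharing that line have no next line to move to and leave the state.

```python
def fs(*blocks):
    """Shorthand for building one abstract set line."""
    return frozenset(blocks)


def may_update(s, m):
    """Abstract may update: ages are lower bounds on the concrete ages."""
    h = next((i for i, line in enumerate(s) if m in line), None)
    if h is None:
        # m was definitely not cached: every block ages by one.
        return [fs(m)] + list(s[:-1])
    new = list(s)
    for i in range(1, h + 1):             # lines l_2 .. l_h shift by one
        new[i] = s[i - 1]
    if h != len(s) - 1:                   # blocks that shared m's line age
        new[h + 1] = s[h + 1] | (s[h] - {m})
    new[0] = fs(m)                        # m becomes the youngest block
    return new


def may_join(s1, s2):
    """Keep blocks present in either state, at the younger of their ages."""
    youngest = {}
    for state in (s1, s2):
        for age, line in enumerate(state):
            for m in line:
                youngest[m] = min(age, youngest.get(m, age))
    joined = [set() for _ in s1]
    for m, age in youngest.items():
        joined[age].add(m)
    return [frozenset(line) for line in joined]
```

Compared with the must analysis, the only structural differences are the line that absorbs the displaced blocks ($l_{h+1}$ instead of $l_h$) and the use of union with the minimum age in the join.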
The join function for abstract cache states for the may analysis has the same structure as for the must analysis:
\[ \hat{J}^{\cup}_{\hat{C}}(\hat{c}_1, \hat{c}_2) = [\, f_i \mapsto \hat{J}^{\cup}_{\hat{S}}(\hat{c}_1(f_i), \hat{c}_2(f_i)) \text{ for all } 1 \le i \le n/A \,] \]
An abstract cache state $\hat{c}$ at a control flow node $k$ is interpreted in the following way: Let $m$ be a memory block and $\hat{s} = \hat{c}(set(m))$. If $m$ is not in $\hat{s}(l_y)$ for any set line $l_y$, then it is definitely not in the cache whenever control reaches $k$. Therefore, a reference to $m$ is categorized as always miss (am).
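Taken together, the interpretations of the must and may states give the three categories of memory references. A minimal sketch, assuming that `must_state` and `may_state` are the abstract set states (lists of sets of blocks) for the set that the block maps to, and that `classify` is a hypothetical helper name:

```python
def classify(m, must_state, may_state):
    """Categorize a reference to block m as 'ah', 'am', or 'nc'."""
    if any(m in line for line in must_state):
        return "ah"   # definitely in the cache: always hit
    if not any(m in line for line in may_state):
        return "am"   # definitely not in the cache: always miss
    return "nc"       # may or may not be cached: not classified


# Illustrative states for a 4-line set (block names are arbitrary).
MUST = [{"e"}, set(), set(), {"b"}]
MAY = [{"e"}, {"b", "z"}, set(), set()]
```

With these states, `e` and `b` are classified as always hit, `z` is not classified because it appears only in the may state, and any block that appears in neither state is classified as always miss.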
Termination of the Analysis
There are only a finite number of sets and set lines, and for each program a finite number of memory blocks. This means the domain of abstract cache states $\hat{c} : F \to (L \to 2^{M})$ is finite. Hence, every ascending chain (see the footnote below) is finite. Additionally, the abstract cache update functions $\hat{U}$ and the join functions $\hat{J}$ are monotonic. This guarantees that our analysis will terminate.
Footnote: The order is given by set inclusion and the concretization functions.

Analysis of Loops and Recursive Procedures
Loops and recursive procedures are of special interest, since programs spend most of their time there. In a control flow graph, a loop is represented as a cycle. The start node of a loop (see the footnote below) has two incoming edges. One represents the entry into the loop, the other represents the control flow from the end of the loop to the beginning of the loop. The latter is called the loop edge (see Figure 1).
[Figure 1: control flow graph of a loop, showing the start node and the loop edge.]
Fig. 1. Control flow graph of a loop
Loops typically iterate more than once. Since the execution of the loop body usually changes the cache contents, it is useful to distinguish the first iteration from the others. This could be achieved by conceptually unrolling each loop once.
Example. Let us consider a sufficiently large fully associative data cache with LRU replacement strategy and the following program fragment:
  ...
  /* Variable x not in the data cache */
  for i := ... to ... do ... y := x ... end
  ...
Footnote: We consider here loops that correspond to the loop constructs of higher programming languages. Program analysis is not restricted to this, but will produce more precise results for programs with well behaved control flow.
In the first execution of the loop, the reference to x will be a cache miss, because x is not in the cache. In all further iterations, the reference to x will be a cache hit, if the cache is sufficiently large to hold all variables referenced within the loop.
For the abstract interpretation the join function

J

combines the abstract
cache states at the start node of the loop Since the join function is similar
to set intersection the combined abstract cache state will never include the
variable x because x is not in the abstract cache state before the loop is
entered For a WCET computation for a program this is a safe approximation
but nevertheless not very good
Loop unrolling would overcome this problem After the 	rst unrolled iteration
x would be in the abstract cache state and would be classi	ed as always hit
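The effect can be reproduced with a small experiment. The sketch below re-implements a must update and a must join on lists of `frozenset`s (index 0 is the youngest line) and compares the state at the loop start node with and without separating the first iteration. It is an illustration under assumptions: a 2-line fully associative cache, a loop body that references only `x`, and hypothetical helper names throughout.

```python
def must_update(s, m):
    """Must update: m becomes youngest, blocks younger than m age by one."""
    h = next((i for i, line in enumerate(s) if m in line), None)
    if h is None:
        return [frozenset({m})] + list(s[:-1])
    new = list(s)
    for i in range(1, h):
        new[i] = s[i - 1]
    if h >= 1:
        new[h] = s[h - 1] | (s[h] - {m})
    new[0] = frozenset({m})
    return new


def must_join(s1, s2):
    """Must join: keep common blocks at the older of their two ages."""
    age1 = {m: i for i, line in enumerate(s1) for m in line}
    age2 = {m: i for i, line in enumerate(s2) for m in line}
    joined = [set() for _ in s1]
    for m in age1.keys() & age2.keys():
        joined[max(age1[m], age2[m])].add(m)
    return [frozenset(line) for line in joined]


def head_without_unrolling(entry, body):
    """State at the loop start: join of the entry edge and the loop edge."""
    head = entry
    while True:
        new = must_join(entry, body(head))
        if new == head:
            return head
        head = new


def head_with_unrolling(entry, body):
    """States at the loop start for the first and for all other iterations."""
    first = entry
    others = body(first)
    while True:
        new = must_join(body(first), body(others))
        if new == others:
            return first, others
        others = new


def contains(state, m):
    return any(m in line for line in state)


EMPTY = [frozenset(), frozenset()]          # x is not in the cache initially
loop_body = lambda s: must_update(s, "x")   # the body references only x

merged = head_without_unrolling(EMPTY, loop_body)
first, others = head_with_unrolling(EMPTY, loop_body)
```

Here `merged` does not contain `x`, so the reference cannot be classified as always hit, whereas `others` contains `x`, so the reference is an always hit in every iteration after the first.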
For our analysis of cache behavior, we treat loops as procedures to be able to use existing methods for the interprocedural analysis. This is done by transforming all loops into loop procedures in the control flow graph according to Figure 2. This is only done for the analyses and has no influence on the program code.
  while P do BODY end    =>    proc loop_L();
                                 if P then
                                   BODY;
                                   loop_L();    (2)
                                 end;
                               end;
                               ...
                               loop_L();        (1)
Fig. 2. Loop transformation
In the presence of (recursive) procedures, a memory reference can be executed in different execution contexts. An execution context corresponds to a path in the call graph of the program.
The interprocedural analysis methods differ in which execution contexts are distinguished for a memory reference within a procedure. Widely used are the callstring approach and the functional approach, which have been proposed by Sharir and Pnueli and are implemented in PAG.
Footnote: Ludwell Harrison III also proposed the transformation of loops into procedures for the analysis of loops.

The callstring approach limits the number of distinguished execution contexts statically. To do this, the call graph is considered. The goal is not to merge information that is obtained on different paths through the graph. But in the presence of recursion, the graph is cyclic and therefore has an infinite number of paths. So only the information obtained on paths that differ in suffixes of a fixed length K is kept separate.
In the functional approach, the number of distinguished execution contexts is not statically limited. The PAG-generated analyzer tabulates all different input values and output values of the abstract domain (here: abstract cache states) for every procedure. To guarantee termination of the analysis, the abstract domain has to be finite. The functional approach computes the most precise solution.
The applicability of these approaches to cache behavior prediction is limited:
- Callstring approach: If we restrict the callstring length K to 0 (callstring(0)), then one categorization for each memory reference in the program is computed. This is fast, but does not yield very precise information.
  Callstring(1) gives better results, as it distinguishes as many different execution contexts of a memory reference in a procedure as there are calls. For each transformed loop, there is one call to the loop procedure at the original place of the loop in the program ((1), see Figure 2) and one for the recursive call of the loop procedure (2). The first call corresponds to the first iteration of the loop. The second call corresponds to all other iterations of the loop.
  Longer callstrings increase the analysis effort and lead to a more precise categorization. The precision gained is quite poor with respect to the enormously increasing analysis costs, as many execution contexts are distinguished that are not interesting for our analysis.
- Functional approach: The dynamically distinguished execution contexts cannot be easily combined with the results of a program path analysis that determines a safe approximation to the worst case execution path. This makes a WCET estimation more difficult.
To overcome the deficiencies of the callstring and the functional approaches, we have developed the VIVU approach, which has been implemented with the mapping mechanism of PAG. It corresponds to callstring($\infty$), but paths through the call graph that only differ in the number of repeated passes through a cycle are not distinguished. It can be compared with a combination of virtual inlining of all non-recursive procedures (see the footnote below) and virtual unrolling of the first iterations of all recursive procedures (and loop procedures). The results of the VIVU approach can naturally be combined with the results of a path analysis to predict the WCET of a program.
Footnote: Virtual inlining has also been used in other work.
The results of the callstring(0), callstring(1), and the VIVU approach are compared in the section on practical experiments.
Example
We consider must and may analyses for a fully associative data cache of 4 lines for the following program fragment of a loop, where a variable name x stands for a construct that references variable x:
  while e do b c a d c end
The control flow graph and the result of the analyses with VIVU are shown in Figure 3. We assume that all variables are stored in pairwise different memory blocks. The nodes of the control flow graph are numbered 1 to 6, and each node is marked with the variable it accesses. For the analysis, we assume the loop has been implicitly transformed into a loop procedure according to Figure 2.
Each node is marked with the abstract cache states (in the same format as in the examples of the must and may analyses) computed by the PAG-generated analyzer immediately before the abstract cache states are updated according to the memory references. The loop entry edge is marked with the incoming abstract cache states. The loop exit edge is marked with the outgoing abstract cache states.
Data Caches and Combined Caches
Our analysis can be used to predict the behavior of data caches or combined instruction/data caches if the addresses of referenced data can be statically computed.
Addresses of references to global data can usually be easily determined. Local variables and procedure parameters that are allocated on the stack are addressed relative to the stack pointer or frame pointer, i.e. a register that points to a known address within the procedure frame on the execution stack. If the values of the stack pointer or frame pointer are known, the absolute addresses of the variables and parameters can be determined by a data flow analysis. For programs without recursive procedures, it is possible to determine all values of the stack or frame pointers for all procedures for the distinguished execution contexts of the cache behavior analysis.

[Figure 3: control flow graph of the loop with nodes 1 (e), 2 (b), 3 (c), 4 (a), 5 (d), 6 (c), an ENTRY edge and an EXIT edge. The abstract cache states attached to the graph are listed below in the order l_1 l_2 l_3 l_4, as they hold immediately before each node.]
Location          Iteration  must                 may
loop entry edge              {} {} {b,d} {e,z}    {b,e} {d,z} {} {}
node 1 (e)        other      {c} {d} {a} {b}      {c} {d} {a} {b}
node 2 (b)        first      {e} {} {} {b,d,z}    {e} {b,d,z} {} {}
node 2 (b)        other      {e} {c} {d} {a}      {e} {c} {d} {a}
node 3 (c)        first      {b} {e} {} {d,z}     {b} {e} {d,z} {}
node 3 (c)        other      {b} {e} {c} {d}      {b} {e} {c} {d}
node 4 (a)        first      {c} {b} {e} {}       {c} {b} {e} {d,z}
node 4 (a)        other      {c} {b} {e} {d}      {c} {b} {e} {d}
node 5 (d)        first      {a} {c} {b} {e}      {a} {c} {b} {e}
node 5 (d)        other      {a} {c} {b} {e}      {a} {c} {b} {e}
node 6 (c)        first      {d} {a} {c} {b}      {d} {a} {c} {b}
node 6 (c)        other      {d} {a} {c} {b}      {d} {a} {c} {b}
loop exit edge               {e} {} {} {d}        {e} {b,c,d,z} {a} {}
Fig. 3. Must and may analysis for a fully associative data cache with VIVU. must and may are the abstract cache states for the must and the may analysis. must_f and may_f are the abstract cache states for the first loop iteration (for node 1, these are the states of the loop entry edge). must_o and may_o are the abstract cache states for all other iterations. The abstract cache states can be interpreted for each variable reference as follows:
Node  Variable  first iteration  other iterations
1     e         always hit       always miss
2     b         always hit       always miss
3     c         always miss      always hit
4     a         always miss      always miss
5     d         always miss      always miss
6     c         always hit       always hit
Footnote: Here, the analyses with callstring(1) yield the same results.
To support the analysis of programs for which not all addresses of the memory references can be determined precisely, the $\hat{U}$ functions are extended to handle a set of possibly referenced memory locations.
Since it is not definitely known which memory block is put into the cache, the update function $\hat{U}^{\cap}_{\hat{C}}$ for the must analysis, applied to a set of possible memory locations $\{m_1, \dots, m_x\}$ and an abstract cache state $\hat{c}$, only affects the ages of the memory blocks in $\hat{c}$ in all sets where an element of $\{m_1, \dots, m_x\}$ could be stored:
\[ \hat{U}^{\cap}_{\hat{C}}(\hat{c}, \{m_1, \dots, m_x\}) = \hat{c}[\, f_i \mapsto \hat{U}^{\cap}_{\hat{S}}(\hat{c}(f_i), X_{f_i}) \text{ for all } f_i \in \{set(m_1), \dots, set(m_x)\} \,] \]
\[ \text{where } X_{f_i} = \{\, m_y \mid m_y \in \{m_1, \dots, m_x\} \text{ and } set(m_y) = f_i \,\} \]
\[ \hat{U}^{\cap}_{\hat{S}}(\hat{s}, \{m_1, \dots, m_x\}) = \hat{U}^{\cap}_{\hat{S}}(\dots \hat{U}^{\cap}_{\hat{S}}(\hat{s}, \{m_1\}) \dots, \{m_x\}) \]
\[
\hat{U}^{\cap}_{\hat{S}}(\hat{s}, \{m\}) =
\begin{cases}
[\, l_1 \mapsto \{\},\; l_i \mapsto \hat{s}(l_{i-1}) \mid i = 2 \dots h-1,\; l_h \mapsto \hat{s}(l_{h-1}) \cup \hat{s}(l_h),\; l_i \mapsto \hat{s}(l_i) \mid i = h+1 \dots A \,] & \text{if } \exists\, h : m \in \hat{s}(l_h) \\
[\, l_1 \mapsto \{\},\; l_i \mapsto \hat{s}(l_{i-1}) \mid i = 2 \dots A \,] & \text{otherwise}
\end{cases}
\]
The update function $\hat{U}^{\cup}_{\hat{C}}$ for the may analysis applied to
a set $\{m_1, \ldots, m_x\}$ of possible memory locations and an abstract cache
state $\hat{c}$ inserts all elements of $\{m_1, \ldots, m_x\}$ into their
corresponding sets. The ages of the memory blocks that are already in $\hat{c}$
are not changed, because it is not known which set of the concrete cache is
touched:

\[
\hat{U}^{\cup}_{\hat{C}}(\hat{c}, \{m_1, \ldots, m_x\}) =
  \hat{c}\,[\, f_i \mapsto \hat{U}^{\cup}_{\hat{S}}(\hat{c}(f_i), X_{f_i})
  \ \text{for all } f_i \in \{set(m_1), \ldots, set(m_x)\} \,]
\]
where $X_{f_i} = \{ m_y \mid m_y \in \{m_1, \ldots, m_x\} \text{ and } set(m_y) = f_i \}$,

\[
\hat{U}^{\cup}_{\hat{S}}(\hat{s}, \{m_1, \ldots, m_x\}) =
  [\, l_1 \mapsto \hat{s}(l_1) \cup \{m_1, \ldots, m_x\},\;
      l_i \mapsto \hat{s}(l_i) \setminus \{m_1, \ldots, m_x\} \mid i = 2 \ldots A \,]
\]

Footnote: References to an array X can be treated conservatively by using a
reference to the set $\{m_1, \ldots, m_x\}$ of all memory blocks of X.
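The two updates for sets of possible references can be sketched in Python. This is a minimal sketch under stated assumptions, not the PAG specification used for the experiments: an abstract set state is modelled as a list of A Python sets (index 0 stands for the youngest line l_1), an abstract cache state as a dict from set index to such a list, and `set_of` is a hypothetical function that maps a memory block to its cache set. The must update is applied once per candidate block; if a candidate already sits in the youngest line the sketch leaves the state unchanged, which is an assumption about the boundary case h = 1.

```python
from functools import reduce

def must_update_one(s, m):
    """Age the must state s for a possible access to m; m is never inserted."""
    A = len(s)
    h = next((i for i, line in enumerate(s) if m in line), None)
    if h is None:
        # m is not in the state: every block ages by one, the oldest line is lost
        return [set()] + [set(line) for line in s[:A - 1]]
    if h == 0:
        # assumption: m is already youngest, so no other block can age
        return [set(line) for line in s]
    new = [set()]                                  # l_1 becomes empty
    new += [set(line) for line in s[:h - 1]]       # younger blocks age by one
    new.append(s[h - 1] | s[h])                    # l_h absorbs its predecessor
    new += [set(line) for line in s[h + 1:]]       # older blocks keep their age
    return new

def must_update_set(s, blocks):
    """Nested application of the single-block update, one step per candidate."""
    return reduce(must_update_one, sorted(blocks), s)

def may_update_set(s, blocks):
    """All candidates become youngest; existing blocks keep their ages."""
    blocks = set(blocks)
    return [s[0] | blocks] + [line - blocks for line in s[1:]]

def update_cache(c, blocks, set_of, update_set):
    """Apply update_set to every cache set some candidate could map to."""
    new = dict(c)
    for f in {set_of(m) for m in blocks}:
        new[f] = update_set(c[f], {m for m in blocks if set_of(m) == f})
    return new
```

For an array X, `blocks` would be the set of all memory blocks of X, as in the conservative treatment of array references.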
Writes
So far we have ignored writing to a cache and only considered reading from a
cache. There are two common cache organizations with respect to writing to
the cache:
- Write through: On a cache write the data is written to both the memory
  block and the corresponding set line.
- Write back: The data is written only to the set line. The modified set line
  is written to main memory only when it is replaced. This is usually
  implemented with a bit (called dirty bit) for each set line that indicates
  whether the set line has been modified.
The execution time of a store instruction often depends on whether the memory
block that is written is in the cache (write hit) or not (write miss). For the
prediction of hits and misses we can use our analysis. There are two common
cache organizations with respect to write misses:
- Write allocate: The block is loaded into the cache. This is generally used
  for write back caches.
- No write allocate: The block is not loaded into the cache. The write changes
  only the main memory. This is often used for write through caches.
Writes to write through/write allocate caches can be treated as reads for the
cache analysis. For no write allocate caches a write access to a block m is
treated as a read access if m is already in the (concrete or abstract) cache
state. Otherwise the write access is ignored.
Write back caches write a modified line to memory when the line is replaced.
The timing of a load or store instruction may depend on whether a modified or
an unmodified line is replaced. To keep track of modified set lines we extend
the cache states by a dirty bit, i.e., we use pairs (m, b) of memory blocks and
dirty bits instead of memory blocks in the set/cache states, where b = d
means modified and b = p means unmodified. The update functions distinguish
reads and writes. The dirty bit is set to d on writes and to p on reads. The
join function for the must analysis sets the dirty bit for a memory block to d
only if it is set to d in both arguments. The join function for the may analysis
sets the dirty bit for a memory block to d if it is set to d in at least one
argument.
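A minimal sketch of the two joins with dirty bits follows. It assumes the usual age handling of the must and may joins (intersection with maximal age, union with minimal age) and models an abstract set state as a dict from memory block to an (age, dirty bit) pair; this representation and the function names are assumptions made for the illustration.

```python
def join_must(s1, s2):
    """Keep blocks present in both states; dirty only if dirty in both."""
    out = {}
    for m in s1.keys() & s2.keys():
        (a1, b1), (a2, b2) = s1[m], s2[m]
        out[m] = (max(a1, a2), "d" if b1 == "d" and b2 == "d" else "p")
    return out

def join_may(s1, s2):
    """Keep blocks present in either state; dirty if dirty in at least one."""
    out = {}
    for m in s1.keys() | s2.keys():
        entries = [s[m] for s in (s1, s2) if m in s]
        age = min(a for a, _ in entries)
        dirty = "d" if any(b == "d" for _, b in entries) else "p"
        out[m] = (age, dirty)
    return out
```

The must join therefore only reports a block as modified when this holds on every incoming path, while the may join reports it as soon as one path modifies it.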
Let k be a control flow graph node, m be a memory reference at k,
$\hat{c}^{\cup}_{1}$ the abstract cache state for the may analysis immediately
before m is referenced and $\hat{c}^{\cup}_{2}$ the abstract cache state
immediately after m was referenced, $\hat{c}^{\cap}_{1}$ the abstract cache
state for the must analysis immediately before m is referenced and
$\hat{c}^{\cap}_{2}$ the abstract cache state immediately after m was
referenced.
If the memory reference to m cannot be classified as always hit, then all dirty
memory blocks that may have been replaced by the memory reference to m
are contained in

\[
Rep = \Bigl\{ m' \Bigm| (m', d) \in \bigcup_{i=1}^{n/A} \bigcup_{j=1}^{A}
        \hat{c}^{\cup}_{1}(f_i)(l_j) \Bigr\}
      \setminus
      \Bigl\{ m' \Bigm| (m', b) \in \bigcup_{i=1}^{n/A} \bigcup_{j=1}^{A}
        \hat{c}^{\cap}_{2}(f_i)(l_j) \Bigr\}
\]

- If the memory reference to m has been classified as always hit or
  $Rep = \emptyset$, then no dirty memory block has been replaced. This
  reference has definitely caused no write back.
- If $Rep \neq \emptyset$, then we have to consider a possible write back.
- If there is a (m', d) pair in $\hat{c}^{\cap}_{1}$ that is not in
  $\hat{c}^{\cup}_{2}$, then a dirty memory block has been replaced. This
  reference has definitely caused a write back.
The identified (possible) write backs can be used in another abstract
interpretation, similar to the cache analysis, for the prediction of the write
buffer behavior.
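The set of possibly replaced dirty blocks can be sketched as a set difference. The sketch assumes that the first operand is taken from the may state before the reference and the second from the must state after it, and it models an abstract cache state as a dict from cache set to a list of lines, each line being a set of (block, dirty bit) pairs; the representation and all function names are hypothetical.

```python
def blocks_in(cache, dirty_only=False):
    """Collect the memory blocks of an abstract cache state."""
    return {m
            for set_state in cache.values()
            for line in set_state
            for (m, bit) in line
            if bit == "d" or not dirty_only}

def possibly_replaced_dirty(may_before, must_after):
    """Blocks that may be dirty before and are not guaranteed cached afterwards."""
    return blocks_in(may_before, dirty_only=True) - blocks_in(must_after)

def may_cause_write_back(always_hit, may_before, must_after):
    """A write back is only possible on a non-'always hit' reference with Rep non-empty."""
    return (not always_hit) and bool(possibly_replaced_dirty(may_before, must_after))
```

A reference for which `may_cause_write_back` is false can be excluded from a later write buffer analysis.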
Practical Experiments
For reasons of simplicity we have restricted our practical experiments to the
analysis of instruction caches.

Footnote: Many cache designs use write buffers that hold a limited number of
blocks. Write buffers may delay a cache access when they are full or when data
is referenced that is still in the buffer. To analyze the behavior of the write
buffers, possible write backs have to be determined.

The cache analysis techniques are implemented in a PAG generated analyzer
that gets as input the control flow graph of a program and an instruction
cache description and produces a categorization cat of the instruction/context
pairs of the input program. A context represents the execution stack, i.e., the
function calls and loops along the corresponding path in the call graph. It is
represented as a sequence of first and recursive function calls
($call\ f_f$, $call\ f_r$) and first and other executions of loops
($loop\ l_f$, $loop\ l_o$) for the functions f and (conceptually transformed)
loops l of a program. INST is the set of all instructions inst in a program,
CONTEXT is the set of all execution contexts context of a program, and IC is
the set of all instruction/context pairs ic:

\[ CONTEXT = \{ call\ f_f,\ call\ f_r,\ loop\ l_f,\ loop\ l_o \}^{*} \]
\[ IC = INST \times CONTEXT \]
\[ cat : IC \to \{ah, am, nc\} \]

Additionally, we compute for every instruction/context pair ic with
cat(ic) = nc the set of competing instructions, i.e., the instructions that
are in the same fully associative set in the abstract cache state of the may
analysis. For instance, if the competing instructions reside in fewer than A
(= level of associativity) memory blocks, then all executions of the
instruction will result in at most one cache miss. Generally, an upper bound of
the number of cache misses of the instruction is given by one plus the maximal
number of possible sequences of length A of executions of competing
instructions that are stored in pairwise disjoint memory blocks. Determining
this bound is a nontrivial problem. We use simple heuristics to compute a safe
approximation to the upper bound.
Our experiments have been performed for the Sun SPARC architecture. The
Sun SPARC is a RISC architecture with pipelined instruction execution. It has
a uniform instruction size of four bytes. The front end to the analyzer reads a
Sun SPARC executable in a.out format. Our implementation is based on the
EEL library [13] of the Wisconsin Architectural Research Tool Set (WARTS).
EEL (Executable Editing Library) is a C++ library for building tools to
analyze and modify an executable (compiled) program. It hides system-specific
detail (like the executable file format) and makes it possible to edit linked
executables, not just object files.
The objective of our work is to improve the WCET estimation of programs on
computer systems with caches. The execution time of a program depends on
the program path, i.e., the sequence of instructions that are executed, and
their individual execution times. But the program path is usually dependent on
the program input and cannot generally be determined in advance. Therefore a
program path analysis is part of a WCET analysis. For example, with the help
of user annotations like maximal iteration counts of loops, an architecture
dependent worst case execution profile can be determined that gives a
conservative approximation to the worst case execution path.

Footnote: For callstring(K) the sequence has a maximal length of K.

[Figure: block diagram. Static cache analysis: Specification (cache.optla,
cache.set) and PAG generate the Analyzer; the CFG-Builder reads the a.out and
the cache parameters; the result is cat: INST x CONTEXT -> {ah, am, nc}.
Dynamic side: Tracer (qpt2), Profiler & Cache Simulator with sample, worst case
or best case input produce the trace, profile: INST x CONTEXT -> Num, and the
cache hit ratio. "Program Path Analysis" and Evaluation yield WCET, BCET, and
the dynamic cache behavior prediction.]
Fig. The structure of the analysis
The program path analysis can be very accurate. Yau-Tsun Steven Li and
Sharad Malik report that their estimated bounds are within two percent of
the (calculated) worst case bounds for their set of benchmark examples.
The worst case execution profile makes it possible to compute how often each
instruction/context pair is maximally encountered. Combined with the
categorizations of our cache analysis, the overall number of cache hits and
cache misses can be estimated (see Figure ).
In our experiments we have circumvented the program path analysis problem
and combine the categorizations cat with exact execution profiles instead
of worst case execution profiles (see Figure ). This allows us to assess the
effectiveness of our analysis without the influence of possibly pessimistic
path analyses. The profilers that produce the profiles are produced with the
help of qpt2 (Quick program Profiler and Tracer), which is part of the WARTS
distribution. A profiler for a program computes an execution profile profile,
i.e., the execution counts for the instruction/context pairs:

\[ profile : IC \to \mathbb{N}_0 \]

Table 
Test set of C programs with number of instructions
Name      Description                                              Inst.
matmult    x  matrix multiplication
ndes      data encryption
matsum     x  matrix summation
dhry      Dhrystone integer benchmark
fdct      JPEG forward discrete cosine transform
stats     two arrays: sum, mean, variance, standard deviation,
          and linear correlation
fft       fast Fourier transformation
djpeg     JPEG decompression ( x  color image)
lloops    Livermore loops in C
avl       inserts and deletes  elements in an AVL tree
Footnote: Worst case input data
Footnote: Random input data
For the experiments we use parts of the program suites of Frank Müller,
the djpeg and fdct program of Yau-Tsun Steven Li, and some additional
programs (see Table ). For some programs there exists a worst case input,
so that our execution profiles are worst case execution profiles. The programs
are compiled with the GNU C compiler version  under SunOS  with
-O and (if applicable) the FDLIBM (Freely Distributable LIBM) library of
SunPro version .
The programs fft, stats, and lloops use arithmetic library functions. These
functions are more or less structured into treatment of special cases,
normalization, computation, and final rounding. Not all parts are necessarily
executed when the function is called. This uncertain execution path typically
leads to relatively many occurrences of nc in our categorizations.
The executable of lloops consists of more than  loops in deeply nested loop
nests. This program structure leads to a very high number of distinguished
execution contexts with the VIVU approach.
The AVL tree as implemented in avl is a height balanced binary tree. Every
insert or delete operation may lead to a series of recursive calls for
rebalancing. The code of the insert and delete operations consists of many
cases for the different rebalancing operations, called rotations. Such a
program structure seems to be rather typical for the handling of many dynamic
data structures.

Table 
The numbers of occurrences of ah, am, and nc in the categorizations for a  KB
 way set associative instruction cache with  byte linesize
          callstring(0)    callstring(1)    VIVU
Name      ah  am  nc       ah  am  nc       ah  am  nc
matmult 		 	  	  	 
 
 

ndes  	 		   		 	
 	 
matsum  	 	 	  	 	  

dhry  
 	
   	
  	 	
fdct    	  
 	  

stats 		 	 	 	  	 		
 	 	
fft 	 	  	   		 	
 
djpeg 	    	  	
 	 
lloops   	 
 
 
  	 	
avl   	 			 	 

   	

Table  shows the distribution of ah am and nc in the categorizations for
the test programs for callstring
 callstring
 and VIVU for one selected
cache con	guration The sum of ah am and nc in the categorizations is the
number of distinguished instruction!context pairs It is a measure for the
complexity of the analysis In our current implementation the categorization
for a given cache con	guration can be computed within seconds on a SUN
SPARCstation  for most of our test programs but the computation for
lloops with VIVU requires about  minutes In our implementation there is
room for improvements though
To give a more expressive presentation of the results of our experiments than
bounds on cache hit ratios we assume an idealized hardware that executes
all instructions that result in an instruction cache hit in one cycle and all
instructions that result in an instruction cache miss in  cycles


The cache behavior of the test programs for different cache configurations is
computed by simulating the cache for the program trace. The cache simulation
is always started with the empty cache, and we assume uninterrupted
execution. For technical reasons, instructions in functions from dynamic link
libraries are not traced, and their effects on the cache are therefore ignored.
From the number of hits and misses in the trace we compute the execution
time ET of our idealized hardware.

Footnote: These are the same parameters as used in .
Footnote: In our case these are the calls to I/O routines and timers.

With our categorization an upper and a lower bound of the execution time
can be computed by combining the profiles with the results of our analysis. An
upper bound of the execution time is given if we count all instructions in the
profile as misses that cannot be determined from the categorization as cache
hits. A lower bound of the execution time is given if we count all instructions
in the profile as hits that cannot be determined from the categorization as
cache misses. The upper and lower bounds of the test programs for various
cache configurations are shown in Figures  and  in percent of the execution
time ET (the meaning of the x axis tic marks is given in Table ).
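The computation of the two bounds from a profile and a categorization can be sketched as follows. This is a minimal illustration, not the evaluation tool itself: `profile` maps instruction/context pairs to execution counts, `cat` maps them to "ah", "am", or "nc", a hit costs one cycle as in the idealized hardware, and the miss cost is passed in as the parameter `miss_cycles` (its value in the example call is an arbitrary assumption).

```python
def execution_time_bounds(profile, cat, miss_cycles, hit_cycles=1):
    """Return (lower, upper) bounds on the execution time in cycles."""
    lower = upper = 0
    for ic, count in profile.items():
        category = cat[ic]
        # upper bound: everything not classified as always hit counts as a miss
        worst = hit_cycles if category == "ah" else miss_cycles
        # lower bound: everything not classified as always miss counts as a hit
        best = miss_cycles if category == "am" else hit_cycles
        upper += count * worst
        lower += count * best
    return lower, upper

# Example with three instruction/context pairs and an assumed miss cost of 10 cycles.
profile = {("i1", "ctx"): 10, ("i2", "ctx"): 5, ("i3", "ctx"): 2}
cat = {("i1", "ctx"): "ah", ("i2", "ctx"): "am", ("i3", "ctx"): "nc"}
lower, upper = execution_time_bounds(profile, cat, miss_cycles=10)
```

Only the pairs categorized as nc contribute to the gap between the two bounds, so fewer nc entries give tighter predictions.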
Table 
The cache parameters (size / level of associativity) of the x axis tic marks
of Figures  and . The linesize is  bytes.
	B	 	B 	B B	 B
B 	B	 	B 		B 
	B
	B	 	B 	kB	 	kB 	kB
kB	 kB kB 	kB	 
kB
kB kB	 kB kB 
kB
Figures  and  can be interpreted as follows
 The VIVU approach generally leads to the most precise predictions
 Conditionally executed code eg as found in the arithmetic library functions
or in avl
 can lead to less precise predictions which result from many nc
in the categorizations
 There can be a wide variation of the quality of the prediction depending on
the cache con	guration
 For all test programs our method 
especially with VIVU gives much better
results than the naive methods that counts all memory references as misses
for a WCET estimation and as hits for a BCET estimation
 Related Work
The computation of WCETs for real-time programs is an ongoing research
activity. Park and Shaw [24] describe a method to derive WCETs from the
structure of programs. In [27], Puschner and Koza propose methods to guide
the computation of WCETs by user annotations like maximal loop counts.
This approach seems to be commonly used in WCET analysis tools. Neither
approach takes cache behavior into account.
The possibilities of using optimizing compilers to improve the cache
performance of programs have been studied extensively. But all the proposed
program transformations and code reorganizations do not necessarily help in
computing the worst case execution time of a program.

[Figure: one panel per program and analysis variant for matmult, ndes, matsum,
dhrystone, fdct, and stats, each with callstring(0), callstring(1), and VIVU;
x axis: cache configuration tic marks 1 to 25; y axis: percent of execution
time; curves UB and LB.]
Fig. Upper (UB) and lower bounds (LB) for the execution time for different
cache parameters in % of execution time for callstring(0), callstring(1), and
VIVU

[Figure: one panel per program and analysis variant for fft, djpeg, lloops, and
avl2, each with callstring(0), callstring(1), and VIVU; x axis: cache
configuration tic marks 1 to 25; y axis: percent of execution time; curves UB
and LB.]
Fig. Upper (UB) and lower bounds (LB) for the execution time for different
cache parameters in % of execution time for callstring(0), callstring(1), and
VIVU
An overview of Cache Issues in Real-Time Systems is given in [4]. We restrict
our examination here to the intrinsic cache behavior.
The work of Arnold, Müller, Whalley, and Harmon has been one of the starting
points of our work. It describes a data flow analysis for the prediction
of instruction cache behavior of programs for direct mapped caches. The
extension to set associative instruction caches has later been given in [21].
Two data flow analyses are used. The result of the first corresponds to the
result of our may analysis. The second is only required for set associative
caches for the categorization of instructions within loops. It corresponds to
the first analysis, whereby the loop back edges are deleted in the control flow
graph. In contrast to our method, which derives semantics based categorizations
of memory references only from the results of our analyses, an additional
complex bottom-up algorithm over the control flow graph is used to compute a
classification of the instructions for each loop level. The distinction of a
first or a further execution of a loop is not explicit but expressed by the
classifications first miss and first hit. For a set of small programs the same
or slightly worse upper bounds of the execution time than our results have been
reported. But the assessment is difficult as the environment for the
experiments is not the same, e.g., different compilers have been used to
compile the test programs.
In  YauTsun Steven Li Sharad Malik and Andrew Wolfe describe an
integrated method to determine the worst case execution path of a program
and to model architecture features like instruction caches and!or pipelines
The problem of 	nding an accurate worst case execution time bound is for
mulated as an integer linear program that must be solved which is a NPhard
problem This approach has been implemented in the cinderella tool
	
 Un
like the method described in  or our method that rely only on the control
ow graph to determine the cache behavior of a memory reference user pro
vided functionality constraints can be used to describe the control ow more
precisely For direct mapped instruction caches and programs whose execution
path is well de	ned and not very input dependent the predictions can be com
puted fast and are very accurate  Increasing levels of associativity where
the cache behavior of one memory reference depends on more other references
and less de	ned execution paths lead to prohibitively high analysis times
In  Lim et al describe a general framework for the computation of WCETs
of programs in the presence of pipelines and cache memories Two kinds
of pipeline and cache state information are associated with every program
construct for which timing equations can be formulated One describes the
pipeline and cache state when the program construct is 	nished The other
can be combined with the state information from the previous construct to
re	ne the WCET computation for that program construct Unlike our method
that is based on well explored theories and tools for abstract interpretation
the set of timing equations must be explicitly solved An approximation to
the solution for the set of timing equations has been proposed The usage of
an input and output state provides a way for a modularization for the timing
analysis Experimental results are reported for three small programs but they
cannot be easily compared with our experiments
The approach of Lim et al. has also been applied to data caches. In [11], Hur
et al. treat references to unknown addresses as two cache misses. The reported
results are worse than the ones without data cache analysis, where one assumes
one cache miss for every data reference. But the authors expect that the
results improve with better methods to resolve addresses of data references.
For loops that reference only data that fit entirely into the cache, Kim et
al. [12] have improved the approach based on the pigeonhole principle. Applied
to the cache analysis, the pigeonhole principle says: if we have n memory
references to m memory locations with n > m, and all referenced memory blocks
fit into the cache, then there must inevitably be some cache hits.
A method for data cache analysis by graph coloring has also been described.
Similar to the Chow-Hennessy register allocator, variables are allocated to
cache lines. The objective of the analysis is to show that throughout the live
range of a cache line no other memory access interferes with this particular
cache line. This approach has limited success even for small programs.

Footnote: For the sake of space the results of not all programs could be
reported here.
Footnote: See httpwwweeprincetoneduyaulicinderella
Conclusion and Future Work
We have described semantics based analysis methods by abstract interpretation
that make it possible to predict the intrinsic cache behavior of programs for
various types of one level caches. The theory of abstract interpretation
supports the correctness proofs for the analysis and provides efficient
implementation methods.
The analyzers are generated by the program analyzer generator PAG from very
concise specifications. It is possible to trade time for precision, but even
with the VIVU approach our implementation of the analyses is quite fast. No
special input of a skilled user is required to tune for acceptable results.
This makes it feasible to use our analyses as part of the compilation process
to support the automatic schedulability analysis by the compiler.
The applicability of our methods has been shown with the results of our
practical experiments. The newly developed VIVU approach makes it possible to
predict the cache behavior within tight bounds for many programs and cache
configurations.
We directly analyze executables, and no special compilers or linkers are
required. Our current implementation supports the SPARC architecture.
Other architectures can be supported by supplying additional front ends to our
analyzers. The analyses are extensible to accommodate further cache designs
like multilevel caches or wrap around line fill.
Future work includes the integration of our tool with a program path analysis.
We are working on an extension to predict the pipeline behavior of processors.
The pipeline analyzers will be generated from a description similar to the
specifications used for the generation of code schedulers. For the analysis of
array references there exist methods based on data dependency analysis, which
should be combined with our approach. Finally, we will explore methods that
make it possible to combine the separate analyses of modules, libraries, or
operating system calls and thereby support the modularization of the analysis.
Acknowledgement
We would like to thank Mark D. Hill, James R. Larus, Alvin R. Lebeck,
Madhusudhan Talluri, and David A. Wood for making available the Wisconsin
Architectural Research Tool Set (WARTS), Thomas Ramrath for the implementation
of the PAG front end for SPARC executables, and Yau-Tsun Steven Li and Frank
Müller for providing their benchmark programs, and the latter for fruitful
discussions.
References
[1] M. Alt and F. Martin. Generation of Efficient Interprocedural Analyzers
    with PAG. In SAS, Static Analysis Symposium, LNCS. Springer, Sept.
[2] M. Alt, F. Martin, and R. Wilhelm. Generating Dataflow Analyzers with PAG.
    Technical Report, Universität des Saarlandes.
[3] R. Arnold, F. Mueller, D. B. Whalley, and M. Harmon. Bounding Worst-Case
    Instruction Cache Performance. In IEEE Symposium on Real-Time Systems, Dec.
[4] S. Basumallick and K. Nilsen. Cache Issues in Real-Time Systems. In
    Proceedings of the ACM SIGPLAN Workshop on Language, Compiler and Tool
    Support for Real-Time Systems, June.
[5] P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model
    for Static Analysis of Programs by Construction or Approximation of
    Fixpoints. In Conference Record of the 4th ACM Symposium on Principles of
    Programming Languages, Jan.
[6] P. Cousot and R. Cousot. Static Determination of Dynamic Properties of
    Generalized Type Unions. In Proceedings of an ACM Conference on Language
    Design for Reliable Software, Mar.
[7] P. Cousot and R. Cousot. Static Determination of Dynamic Properties of
    Recursive Procedures. Formal Description of Programming Concepts.
[8] W. A. Halang and K. M. Sacha. Real-Time Systems. World Scientific.
[9] L. Harrison. Personal communication on Abstract Interpretation, Dagstuhl
    Seminar.
[10] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative
    Approach. Morgan Kaufmann.
[11] Y. Hur, Y. H. Bae, S.-S. Lim, B.-D. Rhee, S. L. Min, Y. C. Park, M. Lee,
    H. Shin, and C. S. Kim. Worst case timing analysis of RISC processors:
    R3000/R3010 case study. In IEEE Real-Time Systems Symposium, Dec.
[12] S. Kim, S. Min, and R. Ha. Efficient worst case timing analysis of data
    caching. In IEEE Real-Time Technology and Applications Symposium, June.
[13] J. R. Larus. EEL Guts: Using the EEL Executable Editing Library. Computer
    Sciences Department, University of Wisconsin-Madison.
[14] Y.-T. S. Li and S. Malik. Performance Analysis of Embedded Software Using
    Implicit Path Enumeration. In Proceedings of the 32nd ACM/IEEE Design
    Automation Conference, June.
[15] Y.-T. S. Li, S. Malik, and A. Wolfe. Efficient Microarchitecture Modeling
    and Path Analysis for Real-Time Software. In Proceedings of the IEEE
    Real-Time Systems Symposium, Dec.
[16] Y.-T. S. Li, S. Malik, and A. Wolfe. Cache Modeling for Real-Time
    Software: Beyond Direct Mapped Instruction Caches. In Proceedings of the
    IEEE Real-Time Systems Symposium, Dec.
[17] S.-S. Lim, Y. H. Bae, G. T. Jang, B.-D. Rhee, S. L. Min, C. Y. Park,
    H. Shin, K. Park, S.-M. Moon, and C. S. Kim. An Accurate Worst Case Timing
    Analysis for RISC Processors. IEEE Transactions on Software Engineering,
    July.
[18] S. McFarling. Program Optimization for Instruction Caches. In
    Architectural Support for Programming Languages and Operating Systems,
    Boston, Massachusetts, Apr. Association for Computing Machinery (ACM).
[19] A. Mendlson, S. S. Pinter, and R. Shtokhamer. Compile Time Instruction
    Cache Optimizations. Computer Architecture News, Mar.
[20] F. Mueller. Static Cache Simulation and its Applications. PhD thesis,
    Florida State University, July.
[21] F. Mueller. Generalizing Timing Predictions to Set-Associative Caches.
    Technical Report TR, Institut für Informatik, Humboldt-University, July.
[22] F. Mueller, D. B. Whalley, and M. Harmon. Predicting Instruction Cache
    Behavior. In Proceedings of the ACM SIGPLAN Workshop on Language, Compiler
    and Tool Support for Real-Time Systems, June.
[23] K. D. Nilsen and B. Rygg. Worst-Case Execution Time Analysis on Modern
    Processors. In Proceedings of the ACM SIGPLAN Workshop on Language,
    Compiler and Tool Support for Real-Time Systems, June.
[24] C. Y. Park and A. C. Shaw. Experiments with a Program Timing Tool Based
    on Source-Level Timing Schema. IEEE Computer, May.
[25] K. Pettis and R. C. Hansen. Profile Guided Code Positioning. In ACM
    SIGPLAN Conference on Programming Language Design and Implementation,
    White Plains, New York, June.
[26] A. K. Porterfield. Software Methods for Improvement of Cache Performance
    on Supercomputer Applications. PhD thesis, Rice University, May.
[27] P. Puschner and C. Koza. Calculating the Maximum Execution Time of
    Real-Time Programs. Real-Time Systems.
[28] J. Rawat. Static Analysis of Cache Performance for Real-Time Programming.
    Master's thesis, Iowa State University, May.
[29] M. Sharir and A. Pnueli. Two Approaches to Interprocedural Data Flow
    Analysis. In S. S. Muchnick and N. D. Jones, editors, Program Flow
    Analysis: Theory and Applications. Prentice-Hall.
[30] A. Smith. Cache Memories. ACM Computing Surveys, Sept.
[31] J. A. Stankovic. Real-Time and Embedded Systems. ACM 50th Anniversary
    Report on Real-Time Computing Research.
[32] A. D. Stoyenko, V. C. Hamacher, and R. C. Holt. Analyzing Hard-Real-Time
    Programs For Guaranteed Schedulability. IEEE Transactions on Software
    Engineering, Aug.
[33] R. Wilhelm and D. Maurer. Compiler Design. International Computer Science
    Series. Addison-Wesley. Second Printing.
[34] M. E. Wolf and M. S. Lam. A Data Locality Optimizing Algorithm. SIGPLAN
    Notices, June. Proceedings of the ACM SIGPLAN Conference on Programming
    Language Design and Implementation.