Introduction
The tight analysis of reaction times is a major criterion in the design of embedded systems. In particular, safety critical applications often require that certain tasks must be completed before a strict deadline. To this end, a variety of methods to estimate the worst case execution time (WCET) [23] have been proposed. Usually, one distinguishes between two approaches: the high-level and the low-level WCET analysis [9] .
High-level analysis is applied to an architecture independent description of the system to obtain approximate timing estimates in early design phases. For example, in hardware/software co-design WCET analysis helps the designer to decide which parts of the system should be implemented in hardware. At this stage, the primary goal is to quickly get a rough estimate of the worst case execution time without particular demands on accuracy and tightness. High-level analysis can be performed by determining path information [22] such as infeasible computations or bounds on the maximal number of loop iterations. Clearly, high-level WCET analysis is undecidable when infinite data types are used, and therefore only limited automation can be achieved. State of the art approaches use abstract interpretation [10] , symbolic execution [17, 15] , special restrictions on loops [11] , and computer algebra [1] .
In contrast, low-level analysis is performed in late design phases. Thus, it depends on the hardware/software partitioning as well as on the chosen architecture (microcontroller). At this stage, tight and safe estimates are mandatory to ensure that all requirements on the timing are met. A straightforward approach to low-level WCET analysis is to compute the execution time of each basic block and to add these up afterwards. However, simply adding up the execution times of the basic blocks yields pessimistic bounds. The major problem of a tight WCET analysis is that the maximal number of computation steps, e.g., in a loop, may heavily depend on the input data.
Another important issue concerning WCET analysis is the increasing complexity of embedded systems. There is a growing trend towards distributed and parallel systems which are notoriously hard to implement. Against the background of safety critical applications and their economic impact, it is therefore surprising that the design of embedded real-time systems still suffers from a lack of seamlessly integrated design tools. One reason for this is that embedded systems are usually heterogeneous, i.e., their development often requires the design of software at different levels of abstraction as well as specialized hardware designs. The increasing size and complexity of software has naturally raised the need for new description languages like message sequence charts, SystemC, statecharts, synchronous languages, and others. However, the introduction of new tools and methods has made the design flow even more heterogeneous, which complicates WCET analysis. For this reason, WCET analysis techniques on the level of machine code provide a unifying basis for software descriptions, since all the different tools finally generate assembler programs.
Recently, we presented a new approach to WCET analysis which is based on translating programs given in synchronous languages like Esterel [2] or Quartz [24, 25] to an equivalent finite state transition system [16] . The key idea is then to use symbolic methods for a complete state space exploration where all possible inputs are considered at once. During this state space exploration, the minimal and maximal numbers of transitions are counted.
In this paper, we follow a similar approach to [16] , but do not presuppose an implementation given in a synchronous language. Instead, we take into account the target architecture to open our tools towards other design flows and to obtain an accurate estimation of the worst case execution time. This is achieved by considering assembler programs. Using symbolic simulation of assembler programs, we are able to consider all possible computations and to determine tight WCET bounds. The estimates are given at a logical level, since we compute the number of executed assembler instructions. For the sake of simplicity, we do not yet take into account pipeline and cache effects, even though this is possible.
Our approach takes advantage of established techniques for efficiently manipulating large systems using implicit set representations. These were originally developed for representing finite state transition systems by means of binary decision diagrams (BDDs) [4] . BDDs provide a canonical representation of formulas in propositional logic and have been used for many applications, in particular for model checking [7] , equivalence checking [8] , and other forms of state space exploration. However, using propositional logic has the drawback that only finite data types can be considered. In addition, these have to be encoded by Boolean variables to be representable by BDDs. The large state space of the resulting data flow is one of the main sources of the state space explosion problem.
Due to these difficulties and restrictions, we use Presburger arithmetic instead of propositional logic to capture the semantics of assembler programs at a higher level of abstraction. Presburger arithmetic has also been used for many other applications, e.g. for symbolic model checking [6] and for reachability analysis of extended finite state machines (EFSMs) [14] . Presburger arithmetic can be translated to finite automata [3, 12, 28] which equips us with an efficient data structure for storing and manipulating possibly infinite sets during symbolic simulation. In order to achieve a compact representation of automata, we have chosen a semi-symbolic representation where sets of transitions between automaton states are encoded with BDDs. To summarize, our approach offers the following advantages:
- tight estimates by handling semantic dependencies
- no overestimation even for data dependent loops
- independence of the design flow
- architecture specific (processor, compiler, etc.)
- fully automated, i.e. no manual annotations required
In the next section, we define XPres, an extension of Presburger arithmetic. We need this logic to formally define the semantics of assembler programs. Then, we sketch the translation of XPres formulas to automata (Section 3) and consider some details of our implementation (Section 4) that are important to obtain industrial strength tools. In Section 5, we describe how XPres is used to translate assembler programs to transition systems, and show how WCET analysis is performed by symbolic simulation. Finally, we present experimental results (Section 6) and conclude with a summary and directions for future work (Section 7).
Extended Presburger Arithmetic XPres
As already mentioned, the basic formalism of our approach is an extension of Presburger arithmetic [21] . Presburger arithmetic is a decidable subset of the first-order theory of the natural numbers where multiplication and division are not allowed. In contrast to the original definition of Presburger arithmetic, we interpret the logic over the integers Z rather than over the natural numbers N. Furthermore, we introduce logical operations on numbers since these are common to all instruction set architectures and frequently used in assembler programs. The syntax of extended Presburger arithmetic is defined as follows: 
The set of extended Presburger formulas XPres V is the smallest set satisfying the following rules:
In order to explain the semantics of XPres, we need some notation on bitvector arithmetic. First of all, we have to define a particular encoding of integers by strings over {0, 1}.
In the following, we use the two's complement encoding with b i ∈ {0, 1}:
Recall that the most significant bit b n determines the sign: b n = 1 yields negative numbers, and b n = 0 yields positive numbers or zero. Moreover, a well-known matter of fact (that we use later on) is the sign extension lemma which means that the following equation is valid for any bitvector
For this reason, there are infinitely many bitstrings b such that enc(b) = c holds for any c ∈ Z. Hence, the encoding is not injective, but surjective, which suffices for our purposes. The value of a Boolean variable can be simply encoded by the leading bit, i.e., the sign. 
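The encoding and the sign extension property can be checked on small examples. The following Python sketch assumes the usual two's complement reading, in which the leading bit carries the weight −2^n and bit i carries the weight 2^i. The helper `sign_extend` is a hypothetical name introduced only for this illustration.

```python
def enc(bits):
    """Decode a two's complement bitstring, most significant bit first."""
    sign, rest = bits[0], bits[1:]
    # The sign bit carries negative weight (assumed standard encoding).
    value = -sign * 2 ** len(rest)
    for i, b in enumerate(reversed(rest)):
        value += b * 2 ** i
    return value


def sign_extend(bits, extra=1):
    """Repeat the sign bit `extra` times in front of the bitstring."""
    return [bits[0]] * extra + list(bits)


minus_three = [1, 1, 1, 0, 1]
print(enc(minus_three), enc(sign_extend(minus_three, 3)))
```

Both calls decode to the same integer, which is why one integer has infinitely many encodings.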
Definition 2 (Semantics of Ext. Presburger Arithmetic) Given a variable assignment
In the following, let ⟦ϕ⟧ :≡ {ς | ζ ς * (ϕ) = 1} denote the set of assignments that satisfy a formula ϕ ∈ XPres V .
Translating XPres to Automata
It is well-known that Presburger arithmetic is decidable [21] . In particular, one can translate every Presburger formula to a finite automaton A ϕ that encodes the models ⟦ϕ⟧ of ϕ (see also [5, 3, 28] ). As there exists for every finite automaton an equivalent deterministic one, and moreover a canonical (minimal) one, we can use automata as canonical representations for ⟦ϕ⟧ . This is very much in the same spirit as binary decision diagrams (BDDs) are used as canonical normal forms for propositional logic.
The relationship between finite automata and sets of variable assignments ς : V → Z is established via our encoding of integers by bitvectors. For a finite set of variables V = {v 1 , . . . , v m }, we consider nonempty finite words over the alphabet B m and interpret them as variable assignments. More precisely, a word w over the alphabet B m with
Using this encoding scheme, the i-th row is a bitvector encoding of the value of the i-th variable, and the j-th column is read by an automaton in its j-th processing step. Hence, the number of variables is finite and fixed, whereas their bitwidth is also finite, but arbitrarily large.
For example, if V = {v 1 , v 2 , v 3 } is a set of integer variables, the assignment ς w with ς w (v 1 ) = −3, ς w (v 2 ) = 9, ς w (v 3 ) = 6 is represented by the following word w over the alphabet B 3 :
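A word of this kind can be computed mechanically. The following Python sketch assumes a bitwidth of 5, the smallest width that fits the value 9 in two's complement, and lists the letters most significant bit first. The function names are hypothetical, and any larger width yields another word for the same assignment.

```python
def to_bits(value, width):
    """Two's complement bits of `value`, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def encode_assignment(values, width):
    """Return the word as a list of letters (columns), one bit per variable."""
    rows = [to_bits(v, width) for v in values]  # row i encodes variable i
    return list(zip(*rows))                     # column j is the j-th letter


word = encode_assignment([-3, 9, 6], 5)
for row in zip(*word):
    print("".join(str(b) for b in row))
```

The printed rows are the bitvectors of the three variables. Reading the same data column by column gives the letters that an automaton consumes step by step.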
The relationship between sets of assignments and finite automata is as follows: 
As a subtle matter of fact, we have to distinguish between language equivalence and model equivalence: Given two automata A 1 and A 2 with Lang(A 1 ) = Lang(A 2 ), we also have Models V (A 1 ) = Models V (A 2 ), but not vice versa.
This is due to the sign extension, since it may be the case that an automaton accepts one particular bitvector b that represents an integer enc(b), but not all of the bitvectors that encode enc(b). For this reason, we define a closure operation on automata in the following way: 
The sign extension closure ensures that two different words which represent the same assignment are either both accepted or both rejected by an automaton. Obviously, the closure of an automaton can be computed by successively adding transitions to ∆. We then have the following lemma that is important for our construction procedure:
Lemma 1 (Sign Extension Closure)
The following properties hold for the sign extension closure:
As a corollary, checking Models V (A 1 ) = Models V (A 2 ) for automata A 1 and A 2 is decidable, since this problem can be reduced via the closure operation to language equivalence. A decision procedure for Presburger arithmetic is obtained by translating a given formula ϕ ∈ XPres V to an automaton A ϕ such that ⟦ϕ⟧ = Models V (A ϕ ). Hence, the formula ϕ is valid iff the automaton A ϕ accepts every word, and the formula ϕ is unsatisfiable iff Lang(A ϕ ) is empty.
Theorem 1 (Translating XPres V to Automata) Given a finite set of variables V, there is an algorithm that computes for any formula ϕ ∈ XPres
By Definition 1, a term can contain arithmetic and bitvector operators at the same time. This complicates the construction of automata for expressions which consist of both types of operators. In contrast, the translation from purely arithmetic or purely bitvector type expressions to automata is much easier. For this reason, we separate the arithmetic parts of an expression from its bitvector parts by means of the following rules, where the comparison symbol ranges over {=, ≠, <, ≤, >, ≥} and • ∈ {∧, ∨, ⊕, ↔, →}:
For example, the equation (x + y) ⊕ z = c is translated to the formula ∃u. ((u = x + y) ∧ (u ⊕ z = c)). If these rules are recursively applied to an expression, one finally obtains a formula whose basic equations and inequations are either of completely arithmetic or of completely bitvector type. In the following, we describe the construction of automata for both types of formulas. For the sake of simplicity, we restrict ourselves to equations. In principle, inequations can be handled similarly even though the details are more complicated.
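The effect of the separation rules can be sketched in a few lines of Python. The term representation (nested tuples), the operator names, and the function `separate` are assumptions made for this illustration only and do not reflect the actual implementation. Every subterm whose operator kind differs from that of its parent is replaced by a fresh, existentially quantified variable.

```python
import itertools

ARITH = {"+", "-"}  # assumed arithmetic operators; all others count as bitvector


def kind(op):
    return "arith" if op in ARITH else "bitvec"


def separate(term, fresh, constraints, parent_kind=None):
    """Return a term of a single operator kind, collecting u = subterm equations."""
    if isinstance(term, str):            # variable or constant
        return term
    op, left, right = term
    k = kind(op)
    new_term = (op,
                separate(left, fresh, constraints, k),
                separate(right, fresh, constraints, k))
    if parent_kind is not None and parent_kind != k:
        u = next(fresh)                  # fresh existentially quantified variable
        constraints.append((u, new_term))
        return u
    return new_term


fresh = (f"u{i}" for i in itertools.count())
constraints = []
top = separate(("xor", ("+", "x", "y"), "z"), fresh, constraints)
print(top, constraints)
```

For the term (x + y) ⊕ z, the sketch returns u0 ⊕ z together with the equation u0 = x + y, which corresponds to the formula ∃u. ((u = x + y) ∧ (u ⊕ z = c)) from the text.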
The construction of automata for pure bitvector equations is straightforward. Given an initial state q 0 , an accepting state q a , and a sink state q s , the automaton evaluates the equation at each step according to the current input letter. If the equation is satisfied for the given letter, the automaton moves to the accepting state q a . Otherwise, it moves to the sink state q s which cannot be left. In this way, it is guaranteed that all the letters of a word satisfy the equation. As an example, Figure 1 shows the automaton for the equation x ∧ y = z. The initial state is used to prevent the automaton from accepting the empty word.
Figure 1. Automaton representing bitwise AND
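The behavior of such an automaton can be simulated directly. The following Python sketch is an illustration under the assumption that the operands are given as two's complement bitvectors of equal width. The state names follow the text, while the function names are hypothetical.

```python
def to_bits(value, width):
    """Two's complement bits of `value`, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def accepts_and(x, y, z, width):
    """Run the three-state automaton for the bitwise equation x AND y = z."""
    state = "q0"                          # initial state, not accepting
    letters = zip(to_bits(x, width), to_bits(y, width), to_bits(z, width))
    for bx, by, bz in letters:
        if state == "qs":                 # the sink state cannot be left
            break
        state = "qa" if (bx & by) == bz else "qs"
    return state == "qa"


print(accepts_and(6, 3, 2, 4), accepts_and(6, 3, 3, 4))
```

Since every letter is checked independently, the automaton needs no memory beyond the distinction between "no violation so far" and "violation seen".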
The construction of automata for Boolean variables is also simple: It suffices to test the sign of the corresponding bitvector. The construction of automata for arithmetic equations is somewhat more complex. In the following, we adopt the method given in [28] . As a first step, the equation to be encoded is brought to the form 
Note that the transitions from the initial state read the most significant bits which have a negative value in two's complement notation. Figure 2 shows a sample automaton which accepts the solutions to the equation x + y = z.
States from which there is no path to an accepting state can be collapsed to a single sink state such that the resulting automaton becomes complete.
Figure 2. Automaton representing addition
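To illustrate how an automaton can decide an arithmetic equation while reading the most significant bits first, the following Python sketch simulates a run for x + y − z = 0. It assumes a Horner-style construction in the spirit of [28]: the state is the integer value of the left-hand side read so far, the sign bits enter with negative weight, and every further letter doubles the state and adds the weighted bits. This is a sketch under these assumptions and not the implementation itself, which additionally collapses states that cannot reach the accepting state into a sink.

```python
def to_bits(value, width):
    """Two's complement bits of `value`, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def accepts_sum(x, y, z, width):
    """Simulate an MSB-first automaton for x + y - z = 0 (accepting state 0)."""
    coeffs = (1, 1, -1)
    letters = list(zip(to_bits(x, width), to_bits(y, width), to_bits(z, width)))
    # The first letter holds the sign bits, which have negative weight.
    state = -sum(a * b for a, b in zip(coeffs, letters[0]))
    for letter in letters[1:]:
        state = 2 * state + sum(a * b for a, b in zip(coeffs, letter))
    return state == 0


print(accepts_sum(-3, 9, 6, 5), accepts_sum(-3, 9, 5, 5))
```

In this particular equation, only the states 0 and −1 can still lead to the accepting state 0. All other integer states move away from 0 with every further letter and can therefore be merged into the sink.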
Once we have constructed the basic automata, we can combine them using the following definitions: Given two automata
The Boolean operations follow the usual definitions of automata theory, as e.g. described in [13] . Existential quantification can be performed by erasing one or more positions in the bitvectors that constitute the alphabet. In contrast to the Boolean operations, the projection operation does not preserve model consistency. For this reason, the closure operation has to be applied after each projection step. Furthermore, we add a determinization step since the resulting automaton may be nondeterministic after the projection operation. Otherwise, we would not be able to compute the complement of an automaton as shown above. Complementation, in turn, is needed for universal quantification, since ∀v.ϕ :≡ ¬∃v.¬ϕ.
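Projection and the subsequent determinization can be sketched on explicitly represented automata. The following Python code is an illustration only: transitions are stored in a dictionary with hypothetical state names, the sink state is left implicit, and the sign extension closure that the text requires after each projection is deliberately omitted. Determinization uses the standard subset construction.

```python
def project(transitions, track):
    """Erase one track of every letter; the result may be nondeterministic."""
    nfa = {}
    for (state, letter), target in transitions.items():
        shorter = letter[:track] + letter[track + 1:]
        nfa.setdefault((state, shorter), set()).add(target)
    return nfa


def determinize(nfa, initial, accepting):
    """Subset construction that builds reachable subsets only."""
    letters = {letter for (_, letter) in nfa}
    start = frozenset([initial])
    dfa, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for letter in letters:
            target = frozenset(t for s in subset
                               for t in nfa.get((s, letter), ()))
            if target:                    # an empty target is the implicit sink
                dfa[(subset, letter)] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    final = {subset for subset in seen if subset & accepting}
    return dfa, start, final


# Automaton for x AND y = z over letters (x, y, z), sink state omitted.
and_dfa = {(q, (x, y, z)): "qa"
           for q in ("q0", "qa")
           for x in (0, 1) for y in (0, 1) for z in (0, 1)
           if (x & y) == z}
nfa = project(and_dfa, 1)                 # quantify y existentially
dfa, start, final = determinize(nfa, "q0", {"qa"})
print(sorted(letter for (_, letter) in dfa))
```

After erasing the y track, the only letter (x, z) without a transition is (0, 1), since no bit y satisfies 0 ∧ y = 1.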
On the one hand, it follows from the translation of formulas to equivalent automata that the set of assignments that satisfy an XPres formula is regular. On the other hand, there also exist sets of assignments which are not regular and thus cannot be represented by a finite automaton. However, this is not a limitation, since all computable functions can be encoded by a regular relation as will be shown later.
Implementation Issues
For our implementation, we have chosen a semi-symbolic representation where the states of an automaton are stored explicitly and the transitions between two states are encoded by BDDs. Thus, for an automaton A with state set Q, we have at most |Q|^2 BDDs (one for each transition which encodes a set of letters). It is important to note that the number of states is usually rather small, whereas the alphabet can be exponentially large. The basic data structure we use is an adjacency list whose size is O(|Q|^2). Since |Q|^2 ≪ 2^m for most automata, every transition is labeled with many letters. Hence, storing transitions between states explicitly does not cause significant memory consumption as long as sets of letters are represented symbolically. A similar encoding is also found in [12] . Figure 3 illustrates the representation of the automaton shown in Figure 2 where each f i is a BDD with
For each state q ∈ {q 0 , 0, −1} and for each transition, there is exactly one BDD representing a set of letters. The operations on the chosen data structure are implemented as follows: Given complete and deterministic automata A ϕ and A ψ with F ψ ) , the Boolean operations follow immediately from the above definitions with • ∈ {∧, ∨, ⊕, ↔, →}:
Since existential quantification of a variable v i is performed by erasing the i-th position of the alphabet, we abstract the corresponding variable w i in the associated BDD:
As mentioned previously, we have to perform a determinization step after each projection operation. To keep the size of the resulting automaton small, we construct only reachable states and omit transitions labeled with a zero BDD. In addition, the automata are minimized regularly.
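The idea of storing one label per pair of states can be illustrated with a product construction. In the following Python sketch, plain sets of letters stand in for the BDDs, so set intersection plays the role of BDD conjunction and an empty set plays the role of the zero BDD. The dictionary layout and all names are assumptions made for this illustration. The accepting states, which depend on the Boolean operation, are not modeled.

```python
def product(trans_a, trans_b):
    """Product of two automata whose transitions map (src, dst) to a letter set."""
    result = {}
    for (p, p_next), letters_a in trans_a.items():
        for (q, q_next), letters_b in trans_b.items():
            common = letters_a & letters_b    # conjunction of the two labels
            if common:                        # omit transitions with empty label
                result[((p, q), (p_next, q_next))] = common
    return result


a = {("p0", "p1"): {(0, 0), (1, 1)}, ("p1", "p1"): {(0, 0)}}
b = {("q0", "q1"): {(1, 1), (0, 1)}, ("q1", "q1"): {(1, 0)}}
print(product(a, b))
```

The number of entries is bounded by the number of state pairs and does not depend on the size of the alphabet, which is the reason for labeling transitions with symbolically represented letter sets.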
WCET Analysis
In the previous sections, we described the base logic XPres and its translation to finite automata. In this section, we show how assembler programs can be translated to equivalent transition systems. To this end, we introduce the notion of integer Kripke structures which serve as the basis for the WCET analysis algorithm.
Definition 5 (Integer Kripke Structure (IKS)) An integer Kripke structure (IKS) over a finite set of variables V is a transition system K = (S, I, R) where S is the possibly infinite set of states, I ⊆ S are the initial states, and R ⊆ S × S is the transition relation. Every state ξ ∈ S is a variable assignment ξ : V → Z. In addition, it is required that the set of initial states I and the transition relation R are definable in XPres, i.e. they must be regular sets.
According to the above definition, a state of an IKS is a variable assignment for the variables V. Regarding the modeling of an assembler program, an assignment describes the current values of the processor's registers and the used memory locations. As the program proceeds with its execution, it changes some of the register contents and therefore, we have a new assignment at the next point of time (cf. Figure 4) . Hence, we can represent the transition relation of a program by a Presburger formula in XPres V∪V′ where V and V′ are the register contents of the current and the next state, [18] are expressible in Presburger arithmetic.
The translation of an assembler program to an IKS is straightforward, provided that we have a definition of the semantics of the processor's instruction set. Assume that for each register r we have an associated pair of variables r and r′. In particular, let pc and pc′ denote the program counter for the current and next instruction, respectively. Moreover, let P = I 0 , . . . , I n be the program to be modeled, i.e. a sequence of instructions such that I k is the instruction located at memory address k. Then, the transition relation for P can be obtained as follows where S(I) defines the semantics of an instruction I (cf. Table 1) :
The set of initial states I specifies the initial register contents and the address of the first instruction to be executed. To obtain tight bounds on the execution time, it is often required to specify preconditions for the program. Suppose for example that one of the inputs is given by an analog/digital converter which produces values in the range between 0 and 2^8 − 1. In contrast to other approaches to WCET analysis, we can define the initial state set using this information to avoid overestimation.
Preconditions are also useful in compositional methods where the system is divided into parts which are analyzed separately. The overall execution time can be obtained by adding up the results for the subsystems. As mentioned in the introduction, this yields an overly pessimistic estimation if dependencies between the subsystems are neglected. However, using symbolic simulation, we can determine the set of possible output values for one subsystem and use this set as input constraints for other subsystems.
In the following, we consider the MIPS32 architecture which defines a simple yet powerful instruction set [19] . Table 1 shows the semantics of some selected instructions. Note that memory locations are modeled by a set of variables M i . The following function ensures that registers that are not affected by an instruction remain unchanged:
In order to execute a program symbolically, we have to compute the successor states for a set of states Q, i.e., the image of the transition relation with respect to Q: 
As usual in symbolic methods, image computation can be performed by intersection, existential quantification (projection), and variable substitution [8] . Since all inputs are considered at once during image computation, we can traverse the state space in a breadth first manner where at each step all successor states are explored. Given an IKS K = (S, I, R) and a set of final states F , the worst case execution time of a program modeled by a transition relation R can be computed by the following algorithm:
function WCET :
  n := 0; Q := I;
  while Q ⊈ F do
    Q := Img R (Q);
    n := n + 1;
  end;
  return n;
end;
Usually, final states are those states where the program counter points to the last instruction (e.g. return from subroutine). As an example for the construction of an IKS from an assembler program, Figure 4 shows the core of a von Neumann adder, its translation to Presburger arithmetic, and a part of the corresponding Kripke structure.
It can happen that the above algorithm does not terminate, in particular for programs whose execution times depend on the input values. In practice, this is not a restriction since we can constrain the initial register contents to their minimal and maximal values, respectively. Additionally, we can stop the algorithm when the time budget given to the program is exceeded as proposed in [10] .
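The breadth-first traversal can be illustrated on explicitly enumerated states. The following Python sketch is not the symbolic algorithm: it replaces automata by finite sets of (pc, r) pairs and uses a hypothetical four-instruction countdown loop whose instruction semantics are hard-coded in `step`. It further assumes that final states stutter, and it adds a step budget to reflect the time budget mentioned above.

```python
def step(state):
    """Successor of a state (pc, r) for a hypothetical countdown loop."""
    pc, r = state
    if pc == 0:                       # branch to address 3 if r == 0
        return (3, r) if r == 0 else (1, r)
    if pc == 1:                       # r := r - 1
        return (2, r - 1)
    if pc == 2:                       # jump to address 0
        return (0, r)
    return state                      # pc == 3: final state stutters (assumption)


def image(states):
    """Successor states of a whole set of states, explored at once."""
    return {step(s) for s in states}


def wcet(initial, is_final, budget=10_000):
    """Count image steps until every state in the current set is final."""
    n, q = 0, set(initial)
    while not all(is_final(s) for s in q):
        if n >= budget:
            raise RuntimeError("time budget exceeded")
        q = image(q)
        n += 1
    return n


# Precondition 0 <= r <= 3 on the input register.
initial = {(0, r) for r in range(4)}
print(wcet(initial, lambda s: s[0] == 3))
```

All admissible inputs are advanced in lockstep, so the loop stops exactly when the slowest input has reached a final state. The returned count is therefore the maximal number of executed instructions under the given precondition.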
Experimental Results
To evaluate our approach, we implemented and integrated the algorithms in our Equinox toolbox [26] and applied them to some benchmarks. All experiments were performed For the representation and manipulation of BDDs we used the CUDD package [27] . The benchmarks were written in C and compiled with the GNU C-compiler to obtain assembly code for the MIPS R3000 processor family (code optimization was enabled using option -O2). Table 2 shows the results for different bitwidths which are given as preconditions of the form 0 ≤ x < 2^n and −2^(n/2) ≤ x < 2^(n/2), respectively. The worst case execution times (WCET) are given in number of executed instructions. The runtimes of our analysis algorithm are given in seconds. The first benchmark program (von Neumann adder) could be analyzed without any modifications of the assembly code in less than one minute. For the IntRoot benchmark, the runtimes are worse, but still acceptable.
The RussMult and Booth benchmarks were simplified to reduce the runtime of the WCET algorithm. The idea behind this simplification is that we are not interested in the results of the algorithms, but merely in their worst case execution times. In other words, we can omit some computations provided that the algorithm's execution time remains unaffected. For the two benchmarks, we simply replaced those instructions that compute the result but do not affect the termination condition of the loops by NOPs.
The benchmark ErGu97 was taken from [10] and consists of two interdependent loops. Since the number of iterations of the loops is bounded, the runtimes of our algorithm are nearly constant. The last two benchmarks could not be simplified in a straightforward manner and are hardly tractable with symbolic simulation since the automata become exponentially large. Note that for all the benchmarks except Collatz and IntRoot, an exhaustive analysis using explicit techniques would require simulating the algorithms for 2^48 input values in the case of 24 bit numbers.
In a second set of experiments, we determined the amount of memory which is needed by the benchmark programs. For that purpose, we measured the size of the automata which represent the transition relations of the benchmark programs (Table 3 ). The first two columns specify the number of automaton states and the number of transitions, respectively. The third column shows the total number of BDD nodes which are used to encode the transitions. Moreover, we determined the runtimes for constructing the automata (fourth column, given in seconds). As can be seen, the benchmark programs can be quickly translated to finite automata which provide a succinct representation of the transition relations. It also follows from Table 3 that the complexity of WCET analysis mainly depends on the size of the state sets and not on the size of the transition relation. For the tractable benchmarks, Table 4 shows the size of the largest automaton constructed during WCET analysis. The last column shows the total number of bytes which are used to represent this automaton. These results indicate that memory consumption is not a crucial concern for our approach unless the program under consideration causes an exponential blow-up, as is the case for Collatz and GCD. Our experiments have also shown that most of the time is spent on automaton minimization, and hence, efficient minimization algorithms as proposed in [12] are beneficial.
Summary and Conclusion
We presented an exact technique for analyzing the worst case execution time of assembler programs. It determines the maximal number of executed instructions for all possible computations by means of symbolic simulation. This is accomplished by translating the program to an integer Kripke structure which can be represented by formulas in Presburger arithmetic. Using Presburger arithmetic, we are able to model most instructions directly such as addition and bitvector operations which belong to the basic instructions of all processors and micro-controllers. The translation of Presburger arithmetic to finite automata gives us an efficient means for symbolically traversing the state space of the program.
Our method is particularly well-suited for algorithms in real-time systems whose execution times highly depend on the input values and are thus hard to estimate accurately using conventional methods. Tight estimates of worst case execution times are especially important for core routines which are frequently called and contribute a large part to the total execution time, e.g. interrupt service routines of real-time operating systems. Moreover, our approach neither imposes any restrictions on the structure of the programs nor requires the user to annotate the programs with information about the expected behavior.
However, it does not aim at analyzing large systems which consist of thousands of lines of code. We believe that a complete and accurate WCET analysis of large systems can only be achieved by combining different methods for each level of abstraction. In such a framework, our approach can be applied to critical modules whose execution times are data dependent and have significant impact on the whole system. Currently, our algorithms do not support multiplication as a single operation which is due to the fact that Presburger arithmetic would otherwise be undecidable. However, we plan to incorporate multiplication into our tools by reduction to successive additions. To this end, we have to extend our algorithms such that intermediate computations do not contribute to the execution time. As a more general approach, one could also label each transition with a natural number that represents the time units required to take the transition.
Furthermore, it is advantageous to simplify the transition relations in order to reduce the runtimes of our algorithms. For this reason, we are working on compiler techniques that replace instructions which do not affect the execution time with NOPs. This can be achieved by a kind of dead-code elimination in order to remove redundant computations [20] . In addition, we do not yet consider pipeline hazards, cache misses, branch prediction, and arithmetic overflows. These are part of our future work.
