Abstract. We consider the verification of a simple pipelined microprocessor in Maude, by implementing an equational theoretical model of systems. Maude is an equationally-based language, with an efficient term rewriting implementation, and effective meta-level tools. Microprocessors and other systems are modelled as iterated maps operating in time over some state-set, and are related by means of data and temporal abstraction maps. Correctness is reduced to state exploration by the choice of an appropriate initialisation function, ensuring/enforcing consistency of the initial state.
Introduction
This paper considers the verification of a simple pipelined microprocessor in Maude [3], an equational, algebraic language with strong meta-language tools and an efficient term rewriting implementation. Hardware systems, and models of hardware correctness, are represented within a well-developed set of mathematical tools, developed by application to case studies, and based on an equational, algebraic model. In a related paper [13], we consider the process of verification in Maude in more detail.
Microprocessors, and related systems, are modelled as iterated maps $F : T \times A \to A$, defined by
$$F(0, a) = h(a), \qquad F(t + 1, a) = f(F(t, a)),$$
where T is a clock, A is the state-set, f is a next-state function defining state evolution, and h is an (optional) initialisation function, ensuring/enforcing consistency of initial state a. Initialisation function h is an important component of the verification process, and careful construction is essential (Sect. 5.4 and [9, 7]) to reduce formal verification to state exploration. In this paper, we do not consider input and output; however, they are easily accommodated [15, 7]. Maude was chosen as the appropriate tool to implement our theoretical model because (a) it has the same mathematical basis; (b) it is fast (approximately 700K rewrites per second on a 700MHz Pentium III, when applied to hardware examples); (c) its meta-level tools allow proof strategies to be constructed quickly and flexibly; and (d) it is easy to learn. However, other tools could also be used (initial experiments were undertaken with PVS [26]).
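The iterated-map scheme can be illustrated with a minimal Python sketch (the paper's own models are written in Maude). It assumes the clock T is the natural numbers; the helper `iterated_map` and the 3-bit counter with a sticky overflow flag (`step` playing the role of f, `init` the role of h) are hypothetical examples, not taken from SPM or ACP.

```python
from typing import Callable, TypeVar

A = TypeVar("A")


def iterated_map(f: Callable[[A], A], h: Callable[[A], A]) -> Callable[[int, A], A]:
    """Return F : T x A -> A with F(0, a) = h(a) and F(t + 1, a) = f(F(t, a)).

    The clock T is modelled as the natural numbers; the recursion on t is
    unfolded into a loop so that large t does not exhaust the call stack.
    """
    def F(t: int, a: A) -> A:
        if t < 0:
            raise ValueError("clock cycles are natural numbers")
        state = h(a)            # F(0, a) = h(a)
        for _ in range(t):      # F(t + 1, a) = f(F(t, a))
            state = f(state)
        return state
    return F


# Hypothetical example: a 3-bit counter with a sticky overflow flag.
def step(s):
    value, overflow = s
    return ((value + 1) % 8, overflow or value == 7)


def init(s):
    value, _ = s
    return (value, False)       # h clears the flag, restricting the initial states


F = iterated_map(step, init)
print(F(0, (5, True)))  # (5, False)
print(F(3, (5, True)))  # (0, True)
```

Here h does real work: whatever flag value the caller supplies, every run starts from a state with the flag cleared.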
This paper forms part of a series on theoretical models of microprocessors. In [14, 15] mathematical models of microprogrammed examples are considered. In [9, 11], correctness models and the formal verification process are examined. In [8, 10] models of superscalar processors are examined by means of a substantial example. An extended account of some of this work can be found in [7]. To date, our principal interest has been theoretical models of systems, their correctness, and their verification: principally microprocessors, but also programming languages and their compilers [28], including the Java Virtual Machine [29]. Work has progressed on a set of mathematical tools for modelling behaviour and correctness in a modular and software tool-independent way. However, in this paper and in [13] we consider the implementation of these mathematical tools within Maude.
The structure of this paper is as follows. In Sect. 1.1 we consider related work. In Sect. 2 we introduce the required theoretical fundamentals. In Sect. 3 we introduce the Maude system. In Sect. 4 we introduce the architecture (specification) of a simple microprocessor SPM, in Maude. In Sect. 5 we introduce a pipelined implementation ACP. In Sect. 6 we consider the correctness and verification of ACP with respect to SPM. Finally, in Sect. 7 we summarise our techniques and their applicability.
Related Work
The main distinction between this and related work is that our principal interest is building theoretical/mathematical models of systems, rather than addressing [industrial] examples within software tools, where an important theme has been efficient verification strategies.
Interesting work on pipelined microprocessor verification includes [25] on AAMP5, a non-trivial, industrial example, and its verification in PVS [26] (recent accounts are [6, 27]: see also [17]); [31] on UINTA, a processor of moderate complexity, and its verification in HOL [12]; and [2] on a part of DLX [16]. A refinement of the approach in [2], more applicable to out-of-order systems and long pipelines, is [19, 20]. In addition, work has been undertaken on the complex timing models of superscalar processors [30, 1, 5]: [18] additionally considers exception processing in such an environment. The work in [21, 4] uses Hawk, a variant of the functional language Haskell.
Generally, the intuitive models seen are conceptually similar to our own [14, 15, 8], though significant differences exist in the approach to time. Commonly, in pipelined systems, state elements in the specification are viewed as distributed in time in the implementation. We regard specification states as [some function of] state elements in the implementation at a single point in time (see [9, 7]). In addition, time is explicitly present in our model. Although explicit time is removed from the verification process for microprocessors (Sect. 2.4), recall that our principal interest has been in theoretical models of systems, of which microprocessors are only one example.
Theoretical Preliminaries
Computer systems are modelled in a universal algebra framework: see (among many) [32, 24]. We define functions equationally, primarily using definition by cases and primitive recursion. Time is modelled using a clock algebra and computer systems are modelled with [many-sorted] state algebras.
A many-sorted algebra consists of carrier sets and functions ranging over the carrier sets: $(A_1, A_2, \ldots \mid f_1, f_2, \ldots)$.
Clocks and Iterated Maps
Systems operate in time, starting at time zero in an initial state h(a), where h is some initialisation function. Future system states are determined by a next-state function f, enumerated by a clock algebra T = (T | 0, +1). Initialisation function h (which may be the identity function) limits the number of initial states, and (if carefully chosen) acts as an invariant during verification.
Data and Timing Abstraction Mappings
Data abstraction maps are surjective functions ψ : B → A between two state-spaces. Data abstraction maps are commonly projections between two composite state-spaces, for example, a map
Two clocks are related using a temporal abstraction map, or retiming:
Definition 2. A retiming λ is a surjective and monotonic map between two clocks such that λ(0) = 0. The set of all retimings from clock S to clock T is denoted by Ret(S, T). The immersion $\bar{\lambda}$ of a retiming $\lambda \in Ret(S, T)$ is defined by
$$\bar{\lambda}(t) = \text{least } s \in S \text{ such that } \lambda(s) = t.$$
The set of all immersions of retimings in Ret(S, T) is denoted by Imm(S, T).
Monotonicity ensures there is never a discrepancy, after abstraction, in the temporal ordering of events because, for all $s, s' \in S$,
$$s \le s' \implies \lambda(s) \le \lambda(s'),$$
where λ is a retiming.
Given two clocks S and T related by retiming λ ∈ Ret(S, T), and a clock cycle s ∈ S, we commonly wish to identify the clock cycle s' ∈ S such that s' is the first cycle of S where λ(s') = λ(s).
Definition 3. The function start : Ret(S, T) → [S → S] is defined by
$$start(\lambda) = \bar{\lambda} \circ \lambda.$$
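Retimings, immersions and start can be made concrete on a finite prefix of S. The following is a minimal Python sketch, assuming a (hypothetical) retiming described by how many S-cycles each T-cycle lasts; the function names `retiming_from_durations`, `immersion` and `start` are illustrative only.

```python
from bisect import bisect_right
from itertools import accumulate


def retiming_from_durations(durations):
    """Build a retiming lam : S -> T in which T-cycle t lasts durations[t]
    S-cycles. Only the finite prefix of S covered by the durations is modelled."""
    if any(d < 1 for d in durations):
        raise ValueError("each T-cycle must last at least one S-cycle")
    starts = [0] + list(accumulate(durations))  # starts[t]: first S-cycle of T-cycle t

    def lam(s):
        if not 0 <= s < starts[-1]:
            raise ValueError("s lies outside the modelled prefix of S")
        return bisect_right(starts, s) - 1      # monotonic, surjective, lam(0) = 0
    return lam, starts[-1]


def immersion(lam, horizon):
    """lam_bar(t) = least s in S such that lam(s) = t (searched below horizon)."""
    def lam_bar(t):
        for s in range(horizon):
            if lam(s) == t:
                return s
        raise ValueError("t is not reached within the horizon")
    return lam_bar


def start(lam, horizon):
    """start(lam) = lam_bar o lam: the first S-cycle of the T-cycle containing s."""
    lam_bar = immersion(lam, horizon)
    return lambda s: lam_bar(lam(s))


lam, horizon = retiming_from_durations([2, 1, 3])
print([lam(s) for s in range(horizon)])                  # [0, 0, 1, 2, 2, 2]
print([start(lam, horizon)(s) for s in range(horizon)])  # [0, 0, 2, 3, 3, 3]
```

The fixed points of start(λ) (here s = 0, 2, 3) are exactly the S-cycles at which a new T-cycle begins.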
Definition 4. A state-dependent retiming λ : A → Ret(S, T ) is a map from states to retimings. The set of all state-dependent retimings from state-space A to retimings in Ret(S, T ) is denoted by Ret(A, S, T ).
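For each state of an implementation there is an associated state-dependent uniform retiming, defined in terms of a duration function over the state-space of F.
Definition 5. Let $F : S \times A \to A$ be an iterated map and $dur : A \to \mathbb{N}^+$ a duration function. The uniform retiming $\lambda \in Ret(A, S, T)$ with respect to F and dur is defined by
$$\bar{\lambda}(a)(0) = 0, \qquad \bar{\lambda}(a)(t + 1) = \bar{\lambda}(a)(t) + dur(F(\bar{\lambda}(a)(t), a)),$$
where $\bar{\lambda}(a) \in Imm(S, T)$ is the immersion of $\lambda(a)$. The singleton set containing the uniform retiming with respect to F and dur is denoted by $URet^{dur}_F(A, S, T)$.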
Suppose that F represents the implementation of some system over a clock S, and that T is the (slower) clock of the corresponding specification. Then specification clock cycle t + 1 ∈ T lasts dur(x) cycles of clock S, where $x = F(\bar{\lambda}(a)(t), a)$ is the state of F on clock cycle $\bar{\lambda}(a)(t) \in S$, that is, on the cycle of implementation clock S corresponding with the start of the previous specification clock cycle t ∈ T. Note that dur is a function only of state, and consequently the number of cycles corresponding with any state is independent of the numerical value of t ∈ T.
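The uniform retiming can be computed directly from F and dur, using the recurrence $\bar{\lambda}(a)(0) = 0$, $\bar{\lambda}(a)(t+1) = \bar{\lambda}(a)(t) + dur(F(\bar{\lambda}(a)(t), a))$. The Python sketch below assumes a hypothetical implementation whose state is (value, phase): an even value advances in one cycle, while an odd value first stalls for one cycle, so dur depends only on the state. None of these names come from ACP.

```python
def make_impl(h):
    """Hypothetical implementation F : S x A -> A with states (value, phase):
    an even value advances in one cycle, an odd value stalls for one cycle."""
    def F(s, a):
        state = h(a)
        for _ in range(s):
            v, p = state
            state = (v + 1, 0) if p == 1 or v % 2 == 0 else (v, 1)
        return state
    return F


def dur(a):
    """Duration function: S-cycles per specification cycle, from the state alone."""
    return 1 if a[0] % 2 == 0 else 2


def uniform_immersion(F, dur):
    """lam_bar(a)(0) = 0, lam_bar(a)(t + 1) = lam_bar(a)(t) + dur(F(lam_bar(a)(t), a))."""
    def lam_bar(a, t):
        s = 0
        for _ in range(t):
            s += dur(F(s, a))
        return s
    return lam_bar


def uniform_retiming(lam_bar):
    """lam(a)(s): the greatest t with lam_bar(a)(t) <= s."""
    def lam(a, s):
        t = 0
        while lam_bar(a, t + 1) <= s:
            t += 1
        return t
    return lam


F = make_impl(lambda a: (a[0], 0))
lam_bar = uniform_immersion(F, dur)
lam = uniform_retiming(lam_bar)
a = (1, 0)
print([lam_bar(a, t) for t in range(5)])  # [0, 2, 3, 5, 6]
print([lam(a, s) for s in range(7)])      # [0, 0, 1, 2, 2, 3, 4]
```

Different initial states yield different retimings (starting from (0, 0) the first cycle lasts one S-cycle rather than two), which is why λ is state-dependent.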
Implementation Correctness
Correctness is defined in terms of the relationship between two algebras, representing implementation and specification. The state sequences specified by the implementation are mapped onto those of the specification by a data abstraction map ψ and a temporal abstraction map λ. 
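Definition 6. Let $G : S \times B \to B$ (the implementation) and $F : T \times A \to A$ (the specification) be iterated maps, let $\psi : B \to A$ be a data abstraction map, and let $\lambda \in Ret(B, S, T)$. Then G is correct with respect to F, ψ and λ if, for all $b \in B$ and $s \in S$ such that $s = start(\lambda(b))(s)$,
$$F(\lambda(b)(s), \psi(b)) = \psi(G(s, b)),$$
or, alternatively, if the square formed by $G(s, \cdot)$, $F(\lambda(b)(s), \cdot)$ and ψ commutes for all such b ∈ B and s ∈ S.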
Correctness must hold at all clock cycles corresponding with specification states, expressed by s = start(λ(b))(s).
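Correctness can be tested (though not proved) by checking $F(t, \psi(b)) = \psi(G(\bar{\lambda}(b)(t), b))$ for every t up to a bound; the cycles $s = \bar{\lambda}(b)(t)$ are exactly those satisfying $s = start(\lambda(b))(s)$. This Python sketch reuses the hypothetical stalling implementation with a counter specification; `first_counterexample` and both initialisation functions are illustrative assumptions.

```python
def spec(t, a):
    """Specification F : T x A -> A: a counter, with h = identity."""
    return a + t


def psi(b):
    """Data abstraction map: forget the (hypothetical) stall phase."""
    return b[0]


def dur(b):
    return 1 if b[0] % 2 == 0 else 2


def make_impl(h):
    """Implementation G over S with states (value, phase) and initialisation h."""
    def G(s, b):
        state = h(b)
        for _ in range(s):
            v, p = state
            state = (v + 1, 0) if p == 1 or v % 2 == 0 else (v, 1)
        return state
    return G


def first_counterexample(G, states, horizon):
    """Search b in states and t < horizon for a violation of
    F(t, psi(b)) = psi(G(lam_bar(b)(t), b)); return None if there is none."""
    for b in states:
        s = 0                          # s = lam_bar(b)(t), built incrementally
        for t in range(horizon):
            if spec(t, psi(b)) != psi(G(s, b)):
                return b, t
            s += dur(G(s, b))          # uniform retiming step
    return None


states = [(v, p) for v in range(6) for p in (0, 1)]
good = make_impl(lambda b: (b[0], 0))  # h resets the phase
bad = make_impl(lambda b: b)           # h is the identity
print(first_counterexample(good, states, 10))  # None
print(first_counterexample(bad, states, 10))   # ((1, 1), 1)
```

The failing case starts mid-stall on an odd value, an initial state that the identity h fails to exclude.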
Time-Consistency and the One-Step Theorems
Iterated map state functions are time-consistent if all states that may arise at the times $\bar{\lambda}(a)(t) \in S$ corresponding with specification clock cycles are legal initial states.
Definition 7. An iterated map $F : S \times A \to A$ is time-consistent with respect to a state-dependent retiming $\lambda \in Ret(A, S, T)$ if, and only if, for all $a \in A$ and $t_1, t_2 \in T$,
$$F(s_1 + s_2, a) = F(s_1, F(s_2, a)),$$
where $s_2 = \bar{\lambda}(a)(t_2)$ and $s_1 = \bar{\lambda}(F(s_2, a))(t_1)$.
Initialisation function h must be carefully chosen to ensure time-consistency. In practice, h may be complex and difficult to construct. In Sect. 5.4 we describe a systematic method that is sometimes applicable.
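The following two results are called one-step theorems. Theorem 1 states that if λ ∈ Ret(B, S, T) is a uniform retiming then time-consistency with respect to λ is sufficiently verified by examining the implementation at times t = 0, 1. Theorem 2 states that retiming uniformity and implementation time-consistency are sufficient conditions to enable correctness to be verified by examining times t = 0, 1.
Theorem 1. Let $F : S \times A \to A$ be an iterated map with initialisation function h, and let $\lambda \in URet^{dur}_F(A, S, T)$. Then F is time-consistent with respect to λ if, for all $a \in A$,
$$F(0, a) = h(F(0, a)) \quad\text{and}\quad F(\bar{\lambda}(a)(1), a) = h(F(\bar{\lambda}(a)(1), a)).$$
Proof. See [7, 9].
Theorem 2. Let $G : S \times B \to B$ be time-consistent with respect to a uniform retiming $\lambda \in URet^{dur}_G(B, S, T)$, let $F : T \times A \to A$ be an iterated map, and let $\psi : B \to A$ be a data abstraction map. Then G is correct with respect to F, ψ and λ if, for all $b \in B$,
$$F(0, \psi(b)) = \psi(G(\bar{\lambda}(b)(0), b)) \quad\text{and}\quad F(1, \psi(b)) = \psi(G(\bar{\lambda}(b)(1), b)).$$
Proof. See [7, 9].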
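The practical value of the one-step theorems is that, over a finite set of initial states, only times t = 0 and t = 1 need to be examined. The Python sketch below checks the four one-step conditions (time-consistency: $G(0, b) = h(G(0, b))$ and $G(\bar{\lambda}(b)(1), b) = h(G(\bar{\lambda}(b)(1), b))$; correctness: $F(t, \psi(b)) = \psi(G(\bar{\lambda}(b)(t), b))$ for t = 0, 1) for the same hypothetical stalling implementation under three candidate initialisation functions. The condition labels TC0, TC1, C0 and C1 are invented here for reporting.

```python
def spec(t, a):          # specification F over T: a counter, h = identity
    return a + t


def psi(b):              # data abstraction: forget the stall phase
    return b[0]


def dur(b):              # odd values take two implementation cycles
    return 1 if b[0] % 2 == 0 else 2


def make_impl(h):
    """Implementation G over S with initialisation h; states are (value, phase)."""
    def G(s, b):
        state = h(b)
        for _ in range(s):
            v, p = state
            state = (v + 1, 0) if p == 1 or v % 2 == 0 else (v, 1)
        return state
    return G


def one_step_failures(G, h, states):
    """Return (b, condition) for every one-step condition that fails."""
    failures = []
    for b in states:
        s1 = dur(G(0, b))                    # lam_bar(b)(1); lam_bar(b)(0) = 0
        checks = {
            "TC0": G(0, b) == h(G(0, b)),
            "TC1": G(s1, b) == h(G(s1, b)),
            "C0": spec(0, psi(b)) == psi(G(0, b)),
            "C1": spec(1, psi(b)) == psi(G(s1, b)),
        }
        failures += [(b, name) for name, ok in checks.items() if not ok]
    return failures


states = [(v, p) for v in range(6) for p in (0, 1)]
candidates = [("reset phase", lambda b: (b[0], 0)),
              ("identity", lambda b: b),
              ("round down to even", lambda b: (b[0] - b[0] % 2, 0))]
for label, h in candidates:
    print(label, one_step_failures(make_impl(h), h, states))
```

Resetting the phase passes every check. The identity is trivially time-consistent but fails C1 for the mid-stall odd states, and rounding down to even breaks TC1 because the state reached after one specification cycle is not a fixed point of h.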
Introduction to Maude
Maude is an equationally-based, algebraic language with a term rewriting implementation [3]. The following simple algebra, or module, representing a memory, illustrates the main features of interest to us.
```maude
fmod MEM is
  protecting MACHINE-WORD .
  sorts Mem .
```
SPM is a simple microprocessor architecture, with five instructions (add, load, store, branch and set), separate program and data memories mp and md, a general-purpose register set reg and a program counter pc. We use separate data and program memories to simplify the process of mapping to a pipelined implementation with data and instruction caches; however, this is not necessary. SPM was first used in the form described here in [10, 7]; a previous version appeared in [8]. In addition to the pipelined implementation ACP (Sect. 5), which has been verified manually [7] and using Maude (Sect. 6), a superscalar version ACS exists [10, 7]. A variant of ACS is currently being verified in Maude. The SPM architecture is parameterised by three constants $r, m, w \in \mathbb{N}^+$, which determine the number of general-purpose registers, the memory address space and the word size respectively. The basic format of SPM instructions is shown in Figure 1, from which it can be seen that $w \ge \max(3 + 3r, 3 + r + m)$. The state-spaces for op codes, register and memory addresses, machine words, registers and memory are as follows:
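The word-size bound follows from counting instruction fields: five instructions need a 3-bit op code, followed either by three r-bit register fields or by one r-bit register field and an m-bit memory address. The Python sketch below computes the bound and packs one register/address instruction; the field order (op code in the most significant bits) and the op-code numbering are assumptions for illustration, not taken from Figure 1.

```python
OPCODE_BITS = 3  # five instructions need ceil(log2 5) = 3 bits


def min_word_size(r: int, m: int) -> int:
    """Least w with w >= max(3 + 3r, 3 + r + m): op code plus three register
    fields, or op code plus one register field and one memory address."""
    if r < 1 or m < 1:
        raise ValueError("r and m must be positive")
    return max(OPCODE_BITS + 3 * r, OPCODE_BITS + r + m)


def encode_reg_addr(op: int, reg: int, addr: int, r: int, m: int, w: int) -> int:
    """Pack (op, reg, addr), op code in the most significant bits
    (hypothetical layout; unused low-order bits are zero)."""
    if w < OPCODE_BITS + r + m:
        raise ValueError("word too small for this format")
    if not (0 <= op < 2 ** OPCODE_BITS and 0 <= reg < 2 ** r and 0 <= addr < 2 ** m):
        raise ValueError("field value out of range")
    return ((op << (w - OPCODE_BITS))
            | (reg << (w - OPCODE_BITS - r))
            | (addr << (w - OPCODE_BITS - r - m)))


def decode_reg_addr(word: int, r: int, m: int, w: int):
    op = word >> (w - OPCODE_BITS)
    reg = (word >> (w - OPCODE_BITS - r)) & (2 ** r - 1)
    addr = (word >> (w - OPCODE_BITS - r - m)) & (2 ** m - 1)
    return op, reg, addr


w = min_word_size(3, 8)                          # max(12, 14) = 14
word = encode_reg_addr(1, 5, 200, 3, 8, w)
print(w, word, decode_reg_addr(word, 3, 8, w))   # 14 3528 (1, 5, 200)
```

With r = 3 and m = 8 the register/address format dominates; with r = 5 and m = 4 the three-register format would (18 bits against 12).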
We define SPM in Maude as follows. 
The Pipelined Implementation ACP
The implementation ACP of SPM has a four-stage pipeline (see Figure 2) with the following stages.
Fetch. A single instruction is fetched from the instruction cache, either using a fetch program counter fpc, or a branch target address generated by the Execute Unit. In normal operation, when the pipeline is full (i.e. a branch has not been taken in the past three cycles), fpc = pc + 3, where pc is the architectural program counter. In the event of a read-write conflict (Sect. 5.2), no instruction is fetched. The instruction is stored in an instruction register.
Decode. The contents of the instruction register in the Fetch Unit are decoded, according to the instruction format of Figure 1, and stored in the Decode Unit.
Execute. The instruction stored in the Decode Unit is executed, and the outcome encoded as a triple: the result; the destination register/memory address; and the unit, indicating whether the result is to be stored in memory, a register or the program counter, or whether the outcome is a failed conditional branch or a pipeline stall. Some elements of the execution triple are redundant in some circumstances.
Committal. Results from the Execute Unit are written to the program counter, registers and/or data cache.
Pipeline States
The pipeline of ACP has four distinct states, identified by a reset counter and the unit field of the execution triple (and the relative values of pc and fpc).
Boot. In this state, which can only occur at start-up, reset = 2, fpc = pc, unit is set to wait (i.e. no results to be committed), and the instruction register in the Fetch Unit and the Decode Unit contain junk.
After Branch. This state is indistinguishable from one cycle after boot. In this state reset = 1, fpc = pc + 1, unit is set to wait, the instruction register in the Fetch Unit contains mp[pc], and the Decode Unit contains junk.
Stall. This state is indistinguishable from two cycles after boot. In this state reset = 0, fpc = pc + 2, unit is set to wait, the instruction register in the Fetch Unit contains mp[pc + 1], and the Decode Unit contains the decoded form of mp[pc].
Pipeline full. In this state reset = 0, fpc = pc + 3, unit is set to something other than wait (indicating some result to be committed), the instruction register in the Fetch Unit contains mp[pc + 2], and the Decode Unit contains the decoded form of mp[pc + 1].
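The four pipeline states are distinguished purely by reset, the unit field and the offset fpc − pc, so they can be recognised by a small classifier. This Python sketch follows the values listed for each state; the unit value names other than wait (such as "reg") and the optional wrap-around modulo 2^m are assumptions.

```python
from enum import Enum
from typing import Optional

WAIT = "wait"


class PipelineState(Enum):
    BOOT = "boot"
    AFTER_BRANCH = "after branch"
    STALL = "stall"
    FULL = "pipeline full"


def classify(reset: int, unit: str, fpc: int, pc: int,
             m: Optional[int] = None) -> PipelineState:
    """Identify the pipeline state from reset, the unit field and fpc - pc.

    If m (address bits) is given, fpc - pc is taken modulo 2**m."""
    offset = fpc - pc if m is None else (fpc - pc) % (2 ** m)
    if unit == WAIT:
        waiting = {(2, 0): PipelineState.BOOT,
                   (1, 1): PipelineState.AFTER_BRANCH,
                   (0, 2): PipelineState.STALL}
        if (reset, offset) in waiting:
            return waiting[(reset, offset)]
    elif reset == 0 and offset == 3:
        return PipelineState.FULL
    raise ValueError(f"inconsistent pipeline state: reset={reset}, "
                     f"unit={unit}, fpc-pc={offset}")


print(classify(0, "reg", 13, 10))          # PipelineState.FULL
print(classify(0, WAIT, 1, 255, m=8))      # PipelineState.STALL
```

Any combination rejected by the classifier is one that the pipeline should never reach from a legal initial state.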
Pipeline Conflicts
There are a number of circumstances in which pairs of consecutive instructions can conflict. If the first instruction is a branch, then there is a procedural dependency between the instructions: if the branch is taken, the second instruction must be discarded by flushing the pipeline and switching to the after branch state. There may also be data dependencies (or RAW hazards) between instruction pairs, if the second instruction requires the result of the first. In this case, the pipeline is suspended for one cycle by switching to the stall state, to allow the result of the first instruction to be committed before the second executes. The various data dependencies are illustrated in Table 1.
Table 1. Possible data dependencies between instructions, with the first instruction of each pair (e.g. add a1 b1 c1) indexing the rows and the second (add a2 b2 c2, branch a2, load a2 b2, store a2 b2, set a2 b2) indexing the columns; note that data dependencies cannot occur when the first instruction is a branch.
providing mechanisms to permit internal forwarding of results before they are committed to registers/memory. However, we do not include such mechanisms here.
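Both kinds of conflict can be phrased as simple predicates: a RAW hazard exists when the second instruction reads a register or data-memory location written by the first, and a taken branch forces a flush. In the Python sketch below, the read/write sets for each instruction are assumptions about SPM's semantics (e.g. that store a b writes md[b] from reg[a]) and are not the entries of Table 1; modelling branch as writing only pc reproduces the observation that a branch never causes a data dependency.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Loc = Tuple[str, int]  # ("reg", register index) or ("md", data-memory address)


@dataclass(frozen=True)
class Instr:
    op: str
    writes: FrozenSet[Loc]  # registers / data memory written (pc excluded)
    reads: FrozenSet[Loc]   # registers / data memory read


# Assumed read/write sets for SPM instructions (hypothetical semantics).
def add(a, b, c):
    return Instr("add", frozenset({("reg", a)}), frozenset({("reg", b), ("reg", c)}))


def load(a, b):
    return Instr("load", frozenset({("reg", a)}), frozenset({("md", b)}))


def store(a, b):
    return Instr("store", frozenset({("md", b)}), frozenset({("reg", a)}))


def set_(a, b):
    return Instr("set", frozenset({("reg", a)}), frozenset())


def branch(a):
    return Instr("branch", frozenset(), frozenset())  # affects only pc


def raw_hazard(first: Instr, second: Instr) -> bool:
    """The second instruction reads a location the first writes: stall one cycle."""
    return bool(first.writes & second.reads)


def must_flush(first: Instr, taken: bool) -> bool:
    """Procedural dependency: a taken branch invalidates the following instruction."""
    return first.op == "branch" and taken


print(raw_hazard(add(1, 2, 3), add(4, 1, 5)))  # True: r1 written, then read
print(raw_hazard(branch(3), add(1, 2, 3)))     # False
```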
Formal Description of ACP
Space precludes including the full formal description of ACP in Maude (about 550 lines, including comments and whitespace). Instead, we give a partial definition in the notation used in the underlying mathematical model. This description of ACP first appeared in [7].
The state of ACP is the Cartesian product of the states of each of its components.
The iterated map $ACP : S \times State_{ACP} \to State_{ACP}$ is defined componentwise from the next-state functions of its units.
With the exception of icache, which does not change state, each unit of ACP has its own next-state function. Here, we only define the next-state function for the Execute unit.
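Because each unit's next-state function takes the whole ACP state as its argument (as with Execute below), the machine's next-state function is the synchronous product of the unit functions: every unit reads the current state and all update together. The following Python sketch shows this composition on a hypothetical two-unit fragment (fpc/instruction register and a decoder that merely upper-cases the instruction); the component names and `compose_next` are illustrative.

```python
from typing import Callable, Dict, Mapping

State = Dict[str, object]
UnitNext = Callable[[Mapping[str, object]], object]


def compose_next(units: Mapping[str, UnitNext]) -> Callable[[State], State]:
    """Combine per-unit next-state functions into one next-state function on
    the product state. Every unit reads the *current* state, so all units
    update simultaneously; components without an entry keep their value."""
    def next_state(state: State) -> State:
        return {name: units[name](state) if name in units else value
                for name, value in state.items()}
    return next_state


units = {
    "fpc": lambda s: s["fpc"] + 1,
    "ir": lambda s: s["icache"][s["fpc"]],
    "decoded": lambda s: None if s["ir"] is None else s["ir"].upper(),
}
step = compose_next(units)
s0 = {"icache": ("add", "load", "store"), "fpc": 0, "ir": None, "decoded": None}
s2 = step(step(s0))
print(s2)  # {'icache': ('add', 'load', 'store'), 'fpc': 2, 'ir': 'load', 'decoded': 'ADD'}
```

After two cycles the decoder holds the first instruction while the instruction register already holds the second, because each unit sees only the previous cycle's values; the icache, having no next-state function, is unchanged.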
The next-state function $Execute : State_{ACP} \to Exec$ generates an execution triple for each instruction, specifying result, destination and unit, together with the value of the reset counter, controlling the state of the pipeline.
In the event that the pipeline is not full (reset > 0), Execute decrements reset and prevents any results from being committed by setting unit = wait. Otherwise, if the pipeline is full (reset = 0), then Execute checks for conflicts and either executes an instruction (using exec), or stalls the pipeline. Subfunction
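The control structure of Execute described above can be sketched in Python. This is a minimal sketch in which the conflict test and the instruction semantics `exec_fn` are passed in as parameters (both hypothetical here), and taken-branch handling is not modelled.

```python
from typing import Callable, NamedTuple, Optional, Tuple

WAIT = "wait"


class Triple(NamedTuple):
    result: Optional[int]
    dest: Optional[int]
    unit: str


def execute(reset: int, decoded: object,
            conflict: Callable[[object], bool],
            exec_fn: Callable[[object], Triple]) -> Tuple[Triple, int]:
    """Return the execution triple and the new value of the reset counter."""
    if reset > 0:
        # Pipeline not yet full: count down and commit nothing.
        return Triple(None, None, WAIT), reset - 1
    if conflict(decoded):
        # RAW hazard: stall one cycle so the earlier result is committed first.
        return Triple(None, None, WAIT), 0
    # Pipeline full and no conflict: execute the decoded instruction.
    return exec_fn(decoded), 0


regs = {2: 10, 3: 32}
decoded_add = ("add", 1, 2, 3)  # hypothetical decoded form: r1 := r2 + r3


def run_add(d):
    return Triple(regs[d[2]] + regs[d[3]], d[1], "reg")


print(execute(2, decoded_add, lambda d: False, run_add))  # (Triple(result=None, dest=None, unit='wait'), 1)
print(execute(0, decoded_add, lambda d: True, run_add))   # (Triple(result=None, dest=None, unit='wait'), 0)
print(execute(0, decoded_add, lambda d: False, run_add))  # (Triple(result=42, dest=1, unit='reg'), 0)
```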
To address this, we can build proof strategies within the Maude meta-level, which enables Maude modules to be treated as data types and the rewriting process to be controlled. We can use these properties to construct a range of verification strategies, tailored to classes of example. In this case, we employed a simple, naive strategy, and used Maude to automatically construct and check the tree representing all the subcases we are interested in, by defining operations that dynamically extend Maude modules with equations asserting each case, and that direct the rewriting process. A more detailed description of this process can be found in [13]. The example was successfully re-verified using this method, requiring approximately 2.5M rewrites and taking about 4 seconds on the same 700MHz PIII.
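The shape of this naive strategy, building the tree of subcases one case variable at a time and checking the property at every leaf, can be mirrored outside Maude. The Python sketch below is only an analogue: it enumerates concrete values rather than extending modules with equations at the meta-level, and the 2-bit counter property (with and without the carry) is a hypothetical example.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def check_case_tree(variables: Sequence[Tuple[str, Sequence]],
                    holds: Callable[[Dict], bool]) -> Tuple[int, List[Dict]]:
    """Depth-first construction of the case tree: each level fixes one case
    variable, and each leaf is one fully determined subcase.
    Returns (number of leaves, failing subcases)."""
    leaves = 0
    failures: List[Dict] = []

    def explore(i: int, case: Dict) -> None:
        nonlocal leaves
        if i == len(variables):
            leaves += 1
            if not holds(case):
                failures.append(dict(case))
            return
        name, domain = variables[i]
        for value in domain:
            case[name] = value
            explore(i + 1, case)
        case.pop(name, None)

    explore(0, {})
    return leaves, failures


def psi(b1, b0):  # abstraction: bit pair -> number
    return 2 * b1 + b0


def commutes(carry: bool):
    """Property: psi(next(b)) == (psi(b) + 1) mod 4 for a bit-level counter."""
    def holds(c):
        b1, b0 = c["b1"], c["b0"]
        nb1 = b1 ^ b0 if carry else b1
        nb0 = 1 - b0
        return psi(nb1, nb0) == (psi(b1, b0) + 1) % 4
    return holds


cases = [("b1", (0, 1)), ("b0", (0, 1))]
print(check_case_tree(cases, commutes(carry=True)))   # (4, [])
print(check_case_tree(cases, commutes(carry=False)))  # (4, [{'b1': 0, 'b0': 1}, {'b1': 1, 'b0': 1}])
```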
Further Work and Considerations
The Maude strategy used for this example is experimental, and not particularly efficient. The same example was previously run successfully without running out of physical memory on a 64Mbyte machine, suggesting that there is scope to undertake significantly larger examples on existing hardware without addressing the efficiency of the proof strategy. However, a number of approaches are being followed to increase the size of examples that can be addressed.
- The existing proof strategy can be made more efficient, for example by reducing the number of times identical sub-terms are evaluated.
- Different proof strategies can be considered, for example to automatically generate (possibly in stages) the sets of case-defining constants used in the manually-directed proof.
- The manual and automatic approaches can be combined, by defining sets of constants to identify groups of sub-cases, and automatically verifying all sub-cases within a group.
- Techniques seen in the literature to reduce verification complexity can be adopted (for example, [5]). Note that because our principal aim is building coherent theoretical models of systems, and their correctness and verification, rather than performing verifications per se, it will be necessary for us to integrate such efficiency-increasing techniques into our theoretical model first.
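The first item, avoiding repeated evaluation of identical sub-terms, is essentially memoisation (Maude provides a `memo` operator attribute for caching reductions). The Python sketch below, using `functools.lru_cache` and a hypothetical next-state function, counts how often f is applied when checking times 0 to 9 from the same initial state, with and without caching.

```python
from functools import lru_cache

calls = {"plain": 0, "memo": 0}


def f(state, counter):
    """Hypothetical next-state function; counts its own applications."""
    calls[counter] += 1
    return (5 * state + 1) % 16


def F_plain(t, a):
    return a if t == 0 else f(F_plain(t - 1, a), "plain")


@lru_cache(maxsize=None)
def F_memo(t, a):
    return a if t == 0 else f(F_memo(t - 1, a), "memo")


for t in range(10):
    assert F_plain(t, 3) == F_memo(t, 3)
print(calls)  # {'plain': 45, 'memo': 9}
```

Without caching, each time t recomputes the whole prefix (0 + 1 + ... + 9 = 45 applications); with caching, each state F(t, 3) is computed once.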
One of the key strengths of Maude is the ease with which proof strategies can be constructed: together with the efficiency of the underlying term-rewriting engine, this means that effective, specialised verification tools can be built relatively quickly. A memory system example currently being addressed will require induction: an inductive proof strategy is currently being developed (based on a pre-existing example [23]). Other future work includes completing verification of ACS, the superscalar implementation of SPM, and addressing the high-level language and compiler examples of [28, 29]. In addition, theoretical work is underway on a model of operating system kernels, starting with a simple system involving multiple communicating processes. The ultimate intention is to build a unified theoretical model of systems from high-level languages to hardware, backed by an implementation in Maude.
