A general algebraic method for modelling microprocessors at different levels of abstraction by means of iterated maps is introduced, supporting equational specification and design. We apply this iterated map method to the top level Programmer's Model specification, and to an abstract implementation, the Abstract Circuit Design. We formalise what it means for the implementation to correctly implement the specification. We illustrate the iterated map method with a case study.
1. Introduction.
In this paper we consider a general algebraic method for modelling microprocessors at different levels of abstraction, and for expressing formally the relationships between each level. These algebraic methods support the equational specification and design of microprocessors by a process of successive correctness preserving refinements. We consider models of microprocessors at different levels of abstraction, determined by time and by details of construction.
A clock is a means of dividing time into (not necessarily equal) segments; these segments are defined by steps or stages in a computation or process. We use this idea as follows. The method is to model a computer by means of the iteration of a map $f : A \to A$ in discrete time $T = \{0, 1, 2, \ldots\}$, defined by
$$F : T \times A \to A, \qquad F(t, a) = f^t(a),$$
for $t \in T$, $a \in A$. The set $A$ is made to model the state of the computer and the function $f$ to model the next-state map; thus $F(t, a)$ is the state of the computer at time $t \in T$ on operating from initial state $a \in A$. The computer is then modelled by the algebra
$$(A, T \mid F).$$
The nature of the set $A$ of states and clock $T$ is determined by the level of abstraction of the computer. A typical clock would be the system clock. However, we may also consider instruction clocks, where each cycle represents the execution of a single instruction, or indeed any other division of time. Multiple clocks may exist in a single specification, and clocks may be formally related.
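As an illustration only, the iterated map $F(t, a) = f^t(a)$ can be sketched in Python. The helper name `iterate` and the 4-bit counter used as the next-state map are assumptions made for the example; they are not part of the specification method itself.

```python
from typing import Callable, TypeVar

A = TypeVar("A")


def iterate(f: Callable[[A], A]) -> Callable[[int, A], A]:
    """Build F from f so that F(0, a) = a and F(t + 1, a) = f(F(t, a))."""
    def F(t: int, a: A) -> A:
        state = a
        for _ in range(t):
            state = f(state)
        return state
    return F


# Hypothetical next-state map: a 4-bit counter that wraps around at 16.
def next_state(a: int) -> int:
    return (a + 1) % 16


F = iterate(next_state)
print([F(t, 14) for t in range(4)])  # [14, 15, 0, 1]
```

Here the clock is simply the number of applications of `next_state`; a different choice of next-state map gives a different level of timing abstraction over the same state set.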
We show how to give an equational specification of this type of algebra using the fact that $F$ is defined by primitive recursive equations over the algebra $(A, T \mid f, 0, t + 1)$. We show that this iterated map method is a systematic technique that decomposes the modelling of the computer into easily understood stages so that each model can be equationally specified using initial algebra semantics.
We illustrate our algebraic tools by specifying a simple computer (based on the DEC PDP-8). We develop a specification at the programmer's level, and consider the implementation of this specification at the microcode level.
For work on formal methods in hardware, particularly those concerned with the formal specification and design of general-purpose computers, see Gordon [1983], Joyce [1987], and Stavridou [1993], which discuss the same simple computer, known as "Gordon's computer", Cohn [1987], Cullyer [1987a] and Cullyer [1987b], which discuss the Viper microprocessor, , Graham and , and Birtwistle and which discuss an implementation of Landin's SECD machine (Landin [1963]), Geser [1989], which discusses the Intel 8085 processor, and Hunt [1986]. For work on hardware specification and verification in general, see, for example, Gordon [1987], Cohn and Gordon [1990], Milne [1989], Milne [1990], Melham [1988], Subrahmanyam [1988], , and , and Weijland [1990].
This work is part of a project on the theory of verifiable synchronous concurrent algorithms, where a synchronous concurrent algorithm, or SCA, is an algebraic model of parallel deterministic computation. The work on formal specification, of which this is a part, is intended to support the work on SCAs by providing a basis for the formal verification of parallel systems modelled as SCAs. The work of the SCA group encompasses methodologies for specification and design; formal analysis; manipulation and verification of specifications and designs; and software tools. For further information see Tucker [1991] and Thompson and Tucker [1991]. For work on case studies, see Harman and Tucker [1988a], Harman and Tucker [1988b], Harman and Tucker [1989], Harman [1989], Harman and Tucker [1992]. In particular, Harman and Tucker [1988b] and chapter 7 of Harman [1989] are concerned with the specification of computers.
The structure of this paper is as follows. In §2 we discuss algebraic modelling and equational specification tools. In §3 we apply the iterated map algebras to model program execution and implementation in digital computers, and the relations between them. In §4 we outline the informal specification of a simple computer. In §5 we develop an algebraic specification of this computer. In §6 we propose an implementation of this computer. In §7 we consider the relationship between the specifications of the computer and of the controller, and show how the different levels of timing abstraction of each can be formally related.
The authors would like to thank J M Emmett for useful comments on a draft of this paper. 
$$F(t + 1, a) = f(F(t, a)).$$
A solution to this set $E_F$ of equations is $F(t, a) = f^t(a)$, which will be used to represent the behaviour of a computer in time, starting from an initial state $a$.
The equations of $E_F$ are primitive recursive equations over the algebra $(A, T \mid f, 0, t + 1)$; see Tucker and Zucker [1988], Tucker [1991].

$$F(t, a_1, \ldots, a_n) = (F_1(t, a_1, \ldots, a_n), \ldots, F_n(t, a_1, \ldots, a_n))$$
for $t \in T$, $a_i \in A_i$, and where $F_i : T \times A_1 \times \cdots \times A_n \to A_i$ is the $i$-th component function of $F$, $1 \le i \le n$. Furthermore, the function $f : A \to A$, whose iteration defines $F$, can be rewritten
$$f(a_1, \ldots, a_n) = (f_1(a_1, \ldots, a_n), \ldots, f_n(a_1, \ldots, a_n)).$$
The system of equations
$$F_1(t + 1, a_1, \ldots, a_n) = f_1(F_1(t, a_1, \ldots, a_n), \ldots, F_n(t, a_1, \ldots, a_n)),$$
$$\vdots$$
$$F_n(t + 1, a_1, \ldots, a_n) = f_n(F_1(t, a_1, \ldots, a_n), \ldots, F_n(t, a_1, \ldots, a_n))$$
is an equational (or conditional equational) specification of $(A_1, \ldots, A_n, T \mid f_1, \ldots, f_n, 0, t + 1, F_1, \ldots, F_n)$ and hence of $(A_1, \ldots, A_n, T \mid F_1, \ldots, F_n)$.

2.6. Retimings. Specifications may contain multiple clocks which are not equivalent in speed. Also, each cycle of a clock $T$ need not be the same length relative to another (e.g. standard) clock $R$: that is, clocks can be irregular. In order to relate multiple clocks we introduce retimings. A retiming $\lambda : T \to R$ is a surjective, monotonic map between two clocks $T$ and $R$. We denote the set of retimings from clock $T$ to clock $R$ by $Ret(T, R)$.
For each retiming $\lambda$ there is a corresponding immersion $\bar{\lambda} : R \to T$; see Harman [1989] or Harman and Tucker [1989]. Retimings may also be state-dependent. That is, given an initial state $g \in B$, we may define a retiming $\lambda : S \times B \to T$. Further formal tools can be developed from retimings: see Harman [1989], Harman and Tucker [1989], Harman and Tucker [1988a], and Harman and Tucker [1992].
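To make the idea of a retiming concrete, the following Python sketch relates a fast clock to a slower, irregular one. It assumes, as an illustration rather than as the definition used in the cited work, that the immersion of a retiming sends each slow-clock time to the least fast-clock time mapped onto it. The cycle lengths in `durations` and the function names are hypothetical, and the sketch only covers a finite prefix of each clock.

```python
from itertools import accumulate
from typing import Callable, List


def make_retiming(durations: List[int]) -> Callable[[int], int]:
    """lam(t) = number of slow-clock cycles completed by fast-clock time t."""
    ends = list(accumulate(durations))  # fast-clock time at which each slow cycle ends

    def lam(t: int) -> int:
        return sum(1 for e in ends if e <= t)

    return lam


def immersion(lam: Callable[[int], int], r: int, horizon: int) -> int:
    """Least fast-clock time t < horizon with lam(t) == r (assumed definition)."""
    for t in range(horizon):
        if lam(t) == r:
            return t
    raise ValueError("no fast-clock time maps to r within the horizon")


durations = [3, 2, 4]  # hypothetical: three slow cycles of unequal length
lam = make_retiming(durations)

print([lam(t) for t in range(10)])                 # [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
print([immersion(lam, r, 10) for r in range(4)])   # [0, 3, 5, 9]
```

The first list is monotonic and hits every slow-clock time, which is what the definition of a retiming requires; the second list shows where each slow cycle begins on the fast clock.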
3. Computers as State Transformers. We will give computers state transformer or programmer's model specifications PM, that progressively update the state of the machine. By state, we mean those aspects of the internal structure of the computer visible to the programmer: that is, registers and memory that can be explicitly modified, sometimes called the architecture of the computer (Stallings [1987]). There is no requirement that these registers and/or memory actually exist in the form seen in PM (though this is usually the case); however, the machine must behave as if they were present. In an actual design, there will be additional registers, hidden from the programmer, that are necessary for the implementation of the machine. These registers are not properly part of the specification. Additionally, we include memory in a state transformer specification even when specifying, say, a microprocessor, where memory is not an integral part of the device.
A hierarchy of specification models may be seen emerging: first the programmer's model PM, followed by one or more levels modelling the implementation in successively greater detail.
3.1. Models and Abstraction in the Design of a Computer. The role of the programmer's model specification PM is as part of a specification of some implementation $I$, for which we may require a correctness proof. We can now ask: what is $I$, and how is it constructed?
In the case of a complex device the design would proceed in stages, with each successive stage a refinement of its predecessor, terminating in a set of chip masks, or other representation of the final hardware. We also admit the possibility of false trails in the design process: ideas that later are seen as inappropriate, and exploration of alternate designs. The design process will not be a simple trail of increasing detail: see Harman [1989] , Harman and Tucker [1992] . In the basic case however, we can postulate an initial abstract circuit design AC, followed by one or more stages of detailed circuit design, and culminating in the final product: the chip mask-set, circuit board design, etc, whose characteristics are uniquely determined by a physical system.
In the case of a microprocessor, what does AC consist of? We can partition a typical microprocessor into two main components: control and datapath. Additionally, since the initial programmer's model specification contained memory M, we must include M in AC. However, we are not concerned with the design of M, and we can employ the same specification as used in PM. The abstract circuit design phase consists of specifying the control section CT and the datapath DP, and composing these together with M to form AC. Then we must show that AC correctly implements PM. We may continue the design process by subdividing CT, DP and M and specifying their components until we reach practical limits imposed by the physical behaviour of devices.
3.2. Computer as an Iterated Map. The behaviour of a computer is described by an infinite sequence of states over time. The state comprises the contents of the machine's registers and memory. Additionally, there may be streams for input and output, but we will not consider these here. Let $C$ be the set of states. Starting from some initial state $\sigma \in C$, in which a program is held in memory and the program counter points to the start of this program, the machine state evolves in time as controlled by a next-state function $comp : C \to C$ and a clock $T$. At every cycle $t$ of the clock, $comp$ is applied to the current state $\sigma$ to generate a new state $comp(\sigma)$. Hence the evolution of the state of the machine from time $0 \in T$ to time $t \in T$ is represented by
$$\sigma,\ comp(\sigma),\ \ldots,\ comp^t(\sigma).$$
The machine is therefore represented by $COMP : T \times C \to C$, defined by
$$COMP(0, \sigma) = \sigma, \qquad COMP(t + 1, \sigma) = comp(COMP(t, \sigma)).$$
However, it is often convenient to represent $COMP$ in the form
$$COMP(t, \sigma) = comp^t(\sigma).$$
We know from §2 that the computer is represented by a simultaneous primitive recursive function $COMP$ over the algebra $(C, T \mid comp, 0, t + 1)$, which we call the next-state algebra. The algebra
$$(C, T \mid COMP)$$
we call the state algebra.
The speed of clock $T$ determines the level of timing abstraction of $COMP$. Typically, $T$ will be the instruction clock, where each clock cycle represents the execution of a single instruction. Since instructions take differing amounts of "real" time to execute, the instruction clock will be irregular with respect to the system clock. The system clock is at a lower level of timing abstraction than the instruction clock. Each cycle of the system clock represents one cycle of the clock signal used to control the computer or microprocessor. In §7 we will show how to formally map from the instruction clock to the system clock.

$$CPU(t + 1, c, m) = cpu(CPU(t, c, m), MEM(t, c, m)), \qquad MEM(t + 1, c, m) = mem(CPU(t, c, m), MEM(t, c, m)).$$
The next-state algebra becomes $(Cpu, Mem, T \mid cpu, mem, 0, t + 1)$, and the state algebra $(Cpu, Mem, T \mid CPU, MEM)$. The carriers $C_1, \ldots, C_n$ and the functions $comp_1, \ldots, comp_n$ of the state algebra are constructed from a machine algebra at a lower level of data abstraction. The machine algebra consists of carriers (typically vectors of bits) and functions representing ALU operations. A typical machine algebra for a microprocessor would be constructed as follows.
Let the algebra $Bit$ consist of carrier $\{0, 1\}$, constants 0 and 1, and logical operations such as and, or, not, and so on (the precise choice depending on the architecture). Let $W_n = \{0, 1\}^n$. The machine algebra $M$ is constructed by adding to $Bit$ carriers consisting of vectors of $Bit$, and special-purpose operations on these vectors: typically
$$(Bit, W_{32}, W_{16}, W_8 \mid and, or, not, add, shift, sub, mul, \ldots)$$
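A minimal Python sketch of such a machine algebra is given below. It assumes, purely for illustration, that an element of $W_n$ is represented as a non-negative integer below $2^n$; the function names are hypothetical, and only a few of the operations listed above are shown.

```python
def mask(n: int) -> int:
    """All-ones bit pattern of width n."""
    return (1 << n) - 1


def w_and(n: int, x: int, y: int) -> int:
    """Bitwise and on W_n."""
    return x & y & mask(n)


def w_or(n: int, x: int, y: int) -> int:
    """Bitwise or on W_n."""
    return (x | y) & mask(n)


def w_not(n: int, x: int) -> int:
    """Bitwise complement on W_n."""
    return ~x & mask(n)


def w_add(n: int, x: int, y: int) -> int:
    """Sum of two elements of W_n, returned in W_(n+1) so the carry is kept."""
    return (x + y) & mask(n + 1)


print(w_and(12, 0b1100, 0b1010))     # 8
print(oct(w_not(12, 0)))             # 0o7777
print(oct(w_add(12, 0o7777, 1)))     # 0o10000
```

Returning the sum in a carrier one bit wider than its arguments mirrors the use of a separate carrier for the results of additions.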
$$PM(t, \sigma) = pmcomp^t(\sigma).$$
The abstract circuit design AC behaves in a similar way, except now the state $\sigma' \in C_{AC}$ is enlarged to include registers/memory required to implement PM, and the controlling system clock $S$ is faster than clock $T$. Again, computation is represented by a next-state function $accomp : C_{AC} \to C_{AC}$, and so
$$AC(s, \sigma') = accomp^s(\sigma').$$
Although an element $\sigma'$ of $C_{AC}$ will contain information not in $C_{PM}$, at the start of the execution of any instruction, that information will refer to the execution of the previous instruction. Any correct implementation of PM will assume that, at the start of the execution of any instruction, those parts of $C_{AC}$ not in $C_{PM}$ contain "junk". Therefore, at the start of any sequence of instructions we can assume that those parts of $\sigma' \in C_{AC}$ not in $C_{PM}$ contain arbitrary values.
Suppose there exists a function 
$$PM(t, \sigma) = \pi(AC(\bar{\lambda}(t, \sigma), \alpha_x(\sigma))).$$
Recall the definitions of PM and AC as the iterated maps
$$pmcomp : C_{PM} \to C_{PM} \quad \text{and} \quad accomp : C_{AC} \to C_{AC}$$
respectively. Then the condition for correctness becomes: for all $t \in T$, $\sigma \in C_{PM}$ and any padding values $x$,
$$pmcomp^t(\sigma) = \pi(accomp^{\bar{\lambda}(t, \sigma)}(\alpha_x(\sigma))).$$
Observe that $\bar{\lambda} : T \times C_{PM} \to S$ qualifies as the immersion of a state-dependent retiming $\lambda : S \times C_{PM} \to T$ (§2.6).
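The shape of this correctness condition can be checked exhaustively on a toy example. The Python sketch below is not the machine of the case study: the specification `pmcomp`, the two-phase implementation `accomp`, the projection `proj`, the padding function `pad` and the timing function `xtime` are all hypothetical, chosen so that odd states take two implementation cycles and even states take one.

```python
def iterate(f, t, a):
    """t-fold application of f to a."""
    for _ in range(t):
        a = f(a)
    return a


def pmcomp(acc):
    """Specification level: one step per instruction."""
    return acc // 2 if acc % 2 == 0 else (3 * acc + 1) % 256


def accomp(state):
    """Implementation level: state is (acc, tmp, step), tmp is a hidden register."""
    acc, tmp, step = state
    if step == 0:
        if acc % 2 == 0:
            return (acc // 2, tmp, 0)        # even case finishes in one cycle
        return (acc, (3 * acc) % 256, 1)     # odd case: first of two cycles
    return ((tmp + 1) % 256, tmp, 0)         # odd case: second cycle


def pad(x, acc):
    """Padding: extend a specification state with arbitrary hidden content x."""
    return (acc, x, 0)


def proj(state):
    """Projection: forget the hidden components."""
    return state[0]


def xtime(acc):
    """Implementation cycles needed by the next instruction in state acc."""
    return 1 if acc % 2 == 0 else 2


def immersion(t, acc):
    """Implementation time at which instruction number t starts."""
    return sum(xtime(iterate(pmcomp, i, acc)) for i in range(t))


def correct(t, acc, x):
    lhs = iterate(pmcomp, t, acc)
    rhs = proj(iterate(accomp, immersion(t, acc), pad(x, acc)))
    return lhs == rhs


print(all(correct(t, acc, x)
          for t in range(12) for acc in range(256) for x in (0, 7, 255)))  # True
```

The check passes for every padding value tried because the hidden register is always overwritten before it is read, which is the sense in which its initial content is junk.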
4. Informal Specification.
The machine we will specify is based on the DEC PDP-8. This machine has been used previously as a design example: for instance, see Florentin [1991]. In this section, we will sketch the informal specification of the machine. Full informal specifications can be found in the literature: for example, Florentin [1991]. We will omit a number of straightforward features of the PDP-8 that are tedious to specify.

4.1. Registers, Word Size and Instruction Format. The PDP-8 uses a 12-bit word, giving an address space of $2^{12}$ (= 4096) words. There is a single accumulator ACC, a single-bit link register L, used for overflow detection and shifting, and a 12-bit program counter PC. All instructions are one word long, with format: 3-bit opcode; indirection bit; page bit; and 7-bit page offset. The page offset only allows 128 words to be addressed, hence the indirection and page-structured memory are used to allow access to the entire memory. Memory is divided into 128-word pages, and the page offset can access either a word in the first page of memory or a word in the page of memory containing the current instruction, depending on the page bit. The indirection bit allows indirect memory addressing.
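The instruction format can be illustrated by a small decoder. The Python sketch below assumes that the fields are laid out in the order listed, from the most significant bit downwards, and that a word is represented as an integer below $2^{12}$; the function name `decode` is hypothetical.

```python
from typing import Tuple


def decode(word: int) -> Tuple[int, int, int, int]:
    """Split a 12-bit instruction word into (opcode, indirect, page, offset)."""
    if not 0 <= word < (1 << 12):
        raise ValueError("instruction words are 12 bits wide")
    opcode = (word >> 9) & 0b111      # top 3 bits
    indirect = (word >> 8) & 1        # indirection bit
    page = (word >> 7) & 1            # page bit
    offset = word & 0b1111111         # low 7 bits: 128 words per page
    return opcode, indirect, page, offset


print(decode(0b001100000101))  # (1, 1, 0, 5)
print(decode(0o7777))          # (7, 1, 1, 127)
```

The field widths add up to 3 + 1 + 1 + 7 = 12 bits, and the 7-bit offset is what limits direct addressing to 128 words.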
4.2. Instruction Set. We will specify the following instructions. Subroutine calls store the return address in the first word of the subroutine, and jump to the following word. To return, an indirect jump through the first word of the subroutine is necessary.
5. Formal Specification.
Recall from §3 our strategy for specifying a computer as an iterated map over a state set, and then decomposing this map into separate iterated maps for CPU and memory. We will construct a formal specification of the machine informally described in §4 in the manner of §3. We will not give a complete formal specification, but supply sufficient information for the reader in possession of an informal specification to complete the process.
Following §3 we may construct the state algebra
$$(C, T \mid COMP)$$
of the machine from the algebra $(Cpu, Mem, T \mid CPU, MEM)$ of the CPU and memory, with $C = Cpu \times Mem$, as follows:
$$COMP : T \times C \to C, \qquad COMP(t, c, m) = (CPU(t, c, m), MEM(t, c, m)).$$
In particular,
$$CPU(0, c, m) = c, \qquad CPU(t + 1, c, m) = cpu(CPU(t, c, m), MEM(t, c, m)),$$
and
$$MEM(0, c, m) = m, \qquad MEM(t + 1, c, m) = mem(CPU(t, c, m), MEM(t, c, m)).$$
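The simultaneous recursion for the CPU and memory components can be sketched as a single loop in which both next-state functions read the same old state. The Python example below uses a deliberately trivial, hypothetical machine (the CPU state is just a program counter, and memory is a tuple of integers) to show the pattern; it is not the machine specified in this paper.

```python
def run(cpu, mem, t, c, m):
    """Return (CPU(t, c, m), MEM(t, c, m)) by simultaneous iteration."""
    for _ in range(t):
        # The right-hand side is evaluated before assignment, so both
        # next-state functions see the state from the previous cycle.
        c, m = cpu(c, m), mem(c, m)
    return c, m


# Hypothetical toy machine: the CPU state is a counter over the memory cells,
# and each cycle increments the cell the counter points to.
def cpu(c, m):
    return (c + 1) % len(m)


def mem(c, m):
    return m[:c] + (m[c] + 1,) + m[c + 1:]


print(run(cpu, mem, 3, 0, (0, 0)))  # (1, (2, 1))
```

Updating `c` before computing the new memory would give a different, sequential semantics; the tuple assignment is what keeps the two component maps in step.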
To proceed, we must define the state sets $Cpu$ and $Mem$, together with the next-state functions $cpu$ and $mem$. There are two 12-bit registers ACC and PC, and a single-bit register L, so we define
$$Cpu = A \times A \times Bit,$$
where $A = W_{12}$. Memory consists of $2^{12}$ 12-bit words. Hence we define
$$Mem = [A \to A].$$
Thus $C = Cpu \times Mem = A \times A \times Bit \times [A \to A]$.
5.1. The Machine Algebra. Following the process outlined in §3, we define the machine algebra for our example. Our machine will require the following carriers. Let $A = W_{12}$ represent memory words and the accumulator. Let $La = W_{13}$ represent the result of additions. Additionally, we require $Bit$, the booleans $\mathbb{B}$ and the natural numbers $\mathbb{N}$, and the functions $\wedge : A^2 \to A$, $\vee : A^2 \to A$ and $add : A^2 \to La$.

In addition to ALU operations, we add the successor function for $\mathbb{N}$, together with some functions for manipulating bit vectors, converting bit vectors to $\mathbb{N}$, and performing memory substitution.
The field 
O<j<l<i.
The function $pn : A \to \mathbb{N}$ interprets a bit vector $(a_1, \ldots, a_{12}) \in A$ as a natural number, using the usual binary representation of positive integers. We will commonly, as an abuse of notation, omit applications of $pn$.
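A minimal Python sketch of such a conversion follows. It assumes that bit vectors are written most significant bit first; this ordering is an assumption of the example, as is the representation of a vector as a tuple of 0s and 1s.

```python
from typing import Sequence


def pn(bits: Sequence[int]) -> int:
    """Interpret a bit vector, most significant bit first, as a natural number."""
    n = 0
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bit vectors contain only 0 and 1")
        n = 2 * n + b
    return n


print(pn((0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1)))  # 5
print(pn((1,) * 12))                             # 4095
```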
The function $sub : [A \to A] \times A \times A \to [A \to A]$ is the memory substitution function, defined by

We will use $m[a/l]$ to stand for $sub(m, a, l)$. Summarising, the machine algebra is as follows:
$$(\ldots,\ bits,\ pn,\ sub,\ cases,\ =)$$
Observe that we can use $bits$ to construct projection functions.
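Memory substitution on a memory modelled as a function can be sketched as follows. The Python example assumes that the second argument of the substitution is the address and the third is the new content; the argument roles and the parameter names are assumptions of this sketch.

```python
from typing import Callable

Memory = Callable[[int], int]


def sub(m: Memory, addr: int, value: int) -> Memory:
    """Return a memory that agrees with m everywhere except at addr."""
    def updated(x: int) -> int:
        return value if x == addr else m(x)
    return updated


def empty(x: int) -> int:
    """Hypothetical initial memory: every word is zero."""
    return 0


m1 = sub(empty, 5, 42)
print(m1(5), m1(6), empty(5))  # 42 0 0
```

Because `sub` builds a new function rather than modifying the old one, the original memory is still available unchanged, which matches the purely functional reading of a state transformer.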
5.2. Field Extraction Functions. We define the following functions to extract the various fields of an instruction. The opcode field is extracted by $op : A \to \mathbb{N}$, defined by
$$op(a) = pn(bits_{1,3}(a)).$$
We use $pn$ to convert the three opcode bits into a natural number. This makes the specification easier to read.

The page offset field is extracted by $pgoff : A \to A$, defined by
$$pgoff(a) = bits_{6,12}(a).$$

$$cpu(a, pc, l, m) = \begin{cases} (a \wedge mval(pc, m),\ pc + 1,\ l), & \ldots \\ (bits_{\ldots}(add(a, l, mval(pc, m))),\ \ldots & \end{cases}$$
We explain each case briefly below.
(ii) TAD: accumulator $a$ is replaced by the least-significant 12 bits of the sum of $a$ and $mval$, and $l$ is replaced by the most significant bit of this sum. The program counter $pc$ is incremented.
(iii) ISZ: the program counter is incremented by one or two (by $tskip$), depending on the new value of the memory word.
(iv) DCA: the accumulator is set to zero. The program counter is incremented.
(v) JSR: the value of $pc$ is replaced by $mval + 1$.
(vi) JMP: the value of $pc$ is replaced by $mval$.
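Two of these cases can be sketched directly. The Python functions below follow the prose descriptions of the TAD and DCA cases only; they assume that registers are integers, that the CPU state is ordered as (accumulator, program counter, link), and that the program counter wraps around modulo $2^{12}$. The function names are hypothetical.

```python
MASK12 = 0o7777  # twelve one-bits


def tad(a: int, pc: int, l: int, mval: int):
    """TAD case: add the memory value to the accumulator, keeping the carry in l."""
    total = a + mval                       # fits in 13 bits
    new_a = total & MASK12                 # least-significant 12 bits
    new_l = (total >> 12) & 1              # most significant bit of the sum
    return new_a, (pc + 1) & MASK12, new_l


def dca(a: int, pc: int, l: int):
    """DCA case (CPU part only): clear the accumulator and advance pc."""
    return 0, (pc + 1) & MASK12, l


print(tad(0o7777, 0, 0, 1))   # (0, 1, 1)
print(tad(5, 0o7777, 1, 3))   # (8, 0, 0)
print(dca(0o1234, 7, 1))      # (0, 8, 1)
```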
Next we define the memory next-state function $mem : A \times A \times Bit \times [A \to A] \to [A \to A]$
as follows:
We explain each case briefly below. 
(iv) $(pc \wedge ADDRMSK)),$
where $ADDRMSK = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)$. In cases (i) and (iii) we are accessing memory words in page zero. In cases (ii) and (iv) we are accessing memory words in the current page, and so must extract the most significant five bits of $pc$ and use them to extend the page offset. In cases (iii) and (iv), we are using indirect addressing, and so must make a memory access.
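The four addressing cases can be sketched as follows. The Python example assumes that a set page bit selects the current page and a set indirection bit selects indirect addressing, that the instruction fields are laid out from the most significant bit downwards, and that memory is a function from addresses to words; the function names and the sample memory contents are hypothetical.

```python
ADDRMSK = 0b111110000000  # most significant five bits of a 12-bit address


def effective_address(pc, m):
    """Address referred to by the instruction stored at m(pc)."""
    word = m(pc)
    indirect = (word >> 8) & 1
    page = (word >> 7) & 1
    offset = word & 0b1111111
    if page == 0:
        direct = offset                     # cases (i), (iii): page zero
    else:
        direct = (pc & ADDRMSK) | offset    # cases (ii), (iv): current page
    return m(direct) if indirect else direct


def memory_value(pc, m):
    """Content of the word at the effective address."""
    return m(effective_address(pc, m))


# Hypothetical memory contents for the example.
table = {
    0o200: 0b001010000101,  # direct, current page, offset 5
    0o201: 0b001100000101,  # indirect, page zero, offset 5
    0o005: 0o300,           # pointer used by the indirect access
    0o205: 0o1234,
    0o300: 0o4321,
}


def m(addr):
    return table.get(addr, 0)


print(oct(effective_address(0o200, m)), oct(memory_value(0o200, m)))  # 0o205 0o1234
print(oct(effective_address(0o201, m)), oct(memory_value(0o201, m)))  # 0o300 0o4321
```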
We define $mval : A \times Mem \to A$ as follows.
$$mval(pc, m) = m(maddr(pc, m)).$$

$$COMP(t, c, m) = (CPU(t, c, m), MEM(t, c, m)), \qquad (1)$$
and
$$comp(c, m) = (cpu(c, m), mem(c, m)). \qquad (2)$$
By using (1) and (2), expanding our definitions of $CPU$ and $MEM$, and substituting appropriately, we can construct
$$COMP(t, c, m) = comp^t(c, m),$$
where $comp$ is defined in (2).
5.6. Algebraic Structure of a Microprocessor. It is easy to check that function comp is polynomial over the machine algebra, and that therefore COMP is primitive recursive over the machine algebra. Since the machine algebra is easily algebraically specified, the state algebra of the computer is algebraically specified by lemma 2.4.1. In fact, the specification can be chosen to have useful term rewriting properties (e.g. orthogonality and completeness).
6. A Microprogrammed Implementation.
We now consider how $COMP$ specified in §5 may be implemented. We will proceed in the manner outlined in §3.1. First, we will decide on, and informally specify, an appropriate datapath DP and controller CT. Then we will show how DP and CT may be formalised as iterated maps.

6.1. The Datapath. We will use the two-bus datapath illustrated in Fig. 1. As well as the user-visible registers ACC, L and PC, there are four further registers, two buses A and B, the ALU and an incrementer for PC. The memory address register MAR holds memory addresses, and is linked to the address bus. The memory buffer register MBR holds data to be read/written from/to memory, and is linked to the data bus. The instruction register IR holds the current instruction, and is linked separately to the data bus. The ALU buffer register is used as a store for ALU results. Additionally, the results of ALU test operations are available to the controller.

6.2. The Controller. We will use a microcoded controller CT to control DP. To avoid confusion, we will use microinstruction to refer to instructions of CT, and instruction to refer to instructions of $COMP$. Each microinstruction will have three fields: the control field to operate DP; the sequencing field to determine the next microinstruction; and the address field to contain a microinstruction address.
The sequencing field will allow the following sequences: next sequential microinstruction; unconditional or conditional jump; unconditional or conditional subroutine; and decode next instruction. The last is used when the sequence of microinstructions to execute the current instruction ends, and a new instruction starts. Conditional jumps and subroutine calls are based on the results of ALU test operations. Also, only one level of subroutine is allowed, as the return addresses are stored in a register. Fig. 2 illustrates the microprogrammed controller.
6.3. Formal Specification of DP and CT. We now consider the formal specification of DP and CT as iterated maps. We wish to construct an algebra $(Ct, Dp, Mem, S \mid CT, DP, SMEM)$, where $Ct$ is the state of the controller CT, $Dp$ is the state of the datapath DP, $Mem$ is the memory as in §5, $S$ is the system clock, which is faster than the instruction clock $T$, and $CT$, $DP$ and $SMEM$ are iterated maps representing the controller, datapath and memory respectively, with
$$CT(s + 1, c, d, m) = ct(CT(s, c, d, m), DP(s, c, d, m), SMEM(s, c, d, m)),$$
$$DP(s + 1, c, d, m) = dp(CT(s, c, d, m), DP(s, c, d, m), SMEM(s, c, d, m)),$$
$$SMEM(s + 1, c, d, m) = smem(CT(s, c, d, m), DP(s, c, d, m), SMEM(s, c, d, m)).$$
To proceed, we must define state sets $Ct$ and $Dp$, as well as next-state functions $ct$, $dp$ and $smem$. Space does not permit the inclusion of definitions for $ct$, $dp$ and $smem$. However, the process is the same as that followed for PM in §5. First, we consider $Dp$. Observe in Fig. 1

$$\mu COMP(s, c, d, m) = (CT(s, c, d, m), DP(s, c, d, m), SMEM(s, c, d, m))$$

$$(\mu pc, sr, \mu m, a, pc, mar, mbr, ir, l, alubuf, \ldots$$

$$\alpha_x(a, pc, l, m) = (x_{\mu pc}, x_{sr}, x_{\mu m}, a, pc, x_{mar}, x_{mbr}, x_{ir}, l, x_{alubuf}, x_{\ldots}, m),$$
$$XTIME(c, m)(t) = xtime(COMP(t - 1, c, m)).$$
The time taken for the instruction executed at time $t$ depends on the state of $COMP$ at time $t - 1$. Function $xtime : Cpu \times Mem \to \mathbb{N}^+$ determines how many cycles of clock $S$ the next instruction in state $c \in Cpu$, $m \in Mem$ will take to execute:
$$xtime(c, m) = (\text{least } s)[\pi_{\ldots}(\mu COMP(s, \alpha_x(c), m))],$$
where $\pi_{\ldots} : Ct \times Dp \times Mem \to \mathbb{B}$ is a function that determines when execution of the current instruction is complete, by examining the appropriate part of the microinstruction word: see §6.2.

8. Further Considerations. We have shown how computers may be algebraically modelled as iterated maps at different levels of abstraction, and defined what it means for a lower-level algebraic model to correctly implement a higher-level specification. Additionally, we have applied our algebraic tools to a case study.
The case study can be continued: both PM and AC can be completed, and the correctness condition verified. That is, does the diagram of w commute? The process of modelling and verification of the components of the computer can continue with DP and CT, each of which may be further subdivided.
