Abstract-In this paper, we propose a verification method for pipelined microprocessors with out-of-order execution. We define a class of pipelined microprocessors with out-of-order execution and give a sufficient condition that guarantees the correctness of an implementation. Each microprocessor in this class has a pipeline $stg_1, \ldots, stg_n$ in which the stages $stg_c, \ldots, stg_n$ form a so-called "in-order pipeline", so that the execution order of instructions may change only within the stages $stg_1, \ldots, stg_{c-1}$. Using our method, we carried out the correctness proof of a practical 6-stage pipelined microprocessor with a so-called scoreboard [1]. We used a verifier with a decision procedure for Presburger sentences. The total CPU time spent on the proof was about 8 hours.
I. INTRODUCTION
Out-of-order instruction execution [1] is one of the techniques for overcoming data hazards through dynamic scheduling. For this reason, almost all modern pipelined microprocessor architectures, such as the PowerPC and the DEC Alpha, employ it. Recently, several efforts have been made to verify out-of-order architectures [2, 3, 4]. In those methods, however, the class of implementations to which the method applies is not clearly defined. It is important to define a suitable class of implementations to which a method applies, and to give a sufficient condition guaranteeing that a given implementation in this class satisfies its specification.
In this paper, we give a design constraint C that specifies a class of pipelined microprocessors with out-of-order instruction execution. Moreover, we propose a sufficient condition V guaranteeing that a given implementation designed under the design constraint C satisfies its specification.
The design constraint C requires the following. First, each instruction must execute the stages $stg_1, stg_2, \ldots, stg_n$ in this order, although an instruction may abort its execution before executing a certain stage. Secondly, the stages $stg_c, \ldots, stg_n$ form a so-called "in-order pipeline"; hence the implementation may change the execution order of instructions only within the stages $stg_1, \ldots, stg_{c-1}$. Moreover, for each visible register (except the program counter), the order of the instructions that read values from or write values into this register is the same as the order of the instructions that execute the stage $stg_c$. The microprocessors in this class can have data forwarding, speculative instruction fetch, and so on. Since the stages $stg_c, \ldots, stg_n$ are restricted to an "in-order pipeline" by C, we can use a method similar to those proposed for in-order architectures [5].
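The ordering requirements of C can be made concrete as checks on a single execution trace. The following Python sketch is only an illustration, not the paper's method (the paper proves such properties symbolically for all executions). It assumes a hypothetical trace format in which each event records the clock cycle, the instruction's program-order index, and either the stage it executes or the visible register it accesses. `is_in_order_tail` checks that the stages $stg_c, \ldots, stg_n$ are executed in program order, and `register_order_ok` checks the register-access condition under the assumed interpretation that the accesses to each register must follow the order in which instructions execute $stg_c$.

```python
from collections import defaultdict

# Hypothetical trace format (an assumption, not taken from the paper):
#   stage_events: (cycle, instr, stage)  instr = program-order index, stage in 1..n
#   reg_events:   (cycle, instr, reg)    a read or write of a visible register (not PC)

def execution_order(stage_events, stage):
    """Instructions that execute `stage`, in the order in which they do so."""
    hits = sorted((cyc, instr) for cyc, instr, s in stage_events if s == stage)
    return [instr for _, instr in hits]

def is_in_order_tail(stage_events, c, n):
    """Stages stg_c..stg_n must be executed in program order."""
    for stage in range(c, n + 1):
        order = execution_order(stage_events, stage)
        if order != sorted(order):
            return False
    return True

def register_order_ok(stage_events, reg_events, c):
    """Accesses to each register must follow the order of execution of stg_c."""
    rank = {instr: k for k, instr in enumerate(execution_order(stage_events, c))}
    ranks_per_reg = defaultdict(list)
    for _, instr, reg in sorted(reg_events):
        if instr in rank:  # assumption: instructions that never reach stg_c are skipped
            ranks_per_reg[reg].append(rank[instr])
    return all(r == sorted(r) for r in ranks_per_reg.values())

# Two instructions; instruction 1 overtakes instruction 0 in stg_2 (n = 4, c = 3).
stage_events = [(0, 0, 1), (1, 1, 1), (2, 1, 2), (3, 0, 2),
                (4, 0, 3), (5, 1, 3), (5, 0, 4), (6, 1, 4)]
reg_events = [(3, 0, "r1"), (5, 0, "r2"), (6, 1, "r2")]
print(is_in_order_tail(stage_events, 3, 4))             # True
print(register_order_ok(stage_events, reg_events, 3))   # True
```

In the example, the overtaking in $stg_2$ is permitted because it occurs before $stg_c$; the same trace would be rejected if $c$ were 2.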
Moreover, we designed a practical pipelined microprocessor with out-of-order instruction execution under the design constraint C. This microprocessor has a 6-stage pipeline and some buffers that record, among other things, the program order of instructions. We proved that this microprocessor satisfies the sufficient condition V as follows. First, we introduced and proved lemmas concerning properties of the buffers and related structures. Secondly, we proved that the implementation satisfies the sufficient condition V under these lemmas. For these proofs, we used a decision procedure for Presburger sentences in prenex normal form whose variables are all bound by universal quantifiers [6]. Although some of the sentences to be decided had a length of over 8,000, the total CPU time spent on the proof was about 8 hours, which is practical.
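Because every sentence is universally quantified, the validity of $\forall \bar{x}.\,\varphi$ can be decided by showing that $\exists \bar{x}.\,\neg\varphi$ is unsatisfiable, eliminating the existential quantifiers one variable at a time. The procedure is described in Section VI as based on the quantifier elimination used in Cooper's algorithm; for reference, the textbook form of Cooper's elimination step is recalled here. This is the general statement of Cooper's theorem, not a description of the optimized procedure of [6]. It assumes that $\varphi(x)$ is quantifier-free, in negation normal form, and normalized so that $x$ has coefficient 1 in every atom, each atom being $x < a$, $b < x$, $d \mid x + e$, or $\neg(d \mid x + e)$ with $a, b, e$ free of $x$. Let $\delta$ be the least common multiple of the divisors $d$ (1 if there are none), $B$ the set of lower-bound terms $b$, and $\varphi_{-\infty}$ the formula obtained from $\varphi$ by replacing every $x < a$ with true and every $b < x$ with false. Then

$$\exists x.\,\varphi(x) \;\Longleftrightarrow\; \bigvee_{j=1}^{\delta} \varphi_{-\infty}(j) \;\lor\; \bigvee_{j=1}^{\delta} \bigvee_{b \in B} \varphi(b + j)$$

For example, with $\varphi(x) = (b < x) \land (x < a)$ we have $\delta = 1$ and $\varphi_{-\infty} = \mathrm{false}$, so $\exists x.\,\varphi(x) \Leftrightarrow b + 1 < a$. Each step replaces one quantifier by a finite disjunction, so the formula can grow quickly, which is why the order in which variables are eliminated can strongly affect the cost.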
II. SPECIFICATION
We assume that a pipelined microprocessor has a program counter (PC) and some registers and memories as visible registers. All instructions that the microprocessor fetches and executes are stored in an instruction memory (IMEM). We assume that IMEM keeps its contents unchanged during execution. The instruction set is assumed to be a so-called RISC-type ISA. Each instruction consists of an op-code and some operands. The registers that an instruction reads values from or writes values into are assumed to be specified by its operands. The description (abstraction) level of specifications is the so-called ISA level. We write cycle to denote the transition function of the specification S. The transition function cycle specifies the operations of the microprocessor executing one instruction at a time. Let $a(t^s)$ denote the state after the transition cycle has been executed $t^s$ times from the initial state. We abbreviate $a(t^s)$ as $t^s$, and write $cycle(t^s)$ for the state after cycle is executed from the state $t^s$.
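To illustrate what an ISA-level transition function looks like, here is a minimal Python sketch of cycle for a toy RISC-style instruction set. The four instruction forms, the `State` record, and the name `spec_state` are hypothetical and chosen only for illustration; they are not the paper's instruction set. `spec_state(init, imem, ts)` computes the state $t^s$ by applying cycle $t^s$ times, and IMEM is passed separately and never modified, matching the assumption above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical toy RISC-style ISA (an assumption, not the paper's instruction set):
#   ("alu", op, rd, rs1, rs2)  RF[rd] := RF[rs1] op RF[rs2], op in {"add", "sub"}
#   ("load", rd, rs, off)      RF[rd] := DMEM[RF[rs] + off]
#   ("store", rt, rs, off)     DMEM[RF[rs] + off] := RF[rt]
#   ("beqz", rs, target)       if RF[rs] == 0 then PC := target
Instr = Tuple

@dataclass(frozen=True)
class State:
    pc: int
    rf: Tuple[int, ...]                                  # register file
    dmem: Dict[int, int] = field(default_factory=dict)   # data memory

def cycle(s: State, imem: Dict[int, Instr]) -> State:
    """ISA-level transition: execute exactly one instruction fetched from IMEM."""
    ins = imem[s.pc]
    rf, dmem, next_pc = list(s.rf), dict(s.dmem), s.pc + 1
    if ins[0] == "alu":
        _, op, rd, rs1, rs2 = ins
        rf[rd] = rf[rs1] + rf[rs2] if op == "add" else rf[rs1] - rf[rs2]
    elif ins[0] == "load":
        _, rd, rs, off = ins
        rf[rd] = dmem.get(rf[rs] + off, 0)  # uninitialized memory reads as 0
    elif ins[0] == "store":
        _, rt, rs, off = ins
        dmem[rf[rs] + off] = rf[rt]
    elif ins[0] == "beqz":
        _, rs, target = ins
        if rf[rs] == 0:
            next_pc = target
    else:
        raise ValueError(f"unknown instruction {ins!r}")
    return State(next_pc, tuple(rf), dmem)

def spec_state(init: State, imem: Dict[int, Instr], ts: int) -> State:
    """The state t^s: cycle applied ts times from the initial state."""
    s = init
    for _ in range(ts):
        s = cycle(s, imem)
    return s

imem = {0: ("alu", "add", 3, 1, 2), 1: ("store", 3, 0, 8),
        2: ("load", 4, 0, 8), 3: ("beqz", 0, 0)}
s4 = spec_state(State(pc=0, rf=(0, 5, 7, 0, 0)), imem, 4)
print(s4.pc, s4.rf, s4.dmem)  # 0 (0, 5, 7, 12, 12) {8: 12}
```

Each call to cycle copies the register file and data memory, so earlier states $t^s$ remain available for comparison with implementation states.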
III. IMPLEMENTATION
We assume that an implementation has, in addition to the visible registers, special registers and a controller to enable out-of-order execution, as well as pipeline registers. The datapath is divided into a fixed number of pipeline stages, say n stages $stg_1, \ldots, stg_n$. The description (abstraction) level of implementations is the so-called RT level. We write clk to denote the transition function of the implementation I. The transition function clk specifies the operations that are executed in each stage during a clock cycle. Let $a(t)$ denote the state after the transition clk has been executed $t$ times from the initial state. We abbreviate $a(t)$ as $t$, and write $clk(t)$ for the state after clk is executed from the state $t$.
We give a design constraint C that specifies a class of pipelined microprocessors as follows.
(C1) Each instruction must execute the stages $stg_1, stg_2, \ldots, stg_n$ in this order. However, an instruction may abort its execution before executing a certain stage.
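For a single instruction, (C1) can be read as a check on the sequence of stages it occupies. The following Python sketch assumes a hypothetical trace format (one stage index per clock cycle, with repeats modeling stalls) and a hypothetical parameter `abort_stage` standing for the index of the stage before which an abort is allowed; the trace is assumed to cover the instruction's whole lifetime.

```python
from itertools import groupby
from typing import List

def satisfies_c1(stages: List[int], n: int, abort_stage: int) -> bool:
    """Check (C1) for a single instruction.

    stages: the stage index the instruction occupies in each clock cycle, in
            time order (hypothetical trace format); repeats model stalls.
    n: number of pipeline stages.
    abort_stage: hypothetical index of the stage before which an abort is allowed.
    """
    visited = [s for s, _ in groupby(stages)]  # collapse stall cycles
    k = len(visited)
    if visited != list(range(1, k + 1)):
        return False  # a stage was skipped, revisited, or taken out of order
    return k == n or k < abort_stage  # completed, or aborted before abort_stage

print(satisfies_c1([1, 2, 2, 3, 4, 5, 6], n=6, abort_stage=4))  # True: completed
print(satisfies_c1([1, 2, 3], n=6, abort_stage=4))              # True: aborted early
print(satisfies_c1([1, 2, 3, 4], n=6, abort_stage=4))           # False: aborted too late
print(satisfies_c1([1, 3, 2], n=6, abort_stage=4))              # False: out of order
```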
IV. CORRECTNESS
In this section, we define what it means for an implementation to satisfy its specification.
Definition 1 Suppose that a specification S and an implementation I are given, and that they have the same instruction memory IMEM and the same initial value of each visible register.
The implementation I satisfies the specification S iff for each
However, if $conc_{F_j}(t^s)$ follows $conc_{PC}(t^s + 1)$, the value of $conc_{F_j}(t^s + 1)$ is the same as the value of $conc_{F_j}(t^s)$.
V. SUFFICIENT CONDITION FOR CORRECTNESS

A. Sufficient condition
In this section, we give a sufficient condition V for the correctness of a microprocessor. To verify that a given implementation satisfies its specification, it suffices to check that the implementation satisfies the sufficient condition V.
$\ldots$ and $F_k \in F_{dest}(I_j)$.
$$\ldots \Rightarrow PC(cycle(t^s)) = PC(clk_{\ldots}(t'))$$
$$F_j \in \ldots$$
$$\left[\, PC(t^s) = PC(t) \Rightarrow PC(cycle(t^s)) = PC(clk_1(t)) \,\right] \quad (3)$$
where $clk_1$ denotes the transition function that performs only the operations of $stg_1$.
We are confident that, for the class defined by the design constraint C, it is difficult to relax the sufficient condition given above.
B. Proof of the sufficiency of condition V
In this section, we briefly sketch a proof that when a given implementation I designed under the design constraint C satisfies the sufficient condition V, the implementation I satisfies the specification S. The proof is by induction on the number of instructions executed by the specification S. Since the specification and the implementation have the same initial value of each visible register, the base case holds immediately.
$$PC(k) = PC(t) \quad (5)$$
Even if there exist instructions following $I_t$, equation (7) holds by the design constraint C, since no instruction can read values written by instructions that execute the stage $stg_c$ later than it does.
VI. AN EXPERIMENTAL PROOF
We designed an out-of-order pipelined microprocessor under the design constraint C (see Fig. 1). Our implementation is based on FDDP, which was designed as an implementation with in-order instruction execution by "IT" and is similar to DLX [1]. Our microprocessor has a program counter (PC), a register file (RF), and a data memory (DMEM) as visible registers. Moreover, it has a 6-stage pipeline and a buffer called a scoreboard [1]. The scoreboard of our microprocessor holds entries for the six instructions currently being executed in the microprocessor and records, among other things, the program order and execution status of these instructions. Its instruction set is a RISC-style ISA with four operation classes: 3-register ALU operations, loads, stores, and branches.
We proved that our microprocessor satisfies the sufficient condition V as follows. First, we introduced and proved lemmas concerning properties of the scoreboard and related structures; for example, a lemma stating that the scoreboard always records the program order of instructions. Secondly, we proved that the implementation satisfies the sufficient condition V under those lemmas. For example, to show that our microprocessor satisfies V3, we proved the following condition under, among others, the lemma stating that the scoreboard always records the program order of instructions: "for each pair $I_i$, $I_j$ of instructions whose entries are in the scoreboard, $I_j$ does not execute the ro2 stage if (1) the program order of $I_i$, as recorded by the scoreboard, is less than that of $I_j$, $\ldots$" To prove these conditions, we used a decision procedure for Presburger sentences in prenex normal form whose variables are all bound by universal quantifiers. This procedure is based on the transformation rule called "quantifier elimination", which is used in Cooper's algorithm. To speed it up, we added many refinements to the algorithm [6]. Although our decision procedure usually chooses an efficient order in which to eliminate variables, depending on the form of the given expression, in a few cases we specified this order manually. Although some sentences had a length of over 8,000, the total CPU time spent on the proof was about 8 hours, which is practical (see TABLE I).
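To make the role of the scoreboard concrete, the following is a minimal Python sketch of a six-entry buffer that assigns a program-order number to each instruction at allocation and frees entries oldest first. The class, its method names, the status label, and the in-order retirement policy are assumptions for illustration; the internal design of the scoreboard is not given here. `records_program_order` only mirrors, in spirit, the lemma that the scoreboard always records the program order of instructions.

```python
from dataclasses import dataclass
from typing import List, Optional

SIZE = 6  # entries for six in-flight instructions, as in the design above

@dataclass
class Entry:
    seq: int                # program-order number assigned at allocation
    status: str = "issued"  # hypothetical execution-status label

class Scoreboard:
    """Hypothetical scoreboard: allocate in fetch order, retire oldest first."""

    def __init__(self) -> None:
        self.entries: List[Optional[Entry]] = [None] * SIZE
        self.next_seq = 0

    def allocate(self) -> int:
        """Record a newly fetched instruction in a free slot; return the slot."""
        for slot, entry in enumerate(self.entries):
            if entry is None:
                self.entries[slot] = Entry(self.next_seq)
                self.next_seq += 1
                return slot
        raise RuntimeError("scoreboard full: fetch must stall")

    def retire_oldest(self) -> None:
        """Free the entry with the smallest program-order number."""
        live = [s for s, e in enumerate(self.entries) if e is not None]
        if not live:
            raise RuntimeError("scoreboard empty")
        oldest = min(live, key=lambda s: self.entries[s].seq)
        self.entries[oldest] = None

    def older(self, a: int, b: int) -> bool:
        """True iff the instruction in slot a precedes the one in slot b."""
        ea, eb = self.entries[a], self.entries[b]
        return ea is not None and eb is not None and ea.seq < eb.seq

    def records_program_order(self) -> bool:
        """Invariant: live entries hold consecutive program-order numbers."""
        seqs = sorted(e.seq for e in self.entries if e is not None)
        return all(y == x + 1 for x, y in zip(seqs, seqs[1:]))

sb = Scoreboard()
for _ in range(SIZE):
    sb.allocate()
sb.retire_oldest()    # frees slot 0 (program-order number 0)
slot = sb.allocate()  # reuses slot 0 for program-order number 6
print(slot, sb.older(1, slot), sb.records_program_order())  # 0 True True
```

Because freed slots are reused, a slot index alone says nothing about program order; the stored sequence number is what lets `older` compare two in-flight instructions.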
VII. CONCLUSIONS
In this paper, we specified a class of implementations by imposing a design constraint C on the implementation, and proposed a sufficient condition V for the correctness of pipelined microprocessors with out-of-order instruction execution. The experimental proof showed that each condition in the sufficient condition could be proved, although proving the correctness of the whole microprocessor directly was difficult.
