In this paper, Interval Temporal Logic (ITL) is used to specify and verify the event processor EP/3, which is a multi-threaded pipeline processor capable of executing parallel programs. We first give the high level specification of the EP/3 with emphasis on the interlock mechanism. The interlock mechanism is used in processor design especially for dealing with pipeline conflict problems. We prove that the specification satisfies certain safety and liveness properties. An advantage of ITL is that it has an executable part, i.e., we can simulate a specification before proving properties about it. This will help us to get the right specification.
INTRODUCTION
As is well known, the complexity of current VLSI designs has been increasing very rapidly. Traditional simulation methods cannot exhaustively test all cases, so the correctness of products cannot be guaranteed. Formal methods are therefore used to deal with this problem. Because they are based on mathematics, they can establish correctness in a rigorous way. We choose ITL as our basic formalism.
Our selection of ITL is based on a number of points. It is a flexible notation for both propositional and first-order reasoning about periods of time found in descriptions of hardware and software systems. Unlike most temporal logics, ITL can handle both sequential and parallel composition and offers powerful and extensible specification and proof techniques for reasoning about properties involving safety, liveness and projected time (Moszkowski 1994). Timing constraints are expressible and furthermore most imperative programming constructs can be viewed as formulas in a slightly modified version of ITL. Tempura provides an executable framework for developing and experimenting with suitable ITL specifications. In addition, ITL and its mature executable subset Tempura (Moszkowski 1986) have been extensively used to specify the properties of real-time systems where the primitive circuits can directly be represented by a set of simple temporal formulae.
Proving the Correctness of the Interlock Mechanism in Processor Design. © IFIP 1997. Published by Chapman & Hall.
We will use ITL to specify and verify the correctness of the interlock control mechanism of an experimental CPU prototype, the Event Processor EP/3. The EP/3 is a non-von Neumann data-flow pipeline processing element designed for high performance over a range of general computing tasks. The interesting aspect of the EP/3 processor architecture is the integration of multi-threading, pipelining and data flow mechanisms. This is reflected in the manner in which instructions are executed (cf. Section 4). Using the multi-threading technique, program parallelism is exploited by interleaving threads onto successive pipeline stages. The processor may also be used as an element in a multiprocessor system. Three different simulations of the EP/3 have been obtained independently (Li and Coleman 1996), which indicates that the general design of the EP/3 is correct. To increase the level of trustworthiness in the design, formal specification and correctness verification were sought, in particular for the interlock control mechanism. The interlock mechanism is used to control the multi-thread pipeline during the execution of conditional and multi-destination instructions.
The approach we take in this paper is that we first simulate (execute) the specification before proving its correctness. The specification we get is the abstract version of . The correctness proof is done in a compositional way, adopting rules developed in (Moszkowski 1994, Moszkowski 1995, Moszkowski 1996). Some work on the formal verification of microprogrammed processors has already been done (Cohn 1988, Windley 1995, Tahar and Kumar 1995). However, that work concentrated on instruction level design and is thus at a lower level than the approach presented here. Furthermore, the microprocessors considered there have a different architecture from our EP/3. To get an even higher level of confidence, the generated proofs are mechanically checked using the Prototype Verification System (PVS) (Rushby 1993), for which we have developed an ITL proof checking library (Moszkowski 1996, Cau et al. 1997).
The structure of this paper is as follows. Section 2 presents a brief overview of ITL. The general architecture of the EP/3 is described in section 3. We give the specification and the simulation of the EP/3 in section 4, the properties of the EP/3 in section 5 and the verification that the specification satisfies those properties in section 6. We give conclusions and discuss related issues in section 7.
Interval temporal logic is a state based logic which can be used to specify and verify hardware and software systems. In particular, it can describe both qualitative and quantitative requirements of systems. Here we only give a brief introduction to ITL. For more details, please refer to B. Moszkowski's papers (Moszkowski 1985, Moszkowski 1986, Moszkowski 1994).
An interval $\sigma$ is considered to be a finite or infinite sequence of states $\sigma_0 \ldots \sigma_i \ldots \sigma_n$, where a state $\sigma_i$ is a mapping from the set of variables Var to the set of values Val.
The length $|\sigma|$ of an interval $\sigma_0 \ldots \sigma_n$ is equal to $n$ (one less than the number of states in the interval, i.e., a one state interval has length 0).
The main feature of ITL is the temporal operator ';' (chop). A formula $f_1 ; f_2$ holds on an interval $\sigma_0 \ldots \sigma_n$ if there exists an $i$, $0 \le i \le n$, such that $f_1$ holds on the interval $\sigma_0 \ldots \sigma_i$ and $f_2$ holds on the interval $\sigma_i \ldots \sigma_n$.
The syntax of expressions and formulas in ITL is defined in Table 1 , where i denotes an integer; x is a static (global) variable which doesn't change within an interval; A is a state variable which can change within an interval; g is an n-ary function; p is an n-ary predicate. 
The informal semantics of the most interesting constructs are as follows: $\forall v \bullet f$: for all $v$ such that $f$ holds. skip: unit interval (length 1). $f_1 ; f_2$: holds if the interval can be decomposed ("chopped") into a prefix and suffix interval, such that $f_1$ holds over the prefix and $f_2$ over the suffix.
The chop operator has some similarities with the sequence operator of programming languages. Using the chop operator, the general temporal operators $\Box$ (read always) and $\Diamond$ (read sometimes) can be defined.
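The finite-interval reading of skip and chop given above can be made concrete with a small evaluator. The following Python sketch is illustrative only and is not Tempura: the names `skip`, `chop`, `sometimes`, `always` and `state` are assumptions introduced here, intervals are restricted to finite lists of states, and the derived operators are assumed to follow the usual finite-interval definitions $\Diamond f \mathrel{\widehat{=}} true ; f$ and $\Box f \mathrel{\widehat{=}} \neg \Diamond \neg f$.

```python
from typing import Callable, Dict, List

State = Dict[str, int]                  # a state maps variable names to values
Interval = List[State]                  # a finite, non-empty sequence of states
Formula = Callable[[Interval], bool]    # a formula is a predicate on intervals


def skip(sigma: Interval) -> bool:
    """skip holds on unit intervals: length 1, i.e. exactly two states."""
    return len(sigma) == 2


def chop(f1: Formula, f2: Formula) -> Formula:
    """f1 ; f2 holds if some state sigma_i splits the interval so that
    f1 holds on sigma_0..sigma_i and f2 holds on sigma_i..sigma_n."""
    def holds(sigma: Interval) -> bool:
        return any(f1(sigma[: i + 1]) and f2(sigma[i:])
                   for i in range(len(sigma)))
    return holds


def sometimes(f: Formula) -> Formula:
    """Assumed definition: sometimes f is (true ; f)."""
    return chop(lambda sigma: True, f)


def always(f: Formula) -> Formula:
    """Assumed definition: always f is not sometimes not f."""
    return lambda sigma: not sometimes(lambda s: not f(s))(sigma)


def state(pred: Callable[[State], bool]) -> Formula:
    """Lift a predicate on a single state to the first state of an interval."""
    return lambda sigma: pred(sigma[0])


# A three-state interval (length 2) over one state variable A.
sigma = [{"A": 0}, {"A": 1}, {"A": 2}]
two_steps = chop(skip, skip)            # holds exactly on intervals of length 2
```

On this interval `two_steps(sigma)` holds, `sometimes(state(lambda s: s["A"] == 2))(sigma)` holds, and `always(state(lambda s: s["A"] < 2))(sigma)` fails because of the last state. Note that both halves of a chop share the state at which the interval is split.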
Compositional proof rule
In (Moszkowski 1994, Moszkowski 1995, Moszkowski 1996) several compositional proof rules were developed. Due to lack of space, we will not give a full exposition of the compositionality theory and refer the reader to the published work. However, we will use the following compositional proof rule to prove the termination and liveness properties of the EP/3.
$$
\mathrm{CR}\qquad
\frac{\vdash w \wedge S \supset T \wedge \mathit{fin}\; w' \qquad \vdash w' \wedge S' \supset T' \wedge \mathit{fin}\; w''}
     {\vdash w \wedge (S ; S') \supset (T ; T') \wedge \mathit{fin}\; w''}
$$
where $w$, $w'$ and $w''$ are formulas in conventional first-order logic containing no temporal operators and describing properties of individual states. The turnstile $\vdash$ means that the formula to its right is provable in the ITL axiom system. The first lemma states that if $w$ is true in an interval's initial state and $S$ is true on the interval, then $w'$ is true in the final state and $T$ is true on the interval. The rule shows how to compose two such lemmas proved about the input-output behavior of $S$, $T$, $S'$ and $T'$ into a corresponding lemma for $S ; S'$ and $T ; T'$.

Instructions in the EP/3 flow in a circular pipeline controlled by a 150 MHz clock. New instructions flow from the Iy (Instruction Highway) into the Inst, where they are decoded and issued onto the My (Memory Highway). All instructions consist of a command field, which specifies the operation and operands, and a destination field, which specifies the target instructions to which the result will be sent. An instruction is accompanied by a word of data which forms one of the operands. The other operand can specify a location in the main memory which is read from or written to.
From the My the instruction enters the Memadd, in which its effective address is calculated by adding the base operand and displacement. The instruction then enters the Memory in the next clock cycle. After a 'write' or 'read' operation in memory, the instruction with the result is sent to the Sy (Stack Highway).
The Stack receives the input data from the Sy at the beginning of each clock cycle. The interlock signal Ilock determines whether the output data on the Py is kept or the input data on the Sy is stored into the Stack.
The instruction from the Py enters the Cache and Alu1 units at the same time. They compute different functions of the instruction concurrently. The Cache fetches the target instruction from the cache memory array according to the destination address, and the target instruction is sent to the Inst via the Iy in the next clock cycle. At the same time Alu1 executes part of an arithmetic or logical operation and sends the result to Alu2, which computes the remainder of the arithmetic or logical operation.
Here we only focus on the interlock control mechanism. So certain components are ignored, such as Alu1, Alu2, Memadd and Memory. We also assume that the functional operations in each component are correctly implemented. We also ignore the cache loading mechanism, i.e., we assume that the complete instruction tree is already in the Cache.
We use a special symbol bubble to denote an empty pipeline slot. We use an instruction tree to represent machine programs in the EP/3. It is a binary tree in which nodes represent the instructions, arcs represent the father and son relations among the instructions, and leaves represent the finished instructions. The model gives the order relations among the instructions in the EP/3 program.

Figure 2 is an example. The root node of the instruction tree is instruction 0. It has two subtrees which represent two threads that start with instructions 1 and 2 respectively. After instruction 0 is executed, instructions 1 and 2 will be issued, one after the other, onto the My. The EP/3 should execute instruction 0 before executing its sons 1 and 2. The safety and liveness properties given in the next section specify this order of execution. An instruction with no son is considered terminated; instruction 7 is such an instruction.

Now we briefly describe the instruction structure of the EP/3. An instruction consists of three parts: the operation part, which gives the kind of operation, such as 'write' or an arithmetic operation in the Alu; the operands of the instruction; and the destination addresses, which are used to get the descending instructions. In other words, the instruction tree gives the address relations among the instructions. Here we assume that the instruction tree has only a finite number of nodes. We will use $i \prec j$ to denote that $i$ is the ancestor of $j$.
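The instruction tree and its ancestor relation can be pictured as a small data structure. The Python sketch below is only an illustration: the tree `TREE` is a hypothetical example and not the tree of Figure 2, the helper names `sons` and `is_ancestor` are introduced here, and the ancestor relation is assumed to be strict (an instruction is not its own ancestor).

```python
from typing import Dict, Tuple

# Each instruction maps to the (at most two) instructions named by its
# destination addresses. An empty tuple marks a finished instruction.
Tree = Dict[int, Tuple[int, ...]]

# Hypothetical example tree: instruction 0 is the root and starts two threads.
TREE: Tree = {
    0: (1, 2),
    1: (3, 4),
    2: (5, 6),
    3: (7, 8),
    4: (), 5: (), 6: (), 7: (), 8: (),
}


def sons(tree: Tree, i: int) -> Tuple[int, ...]:
    """Return the descending instructions of instruction i."""
    return tree.get(i, ())


def is_ancestor(tree: Tree, i: int, j: int) -> bool:
    """Return True if i is a proper ancestor of j in the tree."""
    pending = list(sons(tree, i))
    while pending:
        k = pending.pop()
        if k == j:
            return True
        pending.extend(sons(tree, k))   # keep searching below k
    return False
```

For this example, `is_ancestor(TREE, 0, 7)` holds because 7 is reached through 1 and 3, while `is_ancestor(TREE, 1, 5)` does not, since 5 belongs to the thread that starts with instruction 2.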
Component Interfaces and EP/3 Instruction Tree

SPECIFICATION OF THE EP/3
We will first show some simulation results (for which the Tempura code is given in Appendix 1) and then proceed to give the formal specification of the EP/3.
Simulation in Tempura
In Fig 3, we present the result of executing the instruction tree of EP/3 given in Fig 2. The figure shows clearly the behavior of a stack, i.e., instruction 4 enters the stack in State 8 and leaves the stack in State 14 while instruction 6 enters the stack in State 10 and leaves the stack in State 13. The execution time for 12 instructions is 20 cycles.
If there is only a single thread in an instruction tree, the performance of the EP/3 is at its worst, i.e., there is always at most one instruction in the pipeline. Since each instruction then has to pass through the three stages on its own, the execution time is 3n cycles for a single thread of length n.
The formal specification
The specification of EP/3 is the composition of the specifications of three components, i.e.,

$$
EP/3 \;\widehat{=}\; Stage0 \wedge Stage1 \wedge Stage2
$$

Figure 3 Simulation of executing a tree

Section 3.1 gave the input and output interfaces of each component. At the beginning of each clock cycle (present state), each component gets the input information from its input ports, then it will process the information and send the result to the output ports at the end of the clock cycle (next state), i.e., there is a unit time delay between input and output. The signal Ilock is an exception to this and is produced immediately after Stage1 receives the input information. In fact, there is still a time delay between the input signals and Ilock, but we ignore it since it is very small relative to the clock cycle. In practical circuit design, the Ilock signal is produced by a combinational circuit and affects Stage0 and Stage2 immediately as shown in Fig 1. This is the key to the interlock mechanism in the EP/3 design. Now we will describe the specification of each component separately.
For the Stage0, the input variables are Ilock and Py. The output variable is Iy. If Ilock = 0 (meaning the unit is unlocked), the value for Iy will be fetched from the cache memory according to the destination address of instruction Py. Simultaneously the result of instruction Py will be computed; we omit how this is done. Here we use a function sons which can read the descending instruction of Py from the cache memory. If Ilock = 1, the unit is locked and the output remains stable. The formal ITL specification of Stage0 can be described as follows:

The Stage2 component receives its input from Stage1 and sends its output to Stage0 via the Py. However, an instruction with two destinations needs two clock cycles to send two successive instructions onto the My. Therefore, Stage2 cannot always send new instruction parcels onto Py. EP/3 uses the interlock signal Ilock to signal that Stage2 should store the instruction from My at this time, and wait until some future time when Py is clear and a bubble is present on My. It will then pop a stored instruction, which is the head of the list L.
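The informal description of the three stages can be turned into a small executable model. The Python sketch below is not the Tempura code of the appendix and not the formal ITL specification; it is a simplified reading of the prose under several assumptions that should be checked against the formal definitions: Iy is modelled as the tuple of sons that Stage1 still has to issue, Ilock is taken to be 1 exactly when two sons are pending, a locked Stage2 keeps Py and pushes a non-bubble My onto L, and an unlocked Stage2 forwards My or, when My carries a bubble, pops L. The function name `simulate` and the dictionary encoding of the tree are introduced here for illustration.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[int, Tuple[int, ...]]   # instruction -> its sons (at most two)


def simulate(tree: Tree, root: int = 0,
             max_steps: int = 10_000) -> Tuple[List[int], int]:
    """Run the simplified ring until all highways carry bubbles and L is empty.

    Returns the order in which Stage0 consumes instructions from Py and the
    number of clock steps taken.
    """
    iy: Tuple[int, ...] = ()        # sons still to be issued by Stage1
    my: Optional[int] = root        # the root instruction starts on My
    py: Optional[int] = None        # None plays the role of a bubble
    stack: List[int] = []           # the list L of Stage2
    consumed: List[int] = []
    steps = 0
    while iy or my is not None or py is not None or stack:
        if steps >= max_steps:
            raise RuntimeError("no final state reached")
        ilock = len(iy) == 2        # assumed combinational interlock signal
        if ilock:
            # Stage1 issues the first son; Stage0 and Stage2 are locked.
            next_my, next_iy, next_py = iy[0], iy[1:], py
            if my is not None:
                stack.append(my)    # Stage2 stores the incoming instruction
        else:
            next_my = iy[0] if iy else None
            # Stage0 consumes Py and fetches its sons from the tree.
            if py is not None:
                consumed.append(py)
                next_iy = tuple(tree.get(py, ()))
            else:
                next_iy = ()
            # Stage2 forwards My, or pops L when My carries a bubble.
            if my is not None:
                next_py = my
            elif stack:
                next_py = stack.pop()
            else:
                next_py = None
        iy, my, py = next_iy, next_my, next_py
        steps += 1
    return consumed, steps


# Hypothetical trees for illustration (not the tree of Fig 2).
BRANCHING: Tree = {0: (1, 2), 1: (3, 4), 2: (5, 6), 3: (7, 8),
                   4: (), 5: (), 6: (), 7: (), 8: ()}
CHAIN: Tree = {0: (1,), 1: (2,), 2: ()}

order, steps = simulate(BRANCHING)
chain_order, chain_steps = simulate(CHAIN)
```

In this model the branching tree is consumed in the order 0, 1, 2, 3, 5, 7, 8, 6, 4: instructions 4 and 6 are stored in L while two-destination instructions are being issued, and they leave in last-in, first-out order once My carries a bubble. The single-thread chain of three instructions takes 8 steps, so it occupies states 0 to 8, with one instruction in the ring at a time.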
PROPERTIES OF THE EP/3
The specification of the EP/3 should satisfy some requirements (properties), such as safety (no bad thing will happen) and liveness (a good thing will eventually happen). For example, if an instruction is pushed into the stack, then it should eventually be popped. This is a liveness property of the EP/3. If the specification of the EP/3 satisfies these properties, then we say the abstract specification is correct. The following are some safety and liveness properties. We assume that the instruction tree is finite.
Safety For any arbitrary instruction tree, the EP/3 should execute the instructions in conformance with the ancestor relation. The first instruction $i_0$ (the root node) is sent on the My at the beginning of execution of any instruction tree by an initial procedure of the I/O Processor component of the EP/3, which is omitted in this paper.
The safety property of the EP/3 is the conjunction of the following properties, in which i and j denote any non-bubble instructions in the instruction tree.
1. Any instruction is descended and cannot be lost.
If an instruction i is sent onto the My, then it will pass through each component of the EP/3, i.e., it cannot be lost before it is finished. This safety property can be specified as the conjunction of the following properties
2. No instruction is repeatedly executed.
Any instruction in the tree cannot appear more than once on the highways of the EP/3, i.e., non-duplication.
where $hwy \in \{Py, Iy, My\}$.
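The informal safety requirements can also be checked on a recorded execution trace, which is useful when experimenting with a simulation before attempting the proof. The Python sketch below is an illustration under stated assumptions and does not replace the ITL formulas: a trace is a list of states mapping the highway names to an instruction or to `None` (bubble), an instruction that stays on a highway for consecutive states (for example while the interlock holds Py) is counted as one appearance, and the ancestor order is checked against a hypothetical `parent` map. All function names are introduced here.

```python
from typing import Dict, List, Optional, Sequence

BUBBLE = None
Snapshot = Dict[str, Optional[int]]      # highway name -> instruction or bubble


def visits(trace: Sequence[Snapshot], hwy: str) -> List[int]:
    """Instructions seen on one highway, merging consecutive repeats."""
    seen: List[int] = []
    prev: Optional[int] = BUBBLE
    for snapshot in trace:
        cur = snapshot[hwy]
        if cur is not BUBBLE and cur != prev:
            seen.append(cur)
        prev = cur
    return seen


def no_duplication(trace: Sequence[Snapshot],
                   highways: Sequence[str] = ("Py", "Iy", "My")) -> bool:
    """No instruction appears more than once on any highway."""
    return all(len(v) == len(set(v))
               for v in (visits(trace, h) for h in highways))


def respects_ancestors(trace: Sequence[Snapshot],
                       parent: Dict[int, int], hwy: str = "My") -> bool:
    """Every instruction appears on `hwy` only after its father has."""
    seen = set()
    for i in visits(trace, hwy):
        father = parent.get(i)           # the root has no father
        if father is not None and father not in seen:
            return False
        seen.add(i)
    return True


# Hand-written trace of a two-instruction thread 0 -> 1 (hypothetical data).
GOOD: List[Snapshot] = [
    {"Iy": None, "My": 0, "Py": None},
    {"Iy": None, "My": None, "Py": 0},
    {"Iy": 1, "My": None, "Py": None},
    {"Iy": None, "My": 1, "Py": None},
    {"Iy": None, "My": None, "Py": 1},
    {"Iy": None, "My": None, "Py": None},
]
# The same trace with instruction 0 wrongly re-issued at the end.
BAD: List[Snapshot] = GOOD + [{"Iy": None, "My": 0, "Py": None}]
```

Here `no_duplication(GOOD)` and `respects_ancestors(GOOD, {1: 0})` both hold, while `no_duplication(BAD)` fails because instruction 0 appears on My twice. Such a check only examines one trace; the proof in the verification section is what covers all instruction trees.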
Termination The termination property expresses that the program must terminate for any instruction tree whose number of nodes is finite. The final state of the EP/3 can be described as
Suppose n is the number of nodes in the instruction tree. If there is at most one instruction in the pipeline then it is clear that the pipeline is not efficient. This could happen if the tree has no 2-destination instructions. The execution time of such a tree is 3n. So we have the termination property
Liveness The following liveness properties will be considered:
1. Stage2 must guarantee that if an instruction in the instruction tree is pushed onto the L, then it will be popped out of the L eventually.
2. Furthermore every instruction in the instruction tree should be executed before the time bound of 3n.
VERIFICATION OF THE EP/3
The specification of EP/3 should satisfy (imply) the requirements (properties). In this section, we will give the verifications of those properties. Here we only give the proof guidelines and omit the details. Some lemmas are used to make proofs more understandable. They are easily derived from the specifications of each EP/3 component, initial assumption and some definitions.
Proof of the Safety Property. We have to prove the following: The following compositional rule
$\vdash Initial \wedge Stage0 \wedge Stage1 \wedge Stage2$
enables us to split this into
Sp2 a
And these can be proven very easily.

The proof of the Termination property requires the introduction of the following lemmas. From the specification of the Stage1, we can easily get
i.e., if Ilock = 1 then the current Iy has two destinations, so Iy is not a bubble in this state or in the next state. The following lemma is used for the proof that once the processor is in the final state it will remain in the final state.
Lemma 2 $Final \supset \bigcirc Final$
Proof ( we get the desired result
Lemma 3 $Final \supset \Box Final$
It is Lemma 3 that will actually be used in the proof of the termination property below. We can prove the termination property by introducing a variable C for counting the issued instructions from Stage0.
We assume that at the initial state C = 1 and thereafter C will increase until n.
Since the number of the instructions in the tree is n, C will not increase when it reaches n. So we have the following assumption
That is, once $Iy \neq bubble \wedge C = n$ holds, the processor reaches Final. The other instructions must be bubbles and L must be empty after the last instruction ($Iy \neq bubble \wedge C = n$). Otherwise after a few cycles Iy will not be a bubble and then C will become $n + 1$, which contradicts the assumption. If n = 1 then the termination property holds, else we have
Proof of the liveness properties. We can prove the liveness properties $L_1$ and $L_2$ from the termination property, i.e., $T \supset L_1 \wedge L_2$.
DISCUSSION AND CONCLUSION
In this paper, we have specified the EP/3 at a high level of abstraction and have also given a compositional proof of correctness for its interlock mechanism. has given a low level (at register transfer level) (executable) specification of the EP/3. has extended ITL with refinement rules which will be used to refine the abstract specification to this low level specification. In fact, we are planning to prove the logical equivalence of a variant of the pipeline specification given here and a much more distributed, algorithmic description of the EP/3 in which each instruction is modeled as a separate process which is responsible for getting itself through the pipeline. A version of this second description has already been successfully simulated using Tempura. So the whole development process from high level specification to low level executable code can be expressed in ITL.
In the proof of the termination and liveness properties in section 6 we made the assumption that the number of instructions is finite. This assumption can be dropped if we use a queue instead of a stack in Stage2. Furthermore Stage2 should be changed into (changes are underlined)
Figure 4 shows a sample execution of the same instruction tree as used in Fig. 3. At state 9 instruction 5 does not bypass the queue but enters the queue, and simultaneously instruction 4 leaves the queue. This improvement ensures that, in the case of an infinite tree, an instruction leaves the queue eventually, i.e., is not overtaken by any other instruction. One can see that "the execution order" is preserved.
The termination and liveness properties should then be reformulated as follows:
where i and j are not bubbles. Following the methodology of this paper, it is easy to extend the verification of EP/3 with an IO processor component. The latter is responsible for cache memory loading and external communications.
In we have embedded the ITL proof system within the Prototype Verification System (PVS). Some of the proofs generated in the paper are mechanically checked; see Appendix 2 for the ITL specification of the EP/3 encoded in PVS using the ITL library. Part of the refinement calculus of has also been incorporated into PVS, so that refinement can also be mechanically checked. Furthermore, a link between PVS and the Tempura simulator will be built which allows executable ITL specifications derived with PVS to be executed. In this way a general development tool is constructed in which one can verify, refine and execute ITL specifications.
The interlock mechanism of EP/3 uses both asynchronous and synchronous signals to control the components. This demonstrates that ITL is suitable for describing both synchronous circuits and asynchronous circuits. In explicit constructs for both synchronous and asynchronous communication have been defined.
APPENDIX 2 PVS SPECIFICATION
The following is part of the ITL specification of the EP/3 encoded in PVS. It imports the ITL library discussed in (Moszkowski 1996, Cau et al. 1997). All the proofs presented here have been checked. Due to space limitations we will give no further details but refer to (Moszkowski 1996, Cau et al. 1997).
