ITL and Tempura are used for, respectively, the formal specification and simulation of a large scale system, namely the general purpose multi-threaded dataflow processor EP/3. This paper shows that this processor can be specified concisely within ITL and simulated with Tempura. It also discusses some problems encountered during the specification and simulation, and indicates what should be added to solve those problems.
Introduction
There has been considerable debate about the use and relevance of formal methods in the development of computing systems (both software and hardware). Some claim that formal methods offer a complete solution to the problems encountered in such development. Others claim that formal methods are of little or no use, or that their utility is severely hindered by their cost.
However, it would be over-enthusiastic to claim that a formal development technique could provide a panacea for the problems involved in developing useful computer systems. Indeed, there are too many aspects which are not amenable to formal representation or reasoning. For example, it is hard to envisage a way in which the process of requirements elicitation could be totally formalised; it is true that some requirements can be denoted using a formal notation, and possible inconsistencies found, but the completeness of the requirements (with respect to the client's intentions) cannot be formally proven.
As digital devices are deployed in a growing number of high integrity applications, there is increased anxiety about the dependability of such systems. Furthermore, the rapid growth of the VLSI market has meant that manufacturers are under pressure to deliver increasingly complex, reliable and cost-effective products within a short time scale. Formal techniques have clearly had an impact on the design of safety critical systems, and have been shown to be commercially advantageous, as demonstrated in the production of the Inmos IMS T800 floating point unit [8]. We believe that using formal techniques in the production of systems should be viewed as a means of delivering correctness (with respect to requirements) and hence enhanced quality.
One such formal technique is ITL, which has been developed over a number of years and has been applied to a variety of systems. However, it has not been applied to the design of large scale hardware systems. The aim of this paper is to analyse and discuss the benefits of using ITL and its associated executable language Tempura [10] for such large scale hardware systems. Our chosen hardware is a general-purpose multithreaded dataflow computer known as EP/3 (Event Processor/3) [3]. EP/3 is intended primarily as a vehicle for exploratory research in high-performance computer structures.
Related work
A plethora of formalisms has been proposed and used in conjunction with digital system specification and verification. A complete overview of these formalisms is outside the scope of the present paper; in this section we highlight only some of the well known classes of formalism used in this field.
A number of hardware specific calculi have been developed and used. Some of these are supported by theorem proving or other checking tools. Examples are Barrow's seminal work on VERIFY [1] (a Prolog program for checking the correctness of finite state machines) and Milne's Circal (a calculus based on CCS) [9] for specifying and analysing circuit behaviour.
General purpose logics have been proposed. The Boyer-Moore theorem prover is a notable example [2]. Recently, CLInc has demonstrated that Boyer-Moore can be used successfully for non-trivial hardware verification cases. Higher order logic was first used by Hanna and Daeche, who developed the VERITAS theorem proving system [6]. HOL [5] is also a machine-oriented formulation of higher order logic based on Church's lambda calculus. Algebraic specification languages have also been used; a notable example is the use of OBJ specifications with hardware. The OBJ-T [13] version of the language was used to specify and test hardware building blocks. UMIST OBJ [4] has also been used to specify simple devices with theorem proving support from REVE [7].
Computer hardware description languages were the first textual descriptive techniques to be used in the design of hardware. Examples include ELLA [12] and VHDL.
Many ideas originating from reactive systems theory, including temporal logics, are relevant to the specification and verification of synchronous and asynchronous digital systems. Although propositional temporal logic can be used for reasoning about hardware, interval temporal logic is particularly interesting for hardware verification.
To the best of our knowledge, there has not been any serious attempt to specify, verify and design large-scale hardware in interval temporal logic.
Paper Organisation
In section 2 we describe the EP/3 processor architecture, and in section 3 we present an overview of ITL and its associated programming language Tempura. The specification and simulation of EP/3 in ITL are discussed in section 4. We give our evaluation and indicate future work in section 5.
The system used in this case-study is the Event Processor [3]. Its processing elements are based on the multithreaded principle, in which a main memory is available for explicit storage of data (known as static operands), in addition to the normal circulation of tokens (flow operands). The processors are designed for individual high speed (150 MIPS with about 1 cycle instruction execution) with the facility for assembly into a small multiprocessing array with shared memory. The latter aims at almost 100% load-balancing efficiency by dynamic scheduling at instruction level.
For the purpose of this case-study, a single processing element will be described, as shown in the figure below.
1: The Instruction Issue unit (Inst-Iss) receives instructions from the Cache unit. It decodes these instructions and issues the decoded instructions onto the MI field of the memory highway (Mhwy). All instructions consist of a Command Field and a Destination Field. In the case of multi-target instructions, the Inst-Iss unit issues the instruction once for each destination, using a separate cycle for each. An Interlock signal from the control highway (Zhwy) prevents any further input into the Inst-Iss unit during this time.
2: The Alu1 unit receives executable parcels of work from the Stack unit via the processor highway (Phwy). These parcels consist of a command field PI and data operands Pda and Pdb. If the PI is a logical or arithmetic command, the Alu1 unit calculates the low-order part of the result within one cycle and sends this to the Alu2 unit together with the command; otherwise it sends one of the operands plus the command field to the Alu2 unit.
3: The Alu2 unit receives an operand and a command from the Alu1 unit. If the command is a shift or rotate instruction, it executes it on the operand within one cycle and sends the result onto the Mda field of the Mhwy; otherwise it completes the high-order part of an arithmetic operation from the Alu1 unit and sends the result onto the Mda field of the Mhwy.
4: The Memory Address unit (Mem-Addr) receives from the Mhwy the instruction and its flow operand. The Mem-Addr unit calculates the effective address of the static operand. This address, together with the instruction and the flow operand, is sent to the Memory unit.
5: The Memory unit fetches the contents of the address received from the Mem-Addr unit and issues them (Sdb) together with the instruction (SI) and the flow operand (Sda) onto the stack highway (Shwy).
6: The Stack unit receives complete executable parcels of work destined for the Alu units. It is interposed between the Shwy and the Phwy in order to buffer variations in the rate of flow between the Mhwy and Phwy. If the Interlock signal from the Zhwy is present, it stores the parcel and waits until some future time when the Phwy is clear but nothing is present on the Shwy. It then issues any stored instruction parcels.
In the absence of an Interlock signal, the Stack unit passes the input parcel from the Shwy straight through to the Phwy. The Stack unit also plays a special role during cache and I/O operations (see the description of the I/O unit).
7: The Cache unit receives instructions PI via the Phwy. The destination field of PI is used by the Cache unit to fetch the corresponding target instruction, which is sent to the Inst-Iss unit. When a target instruction is requested which is not present in the Cache unit, a stream of memory accesses must be made to fill the relevant cache block. This loading function is performed by the I/O unit.
8: The I/O unit is an interface to an external I/O processor IOP. It accesses the Memory unit by issuing dummy instructions, together with a flow operand, onto the Mhwy. These cause the Memory unit either to read out a word onto Sdb, or to write the flow operand.
In the case of a read, the data issued by the Memory unit enters the Stack unit. A second function performed by the I/O unit is related to cache loading. When a target instruction is requested which is not present in the Cache unit, a stream of memory accesses must be made to fill the relevant cache block. Since the I/O unit already contains hardware for generating memory accesses, it is convenient to use the unit for this purpose. The PI field is therefore carried into the I/O unit, which contains the cache tag memory and address comparator. When a cache miss is detected by the I/O unit, the Interlock signal is asserted and several consecutive memory accesses are made onto the Mhwy. The requested data is passed out of the Memory unit into the Stack unit, as before. I/O and cache operations are identified by a special tag on the Mhwy and Shwy; this causes the Stack unit to ignore any such data.
I/O transfers and cache loading each involve serial operations, and are somewhat too complex to handle with hardwired control. The I/O unit is therefore microcoded.
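The buffering behaviour of the Stack unit (item 6 above) can be illustrated with a small Python sketch. It is not taken from the EP/3 specification: the class name `StackUnit`, the `capacity` parameter, the last-in-first-out release order and the use of "no Interlock" as a stand-in for "the Phwy is clear" are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Optional


class StackUnit:
    """Toy model of the Stack unit between the Shwy and the Phwy (hypothetical)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity            # assumed finite stack size
        self.stored: Deque[str] = deque()   # parcels held back during an Interlock

    def cycle(self, shwy: Optional[str], interlock: bool) -> Optional[str]:
        """Run one clock cycle and return the parcel put on the Phwy, if any."""
        if interlock:
            # Interlock present: store the incoming parcel, issue nothing.
            if shwy is not None:
                if len(self.stored) >= self.capacity:
                    raise OverflowError("stack overflow")
                self.stored.append(shwy)
            return None
        if shwy is not None:
            # No Interlock: pass the Shwy parcel straight through to the Phwy.
            return shwy
        if self.stored:
            # Phwy clear and Shwy empty: issue a stored parcel (LIFO is assumed).
            return self.stored.pop()
        return None


unit = StackUnit(capacity=2)
for parcel, interlock in [("p1", False), ("p2", True), ("p3", True), (None, False)]:
    print(parcel, interlock, "->", unit.cycle(parcel, interlock))
```

The sketch leaves out the special tag that makes the Stack unit ignore I/O and cache data, and it treats an overflow as an error, whereas the paper leaves that case to the operating system.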
Interval Temporal Logic and Tempura
This section describes the syntax and informal semantics of the Interval Temporal Logic (ITL) and gives the syntax of the executable part of ITL, i.e., the Tempura language. For a more succinct exposition and the formal semantics see [10].
An interval is considered to be a finite sequence of states, where a state is a mapping from variables to their values. The length of an interval is equal to one less than the number of states in the interval (i.e., a one state interval has length 0).
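As an illustration of these definitions, the following Python sketch represents a state as a mapping from variable names to values and an interval as a list of states. The type aliases and the function name `interval_length` are ours, and restricting values to integers is an assumption made to keep the example small.

```python
from typing import Dict, List

State = Dict[str, int]   # a state maps variable names to values
Interval = List[State]   # an interval is a finite sequence of states


def interval_length(interval: Interval) -> int:
    """Length of an interval: one less than its number of states."""
    if not interval:
        raise ValueError("an interval contains at least one state")
    return len(interval) - 1


one_state = [{"X": 0}]
three_states = [{"X": 0}, {"X": 1}, {"X": 2}]
print(interval_length(one_state), interval_length(three_states))
```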
The syntax of ITL is defined in Table 1, where a is a static variable (doesn't change within an interval), A is a state variable (can change within an interval), v is a static or state variable, g is a function symbol and p is a predicate symbol. Tempura is an executable subset of ITL; its syntax resembles that of ITL. It has as data-structures 
The Specification and Simulation of the EP/3
This section describes some parts of the specification of the EP/3 in ITL. The whole specification consists of a temporal formula of about 3400 lines, so it is not given in full.
The Specification
The description of the processor from which we started to write the formal specification consisted of what is given in section 2 plus a Pascal program that simulates the processor. The first formal specification was therefore of a sequential nature, i.e., the units of the processor were sequentially composed. This, however, was not a faithful specification of the processor because the units should work in parallel. Fortunately the specification was such that the transformation into a "parallel" version was not difficult. Only the interlock signal caused some difficulties.
The specification of the processor is a large ITL formula of about 3400 lines. The general structure of the formula is as follows. The init() formula initializes the values of the variables. The repeat (S) until Stopped formula repeats "executing" the S formula until Stopped is true, which is the case if a stop instruction is executed by the processor. The S formula is an ∧ (and) formula of 7 sub-formulas, which means that these sub-formulas are "executed" in parallel. The skip formula describes that we use an interval of two states, namely the state of the processor before and after each clock cycle.
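The structure described above (initialise, then repeat a parallel step until Stopped holds) can be mimicked in Python as a rough sketch. This is not the ITL formula of the paper: the three toy units and all names (`init`, `step`, `run`, `counter_unit`, `doubler_unit`, `stop_unit`) are hypothetical. Parallel composition is modelled by letting every unit read the state before the clock cycle and contribute to the state after it.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Unit = Callable[[State], State]   # reads the old state, returns its updates


def init() -> State:
    """Initial values of all variables (toy example, not the EP/3 registers)."""
    return {"counter": 0, "doubled": 0, "stopped": 0}


def counter_unit(s: State) -> State:
    return {"counter": s["counter"] + 1}


def doubler_unit(s: State) -> State:
    return {"doubled": 2 * s["counter"]}


def stop_unit(s: State) -> State:
    return {"stopped": 1 if s["counter"] >= 3 else 0}


def step(state: State, units: List[Unit]) -> State:
    """One clock cycle: a two-state interval from `state` to the result."""
    nxt = dict(state)              # this copy keeps unwritten variables unchanged
    for unit in units:
        nxt.update(unit(state))    # every unit reads the state BEFORE the cycle
    return nxt


def run(units: List[Unit], max_cycles: int = 1000) -> List[State]:
    """init(); repeat (S) until Stopped, returning the sequence of states."""
    state = init()
    trace = [state]
    for _ in range(max_cycles):
        state = step(state, units)
        trace.append(state)
        if state["stopped"]:
            break
    return trace


trace = run([counter_unit, doubler_unit, stop_unit])
print(len(trace) - 1, "cycles; final state:", trace[-1])
```

Because each unit only sees the old state, the order of the units in the list does not affect the result, which is the property that distinguishes this from the sequentially composed first version.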
The Simulation
We will simulate the execution of a small machine code program on the processor. This program consists of two threads running in parallel on the processor: one thread subtracts the value 31 from 0 and the other thread adds 31 to 0. The machine code program is as follows:
At stage 1 the Inst-Iss unit gets the first instruction from the Cache. Because the instruction has two destinations, it is split into two instructions (1.1 and 1.2): the unit first issues the instruction with the first destination to the Mem-Addr unit (stage 2) and then the same instruction with the second destination.
This program adds 1 to flow F0 and then sends the result twice to itself (it constructs a binary tree of adding instructions).
The stages of the pipeline are illustrated in figure 3. Whenever an instruction enters the Inst-Iss unit (B) it is split into 2 instructions. Because the pipeline has a length of five, this will eventually result in a situation (for instance stage 12) in which there is "no room" in the pipeline; the instruction (i.e., 1.2.1) in the Stack unit (E) will then be put on the stack. In this example, because the stack is of finite length, this will eventually result in a stack overflow. This problem should then be solved at the operating system level.
Figure 3: The stages of the pipeline during execution.
Evaluation and Future Work
This section evaluates ITL and Tempura as a vehicle for the specification and simulation of large scale systems. It also indicates future work.
Evaluation
ITL and Tempura are suitable for the specification and simulation of the EP/3, i.e., for large scale systems. But the following problems were encountered during the specification of the processor.
Tempura lacks the data-structures and data-types of Pascal, such as records, reals and files. Although these could be simulated using lists, it would ease usability if these data-types could be expressed explicitly.
The units of the processor communicate with each other over the highways. This is modelled with shared variables. A special kind of variable, such as a channel, would be more appropriate, together with a means of expressing communication actions explicitly.
In normal programming languages like Pascal, a program like if y>0 then x:=x+1 increases the value of x by 1 if y>0 and leaves x unchanged otherwise. In ITL/Tempura the information that x doesn't change has to be coded explicitly: if y>0 then x:=x+1 else x:=x, i.e., one has to state explicitly that a variable doesn't change. Updating a single memory cell is therefore a very costly operation, because every other cell must be stated explicitly to remain unchanged. This is the so-called framing problem.
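The cost of explicit framing can be made concrete with a small Python sketch that counts one assignment per memory cell. It is an illustration only, not Tempura code; the function names `explicit_update` and `framed_update` are hypothetical, and the counters measure specification statements rather than machine work.

```python
from typing import List, Tuple


def explicit_update(memory: List[int], address: int, value: int) -> Tuple[List[int], int]:
    """Next-state memory in which every cell is assigned explicitly."""
    next_memory = [0] * len(memory)
    assignments = 0
    for cell in range(len(memory)):
        if cell == address:
            next_memory[cell] = value           # the one real update
        else:
            next_memory[cell] = memory[cell]    # the explicit "x := x" for this cell
        assignments += 1
    return next_memory, assignments


def framed_update(memory: List[int], address: int, value: int) -> Tuple[List[int], int]:
    """Same result if unchanged cells were framed implicitly (one statement)."""
    next_memory = list(memory)   # stands in for the implicit frame
    next_memory[address] = value
    return next_memory, 1


memory = [0] * 8
print(explicit_update(memory, 3, 31))
print(framed_update(memory, 3, 31))
```

Both functions produce the same next state; only the number of statements that the specifier has to write differs, and it grows with the size of the memory in the explicit version.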
Within ITL/Tempura it is hard to model timing constraints like delay or time-out. In the specification of the processor each clock cycle corresponds to a state in ITL, i.e., the length of the interval corresponds to the number of cycles of the processor. We couldn't model properties like: the Alu unit computes the result within 1 clock cycle.
A problem that did not show up in this example is whether to extend Tempura (the executable part of ITL) with high level constructs like the "or" for nondeterminism. The advantage is that more specifications become executable, but a disadvantage is that the simulator then becomes very complicated. It may be better to construct a refinement tool that refines a high level specification (written in ITL) into an executable specification (written in Tempura).
Future work
Future work will consist of extending ITL/Tempura with more data-structures and data-types, and with constructs for the description of communication and timing constraints. Furthermore, the framing problem should be solved; this will result in an increase of the speed of the simulator, because an update of the memory would then only cost 1 statement instead of as many as there are memory cells. We will also investigate the use of the PVS [11] system as a refinement tool for refining high level specifications written in ITL into executable specifications written in Tempura.
An issue that isn't addressed in this paper is the correctness of the EP/3, i.e., the proof of certain properties, for example that the pipeline doesn't "overwrite" a unit when there is for some reason "no room" in the pipeline. Properties of this kind should be formalised within ITL. With the proof system of ITL these properties can then be proven.
A Simulator output of first example
The simulator has a more detailed output, namely the values of the registers of the various units of the processor before the start of each cycle. The format of the output is given in table 3. Z, y1 and y2 correspond to, respectively, Memory locations 00000020, 00000024 and 00000028.
[Table 3: Simulator output (register values of each unit, per clock cycle).]
The output includes the following memory messages:
Reading from Memory[00000020] the value 0000001F
Writing to Memory[00000024] the value FFFFFFE1
