In this article we present a framework within which hardware implementations are proven correct with respect to specifications given in an OCCAM-like language called Handel, using a robust set of mathematical transformation laws. The semantic basis for Handel and its hardware implementations consists of simple functions of time, called timing diagrams. This basis allows the abstract properties of Handel programs, and hence of their implementations, to be expressed in a modal logic called Duration Calculus. The semantic treatment of all three levels (abstract properties, Handel, and the gate level) in one model is one of the outstanding features of our approach. The delicate mathematical model is able to cope with the complex form of parallelism used in Handel and with the detailed treatment of the relation between parallelism and timing. An immediate benefit of this approach is that Handel is a language already in use by hardware designers for specification; a further benefit is the potential for easy integration of the proposed transformations into existing hardware compilation systems.
Introduction
The current availability of specification formats with higher levels of abstraction begins to blur the difference between the engineering practices for the design of software and hardware. However, for software implementations the semantic gaps between abstract specification code and its actual implementation are considerably smaller than for hardware implementations.

The author has worked on the OMI/HORN ESPRIT project and is now an independent consultant.
Authors' e-mail: {schenke,mdossis}@comlab.ox.ac.uk
Nevertheless, the potential improvements in performance and overall system efficiency that can be gained by exploiting the inherent parallelism of digital hardware, across the whole range from coarse-grained to very fine-grained levels of the architecture, have been the main driver for hardware implementations.
The eventual hardware model generated by almost all current automatic or semi-automatic synthesis methods is usually given in the form of gate netlists. The process of bridging the enormous abstraction gap between the specification and the resulting direct implementations in hardware netlists is driven in practice by design optimisations. These optimisations are based on a very diverse range of implementation constraints, such as communication protocols, heuristics of the core architecture, limitations of chip area, geometries of the layout, interconnection facilities, granularity of the silicon structure, propagation delays and achievable clock rates. The optimisations introduce a large set of additional semantics into the delivered hardware netlist, and thus they generate different semantic spaces for the specification and the implementation. This does not cause any significant problems when the details of the target architecture are well known in advance and fixed for a large number of applications. It has proven unsuitable as a methodology, however, when formal verification is required, both of the coherence between the specification and the implementation and of the actual transformation taking place during automatic hardware synthesis. The difficulty in applying verification is then due to the semantic inconsistencies caused by the heuristic optimisation models. Moreover, the poor semantic overlap between the two representations often deteriorates further with the large number of design tools that allow design engineers to intervene manually and to refine empirically the intermediate representations used within the design flow.
An Occam-based notation has been used by the Oxford research group for hardware specification and synthesis. This language is called Handel. A set of derivatives of its hardware compiler has been developed and used, and is currently being extended, by researchers and industrial engineers who collaborate with the hardware compilation group in Oxford.
The bulk of the research work with Handel addresses the compilation process from Occam-like programs into hardware netlists, which are normally prototyped on FPGA devices. FPGAs (Field-Programmable Gate Arrays) utilise a systolic-type array of configurable logic blocks (CLBs) and a matrix of erasable interconnect modes determining the connections between them. This, in conjunction with the one-to-one translation of the spatial and temporal attributes of Handel programs into the hardware architecture, gives the Handel designer total control of the intended architecture.
In addition, the distributed mechanism used for the synchronous hardware designs generated by the Handel compiler, combined with the systolic-type architecture of the targeted devices, significantly improves the scalability of Handel designs. Furthermore, the compilation approach is based on a reduction of the Occam programs used for specification into their normal form. In combination with the transformational algebra CSP [8], which supports the Handel paradigm, this makes the synthesis approach fertile for further exploitation and enhancement using formal techniques such as the one presented in this paper. Another novelty of our approach is the denotational treatment of a parallel operator with synchronous and asynchronous communication in a many-read / many-write fashion. Some implementation examples, kept reasonably small for the purpose of this article but related to real hardware implementations, are also included.

The current approach achieves the necessary degree of delicacy in the mathematical analysis of the Handel language by taking account of the following observations:

1.) It is very difficult to find a denotational semantics for a language with a parallelism as complex as that of Handel, which combines OCCAM-like point-to-point communication with communication via shared variables.
2.) If timing nondeterminism is retained, and decisions for a parallel or sequential implementation of the specifications are taken at a later stage during the automatic design synthesis transformations, then the interdependencies between the timing of parallel and sequential composition inevitably become very complex.
3.) Any approach in which the semantics of all abstraction levels is expressed in a uniform model leads inevitably to a certain degree of complexity in the description of the operators.
An approach similar to the one proposed here has been taken in the ProCoS project [5], where hardware verification was only one issue among many others. However, the formal analysis of Handel transformations presented in this article was not an issue in that project. Among the very few transformation methods based on real-time refinement, the ones in [13, 3] attracted a considerable amount of interest. However, the present article is among the first in which real-time refinement is used for hardware verification in a far-reaching approach from the very abstract requirement language level (DC) down to hardware implementation. Previous work using transformation algebra was pursued by He Jifeng et al., e.g. [7]. The broader framework of our work is given by [6, 12]. As a complementary approach one might regard the self-timed circuit implementation techniques in [1, 10, 11]. An approach which also takes several levels of abstraction into account and is based on a modal (in that case intuitionistic) logic is presented in [2]. Its authors are, however, not interested in transformational construction, and their modal logic serves the purpose of abstracting from potentially hazardous behaviour of the lower levels.
The Model
Our models will be based on timing diagrams. Intuitively these are functions which attribute to each point $t$ of time the state the system is in at $t$. They are used to give a semantic foundation to the DC, to Handel and to hardware. In this article we use a discrete version. The semantics of a Handel process $P$ is given as a transformation on sets of timing diagrams. This means that we describe how $P$ affects a system which has already evolved according to some timing diagram. For reasons which will be explained later, our semantics is called Quickstep-Semantics. We have to introduce some general notation:
A.) In the presentation of the semantics we assume a fixed set of channels $Chan$ and a disjoint set of program variables $Var$. Additionally there are three distinguished variables $W, R, C$ which are not contained in $Chan \cup Var$. Intuitively, $W$ and $R$ describe the sets of all variables which are written to or read from at some point of time, and $C$ provides information about the readiness of the channels for communication. In this we remain close to the tradition of CSP [8], the scientific predecessor of occam, which requires that channel communications are dealt with by means of so-called refusal or ready sets; $C$ will, however, contain slightly more information than just a ready set.
Then we introduce the following notations and general assumptions: The functional composition $f \circ g$ is defined by $(f \circ g)(x) \stackrel{\text{def}}{=} f(g(x))$. The repeated application of composition $f^n$ is defined by $f^n(x) \stackrel{\text{def}}{=} f(\ldots f(x))$, where $f$ has been applied exactly $n$ times.
C.) The following set of notations is connected with timing diagrams, on which our model is built:

The set of Handel processes $P$ is generated by the following grammar:

$$P ::= SKIP \mid STOP \mid DELAY \mid ch?v \mid ch!exp \mid x_1, \ldots, x_n := exp_1, \ldots, exp_n \mid SEQ[P, P] \mid IF(cond, P_1, P_2) \mid WHILE(cond, P) \mid PAR[P, P]$$
In the communications, $exp$ is the expression which denotes the transmitted value, and $v$ is the variable in which the transmitted value is stored. We allow multiple assignment, but we require that the $x_i$ are pairwise distinct. Furthermore we allow an extension of the SEQ and PAR operators to more than two operands. This is possible because the associativity of these operators will be among our algebraic laws.
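As an illustration only, the grammar above can be mirrored by a small abstract syntax tree. The following Python sketch is not part of the Handel tool chain: the class names, the representation of expressions and conditions as plain strings, and the helper `flatten_seq` are all assumptions made for this example. It shows the pairwise-distinctness requirement on multiple assignment and how associativity lets nested SEQ terms be read as one n-ary SEQ.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Stop:
    pass


@dataclass(frozen=True)
class Delay:
    pass


@dataclass(frozen=True)
class Input:          # ch?v
    ch: str
    var: str


@dataclass(frozen=True)
class Output:         # ch!exp
    ch: str
    exp: str


@dataclass(frozen=True)
class Assign:         # x_1, ..., x_n := exp_1, ..., exp_n
    targets: Tuple[str, ...]
    exps: Tuple[str, ...]

    def __post_init__(self):
        if len(self.targets) != len(self.exps):
            raise ValueError("need exactly one expression per target")
        if len(set(self.targets)) != len(self.targets):
            raise ValueError("assignment targets must be pairwise distinct")


@dataclass(frozen=True)
class Seq:            # SEQ[P, ..., P]
    procs: Tuple["Process", ...]


@dataclass(frozen=True)
class Par:            # PAR[P, ..., P]
    procs: Tuple["Process", ...]


@dataclass(frozen=True)
class If:             # IF(cond, P_1, P_2)
    cond: str
    then: "Process"
    orelse: "Process"


@dataclass(frozen=True)
class While:          # WHILE(cond, P)
    cond: str
    body: "Process"


Process = Union[Skip, Stop, Delay, Input, Output, Assign, Seq, Par, If, While]


def flatten_seq(p: Process) -> Process:
    """Rewrite directly nested SEQ terms into one n-ary SEQ (hypothetical helper).

    Only SEQ directly inside SEQ is flattened; IF, WHILE and PAR bodies are
    left untouched to keep the sketch short.
    """
    if not isinstance(p, Seq):
        return p
    flat = []
    for q in p.procs:
        q = flatten_seq(q)
        flat.extend(q.procs if isinstance(q, Seq) else [q])
    return Seq(tuple(flat))


nested = Seq((Skip(), Seq((Delay(), Stop()))))
flat = flatten_seq(nested)
```

Rejecting `x, x := 1, 2` at construction time keeps ill-formed assignments out of any later transformation step.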
Parallel processes: A parallel process $P_\parallel = PAR[P_1, \ldots, P_n]$ consists of $n > 0$ processes, where each subcomponent $P_i$ has its own interface. We distinguish two sorts of channels:

Definition 5.1 Let a set of $n$ interfaces be given, one for each subcomponent. From now on we assume that all channels appearing in the interfaces of the sequential components are internal or external. A parallel process can be observed at its external channels and its variables. We shall assume the existence of external channels only in our inductive definitions; compilable Handel programs do not have them.
Intermediate terms: In order to cope with the problems of a proper implementation of the timing, we shall need some additional syntactic constructs whose behaviour resembles that of the constructs already known but does not allow timing nondeterminism. The reason for introducing intermediate terms will be made clear in the section on the transformation of Handel programs. The present section is only meant to exhibit the necessary syntax.
The set of intermediate terms is generated by the following grammar:
$$P ::= op \mid ch??v \mid ch!!e \mid x_1, \ldots, x_n ::= exp_1, \ldots, exp_n \mid if(cond, P_1, P_2) \mid while(cond, P)$$

where $op$ is an arbitrary Handel operator, and the newly introduced operators are called slow operators.

The Semantics of Handel
Operators on Timing Diagrams
In order to describe the semantics of a language which, like Handel, has an elaborate parallel operator and includes timing, it is not surprising that we need some auxiliary operators. It is rather more surprising that we need so few. The semantics will be given in a continuation style, saying how a process acts on a history. Therefore infinite timing diagrams will not be changed by the Handel operators. Since the slow operators appear only in mixed terms, whereas the quick ones, which are a part of Handel, can change the past, it is justified to call the latter "quick". This has led to the name "Quickstep-Semantics" for our semantics. Some of the definitions are illustrated by examples in Subsection 6.3. For parallel composition we relinquish the occam view that parallelism is an intersection of the behaviours of the parallel components. Instead we construct an operator on functions in the at-format:
We assume that, once an output and an input on a channel $ch$ are possible at some time $n$, the communication must take place as soon as possible, namely at time $n + 1$. This is indicated by the communication markers $ch?, ch!$ in $W$. Additionally the update on $v$ must be performed. Note that hiding is implicitly contained in this definition, because a parallel process can be observed at its external channels and its variables. If two parallel processes fit together, their common channels are no longer external. This means that they are hidden.
For the loop operators we shall give a definition by syntactical approximation and later relate it to a fixed point. As usual we first define the totally nondeterministic process CHAOS
Examples
The following rule gives an example of why an update as early as possible is useful:

Example 6.5 For variables $x, y, z$ we might want an implementation

Basically, assignment takes one time unit. This is guaranteed by the fact that a variable $v$ which has been updated at $n$ is put into $W$ at $n$ and so, owing to the function $time$, cannot be affected twice at time $n$. This also explains the mathematical necessity of the condition $td(W, 0) = Var(\ )$.
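The role of $W$ in making assignment take one time unit can be sketched in a few lines of Python. This is a deliberately simplified toy model and not the Quickstep-Semantics itself: a timing diagram is assumed to be a list of states (so a diagram of length 0 is a list with one state, for time 0), only the latest time point is considered, and the read set $R$ is ignored. The function names `extend` and `assign` are invented for this illustration.

```python
import copy


def extend(td):
    """Append one time point that copies the last values; nothing is written yet."""
    new_state = {k: v for k, v in td[-1].items() if k != "W"}
    new_state["W"] = set()
    return td + [new_state]


def assign(td, var, value):
    """Write `var` at the latest time point, extending the diagram first if
    `var` has already been written there (i.e. `var` is in W at that time)."""
    td = copy.deepcopy(td)          # treat diagrams as immutable values
    if var in td[-1]["W"]:
        td = extend(td)             # the write has to move to the next time point
    td[-1][var] = value
    td[-1]["W"].add(var)
    return td


# Initial diagram of length 0 in which x and y count as written at time 0.
td0 = [{"x": "x0", "y": "y0", "W": {"x", "y"}}]
td1 = assign(td0, "x", 5)   # x is in W at time 0, so the write lands at time 1
td2 = assign(td1, "x", 7)   # x is in W at time 1, so the write lands at time 2
```

In this toy model a second write to the same variable can never share a time point with the first, which is the property the set $W$ is there to enforce.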
Example 6.6 Assume we have a timing diagram $td$ of length 0 with $td(x, 0) = x_0$, $td(y, 0) = y_0$, $td(W, 0) = \{x, y\}$. Because $x \in td(W, 0)$, any assignment to $x$, whether slow or quick, can only work by first extending the diagram and then changing $td(x, 1)$ (and $td(W, 1)$).

Even after further extensions $td(x, 1)$ cannot be changed again, because $x \in td(W, 1)$. A new assignment, say $x := y$, could happen only at 2. In this case $y \in td(R, 1)$, and no assignment to $y$ could happen before 2, though reads are possible.

and again three infinite ones, $at'_{i\infty}(td)$. Now we calculate the parallel composition. The first thing we notice is that $at_{mij} \parallel at'_{kl}$ with $j \neq l$ contributes nothing, for the following reason: we have $ch? \in at_{mij}(td)(W, j)$ and $ch! \in at'_{kl}(td)(W, l)$. For $j \neq l$ this contradicts the first line of the definition of the set $SYN$. Then the second line of this definition in $at_{mij} \parallel at'_{kj}$ is fulfilled at time $\max(i, k)$. By the third line of the definition of $SYN$ we then have $j = \max(i, k) + 1$, and $v$ is updated at $j$. Since $i, k \in \{0, 1, 2\}$ we get $j \in \{1, 2, 3\}$ as possible times for the update.
So, modulo $C$, this is indeed equivalent to $v := w$. $\Box$
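The arithmetic at the end of this example is easy to check mechanically. The short Python sketch below enumerates the times at which the input side and the output side become ready and applies the rule $j = \max(i, k) + 1$; the function name `sync_times` is an assumption of this illustration and does not model the set $SYN$ itself.

```python
from itertools import product


def sync_times(input_ready, output_ready):
    """Possible communication times when the input becomes ready at some i,
    the output at some k, and communication happens one step after both
    sides are ready, i.e. at max(i, k) + 1."""
    return {max(i, k) + 1 for i, k in product(input_ready, output_ready)}


# Both sides may become ready at time 0, 1 or 2, as in the example.
possible_updates = sync_times(range(3), range(3))
```

With both ranges equal to {0, 1, 2}, `possible_updates` is the set {1, 2, 3}, in agreement with the times derived in the text.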
There is a well-known technique of code optimisation which also deals with the question of how an assignment can change the past. The idea is the following:

Order the assignments in the program by dependence. This is a partial order. Consider the program as if it were untimed, thus obtaining all sequences of assignments which are allowed by the dependence relation, and finally introduce timing by sequentialising or parallelising the assignments. Apart from the fact that this technique causes problems with loops, in our setting it would give rise to unwanted rules, which is what the final example shows.
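For readers unfamiliar with the code-optimisation technique being contrasted here, the following Python sketch shows its first two steps on a toy program. It illustrates the conventional technique, not the approach of this article. The representation of an assignment as a pair (target, set of variables read) and the use of the usual flow, anti and output dependences are assumptions of this example.

```python
from itertools import permutations


def depends(a, b):
    """True if b, which comes later in program order, must stay after a."""
    (written_a, read_a), (written_b, read_b) = a, b
    return (
        written_a in read_b        # flow dependence: b reads what a writes
        or written_b in read_a     # anti dependence: b overwrites what a reads
        or written_a == written_b  # output dependence: both write the same variable
    )


def linearisations(program):
    """All orderings of the assignments that respect the dependence order."""
    n = len(program)
    must_precede = {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if depends(program[i], program[j])
    }
    allowed = []
    for order in permutations(range(n)):
        position = {index: pos for pos, index in enumerate(order)}
        if all(position[i] < position[j] for i, j in must_precede):
            allowed.append(order)
    return allowed


# Toy program:  x := a;  y := b;  z := x
program = [("x", {"a"}), ("y", {"b"}), ("z", {"x"})]
orders = linearisations(program)
```

Here only `z := x` depends on `x := a`, so three of the six orderings survive; timing would then be introduced by choosing which of the independent assignments run in parallel.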
Implementation Relation
In order to compare DC formulae and Handel programs, we are facing three problems:
1.) For us DC formulae are timing diagrams, whereas Handel programs are operators on timing diagrams.
2.) The arbitrarily given initial values at time 0 must somehow be taken into account. Remember that assignment takes time, so each initialisation on the program level can become effective no earlier than at time 1.
3.) Whereas DC formulae are interpreted only over infinite timing diagrams, Handel programs mostly deal with finite ones.
The first two problems are overcome in the same way: we shall apply programs to diagrams of length 0. These reflect exactly the implicit initial values at time 0. To a DC formula an additional conjunct must be added which makes the initial values explicit. A diagram $td$ of length 0 over variables $var_i$ with values $val_i$ can be conceived as a state assertion

$$\lceil td \rceil \stackrel{\text{def}}{=} \lceil \forall i : var_i = val_i \rceil$$

Then the DC formula $\lceil td \rceil ; true$ says that the observation must start in a state in which $\lceil td \rceil$ holds.
The last problem is fixed by extending finite timing diagrams to infinite ones using the STOP operator.
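One simple way to picture this extension is sketched below in Python. The sketch rests on an assumption that is not spelled out at this point of the text, namely that after STOP the values of all program variables remain unchanged for ever. The list-of-states representation and the name `extend_with_stop` are likewise choices made for this illustration.

```python
from itertools import chain, islice, repeat


def extend_with_stop(finite_td):
    """Lazily extend a finite timing diagram to an infinite one by repeating
    its final state (assumed reading of STOP: nothing changes any more)."""
    if not finite_td:
        raise ValueError("a timing diagram needs at least the state at time 0")
    return chain(finite_td, repeat(finite_td[-1]))


finite = [{"x": 0}, {"x": 1}]
# Only a finite prefix of the infinite diagram can ever be inspected.
prefix = list(islice(extend_with_stop(finite), 5))
```

Because the result is a lazy iterator, a DC formula could in principle be checked against any finite prefix without ever building the infinite diagram.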
So for any Handel process $P$ and for any DC formula $F$ we make the definition
Hardware description in the Model
All hardware devices we need are latches and gates. They are given by DC formulae in the following way:
Latches:
A latch with input $x$, output $y$ and latch condition $cond$ conforms to the following formula:

In the second implementation law we can give two conditions, each of which alone is sufficient for equality:
1.) All operators in P are slow. 2.) P does not write on the variables of cond.
In both cases we prevent $P$ from changing the variables of $cond$ before they can be evaluated. Finally, we can do away with all quick operators by the law that each slow operator implements the corresponding quick one.
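To give some intuition for the latches introduced earlier in this section, here is a minimal discrete-time sketch in Python. It is one plausible reading, not the DC formula itself: the assumption is that if $cond$ holds at time $n$, the output $y$ takes the value of $x$ at time $n + 1$, and otherwise $y$ keeps its value. The function name `latch` and the list-based signal representation are choices made for this example.

```python
def latch(xs, conds, y0):
    """Simulate a latch over discrete time.

    xs[n] and conds[n] are the input and the latch condition at time n,
    y0 is the output at time 0.  The returned list holds the output at
    times 0 .. len(xs); a change at time n + 1 requires conds[n] to hold.
    """
    ys = [y0]
    for x, cond in zip(xs, conds):
        ys.append(x if cond else ys[-1])
    return ys


# The condition is raised only at time 1, so only the input value 6 is stored.
outputs = latch(xs=[5, 6, 7], conds=[0, 1, 0], y0=0)
```

Under this reading a change of the output at time $n$ can only be caused by the latch condition holding at time $n - 1$.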
Distributed Control Circuits
In this section we shall describe how an intermediate term which does not contain any quick operators is directly implemented in hardware. For such a process $P$ we need a data-multiplexing part $DM(P)$, a channel arbitration part $CA(P)$ and a control part $CTRL(P)$. Then the implementation of $P$ is given by the formula

$$IMPL(P) = DM(P) \wedge CA(P) \wedge CTRL(P)$$
Before explaining the details of these formulae, we make some assumptions:
1.) For each expression $exp$ appearing in some assignment or output, and for each condition $cond$ appearing in some control structure, there is some circuitry (gates, adders etc.) which implements $exp$ or $cond$ and is described by some term $impl(exp)$ or $impl(cond)$.
2.) For each channel $ch$ there is only one variable $v_{ch}$ into which an input can be written. So all inputs on $ch$ are of the form $ch??v_{ch}$. Furthermore such a $v_{ch}$ is not written to in an assignment.

These assumptions are not essential. However, they make this part more readable. The channel arbitration part will also be given as a conjunction over the channels, $CA(P) = \bigwedge \ldots$, where the conjunction ranges over all channels and all output expressions for these channels.
The implementation of the control part $CTRL(P)$ of a process $P$ is done inductively by means of two DC variables $s_P, f_P$, which indicate the start and the final control state. The following formulae will give 1.) the connections between the start and the final control states of the subprocesses of $P$, given by $Connect_{SF}(P)$, and 2.) the connections between the control states of the subprocesses and, respectively, the control wires of the DM part and the not yet defined wires of the CA part, given by $Connect_{CTRL-DM}(P)$ and $Connect_{CTRL-CA}(P)$. Basically our control strategy is the 'one-hot' encoding.
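The idea of one-hot control can be illustrated by a small Python simulation. The wiring used here is an assumption made for the sake of the example and is not taken from the formulae $Connect_{SF}(P)$: for $SEQ[P_1, \ldots, P_k]$ with one-cycle components, the start signal of $P_1$ is the start signal of the whole process, the start signal of $P_{i+1}$ is the finish signal of $P_i$, and each component finishes one cycle after it starts.

```python
def simulate_seq_control(k, cycles):
    """Trace the control token through SEQ[P_1, ..., P_k].

    Positions 0 .. k-1 of a state hold the start signals s_{P_1} .. s_{P_k};
    position k holds the finish signal of the whole SEQ.  The start pulse is
    assumed to be high for exactly one cycle.
    """
    state = [1] + [0] * k
    trace = [state]
    for _ in range(cycles):
        # Every flip-flop hands its token to its successor on each clock tick.
        state = [0] + state[:-1]
        trace.append(state)
    return trace


trace = simulate_seq_control(k=3, cycles=3)
```

At every time point exactly one control wire is high, which is what makes the encoding 'one-hot'; the token reaches the finish position after as many cycles as there are components.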
Implementation by Distributed Control Circuits
At the Handel level we only consider the data variables. The newly introduced control variables in the DC formulae describing the hardware implementation are not visible on the upper levels. So they will be hidden as usual by existential quantification.

$ch?v \in td'(C, n) \Leftrightarrow P_I(n) = 1$ for $P_I(n) = rdy_{ch?v}(n)$, and analogously for $ch!e$. Remember that $rdy_{ch?v}$ is a control variable in $td$, so this is well defined. Now we construct $td'$ by means of a sequence of $td'_n$ for each $n \in \mathbb{N}_0$, starting with $td'_0 = td_{init}$. Assume $td'_{n-1}$ has been constructed and $td(v)$ changes at $n$. By definition of $DM(v)$ this can only be caused by $Latch(mux_v, v, change_v)$, hence $change_v$ and thus one of the $ctrl(exp_i)$ must be 1 at $n - 1$. By definition of $ctrl(exp_i)$ this can be due either to an assignment $v ::= exp_i$ carried out at $n$ or to a channel communication at that time. Now we collect all changes of $td$ at time $n$ and describe them in the at-notation: $at[n / v, \ldots := exp, \ldots]$. In the case of a channel communication we include the markers $ch?, ch!$ in $W$. Which channel has to be taken is easily determined by $CA(R)$, the channel arbitration part. Then

$$td'_n = at[n / v, \ldots := exp_i, \ldots] \; del(td'_{n-1})$$

By this procedure we automatically define the values for $W, R$. Then $td'_n$ has two properties:
1.) Up to time $n$ the values of $td'_n$ and $td$ coincide on program variables.
2.) Using lemmata 6.1 and 6.2 we can assume $td'_n$ to be given as the application of a mapping in the at-format to $del^n(td_{init})$.

Then define $td'$ to be the timing diagram of infinite length with $td'(n) = td'_n(n)$.
In order to show $td'$

In this article we have described a formal transformation framework for provably correct hardware compilation using Handel. The outstanding feature of this method is the precise mathematical treatment of a complex parallel operator and its interdependencies with sequential composition with respect to time.
Our method allows abstract requirements to be proven in a logical formalism, the DC, on the basis of a uniform model. Such a uniform model is also a novelty. The introduction of the quick operators provides a powerful mechanism for exploiting operation re-scheduling to deliver optimal implementations. The model is sufficiently general to demonstrate real applications in the near future, and the clear and strict definition of the operators which specify and construct the timing diagrams, which in turn constitute the transformational basis, ensures the robustness of our model and supports its further refinement. Work is ongoing on a number of real-world applications, such as communication protocols and large coding blocks, formally designed using our framework. Results from these activities will be the subject of future publications.
