We present a denotational semantics for the hardware compilation language Handel-C that maps language constructs to a set of equations, which describe the structure of the resulting hardware. This semantics is then shown to be useful for validating various algebraic laws which should hold for Handel-C programs, as well as exposing a key principle which governs how such hardware should be operated.
Introduction
This paper describes a semantics for Handel-C which gives a program its meaning as a collection of equations describing a possible (very naive) hardware implementation, hence the term "Hardware Compiler" semantics in the title. This semantics, which sounds very operational in nature, turns out in fact to have a strong denotational character, albeit in an unconventional sense.
Handel-C [3] is a language originally developed by the Hardware Compilation Group at Oxford University Computing Laboratory, and now marketed by Celoxica Ltd. It is a hybrid of CSP [5] and C, designed to target hardware implementations, specifically field-programmable gate arrays (FPGAs) [7]. The language has sequential and parallel constructs, global variable assignment and channel communication, and it targets synchronous hardware with multiple clock domains. All assignments and channel communication events take one clock cycle. All expression and conditional evaluations, as well as priority resolution, are deemed to be instantaneous, effectively being completed before the current clock cycle ends.
As the Handel-C language targets hardware, it is ideal for implementing embedded systems, often in situations where high levels of assurance would be desirable [4]. There is a clear need for both a formal semantics of Handel-C (or a reasonable subset) and an appropriate methodology with tool support. The research described here is part of a programme to provide just such an industrial-strength formal framework.
Syntax
We introduce here the "mathematical" syntax of a stripped-down version of Handel-C, which, albeit simpler, has all the essential features of the synchronous core of the full language.
We have identifiers for channels (c ∈ Ch) and variables (x ∈ Var), and we assume the existence of an expression syntax (e ∈ Exp) whose details need not concern us here. We consider all the above as having either boolean or integer type. We also have the notion of guards (g ∈ Grd), which denote the offering and accepting of communication actions. Guards denote either expression output along a channel (c!e), variable input via a channel (c?x), or a skip guard which always succeeds (!?).
The syntax of a process is as follows:
P, Q ::= Skip | Delay | x := e | P ; Q | P || Q | P ◁ e ▷ Q | e ∗ P | ⟨g_i → P_i⟩
We use notation like g_i : p_i as shorthand for g_1 : p_1, …, g_n : p_n, where i is assumed to index over 1…n for appropriate n. In the last construct, if the !? guard appears it must appear only once, as the last guard. We can briefly summarise the behaviour of a Handel-C process as follows: Skip does nothing, in zero time; Delay does nothing, but takes one clock cycle to do it; x := e assigns the value of e to x, taking one clock cycle; P ; Q first executes P, and once it has terminated immediately starts Q; P || Q runs both P and Q in lock-step parallel, terminating when they have both finished; P ◁ e ▷ Q evaluates e : B and executes P immediately if e is True, otherwise it runs Q; and e ∗ P tests e : B and, if True, runs P and then repeats, otherwise it terminates.
The ⟨g_i → P_i⟩ construct ("prialt") is an ordered sequence of guard-process pairs. Each guard is checked against the process environment to see if it is able to execute.
If no guards are so enabled, then the prialt blocks until the next clock cycle, when it tries again. If one or more guards are enabled, then the first such in the list is executed, and the corresponding process is executed subsequently. An input guard (c?x) is enabled if there is a corresponding output guard (c!e) in some other prialt executing at the same time, and vice versa. The skip guard (!?) is always enabled. The input (c?x) and output (c!e) guards perform their actions taking one clock cycle, while the skip guard (!?) acts like Skip, so the subsequent process starts execution immediately. It is this "instant" execution of !? guards that so complicates the formal semantics of Handel-C [2].
To see the problem, consider the following process:
In order to establish the outcome here (using the operational semantics of [2], for example) we proceed as shown in Figure 1, where →_n indicates a transition sequence annotated with the number n of clock cycles elapsing; [cond] denotes a side-condition; and [[effect]] denotes some change to internal state.
Details of how requests are "lodged" and "resolved" can be found in [1]. Prialts nested inside default clauses of other prialts may become active in the same clock cycle as the enclosing prialts, which requires us to iterate this request-resolve loop several times in any given clock cycle. Managing this micro-cycle activity severely complicates the operational semantics. However, the underlying hardware doesn't iterate: it computes what is to be active in any given clock cycle using combinatorial logic. The "Hardware Compilation" semantics described here was initially developed to see if such a semantics would give some insight into a simpler, less "micro"-iterative operational semantics. In other words, can we find a way to compute the outcome in one (functional) step?
Hardware Compilation
The key concept behind the hardware semantics is to recognise that the resulting hardware simply consists of a fixed bank of registers connected by fixed combinatorial logic, in effect a large (finite-)state machine. On each clock cycle, new values for the register state are computed as a function of the current values. Program execution simply repeats this fixed calculation on every clock cycle. The hardware semantics of a Handel-C program is therefore simply a fixed function f : State → State, where State denotes the contents of all the registers. The contribution of this work is to describe how f is determined from the Handel-C language constructs in a compositional manner. We use equations that model the behaviour of the hardware to describe how f is computed.
The main features of the hardware that need to be modelled are:
• Registers, loaded on a clock edge, used to store variable values and control tokens that manage control flow.
• Multiplexers, used to route expression results to registers and channels (wires), and channel data to registers.
• Program statement hardware has two key signals: start : B, an input, which starts execution of the statement; and done : B, an output, which indicates its termination.
• A control token is a register whose input is a done signal, and whose output is fed to one or more start signals.
We shall express all these components using a set of equations which distinguish between combinatorial (pure functions) and sequential (stateful) hardware. An equation simply equates a variable on its lefthand-side with either a combinatorial or sequential expression on its righthand-side:
We differentiate between combinatorial and sequential expressions by using parentheses for the former (z = f (x, y)) and square brackets for the latter (w = g[x, y]). The overall system is described as a set of such equations, Sys = ℙ Eqn, which we expect to contain no "combinatorial cycles": any circular chain of dependencies must include a sequential equation. Generally we list the equations either one to a line, or several on one line separated by semi-colons.
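As an aside, the combinatorial/sequential distinction can be captured in a small functional meta-model. The following is a minimal Haskell sketch; the names Eqn, Comb and Seq are ours, not the paper's:

```haskell
-- A minimal sketch of the equation notation (our names, not the paper's).
type Wire = String

data Expr
  = Var Wire          -- a reference to another wire
  | Lit Bool          -- a constant
  | And Expr Expr     -- ... the usual combinatorial building blocks
  | Or  Expr Expr
  | Not Expr

data Rhs
  = Comb Expr         -- z = f (x, y) : a pure function of current values
  | Seq  Expr         -- w = g [x, y] : a value involving clocked storage

type Eqn = (Wire, Rhs)
type Sys = [Eqn]      -- the paper's Sys = ℙ Eqn, a collection of equations
```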
The combinatorial building blocks provided are the usual functions over B and Z, such as ∧, ∨, ¬, +, −, /, etc., plus multiplexers (with non-standard controls):

mux_n(c_1 : B, …, c_n : B)(x_1, …, x_n)
The output is x_i when exactly one control c_i is true; if no c_i, or more than one, is true, then the output is undefined. The latter case corresponds to more than one process trying to update a variable in a given clock cycle. These multiplexers are required because a single process variable x may participate in many assignment statements, only one of which should be active during any clock cycle.
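The intended behaviour can be sketched as follows, with the undefined cases modelled by Nothing (a minimal sketch; the encoding is ours):

```haskell
-- Sketch of mux_n: route the data input whose control alone is true.
-- Zero or several asserted controls yield an undefined output (Nothing).
muxN :: [Bool] -> [a] -> Maybe a
muxN cs xs =
  case [ x | (c, x) <- zip cs xs, c ] of
    [x] -> Just x   -- exactly one control asserted: route its data input
    _   -> Nothing  -- none or more than one: output undefined
```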
For example:

… ; x := 0 ; … ; x := x + 1 ; … ; x := y − 2 ∗ z ; … ; x := −1 ; …

A multiplexer connects the four pieces of hardware implementing the expressions 0, x + 1, y − 2 ∗ z and −1 to the input of register x. The start signals for the four assignment statements control the multiplexer, determining which expression is routed through to the output. The logical-or of these start signals enables the loading of the register. All register updates occur on the appropriate edge of the global clock, which is implicit in this semantic model. We use three sequential building blocks:
• Register Block: reg[load : B, in]. When load is true, in is stored at the clock edge.
• Wait Block (Control Token): wait[fini : B] : B. The value of fini is stored and appears on the output after the clock edge.
• Synchronisation Block: sync_n[done_1 : B, …, done_n : B] : B. sync_n's output is initially false. It waits, over many clock cycles if necessary, for all n done_i signals to go true. Then its output goes true immediately, and reverts to false at the next clock edge.
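To make the timing concrete, each block can be sketched as a step function taking (stored state, inputs) to (state after the clock edge, output this cycle); this reading is ours, for illustration only:

```haskell
-- One clock cycle of each sequential block, sketched as
-- (state, inputs) -> (state after the edge, output this cycle).

-- Register: when load is true, in is stored at the clock edge;
-- the output is the currently stored value.
regStep :: a -> (Bool, a) -> (a, a)
regStep s (load, inp) = (if load then inp else s, s)

-- Wait block (control token): fini is stored and appears on the
-- output after the clock edge, i.e. a one-cycle delay.
waitStep :: Bool -> Bool -> (Bool, Bool)
waitStep s fini = (fini, s)

-- Synchronisation block: remember which done_i have been seen so far;
-- the output goes true immediately once all have arrived, and the
-- memory (hence the output) reverts to false at the next clock edge.
syncStep :: [Bool] -> [Bool] -> ([Bool], Bool)
syncStep seen dones = (if fire then map (const False) seen' else seen', fire)
  where
    seen' = zipWith (||) seen dones
    fire  = and seen'
```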
In order to generate hardware equations we need to generate hardware equation variables (not to be confused with process variables), which is achieved by giving every process statement (atomic and compound) a unique label. So, for example, the conditional statement p ◁ e ▷ q might be labelled as ℓ::(m::p ◁ e ▷ n::q), where ℓ, m and n are unique labels, with ℓ labelling the entire conditional construct, while m and n label the true and false branches respectively.
The trick now is to come up with a way of generating the hardware equations for a process in a compositional manner. Initially this seems impossible, simply because some of the hardware generated seems to require global knowledge of the whole process for which hardware is being produced. For example, the multiplexers that feed results into variables need to have one data and one control input for every use of the variable in the entire (top-level) process! This seems to militate against a compositional semantics.
However, there is a technical trick we can employ to make our semantics compositional: we generate partial hardware descriptions, and use an equation join operator (⊕ : ℙ Eqn × ℙ Eqn → ℙ Eqn) to collect equations together, with appropriate merging of partial hardware elements into more complete ones. This technique for merging partial hardware descriptions is required in three cases:
• Register multiplexers:
We generate a "singleton" multiplexer: in.x = mux_1(start_m)(x + 1), or an empty one: c = mux_0()() (for input channels). We merge them using mux_m(cs)(es) ⊕ mux_n(cs′)(es′) = mux_{m+n}(cs, cs′)(es, es′), i.e. by appending the control and data input lists.
• Distributed-Or (used for register-load/channel-data controls): we generate either a singleton distributed-or, v = {c}, or an empty one, v = {} (for some channel cases).
Fig. 2. Compiling Atomic Statements
We merge these using (v = S_1) ⊕ (v = S_2) = (v = S_1 ∪ S_2), where the resulting set denotes the logical-or of its elements.
• Registers: we generate a register for every use of a variable x, namely x = reg[load.x, in.x]. Merging such identical equations is idempotent, i.e., they all refer to the same register.
The merging of all other sets of equations simply involves lumping them together as a larger set of separate equations.
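A sketch of how the join might be realised for the multiplexer case, with partial muxes kept as lists of (control, expression) pairs (the representation and names here are ours):

```haskell
import qualified Data.Map as Map

-- Sketch of the equation-join on partial multiplexers: a mux feeding a
-- wire is kept as its list of (control, expression) pairs; joining
-- appends the pairs, so mux_m joined with mux_n yields mux_{m+n}.
-- Equations for distinct wires are simply accumulated side by side.
type PartialMux = [(String, String)]   -- (control wire, expression text)

joinMux :: Map.Map String PartialMux -> Map.Map String PartialMux
        -> Map.Map String PartialMux
joinMux = Map.unionWith (++)

-- e.g. joining {in.x = mux_1(start_m)(x+1)} with {in.x = mux_1(start_n)(0)}
-- gives {in.x = mux_2(start_m, start_n)(x+1, 0)}.
```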
Hardware Compilation Semantics
We are now in a position to give the hardware semantics for all the process types in our language. We introduce a semantic function which maps (labelled) processes into sets of hardware equations:

⟦_⟧ : Proc → ℙ Eqn
The semantics of the atomic statements is given in Figure 2. Skip asserts done the instant it is started. Delay waits for the clock cycle in which it was started to end before asserting that it is done; hence it always takes one clock cycle to execute.
The semantics of assignment x := e is simply to route the current value of e through a multiplexer to the input (in.x ) of register x . The assignment statement's start control is used to route the multiplexer and load the register (via load .x ). We rely on the merge operator as previously described to link up the multiplexers, distributed-ors, and to merge identical register invocations to get the global hardware required.
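As an illustration of how these pieces combine, a plausible equation set for a labelled assignment ℓ::(x := e), under the building blocks above, would be the following (our sketch; Figure 2 is the authoritative version):

in.x = mux_1(start_ℓ)(e) ; load.x = {start_ℓ} ; x = reg[load.x, in.x] ; done_ℓ = wait[start_ℓ]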
The semantics of the standard compound statements is given in Figure 3. The conditional (◁ c ▷) uses c to determine which branch to start; it is done when either branch is.
The loop (∗) looks at its condition: if false, it terminates immediately, otherwise it starts its body. The loop itself starts either on an external request or when its body has just terminated.
Sequential composition (;) starts its first sub-statement immediately, its second the instant the first is done, and it terminates when the second does.
Parallel composition (||) starts all its sub-statements immediately once it is itself started. It is done when all its sub-statements are done, as signalled by sync_2.
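Read as equations, the prose above corresponds to control wiring of roughly the following shape (our reconstruction; Figure 3 is the authoritative version). For ℓ::(p::P ; q::Q) and ℓ::(p::P || q::Q) respectively:

start_p = start_ℓ ; start_q = done_p ; done_ℓ = done_q

start_p = start_ℓ ; start_q = start_ℓ ; done_ℓ = sync_2[done_p, done_q]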
The compilation of the prialt statement is shown in Figure 4. The condition (◁ g_i = !? ▷) is a "compile-time" conditional, which does not translate into hardware. Once started, a prialt gets its first guard to "offer" to communicate. The guard will report whether it is active; if not, each subsequent guard in the sequence is made to "offer" in turn. Once a guard gets an active response, it executes in this cycle, followed by its continuation process in the next (except for the default !? guard, whose continuation starts in the same cycle). The prialt semantics also makes use of a compilation scheme for guards.
The compilation semantics for guards is described in Figure 5. Guards do not have done and start control tokens, but instead use the signals offer and active respectively, to offer to perform their corresponding action, and to be told that their action is to be active.
The skip guard !? is implemented with a piece of wire, since it is always active if it offers (who cares how it complicates the semantics!).
An output guard c!e makes a global output offer on out.c, and becomes active if it sees a global input offer on in.c. If so, it then multiplexes its expression data onto wire c. Here we define in.c as an empty distributed-or, simply as a place-holder. Any assertions here come from input statements. We are exploiting the same merge mechanism used for assignments.
An input guard c?x makes a global input offer on in.c, and becomes active if it sees a global output offer on out.c. If active, it behaves like an assignment x := c, where c is the channel data. It needs the value of c but cannot provide it, so an empty multiplexer is used to complete the semantics and avoid a dangling reference.
Where has the fixed point gone?
A standard feature of denotational semantics is the use of fixed points to reason about recursion and iteration. However, a look at the semantics of c ∗ p shows no sign of a fixed point. Fixed points are used to ensure that the semantics so given is compositional: the semantics of a compound language construct is built up from the semantics of its components. Firstly, note that the merging of the semantics of program fragments is achieved by ⊕, which is defined at the semantic level. Secondly, note that we are defining the behaviour of the program, or indeed any well-formed fragment, by giving its computational behaviour for a single clock cycle. The running program is characterised by the sequence of states generated on successive clock cycles by the repeated use of f on some starting state s_0 : State:

s_0, f (s_0), f (f (s_0)), f (f (f (s_0))), …
This is where the fixed point has gone: the iteration and its fixed-point semantics are effectively lifted to the top level, where they cover the whole program's execution trace. This is why we refer to this semantics as "denotational", albeit in an unconventional manner.
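In a functional reading, this lifted iteration is just the unfolding of f from s_0 (a one-line sketch, with State left abstract):

```haskell
-- Whole-program behaviour: the infinite trace of register states
-- obtained by iterating the single-cycle function f from s0.
--   trace f s0 = [s0, f s0, f (f s0), ...]
trace :: (state -> state) -> state -> [state]
trace = iterate
```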
Laws of Handel-C
We would like to be able to validate a variety of algebraic laws for Handel-C processes, such as:

Skip ; P ≡ P ≡ P ; Skip
P || Q ≡ Q || P
(P ; Q) ; R ≡ P ; (Q ; R)
b ∗ P ≡ (P ; b ∗ P) ◁ b ▷ Skip
We consider two programs as equivalent (≡) if they both make the same variable assignments on each clock cycle. The hardware semantics makes it surprisingly easy to prove some of these laws, in particular the structural ones.
In order to perform the proofs we need to introduce the notion of a process variable denoting an arbitrary process (ℓ::P, say), and to refer to its hardware semantics expansion as done_ℓ = P[start_ℓ].
Here P [. . .] represents all the hardware equations that correspond to the semantics of P . The equation above simply serves to name the start and done signals for that hardware.
Proving Skip a unit for ;
We now consider the following three-way equation on processes:
Skip; P ≡ P ≡ P ; Skip
We introduce labels: ℓ::(s::Skip ; p::P), ℓ::(p::P), and ℓ::(p::P ; t::Skip). How do we reconcile these three? We have a label s in one, but a label t in another, which are not equivalent. The key is to define the concept of a degenerate equation as one which simply equates two variables. We then add the concept of a degenerate label: a label for which every equation in which it appears is degenerate, and whose variables each occur on the righthand-side of at least one such degenerate equation. Careful examination of the equations above finds that labels s and t are degenerate. Label ℓ is not degenerate, because start_ℓ does not appear on any equation righthand-side. We shall simply use appropriate equation substitution to eliminate degenerate labels; for example, if n is degenerate below, then

x_n = y_m ; z_p = f (… x_n …)

can be safely replaced by z_p = f (… y_m …). We can safely do this as it has no effect on the underlying hardware: in effect, degenerate equations simply indicate a situation where wires have multiple names, and a degenerate label is one whose sole use is in the provision of one of these aliases. Removing them makes no difference to the underlying hardware.
If we now strip t and s out, we see that all three sets of equations are identical. □
Proving commutativity of ||
We now consider the issues surrounding a proof of the commutativity of parallel composition: P || Q ≡ Q || P. We label both sides as follows:
ℓ::((p::P) || (q::Q)) ≡ ℓ::((q::Q) || (p::P))
Firstly, for the proof we assume that P and Q have disjoint labels, and that neither uses the labels p or q. We assume that both instances of P have the same labelling internally (and similarly for Q).
We compile the lefthand-side to get ⟦p::P⟧ ⊕ ⟦q::Q⟧ joined with the control equations for ℓ. Considering ⟦P⟧ and ⟦Q⟧, we note that in general these produce not just one equation, but many, and that these may refer to common variables. However, as the ⊕ operator is associative and commutative, we do not need to deal with this explicitly, so we can complete the expansion of the lefthand-side as:

start_p = start_ℓ ; start_q = start_ℓ ; done_p = P[start_p] ; done_q = Q[start_q] ; done_ℓ = sync_2[done_p, done_q]
This works fine, but we need to keep in the back of our minds that P and Q in the semantics also stand for zero or more additional hidden equations. We expand out the righthand-side:

start_q = start_ℓ ; start_p = start_ℓ ; done_q = Q[start_q] ; done_p = P[start_p] ; done_ℓ = sync_2[done_q, done_p]
The ordering of equations is irrelevant, and the only difference between the two forms is the equation for done_ℓ. To complete the proof we require that the sync function be invariant under any re-ordering of its inputs, in particular:

sync_2[x, y] = sync_2[y, x]
If we assume that sync has this property then our proof is complete. In fact, we take this property of sync as part of the specification that sync must satisfy. The proof that sequential composition is associative is similar to that showing that Skip is a unit for composition, and requires eliminating degenerate variables in the same way. The (perhaps unsurprising) result of that proof is the following:

start_p = start_ℓ ; start_q = done_p ; start_r = done_q ; done_ℓ = done_r

which is the obvious way one would define the semantics of the three-way sequential composition construct ℓ::(p::P ; q::Q ; r::R). We have seen that in order to prove some laws we need lemmas regarding properties of the building blocks, such as sync. Proving b ∗ P ≡ (P ; b ∗ P) ◁ b ▷ Skip requires a much more complex result: namely, the Hardware Cloning Lemma.
The Hardware Cloning Lemma states that if we clone a piece of control hardware, and occasionally run the clone instead of the original, then the switch is unobservable. Note that only the control hardware is cloned: a reference to a variable x or channel c denotes the same hardware elements in both the original and the cloned hardware.
Let done_p = P[start_p] denote the hardware generated for program p::P. The cloned hardware needs labels distinct from those of the original, so let done_r = R(P)[start_r] denote the cloned hardware, where R is a relabelling function that maps label p to r, and maps other labels to new values. We can state the Lemma informally: the system containing both done_p = P[start_p] and done_r = R(P)[start_r] is indistinguishable from a single copy of P started with start_p ∨ start_r.
Here start_p is true when we plan to run the original, and start_r is true when we want to run the clone. Starting P with start_p ∨ start_r corresponds to running the original in all cases. We assert that we cannot distinguish these two cases. The proof is a long induction over the abstract syntax structure of processes, which we omit. Why do we need the Lemma to show b ∗ P ≡ (P ; b ∗ P) ◁ b ▷ Skip? Simply because the lefthand-side mentions P once, but the righthand-side mentions it twice. The proof also requires the properties of the building blocks shown in Figure 6. Property Mux is combinatorial and easy to prove, and captures the fact that consistent re-ordering of controls and inputs does not alter the behaviour. Property SyncClone arises because the corresponding case in the proof exposes a key assumption required for the cloning lemma to hold: namely that, for any language construct, once start is asserted, it must remain false on subsequent cycles until done is asserted. In other words, we cannot re-start hardware until it is done. We shall refer to this as the "No-Pipeline principle", which is captured by the side-condition ¬((m ∨ n) ∧ (s ∨ t)). We consider this a nice example of how a formal theoretical analysis of an artifact (Handel-C hardware in this case) exposes a key underlying principle governing how such an artifact should be operated.
These and the other properties require a formal model that captures time in order to be proven. It is to this that we next turn our attention.
Register Transfer Notation
To capture time, we need to be very clear about when signals are latched into registers, something about which the equations are somewhat vague. The equation x = g[y, z] indicates that clocked storage is used by g, but is unclear about precise timings. We shall define a subset of our hardware equation notation, called Register Transfer Notation (RTN), that explicitly defines which lefthand-side variables denote registers. The combinatorial expressions remain unchanged, but the sequential ones must now be "implemented" in terms of a single store primitive store[in], which stores its in value (boolean or integer) at every clock edge. We insist that a sequential statement can only consist of a single use of store, so it must be of the form x = store[data], which we shall simplify with the shorthand x := data. These latter equations are now referred to as storage equations. We show the implementation of the sequential building blocks in terms of RTN in Figure 7. It is worth noting that the wait building block is in fact exactly the same as the store block just introduced.
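As a sketch, the store primitive and the wait block coincide when both are read as step functions (our encoding again):

```haskell
-- RTN's single storage primitive: store latches its input at every
-- clock edge; the output this cycle is the previously stored value.
storeStep :: a -> a -> (a, a)
storeStep s inp = (inp, s)

-- The wait building block is exactly a store:  w := fini
waitStep' :: Bool -> Bool -> (Bool, Bool)
waitStep' = storeStep
```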
We can give a formal semantics to RTN by translating it into state machines. Given combinatorial-cycle-free RTN equations

w_1 = expr_1 ; … ; w_m = expr_m ; v_1 := expr_{m+1} ; … ; v_n := expr_{m+n}

where each expr_i ranges over v_1, …, v_n and x_1, …, x_k, we define the state to be the vector s = (v_1, …, v_n) ∈ S, the output vector to be o = (w_1, …, w_m) ∈ O, and the input vector to be i = (x_1, …, x_k) ∈ I. In effect, the lefthand-sides of the storage equations constitute the state, and outputs are characterised as the lefthand-sides of the other equations. If we want to output a state component directly (v_i say), then we add a (degenerate) equation w_{m+1} = v_i to signal this.
Given the vectors i, s and o, we can then summarise the equations as a single function taking the current state s and input i to the next state and the current output. As an example, consider showing that wait[x] ∨ wait[y] behaves as wait[x ∨ y]: for w_1 = wait[x] ∨ wait[y] we obtain equations s := x ; t := y ; w_1 = s ∨ t and a machine run_1 over state (s, t), while for w_2 = wait[x ∨ y] we obtain equations s_2 := x ∨ y ; w_2 = s_2 and a machine run_2 over state s_2.
We cannot prove that run_1 = run_2, because their states have different types. Instead we prove that the outputs are identical for given inputs and corresponding initial states:

π_2(run_1(is)(x_0, y_0)) = π_2(run_2(is)(x_0 ∨ y_0))
Proof: by induction on the length of is. The base case is straightforward. The inductive step requires the following two lemmas:

run_1((x, y) : is)(x_0, y_0) = (s′, (x_0 ∨ y_0) : os′) where (s′, os′) = run_1(is)(x, y)

run_2((x, y) : is)(x_0 ∨ y_0) = (s′, (x_0 ∨ y_0) : os′) where (s′, os′) = run_2(is)(x ∨ y)
The Inductive Step:

π_2(run_1((x, y) : is)(x_0, y_0))
= " Lemma for run_1 "
π_2((s′, (x_0 ∨ y_0) : os′)) where (s′, os′) = run_1(is)(x, y)
= " defn. π_2 (each way) "
(x_0 ∨ y_0) : os′ where os′ = π_2(run_1(is)(x, y))
= " inductive hypothesis "
(x_0 ∨ y_0) : os′ where os′ = π_2(run_2(is)(x ∨ y))
= " defn. π_2 (each way) "
π_2((s′, (x_0 ∨ y_0) : os′)) where (s′, os′) = run_2(is)(x ∨ y)
= " Lemma for run_2 "
π_2(run_2((x, y) : is)(x_0 ∨ y_0)) □

State machines can be coded up in the UTP framework [6] (see Appendix A), and similar proofs can be performed in that setting.
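The two machines, and the equality just proved, can be rendered executably as follows (a sketch matching the proof: run_1 keeps the pair of stored bits, run_2 their disjunction; agree always returns True by the theorem):

```haskell
-- run_1: state (s, t) stores x and y separately; the output each cycle
-- is s || t, and the new state is the current input pair.
run1 :: [(Bool, Bool)] -> (Bool, Bool) -> ((Bool, Bool), [Bool])
run1 []            st     = (st, [])
run1 ((x, y) : is) (s, t) = (st', (s || t) : os)
  where (st', os) = run1 is (x, y)

-- run_2: state s2 stores x || y; the output each cycle is s2.
run2 :: [(Bool, Bool)] -> Bool -> (Bool, [Bool])
run2 []            s2 = (s2, [])
run2 ((x, y) : is) s2 = (s2', s2 : os)
  where (s2', os) = run2 is (x || y)

-- Outputs agree for corresponding initial states (the proved theorem).
agree :: [(Bool, Bool)] -> (Bool, Bool) -> Bool
agree is (x0, y0) = snd (run1 is (x0, y0)) == snd (run2 is (x0 || y0))
```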
Conclusions
We have presented a formal semantics for Handel-C which is compositional, and which expresses how a process denotes a chunk of hardware described by a set of equations. We have also given examples of how we can use this semantics to prove various laws regarding Handel-C, and we have sketched out a more complex result which requires a specific operating principle (No-Pipeline) to hold in order for the underlying hardware to have the correct behaviour.
We have also built a semantic bridge from Handel-C to UTP, as we can formulate a theory in UTP about state machines (see Appendix A). This results in the first UTP semantics for Handel-C.
Has the Hardware Semantics provided any insight into an "improved" Operational Semantics? This is not immediately clear or obvious, and as the issue has to do with prialts and default clauses, we should consider a relevant example, namely the one referred to earlier in this paper:
We attach labels and compile; after simplification, the process of selecting which guards are active has been effectively "calculated out". It is not clear how this could reflect back into an operational semantics: there are chains of equations linking prialt guards in order, and cross-linking to other prialts, but these are difficult to see, even with a global overview! However, the Hardware Semantics is interesting in its own right, as it exposes clearly how a Handel-C program is really a description of a finite state machine. It also exposed the "No-Pipeline" principle, which suggests experimenting with pipelining language constructs.
We need to complete ongoing work to fully formalise the linkages between the hardware equations, RTN, the state machines and the UTP semantics. It also remains to be seen what is the full range of Handel-C laws that can be verified using this hardware semantics.

The predicate init(s_0) indicates that after a state-machine is initialised, its state is s_0 and the input and output sequences are empty, regardless of prior values. Sequential composition (;) is straight from standard UTP theory [6], and here is given a definition tailored to state-machine observables. The predicate II (Skip) simply describes a situation where nothing happens; it is useful as an identity for sequential composition. The predicate step_{ns,op}(i) describes the effect of stepping a state-machine over one input i. The predicate run_{ns,op}(is) describes the effect of stepping a state-machine over the input sequence is; it is defined in terms of II, step and sequential composition. We can easily show the following laws to hold true:
II ; P ≡ P ≡ P ; II

run_{ns,op}(is_1) ; run_{ns,op}(is_2) ≡ run_{ns,op}(is_1 ⌢ is_2)
