Abstract: Quasi-delay-insensitive (QDI) circuits are those whose correct operation does not depend on the delays of operators or wires, except for certain wires that form isochronic forks. In this paper we show that quasi-delay-insensitivity, stability and non-interference, and strong confluence are equivalent properties of a computation. In particular, this shows that QDI computations are deterministic. We show that the class of Turing-computable functions has QDI implementations by constructing a QDI Turing machine.
Introduction
There has been a renewal of interest in the design of asynchronous circuits, motivated by the potential benefits of designing circuits in an asynchronous fashion. Asynchronous circuits exhibit average case behavior and can therefore be optimized in a data-dependent fashion. Another benefit is that the portion of the circuit involved in the computation is the only part that dissipates power. As a result, asynchronous design methods are relevant for applications where low power consumption is important.
Various models of CMOS circuits are used to hide the electrical properties of transistors, which would otherwise complicate the design process. These models typically assume that voltages represent boolean values, and that a transistor can be thought of as a switch. Delay-insensitive circuit design assumes that the correct operation of a circuit is independent of the delay in operators and wires. It was shown in [3] that the class of circuits that are entirely delay-insensitive is quite limited. Speed-independent circuit design assumes that operators can take an arbitrary amount of time to switch, but that wires have negligible delays compared to operators. Self-timed circuits assume that wires have negligible delay compared to gates in local isochronic regions [6]. Quasi-delay-insensitive circuit design assumes that both operators and wires can take an arbitrary time to switch, except for certain wires that form isochronic forks [2].
In this paper we show that the class of QDI circuits is Turing complete (modulo finite memory), i.e., any recursive function can be computed with QDI circuits. We show that state transitions in QDI circuits must exhibit the diamond property and, as a consequence, all QDI computations are entirely deterministic. In particular, this implies that one cannot build a QDI arbiter.
Strong confluence is closely related to the property of semi-modularity [5]. However, [5] only considers semi-modularity in the context of sequential machines. In addition, the computations considered are not semi-modular at every state, but only at some states. In [7], it is shown that hazard-free speed-independent asynchronous circuits are deterministic, but under a different gate model. In particular, only AND, OR, NOT gates and C-elements are considered. In addition, gates that cannot be directly realized in CMOS are permitted in their model, since they allow gates with inverted inputs.
This paper is organized as follows. We introduce our circuit model and explain what a QDI circuit is in terms of this model. We then prove necessary and sufficient conditions for a circuit to be QDI, and discuss various consequences of this. We give the construction of a QDI Turing machine with a semi-infinite tape as a concrete demonstration of the Turing-completeness of this class of circuits.
RVM33 -2
Circuit model
A CMOS circuit is a network of gates, where each gate can have an arbitrary number of inputs and one output. We assume that all circuits are closed: each variable of the circuit is the input of a gate and the output of a gate. An open circuit can be transformed into a closed one by representing the environment of the circuit as gates.
The output of a gate is connected to the low voltage level (used to represent the boolean false) via a transistor network (known as the pull-down), and to the high voltage level (used to represent the boolean true) via another transistor network (known as the pull-up). These two transistor networks together form the gate, as shown in Fig. 1. A transistor is modelled as a switch that establishes or cuts electrical connections between nodes, depending on the voltage of the gate of the transistor. A pull-up/pull-down network is modelled as a network of switches that determine if two nodes are connected or disconnected. This network is represented as a boolean function which is true just when the two nodes of interest are connected.
Since the pull-up and pull-down networks are modelled as boolean functions, a gate can be represented by a pair of boolean functions. Such a representation can be expressed using production rules [2].
Definition. (production rule) (1) A production rule (PR) is a construct of the form G ↦ t, where t is a simple assignment (a transition), and G is a boolean expression known as the guard of the PR.
A gate with output x, pull-up network B+, and pull-down network B− corresponds to the production rules
B+ ↦ x↑
B− ↦ x↓
x↑ and x↓ are abbreviations for the assignments x := true and x := false respectively. We use the predicate R on transitions to denote the result of a transition: R(x↑) ≡ x and R(x↓) ≡ ¬x. For example, a Muller C-element with inputs a and b and output c would be represented by production rules
a ∧ b ↦ c↑
¬a ∧ ¬b ↦ c↓
Definition. (production rule set) (2) A production rule set is the concurrent composition of all the production rules in the set.
A production rule set is used to describe a network of gates.
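As an illustrative aside, a production rule can be encoded as a triple of guard, variable, and target value. The sketch below is not part of the original formalism: it assumes that a state is a Python dict from variable names to booleans, and the names `c_element` and `true_guards` are hypothetical.

```python
# A production rule G ↦ t is modelled as (guard, variable, value):
# value True stands for x↑ and False for x↓. This encoding is an
# assumption made for illustration only.

c_element = [
    (lambda s: s["a"] and s["b"], "c", True),            # a ∧ b ↦ c↑
    (lambda s: not s["a"] and not s["b"], "c", False),   # ¬a ∧ ¬b ↦ c↓
]

def true_guards(rules, state):
    """Return the transitions (variable, value) whose guard holds in state."""
    return [(var, value) for guard, var, value in rules if guard(state)]

# With a and b both true the pull-up guard holds. When the inputs
# disagree neither guard holds, so c keeps its previous value.
both_high = true_guards(c_element, {"a": True, "b": True, "c": False})
disagree = true_guards(c_element, {"a": True, "b": False, "c": True})
print(both_high, disagree)
```

The second call returns an empty list, which reflects the state-holding behavior of the C-element.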
Definition. (execution) (3)
An execution of a production rule G ↦ t is an unbounded sequence of firings. A firing of G ↦ t when G is true amounts to the execution of t, and a firing with G false amounts to a skip. The firing of a production rule in a state where G ∧ ¬R(t) holds is said to be effective; otherwise, the firing is said to be vacuous. The execution of a production rule set corresponds to the weakly fair concurrent composition of the individual production rules in the set.
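The notion of effective and vacuous firings can be sketched in a few lines. The code below assumes rules are encoded as (guard, variable, value) triples over a dict state, and it uses a round-robin schedule as one particular weakly fair schedule; both choices are ours. The closed circuit `oscillator` (a C-element whose inputs are driven by inverters from its output) is a hypothetical example.

```python
def fire(rule, state):
    """One firing of G ↦ t; returns (next_state, was_effective)."""
    guard, var, value = rule
    if guard(state) and state[var] != value:   # G ∧ ¬R(t): effective firing
        return {**state, var: value}, True
    return state, False                        # skip (G false) or vacuous firing

def execute(rules, state, steps):
    """Round-robin schedule: every rule is fired once per pass."""
    effective = 0
    for k in range(steps):
        state, was_effective = fire(rules[k % len(rules)], state)
        effective += was_effective
    return state, effective

oscillator = [
    (lambda s: not s["c"], "a", True),                   # ¬c ↦ a↑
    (lambda s: s["c"], "a", False),                      # c ↦ a↓
    (lambda s: not s["c"], "b", True),                   # ¬c ↦ b↑
    (lambda s: s["c"], "b", False),                      # c ↦ b↓
    (lambda s: s["a"] and s["b"], "c", True),            # a ∧ b ↦ c↑
    (lambda s: not s["a"] and not s["b"], "c", False),   # ¬a ∧ ¬b ↦ c↓
]
start = {"a": False, "b": False, "c": False}
final, count = execute(oscillator, start, 12)
print(final, count)
```

Two passes over the six rules bring this circuit back to its initial state after six effective firings; the remaining six firings are skips.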
Although one could assume that the transitions on wires are instantaneous, a CMOS circuit does not have this property. We make the weaker assumption that transitions on wires are monotonic. Since we make this assumption, we insist that no production rule is self-invalidating.
Definition. (self-invalidating production rule) (4) A production rule G ↦ t is said to be self-invalidating when R(t) ⇒ ¬G.
A self-invalidating production rule corresponds to a gate whose output transition disables itself.
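For small rules, Definition (4) can be checked by brute force. The following sketch assumes the (guard, variable, value) encoding over dict states and enumerates every assignment of the listed variables; the function name is hypothetical. Note that under this literal reading a rule whose guard is never true is trivially self-invalidating.

```python
from itertools import product

def is_self_invalidating(guard, var, value, variables):
    """True when R(t) ⇒ ¬G holds in every state over `variables`."""
    for bits in product([False, True], repeat=len(variables)):
        s = dict(zip(variables, bits))
        if s[var] == value and guard(s):   # R(t) ∧ G is a counterexample
            return False
    return True

# ¬x ↦ x↑ disables its own guard as soon as it fires.
looped = is_self_invalidating(lambda s: not s["x"], "x", True, ["x"])
# a ∧ b ↦ c↑ does not: the guard may still hold once c is true.
c_up = is_self_invalidating(lambda s: s["a"] and s["b"], "c", True,
                            ["a", "b", "c"])
print(looped, c_up)
```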
If the output of a gate is at the low voltage level, it can change to the high voltage level when the pull-up network becomes conducting. This can happen as a result of the environment changing some input to the gate. If the input of the gate can change in a manner that makes the pull-up network non-conducting before the output of the gate changes, the circuit is said to exhibit a hazard since the circuit could have a glitch on the output of the gate.
Definition. (non-interference) (5)
The production rules B+ ↦ x↑ and B− ↦ x↓ are said to be non-interfering in a computation if and only if ¬B+ ∨ ¬B− is an invariant of the computation. A production rule set is non-interfering if every production rule in the set is non-interfering.
Let B+ ↦ x↑ and B− ↦ x↓ be a pair of production rules that define the gate for x. If B+ ∧ B− were true at any point in the computation, the result at the circuit level would correspond to connecting the high voltage level to the low voltage level: a short circuit! Non-interference guarantees that such a short-circuit cannot occur. (Note that a CMOS circuit implementation will have short-circuit currents when a gate switches; however, these currents are transient.)
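Because non-interference is an invariant of the computation, it can be checked on the reachable states only. The sketch below assumes the (guard, variable, value) encoding, a given initial state, and one pull-up and one pull-down rule per gate; the helper names and both example circuits are hypothetical.

```python
def reachable(rules, init):
    """All states reachable from init through effective firings."""
    key = lambda s: tuple(sorted(s.items()))
    seen, todo = {key(init)}, [init]
    while todo:
        s = todo.pop()
        for guard, var, value in rules:
            if guard(s) and s[var] != value:
                n = {**s, var: value}
                if key(n) not in seen:
                    seen.add(key(n))
                    todo.append(n)
    return [dict(k) for k in seen]

def non_interfering(rules, init):
    """Check that ¬B+ ∨ ¬B− holds for every gate in every reachable state."""
    ups = {var: g for g, var, value in rules if value}
    downs = {var: g for g, var, value in rules if not value}
    return all(not (ups[v](s) and downs[v](s))
               for v in ups.keys() & downs.keys()
               for s in reachable(rules, init))

ring = [(lambda s: not s["y"], "x", True), (lambda s: s["y"], "x", False),
        (lambda s: s["x"], "y", True), (lambda s: not s["x"], "y", False)]
short = [(lambda s: s["a"], "x", True), (lambda s: s["b"], "x", False)]
ring_ok = non_interfering(ring, {"x": False, "y": False})
short_ok = non_interfering(short, {"a": True, "b": True, "x": False})
print(ring_ok, short_ok)
```

In the second circuit the inputs a and b are left as constants for brevity, so it is not closed in the sense used above; it only serves to show a state in which both guards hold.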
Quasi-delay-insensitive circuits
A circuit is said to be quasi-delay-insensitive if its correct operation is independent of the delays of gates and wires, except for certain wires that form isochronic forks.
Isochronic forks and inverters
A fork in a circuit corresponds to an output of a gate being used as the input for more than one gate. As an example, consider the fork in Fig. 2. The fork connects output x of gate G to the input x1 of gate G1 and the input x2 of gate G2.
The fork being isochronic means that some transitions on x are not acknowledged by a transition of both y and z, the outputs of gates G1 and G2 respectively.
For instance, transition x↑ causes transitions x1↑ and x2↑. Transition x1↑ causes (is acknowledged by) transition y↑. But transition x2↑ does not cause a transition on z. Hence, the completion of transition x2↑ has to be justified by timing assumptions. We assume that, when transition x1↑ has been acknowledged by transition y↑, transition x2↑ is also completed. This assumption is called the "isochronicity assumption" [3].
A transition on a variable, say x, can complete in two ways: the voltage of x reaches a value that causes the gate of which x is an input to switch (i.e. a transition on the output takes place); the voltage of x reaches a value that prevents the gate of which x is an input from switching.
In both cases, the voltage value for which the transition is considered completed depends on the structure of the gate (transistor and switching thresholds, in particular). We can abstract from specific physical dependencies by modelling the time behavior of the transition as a hypothetical "wire delay." It is this abstraction that allows us to say that forks are isochronic when the propagation delays on all branches of the fork are identical, hence the term "isochronic."
If all forks are isochronic, it is possible to lump the delays of all output branches of a gate into the delay of the gate and assume that all wire delays are zero. Therefore speed-independence is equivalent to assuming that all forks are isochronic, and thus fulfilling the isochronicity requirement is the most practical way of implementing speed-independence.
Fulfilling the isochronicity requirement is considered straightforward, except when there is an explicit inverter on the branch of the fork whose transition is not acknowledged. This means that there is an additional delay (namely, the time taken for an inverter to switch) added to the delay of the wire. Since we have already assumed that gates can take an arbitrary amount of time to switch, this implies that we can no longer meet the isochronicity requirement without making an additional timing assumption on gate delays, an assumption that we do not choose to make.
Therefore, the only gates we permit are those that do not need explicit inverters for their implementation.
Such gates are said to be directly implementable. A production-rule representation of a gate B+ ↦ x↑ and B− ↦ x↓ corresponds to a directly implementable gate if and only if the negation-normal form of B+ contains only inverted literals, and the negation-normal form of B− contains only non-inverted literals. This is a consequence of using only P-transistors for pull-up networks and only N-transistors for pull-down networks.
The introduction of inverters to generate inverted versions of certain variables may result in a circuit that is no longer QDI! The process of determining where inverters should be placed and adjusting the senses of various signals to make a production-rule set directly implementable is known as bubble reshuffling. For instance, an AND gate with inputs a and b, and output c is described by the production rule set
a ∧ b ↦ c↑
¬a ∨ ¬b ↦ c↓
To make this production rule set directly implementable, we can invert the sense of c. The negated version of c is written as c_. We obtain
¬a ∨ ¬b ↦ c_↑
a ∧ b ↦ c_↓
We can now add an inverter on the output to obtain c from c_. This transformation does not affect the QDI property of the circuit because the rest of the circuit can never examine c_. This observation is general: one can invert the sense of the output of a gate and add an inverter after it without affecting the QDI property of the circuit that uses the gate.
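The syntactic condition for direct implementability lends itself to a mechanical check. The sketch below assumes guards are already in negation-normal form and encodes them as nested tuples ("and", ...), ("or", ...), and ("lit", name, inverted); this encoding and the function names are ours, not the paper's.

```python
def lit(name, inverted=False):
    return ("lit", name, inverted)

def literals(expr):
    """Yield every literal of an expression in negation-normal form."""
    if expr[0] == "lit":
        yield expr
    else:                      # ("and", e1, e2, ...) or ("or", e1, e2, ...)
        for sub in expr[1:]:
            yield from literals(sub)

def directly_implementable(pull_up, pull_down):
    """Pull-up uses only inverted literals, pull-down only non-inverted ones."""
    up_ok = all(inverted for _, _, inverted in literals(pull_up))
    down_ok = all(not inverted for _, _, inverted in literals(pull_down))
    return up_ok and down_ok

# AND gate as first written: a ∧ b ↦ c↑, ¬a ∨ ¬b ↦ c↓
and_gate = (("and", lit("a"), lit("b")),
            ("or", lit("a", True), lit("b", True)))
# After inverting the sense of the output: ¬a ∨ ¬b ↦ c_↑, a ∧ b ↦ c_↓
reshuffled = (("or", lit("a", True), lit("b", True)),
              ("and", lit("a"), lit("b")))
print(directly_implementable(*and_gate), directly_implementable(*reshuffled))
```

The check only looks at literal polarity; deciding where inverters may be placed without breaking isochronicity is a separate question that this sketch does not address.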
Suppose, instead, we decided to implement the AND gate by inverting the inputs to obtain ¬a ∧ ¬ …
If the uninverted senses of a and b are used in other parts of the circuit and the fork between this AND gate and the rest of the circuit is isochronic, then the introduction of inverters to generate the inverted senses of a and b can result in a circuit that is no longer QDI. There are tools that automatically determine where inverters can be placed so that the resulting production rule set is directly implementable [8].
Circuit malfunction
We assume that the only way a QDI circuit can malfunction is if the output of any gate in the circuit glitches. If all gates are hazard-free, then we consider the circuit to be QDI. (An error in the design of a circuit may produce a QDI circuit that implements a different specification!)
Definition. (stability) (6) A production rule G ↦ t is said to be stable in a computation if and only if G can change from true to false only in those states of the computation in which R(t) holds. A production rule set is said to be stable if and only if every production rule in the set is stable.
Note that stability is not guaranteed by the implementation of a single gate, but is a property of the entire computation. Martin's synthesis method compiles CSP programs into production rules. The synthesis procedure guarantees that the resulting production rule set is stable and non-interfering.
Theorem. (quasi-delay-insensitivity) (7)
A circuit is QDI if and only if the production rule set describing it is stable and non-interfering.
Proof: Suppose the production rule set is unstable. Then, there exists a gate represented by the production rules B+ ↦ z↑ and B− ↦ z↓ with an unstable production rule. Without loss of generality, there is a state in which ¬z ∧ B+ holds, which is followed by a state in which ¬z ∧ ¬B+ holds before z changes. Therefore, the output of the gate can glitch, which implies that the circuit is not QDI. If the production rule set is not non-interfering, there can be a short-circuit.
Suppose the production rule set is stable and non-interfering. Consider a gate B+ ↦ z↑ and B− ↦ z↓. From stability, we know that if ¬z ∧ B+ holds, then B+ remains true until z changes. In other words, we cannot have a state in which ¬z ∧ ¬B+ holds before z changes. Similarly, the transition z↓ is also hazard-free, implying that the gate is hazard-free. Since every gate is hazard-free, the circuit is QDI. □
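Stability can also be checked by exploring the reachable states. The sketch below assumes the (guard, variable, value) encoding with one rule per transition, and reports the indices of rules whose guard is falsified by the firing of another rule while their own transition is still pending. Both example circuits are hypothetical.

```python
def unstable_rules(rules, init):
    """Indices of rules whose guard goes false before their transition fires."""
    key = lambda s: tuple(sorted(s.items()))
    seen, todo, bad = {key(init)}, [init], set()
    while todo:
        s = todo.pop()
        pending = [i for i, (g, v, val) in enumerate(rules)
                   if g(s) and s[v] != val]          # effective firings in s
        for i in pending:
            _, v, val = rules[i]
            n = {**s, v: val}
            for j in pending:
                if j != i and not rules[j][0](n):    # guard j was disabled
                    bad.add(j)
            if key(n) not in seen:
                seen.add(key(n))
                todo.append(n)
    return bad

oscillator = [
    (lambda s: not s["c"], "a", True), (lambda s: s["c"], "a", False),
    (lambda s: not s["c"], "b", True), (lambda s: s["c"], "b", False),
    (lambda s: s["a"] and s["b"], "c", True),
    (lambda s: not s["a"] and not s["b"], "c", False),
]
hazard = [
    (lambda s: s["a"], "x", True),       # a ↦ x↑
    (lambda s: not s["a"], "x", False),  # ¬a ↦ x↓
    (lambda s: not s["x"], "a", False),  # ¬x ↦ a↓ may withdraw a too early
    (lambda s: s["x"], "a", True),       # x ↦ a↑
]
stable_result = unstable_rules(oscillator, {"a": False, "b": False, "c": False})
hazard_result = unstable_rules(hazard, {"a": True, "x": False})
print(stable_result, hazard_result)
```

In the second circuit the rules a ↦ x↑ and ¬x ↦ a↓ race against each other from the initial state, so each can disable the other.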
Con uence, Determinism, and Arbiters
In this section we examine some of the consequences of stability and non-interference, the two properties that characterize QDI computations. The following definition can be found in [1].
Definition. (strong confluence) (8) Let t1 and t2 be two transitions that can fire in state s. Let s1 be the state obtained by firing t1 in s, and s2 be the state obtained by firing t2 in s. The computation is said to be strongly confluent if t1 can fire in state s2 and t2 can fire in state s1, and both alternatives lead to the same final state (cf. Fig. 3).
Theorem. (strong confluence) (9) A computation can be described by a stable, non-interfering production rule set if and only if it is strongly confluent.
Proof: Let G1 ↦ t1 and G2 ↦ t2 be two production rules that have effective firings in state s, i.e., s ⇒ G1 ∧ G2 ∧ ¬R(t1) ∧ ¬R(t2). Now, t1 cannot make G2 false, since that would make the production rule G2 ↦ t2 unstable. Therefore, after t1 fires, G2 ↦ t2 can still fire. Similarly, G1 ↦ t1 can still fire after t2 fires.
Since all transitions are elementary assignments, the final state does not depend on the order of the two firings. Therefore, the computation is strongly confluent.
Conversely, suppose a computation is strongly confluent. For each transition t, define G(t) as the disjunction of all the states in which the transition has an effective firing. Then we claim that the production rule set {G(t) ↦ t | t is a transition} is a stable, non-interfering production rule set that describes the computation. Let G(t) ↦ t be a production rule that has an effective firing in some state s. Firing any other production rule cannot disable t, since that would violate strong confluence. This implies that the rule G(t) ↦ t is stable. Since G(t) ↦ t is an arbitrary production rule, the production rule set is stable. The production rule set is non-interfering since both x↑ and x↓ cannot have an effective firing in a state s, which implies that G(x↑) ∧ G(x↓) ≡ false. Finally, the production rule set correctly describes the computation since, by construction, a transition is enabled in the production rule set if and only if the transition had an effective firing in the original computation.
□
Theorem (9) does not directly rule out self-invalidating production rules. However, these rules can be systematically eliminated by the introduction of new variables. Let one of B+ ↦ x↑ and B− ↦ x↓ be a self-invalidating rule. We can replace these rules with the following (y is fresh):
B+ ↦ y↑
B− ↦ y↓
y ↦ x↑
¬y ↦ x↓
These rules are no longer self-invalidating since y is a fresh variable. They also do not change the result of the computation. Therefore, ruling out self-invalidating rules does not restrict the computation in any way.
(The rules y ↦ x↑ and ¬y ↦ x↓ are implemented with two inverters.)
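The fresh-variable transformation can be written out directly. The sketch below assumes the (guard, variable, value) encoding over dict states; `split_gate` and `self_invalidating` are hypothetical helper names, and the example gate (an inverter looped onto itself) is ours.

```python
from itertools import product

def self_invalidating(rule, variables):
    """True when R(t) ⇒ ¬G holds in every state over `variables`."""
    guard, var, value = rule
    states = (dict(zip(variables, bits))
              for bits in product([False, True], repeat=len(variables)))
    return not any(s[var] == value and guard(s) for s in states)

def split_gate(b_up, b_down, x, y):
    """Replace B+ ↦ x↑, B− ↦ x↓ by four rules through the fresh variable y."""
    return [
        (b_up, y, True),                   # B+ ↦ y↑
        (b_down, y, False),                # B− ↦ y↓
        (lambda s: s[y], x, True),         # y ↦ x↑
        (lambda s: not s[y], x, False),    # ¬y ↦ x↓
    ]

# ¬x ↦ x↑ and x ↦ x↓ are both self-invalidating.
before = [(lambda s: not s["x"], "x", True), (lambda s: s["x"], "x", False)]
after = split_gate(before[0][0], before[1][0], "x", "y")
before_flags = [self_invalidating(r, ["x"]) for r in before]
after_flags = [self_invalidating(r, ["x", "y"]) for r in after]
print(before_flags, after_flags)
```

The check covers only the syntactic definition; that the transformed rules leave the result of the computation unchanged is the claim made in the text and is not verified by this sketch.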
Consider any strongly confluent computation. Suppose we identify all the effective firings that can take place at a particular point in the execution and artificially prevent any other production rule from firing. Then, no matter which path was taken by the computation, the final result would be the same. This observation holds at any point in the computation. We conclude that a strongly confluent computation is essentially deterministic. Therefore, a QDI computation will always be deterministic.
An arbiter with inputs ai and bi, and outputs u and v is described by the handshaking expansion where the thin bar for the selection denotes arbitration. There is no QDI implementation of this circuit because a computation that uses an arbiter cannot be strongly confluent. In the state in which ai ∧ bi holds, both u↑ and v↑ can fire. However, after u↑ fires, v↑ can no longer fire. This implies that when designing an arbiter, we have to consider the electrical behavior of transistor circuits.
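The diamond property of Definition (8) can be tested state by state. In the sketch below, rules are (guard, variable, value) triples over dict states. The two `arbiter` rules are a hypothetical mutual-exclusion model written only to reproduce the situation described above; they are not the paper's handshaking expansion.

```python
def diamond_holds(rules, s):
    """Check strong confluence for every pair of effective firings in state s."""
    pending = [(g, v, val) for g, v, val in rules if g(s) and s[v] != val]
    for _, v1, val1 in pending:
        s1 = {**s, v1: val1}
        for g2, v2, val2 in pending:
            if (v1, val1) == (v2, val2):
                continue
            s2 = {**s, v2: val2}
            if not (g2(s1) and s1[v2] != val2):      # t2 was disabled by t1
                return False
            if {**s1, v2: val2} != {**s2, v1: val1}:  # orders disagree
                return False
    return True

# Assumed arbiter model: each grant is blocked once the other is given.
arbiter = [
    (lambda s: s["ai"] and not s["v"], "u", True),   # ai ∧ ¬v ↦ u↑
    (lambda s: s["bi"] and not s["u"], "v", True),   # bi ∧ ¬u ↦ v↑
]
# Two independent transitions enabled by the same condition.
independent = [
    (lambda s: not s["c"], "a", True),
    (lambda s: not s["c"], "b", True),
]
arbiter_ok = diamond_holds(
    arbiter, {"ai": True, "bi": True, "u": False, "v": False})
independent_ok = diamond_holds(
    independent, {"a": False, "b": False, "c": False})
print(arbiter_ok, independent_ok)
```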
Compilation of a Turing machine
In this section we demonstrate that QDI circuits can be used to compute any computable function by constructing a QDI bounded-tape Turing machine. Since any computation can only use a finite amount of memory (since any physical implementation of a computation can only use finite resources), this demonstrates that restricting the design space to QDI circuits does not limit the class of functions that can be computed. The implementation we propose does not include any inverters on a single branch of a fork, and therefore does not contain inverters on the branch of an isochronic fork. Hence, the implementation is QDI (and therefore, also speed-independent).
Let TM = ⟨S, K, δ⟩ be a Turing machine with a semi-infinite tape where S and K are positive integers, and δ is a function. The Turing machine is initialized in state zero with its head located at position zero on the tape. It uses δ, commonly referred to as the next move function, to determine the action it should take. It then updates the tape appropriately, and continues the computation. In the rest of this section, we omit the assignments s := q0 and p := 0, since they can be performed by appropriately initializing the circuit on reset.
Process Decomposition
The computation of the Turing machine proceeds in two steps. Using the current value on the tape and the current value of the state, the next state and action are computed. Following this, the tape is appropriately updated. We therefore decompose the Turing machine into two parts, one for each distinct step of the computation. Using variable c to denote the current value of the symbol on the tape, we obtain:
We do not decompose TM1 further, since both next and statebuf can easily be transformed into a circuit (see below). For the remainder of this section, we concentrate on TM2.
A computation resembles a function block when it is of the form "read inputs", followed by "produce outputs." This computation has a standard QDI implementation [4]. We make TM2 resemble this form by introducing a tape buffer process.
fulltape ≡ tapecontrol ∥ tape
To implement tape, we consider the tape to be the concurrent composition of a linear array of tape elements, as shown in the corresponding figure. Each tape element contains a register that stores the symbol in the tape element. Apart from this register, each tape element must maintain its state, s. The operation of a tape element (given its current state) can be described as follows:
s = l. If a "move left" action is to be performed, it is communicated to the rest of the tape. The new state is the state of the first process in the rest of the tape. If a write, read, or a "move right" action is performed, this action is communicated to the rest of the tape, and the state remains unchanged.
s = r. A "move left", read, or write action can never happen. If a "move right" action occurs, the new state is t.
s = t. A "move left" action results in state r. A "move right" action results in state l, and this action is communicated to the rest of the tape so as to move the head to the right. A read/write action is sent to the register.
Since the next state of a tape element can depend on the state of the first element in the rest of the tape, we introduce channels RTin and LTout that communicate this information.
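The three cases above can be read as a small state machine per tape element. The sketch below models only the head-position states, not the registers or the channels. It assumes states are the strings "l", "r", and "t", actions are "left", "right", "read", and "write", and that an element in state l adopts the state its neighbour had before the move; these are our reading of the description, not the paper's CSP.

```python
def step(states, action, i=0):
    """Apply one head action to the tape, starting at element i (in place)."""
    s = states[i]
    if s == "l":
        neighbour_before = states[i + 1]
        step(states, action, i + 1)          # forward to the rest of the tape
        if action == "left":
            states[i] = neighbour_before     # assumed: neighbour's prior state
    elif s == "r":
        if action != "right":
            raise ValueError("only a move right can reach an element in state r")
        states[i] = "t"
    else:                                    # s == "t": the head is here
        if action == "left":
            states[i] = "r"
        elif action == "right":
            states[i] = "l"
            step(states, action, i + 1)      # next element becomes t
        # a read or write would be sent to this element's register

tape = ["t", "r", "r"]
history = []
for action in ["right", "right", "left", "left"]:
    step(tape, action)
    history.append(list(tape))
print(history)
```

The sketch does not handle the ends of the bounded tape: a move right from the last element or a move left from element zero is outside what it models.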
Once again, since the tape reads and modifies its own state, we introduce a buffer to make the copy explicit. The CSP description of the tape element is:
The complete tape element decomposition is shown in Fig. 5.
In the following sections, we compile the Turing machine into production rules. The compilation will proceed by translating each CSP process into handshaking expansions and, finally, into directly implementable production rules. Notice that the compilation of this buffer completes the compilation of tapebuf, and part of tapeelement as well. The rest of TM1 can be compiled using the standard function block compilation technique [4], and depends on the next move function. The block diagram of TM1 is shown in Fig. 6.
Compilation of the tape
To simplify the handshaking expansion for the tape, we use the input on LTin as a request for data on LTout, and the output on LTout as the acknowledge for channel LTin. The protocol used on L is the usual four-phase handshake.
Compiling tapecontrol. The handshaking expansion for tapecontrol is given by: ; TTout+] The second process can be translated into a number of wires. We compile the first process for one-bit data. This construction can be easily generalized to n bits of data. We introduce state variables st and sf to remove indistinguishable states. (We could have just used one variable, but the resulting production rule
Putting the pieces together
Each component given above is QDI. Some components require the inverted senses of certain signals as input. As noted in the previous section, the introduction of inverters can compromise the QDI property of a circuit.
Observe that the inverted senses of signals are required by tapecontrol and the register. However, the inputs to these processes are not forked to any other process or operator! As a result, we can safely insert inverters between the processes to obtain a QDI Turing machine (cf. Fig. 8). By the construction given above, we can state the following.
Theorem. (Turing-completeness) (10) Any bounded-tape Turing-computable function can be implemented using a QDI circuit.
Conclusions

