This paper demonstrates that verification techniques developed for relatively large, synchronous circuits can be applied to speed-independent, self-timed circuits. We introduce local formulas, which provide a natural way to specify the input/output behaviour of data-path circuits. The validity of a local formula is independent of the order in which the operations occur in a speed-independent circuit. We demonstrate our approach with the verification of two designs: a FIFO and a vector multiplier chip.
Introduction
Synchronous and asynchronous designs differ primarily in the timing models used to describe their behaviour. In a pure synchronous design, computation is performed by combinational logic, and storage is performed by clocked latches. In this framework, the state of the latches after a clock event is a deterministic function of the values held in the latches before the clock event. Specialised verification methods, such as symbolic trajectory evaluation [Seg93], exploit the fact that synchronous designs have deterministic next-state functions, which allow a symbolic "next state" operator to be implemented efficiently. This allows large synchronous designs to be formally verified with a high degree of automation [Bea93, Dar94, AS95].
Asynchronous designs, on the other hand, do not use a clock to determine the timing of state-changing operations. Because the timing relations between circuit components cannot be completely determined, asynchronous designs are usually modeled using non-deterministic next-state relations. Symbolic model checkers [BCo+94] have been used to verify small designs; however, the state-space explosion problem prevents fully automated verification on the same scale that has been achieved for synchronous designs.
This paper introduces a framework in which speed-independent circuits without output choice can be verified using synchronous models. The key observation is that the sequence of operations performed by each component depends only on the initial state and the sequence of input values applied by the environment. This motivates specifying the desired behaviour in terms of input and output sequences, which is a natural approach for data-path circuits and can be readily verified using symbolic trajectory evaluation. By examining a specific ordering of circuit operations, the circuit can be verified using tools developed for synchronous designs.
By excluding circuits with output choice, we exclude designs that employ arbiters. We note that many published asynchronous designs meet this requirement. We verify a vector multiplier from [SS93] to demonstrate the practicality of our approach. The Caltech Asynchronous Processor [MB+89] and Williams' self-timed divider [Wil91] are other well-known designs that do not use arbiters. A notable exception is the Sproull Counter Flow Pipeline Processor [SSM94], in which arbiters are used to coordinate the transfer of instructions and data between adjacent pipeline stages.
Properties of Speed-Independent Circuits
This section presents the properties of speed-independent circuits that we use. The key observation is that the sequence of operations performed by each component in a speed-independent circuit depends only on the initial state of the circuit and the sequence of values applied by the circuit's environment. This observation motivates using specifications that describe the sequences of inputs and outputs of specific components. The design of a simple FIFO is used as an example throughout this section.
Specifications
We verify the correct operation of a circuit for all possible initial states, inputs and relative timings of circuit components. To model all possible inputs, the circuit must be described with its environment. Typically, the description of the environment describes the handshaking protocol and allows the environment to apply arbitrary data values. Our verification then shows that the circuit produces the correct results for all possible data values.
We view the circuit and the environment as collections of components (e.g. gates, latches, C-elements). Each signal is connected to the output of exactly one component of either the circuit or the environment. A circuit is speed-independent if no assumptions are made about the relative delays of components and, once a component is enabled to change its output(s), it remains enabled to perform that action until it has performed some action.
Figure 1: A FIFO in a generic environment
As an example, figure 1 shows a FIFO with its environment. The data source module provides data to be input to the FIFO using a dual-rail code and a four-phase handshake protocol. When the request signal from the FIFO, r0, is high, the source is enabled to send a value by changing in from empty to either true or false. When this value has been inserted into the FIFO, the FIFO lowers r0, and the source responds by returning in to the empty value. The FIFO responds to this empty value by raising r0, completing the four-phase handshake. In an analogous manner, the data sink module consumes values output by the FIFO.
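The handshake between the source and the FIFO described above can be walked through phase by phase. The following Python sketch is only an illustration: the two-wire encoding `(t, f)` of the dual-rail code and the function name `four_phase_transfer` are assumptions introduced here, not part of the design being verified.

```python
# Dual-rail code on two wires (t, f); the wire order is an assumption.
EMPTY, TRUE, FALSE = (0, 0), (1, 0), (0, 1)

def four_phase_transfer(values):
    """Transfer each value from the source to the FIFO with a four-phase handshake."""
    r0, data = 1, EMPTY          # FIFO is requesting, data wires are empty
    received, log = [], []
    for v in values:
        # Phase 1: r0 is high, so the source may drive true or false.
        assert r0 == 1 and data == EMPTY
        data = TRUE if v else FALSE
        log.append(("source", data, r0))
        # Phase 2: the FIFO stores the value and lowers r0.
        received.append(data == TRUE)
        r0 = 0
        log.append(("fifo", data, r0))
        # Phase 3: the source returns the data wires to empty.
        data = EMPTY
        log.append(("source", data, r0))
        # Phase 4: the FIFO raises r0, completing the handshake.
        r0 = 1
        log.append(("fifo", data, r0))
    return received, log

received, log = four_phase_transfer([True, False, True])
```

Each transferred value contributes exactly four entries to `log`, alternating between a source action and a FIFO action, which mirrors the alternation of enabled components in the protocol.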
Our specification for the FIFO requires that the sequence of values output by the FIFO be the same as the sequence applied by the source. If c is a component with stable inputs, and v is an input or output of c, we say $v^k_c$ is a local literal which represents the value of v after the $k$th operation of c. Thus, the sequence of values output by the source can be written as in is an unbounded local formula. Because local formulas describe the input/output behaviour of a circuit, they are often written using only interface signals and environment components. This means that the specification is independent of the implementation, and the same specification can be used for the verification of several different implementations.
The specification for a FIFO in equation 1 assumes that the FIFO is initially empty. A complete specification requires a local formula and an initial state predicate. An initial state predicate is a boolean-valued expression whose literals are the signals of the circuit and environment. Typically, the circuit to be verified has internal state-holding components, and the initial state predicate must describe the state of these components. Thus, the initial state predicate typically includes clauses that are specific to the implementation being verified.
.f to output either a true or a false value respectively.
3. Two or more components may be enabled in the same state, and any enabled component may effect the next state change. For example, after a value is copied from the source to the first stage of the FIFO, the next action might be to copy this value to the second stage, to lower x[0].r, or to propagate a value or request in some other stage of the FIFO. It is this non-deterministic change of circuit state that distinguishes self-timed designs from traditional, synchronous designs.
Our verification method handles the first two causes of non-determinism by representing the initial state and inputs from the environment symbolically. In this way, traces arising from all possible initial states and all possible inputs are represented by a single boolean formula. In the remainder of this section, we show that the value of a local formula is independent of the order in which components perform their actions.
Characterising traces
Let t and u be traces of an eligible design. If t and u satisfy the same bounded local formulas, then we say that t and u are equivalent. Equivalent traces are those which differ only in the order in which their components perform their actions. Formalising and proving this claim requires a few definitions given below.
Let t be a trace, c be a component, and k be an integer such that c is enabled to perform an action in state t(k). Because the design is speed-independent, c will remain enabled until it performs an action. However, we could imagine a trace in which c remains enabled forever and never performs an action. Allowing concurrently enabled components to perform their actions in any order models the unspecified delays in the design. We assume that these delays are bounded. Therefore, we assume that every time a component becomes enabled, it eventually performs an action. Traces that satisfy this requirement are called weakly fair.
Traces of eligible designs
We can now state the theorems upon which our verification method is based. Informal sketches are provided instead of formal proofs. Detailed proofs are available in [Wei96]. The first theorem states that the order in which component actions are performed does not affect the validity of bounded local formulas for an eligible design.
Theorem 1 (Equivalence of traces) Given an eligible design, let t and u be two weakly fair traces with the same initial state and consistent environment actions. Traces t and u are equivalent.
The proof is based on the observation that if a trace reaches a state where two components, c and d, are enabled, the same state is reached if the action of c is performed first and the action of d second, or if the actions are performed in the other order. Furthermore, traces that differ only by a swap of such adjacent actions can be shown to be equivalent. Now, consider the first state where u performs a different action than t. Because u is weakly fair, the component that acted in t must eventually act in u. A finite sequence of swap operations will move this action to the same position in u as it was in t. Furthermore, the trace produced by these swaps is equivalent to u. In this manner, we can construct a trace that is equivalent to u and has an arbitrarily long prefix that is the same as t. Therefore, if t satisfies a bounded local formula B, then u satisfies B as well. □
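The commutation property at the heart of this proof sketch can be checked exhaustively on a small example. The Python sketch below uses an abstract FIFO model that is an assumption made for illustration (a row of cells, a source, a sink, and one "mover" component per pair of adjacent cells), not the handshake-level circuit. It explores every reachable state and confirms that two simultaneously enabled components stay enabled when the other one acts, and that both firing orders reach the same state.

```python
from itertools import combinations

VALUES = (True, False, False, True)   # hypothetical input sequence
STAGES = 3                            # hypothetical FIFO depth

def enabled(state):
    """Components that may act: 'source', 'sink', or mover index i (cell i -> i+1)."""
    sent, cells, _ = state
    ready = []
    if sent < len(VALUES) and cells[0] is None:
        ready.append("source")
    ready += [i for i in range(STAGES - 1)
              if cells[i] is not None and cells[i + 1] is None]
    if cells[-1] is not None:
        ready.append("sink")
    return ready

def fire(state, c):
    """Return the state reached when component c performs its action."""
    sent, cells, out = state
    cells = list(cells)
    if c == "source":
        cells[0] = VALUES[sent]
        sent += 1
    elif c == "sink":
        out = out + (cells[-1],)
        cells[-1] = None
    else:
        cells[c + 1], cells[c] = cells[c], None
    return (sent, tuple(cells), out)

def check_diamonds():
    start = (0, (None,) * STAGES, ())
    seen, frontier, diamonds, finals = {start}, [start], 0, set()
    while frontier:
        s = frontier.pop()
        ready = enabled(s)
        if not ready:
            finals.add(s[2])          # record the output of each terminal state
        for c, d in combinations(ready, 2):
            # Persistence: acting on one component does not disable the other.
            assert d in enabled(fire(s, c)) and c in enabled(fire(s, d))
            # Commutation: both orders reach the same state.
            assert fire(fire(s, c), d) == fire(fire(s, d), c)
            diamonds += 1
        for c in ready:
            n = fire(s, c)
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return len(seen), diamonds, finals

states, diamonds, finals = check_diamonds()
```

In this model every terminal state carries the same output sequence, which is the behaviour the theorem predicts for bounded local formulas over the sink.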
Recall that equivalence of traces was defined only for bounded local formulas. As in the FIFO example, most data-path circuits are specified using unbounded formulas. Below, we extend theorem 1 to a class of unbounded formulas that we call "periodic." This class is large enough to specify the behaviour of most practical designs.
Let B be a bounded local formula. Let C be the set of components that appear in B and let p be a function from C to the positive integers. Consider a design with initial state predicate G, and let C be a subset of the circuit and environment components. We say that the design has period p if and only if: (1) in any state satisfying G, only components in C are enabled; and (2) from every state in G and for every possible sequence of environment actions, there is a trace that returns to a state in G after each component c in C performs exactly p(c) actions. Using these definitions, we can extend theorem 1 to the periodic extensions of bounded local formulas.
Continuing only the source component will be enabled in the initial state. Using this initial state predicate, the FIFO design is periodic with the period function given above.
Theorem 2 (Periodic local formulas) Given an eligible design, let B be a bounded local formula of the design. Let C be the set of components that appear in B and let p be a period function for B. If the design has period p and all traces of the design satisfy B, then all traces of the design satisfy P, where $P = \forall i \in \{0 \ldots +\infty\}.\ \mathrm{shift}(B, p, i)$.
Proof sketch: For any initial state and sequence of environment inputs, there exists a trace that periodically satisfies the initial state predicate. It is straightforward to show that this trace satisfies P. By the equivalence of traces (theorem 1), all traces of the design must satisfy P. □
When using symbolic trajectory evaluation for verification, it is convenient to perform the actions of all enabled components as a single state transition. We now show that the validity and value of local formulas are preserved under such an execution model. Given a design, a synchronous trace is a sequence of states such that the first element of the sequence satisfies the initial state predicate of the design, and each pair of consecutive states in the sequence corresponds to performing the actions of all enabled components of the circuit and environment in a single transition. The final theorem presented here provides the basis for using synchronous traces to verify speed-independent designs.
Theorem 3 (Synchronous traces) Given an eligible design, let L be a bounded or periodic local formula. L is satisfied for all weakly fair traces of the design if and only if it is satisfied for all synchronous traces.
We first show that if L is satisfied for all synchronous traces, then L is satisfied for all traces. Let t be a weakly fair trace. By swapping actions, we can construct a trace u that is equivalent to t such that u is partitioned into epochs. The set of actions performed in each epoch is exactly the actions of those components that were enabled at the beginning of the epoch. Note that each component performs at most one action during each epoch. We construct a synchronous trace s by taking the first state from each epoch of u. It is straightforward to show that s is equivalent to u (and therefore equivalent to t). If all synchronous traces of the design satisfy L, then s must satisfy L, and therefore t satisfies L as well. The proof for the other direction is similar.
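The relationship between interleaved and synchronous traces can be observed on a small model. The Python sketch below is an illustration under assumptions: it uses an abstract FIFO (cells, a source, a sink, and a mover per adjacent cell pair) rather than the real circuit, and the `pick` parameter is a hypothetical hook that decides which enabled components fire in each step. Passing all enabled components gives a synchronous trace; passing one randomly chosen component gives an interleaved trace.

```python
import random

def run(values, stages, pick):
    """Run the abstract FIFO; return per-component value histories and step count."""
    cells, pending = [None] * stages, list(values)
    history, steps = {}, 0
    while True:
        ready = []
        if pending and cells[0] is None:
            ready.append("source")
        ready += [i for i in range(stages - 1)
                  if cells[i] is not None and cells[i + 1] is None]
        if cells[-1] is not None:
            ready.append("sink")
        if not ready:
            return history, steps
        # Components enabled together touch disjoint cells in this model,
        # so applying their actions one after another equals a joint step.
        for c in pick(ready):
            if c == "source":
                v = cells[0] = pending.pop(0)
            elif c == "sink":
                v, cells[-1] = cells[-1], None
            else:
                v = cells[c]
                cells[c + 1], cells[c] = v, None
            history.setdefault(c, []).append(v)
        steps += 1

values = [True, False, False, True, True]
rng = random.Random(7)
sync_hist, sync_steps = run(values, 3, lambda ready: ready)
rand_hist, rand_steps = run(values, 3, lambda ready: [rng.choice(ready)])
```

Both runs record the same sequence of values at every component, while the synchronous run needs fewer steps because each step corresponds to one epoch of the interleaved run.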
3 Implementation
Given a design, an initial state predicate, a bounded local formula, and a period function, the verification task is to show that: (1) the design is eligible; (2) for all allowed initial conditions and environment actions there is a trace which satisfies the bounded local formula; and (3) the design is periodic with the given period function. As shown in section 2.4, verifying these three properties establishes that all traces of the design satisfy the periodic extension of the local formula. In our implementation, we use symbolic model checking to verify that the design is eligible and symbolic trajectory evaluation to verify the other two properties.
Model checking
Symbolic model checking [BCo+94] can be used to show that a property holds for all reachable states of a circuit. This approach is fully automatic, and is computationally feasible for relatively small designs. Each component is separately tested using an environment that observes the handshaking protocols but can apply arbitrary data values. In this context, the model checker verifies that the component is speed-independent, has stable inputs (if required), and has no output choice (if it is a circuit component). We also verify that the component complies with its side of the handshaking protocols. Because we are only concerned about reachable states (i.e. safety properties), verifying that each component is eligible in a generic context ensures that the entire circuit is eligible (this approach to compositional verification is described formally by Abadi and Lamport [AL89]). Because each component is verified separately, model checking is only applied to small circuits, and we avoid the problem of state-space explosion. This shows that the circuit is eligible.
Symbolic trajectory evaluation
Symbolic trajectory evaluation (STE) [SB95] is a verification technique for circuits with a deterministic next-state function and specifications written in a restricted logic called "trajectory formulas." STE does not compute the reachable subset of the state space and can be applied successfully to much larger circuits than model checking. Although restrictive, trajectory formulas readily express requirements of the form "if the initial state satisfies a given predicate then the sequence of values output by the circuit is a given function of the sequence of values applied at the inputs," where the sequences are finite. This corresponds naturally to the requirements that we write as local formulas, allowing us to show that for any allowed initial state and environment action there is a trace satisfying the desired bounded local formula.
Verifying that a design has period function p is similar. If c is a component in $C_p$, the domain of p, we augment the model for c with counters that indicate how many operations it has performed. Furthermore, we modify the model of c to disable further operations after performing p(c) actions. Let G be the initial state predicate of the circuit. If the original circuit had a trajectory that returns to G with the correct period, the modified circuit will stop in that state (no further state transitions will be possible). Using STE, we verify that all trajectories that start in a state that satisfies the initial state predicate, G, end in a state that also satisfies G, and that for each component c in $C_p$, the counter for c has value p(c). This shows that the design is periodic, concluding the verification.
Both the model checker and the STE implementation use Ordered Binary Decision Diagrams (OBDDs [Bry92]) to represent boolean formulas. Because we only have to model check small circuits, this approach was very effective. However, this is likely to be a bottleneck when verifying data paths larger than those considered here. We discuss this point further in section 4.
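The counter-based periodicity check can be illustrated on an abstract FIFO model. The Python sketch below is not the STE-based check: it enumerates input values explicitly instead of representing them symbolically, and the model (cells, a source, a sink, one mover per adjacent cell pair) and the names `returns_to_G` and `has_period` are assumptions introduced for illustration. Here G is taken to be "every cell is empty", and `period` must list every component.

```python
from itertools import product

def returns_to_G(stages, period, inputs):
    """Run synchronously with action counters; report whether the run stops in G."""
    cells = [None] * stages                  # initial state satisfies G
    pending = list(inputs)
    count = {c: 0 for c in period}           # one counter per component
    while True:
        ready = []
        if pending and cells[0] is None:
            ready.append("source")
        ready += [i for i in range(stages - 1)
                  if cells[i] is not None and cells[i + 1] is None]
        if cells[-1] is not None:
            ready.append("sink")
        # A component is disabled once it has performed p(c) actions.
        ready = [c for c in ready if count[c] < period[c]]
        if not ready:
            break
        for c in ready:
            if c == "source":
                cells[0] = pending.pop(0)
            elif c == "sink":
                cells[-1] = None
            else:
                cells[c + 1], cells[c] = cells[c], None
            count[c] += 1
    return all(x is None for x in cells) and count == period

def has_period(stages, period):
    """Check every sequence of three boolean inputs (explicit enumeration)."""
    return all(returns_to_G(stages, period, ins)
               for ins in product([True, False], repeat=3))

good = {"source": 1, 0: 1, 1: 1, "sink": 1}
bad = {"source": 2, 0: 1, 1: 1, "sink": 1}
```

With `good`, the modified model stops in an empty FIFO with every counter equal to its period. With `bad`, the second value stays in the first cell, so the final state does not satisfy G and the check fails.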
Design representation
The designs are modeled at the component level using VHDL. Each component is described using "behavioural" VHDL, and these are combined to form the complete design using "structural" VHDL. The Voss system for STE [SB95] supports VHDL design descriptions. Modifying the models of environment components to test periodicity as described above is easily done in VHDL. For the model checking part of the verification, we use the st2 model checker [LGS94]. Translation from VHDL to ST for the model checker is manual, but straightforward. Local formulas are written as trajectory formulas for the Voss system, and we extended the st2 model checker to automatically verify the eligibility of designs.
Vector Multiplier
For a data path with significant complexity, we chose the vector multiplier described in [SS93]. This circuit computes the inner product of pairs of vectors using an array of "Multiply-Accumulate-Shift" (MAS) blocks, with one block for each bit of the final result. The accumulating sum is stored in carry-save form until it is output. The computation of an inner product is a nested iteration: the outer loop performs one iteration for each element of the vectors, and the inner loop performs one iteration for each bit of one of the vectors. Initially, the sum and carry bits of the accumulator are set to zero and the first element of one vector is right justified in the array. The corresponding element of the other vector is applied serially with the least significant bit first. For each bit of the serial operand, a partial product is computed and added to the accumulating sum. The parallel operand and carry word are shifted to the left, and the next bit of the serial operand is applied. After the entire serial operand has been applied, the next parallel operand is entered into the MAS blocks, again right justified, and another product is computed. When all of the products have been computed, the serial operand is set to zero for as many iterations of the inner loop as there are bits in the accumulator. This ensures that the carry word is zero and the result is ready to be output.
A diagram of the MAS bit-slice is shown in figure 3. Unlabeled devices in the diagram represent asynchronous latches (FIFO cells). There is a source (labeled with a 0) which produces a stream of dual-rail false values, and a sink (labeled with a ). The And cell and Adder perform the partial product calculation and accumulation as described above, and the three rows of FIFO cells perform pipeline buffering. Cells M1 and M2 are multiplexors which control the nested iterations. These multiplexors are asymmetric: when the control line is a dual-rail false, data is passed directly through; when the control line is true, the lower input is passed to the upper output and no operation is performed on the other ports.
The two control signals, Ctl1 and Ctl2, control the iterations. At the beginning of an inner product computation, both signals are true: M1 loads a bit from the parallel operand, and M2 sets the accumulated value to zero. During each cycle of a multiplication, both control signals are false: M1 receives the next less significant bit of the parallel operand, and M2 returns the sum to the adder for another cycle of accumulation. When a multiplication is complete, Ctl1 is set to true and the M1 cells load a new parallel operand into the MAS array, and Ctl2 is false, allowing the current accumulation to continue. When the entire inner product has been computed, both control signals are set to true again, the result is output, the accumulator is reset to zero, and a new parallel operand is loaded.
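The arithmetic of the nested iteration can be sketched at the word level. The following Python code is only a model of the computation described above, under several assumptions: operands are unsigned, the accumulator is wide enough to hold the inner product, Python integers stand in for the rows of MAS bit-slices, and the function name `inner_product_mas` is hypothetical. It is not the gate-level design of [SS93].

```python
def inner_product_mas(xs, ys, operand_bits, acc_bits):
    """Inner product by bit-serial shift-and-add with a carry-save accumulator."""
    mask = (1 << acc_bits) - 1

    def step(s, c, pp):
        # One row of full adders: add sum word, carry word and partial product.
        new_s = s ^ c ^ pp
        carry = (s & c) | (s & pp) | (c & pp)
        # The carry word is shifted left before the next iteration.
        return new_s & mask, (carry << 1) & mask

    s = c = 0                        # accumulator value is s + c (carry-save form)
    for x, y in zip(xs, ys):         # outer loop: one iteration per vector element
        p = x                        # parallel operand, right justified
        for k in range(operand_bits):            # inner loop: one per serial bit
            bit = (y >> k) & 1                   # least significant bit first
            s, c = step(s, c, p if bit else 0)   # add the partial product
            p = (p << 1) & mask                  # shift the parallel operand left
    for _ in range(acc_bits):        # serial operand held at zero to flush carries
        s, c = step(s, c, 0)
    assert c == 0                    # carry word is zero, result is in s
    return s

result = inner_product_mas([3, 5, 7], [2, 4, 6], operand_bits=4, acc_bits=12)
```

The flush loop runs once per accumulator bit because each iteration with a zero partial product moves every remaining carry at least one position to the left, so after `acc_bits` iterations no carry can remain inside the word.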
This nested iteration complicates the formalism developed in the preceding sections. Theorem 2 requires a period function p in order to show that we can verify unbounded traces. For this design, there is no clear p. The accumulator is only returned to its initial state (zero) after the complete inner product is computed. However, the circuit accommodates vectors of arbitrary length, and the number of operations performed in an inner product calculation depends on the length of the vectors. Thus, we chose to verify the design for various fixed-length vectors. Table 1 below shows verification times for short vectors. The rapid increase in verification times with the number of pairs is a result of the OBDD representations of the accumulated sum. An OBDD's size increases considerably when its value depends on the interaction of many variables, and the most significant bits of the accumulated sum depend in a complicated fashion on all of the bits of each operand pair.
Initially, we had hoped to define the periodicity according to the computation of a single scalar multiply-accumulate. A difficulty arises due to the fact that the MAS block may or may not output the result at the end of this computation according to the value of Ctl2. To verify this interaction with the environment, we must include an environment component that receives the result in the local formula, but this component may perform two or zero operations depending on whether or not this multiplication was the last computation of the inner product. Accordingly, there is no suitable period function for the inner iteration.
Conclusions
The sequence of operations performed by each component of a speed-independent design without output choice is independent of the relative timing of the various components in the design. We exploited this observation by introducing local formulas. Using these formulas, data-path circuits are naturally specified according to the relationships that must hold between the sequences of values input and output by the circuit. This leads to an automatic verification approach that can be applied to designs of significant complexity. In our implementation, speed-independence is verified by model checking each component separately, and the local formulas are verified by applying symbolic trajectory evaluation to an equivalent, synchronous model of the circuit behaviour.
Many asynchronous designs from the literature satisfy the speed-independence conditions used in this paper. In particular, we have demonstrated the practicality of our approach by verifying a vector multiplier chip from the literature. We are exploring ways to allow period functions that depend on data values to overcome the restrictions described in section 4. We are also interested in applying these techniques to circuits with timing dependencies. If the circuit is hazard-free, then it should have the same properties with respect to local formulas as a speed-independent circuit. However, the simple model checking techniques that we used to show that circuits were speed-independent cannot be used to show that a circuit is hazard-free.
