Through the use of compositionality and abstraction, it is possible to extend automatic model checking techniques so that large circuits can be verified. This paper presents a case study verification of Benchmark 22 of the IFIP WG10.5 Benchmark Suite for Hardware Verification (a systolic array multiplier containing 115 000 gates). Both the timing and functionality of the circuit are verified (a significant error was discovered in the original benchmark). This illustrates that an appropriate logical framework can support an efficient, integrated tool for verification that incorporates a number of different verification techniques. A specialised theorem prover implements a compositional theory based on symbolic trajectory evaluation (STE). STE, an efficient model checking technique that can support large state spaces because of its natural and easily used method of abstraction, provides the underlying computational engine. The rest of the compositional theory allows a human verifier to use knowledge of the structure of the circuit to overcome some of the computational limitations of model checking. Using STE with its compositional theory, large circuits can be verified in detail using reasonable computational resources.
Logical framework
This section presents the logical framework of our verification methodology. Section 2.1 presents symbolic trajectory evaluation (STE). Section 2.2 describes a compositional theory based on STE that can be used effectively to overcome computational limitations of fully-automatic methods. Section 2.3 discusses the practical aspects of a tool that implements the theory.
Symbolic trajectory evaluation
STE is an automatic model checking algorithm originally proposed by Bryant and Seger [2, 17], and later generalised and extended by us [7, 8]. The essential components of STE are described below.
Circuits are represented by a model structure comprising, among other things, a state space S and a next-state function Y; this model structure is extracted automatically from the circuit description.
Properties that the verifier wants to check are described by formulas of a temporal logic, TL. The semantics of TL is given with respect to sequences of states by a satisfaction relation: for each sequence (e.g. for each run of the system), the satisfaction relation says to what degree the sequence satisfies the formula.
Verification conditions are expressed as $\models \langle\!|\, g \Longrightarrow h \,|\!\rangle$. The TL formula g, called the antecedent, can be thought of as representing the input to the circuit, and the formula h, the consequent, as representing the output of the circuit. $\models \langle\!|\, g \Longrightarrow h \,|\!\rangle$ holds if every trajectory (every run of the model) that satisfies the formula g also satisfies the formula h. These verification conditions are called assertions. STE is an efficient algorithm for verifying assertions.
For the purpose of this paper all that is important to note is that the method of model representation supports abstraction in a very natural way. Through the choice of antecedent, the verifier can very naturally affect the amount and level of information used by the STE algorithm. For each assertion, therefore, an abstraction of the circuit is implicitly created. Not only does this lead to a computationally effective approach, but since the construction of the abstraction is implicit, it does not place a burden on the human verifier.
STE supports a good model of time, which allows timing properties of circuits to be checked. We can check timing at different levels: for example, we can check that output is produced in the correct clock cycle, as well as checking timing within each cycle.
© Designing Correct Circuits, 1996
A full description of the syntax and semantics of TL is not necessary. The only syntax a reader should understand is formulas of the form $\mathrm{Global}\,[(f_0, t_0), \ldots, (f_n, t_n)]\; g$, which asserts that the property g holds between times $f_i$ and $t_i$ (inclusive) for $i = 0, \ldots, n$.
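The reading of Global formulas can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: time is taken to be discrete, a run is a finite list of states, and satisfaction is Boolean (the TL satisfaction relation described above is richer). The names `holds_global`, `trace` and `clock_low` are hypothetical and are not part of Voss or FL.

```python
def holds_global(intervals, prop, trace):
    """Check that prop holds at every time in every inclusive interval (f_i, t_i).

    intervals: list of (f_i, t_i) pairs, both ends inclusive
    prop:      predicate on a single state
    trace:     list of states, indexed by time
    """
    for f, t in intervals:
        for time in range(f, t + 1):
            # A time beyond the end of the trace cannot be confirmed.
            if time >= len(trace) or not prop(trace[time]):
                return False
    return True


# Hypothetical run: a node "clk" that is low for 100 steps, then high for 100.
trace = [{"clk": 0}] * 100 + [{"clk": 1}] * 100


def clock_low(state):
    return state["clk"] == 0


print(holds_global([(0, 99)], clock_low, trace))   # interval ends before the clock rises
print(holds_global([(0, 100)], clock_low, trace))  # interval includes the rising edge
```

Because the bounds are inclusive, extending an interval by a single time step is enough to change the result.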
Technical details
The state space S is structured as a lattice. A full discussion of this is beyond the scope of this paper; the rest of this section briefly sketches some of the technical details, and more detail can be found in [7, 8, 17]. The partial order of the lattice is an information ordering: the higher up in the lattice, the more is known about the state of the system. We can think of a state as being an abstraction of the states above it in the state space, so if s is less than $s_1$ and $s_2$, it contains the common features of $s_1$ and $s_2$ and suppresses their differences.
The computational advantage of this approach is that, given the right logical framework, proving that a property holds of a state s implies that the property also holds of all states above s in the information order. The partial order on states can be extended to sequences of states. One of the fundamental results of STE is that, given a TL formula, there is a set (often a very small set) of minimal sequences that characterise the formula. An assertion, $\models \langle\!|\, g \Longrightarrow h \,|\!\rangle$, is verified by computing the characterising sets for g and h, and then comparing these two sets.
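The information ordering can be illustrated on a single gate. The following Python sketch assumes a three-valued node domain {X, 0, 1}, with X (unknown) below both 0 and 1; this is a simplification for illustration and not the full lattice used by the tool. The names `leq`, `and3` and `refinements` are hypothetical.

```python
from itertools import product

X = "X"  # unknown value: the bottom of the information ordering


def leq(a, b):
    """True if a carries no more information than b (X lies below 0 and 1)."""
    return a == X or a == b


def and3(a, b):
    """Three-valued AND: a definite 0 on either input forces the output to 0."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def refinements(v):
    """All fully defined values at or above v in the information ordering."""
    return [0, 1] if v == X else [v]


VALUES = [X, 0, 1]

# Monotonicity: giving the gate more information never contradicts
# what it already computed from less information.
monotone = all(
    leq(and3(a, b), and3(a2, b2))
    for a, b, a2, b2 in product(VALUES, repeat=4)
    if leq(a, a2) and leq(b, b2)
)

# One abstract evaluation with inputs (0, X) ...
abstract_out = and3(0, X)
# ... covers every concrete input pair above (0, X).
concrete_outs = {and3(a, b) for a in refinements(0) for b in refinements(X)}

print(monotone, abstract_out, concrete_outs)
```

A single evaluation on the abstract input (0, X) settles the output for both concrete inputs (0, 0) and (0, 1), which is the kind of saving the information ordering provides.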
Compositional Theory
STE has the property that the cost of verification is dependent more on the formula being checked than the size of the circuit. For this reason, there can often be a substantial computational benefit obtained by breaking the formula down into 'smaller' formulas, checking the smaller formulas, and then combining the smaller results. The compositional theory allows us to do that.
The compositional theory is a set of inference rules for inferring verification assertions. The idea is that the verification problem will be broken down into a number of smaller sub-problems. The individual problems will be verified with STE; these results will then be combined using the inference rules. Note that the circuit is not partitioned: while it is important to be able to identify some structure in the circuit for the compositional approach to be applicable, it is not necessary to decompose the circuit. The basic compositional rules are shown in Table 1; the side conditions are conditions that can be checked mechanically. A full description of the compositional theory is beyond the scope of the paper (see [7]). The example below illustrates a few of the important compositional rules.
Example
In this section, we present some of the compositional rules in an example taken from the larger verification presented later. We have two cells (one labelled 0:0 and one 1:1), each of which contains a multiplier, an adder and some clock-driven latches. Figure 1 gives a pictorial description of one cell. For the sake of simplicity, the example ignores clocking, which is integral to the proof done later. These cells are part of Benchmark 22, in which sixteen of these cells are combined in a systolic array. The C Out output of Cell 1:1 is connected to the C In input of Cell 0:0, as shown in Figure 2.
In the second clock cycle, the value on C Out$_{11}$ is $a_1 b_1$, and the inputs on A In$_{00}$ and B In$_{00}$ are, for example, $a_2$ and $b_2$. Now Result 3 is true for all possible inputs, so in particular it is true for the particular inputs at that time.
Tool
The prototype tools that we built to test the methodology described here used the Voss system [16] as their core. The three key features of Voss are: (1) a symbolic simulator which provides good models of gate-level and switch-level circuits; (2) an efficient implementation of ordered binary decision diagrams (BDDs); and (3) a lazy functional language called FL (a dialect of ML), which acts as the interface to the verification system.
Voss can perform trajectory evaluation on assertions with antecedents and consequents of a simple structure. We have extended Voss so that it can verify a richer set of consequents; in particular, negation and disjunction are fully supported. The compositional theory has also been implemented as a simple specialised theorem prover, which provides a set of inference rules to the user. One of the inference rules performs STE; the others implement the compositional rules. The theorem prover also provides some automated assistance to the user to simplify the proof process, and the proof rules can be packaged in different ways. For example, a common pattern in a proof is to combine two results by specialising one of them and then using transitivity. We can combine these two steps into one step, as well as use a heuristic to find the necessary specialisation automatically. The extended Voss system including these features is called the VossProver.
FL is very important as a flexible interface to the verification system, providing a good framework for us as tool builders to extend Voss, and a good way for a verifier to interact with the tool. It is good for verifiers because it provides a simple yet powerful means of using the verification system, while at the same time promoting the safety of results since each step of the proof is mechanically checked.
The proof of correctness is written as an FL program. The verifier can work completely interactively with Voss using FL as a command language, or run a batch file, or some mixture of the two. Typically, the verifier starts working interactively with Voss, building up an FL program as a proof script that is later run as a batch file.
One of the key factors making efficient verification possible is appropriate use of data structures. BDDs are critical in this, as STE relies on their efficient implementation. However, BDDs do have their limitations. We use a symbolic representation of data such as integers. When performing STE, data is (automatically) translated into appropriate BDD structures. However, the symbolic representation allows us to use other ways of reasoning about and manipulating data.
Example Verification: Benchmark 17, an integer multiplier
The major case study presented in this paper is a systolic matrix multiplier which contains, among other components, sixteen integer multipliers. Because the verification of a multiplier is in itself interesting and is an important part of the overall verification, the verification of the multiplier is outlined here. The purpose of this section is just to sketch how the verification takes place, to give background for Section 4, which describes our methodology more fully.
Consider one stage of the multiplier. We show that if it has input c from stage 2, and external inputs a and b[2], then it computes as output $c + 2^2 a\,b[2]$. This is efficiently proved using STE alone, and then the result is stored symbolically.
We specialise the result to show that, when given its actual input (stage 2 produces $a\,b[1..0]$), the stage outputs $a\,b[1..0] + 2^2 a\,b[2]$.
This is rewritten by the VossProver, using its knowledge of integers, as $a\,b[2..0]$.
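The integer reasoning in the rewriting step can be checked numerically. The Python sketch below assumes a stage that adds $2^2 \cdot a \cdot b[2]$ to its input c, and that the preceding stages deliver $a \cdot b[1..0]$; both are assumptions about the form of the stage made only for this illustration. The helper names `bits` and `stage_output` are hypothetical and are not part of the VossProver, which reasons symbolically rather than by enumeration.

```python
def bits(b, hi, lo):
    """Integer value of the bit field b[hi..lo], with bit lo as the least significant bit."""
    return (b >> lo) & ((1 << (hi - lo + 1)) - 1)


def stage_output(c, a, b_bit, weight):
    """Assumed behaviour of one stage: add the weighted partial product to c."""
    return c + (1 << weight) * a * b_bit


def specialised_result_matches(width=4):
    """Check the rewrite a*b[1..0] + 2^2*a*b[2] == a*b[2..0] for all width-bit inputs."""
    for a in range(1 << width):
        for b in range(1 << width):
            c = a * bits(b, 1, 0)                      # actual input from the earlier stages
            out = stage_output(c, a, bits(b, 2, 2), 2)  # specialised stage result
            if out != a * bits(b, 2, 0):
                return False
    return True


print(specialised_result_matches())
```

Exhaustive checking is only feasible here because the bit width is tiny; the point of storing results symbolically is that the same identity can be applied at any width without enumeration.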
This process is repeated for each stage of the multiplier. The actual proof must take into account timing and is more complicated because the implementation uses redundant representation of data at most stages in the circuit in order to reduce the circuit costs.
The important parts to note about the proof are:
STE is used at the lowest level of the circuit. This removes tedium from the verifier. In the verification of a particular stage, the choice of antecedent implicitly determines the level of abstraction of the circuit. BDDs are used for STE.
A good model of time is provided. Results are stored symbolically -this means we can use the power of BDDs where useful, and use other reasoning methods in other cases. In the proof, the VossProver uses knowledge of integer arithmetic a number of times after specialisation has been used.
Compositional rules are used to combine results proved using STE. Each step in the proof is sound, and mechanically checked. The VossProver can help in the proof by finding candidate specialisations automatically.
A complete proof of one 32-bit multiplier takes about 170s on a DEC Alpha 3000.
Using this example, we can now summarise our methodology. STE is used to verify properties wherever possible to reduce the load on the human verifier. For each verification condition checked, an abstraction is automatically created, which makes the method tractable even for large circuits. And since the abstraction is implicitly created by the STE algorithm, no undue load is put on the human verifier. The human verifier then uses the specialised theorem prover that implements the compositional theory to combine 'smaller' verification conditions into 'larger' ones. Full details of verification of multipliers using this methodology can be found in [7, 9, 10].
Case Study: Systolic Array
This section describes how our methodology is used to verify a large circuit. We describe the specification and implementation of the circuit, and then describe the verification. Finally, we make some comments on the lessons of the verification.
Specification
A filter circuit, based on a design of Mead and Conway, is Benchmark 22 of the IFIP WG10.5 suite [13] .
The filter is a matrix multiplication circuit for band matrices (matrices of band width w must have zeros in certain positions and a maximum of w non-zero integer entries in any row or column). This circuit is called 2Syst (a full description can be found using the URL ftp://goethe.ira.uka.de/pub/benchmarks/2Syst/). The suite documentation gives a specification of the circuit for multiplying 4 × 4 matrices of band width w = 4. Let A, B and C be the 4 × 4 matrices given below: A =
The 2Syst circuit has four inputs for coefficients of the A matrix (labelled a0 to a3), four inputs for the coefficients of B (labelled b0, ..., b3), seven outputs for the coefficients of C (labelled c0 to c6), and a clock input. The timing of when and where the inputs must be applied and the outputs become available is critical. The essential part of the circuit specification is a table which shows when and where the inputs must be applied, and when and where the outputs may be found.
Implementation
Matrix multiplication (C = AB) is computed using the following formula: $c_{ij} = \sum_k a_{ik} b_{kj}$.
Thus, the cell has two purposes: it acts as a one clock-cycle delay buffer for coefficients of the matrices (which are passed on to neighbouring cells), and it performs the basic operation of an addition and a multiplication. The multiplier is Benchmark 17 in the benchmark suite described previously, and the adder is a conventional 2n-bit adder. Each register has an input, an output, a clock pin and a select pin. By connecting the select and clock pins to the same global clock, the registers become positive-edge triggered: when the clock rises, the value at the register's input is latched, output, and maintained until the clock rises again.
These cells are connected in a systolic array: in each clock cycle, cells perform an addition and a multiplication and then pass their results to their neighbours for use in the next cycle. The cells are arranged as presented in Figure 2. The timings in the specification are designed so that cells get the right inputs at the right time. To help the description, each cell in the systolic array has been labelled by i:j.
The circuit is implemented in Voss's EXE format as a detailed gate level description, using a unit delay model. The implementation is based on the VHDL program given in the benchmark suite documentation.
Verification
The verification of this circuit for moderate bit widths is well beyond the capability of any automatic model checking method. For example, most successful model checking techniques use BDDs. Since BDDs cannot represent multiplication efficiently, the verification of even one of the cells using model checking techniques alone is out of the question.
To use the methodology proposed here, the key step is to partition the problem into smaller problems. It is necessary to use knowledge of the structure of the circuit to do this (although it isn't necessary to partition the circuit itself). The idea is to break the problem down into the largest sub-problems that can be solved using STE alone, and then to combine the mini-solutions using the compositional rules.
What makes this approach effective is that we use theorem proving and STE at the right level using appropriate data structures. The theorem prover uses symbolic representation and domain knowledge in the application of the compositional rules to verify higher level properties of the circuit. STE can use the power of BDDs when performing low level verification. Moreover, each time trajectory evaluation is performed, only as much information as necessary is used. Through the choice of antecedent, the user implicitly and easily performs abstraction. The use of a tool with a flexible interface integrating both theorem proving and STE makes this all practical.
Here, the verification task can be divided into two parts, the verification of the individual components, and using the verification of the components to show that the whole array is correct.
Verifying the cells
The verification of a cell must show the multiplier, adder and registers all work correctly. Each cell must be verified individually. The verification of each cell itself is a challenging exercise since the cells perform multiplication, which means that we have to break the verification of each cell down into smaller problems.
This section describes the verification of Cell u:v, and assumes for the sake of this exposition that the clock cycle is 200ns, and the bit-width is 4. When the cells are connected together, port C In$_{uv}$ is connected to C Out$_{u+1,v+1}$, port A Out$_{uv}$ is connected to A In$_{u,v+1}$, and B Out$_{uv}$ is connected to B In$_{u+1,v}$.
It turns out that it is useful to divide this proof into three parts:
Given value a on A In$_{uv}$, b on B In$_{uv}$, and c on C Out$_{u+1,v+1}$, one clock cycle later ab + c appears on C Out$_{uv}$;
Given value a on A In$_{uv}$, one cycle later a appears on A In$_{u,v+1}$; and
Given value b on B In$_{uv}$, one cycle later b appears on B In$_{u+1,v}$.
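The three properties describe the intended behaviour of a cell over one clock cycle. The Python sketch below is a purely functional model of that behaviour, not a model of the gate-level circuit: it ignores timing within the cycle and uses unbounded integers rather than fixed bit widths. The names `CellOutputs` and `cell_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellOutputs:
    c_out: int  # a*b + c, available one clock cycle after the inputs
    a_out: int  # a, passed on to the neighbouring cell's A In port
    b_out: int  # b, passed on to the neighbouring cell's B In port


def cell_step(a, b, c_in):
    """Intended behaviour of one cell over one clock cycle (timing ignored)."""
    return CellOutputs(c_out=a * b + c_in, a_out=a, b_out=b)


# Two connected cells over two cycles: C Out of Cell 1:1 feeds C In of Cell 0:0.
# The input values and the initial c_in of 0 are arbitrary illustrative choices.
first = cell_step(a=3, b=5, c_in=0)             # Cell 1:1, first cycle
second = cell_step(a=2, b=7, c_in=first.c_out)  # Cell 0:0, second cycle

print(first.c_out, second.c_out)
```

Keeping the three properties separate in the proof corresponds to being able to use `c_out`, `a_out` and `b_out` independently when results about neighbouring cells are combined.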
Of course, it is possible to combine all three into one stronger result. However, having three weaker results makes the proof more flexible. The costliest part of the proof is to show that the multiplier works correctly, the proof of which was sketched earlier; the verification must be done for each multiplier in each cell. When this is complete, we know that, for each u and v:
The antecedent says that the inputs of the cell are held stable for 100ns (the first half of the clock cycle), and that valid output appears 22ns from the start of the cycle (this is determined by the circuit and is obviously dependent on the bit width). In the cell, the clock has an important effect; to include information about when clocking happens, the rule of consequence is often used to strengthen the antecedent of a result. For convenience, a TL formula is defined that captures the information about clocking needed in verifying the circuit in the k-th cycle. Recall that each cycle is 200ns long. This TL formula says that at the start of the k-th cycle (i.e. from time 200k), the clock is low for 100ns, then high for 100ns, and then low for 100ns (at the start of the (k + 1)-th cycle).
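The clocking information described above can be written down as a simple check on a waveform. The following Python sketch assumes integer nanosecond time steps and half-open intervals (for instance, low from 200k up to but not including 200k + 100); the exact interval bounds of the TL formula are not reproduced here, so these are assumptions. The names `clock_level` and `clock_formula_holds` are hypothetical.

```python
PERIOD = 200  # clock cycle length in ns, as assumed in the exposition


def clock_level(t, period=PERIOD):
    """Ideal clock: low in the first half of each cycle, high in the second half."""
    return 0 if (t % period) < period // 2 else 1


def clock_formula_holds(k, level, period=PERIOD):
    """Check the clocking condition for the k-th cycle against a waveform.

    From time period*k the clock must be low for half a period, then high
    for half a period, then low again for half a period.
    """
    half = period // 2
    start = k * period
    low_first = all(level(t) == 0 for t in range(start, start + half))
    high_next = all(level(t) == 1 for t in range(start + half, start + period))
    low_again = all(level(t) == 0 for t in range(start + period, start + period + half))
    return low_first and high_next and low_again


def stuck_low(t):
    """A faulty clock that never rises."""
    return 0


print(clock_formula_holds(0, clock_level), clock_formula_holds(3, clock_level))
print(clock_formula_holds(0, stuck_low))
```

The third interval overlaps the start of the next cycle, which mirrors the way the formula for cycle k also constrains the beginning of cycle k + 1.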
In the next step we show that the adder works correctly and that the output of the adder is latched for the appropriate time. This can be done with one trajectory evaluation. Note that the time interval in the consequent could be made bigger, but the one given suffices. 
Overall verification
Once each of the cells has been individually verified, the proofs about the individual cells must be combined to prove that the systolic array as a whole works correctly. The proof is modelled on how the systolic array computes its results; in its development the proof traces the behaviour of the circuit as it uses its inputs, computes results, and outputs the answers. 
Analysis and comments
The FL proof script which performs the proof uses the approach outlined above. First, the behaviour of each cell is individually verified. Then, the proof proceeds by proving properties of the circuit in each clock cycle.
A two-dimensional array of proofs is kept, indexed by (u, v), as the proof steps through the clock cycles. The FL proof script uses STE and the inference rules to prove what the output of the circuit is at different stages.
The first time the verification was attempted, a significant error was discovered in the benchmark (the specification gave wrong cycles for many outputs), which has subsequently been corrected as a result of this work. The proof script, including the proof of the correctness of all the multipliers and declarations, is approximately 650 lines long. The program itself is straightforward, although the use of a two-dimensional array does not show off a functional, interpreted language at its best. The complete verification of a 4 × 4 systolic array of 32-bit multipliers takes just under three hours on a DEC Alpha 3000 with 512M of memory (11 hours on a Sun 10/51 with 64M of memory).
The verification is of a detailed gate-level description of a particular circuit. Unlike other approaches (e.g. [14]), we must verify each multiplier of a given bit-width individually. Nor do we exploit the regularity of design in order to reduce computational cost (i.e. we verify each multiplier in the systolic array). Clearly, there are computational disadvantages to this, as we cannot just verify one parameterised design. On the other hand, this approach means that we can deal with timing easily; this is important since the timing characteristics of a circuit can be very sensitive to the bit-width. Moreover, from a proof development point of view, this is not a problem since proofs can easily be reused (the proof scripts for a 4-bit multiplier and a 64-bit multiplier differ in one line only). The incorporation of inductive methods is certainly possible.
As this verification was done at the same time as circuit implementation and system development, it is difficult to estimate how much human time it took; our very rough estimate is that it took a week of work. In order to use our system effectively, it is important to be a competent FL programmer, and this will be the most important part of the learning curve for an engineer. The theorem prover itself is fairly simple, with only a limited number of proof rules. It is also possible to use Voss in a graduated way, starting off using it as a symbolic simulator, then moving to verification using symbolic trajectory evaluation, and then starting to use the compositional theory in increasingly sophisticated ways.
Conclusion
The verification presented here shows a number of things. First, it provides further evidence that STE is a suitable basis for verification of very large state spaces, since abstraction is naturally supported by the lattice-structure of state space. This means that STE can be applied directly to models with large state spaces.
This paper has shown that using compositional model checking and abstraction, implemented in an integrated tool providing both STE and theorem proving, is a practical verification technique for large circuits. In the case study shown, a circuit containing 115 000 gates was verified, with reasonable computational and human cost. Importantly, both functionality and timing properties are proved -an approach that ignored timing issues would not have detected the serious problem in the original Benchmark documentation. We have applied this methodology to other case studies to confirm the method's general applicability.
