Abstract
Introduction
As microprocessor designs become more complex, the cost of validation becomes a larger fraction of the total design cost. Currently, validation consumes 25-30% of the design team and months of simulation time. Industry experts predict that there may soon be two or three validation engineers for every design engineer on major microprocessor design projects [Wi95].
Although today's theorem provers could be used, in theory, to formally verify modern processors, the time and expertise required would be prohibitively expensive (indeed, such an effort might increase the length of the design cycle). Advances in BDD-based verification methods are not closing the tremendous gap in complexity between modern commercial microprocessor designs and those designs that can be automatically verified. Burch and Dill [BD94b] proposed a new method for verifying microprocessor control circuitry. The method is based on a subset of first-order logic, specifically, the quantifier-free logic of equality with uninterpreted functions. This logic is appropriate for verification of microprocessor control because it allows abstraction of datapath values and operations. By contrast, propositional logic requires that individual bits be modeled explicitly.
Burch and Dill's verification method has two phases. The first phase compiles a behavioral description of the specification and implementation into a formula in the logic; if this formula is valid then the implementation is correct with respect to the specification. The second phase is a program that checks whether the formula is valid.
In this paper, we concentrate on the second phase of the method: validity checking. We describe fast, compact data structures that significantly speed up the inner loop of the validity checker. Our experiments also explore the trade-offs of using several different heuristics.
The logic studied here is a fundamental building-block for processor verification, and is useful for a variety of different verification methods. We believe that decidable logics which are more expressive than propositional logic (and programs to manipulate them) are going to be very important for verification as well as other CAD applications. Logics with uninterpreted functions are especially interesting, since the separation between control and datapath is fundamental to many design methods.
This research was partially supported by the Semiconductor Research Corporation under contract number 94-DJ-389. The first author is supported by a National Defense Science and Engineering Graduate Fellowship.
Although there have been many previous processor verification efforts, we discuss only those that are highly automated and have been applied to relatively complex designs. We believe that our method can deal with more complex designs than these previous efforts. For example, Beatty [Bea93] verified a switch-level non-pipelined processor description by using BDDs and symbolic simulation. In general, pipelining greatly increases the difficulty of the verification problem, so it is unclear whether Beatty's method, or other BDD-based methods, could cope with our design examples.
Although Bhagwati and Devadas [BD94a] claimed to verify a pipelined implementation of the DLX processor architecture (one of our examples) using BDDs and symbolic simulation, their method relies on the simplifying assumption that the pipelined implementation is k-definite. Although a correctly functioning pipeline may satisfy this assumption, design errors can result in behaviors that are not k-definite. Since the assumption is not checked, their method can miss a class of bugs that is of both theoretical and practical importance. We conjecture that eliminating the assumption of k-definiteness in this method would make verification of the pipelined DLX design infeasible.
Corella et al. [C+94] describe a canonical-form representation for expressions in a subset of first-order logic which is somewhat similar to ours. The method is based on iteratively computing a symbolic representation of the reachable states of the system, similar to BDD-based verifiers. However, iteration is generally much more difficult than symbolically simulating a small, fixed number of steps, so we believe that this method will not be able to handle large designs. Furthermore, the expressiveness of their logical representation is unclear.
Decision procedures for larger subsets of first-order logic have been studied by several others, generally in the context of more general theorem-proving systems. Nelson and Oppen [NO79] give a decision procedure for the quantifier-free theory of the real numbers under + and ≤, arrays, list structure, and equality with uninterpreted function symbols. Later extensions included congruence closure [NO80]. Shostak [Sho79] also implemented a general decision procedure for a quantifier-free logic richer than ours. These extensions to our subset of first-order logic are not necessary for verifying processor control. By using a more restricted logic, we can construct a faster validity checker.
There is an extensive literature on using general purpose theorem provers to verify processor designs, including recent work on verifying pipelined processors [Cyr93, SB90, SM95, Win95]. These methods require significantly more manual effort than our technique.
The logic
The quantifier-free logic of equality with uninterpreted functions is more expressive than propositional logic but less expressive than first-order logic. An example of a formula in the logic is:
The operator ite stands for "if-then-else". f is an "uninterpreted function" because we do not have in mind a particular meaning. This formula is true for every possible assignment of a function to f and values for a and b.
Our logic has the following abstract syntax: A function of no arguments is a variable, which can be written without the following parentheses. ite represents the if-then-else operator, which may appear as a formula (returning a Boolean value), or as a term. The ite operator together with the truth constants true and false is sufficient for representing all Boolean operators. The parser for our implementation macro-expands and, or, etc. into equivalent expressions in the above syntax.
Distinct constants are automatically assumed not to be equal unless they are identical. This feature is useful in processor verification, for example to represent distinct instruction opcodes. In this paper, distinct constants are distinguished with a leading "@"; for example, "@a" is a distinct constant.
It is helpful when verifying processors to be able to reason about stores (memories) such as register files, caches, or main memory. Formally, a store is a function of one argument (the address). There is special support for stores in the logic, in the form of two special operations: read and write (similar to the select and store operators used by Nelson and Oppen [NO79]). The expression read(store, addr) is the value at address addr of store store. The expression write(store, addr, val) is the store that has the value val at address addr, and the same values as store for all other addresses.
Note that the logic's view of stores is very abstract. A store contains no information about the sizes of its addresses or values. If a design can be proved correct under this model, then it is correct for any actual implementation with a known memory size.
An expression in the logic is said to be atomic if it does not contain any ite or write operations.
Validity checking
The symbolic simulation step of the verification procedure generates a logical expression which should be valid, meaning that it is true under every interpretation. Checking for validity, in essence, covers all of the cases that must be analyzed to ensure that the processor works for every possible instruction sequence and initial state.
The validity checking problem is a generalization of the tautology checking problem for propositional formulas. However, the validity checking problem for our logic is more difficult, because it must take into account additional properties of equality and functions. For example, if it is known that a = b and b = c, then it is known that a = c and that f(a) = f(c).
Propositional case
First, let us consider a straightforward validity checker, based on Shannon decomposition, for propositional formulas (the subset of our logic that does not have equality, function symbols, or predicate symbols). The algorithm of Figure 1 can be used to check such formulas, once certain functions are explained. Indeed, many tautology checkers for propositional formulas are based on this procedure [LCDM89] .
For propositional formulas, a context is a truth assignment to a subset of the propositional variables. The atomic formulas in our logic are equivalent to propositional variables. An atomic formula can be asserted in a context (which assigns a propositional variable the value true) or denied (which assigns it the value false). PushContext creates a copy of the current context which can be modified without corrupting previous contexts. PopContext discards the current context and restores the one that was current just before the corresponding PushContext.
The function FindSplitter(e) returns a propositional variable that is a subformula of e (the choice of formula may be important for efficiency but doesn't matter for correctness). Note that variables which have been asserted (denied) will have been replaced with true (false) during the Fold operation.
The validity checker recursively checks the validity of the formulas obtained by asserting the splitter and folding, and by denying the splitter and folding. If the validity checking function is called with an atomic formula which folds to something other than true, a falsifying truth assignment to the atomic propositions can be constructed from the current context and the residual formula. Otherwise, the procedure eventually terminates, having shown that the formula is true under every truth assignment.
Full logic
The validity checker for the full logic is also described by Figure 1, except that the basic data structures and procedures called are somewhat more complex.
The most significant extension is to the definition of a context. The contexts for the full logic capture properties of equality. Our implementation of these contexts is particularly efficient and will be discussed in more detail below.
As in the propositional case, contexts store assertions about atomic formulas, and a given context may or may not determine the truth or falsity of an atomic formula. However, in the full logic atomic formulas may be equalities or predicate formulas, where none of the subterms contain ite or write operations. Whenever the atomic formulas that have been asserted or denied in a context imply or contradict another atomic formula, the data structure is guaranteed to detect and report this fact, except for some omissions noted below.
The functions FindSplitter and IsAtomic are as described for the propositional case. Fold replaces each expression it encounters with the simplest equivalent expression in the current context. Simplest defines a total order on expressions, and expressions are never simpler than their subexpressions. The truth constants true and false are the least elements, and are simpler than all others. This behavior subsumes the description of Fold in the propositional case.
This behavior of Fold also ensures that, when α = β holds in the current context, expressions f(α) and f(β) will be replaced by the same expression (since there is only one simplest expression among the expressions equivalent to α and β). The function Fold memoizes its results for a given context, so its complexity is linear in the size of the DAG representing its argument.
Expressions containing reads and writes require a small amount of additional consideration. The validity checker is generally unable to directly prove the equivalence of two stores. Instead, whenever a user wishes to check α = β, where α and β are expressions yielding store values (e.g. writes), the expressions are transformed into the form read(α, arb-addr) = read(β, arb-addr), where arb-addr is a new constant name, distinct from all others in the expression. This formula is valid iff the original is valid, since it asserts that the values of the stores for an arbitrary address must be the same.
Finally, the following transformation is performed automatically by the validity checker:
read(write(s, α, v), β) ⟹ ite(α = β, v, read(s, β)).
Otherwise, read and write can be treated like any other function symbols (in actuality, there are certain heuristics that manipulate read and write). This transformation is sufficient to eliminate all writes during the process of checking.
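As an illustration of how repeated application of this rule removes writes from under reads, the following sketch applies it bottom-up to a tuple-encoded expression. The encoding and the names push_reads, read, write, eq and ite are assumptions of the sketch; the heuristics mentioned above are not modeled.

```python
# Hypothetical constructors for tuple-encoded expressions.
def ite(c, t, e): return ("ite", c, t, e)
def eq(a, b): return ("=", a, b)
def read(s, a): return ("read", s, a)
def write(s, a, v): return ("write", s, a, v)

def push_reads(e):
    """Apply read(write(s, a, v), b) => ite(a = b, v, read(s, b)) bottom-up."""
    if not isinstance(e, tuple):
        return e                                   # variable or constant
    e = (e[0], *[push_reads(x) for x in e[1:]])    # rewrite the children first
    if e[0] == "read":
        _, store, addr = e
        if isinstance(store, tuple) and store[0] == "write":
            _, s, a, v = store
            # The remaining read may again sit on top of a write, so recurse.
            return ite(eq(a, addr), v, push_reads(read(s, addr)))
    return e
```

Applied to a read of a store built from two nested writes, the result is a chain of two ite expressions ending in a read of the original store.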
Implementation of contexts
In our logic, expressions can have sub-expressions of arbitrary complexity. We have chosen to implement expressions so that they are unique: whenever two expressions are isomorphic, their storage is shared. Therefore, whenever two expressions are syntactically equivalent, their pointers are the same. As in BDD implementations, uniqueness is maintained through a global hash table of all expressions.
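A minimal sketch of such a unique table, assuming a hypothetical node class Expr and constructor mk (neither name comes from the implementation described here), is:

```python
class Expr:
    """An immutable expression node; build instances only through mk()."""
    __slots__ = ("op", "args")

    def __init__(self, op, args):
        self.op, self.args = op, args

# Global unique table: (operator, identities of the children) -> node.
# The table keeps every node alive, so the id() values used as keys stay valid.
_table = {}

def mk(op, *args):
    """Return the unique node for op(args), creating it only if necessary."""
    key = (op, tuple(id(a) for a in args))
    node = _table.get(key)
    if node is None:
        node = _table[key] = Expr(op, args)
    return node
```

Because children are themselves unique, hashing on their identities is enough, and structural equality of two expressions reduces to a pointer comparison (`is` in Python).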
Equality
A context keeps track of equivalences by using the well-known union/find algorithm [Tar75, CLR90], resulting in a data structure that is very fast and space-efficient, both in theory and in practice. The context maintains equivalence classes of expressions; two expressions are equivalent if Find returns the same value for each expression. Integers are often used to differentiate equivalence classes. Instead, we use one of the expressions in the class which we refer to as the ECRep (equivalence class representative).
The Find(e) operation returns the ECRep of an equivalence class. The Union(e1, e2) function merges the equivalence classes of two expressions. The equivalence of two expressions can be quickly determined with two Find operations.
We augment union/find by associating certain contextual information with equivalence classes in fields of the ECRep. This information is updated during Union operations. The simplest expression in an equivalence class, used in the Fold operation above, is always in a field of the representative of an equivalence class. There is also a Boolean flag associated with the ECRep which is true iff there is a distinct constant in the equivalence class. This facilitates a quick way to recognize an inconsistency when two equivalence classes with distinct constants are merged. We implement true and false as distinct constants.
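The augmented union/find structure might look roughly like the sketch below. Expressions are plain strings here, the class name EqContext is hypothetical, and the "simplest" order (shorter string first, then alphabetical) is a stand-in chosen for the example, not the order used by the validity checker.

```python
def simpler_key(e):
    # Assumed stand-in for the total "simplest" order on expressions.
    return (len(e), e)

class EqContext:
    def __init__(self):
        self.parent = {}        # expression -> parent in the union/find forest
        self.simplest = {}      # ECRep -> simplest member of its class
        self.has_distinct = {}  # ECRep -> True iff the class holds a distinct constant

    def _add(self, e):
        if e not in self.parent:
            self.parent[e] = e
            self.simplest[e] = e
            self.has_distinct[e] = e.startswith("@")

    def find(self, e):
        """Return the ECRep of e, compressing the path on the way."""
        self._add(e)
        root = e
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[e] != root:
            self.parent[e], e = root, self.parent[e]
        return root

    def union(self, e1, e2):
        """Merge two classes; return False if the merge is inconsistent."""
        r1, r2 = self.find(e1), self.find(e2)
        if r1 == r2:
            return True
        if self.has_distinct[r1] and self.has_distinct[r2]:
            return False        # two different distinct constants would be equal
        self.parent[r1] = r2
        self.simplest[r2] = min(self.simplest[r1], self.simplest[r2], key=simpler_key)
        self.has_distinct[r2] = self.has_distinct[r1] or self.has_distinct[r2]
        return True
```

Each class contains at most one distinct constant, so two different classes that both have the flag set necessarily contain different constants; checking the two flags is therefore enough to detect the inconsistency.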
The current validity checker is not a complete decision procedure for the logic, because we do not provide full congruence closure in the contexts. Congruence closure deals with the interactions of functions with equality [NO80]. For example, the validity checker fails to prove that f³(x) = x and f⁵(x) = x imply the equality f(x) = x. However, the validity checker is sound: it cannot report that a formula is valid when it is not (a false positive). The omission of congruence closure was intentional, as it has not been necessary in our proofs and is computationally expensive.
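For reference, the implication in this example follows from the two hypotheses by repeated use of congruence (equal arguments give equal function values), which is exactly the reasoning that the contexts do not perform. Writing f^n for n applications of f, one possible derivation is:

```latex
\begin{align*}
f(x) &= f(f^5(x))   && \text{congruence applied to } x = f^5(x) \\
     &= f^3(f^3(x)) && \text{both sides are } f^6(x) \\
     &= f^3(x)      && \text{congruence applied to } f^3(x) = x \\
     &= x           && \text{hypothesis } f^3(x) = x
\end{align*}
```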
Disequalities
When α = β is denied, the result is a disequality¹. Disequalities are more difficult to handle than equalities. Our implementation makes use of a disequality table, which is a hash table used to store unordered pairs of expressions. At all times, it is known that e1 ≠ e2 iff (Find(e1), Find(e2)) appears in the disequality table. Hence, the disequality of two expressions can be checked very rapidly, in the time required for two Find operations and a hash-table lookup.
The most costly computation is updating the disequality table during a Union operation. Suppose Union(e1, e2) modifies the equivalence class representative for e1. Then, for every disequality in the table of the form (Find(e1), x), a new pair (Find(e2), x) must be entered into the table. To accelerate this operation, all disequalities involving an expression e are stored on a list pointed to by e (so that the ECReps point to lists of all the disequalities referencing them). This list must also be updated on each Union operation.
An assertion of disequality is inconsistent with the current context iff it results in an attempt to enter (e, e) in the disequality table, for some expression e.
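The interplay between Union and the disequality table can be sketched as follows. The class and method names are hypothetical, expressions are plain strings, and Find omits path compression for brevity. One deliberate simplification: this sketch deletes the stale pair when it re-keys a disequality under the new representative, whereas the text above only requires that the new pair be entered.

```python
class DiseqContext:
    def __init__(self):
        self.parent = {}   # union/find forest
        self.table = set() # disequality table: frozensets {rep1, rep2}
        self.lists = {}    # rep -> reps it is known to differ from

    def find(self, e):
        self.parent.setdefault(e, e)
        while self.parent[e] != e:
            e = self.parent[e]
        return e

    def known_diseq(self, e1, e2):
        """Two Find operations and one hash-table lookup."""
        return frozenset((self.find(e1), self.find(e2))) in self.table

    def _enter(self, r1, r2):
        pair = frozenset((r1, r2))
        if pair not in self.table:
            self.table.add(pair)
            self.lists.setdefault(r1, []).append(r2)
            self.lists.setdefault(r2, []).append(r1)

    def assert_diseq(self, e1, e2):
        """Record e1 != e2; return False if that is inconsistent."""
        r1, r2 = self.find(e1), self.find(e2)
        if r1 == r2:
            return False   # this would enter (e, e) in the table
        self._enter(r1, r2)
        return True

    def union(self, e1, e2):
        """Merge the classes of e1 and e2; return False if inconsistent."""
        r1, r2 = self.find(e1), self.find(e2)
        if r1 == r2:
            return True
        if frozenset((r1, r2)) in self.table:
            return False
        self.parent[r1] = r2
        for x in self.lists.pop(r1, []):
            # Re-key every disequality (r1, x) under the new representative r2.
            self.table.discard(frozenset((r1, x)))
            self.lists[x].remove(r1)
            self._enter(r2, x)
        return True
```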
Contexts and backtracking
The validity checking algorithm requires the ability to assert a formula in a context, then "undo" the assertion so that it can then be denied. Nelson implemented this by carefully removing the assertion from the context [Nel81]. We have a different solution to this problem.
Our solution maintains a global stack of context records. Per-expression contextual information, including the ECRep, fields holding the simplest expression and distinct bit, and the list of disequalities involving the expression, are isolated into a distinct record which we call an ACInfo (for "assumption context information"). Each ACInfo has a pointer back to its context record, and each context record has an ACInfoChain, a list of all the ACInfos associated with it.
When information in the ACInfo is to be changed, it is first checked whether the ACInfo points back to the current context. If not, a copy of the ACInfo is made whose context pointer points to the current context record. A pointer to the previous ACInfo is stored in a field of the new one. Figure 2 illustrates the data structure. PopContext iterates over the ACInfoChain of the context record being popped, discarding the current ACInfos and restoring the previous ones.
Each context record also has a list of all the disequality table entries that were defined in the context. PopContext removes all of the entries on this list from the disequality table. PushContext creates a fresh context, preserving all previous context information.
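The copy-on-write scheme can be sketched as below. This is a simplified illustration under stated assumptions: expressions are strings, the current ACInfo of each expression lives in a dictionary rather than in a field of the expression, Find does no path compression (so that it never writes), the "simplest" order is a stand-in, and the distinct bit and disequality lists are left out.

```python
class ACInfo:
    """Per-expression data that depends on the assumption context."""
    def __init__(self, ctx, expr, prev=None):
        self.ctx = ctx                                   # owning context record
        self.prev = prev                                 # ACInfo shadowed by this one
        self.parent = prev.parent if prev else None      # None: expr is its own ECRep
        self.simplest = prev.simplest if prev else expr  # meaningful on an ECRep only

class ContextRecord:
    def __init__(self):
        self.acinfo_chain = []   # (expr, ACInfo) pairs created in this context

class Contexts:
    def __init__(self):
        self.stack = [ContextRecord()]
        self.info = {}           # expr -> current ACInfo

    def push_context(self):
        self.stack.append(ContextRecord())

    def pop_context(self):
        # Discard the ACInfos made in the popped context, restoring the old ones.
        for expr, ac in reversed(self.stack.pop().acinfo_chain):
            if ac.prev is None:
                del self.info[expr]
            else:
                self.info[expr] = ac.prev

    def _writable(self, expr):
        """Return an ACInfo for expr that belongs to the current context."""
        cur, ac = self.stack[-1], self.info.get(expr)
        if ac is None or ac.ctx is not cur:
            ac = self.info[expr] = ACInfo(cur, expr, ac)
            cur.acinfo_chain.append((expr, ac))
        return ac

    def find(self, expr):
        while True:
            ac = self.info.get(expr)
            if ac is None or ac.parent is None:
                return expr
            expr = ac.parent

    def simplest(self, expr):
        rep = self.find(expr)
        ac = self.info.get(rep)
        return ac.simplest if ac else rep

    def union(self, e1, e2):
        r1, r2 = self.find(e1), self.find(e2)
        if r1 == r2:
            return
        best = min(self.simplest(r1), self.simplest(r2), key=lambda e: (len(e), e))
        self._writable(r1).parent = r2
        self._writable(r2).simplest = best
```

The cost of PopContext in this sketch is proportional to the number of ACInfos created since the matching PushContext, not to the total number of expressions.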
Heuristics
Because validity checking by case splitting is exponential, heuristics are essential for working on large problems. The usefulness of these heuristics varies with the example and the way the expressions are constructed in the symbolic simulator.
¹ As opposed to an inequality, such as a < 0.
The order in which splitting expressions are chosen greatly influences the number of steps required by the algorithm, similar to the way that BDDs are sensitive to variable ordering. The most effective splitter selection strategy we have discovered thus far is to search for splitters in large subexpressions first. This heuristic approximately doubles the performance of the validity checker for most of our examples.
As shown in Section 3.1, certain transformations are part of the framework primitives. Other transformations are more complex, and can be selectively enabled.
ITE transformations
Certain ite forms contain redundant information. Removing the redundancies when the formula is created is more efficient than removing them during calls to the validity checker. For example, consider the following two transformations:
ite(α, α, β) ⟹ ite(α, true, β)
ite(α, β, ite(α, γ, δ)) ⟹ ite(α, β, δ).
Another class of ite transformations which we have found to be useful involves recognizing a not in the if-part, and transforming the ite to remove the not: ite(not(α), β, γ) ⟹ ite(α, γ, β). These simple ite transformations result in incremental efficiency gains; 20% is typical for our examples.
One idea we should borrow from BDD implementations, but have not yet implemented, is the use of "typed pointers," which have a bit associated with them that changes the interpretation of the expression pointed-to from positive to negative.
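A constructor that applies these three ite simplifications when a formula is created might look like the following sketch. The tuple encoding and the name mk_ite are assumptions; only the three rules given above are implemented.

```python
TRUE, FALSE = ("true",), ("false",)

def mk_ite(c, t, e):
    """Build ite(c, t, e), removing redundancies at construction time."""
    if isinstance(c, tuple) and c[0] == "not":
        # ite(not(a), b, g) => ite(a, g, b)
        return mk_ite(c[1], e, t)
    if t == c:
        # ite(a, a, b) => ite(a, true, b)
        t = TRUE
    if isinstance(e, tuple) and e[0] == "ite" and e[1] == c:
        # ite(a, b, ite(a, g, d)) => ite(a, b, d)
        e = e[3]
    return ("ite", c, t, e)
```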
If-lifting
If-lifting above equalities "lifts" the ite if-part(s) out of an equality, moving the equality inside of the resulting ite:
(ite(α, β, γ) = ite(α, δ, ε)) ⟹ ite(α, (β = δ), (γ = ε))
The first transformation demonstrates if-lifting when both arguments of the equality are ites and have identical if-parts α. The second transformation is performed when only one side of the equality is an ite and @δ is a distinct constant (recall that @ distinguishes distinct constants). This transformation pushes the literals [...]. When enabled, this transformation occurs a significant number of times. Its effect on our examples is variable, affecting performance by ±40%. If-lifting must be performed with care, as it may destroy sharing of subexpressions. We have discovered that if-lifting in the general case (two ites with if-parts α1 and α2) results in an undesirable blow-up in the size of the resultant expressions.
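The two restricted forms of if-lifting can be sketched as a constructor for equalities. The tuple encoding and the name lift_eq are hypothetical. The first rule follows the transformation written out above; the exact shape of the second rule is an assumption reconstructed from its prose description, namely that an equality between an ite and a distinct constant is moved into both branches of the ite.

```python
def is_ite(e):
    return isinstance(e, tuple) and e[0] == "ite"

def is_distinct(e):
    return isinstance(e, str) and e.startswith("@")

def lift_eq(lhs, rhs):
    """Build lhs = rhs, lifting if-parts out of the equality where allowed."""
    if is_ite(lhs) and is_ite(rhs) and lhs[1] == rhs[1]:
        # Both sides are ites with the same if-part.
        _, a, b, g = lhs
        _, _, d, e = rhs
        return ("ite", a, ("=", b, d), ("=", g, e))
    if is_ite(rhs) and is_distinct(lhs):
        lhs, rhs = rhs, lhs          # equality is symmetric
    if is_ite(lhs) and is_distinct(rhs):
        # Assumed form of the second rule: push the distinct constant inward.
        _, a, b, g = lhs
        return ("ite", a, lift_eq(b, rhs), lift_eq(g, rhs))
    # General case (different if-parts): left alone to avoid blow-up.
    return ("=", lhs, rhs)
```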
Read and write transformations
In Section 3.2 we described a read transformation which is included in the validity checker to make it sufficiently complete. We have implemented other read and write transformations which improve efficiency. Consider:
ite((α = β), s, write(s, α, δ)) ⟹ write(write(s, α, δ), β, read(s, β))
Our testing indicates that this transformation results in a more desirable order of splitter selection, which ultimately results in fewer case splits. For our most complex examples, the validity checker does not finish without this transformation enabled. We are currently investigating the effects of several other transformations involving reads and writes.
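That the two sides of this transformation denote the same store can be checked exhaustively on small concrete stores. The sketch below models a store as a Python dictionary and is only a sanity check on finite instances, not a proof for the abstract logic; the function names and the default sizes are choices made for the example.

```python
from itertools import product

def write(store, addr, val):
    """Return a copy of store with addr mapped to val."""
    new = dict(store)
    new[addr] = val
    return new

def check_write_rule(n_addrs=3, n_vals=2):
    """Compare both sides of the rule for every small store, address and value."""
    addrs, vals = range(n_addrs), range(n_vals)
    for contents in product(vals, repeat=n_addrs):
        s = dict(zip(addrs, contents))
        for a, b, d in product(addrs, addrs, vals):
            lhs = s if a == b else write(s, a, d)   # ite((a = b), s, write(s, a, d))
            rhs = write(write(s, a, d), b, s[b])    # write(write(s, a, d), b, read(s, b))
            if lhs != rhs:
                return False
    return True
```

When α = β, the outer write restores the original value at that address, giving back s; otherwise the outer write stores at β the value that is already there, leaving write(s, α, δ) unchanged.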
Experimental results
We have done extensive testing of our validity checker with two major examples: a simple RISC processor as described by Burch and Dill [BD94b], and a processor being designed at Stanford University as part of the FLASH project [K+94]. For each state variable in the specification of each processor, we use the symbolic simulator to construct an appropriate formula which is then used as input to the validity checker (as was done by Burch and Dill).
The RISC processor is a subset of the DLX architecture [HP90]. The subset we have verified has six instruction types: ALU immediate, 3-register ALU, conditional branch, jump (unconditional branch), load and store. Our example has a 5-stage pipeline with a load interlock.
FLASH is a distributed memory multiprocessor system being developed at Stanford University. FLASH includes a custom memory and interconnect controller with a general purpose protocol processor (PP). PP is a MIPS-based, statically scheduled, fully pipelined, dual-issue RISC processor core with separate instruction and data caches and executes protocol code for shared memory and message passing. PP does not support virtual memory or precise exceptions. However, it employs simple branch prediction and load interlocks. Our PP description contains eight instruction classes: ALU immediate, 3-register ALU, branch-on-equal, jump, jump-register, jump-and-link, load, and store. The PP is a more complex processor and its model is significantly more detailed than the DLX model used by Burch and Dill [BD94b].
For DLX, our specification checks three state variables: the register file, data memory, and program counter. For PP, our specification checks five state variables: those in DLX plus a "next" version of the program counter and a taken-branch bit. The extra state is necessary in PP because of more complex branch instruction semantics.
The heuristics speed up the verification of the DLX processor significantly, and make the verification of PP possible. The timing results for the validity checker with various combinations of heuristics enabled are contained in Table 1.
Conclusion
There is obviously a great deal of additional work to be done to reduce the computational complexity (in practice) of the validity 
