Abstract. This paper presents a novel approach to bounded model checking. We replace the SAT solver by an extended simulator of the circuit being verified. Compared to SAT-solving algorithms, our approach sacrifices some generality in selecting splitting variables and in the kinds of learning possible. In exchange, our approach enables compiled simulation of the circuit being verified, while our simulator extensions allow us to retain limited learning and conflict-directed backtracking. The result combines some of the raw speed of compiled simulation with some of the search-space pruning of SAT solvers. On example circuits, our preliminary implementation is competitive with state-of-the-art SAT solvers, and we provide intuition for when one method would be superior to the other. More importantly, our verification approach continuously knows its coverage of the search space, providing useful semi-formal verification results when full verification is infeasible. In some cases, very high coverage can be attained in a tiny fraction of the time required for full coverage by either our approach or SAT solving.
Introduction
Model checking [4, 10] has revolutionized formal hardware verification. The underlying engine for model checking has evolved from the original explicit state enumeration to symbolic model checking [3], and then bounded model checking [1]. Although none of these approaches strictly dominates the others, each new approach has enabled applying formal verification to problems that were previously intractable.
In this paper, we present a novel approach to bounded model checking. The basic bounded model checking construction reduces temporal logic model checking to the problem of finding a satisfying input assignment for a combinational circuit. Normally, this combinational circuit is converted to CNF and handed to a SAT solver. Our approach, in contrast, searches for a satisfying assignment by explicitly simulating input vectors on the constructed circuit. The advantage of a simulation-based engine is that the circuit itself can be compiled into efficient machine code, resulting in very fast simulation. Furthermore, our simulation-based engine can be easily extended to handle non-Boolean devices, such as tri-state drivers, whereas a SAT solver cannot. The obvious disadvantage of a simulation-based approach is the exponential number of possible input vectors. A key contribution of this work is our extended simulation algorithm that prunes the search space analogously to the learning and conflict-directed backtracking of modern SAT solvers, while still being amenable to compiled simulation.
As with previous model-checking innovations, our approach is inferior to existing methods on some types of problems. On other problems, though, our new approach is competitive with the state-of-the-art in bounded model checking. More importantly, our bounded model-checking engine continuously maintains a conservative bound on the fraction of the search space that has been verified, allowing our method to be used in a semi-formal manner when full, formal verification is infeasible. In some cases, very high coverage can be attained in a tiny fraction of the time required for full coverage by either our approach or SAT solving.
Background
Bounded model checking [1] forms the front-end for our verification approach, so we start with a brief review. Bounded model checking consists of three key insights. First, many practical verification properties are specified over finite-length sequences of states, so one can define a restricted (but still practically useful) temporal logic with only bounded temporal semantics. Doing so avoids expensive fixpoint computations in the model checking algorithms. Second, since the temporal logic has bounded-time semantics, it is possible to convert the temporal logic model checking problem into a non-temporal logic problem, and a bounded model checking algorithm for some temporal logic must specify how to perform this conversion for any formula in that logic. For example, to verify that pUq holds for the next three clock cycles in a sequential circuit, one could "unroll" the circuit three times, creating a purely combinational circuit with three copies of the inputs and outputs (one for each clock cycle), and then build a small combinational network to check that pUq holds in all three cycles. (See Figure 1.) The third key insight is that modern SAT solvers have become efficient enough to solve the resulting combinational problem in many instances of practical importance. This third insight is simply enabling technology for the practical relevance of bounded model checking and is not integral to the idea. Indeed, a similar approach has been reported using an ATPG tool rather than a SAT solver [2]. In the present work, we rely on the first two insights of bounded model checking, but replace the SAT solver with an engine that offers competitive performance (but with different strengths and weaknesses), and also provides coverage information to allow semi-formal, incomplete verification.
Although our method replaces the SAT solver, the motivation, algorithms, and weaknesses in our approach can be better understood against the backdrop of the techniques and inefficiencies in typical, modern Boolean SAT solvers. The field of Boolean satisfiability checking has a long and extensive research literature, but all of the leading, freely available, non-commercial SAT solvers used for bounded model checking (e.g., [8, 12, 9]) are based on the approach of Davis, Putnam, Logemann, and Loveland [6, 5]. The basic idea is to choose heuristically a good variable on which to case split, assign a value to that variable and propagate any constraints that can be logically deduced from the assignment, backtrack if our choices and deductions lead to an obviously unsatisfiable formula, and possibly learn relationships among the variables by memorizing variable choices that guarantee a non-satisfying truth assignment. This process is repeated until either a satisfying assignment is found or the entire search space has been exhausted.

Standard SAT solvers work on formulas in conjunctive normal form (CNF), so if we wish to find an input assignment that makes the output true, the typical translation creates the CNF formula:
The first three clauses ensure the AND gate behaves as an AND gate, the next three clauses handle the OR gate, and the last clause specifies that the output must be true. The last clause has only the single literal e. Such clauses are called "unit clauses", and all SAT solvers immediately assign unit clauses to their forced values, simplify the resulting formula, and look for newly generated unit clauses to continue this process. For example, after the unit clause e has been propagated, we get the simpler CNF formula:
At this point, the choice heuristic might choose to try making d true, and unit clause propagation will result in the satisfying assignment in which a and b are true as well.
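As a concrete illustration of unit propagation, the following minimal sketch encodes a small assumed circuit, d = AND(a, b) and e = OR(d, c), as seven clauses (three per gate plus the unit clause e) and propagates forced literals. The circuit, the integer literal encoding, and the names exampleCnf and propagateUnits are assumptions made for illustration; they are chosen to match the description above but are not taken from the original figure or formula.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <vector>

// A literal is a nonzero int: +v means variable v is true, -v means false.
using Clause = std::vector<int>;
using Assignment = std::map<int, bool>;

// Hypothetical variable numbering for the assumed circuit.
constexpr int A = 1, B = 2, C = 3, D = 4, E = 5;

// Clauses for d = AND(a, b), e = OR(d, c), plus the unit clause e.
std::vector<Clause> exampleCnf() {
    return {
        {-D, A}, {-D, B}, {D, -A, -B},  // d <-> a AND b
        {E, -D}, {E, -C}, {-E, D, C},   // e <-> d OR c
        {E}                             // the output must be true
    };
}

// Repeatedly assign literals forced by unit clauses.
// Returns false if some clause has all literals false (a conflict).
bool propagateUnits(const std::vector<Clause>& cnf, Assignment& asg) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Clause& cl : cnf) {
            bool satisfied = false;
            int unassigned = 0, lastFree = 0;
            for (int lit : cl) {
                assert(lit != 0);
                auto it = asg.find(std::abs(lit));
                if (it == asg.end()) { ++unassigned; lastFree = lit; }
                else if (it->second == (lit > 0)) { satisfied = true; break; }
            }
            if (satisfied) continue;
            if (unassigned == 0) return false;  // every literal is false
            if (unassigned == 1) {              // unit clause: force it
                asg[std::abs(lastFree)] = (lastFree > 0);
                changed = true;
            }
        }
    }
    return true;
}
```

Starting from an empty assignment, propagation fixes only e. After additionally deciding d to be true, a second call forces a and b, mirroring the steps described above, while c remains free.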
The basic SAT algorithm appears to be little more than an explicit search through the possible truth assignments. Progress on SAT solving, however, has produced intelligent heuristics for choosing the variables for case-splitting, faster implementations for propagating constraints, clever ways to backtrack more efficiently, and heuristics for adding new clauses in order to learn not to repeat previous mistakes [8, 12, 9] . The resulting tools can be amazingly efficient on many SAT instances.
Let us now compare SAT-solving to a brute-force attack for the problem of finding an input assignment that satisfies a combinational circuit. The brute-force approach would be to systematically try all possible input assignments to the circuit, evaluating the circuit on each input assignment and looking for a satisfying assignment. Such an approach actually has several advantages over the SAT solver. First, the search space is much smaller, corresponding to only the inputs of the circuit, rather than to all the variables the SAT solver uses to model the internal wires of the circuit. Next, given an input assignment, propagating the results of that assignment from inputs to outputs can be implemented extremely efficiently: for example, the circuit could be compiled into straight-line code that needs at most a few machine instructions to evaluate each gate. In contrast, constraint propagation for a SAT solver dominates the run time (over 90% [9]), and is slow, typically requiring several non-sequential (i.e., cache-miss-prone) memory accesses to walk through the data structures storing the formula, and several data-dependent (i.e., hard-to-predict) branches. On modern processors, the penalty for an L2 cache miss is around 50-100 cycles, and on a Pentium 4, the branch mispredict penalty is at least 19 cycles, so the compiled circuit simulation enjoys an enormous speed advantage. On the other hand, the SAT solver has several advantages over the brute-force attack. First, the SAT solver has the freedom to choose any variable in the system for case-splitting, and the choice of the right splitting variable can sometimes simplify a problem enormously. Empirical results, however, suggest that for bounded model checking, an excellent strategy is usually to choose the variables in a breadth-first manner moving exclusively forward from the inputs to the outputs, or exclusively backwards from the outputs to the inputs [11].
In the forward case, the strategy is essentially a very slow implementation of circuit simulation. The backward case, on the other hand, does give the SAT solver an option unavailable to the brute-force solver. The important advantages in favor of the SAT solver are the backtracking and learning strategies. In particular, modern SAT solvers use some form of non-chronologic or conflict-directed backtracking, in which the tool backtracks all the way back to a relevant decision that could avoid the unsatisfiable sub-problem, rather than simply to the most recent decision. Learning allows the SAT solver to remember combinations of decisions that led to unsatisfiable sub-problems, so that they can be avoided in the future. Our work essentially adds non-chronologic backtracking and learning to the brute-force solver, in a manner that still permits compiled simulation.
Verification Algorithm
We first present the brute-force compiled simulation algorithm, and then show how it can be modified to incorporate intelligent backtracking and learning.
Brute-Force Compiled Simulation
We assume we are given a gate-level sequential circuit, an initial state, a verification wire, and a time bound $k$. The verification problem is to find a sequence of inputs that causes the verification wire to be true at time $k$. Different bounded model checking constructions can be handled by pre-unrolling the circuit into a combinational circuit, and then using our algorithm with $k = 0$.
More formally, let $C$ be a sequential circuit with $n$ input variables $x_0, \ldots, x_{n-1}$ and $m$ state variables $s_0, \ldots, s_{m-1}$. We use superscripts to denote time indices, so the initial state $I$ is an assignment of Boolean values to $s_i^0$, for $i = 0, \ldots, m-1$. Label the verification wire $f$, so we seek an input sequence that causes $f^k = 1$.
The brute-force approach would take an instance of the verification problem and generate a program with the following structure:

The heart of the program is the simulate_circuit function. For a combinational circuit, the code generator declares a variable for each wire in the circuit, then does a topological sort on the gates, and generates code that evaluates the output of each gate as a function of its inputs. For example, if our circuit contains the gate a = AND(b,c), the emitted code could be as simple as:
    circuit.a.value = circuit.b.value & circuit.c.value;

which would compile to as little as one instruction and at most a few instructions in machine code. The generated simulator has no expensive data structure for storing and manipulating the circuit and performs no traversals over the circuit; instead, the only representation of the circuit is embedded in the evaluation code itself. To simulate several cycles of a sequential circuit, we simply simulate the next-state logic combinationally, update the state variables, and repeat. The simplest way to choose an untried vector and record unsuccessful trials is to count sequentially through all possible vectors. Obviously, we will need to introduce more effective ways to do this, but any method that makes progress on each iteration will produce the correct answer.
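A minimal sketch of such a brute-force program is given below, under stated assumptions. The toy circuit (inputs x and y, one latch s, gates g = AND(x, y) and f = AND(s, x), next state s = OR(s, g)) and the names simulateCycle, simulateCircuit, and bruteForceSearch are hypothetical; a real generated simulator would contain one straight-line statement per gate of the circuit under verification.

```cpp
#include <cassert>
#include <cstdint>

// Toy sequential circuit (an assumption for illustration):
// inputs x, y; one latch s with initial state 0.
//   g = AND(x, y);  f = AND(s, x);  next s = OR(s, g)
struct Circuit {
    bool x = false, y = false;  // primary inputs for the current cycle
    bool s = false;             // latch
    bool g = false, f = false;  // internal wire and verification wire
};

constexpr int kNumInputs = 2;  // n

// Straight-line evaluation of the combinational logic in topological order.
inline void simulateCycle(Circuit& c) {
    c.g = c.x & c.y;
    c.f = c.s & c.x;
}

// Simulate time steps 0..k on input vector v, where bit (t*n + i) of v is
// input i at time t. Returns the value of f at time k.
bool simulateCircuit(std::uint64_t v, int k) {
    Circuit c;  // the latch starts in the initial state
    for (int t = 0; t <= k; ++t) {
        c.x = (v >> (t * kNumInputs + 0)) & 1u;
        c.y = (v >> (t * kNumInputs + 1)) & 1u;
        simulateCycle(c);
        if (t < k) c.s = c.s | c.g;  // update the state for the next cycle
    }
    return c.f;
}

// Count sequentially through all vectors of N = n(k+1) bits.
// Returns the first satisfying vector, or -1 if none exists.
std::int64_t bruteForceSearch(int k) {
    const int numBits = kNumInputs * (k + 1);
    assert(numBits < 62 && "toy driver only handles small N");
    for (std::uint64_t v = 0; v < (std::uint64_t{1} << numBits); ++v) {
        if (simulateCircuit(v, k)) return static_cast<std::int64_t>(v);
    }
    return -1;
}
```

For this toy circuit, f can never be true at time 0 because the latch starts at 0, whereas at time 1 it is true exactly when x and y are both 1 at time 0 and x is 1 at time 1.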
Skip Cubes
The key idea behind the advanced backtracking techniques used in modern SAT solvers is that many of the decisions (assignments to variables) made before reaching a conflict have no effect at all on that conflict. Therefore, the backtrack should not bother revising irrelevant decisions. Analogously, we will now introduce a mechanism, which we call "skip cubes", by which the circuit simulator can tell which input variables did not affect the value of $f^k$. Note that this computation is done for the specific input vector being simulated, so this reduction is more specific than the cone-of-influence reduction, which can only eliminate portions of the circuit which do not affect $f^k$ for any possible input vector. For each vector simulated, therefore, we also compute a potentially large set of other vectors that are guaranteed to produce the same result and can therefore be pruned from further consideration.
Define the universal set $U$ to consist of all binary vectors of length $N = n(k+1)$.
An element $v = (v_0, \ldots, v_{N-1}) \in U$ corresponds to an input sequence over $k+1$ time steps with $x_i^t = v_{tn+i}$, so we will use the terms "vector" and "input sequence" interchangeably.
Starting in initial state $I$ and given some $v \in U$, let $w^t_v$ represent the value on a wire $w$ of the circuit $C$ at time step $t$.
Definition 1 (Skip Set) The skip set of a wire $w$ at time $t$ with respect to input vector $v$ is defined as $S_v(w^t) = \{ u \in U : w^t_u = w^t_v \}$.
Intuitively, $S_v(w^t)$ is the set of all vectors that cause $w^t$ to have the same value as when $C$ is simulated with the input sequence $v$. Specifically, simulating the circuit with a vector $v$ will drive $f^k$ to the same value as any other vector in $S_v(f^k)$, so if $f^k_v$ is false, we may skip any subset of these vectors when searching for a satisfying assignment.
Computing $S_v(w^t)$ for each gate output could be done in a straightforward manner at the same time that the output value is computed. For example, if $w$ is the output of an AND or NAND gate with inputs $a$ and $b$, then
Other gate rules are similar. Note, however, that $S_v(w^t)$ is simply either the on-set or the off-set of $w^t$, so any exact computation of skip sets amounts to computing the exact functionality of each wire, which will blow up for many practical examples. We may express a cube $B$ as a length-$N$ vector over a three-symbol alphabet (0, 1, and a symbol marking an unspecified bit), where $B_i$ is the specified value of bit $i$ if specified, or the unspecified symbol otherwise. We now define the skip cube of a wire:
Proof:
The intersection of two cubes is always either another cube or the empty set. The latter case occurs only if the two cubes disagree on at least one specified bit. In Definition 3, all specified bits always agree with the input vector v. 
The base cases are that the skip cube for a primary input wire has that input bit specified and all other bits unspecified, which is clearly in the skip set for that input wire, and that the skip cube for a latch at time 0 is completely unspecified, which is clearly in the skip set for that wire (because the inputs don't affect the reset state of the latch). For the inductive step, assume that the skip cubes for the inputs of any gate are contained in their skip sets, and therefore that any vector in that cube would not change the value of that input. Then, the skip cube for the output of the gate computed according to Definition 3 contains only vectors that would not change the value of that output, and are therefore contained in the skip set. This can be easily verified by a case analysis of all the rules.

We now consider how the computation of the skip cubes can be efficiently integrated into compiled simulation. During simulation of $C$ against input $v$, we store $v$ as a string of $N$ bits in memory, padded to the nearest machine word boundary. A skip cube $B$ can also be stored as a same-sized bit string, with bit $i$ equal to 1 in memory if and only if bit $i$ is specified in $B$. If $B_i$ is specified, the specified value is $v_i$ by Corollary 1 and is thus readily available. Note from Definition 3 that the skip cube computation propagates from the inputs to the outputs of each gate, exactly as the value computation does. Accordingly, the code generator can allocate a (value, skip cube) pair for each wire in the original circuit, and the simulate_circuit function will contain code to compute both the value and the skip cube for each wire. In most cases, computing the skip cube of $w^t$ is a straightforward copy of the array storing the skip cube of one of the gate inputs. The cases pertaining to a primary input or an initial state variable are also trivial to compute.
The cube intersection operations required in cases 1 and 5 of Definition 3 can be achieved by computing a bitwise OR of the bit strings for the respective gate input skip cubes. The max operation of case 1 is the only slower operation, consisting of selecting the skip cube with the fewer specified bits. We implement this step by performing a population count on the skip cube bit strings.
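The bit-string representation can be sketched as follows. This is a minimal illustration that assumes $N \le 64$, so a skip cube fits in one machine word, and the AND-gate rule shown is an assumed rule that is sound for skip cubes but is not claimed to reproduce the cases of Definition 3 verbatim. The names Wire, specifiedBits, primaryInput, and andGate are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// A wire carries its simulated value and a skip cube. The cube is stored as a
// mask: bit i is 1 iff input bit i is specified (its value is then v_i).
// Assumption: N <= 64, so a single machine word suffices.
struct Wire {
    bool value;
    std::uint64_t cube;
};

// Number of specified bits; fewer specified bits means a larger cube.
inline int specifiedBits(std::uint64_t cube) {
    return static_cast<int>(std::bitset<64>(cube).count());
}

// Primary input i: only its own bit is specified.
inline Wire primaryInput(std::uint64_t v, int i) {
    assert(i >= 0 && i < 64);
    return {((v >> i) & 1u) != 0, std::uint64_t{1} << i};
}

// Assumed AND-gate rule (a sound sketch, not necessarily Definition 3):
//  - both inputs 1: the output stays 1 only if both inputs stay 1, so the
//    cubes are intersected, which is a bitwise OR of the masks;
//  - exactly one input 0: that input alone forces the output, copy its cube;
//  - both inputs 0: either one suffices, keep the cube with fewer specified
//    bits (the larger cube).
inline Wire andGate(const Wire& a, const Wire& b) {
    if (a.value && b.value) return {true, a.cube | b.cube};
    if (!a.value && b.value) return {false, a.cube};
    if (a.value && !b.value) return {false, b.cube};
    return {false,
            specifiedBits(a.cube) <= specifiedBits(b.cube) ? a.cube : b.cube};
}
```

With this rule, if input bit 1 is 0 in the simulated vector, the output of AND(AND(x0, x2), x1) receives a skip cube in which only bit 1 is specified, so a single simulation eliminates every vector that has bit 1 equal to 0.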
For example, consider the circuit of Figure 2. This circuit has two latches $s$ and $r$ with initial states $s^0 = r^0 = 0$. The table gives the skip cubes for all relevant wires with respect to the input vector $v = 011110100$, where the bits of $v$ from left to right respectively give the input values for $x^0$, $y^0$, $z^0$, $x^1$, $y^1$, $z^1$, $x^2$, $y^2$, and $z^2$. The leftmost two columns give the wire name/time index and the bit value, respectively. The column labeled "skip cube" gives the skip cube for the wire. The rightmost two columns state the source of the skip cube and the rule from Definition 3 applied to obtain the skip cube. This example demonstrates the power of the skip cube technique. Suppose we wish to verify that latch $r$ must be 0 at time 2, i.e., $r^2 = 0$. Observe that the skip cube for $r^2$ is $A_v(r^2)$
Learning and Coverage
Upon simulating any vector $v$ and finding that the value at time $k$ of the verification wire $f^k_v$ is false, the skip cube $A_v(f^k)$ that we have simultaneously computed gives us a set of vectors that also would have made $f^k$ false. The search procedure should remember this skip cube to ensure that it will never again try any vectors in this cube, thereby pruning the search space analogously to the learning and non-chronologic backtracking of conventional SAT procedures. For example, if some input $x_i^0$ always causes the verification wire to be false regardless of the other inputs, the first vector that we simulate with $x_i^0$ true will generate a skip cube for $f^k$ that shows this fact, and our search procedure will never try any other vectors with $x_i^0$ true. Thus, the search procedure has effectively backtracked non-chronologically to the decision for $x_i^0$.

Fig. 2. Skip Cube Computation Example. We assume an input vector $v = 011110100$, which provides the values for the three inputs $x$, $y$, and $z$, over three time steps 0, 1, and 2. The table shows the skip cubes that are computed by our algorithm.
Our current implementation maintains a BDD which represents the covered set $V$ of input vectors that have been either explicitly simulated already or else covered via skip cubes. After each simulation iteration, the resulting skip cube $A_v(f^k)$ is disjoined into $V$, and the next simulation vector is chosen randomly from the complement set $\overline{V}$. Note that $V$ is not directly related to the functionality of the circuit under verification, and thus the BDD for $V$ does not necessarily blow up even when the BDD for the circuit would have exponential size. The algorithm completes when the entire space $U$ is covered, or when an input sequence is discovered to make $f^k$ true.
The use of a BDD to store the covered set has several advantages. First, all the operations needed by the algorithm can be done efficiently: converting a cube to a BDD, disjoining a BDD for the cube into the BDD for the covered set, and retrieving a random vector not in the covered set. BDDs also provide an easy way to incorporate input don't-cares into the verification method, by initializing $V$ to include all don't-care vectors. Most important is the fact that the ratio $c = |V| / |U|$ can be computed in time linear in the number of BDD nodes. The coverage ratio $c$ tells us the fraction of the search space that has been explored and eliminated by our algorithm. Periodically computing $c$ allows the algorithm to communicate a progress metric to the user. Furthermore, in the event of a time-out or space-out, the coverage ratio provides an informative verification result and increases confidence that the property being verified holds. Having an accurate measure of progress and coverage greatly enhances the usability of verification tools, especially on challenging problems that can't be verified quickly or completely.
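To make the bookkeeping concrete, the sketch below tracks the covered set and the coverage ratio for a tiny search space. It deliberately replaces the BDD with a plain list of cubes and computes coverage by exhaustive enumeration, which is workable only for very small $N$ and has none of the efficiency properties described above. It also returns the first uncovered vector rather than a random one. The names Cube, CoveredSet, coverage, and nextUncovered are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A cube over N <= 20 input bits: `mask` marks the specified bits and `bits`
// holds their values (unspecified positions are 0).
struct Cube {
    std::uint32_t mask;
    std::uint32_t bits;
    bool contains(std::uint32_t v) const { return (v & mask) == bits; }
};

// Covered set V, kept as an explicit list of cubes. This stands in for the
// BDD of the text and is deliberately naive.
class CoveredSet {
public:
    explicit CoveredSet(int numBits) : numBits_(numBits) {
        assert(numBits > 0 && numBits <= 20);
    }
    // Disjoin a skip cube into V.
    void add(const Cube& c) { cubes_.push_back(c); }
    bool covers(std::uint32_t v) const {
        for (const Cube& c : cubes_)
            if (c.contains(v)) return true;
        return false;
    }
    // Coverage ratio c = |V| / |U|, by exhaustive enumeration of U.
    double coverage() const {
        const std::uint32_t total = std::uint32_t{1} << numBits_;
        std::uint32_t covered = 0;
        for (std::uint32_t v = 0; v < total; ++v)
            if (covers(v)) ++covered;
        return static_cast<double>(covered) / total;
    }
    // First vector outside V, or -1 once V equals U.
    std::int64_t nextUncovered() const {
        const std::uint32_t total = std::uint32_t{1} << numBits_;
        for (std::uint32_t v = 0; v < total; ++v)
            if (!covers(v)) return v;
        return -1;
    }
private:
    int numBits_;
    std::vector<Cube> cubes_;
};
```

A search loop would repeatedly call nextUncovered, simulate the returned vector, stop if the verification wire is true, and otherwise add the resulting skip cube. The loop terminates with full coverage when nextUncovered returns -1.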
Experimental Results
We have implemented our algorithm to test its performance. The tool takes a sequential circuit in a slightly modified version of ISCAS89 format and outputs the simulator for that circuit as a C++ program. This translation step is virtually instantaneous. The simulator is then compiled and run to perform the verification. We report compile and run times for the simulator. All experiments were conducted on a PC with a 1.5 GHz Intel Pentium 4 processor and 1 GB of RDRAM. Memory usage is not reported, as it was never significant. The operating system was Linux 2.4.9, and the compiler was g++ version 2.95.2 using the -O3 optimization level. The compiler missed an obvious peephole optimization (two adjacent addl instructions modifying the stack pointer), so we used a simple Perl script to perform this optimization, resulting in a performance improvement of a couple of percent.
For comparison, we ran against a leading, free, non-commercial SAT solver for bounded model checking, Z-Chaff. Our experiments were conducted with version Z2001.2.17. We used our own translator from ISCAS89 format to CNF, but ignore the negligible translation time. We believe our translator produces CNF comparable to other bounded model checking tools. For example, the multiplier in Section 4.1 is closely modeled on the example presented by Biere et al. [1], and Chaff is able to solve our generated CNF formulas slightly faster than the ones supplied by Biere.
Original 16×16 Multiplier Example
Our first example is a 16×16-bit multiplier with 16-bit output. We designed this problem instance closely following the one reported in the original bounded model checking paper [1]. The specification verified was
where done is asserted when the output register has converged to the final value, overflow is asserted if the product exceeds 16 bits, and $out_b$ and $out'_b$ are the $b$th output bits of a reference combinational multiplier and the sequential multiplier under verification, respectively. Separate runs were performed for $b = 0, \ldots, 15$, and the time bound used in each case was $k = b + 1$. Table 1 gives the results for our tool and for Chaff. For semi-formal verification (rightmost three columns), our tool gives very high coverage extremely quickly. This illustrates the effectiveness of the skip cube propagation at quickly eliminating large parts of the search space. For complete, formal verification, our tool is competitive with Chaff up to bit 9, but then, surprisingly, the Chaff run times drop sharply. One normally expects output bit $n-1$ of an $n \times n$ multiplier to be the most difficult bit, but this curious behavior can be explained by the presence of overflow in the antecedent of the specification (Eq. 1) in conjunction with the time bound $b + 1$. Although the circuit correctly computes the values of all output bits for all input values, almost all input word pairs actually raise overflow, making the specification vacuously true. A SAT solver can propagate constraints backwards from the overflow flag, quickly pruning the circuit down to essentially an 8×8 multiplier, making the high-order bits easy to verify.

Table 2. Full-Sized 16×16 Multiplier Results with Specification (Eq. 2). "time" indicates timeout after 1 hour. When our tool times out, the attained coverage is indicated in parentheses. For bit 10, we actually ran Chaff to completion, which took over 17 hours. Our result for bit 10 with 0.9999 coverage is anomalous, taking slightly longer than full coverage. This might be explained by extra floating-point comparisons performed by our tool when a target coverage is specified, and we have also observed slightly different page fault behavior, but we are still investigating.
Full-Size 16×16 Multiplier Example
In the preceding multiplier example, the combinational reference multiplier is actually a full-size 16×16 multiplier with 32-bit output. Similarly, the sequential multiplier would correctly compute all 32 output bits if the output register were wider. Accordingly, we removed the overflow logic from the preceding example, creating a true, full-sized 16×16 multiplier, and verified the specification:

Table 2 presents the results for this experiment. This problem is much more difficult than that of Section 4.1. Here, we observe our approach running about as fast as Chaff for the low-order bits, and beating Chaff for bits 8, 9, and 10. For the higher-order bits, both tools time out, but the compiled simulator provides a high coverage while the SAT solver reveals no information.
SRT Divider Example
Our last experiment is the most difficult. We verify a 2n-bit by n-bit radix-2 SRT divider with redundant quotient representation [7] against a combinational divider. The specification asserts that if the SRT divisor is normalized, and if the combinational divider does not overflow, then the two dividers produce the same result. In particular, we verify all bits of the quotient and remainder in a single run.
The results for $n = 4, \ldots, 8$ are given in Table 3, which clearly demonstrates that our approach is more effective than Chaff on this problem. For $n = 7$, we find our approach to be almost 4 times faster than Chaff even when compilation time is included. For $n = 8$, both tools time out (again set at 1 hour), but our tool reports the coverage attained.
Conclusion and Future Work
We have presented a novel approach to bounded model checking. Our search procedure has competitive performance with state-of-the-art SAT solvers on many problems. Intuition and experimental results suggest that SAT solvers have the advantage on smaller circuits and on circuits in which clever case-splitting heuristics can quickly establish unsatisfiability, whereas our new method has the advantage for larger circuits that aren't amenable to such attacks. Furthermore, our method continually provides coverage information, which is useful as a progress indicator for lengthy verification runs, and as a semi-formal verification result for runs that time out. Our work provides a valuable additional tool for model checking when other methods (e.g., BDDs, SAT) fail.
We believe our implementation could be substantially optimized. For example, our implementation generates C++, which introduces many inefficiencies. A production tool should generate the simulator machine code directly, bypassing the compiler, which is not tuned for the very large, simply structured simulate_circuit function that we generate. There is little need for global optimizations, so the code generation would be straightforward.
Further algorithmic research could explore various design tradeoffs. For example, we could compute more conservative approximations of the skip cubes using branchless code, which might run faster, but need more vectors. Alternatively, we could compute more accurate approximations of the skip sets, reducing the number of vectors needed, but slowing down the simulation as well. In some cases, it would be useful to shift between strategies, starting with skip cubes, for example, and then switching to an alternative if the skip cubes become too small.
From a theoretical perspective, we would like to understand what factors influence the rate of convergence of the coverage ratio. Intuitively, if there exists a vector that produces a large skip cube, it is plausible that many other vectors (such as the other vectors in the skip cube) would also generate a large skip cube, so large skip cubes would be covered early. If this intuition is true, one could estimate statistically the total run time based on the first few coverage ratios computed, which would further enhance the usability of the model checker.
