Bounded Model Checking (BMC) searches for counterexamples of bounded length k to a property φ. If no such counterexample is found, k is increased. This process terminates when k exceeds the completeness threshold CT (i.e., k is sufficiently large to ensure that no counterexample exists) or when the SAT procedure exceeds its time or memory bounds. However, for most practical instances the completeness threshold is too large or too hard to compute. Hardware designers often modify their designs to obtain better verification and testing results. This paper presents an automated technique based on cut-point insertion that obtains an over-approximation of the model that 1) preserves safety properties and 2) has a CT small enough to actually prove φ using BMC. The algorithm uses proof-based abstraction refinement to remove spurious counterexamples.
Introduction
In the hardware industry, formal verification is well established. Introduced in 1981, Model Checking [10, 12] is one of the most commonly used formal verification techniques in a commercial setting. However, it suffers from the state explosion problem. In the case of BDD-based symbolic model checking, this problem manifests itself in the form of unmanageably large BDDs [7].
This problem is partly addressed by a formal verification technique called Bounded Model Checking (BMC) [6], introduced by Biere et al. In BMC, the transition relation of a complex model M and its specification φ are jointly unwound up to a depth k to obtain a formula, which is then checked for satisfiability using a propositional SAT procedure such as Chaff [25]. If φ is a safety property, the formula is satisfiable iff there exists a counterexample of length k, i.e., M ⊭_k φ. If not, k is increased to search for longer counterexamples. This process terminates when the SAT procedure exceeds its time or memory bounds, when a counterexample is found, or when k exceeds a completeness threshold CT [17]. In the latter case, k is sufficiently large to ensure that no counterexample exists, and thus we conclude M ⊨ φ. BMC has been used successfully to find subtle errors in very large industrial circuits [27, 14].
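The loop just described can be made concrete with a minimal explicit-state sketch. It is not SAT-based: a breadth-first search stands in for the SAT check of the k-step unwinding, `budget` stands in for the solver's time or memory limit, and the names `has_counterexample` and `bmc` are hypothetical. Unlike a real BMC instance, each check covers all paths of length at most k at once.

```python
from collections import deque

def has_counterexample(init, succ, bad, k):
    """True iff a bad state is reachable from `init` in at most k steps.
    Explicit-state stand-in for the SAT check of the k-step unwinding."""
    dist = {s: 0 for s in init}
    queue = deque(init)
    while queue:
        s = queue.popleft()
        if bad(s):
            return True
        if dist[s] < k:                      # do not expand beyond depth k
            for t in succ(s):
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
    return False

def bmc(init, succ, bad, ct, budget):
    """Increase k until a counterexample is found, k exceeds the completeness
    threshold `ct`, or k exceeds `budget` (stand-in for SAT resource limits)."""
    k = 0
    while True:
        if k > ct:
            return ("holds", None)           # no counterexample up to CT: M |= phi
        if k > budget:
            return ("unknown", k)            # resources exhausted
        if has_counterexample(init, succ, bad, k):
            return ("fails", k)
        k += 1

counter = lambda s: [(s + 1) % 8]            # 3-bit counter starting at 0
print(bmc({0}, counter, lambda s: s == 5, ct=7, budget=100))   # ('fails', 5)
```

With a state that is never reached (e.g., `s == 9`) and `ct=7`, the same call returns `('holds', None)` after k exceeds 7.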
The disadvantage of BMC is that it is typically only applicable to refutation: the best known completeness threshold for properties of type Gp is the reachability diameter of M, i.e., the longest shortest path from any initial state to any reachable state in the state graph. In practice, the diameter is usually too hard to compute and, furthermore, is often exponential in the number of state variables of the model. The recurrence diameter [6] is an over-approximation of the reachability diameter; however, it is still difficult to compute and typically much larger than the reachability diameter.
Thus, in practice, the principal method for proving safety properties is abstraction. Abstraction techniques reduce the state space by mapping the set of states of the actual, concrete system to an abstract, and smaller, set of states in a way that preserves the relevant behaviors of the system.
In the hardware domain, the most commonly used abstraction technique is localization reduction [19, 28, 8]. The abstract model M̂ is created from the given circuit by removing a large number of latches together with the logic required to compute their next state. The removed latches are called invisible latches; the latches remaining in the abstract model are called visible latches. The initial abstract model is created by marking the latches that appear in the property as visible and all other latches as invisible.
The abstract model is then passed to a model checker, typically a BDD-based one such as SMV. Localization reduction is a conservative over-approximation of the original circuit for reachability properties. This implies that if the abstraction satisfies the property, the property also holds on the original circuit. The drawback of the conservative abstraction is that when model checking of the abstraction fails, it may produce a counterexample that does not correspond to any concrete counterexample. This is called a spurious counterexample.
To determine whether the counterexample can be simulated on the concrete model, a Bounded Model Checking instance is typically formed: the concrete transition relation of the design and the given property are jointly unwound to obtain a Boolean formula. The number of unwinding steps is given by the length of the abstract counterexample. The Boolean formula is then checked for satisfiability using a SAT procedure [28]. The transitions of the abstract trace are sometimes added as constraints to reduce the search space; the disadvantage is that other counterexamples of the same length may then only be detected after additional refinement. If the instance is satisfiable, the counterexample is real and the algorithm terminates. If the instance is unsatisfiable, the abstract counterexample is spurious, and abstraction refinement has to be performed.
The basic idea of the abstraction refinement technique is to create a new abstract model which contains more detail (e.g., more visible latches) in order to prevent the spurious counterexample. This process is iterated until the property is either proved or disproved. There are numerous methods to refine the abstraction. If the abstract counterexample is used for refinement, the process is known as the Counterexample Guided Abstraction Refinement framework, or CEGAR for short [19, 2, 9, 15, 28] .
Thus, successful application of abstraction refinement with localization reduction usually requires three components:
(i) a BDD-based model checker that has enough capacity for the abstract model,
(ii) a Bounded Model Checker with enough capacity to perform the simulation of the abstract trace, and
(iii) a way to refine the abstraction in case the simulation fails.
In practice, despite the abstraction, the first step often turns out to be the bottleneck, especially if the property depends on many latches. This paper proposes the use of a technique that many hardware engineers apply informally and manually: if a design is too complex for either simulation or verification, engineers cut or partition the circuit. Formally, this corresponds to removing parts of the circuit and replacing the missing signals by non-deterministically chosen inputs. Such cut-points do not necessarily remove latches, and may also preserve logic that depends only on latches that were removed. The resulting circuit is an over-approximation of the original circuit with respect to safety properties.
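The effect of a single cut-point can be illustrated on a toy net-list. This sketch assumes a hypothetical representation in which each vertex maps to its type and its fan-in (a register's single fan-in is its next-state signal) and in which all registers reset to 0; none of these names come from the paper. Replacing the AND gate `g1` by a fresh input makes the set of reachable register states a superset of the original one, and the extra states are exactly the behaviors that can yield spurious counterexamples.

```python
from itertools import product

# Hypothetical net-list: vertex -> (type, fan-in tuple).
net = {
    "r0": ("reg", ("g0",)),
    "g0": ("inv", ("r0",)),        # r0 toggles every cycle
    "r1": ("reg", ("g1",)),
    "g1": ("and", ("r0", "r1")),   # r1 stays 0 forever when reset to 0
}

def insert_cut_point(netlist, signal):
    """Return a copy in which `signal` is driven by a fresh primary input."""
    cut = dict(netlist)
    cut[signal] = ("inp", ())
    return cut

def evaluate(netlist, v, regs, inputs, cache):
    """Value of vertex v under the given register and input values."""
    if v not in cache:
        kind, fanin = netlist[v]
        if kind == "reg":
            cache[v] = regs[v]
        elif kind == "inp":
            cache[v] = inputs[v]
        elif kind == "inv":
            cache[v] = not evaluate(netlist, fanin[0], regs, inputs, cache)
        else:  # "and"
            cache[v] = all(evaluate(netlist, u, regs, inputs, cache) for u in fanin)
    return cache[v]

def reachable(netlist):
    """Reachable register valuations (registers sorted by name), reset to 0."""
    regs = sorted(v for v, (kind, _) in netlist.items() if kind == "reg")
    inps = sorted(v for v, (kind, _) in netlist.items() if kind == "inp")
    start = tuple(False for _ in regs)
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for bits in product([False, True], repeat=len(inps)):
            r, i, cache = dict(zip(regs, state)), dict(zip(inps, bits)), {}
            nxt = tuple(evaluate(netlist, netlist[q][1][0], r, i, cache) for q in regs)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

print(len(reachable(net)))                          # 2: r1 is always False
print(len(reachable(insert_cut_point(net, "g1"))))  # 4: r1 is now unconstrained
```

A property such as "r1 is never 1" holds on the original net-list but fails on the abstraction, which is the kind of spurious trace the refinement loop has to remove.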
Contribution
This paper proposes to use cut-point insertion [18] to compute an abstract model M̂ with two features: 1) M̂ over-approximates M, and thus safety properties are preserved, and 2) we can syntactically (and thus efficiently) identify a completeness threshold CT that is small enough to allow BMC with bound CT. Thus, if no counterexample is found, we can conclude M ⊨ φ. If a counterexample is found, we check whether it is spurious. If so, the cut-points are refined to eliminate the spurious trace. Similar to [22], we use the proof of unsatisfiability of the failed simulation run for refinement.
We can therefore omit the BDD-based model checker from the abstraction refinement loop and rely on BMC as the only reasoning engine. This allows many properties to be proved with BMC alone.
Related Work
Baumgartner et al. [5] perform a structural analysis similar to the one used in this paper to obtain a completeness threshold. In contrast to the algorithm proposed here, they do not abstract the circuit in order to obtain a smaller completeness threshold. The results are extended in [4].
The concept of the completeness threshold for BMC was introduced in [17] . A completeness threshold for arbitrary LTL properties is given in [11] . Optimizations to the diameter test that take the predicates in the property into account are given in [1] .
Another popular technique for making BMC complete is to use BMC to prove an inductive invariant [26]. The technique uses constraints that enforce simple (i.e., loop-free) paths, similar to the constraints used in recurrence diameter tests.
Somenzi et al. [20] use such constraints to obtain a complete BMC to be used on an abstract model in an abstraction refinement framework. As noted in [20] , the depth that has to be searched using BMC can be exponentially larger than the reachability diameter.
Numerous methods have been proposed to refine an abstraction done by localization reduction. In [13] , Clarke et al. propose the use of ILP solvers and machine learning techniques to choose a suitable set of latches for the abstract model. Details on how to improve the simulation step beyond the basic BMC instance are given in [3] .
In [8] , Chauhan et al. propose to analyze the conflict graph of the failed BMC run to obtain refinement information. A similar approach is used by McMillan [22] : the unsatisfiable core of the failed BMC run is analyzed to obtain the new set of latches used for localization reduction. The abstract model is verified using BDDs.
The first complete model checking approach based on SAT without any abstraction is presented by McMillan in [21]. A SAT solver is modified to perform pre-image computation, and the approach enumerates the states in the pre-image. Explicit enumeration of individual states is avoided by enlarging each satisfying assignment using information derived from the conflict graph.
In [23] , McMillan presents the use of interpolants in order to obtain a complete model checker based on a BMC-like reasoning engine.
Outline
In Section 2, we provide background on bounded model checking, the completeness threshold, localization reduction, and automatic abstraction refinement. We describe the abstraction we apply in Section 3. Experimental results are reported in Section 4.
Background

The Completeness Threshold and the Diameter
Let M denote a finite transition system defined by a finite set of states S, a set of initial states I ⊆ S, and a transition relation R ⊆ S × S. By M ⊨ φ we denote that every computation of M satisfies the property φ, and by M ⊨_k φ we denote that no computation of length k or less violates φ.
Definition 2.1 The Completeness Threshold [17], denoted by CT, for a finite transition system M and a property φ, is any natural number such that if there is no computation of length CT that violates φ, then φ holds for every computation of M:
$$M \models_{CT} \varphi \;\Rightarrow\; M \models \varphi$$
If M ⊨ φ holds, then the smallest such CT is 0; otherwise, it is the length of the shortest counterexample. Thus, computing the smallest CT is as hard as determining whether M ⊨ φ holds. In practice, one therefore aims at computing over-approximations of the smallest CT.
Definition 2.2
The Diameter of a finite transition system M , denoted by d(M ), is the length of the longest shortest path (defined by its number of edges) between any two reachable states of M .
Definition 2.3
The Initialized Diameter of a finite transition system M , denoted by d I (M ), is the length of the longest shortest path from any initial state to any reachable state of M .
It was already observed in [6] that d(M ) is a sufficiently large bound to prove properties of the form AGp. This bound can be improved by using the initialized diameter d I (M ). A bound for properties of the form AFp was identified in [17] . A method to compute a CT for arbitrary LTL properties is found in [11] .
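For an explicitly given state graph, both diameters follow directly from Definitions 2.2 and 2.3 by breadth-first search. The sketch below uses hypothetical helper names, with `succ` returning the successors of a state; it is only feasible for small, explicit systems. Because initial states are themselves reachable, d_I(M) ≤ d(M), and the gap can be large: on a four-state cycle in which every state is initial, d_I(M) = 0 while d(M) = 3.

```python
from collections import deque

def bfs_distances(sources, succ):
    """Shortest distance (in edges) from the set `sources` to every reachable state."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        s = queue.popleft()
        for t in succ(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist

def initialized_diameter(init, succ):
    """d_I(M): longest shortest path from the initial states to a reachable state."""
    return max(bfs_distances(init, succ).values())

def diameter(init, succ):
    """d(M): longest shortest path between two reachable states (where a path exists)."""
    reach = bfs_distances(init, succ)
    return max(max(bfs_distances([s], succ).values()) for s in reach)

cycle4 = lambda s: [(s + 1) % 4]
print(initialized_diameter({0, 1, 2, 3}, cycle4), diameter({0, 1, 2, 3}, cycle4))  # 0 3
```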
Computing the Diameter
Testing whether a particular k is the diameter corresponds to a QBF instance. Despite the progress made by QBF solvers, attempts to solve such instances have failed so far. Biere et al. [6] suggested using SAT to compute the recurrence diameter, which is an over-approximation of the diameter. However, for most interesting circuits, the recurrence diameter is either too large or too hard to compute. Mneimneh and Sakallah [24] modify a SAT solver to compute the diameter by path enumeration.
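How far the recurrence diameter can overshoot is easy to see by brute force. The sketch below (hypothetical names, explicit-state, exponential-time depth-first search) computes the length of the longest loop-free path starting in an initial state, i.e., an initialized variant of the recurrence diameter, next to the initialized diameter. On a complete graph with n states, every state is reachable in one step, so d_I(M) = 1, but a loop-free path can visit all n states.

```python
from collections import deque

def initialized_diameter(init, succ):
    """Longest BFS distance from the initial states."""
    dist = {s: 0 for s in init}
    queue = deque(init)
    while queue:
        s = queue.popleft()
        for t in succ(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return max(dist.values())

def recurrence_diameter_from_init(init, succ):
    """Length of the longest loop-free path that starts in an initial state.
    Exhaustive DFS over simple paths: exponential in general."""
    best = 0

    def extend(s, on_path, length):
        nonlocal best
        best = max(best, length)
        for t in succ(s):
            if t not in on_path:             # keep the path loop-free
                on_path.add(t)
                extend(t, on_path, length + 1)
                on_path.remove(t)

    for s in init:
        extend(s, {s}, 0)
    return best

complete5 = lambda s: [t for t in range(5) if t != s]
print(initialized_diameter({0}, complete5), recurrence_diameter_from_init({0}, complete5))  # 1 4
```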
Over-Approximating the Diameter with Structural Analysis
Model checking is frequently applied to circuits, which are typically given as a net-list. Baumgartner et al. [5] suggest exploiting the structure of these net-lists to compute an over-approximation of the diameter.
Definition 2.4
A net-list is a directed graph (V, E, T), where V is a finite set of vertices, E ⊆ V × V is the set of edges, and T(v) is the type of vertex v ∈ V. The type is one of and (AND gate), inv (inverter), reg (register), and inp (primary input). Vertices of type and have in-degree at least one, vertices of type inv and reg have in-degree exactly one, and vertices of type inp have in-degree zero.
Notation
Given two vertices v_1 and v_2, we write v_1 → v_2 iff (v_1, v_2) ∈ E.
We write v_1 ⇝ v_2 iff there is a path from v_1 to v_2 in E that only goes through vertices (gates) of type {and, inv}. We require any such path to be acyclic, i.e., the logic between the latches must be combinational.
The definition of the semantics of such a net-list is straightforward. The conversion of circuits given in Verilog to such a net-list corresponds to synthesis.
Definition 2.5
The Latch Dependency Graph (LDG) of a net-list N = (V', E', T') is a directed graph (V, E), where V = {v ∈ V' | T'(v) = reg} is the set of latches in N, and there is an edge between two latches v_1 and v_2 in the LDG iff there is a path from v_1 to v_2 in N that only uses gates, i.e., v_1 ⇝ v_2.
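Definition 2.5 translates into a backward search from each latch through the gates that compute its next state. This sketch assumes a hypothetical dictionary representation of a net-list (vertex → (type, fan-in), a register's fan-in being its next-state signal); it records an LDG edge (v_1, v_2) for every latch v_1 reached from latch v_2 through and/inv gates only, and stops at primary inputs.

```python
def latch_dependency_graph(netlist):
    """Edges (v1, v2) of the LDG: latch v1 reaches latch v2 through gates only."""
    edges = set()
    for v2, (kind, fanin) in netlist.items():
        if kind != "reg":
            continue
        stack, seen = list(fanin), set()
        while stack:                          # backward search from v2's next-state logic
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            u_kind, u_fanin = netlist[u]
            if u_kind == "reg":
                edges.add((u, v2))            # a latch ends the gate-only path
            elif u_kind in ("and", "inv"):
                stack.extend(u_fanin)         # continue through combinational gates
            # primary inputs ("inp") end the search without adding an edge
    return edges

net = {
    "r0": ("reg", ("g0",)),
    "g0": ("inv", ("r0",)),
    "r1": ("reg", ("g1",)),
    "g1": ("and", ("r0", "r1")),
}
print(sorted(latch_dependency_graph(net)))   # [('r0', 'r0'), ('r0', 'r1'), ('r1', 'r1')]
```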
Definition 2.6
A Component inside a circuit is a connected subgraph of the LDG. The Component Graph is the graph generated by replacing each component by a single vertex.
We denote the bound we derive for the diameter of a component C by ∆(C). Obviously, 2^k is such a bound if k is the number of latches in C.
Fig. 1. Sequential composition of two components. A bound for the diameter of the composition is the sum of the individual diameters.
In [5], bounds for the diameter of various types of components are derived from the structure of the component, e.g., for ROMs, constant latches, and acyclic components. In particular, it is observed that the sum of the diameter bounds of two sequentially composed components is a bound for the composition:
Theorem 2.7 Let C_1 and C_2 be two components, and let ∆(C_1) and ∆(C_2) be bounds for the diameters of C_1 and C_2, respectively. The sum of the two bounds is a bound for the diameter of the sequential composition C_1 → C_2 (Figure 1):
$$d(C_1 \rightarrow C_2) \le \Delta(C_1) + \Delta(C_2)$$
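Theorem 2.7 can be sanity-checked by brute force on a tiny example. In this explicit-state sketch (hypothetical names), C_1 is a 2-stage shift register fed by a primary input and C_2 is a 1-bit register fed by a free input; in the composition C_1 → C_2, the register of C_2 reads the last stage of C_1 instead. All states are reachable from the all-zero state, and the diameters come out as 2, 1, and 3, so the sum bound is tight here.

```python
from collections import deque
from itertools import product

def diameter(states, succ):
    """d(M) over the given (reachable) states: BFS from every state."""
    best = 0
    for s in states:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for t in succ(u):
                if t not in dist:
                    dist[t] = dist[u] + 1
                    queue.append(t)
        best = max(best, max(dist.values()))
    return best

bits = (0, 1)
c1 = lambda s: [(x, s[0]) for x in bits]            # shift register: (a, b) -> (x, a)
c2 = lambda s: [(y,) for y in bits]                 # register loading a free input y
comp = lambda s: [(x, s[0], s[1]) for x in bits]    # C2 now loads C1's last stage b

d1 = diameter(list(product(bits, repeat=2)), c1)    # 2
d2 = diameter(list(product(bits, repeat=1)), c2)    # 1
d12 = diameter(list(product(bits, repeat=3)), comp) # 3
print(d1, d2, d12, d12 <= d1 + d2)                  # 2 1 3 True
```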
Abstraction via Cut-Point Insertion
Cut-point insertion corresponds to replacing a signal in the net-list by a new primary input [18]. The resulting circuit M̂ is an over-approximation of the original circuit M, and a conservative abstraction for reachability properties. As already noted in [4], a completeness threshold of M̂ is not a completeness threshold for M; the abstract circuit typically has a much smaller diameter. The diameter never increases by cut-point insertion.
A Complete BMC with Over-Approximation
Overview
Figure 2 shows an overview of the technique used in this paper. The algorithm follows the proof-based abstraction refinement loop used in [22]. As the abstraction technique, we use cut-point insertion as described in Section 2.3. As the initial abstraction, we insert cut-points such that all cycles in the net-list of the abstract model are eliminated. This results in a very small completeness threshold.
In contrast to most related papers that implement abstraction refinement, we do not use a BDD-based model checker to verify the abstract model M̂. Instead, we compute a completeness threshold CT of M̂, as described in detail in Section 3.2. We then perform BMC on M̂ with bound CT. If the property holds on M̂, we can conclude that it also holds on M, and the algorithm terminates. Otherwise, we obtain an abstract counterexample from the BMC run. The loop then proceeds as in the related work: the abstract counterexample is simulated on M, and if it is spurious, the abstraction is refined. The refinement step differs slightly and is described in Section 3.3.
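The loop can be summarized as a Python skeleton. The four callables are placeholders for the components described in this section (computing CT of M̂, running BMC on M̂, simulating an abstract trace on M, and refining the cut-points); their names and signatures are hypothetical, and `max_iter` is a safeguard added for this sketch that the text does not mention.

```python
def abstraction_refinement(cuts, compute_ct, bmc, simulate, refine, max_iter=100):
    """Refinement loop with BMC as the only reasoning engine.

    cuts               -> current cut-points, i.e., the abstract model M-hat
    compute_ct(cuts)   -> completeness threshold of M-hat
    bmc(cuts, k)       -> abstract counterexample of length <= k, or None
    simulate(trace)    -> (True, concrete trace) or (False, unsatisfiable core)
    refine(cuts, core) -> refined (smaller) set of cut-points
    """
    for _ in range(max_iter):
        ct = compute_ct(cuts)
        trace = bmc(cuts, ct)
        if trace is None:
            return ("holds", cuts)       # M-hat |= phi, hence M |= phi
        is_real, info = simulate(trace)
        if is_real:
            return ("fails", info)       # concrete counterexample
        cuts = refine(cuts, info)        # spurious: refine using the core
    return ("unknown", cuts)
```

Keeping the engines as parameters mirrors the structure of the loop in Figure 2 without committing to a particular SAT solver or net-list format.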
Computing CT in the Presence of Cycles
We extend the results of [5] to obtain a completeness threshold for a larger class of designs. The main obstacle to over-approximating the diameter is the presence of cycles in the latch dependency graph. For cycle-free components, the most important results are summarized in Section 2.2.
Thus, consider a circuit with cycles in the latch dependency graph. Such cycles are very common and typically arise from counters, or from forwarding logic in pipelined circuits. In order to over-approximate the diameter of such circuits, we define the concept of the weighted component graph.
Definition 3.1
The weighted component graph is a component graph (as in Definition 2.6) in which a weight ω is assigned to each edge. We write C_1 →^ω C_2 iff there is an edge from C_1 to C_2 with weight ω. Let V_1 denote the set of latches in C_1 that have a path to a latch in C_2 in the LDG, and let V_2 denote the latches in C_2 reached by these paths. The weight ω is the number of signals that connect V_1 and V_2.
As a special case, consider a circuit I with diameter ∆_I. We assume that the circuit can be represented by a pipeline with n := ∆_I stages. Now add a single-bit feedback loop (Figure 3), which yields circuit O. The signal that forms the feedback loop is computed in the last stage of I and used as an input of the first stage of I. There are arbitrary connections from stage i to stage i + 1, but no other connections are permitted.
Claim 3.2 Every reachable state of O can be reached from an initial state within 2 · n clock cycles, i.e., d_I(O) ≤ 2 · ∆_I.
A proof of Claim 3.2 is given in the appendix. This result can be generalized by eliminating the outer cycles bit by bit.
The proof re-arranges the components such that the chosen edge forms the back edge of the cycle and then applies Claim 3.2 k times, once for each of the k bits on that edge. If the cycle has more than one component, it is desirable to break the cycle on an edge with minimal weight, as the bound is exponential in the weight of that edge. This is depicted in Figure 5: the cycle can be removed using either edge. Figure 6 shows two cycles that share a component; such cycles cannot be removed with the method above. In this case, the diameter is approximated using the number of latches in all components.
Fig. 5. The cycle can be broken using either the edge with weight k or the edge with weight j. The edge with minimal weight should be chosen.
Fig. 6. Two cycles sharing a vertex C_2. The cycles cannot be removed.
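Claim 3.2 can be sanity-checked by brute force for tiny pipelines. The sketch below (hypothetical names, explicit-state BFS) builds an n-stage pipeline with one bit per stage whose first stage computes an arbitrary Boolean function f of a primary input and the feedback bit taken from the last stage, and checks d_I(O) ≤ 2n for all 16 functions f and n ≤ 4; without the feedback, such a pipeline has initialized diameter n. The helpers `cycle_bound` and `cheapest_edge` encode the generalization as read from the text: breaking a cycle on an edge of weight ω applies the single-bit argument ω times, so the bound grows as 2^ω and the edge of minimal weight is preferred.

```python
from collections import deque

def initialized_diameter(init, succ):
    """d_I: largest BFS distance from the initial state to a reachable state."""
    dist, queue = {init: 0}, deque([init])
    while queue:
        s = queue.popleft()
        for t in succ(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return max(dist.values())

def pipeline_with_feedback(n, f):
    """n one-bit stages; stage 0 computes f(input, last stage), the rest shift."""
    def succ(state):
        return [(f(x, state[-1]),) + state[:-1] for x in (0, 1)]
    return succ

def bool_fn(code):
    """Boolean function of two inputs whose truth table is the 4-bit `code`."""
    return lambda x, b: (code >> (2 * x + b)) & 1

for n in range(1, 5):
    for code in range(16):
        d = initialized_diameter((0,) * n, pipeline_with_feedback(n, bool_fn(code)))
        assert d <= 2 * n, (n, code, d)        # Claim 3.2 with Delta_I = n

def cycle_bound(delta, weight):
    """Bound after breaking a cycle on an edge with `weight` feedback bits,
    assuming one doubling (Claim 3.2) per bit."""
    return 2 ** weight * delta

def cheapest_edge(cycle_edges):
    """cycle_edges: (source, target, weight) triples; pick the minimal weight."""
    return min(cycle_edges, key=lambda e: e[2])
```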
Refining the Abstraction
If a spurious counterexample is detected, we obtain the unsatisfiable core of the BMC instance used for the simulation. As in [22], we identify the signals that occur in this core (in [22], latches are identified instead). We refine the abstraction by removing those cut-points that correspond to a signal found in the core. We do not introduce new cut-points.
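A minimal sketch of this refinement step, assuming that cut-points and the signals of the unsatisfiable core are both given as sets of signal names (a hypothetical representation). The check for an empty intersection is an addition of this sketch: if the core mentions no cut-point, this rule cannot make progress.

```python
def refine_cut_points(cut_points, core_signals):
    """Drop every cut-point whose signal appears in the unsatisfiable core.
    No new cut-points are introduced."""
    removed = cut_points & core_signals
    if not removed:
        # assumption of this sketch: treat "no cut-point in the core" as an error
        raise ValueError("unsatisfiable core mentions no cut-point")
    return cut_points - removed

print(refine_cut_points({"g1", "g2", "g3"}, {"g1", "g3", "r0"}))   # {'g2'}
```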
Experimental Results
We have implemented the algorithm described in Section 3 and make our implementation available for experimentation by other researchers. To determine its effectiveness, we apply the algorithm to various circuits already used in [16]. The benchmarks are taken from an implementation of an out-of-order RISC microprocessor with a Tomasulo scheduler. We compare the performance of the new algorithm with that of plain Bounded Model Checking. All experiments are performed on a 2.5 GHz Intel Xeon machine running Linux.
Plain Bounded Model Checking is used for refutation only, i.e., it cannot conclude that there is no error trace; instead, it checks the property up to a given number of cycles. In [16], the property checked was consistency with a C program. We check safety properties instead, which is easier. Table 1 summarizes the experimental results. A short description of each circuit can be found in [16].
In conclusion, traditional BMC typically outperforms the new algorithm if the property is to be refuted. This is to be expected, as refutation is done using a regular BMC instance in the refinement loop. However, the experiments also show the benefit of the technique if the property is to be shown. In many cases, the refinement loop can show the property with a small bound.
Conclusion
We present an abstraction refinement loop that relies on BMC as its only reasoning engine. We use cut-point insertion to obtain an abstract model with a small completeness threshold. The completeness threshold is over-approximated with a structural analysis that permits cyclic circuits. If the abstraction is too coarse, cut-points are removed, which results in less spurious behavior but also a larger completeness threshold. Our preliminary experimental results show that the technique performs well on circuits that implement a pipeline. We make our implementation available for experimentation by other researchers.
Future Work
The algorithm presented in this paper imposes severe restrictions on the shape of the latch dependency graph of the circuit. As future work, we plan to extend the algorithm that computes CT to allow arbitrary latch dependency graphs. We also plan to improve the refinement algorithm such that cut-points can also be added, not only removed.
Thus, a data value x ∈ D_i in stage i will produce γ_i(x) as the feedback bit once it arrives in the last stage:
$$\gamma_{n-1}(P_{n-1}(t)) = \gamma_i(P_i(t-j+i))$$
The proof of this lemma is easily done by induction on n − 1 − i. Let ι(t) denote the value of the primary inputs in cycle t. The value computed for the first stage depends only on the feedback bit and these primary inputs; we denote it by f_0(ι, γ) ∈ D_0.
Lemma A.3 The value in any stage at time t ≥ n depends only on the primary inputs and on the color of the same stage n cycles earlier.
Proof. First, observe that P_i(t) with t ≥ n depends only on the values of the primary inputs during cycle t − i − 1 and on the value of the feedback bit at time t − i − 1. By expanding the definition in Eq. A.2, we obtain:
$$P_i(t) = \cdots \circ f_0\big(\iota(t-i-1),\; \gamma_{n-1}(P_{n-1}(t-i-1))\big) \qquad \text{(A.5)}$$
Applying Lemma A.2 with j = n − 1 (and t − i − 1 in place of t) to the feedback term of Eq. A.5, we obtain:
$$\gamma_{n-1}(P_{n-1}(t-i-1)) = \gamma_i(P_i(t-i-1-(n-1)+i)) = \gamma_i(P_i(t-n))$$
Thus, P_i(t) depends only on primary inputs and on γ_i(P_i(t − n)), the color of stage i at time t − n, which proves the lemma. □
We now prove the main Claim 3.2.
Proof. [Claim 3.2] We show that we can bring the pipeline into any reachable state s within at most 2 · n clock cycles.
If s is reachable, there must be a path s_0, …, s_t of length t from an initial state s_0 to state s = s_t. If t ≤ 2 · n, there is nothing to show.
Otherwise, we bring the circuit into state s as follows: (1) We start with the same initial state s_0. (2) In the next n cycles, by picking appropriate primary inputs, we bring the pipeline into a state whose colors at time n match the colors in state s_{t−n}. Such primary inputs exist, since otherwise s_{t−n} would not be reachable. (3) In cycles n to 2n − 1, we bring the pipeline into the desired state by simply replaying the primary inputs used to obtain s_{t−n+1}, …, s_t. This is sufficient according to Lemma A.3.
□
