Abstract-The state explosion problem limits formal verification to small- or medium-sized sequential circuits, partly because BDD sizes depend heavily on the number of variables involved. In the worst case, BDD size grows exponentially with the number of variables, so reducing this number can increase verification capacity. In particular, this paper shows how sequential equivalence checking can be done in the sum state space.
I. INTRODUCTION

SEQUENTIAL equivalence checking plays a crucial role in VLSI design to ensure functional correctness. It has advanced greatly since symbolic techniques [4] were introduced into formal methods based on state space traversal. However, these formal methods do not scale easily with the increasing complexity of system designs because of the state explosion problem: the state space grows exponentially in the number of state variables. Therefore recent research [3], [11] has focused on reducing the number of state variables by retiming [13], with the hope that verification can be conducted on the reduced circuits. Unlike these circuit-based transformations, this paper reduces the register count in the verification construction. Moreover, the verification itself is structure-independent, that is, neither circuit similarities nor register correspondences [6] are assumed.
In this paper, we reason about sequential equivalence based on the fact that two finite state machines (FSMs) are equivalent if and only if their initial states are equivalent. To identify equivalent states of an FSM, binary decision diagrams (BDDs) [2] were used in [16], [14], [8] for symbolic execution. The fixpoint computation in [16], [14] is carried out on a product machine constructed over two identical copies of the FSM. As shown in [7], when the product machine is constructed over the two FSMs under comparison, the same computation can be used for sequential equivalence checking. In addition to the approach of [16], [14], the computation in [8] for equivalent state identification is done on the original FSM without constructing a product machine. However, an $n$-state FSM in [8] is represented by
BDDs. This representation may be expensive in practice. In contrast, we identify equivalent states by applying BDD-based functional decomposition [12] to keep the computation in the original FSM without any special representation. Since the computation is in a single FSM, we introduce the multiplexed machine to combine two FSMs into one. Thereby we can transform the sequential equivalence checking problem to the state equivalence problem of a multiplexed machine.
Our equivalence checking technique avoids state traversal by partitioning the state space based on equivalence relations among states [10]. Rather than reasoning about sequential equivalence in the product state space of the two sequential machines under comparison, we do so in the sum state space. Compared to product machine based verification, the proposed approach almost halves the number of state variables. More precisely, checking the equivalence of two […]. Hence, the sizes of BDDs in our verification technique could be much smaller than those in product machine based techniques.
Unlike previous verification techniques of [4] and [7] , the efficiency of our approach depends heavily on the encountered number of equivalence classes of states. Since each equivalence class is represented by a BDD node, our approach is limited to instances with less than e C g f h equivalence classes per output. Fortunately, it is applicable in most practical applications. On the other hand, because the number of equivalence classes in the reachable state subspace is invariant, our technique tends to be more robust than previous approaches in verifying different implementations of a design. For high-speed designs, registers are mostly added to reduce cycle time not to increase the number of equivalence classes. (For example, backward retiming cannot increase equivalence classes.) In such designs, our proposed technique should be preferable to those of [4] and [7] .
The contributions of this paper are as follows. We apply BDD-based functional decomposition to the identification of equivalent states. Two important consequences are the elimination of universal and existential quantifications, and the possible simplification with respect to the reachable state subspace. To extend the above computation for sequential equivalence checking, we introduce the multiplexed machine such that the verification can be done in the sum state space. In addition, several techniques are proposed to enhance the computational robustness; several properties are analyzed to contrast different verification techniques.
The remainder of this paper is organized as follows. Preliminaries and definitions are given in Section II. After introducing the technique for equivalent state identification in Section III, we present our equivalence checking algorithm in Section IV and analyze its properties in Section V. Experimental results are then given in Section VI, and conclusions in Section VII.
II. DEFINITIONS, NOTATIONS AND PRELIMINARIES
A. Equivalence Relations and Partitions
An equivalence relation is a binary relation on a set that is reflexive, symmetric, and transitive.
B. Functional Decomposition
In this paper we adopt functional decomposition [17] for partitioning the state space to identify equivalent states and to verify sequential equivalence. In functional decomposition, the variables of a Boolean function are divided into two disjoint subsets, the bound set and the free set. In BDD-based functional decomposition [12], bound set variables are ordered above free set ones. A cutset of the BDD is the set of (downward) edges which cross the boundary defined by the bound set and free set variables. A node is called an equivalence node if there exists an edge in the cutset directed to it.
if and only if their corresponding paths in the BDD of ¦ lead to the same equivalence node.
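As a truth-table analogue of the BDD construction described above, the following Python sketch groups bound-set assignments by the subfunction they induce over the free set. Each distinct subfunction plays the role of an equivalence node. It is only an explicit enumeration under the assumption of a small function, not the BDD-based procedure itself; the function `f` and the helper `decompose` are hypothetical names introduced for illustration.

```python
from itertools import product

def decompose(f, n_bound, n_free):
    """Group bound-set assignments by the subfunction they induce on the free set."""
    free = list(product((0, 1), repeat=n_free))
    classes = {}
    for b in product((0, 1), repeat=n_bound):
        # The cofactor's truth table stands in for the node reached below the cutset.
        cofactor = tuple(f(b + x) for x in free)
        classes.setdefault(cofactor, []).append(b)
    return classes

# Hypothetical function: f(b0, b1, x0) = (b0 XOR b1) AND x0, with bound set {b0, b1}.
def f(v):
    b0, b1, x0 = v
    return (b0 ^ b1) & x0

classes = decompose(f, n_bound=2, n_free=1)
```

Here the four bound-set assignments fall into two groups, so a BDD with `b0, b1` ordered above `x0` would have two equivalence nodes under the cutset.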
III. IDENTIFICATION OF STATE EQUIVALENCE
To find a minimum state FSM equivalent to a given one, equivalent states are identified. Since each state in an equivalence class (of reachable states) can represent the entire class, the number of states of the minimum state FSM equals the number of equivalence classes of the original FSM. This section proposes an approach to locating equivalent states that is more direct than those of [16], [14], in the sense that we deal with equivalence classes instead of equivalence relations. Given an FSM, we show that BDD-based functional decomposition can be exploited to extract equivalence classes of states.
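The idea of working with equivalence classes directly can be illustrated with an explicit-state sketch. The code below is a plain partition refinement over a small Mealy machine given as lookup tables, not the BDD-based functional decomposition of this paper; the machine and all names are hypothetical. As in the text, the fixpoint is detected by the number of classes no longer growing.

```python
def equivalence_classes(states, inputs, delta, lam):
    """Refine a partition of states until the number of classes stops growing."""
    def renumber(sig):
        # Replace each distinct signature by a small integer class id.
        ids = {}
        return {s: ids.setdefault(sig[s], len(ids)) for s in states}

    # Initial partition: states with identical outputs share a class.
    label = renumber({s: tuple(lam[s][a] for a in inputs) for s in states})
    while True:
        # Two states stay together only if their successors are in the same classes.
        refined = renumber({
            s: (label[s],) + tuple(label[delta[s][a]] for a in inputs)
            for s in states
        })
        if len(set(refined.values())) == len(set(label.values())):
            break  # fixpoint: no class was split
        label = refined
    groups = {}
    for s in states:
        groups.setdefault(label[s], []).append(s)
    return sorted(groups.values())

# Hypothetical 5-state machine with two input values (used as tuple indices).
delta = {0: (1, 2), 1: (0, 3), 2: (2, 0), 3: (3, 1), 4: (4, 4)}
lam = {0: (0, 0), 1: (0, 0), 2: (1, 1), 3: (1, 1), 4: (0, 0)}
result = equivalence_classes([0, 1, 2, 3, 4], (0, 1), delta, lam)
```

In this example states 0 and 1 are equivalent, as are 2 and 3, while state 4 is split off in the first refinement, so a minimum state machine would need three states.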
Our approach seems conceptually similar to that in [8] . Paths from the root to an equivalence node are states in a corresponding equivalence class. At this point we can ignore the functions represented by these equivalence nodes. That is, we can get rid of the BDD structures below these nodes. By re-encoding these nodes using alphabet°, (introducing 
is the equivalence relation of p % p '
, the proof follows.
(º ) From
, we obtain . On the other hand, although IDES3 keeps all the characteristic functions along iterations, it has maximal flexibility to arrange the combination of them to reduce peak memory consumption.
C. Robust Equivalent State Identification
The limitations of equivalent state identification using BDD-based functional decomposition result from the explicit representation of equivalence classes and the restricted BDD variable ordering. In this section we propose some possible techniques to reduce BDD sizes.
Using any underestimate of the unreachable states as the don't care set, we can assign each such unreachable state to any equivalence class of reachable states. This flexibility enables the simplification of characteristic functions. However, because these algorithms use the number of equivalence classes to decide fixpoints, the number of equivalence classes consisting solely of unreachable states must be kept constant during the iterations. (Note that if unreachable states are not used as don't cares, there is no such restriction.) Otherwise, we have to complicate the fixpoint condition by testing whether an equivalence class is contained in the don't care set. Claim 1 shows that BDD constrain [5] is a good simplification operator satisfying this requirement. In contrast, BDD restrict [5] violates it. However, a BDD restrict followed by a constrain is a good operation.
Claim 1: The constrain operator eliminates all equivalence nodes whose corresponding equivalence classes are contained in the don't care set, and preserves all other equivalence nodes.
Proof: Since BDD structures below equivalence nodes are irrelevant, we can think of to be another function
± is the set of equivalence nodes. As constrain
has its range equal to the image
, equivalence nodes not in this image disappear from the range and those in this image remain in the range. (On the other hand, the restrict operator could increase . Although equivalence nodes in the original image are kept, some with solely unreachable states might exist.)
To reduce the impact of the restricted BDD variable ordering, we can use the following strategy. Within the allowed threshold of BDD size, find the variable ordering such that the lowest state variable is as high as possible. Treat this variable and those above it as bound set variables; all others are the free set. Then compact the BDD such that every node under the cutset is an equivalence node. Work on the new smaller BDD, and apply variable re-ordering to it based on the same strategy, incrementally throwing away unnecessary variables. On the other hand, since this ordering restriction emerges only from functional decomposition, arbitrary ordering can be used in other BDD manipulations. This restricted ordering is needed only when counting the number of equivalence classes and in constraining BDD with respect to the reachable state subspace.
Directly building a single hyper-function of a set of (binary) functions $f_1, f_2, \ldots, f_n$ may be impractical. Fortunately, this can be avoided by computing equivalence classes incrementally. For instance, first perform functional decomposition on $f_1$. For each resultant equivalence class, use it as the care set and the others as the don't care set. Hence there is a greater chance of building a hyper-function for the simplified functions of $f_2, \ldots, f_n$. (If it fails, we can deepen the recursion level to extract more don't cares.) Conducting functional decomposition on it, the equivalence classes in the care set are encoded using new binary functions. In this way, BDD sizes are kept small. This approach trades time for memory.
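The incremental strategy can be sketched on truth tables as follows. Instead of decomposing all functions at once, the sketch splits the current classes one function at a time, treating each class as the care set for the next function. This is an explicit enumeration under the assumption of small functions, and it omits the don't care simplification and re-encoding steps; the functions `f1`, `f2` and the helper name are hypothetical.

```python
from itertools import product

def refine_incrementally(functions, n_bound, n_free):
    """Split bound-set assignments one function at a time instead of all at once."""
    free = list(product((0, 1), repeat=n_free))
    partition = [list(product((0, 1), repeat=n_bound))]  # start with a single class
    for f in functions:
        next_partition = []
        for block in partition:  # each block acts as the care set for f
            split = {}
            for b in block:
                # Assignments in a block stay together only if f agrees on them.
                split.setdefault(tuple(f(b + x) for x in free), []).append(b)
            next_partition.extend(split.values())
        partition = next_partition
    return partition

# Hypothetical functions over bound variables (b0, b1) and free variable x0.
def f1(v):
    return v[0] & v[2]

def f2(v):
    return v[1] | v[2]

after_f1 = refine_incrementally([f1], n_bound=2, n_free=1)
after_both = refine_incrementally([f1, f2], n_bound=2, n_free=1)
```

Only one function is examined per pass, which mirrors the time-for-memory trade-off noted above.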
We can also explore flexibility to reduce a partition before using it to compute a new partition. Given two partitions p % and p '
, we say any 
) and 
In the light of Theorem 3, an algorithm can be implemented by modifying IDES2 and IDES3 as follows. Keep a set of characteristic functions to represent the overall partition. Compute new partitions based only on an essential partition, which consists of equivalence classes that refine the previous overall partition. In this manner, the BDD size is kept small and the iterative computation is sped up.
IV. VERIFICATION OF SEQUENTIAL EQUIVALENCE
The proposed technique can be applied for sequential verification. The following two propositions form the basis of our equivalence checking. The first states a property that two equivalent FSMs must have. Based on Proposition 4, we can extend the identification of state equivalence to sequential equivalence checking. In order to pose the problem of verification as the identification of state equivalence, the multiplexed machine is introduced.
A. The Multiplexed Machine
To check equivalence between two FSMs . This pair is then multiplexed before being fed to a register, whose output is then demultiplexed to recover the current state variables for if they are equivalent. In the verification, we can imagine that aux is in a superposition status, possessing values 0 and 1 simultaneously. (Note that, without changing its functionality, the multiplexed machine can be simplified by omitting the demultiplexers. That is, replacing each demultiplexer, we directly connect its input to outputs. Also it is worth mentioning that choosing any subset of the next state variables of $ & % to be paired is valid. Suppose, in the extreme case, we choose an empty subset. Then aux and the multiplexers for outputs are unnecessary. The multiplexed machine, therefore, degenerates into two separate machines. The corresponding verification is discussed in Section IV-E.) . By iterative refinement of the state space as in the identification of state equivalence, equivalence classes of states for
B. Algorithm for Sequential Equivalence Checking
can be derived whenever the fixpoint has been reached. According to Lemma 5, both conditions are checked. However, the first condition implies that we need to know the reachable states of both $M_1$ and $M_2$. Fortunately, the first condition is redundant, i.e., as long as the second condition is satisfied, so is the first. This property is stated in Theorem 4. As a result, reachability analysis can be completely eliminated. Further, rather than checking that the condition of Theorem 4 is satisfied in the overall partition of the state space, validity can be verified on the new partition at each iteration. The correctness of this variant is based on Proposition 1. Once the BDD representation of the current partition is obtained, testing whether two initial states are within the same equivalence class takes time linear in the number of state variables. Consequently this checking can be done efficiently in each iteration. Figure 6 outlines the overall procedure for sequential equivalence checking.
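The overall flow can be illustrated with an explicit-state sketch, which is not the BDD-based procedure of Figure 6. It assumes two small Mealy machines given as tables, forms the disjoint union (sum) of their state sets by tagging each state with its machine, refines the partition of that union, and tests at every iteration whether the two initial states still share a class. The tag plays a role loosely analogous to aux; the machines and all names are hypothetical.

```python
def sequentially_equivalent(m1, m2, inputs):
    """Check two Mealy machines by partitioning the disjoint union of their states.

    Each machine is a triple (delta, lam, init); inputs are tuple indices.
    """
    delta, lam = {}, {}
    for tag, (d, l, _) in enumerate((m1, m2)):
        for s in d:
            # Sum state space: |S1| + |S2| tagged states, not |S1| * |S2| pairs.
            delta[(tag, s)] = tuple((tag, d[s][a]) for a in inputs)
            lam[(tag, s)] = tuple(l[s][a] for a in inputs)
    init1, init2 = (0, m1[2]), (1, m2[2])

    def renumber(sig):
        ids = {}
        return {s: ids.setdefault(v, len(ids)) for s, v in sig.items()}

    label = renumber(lam)  # initial partition by output behaviour
    while True:
        if label[init1] != label[init2]:
            return False  # initial states were separated: not equivalent
        refined = renumber({
            s: (label[s],) + tuple(label[t] for t in delta[s]) for s in delta
        })
        if len(set(refined.values())) == len(set(label.values())):
            return True  # fixpoint reached with initial states in one class
        label = refined

# Hypothetical machines: a 2-state toggle and a 4-state counter exposing its parity.
toggle = ({0: (0, 1), 1: (1, 0)}, {0: (0, 0), 1: (1, 1)}, 0)
counter = ({0: (0, 1), 1: (1, 2), 2: (2, 3), 3: (3, 0)},
           {0: (0, 0), 1: (1, 1), 2: (0, 0), 3: (1, 1)}, 0)
# Same counter with the output of state 3 changed, which breaks the equivalence.
broken = (counter[0], {**counter[1], 3: (0, 0)}, 0)

same = sequentially_equivalent(toggle, counter, (0, 1))
differ = sequentially_equivalent(toggle, broken, (0, 1))
```

No reachability computation appears anywhere in the sketch, and a mismatch is reported as soon as the initial states fall into different classes rather than only at the fixpoint.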
Remark: In theory 
C. Robust Sequential Equivalence Checking
To make the verification procedure more robust, the techniques and restrictions listed in Section III-C are also applicable here. Instead of repeating them, this section is concerned with those that are particular to verification.
Verifying each primary output and/or characteristic function separately could substantially reduce the number of encountered equivalence classes. The numbers of equivalence classes induced by individual primary outputs may be exponentially smaller than those induced by all of the primary outputs. The correctness of this separation is inferred from Lemma 2. It is interesting to notice that cone of influence reduction is automatically taken care of by this separation, i.e., state variables that are irrelevant to the considered primary output disappear.
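A small explicit-state experiment shows how large the gap between per-output and combined class counts can be. The sketch assumes a hypothetical machine with k independent state bits, each observed by its own output, where the input selects one bit to toggle. All names are illustrative.

```python
from itertools import product

def count_classes(states, inputs, delta, outputs):
    """Number of equivalence classes induced by the given output functions."""
    def renumber(sig):
        ids = {}
        return {s: ids.setdefault(v, len(ids)) for s, v in sig.items()}

    label = renumber({s: tuple(o(s, a) for o in outputs for a in inputs)
                      for s in states})
    while True:
        refined = renumber({
            s: (label[s],) + tuple(label[delta(s, a)] for a in inputs)
            for s in states
        })
        if len(set(refined.values())) == len(set(label.values())):
            return len(set(label.values()))
        label = refined

# Hypothetical machine: k state bits; input a toggles bit a; output i reads bit i.
k = 3
states = list(product((0, 1), repeat=k))
inputs = range(k)

def delta(s, a):
    return s[:a] + (1 - s[a],) + s[a + 1:]

outputs = [lambda s, a, i=i: s[i] for i in range(k)]

all_together = count_classes(states, inputs, delta, outputs)
one_by_one = [count_classes(states, inputs, delta, [o]) for o in outputs]
```

With all outputs combined every state is its own class (2^k of them), whereas each single output distinguishes only two classes, because the other state bits lie outside its cone of influence.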
Although reachability analysis is unnecessary, any underestimation of unreachable states of the multiplexed machine can be used as a don't care set to simplify BDD expressions and to reduce unnecessary state refinements. Theorem 5 shows the correctness of such simplification and the maximal don't care set for the multiplexed machine. However, as mentioned in Section III-C, the fixpoint condition should be preserved to ensure the algorithm terminates.
Besides don't care simplification, the partitioned state space can be reduced further according to the following theorem. 
state variables respectively. Then by verifying each output separately, the total number of variables in our verification is at most
. In the construction of the multiplexed machine, a multiplexer, selecting state variables, pairs a state variable from $M_1$ with any unpaired one from $M_2$. Since this pairing is arbitrary (and thus can be adaptively changed on-the-fly), an optimization problem is to maximize the BDD sharing between $M_1$ and $M_2$, and to simplify the consequent BDD manipulations. Heuristics can be derived based on cone of influence reduction and functional similarity. The former pairs two state variables which are supports of two similar sets of primary outputs; the latter pairs two state variables with similar transition functionalities. In the extreme case, when comparing two identical copies of an FSM, we can possibly reduce the BDD such that it is as if there is only one machine. […] above the cutset. Notice that, although the number of state variables in this case is the same as for the product machine, the verification is still in the sum state space.
D. Error Tracing and Shortest Distinguishing Sequence
F. State Space Partitioning on Product Machine
Verification by state space partitioning also works for the product machine. It can be done by slight modifications of [16], [14], previously known as backward state traversal [7]. We refer to it as state space partitioning on the product machine.
When compared to state space partitioning on the multiplexed machine, this approach has more flexibility in BDD variable ordering. However, this flexibility prevents simplification by the restrict or constrain operator with respect to the reachable states because this might corrupt the represented equivalence relation.
V. ANALYSIS
This section consists of two parts. First, some verification properties, independent of the implementation of a design, are analyzed. Second, we discuss circuit implementation related effects on the sequential equivalence checking problem.
A. Implementation-Independent Aspects
Given an FSM taking a total of $n$ iterations in state space partitioning, its partition structure is defined as an ordered sequence $(c_1, c_2, \ldots, c_n)$, where $c_i$ denotes the accumulated number of equivalence classes at the $i$th iteration. Thus
, and respectively. Notice that u w ù may not satisfy reflexive and symmetric laws. Nevertheless, the transitive law holds for the ordered pairs of states. Since the transitive law is maintained during the fixpoint computation, it is clear that once one machine converges, so does the product machine. On the other hand, this state partitioning procedure does not refine the state subspace
. Hence it could converge in less than
steps. State space partitioning on the combined machine has no effect on the partition of the state subspace spanned by any individual FSM. Once each subspace of $M_1$ and $M_2$ has reached a fixpoint in state partitioning, so has the space of their combined machine. Therefore the combined machine converges in exactly
steps. When the state space is reduced by Theorem 6 in each iteration, the fixpoint computation does not refine the state subspace spanned by the collapsed equivalence classes. The state space is partitioned in the same way as that of the product machine. Hence the multiplexed and product machines converge in the same step in state space partitioning.
In contrast, for state traversal of an FSM, although we can similarly define a traversal structure to be the sequence of numbers of reached states, we cannot use it as a signature. Moreover, even if the traversal depths for two FSMs are known, they merely provide a lower bound on the depth of the product machine. No strong argument like Theorems 8 and 9 is possible.
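To make the notion of a partition structure concrete, the following explicit-state sketch records the accumulated number of equivalence classes after each refinement. Whether the initial output-based partition counts as the first iteration is an assumption made here, and the chain-shaped machine and all names are hypothetical.

```python
def partition_structure(states, inputs, delta, lam):
    """Accumulated class counts, one entry per partition that differs from the last."""
    def renumber(sig):
        ids = {}
        return {s: ids.setdefault(v, len(ids)) for s, v in sig.items()}

    # Assumption: the partition by outputs is counted as the first entry.
    label = renumber({s: tuple(lam(s, a) for a in inputs) for s in states})
    counts = [len(set(label.values()))]
    while True:
        refined = renumber({
            s: (label[s],) + tuple(label[delta(s, a)] for a in inputs)
            for s in states
        })
        n = len(set(refined.values()))
        if n == counts[-1]:
            return counts  # fixpoint: the last refinement split nothing
        counts.append(n)
        label = refined

# Hypothetical 4-state chain 0 -> 1 -> 2 -> 3 -> 3 that outputs 1 only in state 3.
structure = partition_structure(
    states=[0, 1, 2, 3],
    inputs=(0,),
    delta=lambda s, a: min(s + 1, 3),
    lam=lambda s, a: int(s == 3),
)
```

For this chain the sequence grows by one class per step, reflecting that each state is distinguished from its predecessor one refinement later.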
The following theorem shows the connection between the number of refinements in state partitioning and the depth of state traversal. 
B. Implementation-Dependent Aspects
Retiming [13] is an important technique in sequential circuit optimization. There are two types of atomic moves in retiming, namely forward (from inputs to outputs) moves and backward (from outputs to inputs) moves across functional blocks. 
Proof:
The theorem follows from Proposition 5. Similar arguments of Theorems 11 and 12 were used in [19] for the discussion of the validity of retiming.
VI. EXPERIMENTAL RESULTS
Using the VIS [1] environment, we compared three equivalence checking techniques, namely STPM (state traversal on the product machine), SPPM (state partitioning on the product machine), and SPMM (state partitioning on the multiplexed machine). The experiments were conducted on a Linux machine with a Pentium III XEON 700 MHz CPU and 2 GB of RAM.
For STPM and SPPM the VIS sequential verification command is used. Dynamic variable reordering is turned on and the hybrid method [15] , considered the state-of-the-art technique for image computation, is used. For SPMM, variable reordering is enabled when appropriate.
To demonstrate the relative power of the three techniques, we first compare a set of benchmark circuits against themselves. (Although combinational checking suffices in this circumstance, we are only interested in sequential methods.) In general, combinational equivalence checking should be tried in situations where there is structural similarity. The techniques of this paper aim at situations where there is no such similarity. The self-comparison benchmarks are used to compare the methods on a large set of examples. Care is taken not to exploit similarity by using a method for pairing state variables which considers only the cones of influence of the primary outputs. To further emphasize that no similarity is being exploited, a second set of experiments is done comparing circuits against their retimed versions.
An argument why self-comparison is sufficient for the experiments is Proposition 3, which states that two different implementations, Tables III and IV , we provide the characteristics of the benchmark circuits in Tables I and II. Table I gives the profiles of the selected benchmarks from ISCAS89, LGSYNTH91, TEXAS97, VIS and TEXAS. Columns 2, 3 and 4 indicate the number of inputs, outputs and registers respectively. In addition, the number of reachable states and the corresponding traversal depth are provided in Column 5. (Here we reset uninitialized state variables to zero.) Also the information of equivalence classes is included in Table II . As mentioned in Section IV-C, we can verify sequential equivalence by examining each primary output separately instead of treating them as a whole. The advantage is that we can reduce the peak memory requirements recording encountered equivalence classes. To provide strong evidence, Table II Circuit s991 is an example where separating verification tasks for each output makes a substantial reduction in the number of encountered equivalence classes. In the extreme case, the number of equivalence classes induced by all outputs can be exponentially (in the number of outputs) larger than those induced by individual outputs. Usually the separation of verification tasks lengthens the required refinement. However, as BDD manipulations could be simplified substantially, the run time can still be reduced in most cases. Further, within each part we compare the number (in the column marked whole) of equivalence classes in the whole state space to the number (in the column marked reach) of equivalence classes in the reachable subspace. As can be seen, in most instances this subset is fairly small when compared to the entire space.
Since SPMM directly benefits from these reductions, it can easily verify some large instances which are unverifiable for STPM and SPPM as indicated in Tables III and IV, where the results for SPPM and STPM report the best of verifying combined outputs and verifying each output separately. From experience, SPPM has better results in verifying combined outputs for most circuits while SPMM has the opposite results. This might be explained by the fact that the performance of SPPM is not directly related to the encountered number of equivalence classes while that of SPMM is.
From the experiment in Table III we observe that, for SPMM, using a monolithic BDD as a characteristic function suffices for all verifiable benchmarks. The only exception is sbc, where an array of characteristic functions needs to be maintained. Because using multiple characteristic functions usually complicates the fixpoint computation, it is in general more time consuming. We also find that SPMM takes longer than STPM and SPPM for circuits such as s382 and s420.1, which have numerous equivalence classes and deep refining processes. This is understandable because SPMM enumerates each equivalence class in every refining process.
For circuits like s420.1, where the depths of traversal and refinement are both exponential in the size of inputs, none of the three techniques is effective. However, for s420.1, since the depth of refinement is half that of traversal, SPPM is about twice as fast as STPM. Notice that, as analyzed in Section V-A, although the product machine has traversal depth 65535 (due to self-comparison), we can conclude the equivalence by traversing states at step 32768, even before the fixpoint is reached.
For the cbp and minmax series of circuits, where depths are shallow, STPM and SPPM perform much better than SPMM, which needs to take care of numerous equivalence classes as listed in Table II. On the other hand, for minmax circuits, as discussed in [7], SPPM has a polynomial complexity in input sizes while STPM has an exponential one. In comparison, SPPM is the best choice for these cases.
Circuits key and bigkey are at the other extreme, having only a few equivalence classes. SPMM verifies them quite easily while both STPM and SPPM fail. In general, for control logic SPMM performs much better than the other two. Microprocessor 8085 is an example, where SPMM verifies all the outputs except the sixteen for the address bus. (The results of 8085 in Tables II and III exclude these unverifiable outputs.) Other examples are control, IFetchControl2 and IFetchControl3. On the other hand, due to the large number of outputs in IFetchControl2, IFetchControl3, clma, sbc, etc., SPMM takes a long time to verify them because it processes one output at a time. Fortunately, these tasks can be verified in parallel to minimize the total completion time.
In Table IV, the equivalence between a circuit and its retimed implementation is checked. Retimed circuits were obtained by using SIS [18], except for TEXAS benchmarks, s641-retime and tbk-retime. Other circuits, which are included in Table III but absent from Table IV, either take too long for SIS to retime or have incompatible initial states created by the retiming. Table IV suggests that SPMM does not benefit particularly when self-comparison is done. (This is because state variables are paired only by the cone of influence of outputs; beyond that, we avoid pairing corresponding state variables together, which destroys BDD sharing in the self-comparison experiments.) This supports the claim that the results of Table III are relevant for comparing the three methods. Also observe from Table IV that SPMM is relatively stable when moving from self-comparison to comparing against retimed versions. For example, for s526 and s526n, the results in Tables III and IV are similar for SPMM but STPM and SPPM yield substantial variances. The stability of SPMM derives from the fact that it depends mainly on the maximum number of registers in the two designs plus the number of equivalence classes encountered.
A summary of Tables III and IV is shown in Table V, where the second and third columns denote the numbers of wins in terms of smaller memory and time usage respectively, and the last gives the number of examples on which the method failed. This analysis indicates that SPMM is on average more efficient and more robust than the other two methods. We did not experiment with equivalence checking between inequivalent circuits. However, the expectation is that, according to Theorem 10, all three verification techniques can report the non-equivalence in the same iteration, say in the […]. This difference results from the fact that, in SPMM, the input information of the previous iterations is thrown away when equivalence nodes are re-expressed using newly introduced variables.
Another view of
To summarize the results, the major limitation of SPMM is the encountered number of equivalence classes during verification. In contrast, STPM and SPPM do not suffer the same limitation because equivalence classes are not explicitly represented in the BDDs. For a circuit with a not-so-deep depth of refinement and a "reasonable" number T q e C f h g b of equivalence classes per output, SPMM has a great chance of verifying it. On the other hand, due to the fact that the number of equivalence classes in the reachable state subspace is invariant under different implementations, SPMM tends to be the most robust verification technique.
VII. CONCLUSIONS
This paper consists of two parts: the identification of equivalent states and the verification of sequential equivalence. We show that the former can be done efficiently by BDD-based functional decomposition. By introducing the multiplexed machine, we can verify sequential equivalence by means of state partitioning in the sum space, a new possibility to do formal equivalence checking. In high speed designs, a great portion of registers are for timing speed-up rather than increasing the number of equivalence classes of states. In such cases, state space partitioning would become preferable to state space traversal.
A major advantage of the new verification technique is the substantial reduction in the number of state variables. Compared to product machine based techniques, our approach almost halves the number of state variables. Although there is an intrinsic restriction on the BDD variable ordering, several techniques are proposed to overcome it and minimize the BDD sizes. These make our algorithm even more promising.
ACKNOWLEDGMENTS
The presentation of this paper was greatly strengthened through anonymous reviewers' comments.
