Abstract-We propose a methodology for Boolean matching under permutations of inputs and outputs (the PP-equivalence checking problem), a key step in incremental logic design that identifies large sections of a netlist that are not affected by a change in specifications. Finding reusable sections of a netlist reduces the amount of work in each design iteration and accelerates design closure. Our approach integrates graph-based, simulation-driven and SAT-based techniques to make Boolean matching feasible for large circuits. Experimental results confirm scalability of our techniques to circuits with hundreds and even thousands of inputs and outputs.
INTRODUCTION
Boolean matching is the problem of determining whether two Boolean functions are functionally equivalent under the permutation and negation of inputs and outputs. This formulation is usually referred to as the generalized Boolean matching problem or PNPN-equivalence checking (PNPN stands for Permutation and Negation of outputs and Permutation and Negation of inputs); however, different variants of the problem have been introduced for different synthesis and verification applications. The matching problem that we discuss in this paper is PP-equivalence checking: two Boolean functions are called PP-equivalent if they are equivalent under permutation of inputs and permutation of outputs. The simplest method to determine whether two n-input m-output Boolean functions are PP-equivalent is to explicitly enumerate all the $n! \cdot m!$ possible matches and perform tautology checking on each. However, this exhaustive search is computationally intractable.
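To make the cost of the exhaustive method concrete, the following is a minimal sketch of it, assuming each network is given as a Python function from an input bit tuple to an output bit tuple. The name `pp_equivalent_bruteforce` and the conventions for the two permutations are illustrative assumptions, and truth-table comparison stands in for tautology checking.

```python
from itertools import permutations, product

def pp_equivalent_bruteforce(f, g, n, m):
    """Try all n! * m! input/output permutations (illustrative only).

    Convention assumed here: input i of g is driven by input pi[i] of f,
    and output j of g is compared with output po[j] of f.
    Returns (pi, po) for the first valid match, or None.
    """
    vectors = list(product((0, 1), repeat=n))
    for pi in permutations(range(n)):
        for po in permutations(range(m)):
            valid = True
            for x in vectors:
                y = f(x)
                gy = g(tuple(x[pi[i]] for i in range(n)))
                if any(y[po[j]] != gy[j] for j in range(m)):
                    valid = False
                    break
            if valid:
                return pi, po
    return None
```

Even for this toy representation, the two nested permutation loops grow factorially, which is why the search space has to be pruned before any exact check is attempted.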
PP-equivalence checking finds numerous applications in verification and logic synthesis. In many cases, an existing design is modified incrementally leaving a great portion of the design untouched. In these cases, large isomorphic sub-circuits exist in original and slightly modified circuits [15] . Identifying such subcircuits and reutilizing them whenever possible saves designers a great amount of money and time. Due to the fact that modifications to the original circuit are introduced by changing certain specifications and the fact that even a slight change in specifications can lead to large changes in implementation [10] , PP-equivalence checking helps designers identify isomorphic and other equivalent sub-circuits.
Specifically, PP-equivalence checking can be used to find the minimal set of changes in logic, known as logic difference, between the original design and the design with modified specification. DeltaSyn [11] is a tool developed at IBM Research that identifies and reports this logic difference. The current version of DeltaSyn uses a relatively inefficient and unscalable Boolean matcher that only exploits the symmetry of inputs to prune the search space.
Incremental Sequential Equivalence Checking (SEC) is another application of PP-equivalence checking where isomorphic subcircuits can be used to create a number of highly likely candidate equivalent nodes [15]. The current implementation of incremental SEC tries to find isomorphic subgraphs by performing extended simulation and finding structural similarities. Although the Boolean approach presented in our paper does not fully exploit the structural similarities between two circuits, we believe that our techniques combined with structural verification techniques create a much more powerful tool for detecting isomorphic subgraphs.
Motivated by the practical importance of PP-equivalence checking in many EDA applications, we develop fast and scalable Boolean matching algorithms and implement them in the ABC package, an established system for synthesis and verification of combinational and sequential logic circuits [5]. The collection of all these techniques creates a powerful Boolean matching module that can be integrated into Combinational Equivalence Checking (CEC) to enhance its functionality. Currently, CEC requires two designs whose primary I/Os match by name. Our work allows one to relax this requirement with the help of a Boolean matcher. We call the new command Enhanced CEC (ECEC). Figure 1 shows how our Boolean matcher is integrated with CEC. In general, algorithms for Boolean matching fall into two major categories: signature-based and canonical-form-based. A signature is a property of an input or an output that is invariant under permutations and negation of inputs. The goal of signature-based matching is to prune the Boolean matching space by filtering out impossible I/O correspondences [4] [1]. On the other hand, in matching based on canonical forms, canonical representations of the two Boolean functions are first computed and then compared against each other to find valid I/O matches [3] [2]. Here, our PP-equivalence checking method first prunes the search space using graph algorithms and simulation signatures, then invokes SAT-solving until exact I/O matches are found.
Main contributions of our work include:
1. Analyzing functional dependency. In a Boolean network with multiple outputs, some inputs affect only a fraction of the outputs, and different outputs are affected in different ways. Hence, by analyzing the functional dependency of outputs on inputs, we can distinguish the I/Os.
2. Exploiting input observability and output controllability. We use the observability of inputs and the controllability of outputs as (1) effective matching signatures, and (2) ordering heuristics for our SAT-based matching.
3. Building a SAT-tree. When information about controllability, observability, and all simulation-based information is exhausted, we resort to SAT-solving and optimize the efficiency of SAT calls. This is accomplished through the concept of a SAT-tree, which is pruned in several ways.
4. Pruning the SAT-tree using SAT counterexamples. In our SAT-based matching, the SAT-solver returns a counterexample whenever it finds an invalid match. The information in these counterexamples is then used to prune the SAT-tree.
The remainder of this paper is organized as follows: Section 2 provides relevant background and discusses previous work on Boolean matching. Section 3 gives an overview of the proposed signature-based techniques. Section 4 describes our SAT-based matching approach. Section 5 validates our method in experiments on available benchmarks and Section 6 concludes our work.
BACKGROUND AND PREVIOUS WORK
In this section, we review necessary background and discuss relevant work.
Definitions and Notation
In the following definitions, the input set of a Boolean network refers to the set of all its inputs. Similarly, the output set of a Boolean network refers to the set of all its outputs. An I/O set is either an input set or an output set.
And-Inverter Graphs
Recent tools for scalable logic synthesis, e.g., ABC, represent Boolean functions using the And-Inverter Graph (AIG) data structure. An AIG is a Boolean network composed of two-input AND gates and inverters. Structural hashing of an AIG is a transformation that reduces the AIG size by partially canonicalizing the AIG structure [14] . Representing a Boolean function in its AIG form is preferable to its Binary Decision Diagram (BDD) form mainly because AIGs result in smaller space complexity. Also, functional simulation can be performed much faster on AIGs, but AIGs are only locally canonical.
Boolean Satisfiability (SAT) and Equivalence-checking
Boolean Satisfiability (SAT) is the problem of determining whether there exists a variable assignment to a Boolean formula that forces the entire formula to evaluate to true; if such an assignment exists, the formula is said to be satisfiable and otherwise unsatisfiable. Pioneering techniques developed to solve the SAT problem were introduced by Davis, Putnam, Logemann and Loveland in the early 1960s. They are now referred to as the DPLL algorithm [7] [8]. Modern SAT solvers, such as MiniSAT [9], have augmented DPLL search by adding efficient conflict analysis, clause learning, back-jumping and watched literals to the basic concepts of DPLL.
SAT is studied in a variety of theoretical and practical contexts, including those arising in EDA. CEC is one of the main applications of SAT in EDA. If two single-output Boolean functions $f$ and $g$ are equivalent, then $f \oplus g$ must always evaluate to 0, and vice versa. Now, instead of simulating all input combinations, we take advantage of SAT solvers: if $f \oplus g$ is unsatisfiable, then $f \oplus g$ is zero for all input combinations and hence $f$ and $g$ are equivalent; and if $f \oplus g$ is satisfiable, then $f$ and $g$ are not equivalent and the satisfying assignment found by the SAT-solver is returned as a counterexample.
$f \oplus g$ is called the miter of $f$ and $g$ [13]. If $f$ and $g$ have more than one output, say outputs $f_1, \ldots, f_m$ and $g_1, \ldots, g_m$, $M_i = f_i \oplus g_i$ is first computed for all $i$ and then $M_1 \vee \cdots \vee M_m$ is constructed as the miter of $f$ and $g$. In our approach, instead of building one miter for the entire circuit and handing it off to the SAT solver, we try to find equivalent intermediate signals by simulation, and use SAT to prove their equivalence. Counterexamples from SAT are used to refine simulation.
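The miter construction can be illustrated with a small sketch. It assumes each circuit is modeled as a Python function from an input bit tuple to an output bit tuple, and it replaces the SAT call with exhaustive enumeration, which is only viable for a handful of inputs. The names `miter` and `find_counterexample` are hypothetical.

```python
from itertools import product

def miter(f, g, x):
    """Value of the miter under input x: OR of the XORs of corresponding outputs."""
    return int(any(a != b for a, b in zip(f(x), g(x))))

def find_counterexample(f, g, n):
    """Exhaustive stand-in for a SAT call on the miter of two n-input circuits.

    Returns an input vector on which the miter evaluates to 1 (the
    'satisfiable' case), or None if the miter is 0 everywhere
    (the 'unsatisfiable' case, i.e. the circuits are equivalent).
    """
    for x in product((0, 1), repeat=n):
        if miter(f, g, x):
            return x
    return None
```

A returned vector plays the role of the SAT counterexample; `None` corresponds to a proof of equivalence.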
Previous Work
Research in Boolean matching started in the early 1980s with main focus on technology mapping (cell binding). A survey of Boolean matching techniques for library binding is given in [4] . Until recently, Boolean matching techniques scaled only to 10-20 inputs and one output [6] [2], which is sufficient for technology mapping, but not for applications considered in our work. For example, in 2008, Abdollahi and Pedram presented algorithms based on canonical forms that can handle libraries with numerous cells limited to approximately 20 inputs [2] . Their approach uses generalized signatures (signatures of one or more variables) to find a canonicity-producing (CP) phase assignment and ordering for variables.
A DAC 2009 paper by Wang, Chan and Liu [16] offers simulation-driven and SAT-based algorithms for checking P-equivalence that scale beyond the needs of technology mapping. Since our proposed techniques also use simulation and SAT to solve the PP-equivalence checking problem, we should articulate the similarities and the differences. Firstly, we consider the more general problem of PP-equivalence checking, where permutation of outputs (besides permutation of inputs) is allowed. In PP-equivalence, the construction of miters must be postponed until the outputs are matched, which seems difficult without matching the inputs. To address this challenge, we develop the concept of a SAT-tree, which is pruned to moderate the overall runtime of PP-matching. In addition to our SAT-based approach, we also use graph-based techniques in two different ways: to initially eliminate impossible I/O correspondences and to prune our SAT-tree. Furthermore, we have implemented three simulation types: two as signatures for outputs (type 1 and type 3) and one as a signature for inputs (type 2). While our type-2 simulation is loosely related to one of the techniques described in [16], the other two simulations are new. We additionally introduce effective heuristics that accelerate SAT-based matching.
SIGNATURE-BASED MATCHING TECHNIQUES
We now formalize the PP-equivalence checking problem and outline our Boolean matching approach for two n-input m-output Boolean networks. Given the input sets and the output sets of two Boolean networks, the goal of PP-equivalence checking is to find two complete mappings, one between the two input sets and one between the two output sets, such that the two networks behave functionally the same under those mappings. In other words, there are two input sets and two output sets given, while the objective is to partition or refine these I/O sets based on some well-defined ordering criteria such that impossible I/O matches are identified and removed in each step until exact I/O matches are found. Furthermore, Definition 3.1 implies the following lemma.
Lemma 3.2. At any point in the refinement process of the I/O sets of two Boolean networks, if the two ordered partitions contain different numbers of clusters, or if two corresponding clusters differ in size, we conclude that the two networks behave differently and we stop the Boolean matching process.
As mentioned earlier, refinement at each step requires a well-defined ordering criterion, tailored to the specific refinement technique used. Therefore, whenever we introduce a new matching technique, we also explain its ordering criterion. Furthermore, the following techniques are applied to the two input circuits one after another.
Computing I/O Support Variables
Definition 3. The goal here is to find a list of outputs that might be functionally affected by a particular input and a list of inputs that might functionally affect a particular output. Here, we contrast functional matching with structural matching in the sense that two structurally different circuits with the same functionality should have the same I/O support. In general, the lack of structural dependency between an output and an input precludes a functional dependency, and the presence of a structural dependency most often indicates a functional dependency; this can usually be confirmed by random simulation, and in rare cases requires calling a SAT-solver [12].
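Example 3.1.3. Consider a 4-bit adder whose input set consists of a carry-in and two 4-bit operands. The ripple-carry realization of this 4-bit adder is shown in Figure 2. It is evident from the circuit that an input bit can affect the sum output at its own bit position and every higher-order output, so the output support of that input consists of exactly those outputs. Similarly, the value of the least-significant sum output is only affected by the carry-in and the least-significant bit of each operand, so these three inputs form its input support.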
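The supports in the adder example can be checked with a short sketch. It assumes a hypothetical input ordering (carry-in, then the four bits of each operand, least-significant first) and computes functional supports by exhaustively flipping each input, which is a stand-in for the structural analysis, random simulation, and occasional SAT calls described above and only works for tiny circuits.

```python
from itertools import product

def ripple_carry_adder(x):
    """x = (cin, a0..a3, b0..b3); returns (s0, s1, s2, s3, cout)."""
    carry, a, b = x[0], x[1:5], x[5:9]
    sums = []
    for ai, bi in zip(a, b):
        sums.append(ai ^ bi ^ carry)
        carry = (ai & bi) | (carry & (ai ^ bi))
    return tuple(sums) + (carry,)

def functional_supports(f, n, m):
    """Return (in_supp, out_supp) for an n-input, m-output function f.

    in_supp[j]  = set of inputs whose flip can change output j
    out_supp[i] = set of outputs that a flip of input i can change
    """
    in_supp = [set() for _ in range(m)]
    out_supp = [set() for _ in range(n)]
    for x in product((0, 1), repeat=n):
        y = f(x)
        for i in range(n):
            y_flipped = f(x[:i] + (1 - x[i],) + x[i + 1:])
            for j in range(m):
                if y[j] != y_flipped[j]:
                    in_supp[j].add(i)
                    out_supp[i].add(j)
    return in_supp, out_supp
```

Because support sizes are invariant under I/O permutation, inputs or outputs with different support sizes cannot be matched to each other.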
Initial refinement of I/O clusters

Scalable I/O Refinement by Random Simulation
Functional simulation holds the promise to quickly prune away unpromising branches of search, but this seems to require a matching of outputs. Instead, we find pairs of input vectors that sensitize comparable functional properties of the two circuits. 
Intuitively, two consistent random input vectors try to assign the same value to all potentially matchable inputs of the two Boolean networks. In the next three subsections, we distinguish three types of simulation based on pairs of consistent random input vectors that help us sensitize certain functional properties of the two circuits.
Simulation type 1
Lemma 3.4.4. Consider a proper random input vector and its corresponding output vector. Two outputs in one output cluster are distinguishable if they take different values in that output vector.
Ordering criterion 3.4.5. The output subcluster of all 0s precedes the output subcluster of all 1s.
Simulation type 2
Definition 3.4.6. Consider a proper random input vector and its corresponding output vector. Create a second input vector by flipping the value of a single input in the first vector, and consider the corresponding output vector. The observability of that input with respect to the first vector is defined as the number of flips in the outputs caused by flipping the input, i.e., the number of output positions at which the two output vectors differ.
Lemma 3.4.7. Two inputs in one input cluster are distinguishable if their observabilities with respect to the same vector differ.
Ordering criterion 3.4.8. Input subclusters obtained from one input cluster are ordered by the observability of the inputs they contain.
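As an illustration of type-2 simulation, the sketch below computes the observability of each input in a cluster under one given vector and splits the cluster accordingly. The circuit is assumed to be a Python function on bit tuples, the function names are hypothetical, and ordering subclusters by increasing observability is an assumption made only for this example.

```python
def observability(f, x, i):
    """Number of outputs of f that flip when input i of vector x is flipped."""
    y = f(x)
    y_flipped = f(x[:i] + (1 - x[i],) + x[i + 1:])
    return sum(1 for a, b in zip(y, y_flipped) if a != b)

def refine_by_observability(f, cluster, x):
    """Split one input cluster into subclusters of equal observability.

    Subclusters are returned in increasing order of observability
    (assumed ordering for this sketch).
    """
    groups = {}
    for i in cluster:
        groups.setdefault(observability(f, x, i), []).append(i)
    return [groups[k] for k in sorted(groups)]
```

For example, with `f = lambda x: (x[0] & x[1], x[0] | x[2], x[0] ^ x[1])` and the vector `(1, 0, 0)`, inputs 0 and 1 each flip two outputs while input 2 flips none, so the cluster `[0, 1, 2]` is split into `[[2], [0, 1]]`.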
Simulation type 3
Definition 3. After matching I/Os using random simulation, we check if any progress is achieved in refining I/O clusters. If a new cluster is added, the algorithm continues refining based on random simulation. The procedure terminates when no new refinement occurs in input or output subclusters after a certain number of iterations.
SAT-BASED SEARCH
The scalable methods we introduced so far typically reduce the number of possible matches from $n! \cdot m!$ to hundreds or less, often making exhaustive search (with SAT-based equivalence-checking) practical. However, this phase of Boolean matching can be significantly improved, and the techniques we develop facilitate scaling to even larger instances.
SAT-based Input Matching
The basic idea in our SAT-based matching approach is to build a tree data structure called SAT-tree that matches one input at a time from the remaining non-singleton input clusters. Subsequently, after an input is matched, all the outputs in its support which are not matched so far are also matched, one by one. In other words, we build a dual-purpose SAT-tree that repeatedly matches inputs and outputs until exact I/O matches are found. We take advantage of the following lemma to build our SAT-tree:
Lemma 4. and , we try to expand and back to and by matching one input at a time. Let and be the first two non-singleton input clusters of and and let . The goal here is to match with one of the | | inputs in . Assume that , and we pick as the first candidate to match . Now, in order to reflect our matching decision, we partition and to make and two singleton clusters; so, is partitioned to , and , and is partitioned to , and , . Complying with our previous notation, now , , … , and , , … , are the new non-singleton clusters. We then build two Boolean networks and from and by putting all the inputs in non-singleton clusters to either constant 0 or constant 1, and we pass the miter of and to the SAT-solver. The result of the SAT-solver might be either satisfiable or unsatisfiable. If the result is:
• unsatisfiable:
the two constrained networks are functionally equivalent. In other words, the candidate input pair has been a valid match so far. So, first try to match the outputs in the supports of the two matched inputs (only the outputs that have not been matched so far) and then match the next two inputs in the remaining non-singleton clusters.
• satisfiable:
the two constrained networks are not functionally equivalent. In other words, the candidate input pair cannot match. Backtrack one level up and use the counterexample to prune the SAT-tree.
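The input-matching loop can be sketched as a small backtracking search. This is a simplified illustration, not the implementation in ABC: outputs are assumed to be already aligned, unmatched inputs are tied to constant 0 on both sides, and exhaustive enumeration over the matched inputs stands in for the SAT call on the miter. Counterexample-based pruning and output matching are omitted, and all names are hypothetical.

```python
from itertools import product

def cofactors_agree(f, g, n, match):
    """Stand-in for the SAT call: compare f and g with unmatched inputs at 0.

    match maps an input index of f to the input index of g it is paired with.
    """
    pairs = list(match.items())
    for bits in product((0, 1), repeat=len(pairs)):
        xf, xg = [0] * n, [0] * n
        for (i, j), b in zip(pairs, bits):
            xf[i] = b
            xg[j] = b
        if f(tuple(xf)) != g(tuple(xg)):
            return False  # corresponds to a satisfiable miter
    return True  # corresponds to an unsatisfiable miter

def match_inputs(f, g, n, clusters, match=None):
    """Match one input at a time; clusters is a list of (f_inputs, g_inputs)."""
    match = dict(match or {})
    for cluster_f, cluster_g in clusters:
        free = [i for i in cluster_f if i not in match]
        if not free:
            continue
        i, used = free[0], set(match.values())
        for j in cluster_g:
            if j in used:
                continue
            trial = dict(match)
            trial[i] = j
            if cofactors_agree(f, g, n, trial):
                result = match_inputs(f, g, n, clusters, trial)
                if result is not None:
                    return result
        return None  # no candidate works: backtrack one level
    return match  # every input is matched and the full check passed
```

Tying unmatched inputs to the same constant on both sides keeps each check sound as a filter: if a partial assignment fails, no completion of it can be a valid match.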
Pruning Impossible Input Matches
Pruning the SAT-tree using counterexamples produced by SAT is a key step in our Boolean matching methodology. Continuing the scenario in Section 4.1, assume that the miter of and is satisfiable. Suppose that the SAT-solver returns an input vector , … , as the satisfying assignment. This input vector carries a crucial piece of information: the matching attempt before matching and was a successful match; otherwise, we would have backtracked in the previous level and we would have never tried matching and . Thus, the input vector sensitizes a path from and to the outputs of the miter.
According to Lemma 4.1.1, if we repeatedly calculate negative and positive cofactors of and with respect to the values of , … , in vector , we obtain two new Boolean networks and that must be functionally equivalent under some ordered partition , … , and , … , . In other words, and are two smaller Boolean networks that only contain the inputs of and that have not found exact match so far. Since and are computed with respect to the values of , … , in and since is a vector that sensitizes a path form and to the output of the miter, we conclude that there exists an output in that is functionally dependent on . Existence of such an output ensures that 0. We can now apply our simple filtering signature from Lemma 3.2.1, to prune the SAT-tree. Specifically, can match to q j only if in and . Example 4.2.1. Consider two 8-to-1 multiplexers with input sets , … , , , , and , … , , , , and outputs and . Refining and based on the techniques explained in Section 3 would result in two ordered partitions , … , , , , , , and , … , , , , , , (refer to Example 3.4.12). In order to find exact input matches, we build our SAT-tree and we first try matching and . The SAT-solver confirms the validity of this match; so, we continue. Then, matches and matches . These two matches are also valid. So far, , … , , ,
, . Now, we look at the next non-singleton input cluster and we match and . Our SAT-solver specifies that matching and do not form a valid match and then it returns vector in which 0, 0, 1, 0, 0, 1 as a counterexample. In order to see why is a counterexample of matching and , we look at the cofactors of the two multiplexers, and , where all the inputs in non-singleton clusters are put to 0: and ́ ́ ́ ́ ́ . Applying to and would result in 1 and 0. Since we know that does not match , we use the counterexample to prune the SAT-tree. Specifically, we compute cofactors of the two multiplexers, and , with respect to the values of matched inputs in . So, and ́ ́ . In and , 1. This means that can only match . In other words, we have pruned the SAT-tree by not matching to any of inputs , , and . We continue matching inputs of the multiplexer until we find valid matches.
SAT-based Output Matching
Let and be the output sets of two Boolean networks and and let , … , and , … , be two ordered partitions defined on them. Continuing the scenario in Section 4.1, assume that is a support variable of , is a support variable of , and and are two nonsingleton output clusters of and . In order to verify if and match under current input correspondence, we add to the current miter of and and we call SAT-solver once again. If SAT returns unsatisfiable, i.e., matches , we continue matching the unmatched outputs in the support of and . If the result is satisfiable, we once again use the counterexample returned by SAT to prune the search space.
Pruning Impossible Output Matches
When one output does not match another, the counterexample returned by SAT is a vector under which one of the two outputs evaluates to 1 and the other to 0. This means that a candidate output can match the given output only if both take the same value under that vector. This simple fact allows us to drastically prune the SAT-tree whenever an invalid output match occurs.
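This pruning rule amounts to a one-line filter. The sketch below assumes the inputs of the two circuits are already aligned under the current input correspondence, so the same counterexample vector can be applied to both; the function name and argument layout are hypothetical.

```python
def prune_output_candidates(f, g, f_out, g_candidates, cex):
    """Keep only outputs of g that agree with output f_out of f under cex.

    f, g         : circuits as functions from an input tuple to an output tuple
    f_out        : index of the output of f being matched
    g_candidates : indices of outputs of g still considered possible matches
    cex          : counterexample input vector returned by the equivalence check
    """
    target = f(cex)[f_out]
    g_values = g(cex)
    return [j for j in g_candidates if g_values[j] == target]
```

Each counterexample therefore removes, in a single simulation, every candidate that disagrees with the output under consideration.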
A Heuristic for Matching Candidates
In order to reduce the branching factor of our SAT-tree, we first match I/Os of smaller I/O clusters. Also, within one I/O cluster, we exploit the observability of the inputs and controllability of the outputs to make more accurate guesses in our SAT-based matching approach. Heuristically, the probability that two I/Os match is higher when their observability/controllability are similar. We observed that, in many designs, the observability of control signals is higher than that of data signals. Therefore, we first match control signals. This simple heuristic greatly improves the runtime: experiments indicate that once control signals are matched, data signals can be matched quickly.
EMPIRICAL VALIDATION
We have implemented the proposed approach in ABC and we have experimentally evaluated its performance on a 2.67GHz Intel Xeon CPU running Windows Vista. Table 1 shows the runtime of our algorithms on ITC'99 benchmarks. In Table 1, #I is the number of inputs, #O is the number of outputs and |AIG| is the number of nodes in the AIG of each circuit. The last two columns show the execution time for P-equivalence and PP-equivalence checking problems. Specifically, Columns 1, 2, 3 and 4 under each checking problem show the initialization time (computing I/O support variables, initially refining I/O clusters and refining based on I/O dependencies), simulation time, SAT time, and overall time. Furthermore, (I%) and (I%,O%) show the percentage of inputs and I/Os that are matched after each step. Note that for each testcase we generate 20 new circuits, each falling into one of the two following categories: (1) permuting inputs for verifying P-equivalence and (2) permuting both inputs and outputs for verifying PP-equivalence. The results given in Table 1 are the average results over all the generated testcases for each category. Furthermore, the AIGs of the new circuits are reconstructed using ABC's combinational synthesis commands to ensure that the new circuits are structurally different from the original ones.
In Table 1, 18 circuits out of 20 have fewer than a thousand I/Os. Checking P-equivalence and PP-equivalence for 12 out of these 18 circuits takes less than a second. There is only one circuit (b12) for which our software cannot match I/Os in 5000 seconds. The reason is that, for b12, 1033 out of 7750 input pairs (13%) are symmetric and since our implementation does not yet account for symmetries, our SAT-tree repeatedly searches symmetric branches that do not yield valid I/O matches. For b20, b21 and b22 and for b17 and b18 with more than a thousand I/Os, computing functional dependency is the bottleneck of the overall matching runtime. Note that checking PP-equivalence for b18 results in a very large SAT-tree that cannot be resolved within 5000 seconds, although our refinement techniques before invoking SAT find exact matches for 3123 out of 3357 inputs (93%) and 3111 out of 3343 outputs (93%). In order to compare our work to [16], we have tested our algorithms on circuits from [16] that have more than 150 inputs. Results are listed in Table 2. For the results reported from [16], Orig, Unate and +Symm respectively show the runtime when no functional property is used, when only functional unateness is used, and when both unateness and symmetries are used. Note that experiments reported in [16] used 3GHz Intel CPUs, while our runs were on a 2.67GHz Intel CPU. To make the numerical comparisons entirely fair, our runtimes would need to be multiplied by 0.89 (the ratio 2.67/3). However, we omit this step, since our raw runtimes are already superior in many cases. According to Table 2, our matching algorithm times out in 5000 seconds on C2670, i2 and i4. This is again due to the symmetries that are present in the inputs of these circuits. Note that the approach in [16] cannot solve these three circuits without symmetry search, either.
For some other circuits, such as C7552, our approach verifies P-equivalence in less than 10 seconds but the approach in [16] cannot find a match without invoking a symmetry finder. It is also evident from the results that checking P-equivalence for very large circuits, such as s38584 and s38417, is 3.5-11 times slower when symmetry finding and unateness calculations are performed during Boolean matching. This confirms our intuition that symmetry and unateness are not essential to Boolean matching in many practical cases, although they may occasionally be beneficial.
SUMMARY AND CONCLUSION
In this paper, we proposed techniques for solving the large-scale PP-equivalence checking problem. Our approach integrates graph-based, simulation-driven and SAT-based techniques to efficiently solve the problem. Empirical validation on available benchmarks confirms its scalability to circuits with thousands of inputs and outputs. Future advances in Boolean matching, as well as many existing techniques, can also be incorporated into our framework to improve its scalability.
