Abstract. SAT solvers are often challenged with very hard problems that remain unsolved after hours of CPU time. The research community meets the challenge in two ways: (1) by improving the SAT solver technology, for example, perfecting heuristics for variable ordering, and (2) by inventing new ways of constructing simpler SAT problems, either using domain-specific information during the translation from the original problem to CNF, or by applying a more universal CNF simplification procedure after the translation. This paper explores preprocessing of circuit-based SAT problems using recent advances in logic synthesis. Two fast logic synthesis techniques are considered: DAG-aware logic minimization and a novel type of structural technology mapping, which reduces the size of the CNF derived from the circuit. These techniques are experimentally compared to CNF-based preprocessing. The conclusion is that the proposed techniques are complementary to CNF-based preprocessing and speed up SAT solving substantially on industrial examples.
Introduction
Many of today's real-world applications of SAT stem from formal verification, test-pattern generation, and post-synthesis optimization. In all these cases, the SAT solver is used as a tool for reasoning on boolean circuits. Traditionally, instances of SAT are represented in conjunctive normal form (CNF), but the many practical applications of SAT in the circuit context motivate the specific study of speeding up SAT solving in this setting.
For tougher SAT problems, applying CNF-based transformations as a preprocessing step [6] has been shown to effectively improve SAT run-times by (1) minimizing the size of the CNF representation, and (2) removing superfluous variables. A smaller CNF improves the speed of constraint propagation (BCP), and reducing the number of variables tends to benefit the SAT solver's variable heuristic. In the last decade, advances in logic synthesis have produced powerful and highly scalable algorithms that perform similar tasks on circuits. In this paper, two such techniques are applied to SAT.
The first technique, DAG-aware circuit compression, was introduced in [2] and extended in [11]. That work shows that a circuit can be minimized efficiently and effectively by applying a series of local transformations that take logic sharing into account. Minimizing the number of nodes in a circuit tends to reduce the size of the derived CNFs that are passed to the SAT engine. The process is similar to CNF preprocessing, where a smaller representation is also achieved through a series of local rewrites.
The second technique applied in this paper is technology mapping for lookup-table (LUT) based FPGAs. Technology mapping is the task of partitioning a circuit graph into cells with k inputs and one output that fit the LUTs of the FPGA hardware, while using as little area as possible. Many of the signals present in the unmapped circuit will be hidden inside the LUTs. In this manner, the procedure can be used to decide for which signals variables should be introduced when deriving a CNF, leading to CNF encodings with even fewer variables and clauses than existing techniques [14, 15, 9].
The purpose of this paper is to draw attention to the applicability of these two techniques in the context of SAT solving. The paper makes a two-fold contribution: (1) it proposes a novel CNF generation based on technology mapping, and (2) it experimentally demonstrates the practicality of the logic synthesis techniques for speeding up SAT.
Preliminaries
A combinational boolean network is a directed acyclic graph (DAG) with nodes corresponding to logic gates and directed edges corresponding to wires connecting the gates. Incoming edges of a node are called fanins and outgoing edges are called fanouts. The primary inputs (PIs) of the network are nodes without fanins. The primary outputs (POs) are nodes without fanouts. The PIs and POs define the external connections of the network.
A special case of a boolean network is the and-inverter graph (AIG), containing four node types: PIs, POs, two-input AND-nodes, and the constant TRUE modelled as a node with one output and no inputs. Inverters are represented as attributes on the edges, dividing them into unsigned edges and signed (or complemented) edges. An AIG is said to be reduced and constant-free if (1) all the fanouts of the constant TRUE, if any, feed into POs; and (2) no AND-node has both of its fanins point to the same node. Furthermore, an AIG is said to be structurally-hashed if no two AND-nodes have the same two fanin edges including the sign. By decomposing k-input functions into two-input ANDs and inverters, any logic network can be reduced to an AIG implementing the same boolean function of the POs in terms of the PIs.
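The definitions above can be made concrete with a small data structure. The following is a minimal sketch, not the representation used by any particular tool: the class name `AIG`, the encoding of an edge as the integer `2*node + sign`, and the method names are all assumptions made for illustration. Creating AND-nodes only through `add_and` keeps the graph reduced, constant-free and structurally-hashed.

```python
class AIG:
    """Minimal and-inverter graph; a literal is 2*node + sign (1 = complemented)."""

    TRUE = 0   # literal of the constant TRUE node (node 0)
    FALSE = 1  # complemented edge to the constant node

    def __init__(self):
        self.fanins = [None]   # node 0 is the constant; PIs also store None
        self.strash = {}       # (lit_a, lit_b) -> node, the structural hash table

    def add_pi(self):
        self.fanins.append(None)
        return 2 * (len(self.fanins) - 1)

    def add_and(self, a, b):
        if a > b:
            a, b = b, a              # normalise the fanin order
        if a == AIG.FALSE or a == b ^ 1:
            return AIG.FALSE         # x & 0 = 0 and x & ~x = 0
        if a == AIG.TRUE or a == b:
            return b                 # 1 & x = x and x & x = x
        key = (a, b)
        if key not in self.strash:   # reuse an existing node when possible
            self.fanins.append(key)
            self.strash[key] = len(self.fanins) - 1
        return 2 * self.strash[key]

    @staticmethod
    def neg(lit):
        return lit ^ 1               # flip the inverter attribute of an edge
```

Requesting the same conjunction twice, in either fanin order, returns the same literal, so no two AND-nodes ever share the same pair of signed fanin edges.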
A cut C of node n is a set of nodes of the AIG, called leaves, such that any path from a PI to n passes through at least one leaf. A trivial cut of a node is the cut composed of the node itself. A cut is k-feasible if the number of nodes in it does not exceed k. A cut C is subsumed by a cut C′ of the same node if C′ ⊂ C.
Cut Enumeration
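Here we review the standard procedure for enumerating all k-feasible cuts of an AIG. Let $\Delta_1$ and $\Delta_2$ be two sets of cuts, and the merge operator $\otimes_k$ be defined as follows:

\[ \Delta_1 \otimes_k \Delta_2 = \{\, C_1 \cup C_2 \mid C_1 \in \Delta_1,\ C_2 \in \Delta_2,\ |C_1 \cup C_2| \le k \,\} \]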
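Further, let $n_1$, $n_2$ be the first and second fanin of node n, and let $\Phi(n)$ denote all k-feasible cuts of n, recursively computed as follows:

\[ \Phi(n) = \begin{cases} \{\{n\}\} & \text{if } n \text{ is a PI} \\ \{\{n\}\} \cup \Phi(n_1) \otimes_k \Phi(n_2) & \text{if } n \text{ is an AND-node} \end{cases} \]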
This formula gives a simple procedure for computing all k-feasible cuts in a single topological pass from the PIs to the POs. Informally, the cut set of an AND node is the trivial cut plus the pair-wise unions of cuts belonging to the fanins, excluding those cuts whose size exceeds k. Reconvergent paths in the AIG lead to generating subsumed cuts, which may be filtered out for most applications.
In practice, all cuts can be computed for k ≤ 4. A partial enumeration, when working with larger k, can be achieved by introducing an order on the cuts and keeping only the L best cuts at each node. Formally: replace $\Phi$ by $\Phi_L$, where $\Phi_L(n)$ is defined as the trivial cut plus the L best cuts of $\Delta_1 \otimes_k \Delta_2$.
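The enumeration described in this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in the experiments: the AIG is given as a plain dictionary of AND-nodes in topological order, inverters are ignored, and the function names and the order used for the optional `limit` (fewer leaves first) are hypothetical choices, since the text leaves the order on cuts open.

```python
def merge(cuts1, cuts2, k):
    """The merge operator: pairwise unions with at most k leaves."""
    return {c1 | c2 for c1 in cuts1 for c2 in cuts2 if len(c1 | c2) <= k}


def drop_subsumed(cuts):
    """Remove every cut that has a proper subset among the other cuts."""
    return {c for c in cuts if not any(d < c for d in cuts)}


def enumerate_cuts(pis, ands, k, limit=None):
    """pis: PI names; ands: {node: (fanin1, fanin2)} in topological order.

    Inverters are ignored because they do not affect which cuts exist.
    """
    cuts = {pi: {frozenset([pi])} for pi in pis}
    for n, (n1, n2) in ands.items():
        merged = drop_subsumed(merge(cuts[n1], cuts[n2], k))
        if limit is not None:
            # Assumed order: fewer leaves first, ties broken by leaf names.
            ranked = sorted(merged, key=lambda c: (len(c), sorted(c)))
            merged = set(ranked[:limit])
        cuts[n] = {frozenset([n])} | merged   # trivial cut plus merged cuts
    return cuts
```

For the three-node AIG x = a ∧ b, y = b ∧ c, z = x ∧ y and k = 4, node z receives the trivial cut {z} together with {x, y}, {x, b, c}, {a, b, y} and {a, b, c}.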
DAG-Aware Minimization
The concept of DAG-aware minimization was introduced by Bjesse et al. in [2], and further developed by Mishchenko et al. in [11]. The method works by making a series of local modifications to the AIG, called rewrites, such that each rewrite reduces the total number of AIG nodes. To accurately compute the effect of a rewrite on the total number of nodes, logic sharing is taken into account. Two equally-sized implementations of a logical function may have different impact on the total node count if one of them contains a subgraph that is already present in the AIG (see Figure 1).
In [11] the authors propose to limit the rewrites to 4-input functions. There exist $2^{16} = 65536$ such functions. By normalizing the order and polarity of input and output variables, these functions are divided into 222 equivalence classes.

DAG-Aware Minimization. Perform a 4-feasible cut enumeration, as described in the previous section, proceeding topologically from the PIs to the POs. During the cut enumeration, after computing the cuts for the current node n, try to improve its implementation as follows: For every cut C of n, let f be the function of n in terms of the leaves of C. Consider all the candidate implementations of f and choose the one that reduces the total number of AIG nodes the most. If no reduction is possible, leave the AIG unchanged; otherwise recompute the cuts for the new implementation of node n and continue the topological traversal.
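The figure of 222 equivalence classes can be checked by brute force. The sketch below is only an illustration (the function name `npn_class_count` and the truth-table encoding as a 16-bit integer are assumptions): it groups truth tables that differ only by a permutation of the inputs, a negation of some inputs, or a negation of the output, and counts the groups.

```python
from itertools import permutations


def npn_class_count(num_vars=4):
    """Count functions up to input permutation/negation and output negation."""
    size = 1 << num_vars                 # number of truth-table rows
    full = (1 << size) - 1               # mask for complementing a truth table
    # Each input transformation maps row i of the new table to a row of the old.
    row_maps = []
    for perm in permutations(range(num_vars)):
        for neg in range(size):
            row_maps.append([
                sum(((i >> v) & 1) << perm[v] for v in range(num_vars)) ^ neg
                for i in range(size)
            ])
    seen = bytearray(1 << size)
    classes = 0
    for f in range(1 << size):
        if seen[f]:
            continue
        classes += 1                     # f starts a new equivalence class
        for rows in row_maps:
            g = sum(((f >> r) & 1) << i for i, r in enumerate(rows))
            seen[g] = 1
            seen[g ^ full] = 1           # output negation
    return classes
```

Because the input transformations form a group, marking the whole orbit of each unseen function visits every class exactly once. A rewriting table therefore needs entries for one representative per class rather than for all 65536 functions.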
Several components are necessary to implement this procedure:
- A cut enumeration procedure, as described in the previous section.
- A bottom-up topological iterator over the AIG nodes that can handle rewrites during the iteration.
- An incremental procedure for structural hashing. In order to efficiently search for the best substitution candidate, the AIG must be kept structurally-hashed, reduced and constant-free. After a rewrite, these properties may be violated and must be restored efficiently.
- A pre-computed table of good implementations for 4-input functions. We propose to enumerate all structurally-hashed, reduced and constant-free AIGs with 7 nodes or fewer, discarding candidates not meeting the following property: For each node n, there should be no node m in the subgraph rooted in n, such that replacing n with m leads to the same boolean function. Example: "(a ∧ b) ∧ (a ∧ c)" would be discarded since replacing the node "(a ∧ b)" with its subnode "b" does not change the function.
- An efficient procedure to evaluate the effect of replacing the current implementation of a node with a candidate implementation.
The implementation of the above components is straightforward, albeit tedious. We observe that in principle, the topological iterator can be modified to revisit nodes as their fanouts change. When this happens, new opportunities for DAG-aware minimization may be exposed. Modifying the iterator in this way yields an idempotent procedure, meaning that nothing will change if it is run a second time. In practice, we found it hard to make such a procedure efficient. A simpler and more useful modification to the above procedure is to run it several times with a perturbation phase in between. By changing the structure of the AIG, without increasing its size, new cuts can conservatively be introduced with the potential of revealing further node-saving rewrites. One way of perturbing the AIG structure is to visit all multi-input conjunctions and modify their decomposition into two-input And-nodes. Another way is to perform the above minimization algorithm, but allow for zero-gain rewrites.
CNF through the Tseitin Transformation
Many applications rely on some version of the Tseitin transformation [14] for producing CNFs from circuits. For completeness, we state the exact version compared against in our experiments. When the transformation is applied to AIGs, two improvements are often used: (1) multi-input Ands are recognized in the AIG structure and translated into clauses as one gate, and (2) if-then-else expressions (MUXes) are detected in the AIG through simple pattern matching and given a specialized CNF translation. The clauses generated for these two cases are:
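$x \leftrightarrow \mathrm{And}(a_1, a_2, \ldots, a_n)$. Clause representation:

\[ a_1 \wedge a_2 \wedge \ldots \wedge a_n \rightarrow x, \qquad \neg a_i \rightarrow \neg x \quad (i = 1, \ldots, n) \]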
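$x \leftrightarrow \mathrm{ITE}(s, t, f)$: if-then-else with selector s, true-branch t, false-branch f. Clause representation:

\[
\begin{array}{llll}
 & s \wedge t \rightarrow x & & s \wedge \neg t \rightarrow \neg x \\
 & \neg s \wedge f \rightarrow x & & \neg s \wedge \neg f \rightarrow \neg x \\
\text{(red)} & t \wedge f \rightarrow x & \text{(red)} & \neg t \wedge \neg f \rightarrow \neg x
\end{array}
\]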
The two clauses labeled "red" are redundant, but including them increases the strength of unit propagation. It should be noted that a two-input Xor is handled as a special case of a MUX with t and f pointing to the same node in opposite polarity. This results in representing each Xor with four three-literal clauses (the redundant clauses are trivially satisfied). In the experiments presented in section 7, the following precise translation was used:
- The roots are defined as (1) And-nodes with multiple fanouts; (2) And-nodes with a single fanout that is either complemented or leads to a PO; (3) And-nodes that, together with their two fanin nodes, define an if-then-else.
- If a root node defines an if-then-else, the above translation with 6 clauses, including redundant clauses, is used.
- The remaining root nodes are encoded as multi-input Ands. The scope of the conjunction rooted at n is computed as follows: Let S be the set of the two fanins of n. While S contains a non-root node, repeatedly replace that node by its two fanins. The above clause translation for multi-input Ands is then used, unless the conjunction collected in this manner contains both x and ¬x, in which case a unit clause coding for x ↔ False is used.
- Unlike some other work [7, 9], there is no special treatment of nodes that occur only positively or negatively.
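The two gate translations used in this section can be written down and checked exhaustively. The following sketch assumes DIMACS-style integer literals (a negative number is a negated variable); the function names are hypothetical, and the brute-force `models` helper is only meant for checking tiny gates, not for real instances.

```python
from itertools import product


def and_clauses(x, inputs):
    """Clauses for x <-> And(inputs); literals are non-zero ints, -v negates v."""
    clauses = [[-x, a] for a in inputs]            # x implies every input
    clauses.append([x] + [-a for a in inputs])     # all inputs together imply x
    return clauses


def ite_clauses(x, s, t, f, redundant=True):
    """Clauses for x <-> ITE(s, t, f), optionally with the two redundant ones."""
    clauses = [[-s, -t, x], [-s, t, -x], [s, -f, x], [s, f, -x]]
    if redundant:
        clauses += [[-t, -f, x], [t, f, -x]]       # implied by the four above
    return clauses


def models(clauses, num_vars):
    """All satisfying assignments, found by brute force (tiny checks only)."""
    result = set()
    for bits in product([False, True], repeat=num_vars):
        value = lambda lit: bits[abs(lit) - 1] == (lit > 0)
        if all(any(value(lit) for lit in clause) for clause in clauses):
            result.add(bits)
    return result
```

Comparing `models` with and without the two extra if-then-else clauses confirms that they do not change the set of solutions; their only role is to let unit propagation derive x when t and f agree and s is still unassigned.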
CNF through Technology Mapping
Technology mapping is the process of expressing an AIG in the form representative of an implementation technology, such as standard cells or FPGAs. In particular, lookup-table (LUT) mapping for FPGAs consists in grouping And-nodes of the AIG into logic nodes with no more than k inputs, each of which can be implemented by a single LUT. Normally, technology mapping procedures optimize the area of the mapped circuit under delay constraints. Optimal delay mapping can be achieved efficiently [3], but is not desirable for SAT where size matters more than logic depth. Therefore we propose to map for area only, in such a way that a small CNF can be derived from the mapped circuit. In the next subsections, we review an improved algorithm for structural technology mapping [12].
Definitions
A mapping M of an AIG is a partial function that takes a non-PI (i.e. And or PO) node to a k-feasible non-trivial cut of that node. Nodes for which mapping M is defined are called active (or mapped), the remaining nodes are called inactive (or unmapped). A proper mapping of an AIG meets the following three criteria: (1) all POs are active, (2) if node n is active, every leaf of cut M(n) is active, and (3) for every active And-node m, there is at least one active node n such that m is a leaf of cut M(n). The trivial mapping (or mapping induced by the AIG) is the proper mapping which takes every non-PI node to the cut composed of its immediate fanins.
An ordered cut-set $\Phi_L$ is a total function that takes a non-PI node to a nonempty ordered sequence of L or fewer k-feasible cuts. In the next section, M and $\Phi_L$ will be viewed as updateable objects and treated imperatively with two operations: For an inactive node n, procedure activate(M, $\Phi_L$, n) sets M(n) to the first cut in the sequence $\Phi_L(n)$, and then recursively activates inactive leaves of M(n). Similarly, for an active node n, procedure inactivate(M, n) makes node n inactive, and then recursively inactivates any leaf of the former cut M(n) that is violating condition (3) of a proper mapping.
Furthermore, nFanouts(M, n) denotes the number of fanouts of n in the subgraph induced by the mapping. The average fanout of a cut C is the sum of the number of fanouts of its leaves, divided by the number of leaves. Finally, the maximally fanout-free cone (MFFC) of node n, denoted mffc(M, n), is the set of nodes used exclusively by n. More formally, a node m is part of n's MFFC iff every path in the current mapping M from m to a PO passes through n. For an inactive node, mffc(M, $\Phi_L$, n) is defined as the set of nodes that would belong to the MFFC of node n if it were first activated.
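The two imperative operations on a mapping can be sketched with reference counting. This is a minimal illustration under assumptions, not the authors' code: the class name `Mapping`, the dictionary `best_cuts` standing in for the ordered cut-set, and the treatment of POs as ordinary named nodes that are never deactivated by recursion are all hypothetical choices. The reference count of a node plays the role of nFanouts(M, n).

```python
class Mapping:
    """Sketch of a mapping M with reference counts standing in for nFanouts."""

    def __init__(self, pis, pos):
        self.pis = set(pis)
        self.pos = set(pos)
        self.cut = {}     # active node -> its selected cut (tuple of leaves)
        self.refs = {}    # node -> number of active cuts that use it as a leaf

    def n_fanouts(self, n):
        return self.refs.get(n, 0)

    def activate(self, best_cuts, n):
        """Map n to the first cut of best_cuts[n], activating leaves as needed."""
        self.cut[n] = best_cuts[n][0]
        for leaf in self.cut[n]:
            self.refs[leaf] = self.refs.get(leaf, 0) + 1
            if leaf not in self.pis and leaf not in self.cut:
                self.activate(best_cuts, leaf)

    def inactivate(self, n):
        """Unmap n, then unmap leaves that no active node uses any more."""
        for leaf in self.cut.pop(n):
            self.refs[leaf] -= 1
            if (self.refs[leaf] == 0 and leaf in self.cut
                    and leaf not in self.pos):
                self.inactivate(leaf)
```

A leaf whose count drops to zero is exactly a node that violates condition (3) of a proper mapping, which is why `inactivate` recurses on it. The nodes removed by inactivating n in this way are the nodes of its MFFC.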
A Single Mapping Phase
Technology mapping performs a sequence of refinement phases, each updating the current mapping M in an attempt to reduce the total cost. The cost of a single cut, cost(C), is given as a parameter to the refinement procedure. The total cost is defined as the sum of cost(M($n_{act}$)) over all active nodes $n_{act}$.
Let M and $\Phi_L$ be the proper mapping and the ordered cut-set from the previous phase. A refinement is performed by a bottom-up topological traversal of the AIG, modifying M and $\Phi_L$ for each And-node n as follows:
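- All k-feasible cuts of node n (with fanins $n_1$ and $n_2$) are computed, given the sets of cuts for the children:

\[ \Delta = \bigl(\{\{n_1\}\} \cup \Phi_L(n_1)\bigr) \otimes_k \bigl(\{\{n_2\}\} \cup \Phi_L(n_2)\bigr) \]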
- If the first element of $\Phi_L(n)$ is not in ∆, it is added. This way, the previously best cut is always eligible for selection in the current phase, which is a sufficient condition to ensure global monotonicity for certain cost functions.
- $\Phi_L(n)$ is set to be the L best cuts from ∆, where smaller cost, higher average fanout, and smaller cut size is better. The best element is put first.
- If n is active in the current mapping M, and if the first cut of $\Phi_L(n)$ has changed, the mapping is updated to reflect the change by calling inactivate(M, n) followed by calling activate(M, $\Phi_L$, n). After this, M is guaranteed to be a proper mapping.
The Cost of Cuts
This subsection defines two complementary heuristic cost functions for cuts:
Area Flow. This heuristic estimates the global cost of selecting a cut C by recursively approximating the cost of other cuts that have to be introduced in order to accommodate cut C:
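A common way to write such a recursive estimate in the mapping literature is given below. It is stated here as an assumption in the notation of this section, not as the authors' exact definition: the cost of a cut is its own area plus the cost of the best cut of each leaf, with each leaf's cost shared among its current fanouts. PI leaves are taken to contribute zero.

```latex
\[
  \mathrm{cost}_{AF}(C) \;=\; \mathrm{area}(C) \;+\;
  \sum_{n \in C}
  \frac{\mathrm{cost}_{AF}\bigl(\mathrm{first}(\Phi_L(n))\bigr)}
       {\max\bigl(1,\ \mathrm{nFanouts}(M, n)\bigr)}
\]
```

Dividing by the fanout count reflects the idea that a leaf with many fanouts is paid for jointly by all cuts that use it, so cuts built on widely shared signals look cheaper.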
Exact Local Area. For nodes currently not mapped, this heuristic computes the total cost increase incurred by activating n with cut C. For mapped nodes, the computation is the same but n is first deactivated. Formally:
In standard FPGA mapping, each cut is given an area of 1 because it takes one LUT to represent it. A small but important adjustment for CNF generation is to define area in terms of the number of clauses introduced by that cut. Doing so affects both the area flow and the exact local area heuristic, making them prefer cuts with a small representation.
The boolean function of a cut is translated into clauses by deriving its irredundant sum-of-products (ISOP) using Minato-Morreale's algorithm [10] (Figure 2). ISOPs are computed for both f and ¬f to generate clauses for both sides of the bi-implication $t \leftrightarrow f(x_1, \ldots, x_k)$. For the sizes of k used in the experiments, boolean functions are efficiently represented using truth-tables. In practice, it is useful to impose a bound on the number of products generated and abort the procedure if it is exceeded, giving the cut an infinitely high cost.

Fig. 2. Irredundant sum-of-product generation. A cover (= SOP = DNF) is a set, representing a disjunction, of cubes (= product = conjunction of literals). A cover c induces a boolean function func(c). An irredundant SOP is a cover c where no cube can be removed without changing func(c). In the code, boolfunc denotes a boolean function of a fixed number of variables $x_1, x_2, \ldots, x_n$ (in our case, the width of a LUT). L and U denote the lower and upper bound on the cover to be returned. At top-level, the procedure is called with L = U. Furthermore, topVariable(L, U) selects the first variable, from a fixed variable order, which L or U depends on. Finally, cofactors(F, x) returns the pair of cofactors of F with respect to x.
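The ISOP-based clause generation can be sketched on truth tables stored as Python integers. This is a minimal illustration of the Minato-Morreale recursion under assumptions, not a transcription of the figure: bit i of a table is the function value on input row i, variables are processed from the highest index down instead of through a topVariable helper, and the names `var_mask`, `isop` and `lut_clauses` are hypothetical.

```python
def var_mask(v, n):
    """Truth table (an int with 2**n bits) of the projection function x_v."""
    return sum(1 << i for i in range(1 << n) if (i >> v) & 1)


def cofactors(F, v, n):
    """Pair (F with x_v = 0, F with x_v = 1), both kept as full-width tables."""
    m, shift = var_mask(v, n), 1 << v
    lo, hi = F & ~m, F & m
    return lo | (lo << shift), hi | (hi >> shift)


def isop(L, U, n, v=None):
    """Return (cubes, table) of an irredundant cover c with L <= func(c) <= U.

    A cube is a tuple of literals; +(i+1) stands for x_i, -(i+1) for not x_i.
    """
    full = (1 << (1 << n)) - 1
    if v is None:
        v = n - 1
    if L == 0:
        return [], 0                        # nothing has to be covered
    if U == full:
        return [()], full                   # the empty cube (constant true)
    L0, L1 = cofactors(L, v, n)
    U0, U1 = cofactors(U, v, n)
    c0, f0 = isop(L0 & ~U1, U0, n, v - 1)   # part that needs the literal not x_v
    c1, f1 = isop(L1 & ~U0, U1, n, v - 1)   # part that needs the literal x_v
    rest = (L0 & ~f0) | (L1 & ~f1)          # minterms still to be covered
    c2, f2 = isop(rest, U0 & U1, n, v - 1)
    m = var_mask(v, n)
    cubes = [c + (-(v + 1),) for c in c0] + [c + (v + 1,) for c in c1] + c2
    return cubes, (f0 & ~m & full) | (f1 & m) | f2


def lut_clauses(f, n, t):
    """Clauses for t <-> f(x_1..x_n), from ISOPs of f and of its complement."""
    full = (1 << (1 << n)) - 1
    on_cubes, _ = isop(f, f, n)
    off_cubes, _ = isop(f ^ full, f ^ full, n)
    # (cube -> t) becomes (not cube) or t; cubes of the complement imply not t.
    return ([[-lit for lit in c] + [t] for c in on_cubes] +
            [[-lit for lit in c] + [-t] for c in off_cubes])
```

For a two-input AND (truth table 0b1000) with output variable 3, `lut_clauses` yields the three clauses of the usual And translation, and for a two-input Xor it yields four three-literal clauses. The length of the returned list is the quantity that would serve as the area of a cut when mapping for CNF size.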
The Complete Mapping Procedure
Depending on the time budget, technology mapping may involve a different number of refinement passes. For SAT, only a very few passes seem to pay off. In our experiments, the following two passes were used, starting from the trivial mapping induced by the AIG:
- An initial pass, using the area-flow heuristic, $\mathrm{cost}_{AF}$, which captures the global characteristics of the AIG.
- A final pass with the exact local area heuristic, $\mathrm{cost}_{ELA}$. From the definition of local area, this pass cannot increase the total cost of the mapping.
Finally, there is a trade-off between the quality of the result and the speed of the mapper, controlled by the cut size k and the maximum number of cuts stored at each node L. To limit the scope of the experimental evaluation, these parameters were fixed to k = 8 and L = 5 for all benchmarks. In limited testing, these values seemed to be a good trade-off. It is likely that better results could be achieved by setting the parameters in a problem-dependent fashion.
Experimental Results
To measure the effect of the proposed CNF reduction methods, 30 hard SAT problems represented as AIGs were collected from three different sources. The first suite, "Cadence BMC", consists of internal Cadence verification problems, each of which took more than one minute to solve using SMV's BMC engine. Each of the selected problems contains a bug and has been unrolled up to the length k, which reveals this bug (yielding a satisfiable instance), as well as up to length k − 1 (yielding an unsatisfiable instance). The second suite, "IBM BMC", is created from publicly available IBM BMC problems [16]. Again, problems containing a bug were selected and unrolled to length k and k − 1. Problems that MINISAT could not solve in 60 minutes were removed, as were problems solved in under 5 seconds.
Finally, the third suite, "SAT Race", was derived from problems of SAT-Race 2006. Armin Biere's tool "cnf2aig", part of the AIGER package [1], was applied to convert the CNFs to AIGs. Among the problems that could be completely converted to AIGs, the "manol-pipe" class was the richest source. As before, very hard and very easy problems were not considered.
For the experiments, we used the publicly available synthesis and verification tool ABC [8] and the SAT solver MINISAT2. The exact version of ABC used in these experiments, as well as other information useful for reproducing the experimental results presented in this paper, can be found at [5].
Clause Reduction. In Table 1 we compare the difference between generating CNFs using only the Tseitin encoding (section 5) and generating CNFs by applying different combinations of the presented techniques, as well as CNF preprocessing [6] (as implemented in MINISAT2). Reductions are measured against the Tseitin encoding. For example, a reduction of 62% means that, on average, the transformed problem contains 0.38 times the original number of clauses.
We see a consistent reduction in the CNF size, especially in the case where the CNF was derived using technology mapping. The preprocessing scales well, although its runtime, in our current implementation, is not negligible.
For space reasons, we do not present the total number of literals. However, we note that: (1) the speed of BCP depends on the number of clauses, not literals; (2) deriving CNFs from technology mapping produces clauses of at most size k + 1, which is 9 literals in our case; and (3) in [6] it was shown that CNF preprocessing in general does not increase the number of literals significantly.

SAT Runtime. In Table 2 we compare the SAT runtimes of the differently preprocessed problems. Runtimes do not include preprocessing times. At this stage, when the preprocessing has not been fully optimized for the SAT context, it is arguably more interesting to see the potential speedup. If the preprocessing is too slow, its application can be controlled by modifying one of the parameters (such as the number or width of cuts computed), or preprocessing may be delayed until plain SAT solving has been tried for some time without solving the problem. Furthermore, for BMC problems, the techniques can be applied before unrolling the circuit, which is significantly faster (see Incremental BMC below).
Speedup is given both as a total speedup (the sum total of all SAT runtimes) and as arithmetic and harmonic average of the individual speedups. For BMC, we see a clear gain in the proposed methods, most notably for the Cadence BMC problems where a total speedup of 6.9x was achieved not using SATELITE-style preprocessing, and 5.3x with SATELITE-style preprocessing (for a total of 22.3x speedup compared to plain SAT on Tseitin). However, the problems from the SAT-Race benchmark exhibit a different behavior resulting in an increased runtime. It is hard to explain this behavior without knowing the details of the benchmarks. For example, equivalence checking problems are easier to solve if the equivalent points in the modified and golden circuit are kept. The proposed methods may remove such pairs, making the problems harder for the SAT solver.
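The three ways of aggregating speedups can differ considerably, which is why all of them are reported. The helper below is a small sketch with a hypothetical name; the runtimes mentioned after the code are made-up values chosen only to show how the measures diverge, not results from the tables.

```python
def speedups(before, after):
    """Return (total, arithmetic mean, harmonic mean) of per-problem speedups."""
    ratios = [b / a for b, a in zip(before, after)]
    total = sum(before) / sum(after)          # dominated by the longest runs
    arithmetic = sum(ratios) / len(ratios)    # dominated by the largest ratios
    harmonic = len(ratios) / sum(1 / r for r in ratios)  # penalises slow-downs
    return total, arithmetic, harmonic
```

With hypothetical runtimes of 100 s and 10 s before preprocessing and 10 s and 20 s after, the total speedup is about 3.7x and the arithmetic average is 5.25x, while the harmonic average is about 0.95x because of the single slow-down.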
CNF Generation based on Technology Mapping. Here we measure the effect of using the number of CNF clauses as the size estimator of a LUT, rather than a unit area as in standard technology mapping. In both cases, we map using LUTs of size 8, keeping the 5 best cuts at each node during cut enumeration. The results are presented in Table 5. As expected, the proposed technique leads to fewer clauses but more variables. In these experiments, the clause reduction almost consistently resulted in shorter runtimes of the SAT solver.
Incremental BMC. An alternative and cheaper use of the proposed techniques in the context of BMC is to minimize the AIG before unrolling. This prevents simplification across different time frames, but is much faster (in our benchmarks, the runtime was negligible). The clause reduction and the SAT runtime using DAG-aware minimization are given in Table 4. In this particular experiment, ABC was not used; instead, an in-house Cadence implementation of DAG-aware minimization and incremental BMC was used. Ideally, we would like to test the CNF generation based on technology mapping as well, but this is currently not available in the Cadence tool. For licence reasons, IBM benchmarks could not be used in this experiment. Instead, 5 problems from the TIP-suite [1] were used, but they suffer from being too easy to solve.
Conclusions
The paper explores logic synthesis as a way to speed up the solving of circuit-based SAT problems. Two logic synthesis techniques are considered and experimentally evaluated. The first technique applies recent work on DAG-aware circuit compression to preprocess a circuit before converting it to CNF. In spirit, the approach is similar to [4]. The second technique directly produces a compact CNF through a novel adaptation of area-oriented technology mapping, measuring area in terms of CNF clauses.
Experimental results on several sets of benchmarks have shown that the proposed techniques tend to substantially reduce the runtime of SAT solving. The net result of applying both techniques is a 5x speedup in solving for hard industrial problems. At the same time, some slow-downs were observed on benchmarks from the previous year's SAT Race. This indicates that more work is needed for understanding the interaction between the circuit structure and the heuristics of a modern SAT-solver.
Acknowledgements
The authors acknowledge helpful discussions with Satrajit Chatterjee on technology mapping and, in particular, his suggestion to use the average number of fanins' fanouts as a tie-breaking heuristic in sorting cuts.

[Table 1, last row: Avg. red. - 29% 32% 47% 46% 56% 57% 62%]

Table 1. CNF generation with different preprocessing. "(orig)" denotes the original Tseitin encoding; "D" DAG-Aware minimization; "T" CNF generation through Technology Mapping; "S" SATELITE-style CNF preprocessing. On the left, the number of clauses in the CNF formulation is given, in thousands. On the right, the runtimes of applied preprocessing are summed up. No column for the time of generating CNFs through Tseitin encoding is given, as they are all less than a second. The "Cdn" problems are internal Cadence BMC problems; the "ibm" problems are IBM BMC problems from [16]; the remaining ten problems are the "manol-pipe" problems from SAT-Race 2006 [13] back-converted by "cnf2aig" into the AIG form.

Table 2. SAT runtime with different preprocessing. "(orig)" denotes the original Tseitin encoding; "D" DAG-Aware minimization; "T" CNF generation through Technology Mapping; "S" SATELITE-style CNF preprocessing. Given times do not include preprocessing, only SAT runtimes. Speedups are relative to the "(orig)" column.

Table 3. SAT runtime with different preprocessing (cont. from Table 2).

[Table 4 columns: Problem; nodes before and after minimization; BMC runtimes before and after minimization]

Table 4. Incremental BMC on original and minimized AIG. The above problems all contain bugs. Runtimes are given for performing incremental BMC up to the shortest counterexample. In the columns to the right of the arrows, the design has been minimized by DAG-aware rewriting before unrolling it. The node count is the number of Ands in the design. Note that in this scheme, there can be no cross-timeframe simplifications. The experiment confirms the claim in [2] of the applicability of DAG-aware circuit compression to formal verification. The original paper only listed compression ratios and did not include runtimes.

Table 5. Comparing CNF generation through standard technology mapping and technology mapping with the cut cost function adapted for SAT. In the adapted CNF generation based on technology mapping (right-hand side of arrows), the area of a LUT is defined as the number of clauses needed to represent its boolean function. In the standard technology mapping (left-hand side of arrows), each LUT has unit area "1". In both cases, the mapped design is translated to CNF by the method described in section 6.4, which introduces one variable for each LUT in the mapping. The standard technology mapping minimizes the number of LUTs, and hence will have a lower number of introduced variables. From the table it is clear that using the number of clauses as the area of a LUT gives significantly fewer clauses, and also reduces SAT runtimes.
