Abstract. A propositional satisfiability tester is needed as a subroutine for many applications in automated verification, automated theorem proving, and other fields. Such applications may generate very large formulas, some of which are beyond the capabilities of known algorithms. This paper investigates methods for partitioning such formulas effectively, to produce smaller formulas within the reach of known algorithms. CNF formula partitioning can be viewed as hypergraph partitioning, which has been studied extensively in VLSI design. Although CNF formulas have been considered as hypergraphs before, we found that this viewpoint was not productive for partitioning, and we introduce a new viewpoint in the dual hypergraph. Hypergraph partitioning technology from VLSI design is adapted to this problem. The overall goal of satisfiability testing requires criteria different from those used in VLSI design. Several heuristics are described and investigated experimentally. Some formulas from circuit applications that were extremely difficult or impossible for existing algorithms have been solved. However, the method is not useful on formulas with little or no "structure", such as randomly generated formulas.
Introduction
The propositional satisfiability decision problem arises frequently as a subproblem in other applications. Typically, these applications incorporate a satisfiability tester, as a subroutine, that performs well for most of the formulas generated by the application. However, for some formulas it keeps running beyond acceptable time limits. What can the application do? Some applications can afford to "give up" and try something else. In other cases, failure to solve the formula is critical, and the whole application fails. Our research is directed toward providing a satisfiability tester "of last resort", to be brought in on critical formulas where standard methods have failed.
The main idea is to partition a large, difficult formula into smaller formulas that, in the worst case, must each be solved. However, due to the exponential behavior of all known satisfiability decision algorithms, the smaller formulas may be many orders of magnitude easier for the standard satisfiability subroutine. Due to the overhead of formula partitioning, this method would be invoked only when the standard subroutine was unable to solve a problem within reasonable resource limits.
We present two partitioning heuristics for large CNF formulas. The first heuristic can be combined with any complete satisfiability algorithm. The second heuristic requires limited interaction with the underlying satisfiability algorithm and can be combined with most model-searching algorithms, such as variants of the Davis-Putnam-Loveland-Logemann (DPLL) scheme [DP60, DLL62]. Experiments indicate that both heuristics can be very effective, but the second heuristic seems to be effective more often.
Our heuristics are based on partitioning the input formula into two or more subformulas. Partitioning an input formula fits naturally into the hypergraph cut problem and amounts to analyzing the structure of the input formula. To be useful, the cut must achieve some degree of balance among the resulting connected components and must be small in some sense. Apart from the hyperedges that occur in multiple subformulas, this structural analysis yields subformulas that are independent of each other. Heuristics based on this observation can reduce the size of the search space. Combined with an existing tester program, our heuristics greatly increased efficiency on several circuit formulas that were extremely difficult or impossible for other known methods.
CNF formulas have been studied as hypergraphs before [GU89, GLP93]. The usual approach is to define each clause as a hyperedge connecting all the variables, or perhaps the literals, that occur in the clause. From this viewpoint, the hypergraph cut problem consists of finding a favorable set of "cut" clauses such that, if these clauses are removed from the formula, the remaining variables (the vertices of the hypergraph) fall into two or more groups (connected components) that are not related by any remaining clause. This natural method has not proven successful on large formulas, for reasons discussed in Section 3.
The approach introduced here considers the dual of the above hypergraph, which is also a hypergraph. In this new viewpoint, each variable is defined as a hyperedge connecting all the clauses in which it occurs, and each clause is now a vertex. In this context, the hypergraph cut problem consists of finding a favorable set of "cut" variables such that, if these variables are removed from the formula, the remaining clauses fall into two or more groups (connected components) that are not related by any remaining variable.
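The minimal Python sketch below makes the two viewpoints concrete. It assumes clauses are lists of nonzero integers in DIMACS style, so variable v appears as the literal v and its negation as -v. The function names are hypothetical and do not come from the authors' implementation.

- `dual_hypergraph` maps each variable to the set of clauses in which it occurs.
- `clause_components` removes a chosen set of cut variables and groups the remaining clauses with union-find.

```python
from collections import defaultdict


def primal_hypergraph(clauses):
    """Usual view: vertices are variables, one hyperedge per clause."""
    return [frozenset(abs(lit) for lit in clause) for clause in clauses]


def dual_hypergraph(clauses):
    """Dual view: vertices are clause indices, one hyperedge per variable."""
    edges = defaultdict(set)
    for i, clause in enumerate(clauses):
        for lit in clause:
            edges[abs(lit)].add(i)
    return {var: frozenset(idx) for var, idx in edges.items()}


def clause_components(clauses, cut_vars):
    """Group clauses that stay connected once the cut variables are removed."""
    parent = list(range(len(clauses)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for var, idx in dual_hypergraph(clauses).items():
        if var in cut_vars:
            continue  # a removed hyperedge no longer connects its clauses
        first, *rest = sorted(idx)
        for j in rest:
            parent[find(j)] = find(first)

    groups = defaultdict(list)
    for i in range(len(clauses)):
        groups[find(i)].append(i)
    return sorted(groups.values())


formula = [[1, 2], [-2, 3], [3, -4], [4, 5], [-5, 6]]
print(dual_hypergraph(formula)[4])               # frozenset({2, 3})
print(clause_components(formula, cut_vars={4}))  # [[0, 1, 2], [3, 4]]
```

Removing the single hyperedge for variable 4 splits the five clauses into two groups that share no remaining variable.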
Summary of Results
Our algorithm is not really another SAT tester. Rather, it employs an existing SAT tester as part of its algorithm. When a SAT tester encounters hard formulas, the same SAT tester combined with our algorithm can show a significant gain in speed. Our heuristics, presented in Section 4 and Section 5, are implemented as control programs that incorporate existing SAT testers. The first heuristic can be used with any complete SAT tester, and the second heuristic can be used with DPLL-style and other model-search testers. Both heuristics rely on finding a useful hypergraph partition with a manageable number of "cut" variables.
Methods from VLSI design have been adapted effectively to this problem. The idea is to search the space of satisfying assignments to the "cut" variables to find an assignment compatible with all partitions, or to establish that no compatible assignment exists.
The main innovation presented is the second heuristic, which begins by trying to satisfy one of the partitioned formulas while delaying the bindings to "cut" variables. When the formula can be satisfied with just a few cut variables bound, there is potential to greatly reduce the search space for a compatible assignment. Intuitively, the reason is that many of the cut variables are "don't cares", which gives more flexibility in satisfying the second partitioned formula. Experimental evidence bears out this intuition.
Representing propositional models with equivalence classes of literals permits a new branching scheme to be used with the second heuristic (Section 5.3), for the purpose of searching the assignment space of the "cut" variables efficiently. That is, the notion of "assignment" is generalized to include both r = true and q = p, where p is the "leader" of a set of equivalent literals. Viewing the assignment as a set of equations offers a different way to search the assignment space: instead of dividing the space by q = 1 and q = 0, one can divide it by q = p and q ≠ p. Some sophisticated SAT testers employ literal equivalence detection and are able to output a description of a model that includes such generalized assignments. The intuition here is that, if the SAT tester constrained q = p in one partitioned formula, then q ≠ p is more likely to lead immediately to unsatisfiability, making it unnecessary to test (q = true, p = false) and (q = false, p = true) separately. Again, experimental evidence suggests that this technique is effective, but it does not support a definite conclusion.
Overview of Methodology
The main idea is to partition a CNF formula F into two subformulas, F1 and F2, such that they share few variables and are about the same size. Known SAT testers take exponential time in the worst case, so working with smaller formulas can result in a huge time reduction (see Section 4.2 for discussion). However, because of the overhead of partitioning, this method is intended only for hard formulas. The partition of F into F1 and F2 fits naturally into a hypergraph cut problem (see Definition 2 in Section 2.1).
The variables that occur in both partitions are called cut variables. For each assignment to the cut variables, F1 and F2 simplify into formulas that have no variables in common, so they can be tested independently. F is satisfiable if and only if some compatible assignment to the cut variables makes the resulting simplifications of F1 and F2 both satisfiable. It is also possible that F1 or F2 is unsatisfiable in its own right, without binding any cut variables. More sophisticated possibilities are discussed later.
Related Work in Hypergraph Partitioning
In VLSI/PCB CAD, the hypergraph cut problem has been studied extensively. The hypergraph min-cut bisection problem (see Section 2.1) is viewed as a natural abstraction of VLSI and PCB clustering/placement problems [Don88]. Finding a min-cut bisection of a hypergraph is NP-hard [GJ79].
One approximation method is to transform the hypergraph into a graph representation, hoping to take advantage of existing graph partitioning algorithms. However, there is no assurance that good solutions transfer from one problem to the other. Moreover, finding the min-cut bisection of a graph is also NP-hard [GJ79].
There are several classes of algorithms for finding approximate min-cut bisections of a graph [KL70, KGV83, Don88, Kah91]. "Spectral" methods, which use eigenvalues of matrices derived from a graph representation, have been studied recently [Kah91, CSZ94]. Even when the bisection criterion is relaxed to a ratio-cut criterion [WC89] (see Section 2.1), the problem remains NP-hard, by reduction from Bounded Min-Cut Graph Partition [GJ79].
The iterative method of Fiduccia and Mattheyses [FM82] deals with a hypergraph directly instead of a graph representation. This is the method we adapted for the implementation reported here.
Preliminaries
In this section, we present some basic definitions regarding hypergraphs (Section 2.1), review basic concepts of CNF formulas (Section 2.2), and review the basic behavior of DPLL testers (Section 2.3).
Hypergraph Definitions
Definition 1. A hypergraph is a pair $H = (V, E)$, where $V = \{v_1, \ldots, v_m\}$ is the set of vertices and $E = \{e_1, \ldots, e_n\}$ is the set of hyperedges. Each $e_i \subseteq V$.
If each hyperedge has cardinality exactly 2, then H is an undirected graph.
Definition 2. A cut in H is a partition of $V$ into two disjoint nonempty sets $V_1$ and $V_2$. A hyperedge $e_i$ crosses the cut, and is called a cut hyperedge, if it contains vertices in both $V_1$ and $V_2$. The set of cut hyperedges is called the cut set and is denoted $E_c$.
Definition 3. A bisection of H is a cut that satisfies $|C_{V_1} - C_{V_2}| \le 1$, where $C_{V_i} = |V_i|$. Given a weighting $W$ of the hyperedges, the minimum bisection (or min-cut bisection) of H is the bisection with minimum weight $W(E_c) = \sum_{e_i \in E_c} W(e_i)$.
We use $W(e_i) = 1$ for all $e_i \in E$ in this study.
Definition 4. The minimum ratio cut problem is as follows: given $H = (V, E)$ and a weighting $W$ of the hyperedges, find the partition of $V$ into two disjoint sets $V_1$ and $V_2$ such that $\frac{W(E_c)}{C_{V_1} C_{V_2}}$ is minimized.
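Definitions 2 to 4 can be checked mechanically. The sketch below makes these assumptions:

- The hypergraph is a dictionary from hyperedge names to vertex sets.
- $V_1$ is passed explicitly, and $V_2$ is its complement in $V$.
- Weights default to 1, as in this study.

The helper names are our own.

```python
def cut_set(edges, side1):
    """E_c: hyperedges with vertices in both V_1 and V_2 (Definition 2)."""
    # verts - side1 is nonempty exactly when the hyperedge also touches V_2
    return {name for name, verts in edges.items() if verts & side1 and verts - side1}


def is_bisection(vertices, side1):
    """Definition 3: both sides nonempty and |C_V1 - C_V2| <= 1."""
    side2 = vertices - side1
    return bool(side1) and bool(side2) and abs(len(side1) - len(side2)) <= 1


def cut_weight(edges, side1, weight=lambda e: 1):
    """W(E_c); unit weights by default."""
    return sum(weight(e) for e in cut_set(edges, side1))


def ratio_cut(edges, vertices, side1, weight=lambda e: 1):
    """Definition 4 objective: W(E_c) / (C_V1 * C_V2)."""
    side2 = vertices - side1
    return cut_weight(edges, side1, weight) / (len(side1) * len(side2))


V = {1, 2, 3, 4}
E = {"e1": {1, 2}, "e2": {2, 3}, "e3": {3, 4}, "e4": {1, 2, 3}}
print(sorted(cut_set(E, {1, 2})), is_bisection(V, {1, 2}), ratio_cut(E, V, {1, 2}))
# ['e2', 'e4'] True 0.5
```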
Notation for CNF Formulas
A positive literal is a positive occurrence of a Boolean variable, and a negative literal is the negation of a Boolean variable. The negation, or complement, of a literal q is written ¬q, with the understanding that ¬¬q = q. A clause of a CNF formula is a disjunction of literals, and a CNF formula is a conjunction of clauses.
A partial truth assignment is a partial function that maps some variables of a formula into {true, false}; it is conveniently represented by the set of literals that are mapped into true. It extends to a partial function on clauses as follows:
- Given a literal q in a partial assignment, a clause C containing q is mapped to true and is said to be satisfied.
- A clause is mapped into false if all of its literals are mapped into false.
- Otherwise, the clause is not mapped by the partial assignment.

If all the clauses in a formula are satisfied by some partial assignment, the formula is satisfiable, and the partial assignment is called a model. A model is a total model if it is a model and a total assignment. A model can be extended to a total model by arbitrarily assigning truth values to the unassigned variables. A model is conventionally represented by the set of unit clauses that are mapped into true by the partial assignment; note that this representation is a very simple formula. In a setting where a model is required, $\{q_1, \ldots, q_k\}$ denotes the formula consisting of the unit clauses $q_1, \ldots, q_k$.
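The sketch below captures these conventions. It assumes DIMACS-style integer literals and stores a partial assignment as the set of literals mapped to true. The function names are illustrative only.

- `clause_value` returns True, False, or None, where None means the clause is not yet mapped.
- `extend_to_total` completes a model by setting the remaining variables to false. This is one arbitrary choice among many.

```python
def clause_value(clause, assignment):
    """Value of a clause under a partial assignment (a set of true literals).

    True if some literal is true, False if every literal is false,
    None if the clause is not mapped by the partial assignment.
    """
    if any(lit in assignment for lit in clause):
        return True
    if all(-lit in assignment for lit in clause):
        return False  # also covers the empty clause
    return None


def is_model(formula, assignment):
    """A partial assignment is a model if it satisfies every clause."""
    return all(clause_value(c, assignment) is True for c in formula)


def extend_to_total(assignment, variables):
    """Extend a model arbitrarily: unassigned variables become false here."""
    assigned = {abs(lit) for lit in assignment}
    return set(assignment) | {-v for v in variables if v not in assigned}


F = [[1, -2], [2, 3], [-1, 3]]
m = {1, 3}  # variable 2 is unassigned
print(is_model(F, m), extend_to_total(m, {1, 2, 3}) == {1, -2, 3})  # True True
```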
Definition 5. In the context of a CNF formula, two literals p and q are said to be equivalent to each other if the formula contains the two binary clauses (p ∨ ¬q) and (¬p ∨ q).
Detecting an equivalence is useful because one literal of the equivalent pair can be substituted for the other, reducing the number of variables in the formula.
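The following sketch uses the same integer-literal assumption. It detects pairs matching Definition 5 and then substitutes one literal for the other. It looks only for the two-clause pattern, not for equivalences implied through longer chains. Making the lower-numbered variable the positive "leader" is our own convention, and the names are hypothetical.

```python
def find_equivalences(formula):
    """Pairs (p, q) such that (p ∨ ¬q) and (¬p ∨ q) both occur (Definition 5)."""
    binaries = {frozenset(c) for c in formula if len(set(c)) == 2}
    found = set()
    for clause in binaries:
        a, b = sorted(clause)
        if abs(a) == abs(b):
            continue  # tautology such as (x ∨ ¬x)
        # (a ∨ b) together with (¬a ∨ ¬b) means a ≡ ¬b
        if frozenset({-a, -b}) in binaries:
            p, q = (a, -b) if abs(a) < abs(b) else (-b, a)
            if p < 0:
                p, q = -p, -q  # normalize: the leader literal is positive
            found.add((p, q))
    return sorted(found)


def substitute(formula, p, q):
    """Replace q by p and ¬q by ¬p; drop clauses that become tautologies."""
    result = []
    for clause in formula:
        new = []
        for lit in clause:
            lit = p if lit == q else -p if lit == -q else lit
            if lit not in new:
                new.append(lit)
        if not any(-lit in new for lit in new):
            result.append(new)
    return result


F = [[1, -2], [-1, 2], [2, 3]]
print(find_equivalences(F))   # [(1, 2)]
print(substitute(F, 1, 2))    # [[1, 3]]
```

Here the substitution removes variable 2 and the two clauses that encoded the equivalence.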
DPLL Style Tester
A basic model-search satisfiability tester was described by Davis et al.; this tester, S, works as follows.
unit clause rule: If the formula contains a unit clause q, add q to the partial assignment.
pure literal rule: If a literal q occurs in the formula but ¬q does not, add q to the partial assignment.
The unit clause rule and the pure literal rule are examples of simplification rules, because they make the formula smaller without changing its satisfiability status.
splitting rule: If no simplification rule applies, choose some literal q and recursively try to add q to the partial assignment; if that fails to produce a model, recursively try to add ¬q to the partial assignment.
After a rule binds a literal q into the partial assignment, a new formula is generated in which each clause containing q is deleted and each occurrence of the literal ¬q is removed from its clause, shortening that clause. If an empty clause is created, the new formula is unsatisfiable, and the procedure backtracks. If all clauses are deleted, a model has been found, and the procedure terminates.
The tester S chooses a branching literal q based on its branching heuristic. The forward guess is the assignment of true to q. If S recursively detects a contradiction, it tries ¬q (the alternative guess). If the alternative guess also recursively leads to a contradiction, S backtracks to the previous branching variable. If no previous branching variable remains, S returns unsatisfiable.
Branching algorithms generate an enumeration tree (ET). A branching heuristic determines which variable receives which truth assignment at each node of an ET. A well-chosen assignment to a branching variable triggers further applications of the simplification rules and thereby reduces the size of the ET.
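Below is a compact Python sketch of S. It assumes integer literals and returns a model as a frozenset of true literals.

- The branching heuristic (first literal of the first clause) is a deliberately naive placeholder.
- The `stats` counter, which counts guesses (ET branches), is our own instrumentation.

```python
def assign(formula, lit):
    """Bind lit to true: delete satisfied clauses and remove ¬lit from the rest."""
    return [[l for l in c if l != -lit] for c in formula if lit not in c]


def dpll(formula, partial=frozenset(), stats=None):
    """Return a (partial) model as a frozenset of true literals, or None."""
    if stats is None:
        stats = {"branches": 0}
    while True:  # apply simplification rules until none applies
        if any(not c for c in formula):
            return None  # empty clause: contradiction, so backtrack
        if not formula:
            return partial  # every clause deleted: model found
        unit = next((c[0] for c in formula if len(c) == 1), None)  # unit clause rule
        if unit is None:
            lits = {l for c in formula for l in c}
            unit = next((l for l in sorted(lits) if -l not in lits), None)  # pure literal rule
        if unit is None:
            break
        formula, partial = assign(formula, unit), partial | {unit}
    q = formula[0][0]  # splitting rule with a naive branching heuristic
    for guess in (q, -q):  # forward guess, then alternative guess
        stats["branches"] += 1
        model = dpll(assign(formula, guess), partial | {guess}, stats)
        if model is not None:
            return model
    return None


stats = {"branches": 0}
print(dpll([[1, 2], [-1, 2], [1, -2], [-1, -2]], stats=stats), stats["branches"])  # None 2
print(sorted(dpll([[1, 2], [-1, 3], [-3]])))  # [-3, -1, 2]
```

In the second call, unit propagation alone finds a model, so no branches are needed.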
3 Partitioning F into F1 and F2
As mentioned in the introduction, the usual way to view a CNF formula as a hypergraph associates each clause with a hyperedge. However, we adopt a different view, in which each variable is associated with a hyperedge; this produces the hypergraph dual of the usual view. The reason to prefer the new view lies in how the partition is eventually used, after it has been found.
At a high level, the partition is used as follows. For each partial assignment "required" by the cut set, apply the assignment to the induced subformulas F1 and F2, making them independent. Then try to find models independently for F1 and F2. If this process ever succeeds, a model for the entire formula has been found. However, to demonstrate unsatisfiability, it is necessary to show that the process fails for all "required" partial assignments.
The difference between the two hypergraph views lies in which partial assignments are "required". For the usual view, the cut set is a set of clauses, and all partial assignments that satisfy this set of clauses are "required". The number of variables in this cut set can be significantly larger than the number of clauses, and the number of satisfying partial assignments can be exponential in the number of variables involved. Thus the number of "required" partial assignments is not directly related to the cardinality of the cut set.
For the new view, the cut set is a set of variables. The "required" partial assignments are all partial assignments to these variables that satisfy the clauses (if any) that consist entirely of variables (positive or negative) in the cut set. Although this number is also exponential, it is directly related to the cardinality of the cut set: at most 2^k for k cut variables. An algorithm that finds small cut sets is therefore more likely to achieve a useful partition.
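The difference in the number of "required" partial assignments already shows up on small examples. This sketch enumerates them exhaustively for both views; the helper names are hypothetical, and exhaustive enumeration is used only for illustration.

- In the usual view, a single cut clause over four variables already admits 15 satisfying assignments.
- In the dual view, k cut variables never admit more than 2^k.

```python
from itertools import product


def satisfying_assignments(clauses, variables):
    """All assignments to `variables` (as sets of true literals) satisfying `clauses`."""
    vs = sorted(variables)
    result = []
    for bits in product([False, True], repeat=len(vs)):
        value = dict(zip(vs, bits))
        if all(any(value[abs(l)] == (l > 0) for l in c) for c in clauses):
            result.append(frozenset(v if value[v] else -v for v in vs))
    return result


def required_usual_view(cut_clauses):
    """Usual view: assignments to the cut clauses' variables that satisfy them."""
    variables = {abs(l) for c in cut_clauses for l in c}
    return satisfying_assignments(cut_clauses, variables)


def required_dual_view(formula, cut_vars):
    """Dual view: assignments to the cut variables that satisfy every clause
    made up entirely of cut variables (at most 2 ** len(cut_vars))."""
    inner = [c for c in formula if all(abs(l) in cut_vars for l in c)]
    return satisfying_assignments(inner, cut_vars)


print(len(required_usual_view([[1, 2, 3, 4]])))                   # 15
print(len(required_dual_view([[1, 2], [1, 3], [-2, 4]], {1, 2})))  # 3
```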
As further motivation for the new hypergraph view, consider that a formula typically has more clauses than variables. In VLSI design, there are many more gates, which correspond to vertices, than wires, which correspond to hyperedges. Thus we expected that partitioning algorithms from that domain would transfer more effectively to the new hypergraph view.
In Section 3.1, we illustrate the concept of partitioning.
In Section 3.2, we show how a hypergraph is derived from the input formula.
Illustration of the Concept of Partitioning
Given an input formula F, we want to classify each variable in F as a member of exactly one of the following classes: Vc, V1, and V2. The resulting classification must guarantee that no clause contains both V1 and V2 variables.
A Hypergraph Formulation
The partition of F into F1 and F2 can be viewed as a hypergraph cut problem, for which we derive a hypergraph from the test formula as follows. Each clause corresponds to a vertex of the hypergraph, and, for each variable v_i, the set of all clauses in which v_i occurs is a hyperedge.
If we require $|m_{F_1} - m_{F_2}| \le 1$, where $m_{F_i}$ is the total number of clauses in $F_i$, then the partition is a bisection of our hypergraph, and the Vc variables form its cut set.
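As a sketch of this classification, assume each clause has already been assigned to side 1 or side 2, for example by a Fiduccia-Mattheyses pass (not shown). The classification then works as follows:

- Variables whose clauses lie on both sides form Vc.
- The remaining variables fall into V1 or V2.
- As a result, no clause can mix V1 and V2 variables.

The function names are hypothetical.

```python
def classify_variables(formula, side):
    """Split variables into (V_c, V_1, V_2) given side[i] in {1, 2} for clause i."""
    sides_of = {}
    for i, clause in enumerate(formula):
        for lit in clause:
            sides_of.setdefault(abs(lit), set()).add(side[i])
    v_c = {v for v, s in sides_of.items() if s == {1, 2}}
    v_1 = {v for v, s in sides_of.items() if s == {1}}
    v_2 = {v for v, s in sides_of.items() if s == {2}}
    return v_c, v_1, v_2


def is_valid_partition(formula, side):
    """No clause mixes V_1 and V_2 variables, and |m_F1 - m_F2| <= 1."""
    _, v_1, v_2 = classify_variables(formula, side)
    # Holds by construction; checked explicitly here for illustration.
    mixed = any(
        any(abs(l) in v_1 for l in c) and any(abs(l) in v_2 for l in c)
        for c in formula
    )
    m1 = sum(1 for s in side if s == 1)
    m2 = len(side) - m1
    return not mixed and abs(m1 - m2) <= 1


F = [[1, 2], [-2, 3], [3, 4], [-4, 5]]
print(classify_variables(F, [1, 1, 2, 2]))  # ({3}, {1, 2}, {4, 5})
```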
The First Heuristic Based On Partitioning
In Section 4.1, we present the first heuristic, which can be combined with any existing complete SAT tester. Compared to the original SAT tester, the derived SAT tester can determine satisfiability more efficiently if the cut set is small enough to be usable. In Section 4.2, we discuss the reason behind the predicted increase in efficiency and investigate the relationship between balancing the two subformulas and the number of cut variables.
Trying All Combinations
The basic idea of the first heuristic is as follows. Assume that partitioning F into F1 and F2 yields k cut variables. Then there are $2^k$ possible combinations of truth assignments to the k cut variables. Each combination $C_i$ can be viewed as a set of predefined constraints (unit clauses) on F1 and F2.
Given a combination $C_i$, let $F_1^i$ be F1 combined with $C_i$ and simplified, and let $F_2^i$ be F2 combined with $C_i$ and simplified. If both $F_1^i$ and $F_2^i$ are satisfiable, then F is satisfiable, since there are no conflicting truth assignments between $F_1^i$ and $F_2^i$. The first heuristic, which we call tryAll, is described below.
1. Determine the satisfiability of F1 and F2. If either F1 or F2 is unsatisfiable, return unsatisfiable and halt.
2. Generate a combination $C_i$. If no untried combination remains, return unsatisfiable and halt.
3. Using the combination $C_i$ from step 2, generate $F_1^i$ and determine its satisfiability. If $F_1^i$ is unsatisfiable, go to step 2. Otherwise, generate $F_2^i$ and determine its satisfiability. If $F_2^i$ is satisfiable, return satisfiable and halt; otherwise, go to step 2.

Any complete SAT tester can be used to determine satisfiability in steps 1 and 3. Thus, tryAll is a control program that uses any complete tester as a subroutine.
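The three steps of tryAll translate directly into a control loop around a SAT subroutine. In this Python sketch, the subroutine is a brute-force stand-in that is suitable only for tiny formulas; any complete tester could replace it. The names `simplify`, `brute_force_sat`, and `try_all` are our own, and literals are DIMACS-style integers.

```python
from itertools import product


def simplify(formula, units):
    """Apply unit constraints (a set of true literals); None if a clause empties."""
    result = []
    for clause in formula:
        if any(lit in units for lit in clause):
            continue  # clause satisfied by the constraints
        reduced = [lit for lit in clause if -lit not in units]
        if not reduced:
            return None  # contradiction
        result.append(reduced)
    return result


def brute_force_sat(formula):
    """Stand-in complete SAT tester (exhaustive search over all variables)."""
    variables = sorted({abs(l) for c in formula for l in c})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(l)] == (l > 0) for l in c) for c in formula):
            return True
    return False


def try_all(f1, f2, cut_vars, sat=brute_force_sat):
    """tryAll: test every truth assignment C_i to the cut variables."""
    if not sat(f1) or not sat(f2):  # step 1
        return False
    cut = sorted(cut_vars)
    for bits in product([False, True], repeat=len(cut)):  # step 2
        c_i = {v if b else -v for v, b in zip(cut, bits)}
        f1_i = simplify(f1, c_i)  # step 3
        if f1_i is None or not sat(f1_i):
            continue
        f2_i = simplify(f2, c_i)
        if f2_i is not None and sat(f2_i):
            return True
    return False


print(try_all([[1, 2], [-2, 3]], [[-3, 4], [-4, -1]], {1, 3}))  # True
print(try_all([[1], [-1, 2]], [[-2]], {2}))                    # False
```

In the second call, both subformulas are satisfiable on their own. No assignment to cut variable 2 works for both, however, so tryAll must exhaust all $2^k$ combinations before it can report unsatisfiable.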
The Reason Behind the Expected Increase in Efficiency
Assume that a SAT tester S typically determines the satisfiability of a formula F in running time $T(n)$, where F has $n$ variables ($n = n_{V_c} + n_{V_1} + n_{V_2}$). Assume that $T(n) = A \cdot 2^{\alpha n}$ for some constants $A$ and $\alpha$. For convenience, the time unit is chosen to make $A = 1$. The value $\alpha$ indicates the hardness of the formula class.
Let the hardness classes of $F_1^i$ and $F_2^i$ be $\beta$ and $\gamma$, respectively, and let $n^i_{V_1}$ and $n^i_{V_2}$ be the total numbers of variables in $F_1^i$ and $F_2^i$, respectively. Then the expected running time of tryAll with S is:
$$T_{\mathrm{tryAll}}(n) = 2^{n_{V_c}} \left( 2^{\beta n^i_{V_1}} + 2^{\gamma n^i_{V_2}} \right)$$
In this expression, the factor $2^{n_{V_c}}$ reflects that, in the worst case, tryAll examines all possible combinations of truth assignments to the Vc variables.
The relationship between $n_{V_c}$ and the balance between the numbers of variables in F1 and F2 is derived by trying to satisfy the following inequality:
$$2 \cdot 2^{n_{V_c}} \left( 2^{\alpha n^i_{V_1}} + 2^{\alpha n^i_{V_2}} \right) \le 2^{\alpha n} \qquad (1)$$
In Eq. (1), we assume that $F_1^i$ and $F_2^i$ are in the same hardness class as F, i.e., $\beta = \gamma = \alpha$. Let $d = |n_{V_1} - n_{V_2}|$. A sufficient condition for Eq. (1) is
$$n_{V_c} \le \frac{\alpha (n - d)}{2} \qquad (2)$$
This gives a heuristic criterion for trading off cut size against balance of the partition. In Eq. (2), as d becomes smaller, we can allow more Vc variables and still improve the predicted performance by the same amount.
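The cost model can be evaluated numerically. This sketch works with base-2 logarithms of the predicted times to avoid overflow. As a simplifying assumption, it uses $n_{V_1}$ and $n_{V_2}$ in place of the post-simplification counts $n^i_{V_1}$ and $n^i_{V_2}$; simplification can only lower them.

Take $n = 100$, $n_{V_c} = 10$, and $\beta = \gamma = \alpha = 1$:

- A balanced split costs about $2^{56}$, compared with $2^{100}$ for the unpartitioned formula.
- A lopsided split with the same cut size costs about $2^{90}$.

The function names are illustrative.

```python
import math


def log2_time_plain(n, alpha):
    """log2 of T(n) = 2**(alpha * n), with the time unit chosen so that A = 1."""
    return alpha * n


def log2_time_tryall(n_vc, n1, n2, beta, gamma):
    """log2 of 2**n_vc * (2**(beta*n1) + 2**(gamma*n2)), tryAll's worst case.

    Uses log2(2**a + 2**b) = max + log2(1 + 2**(min - max)) to avoid overflow.
    """
    a, b = beta * n1, gamma * n2
    hi, lo = max(a, b), min(a, b)
    return n_vc + hi + math.log2(1 + 2 ** (lo - hi))


n, n_vc, alpha = 100, 10, 1.0
print("plain:", log2_time_plain(n, alpha))
for n1, n2 in [(45, 45), (60, 30), (80, 10)]:  # same cut size, growing imbalance d
    print("d =", n1 - n2, "tryAll:", round(log2_time_tryall(n_vc, n1, n2, alpha, alpha), 2))
```

With the cut size held fixed, the predicted cost of tryAll grows almost linearly in the exponent as the imbalance d increases. This is the trade-off that Eq. (2) expresses.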
The Second Heuristic Based On Partitioning
We show the motivation behind the second heuristic in Section 5.1. The second heuristic, to be combined with DPLL-style testers, is presented in Section 5.2. The existing DPLL-style tester program needs to be modified, and the necessary modification is discussed in this section.
Motivation
When the resulting cut size is not small, the first heuristic becomes impractical to apply. However, we could greatly reduce the exponential search space for a compatible assignment if the bindings to the cut variables are delayed. If F1 is satisfiable, the model of F1 may not bind all cut variables. Then the "don't care" variables (unassigned cut variables) can take any binding when searching for a model of F2. Whatever truth assignments are made to the "don't care" variables in F2, those assignments cannot conflict with the model of F1, since that model assigns no truth values to the "don't care" variables. The following example illustrates the idea. This is possible since the cut variable v_1 is a "don't care" variable.
The existing DPLL-style tester is modified to output the partial assignment, instead of a total model, when a formula is satisfiable.
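The sketch below illustrates the "don't care" idea on a small, hypothetical formula of our own; it is not the example referred to above.

- F1's partial model binds cut variable 3 but leaves cut variable 4 unbound.
- Only the binding of variable 3 is therefore imposed on F2, and the combination succeeds.
- Had F1's model also fixed variable 4 to false, F2 would become unsatisfiable under the imposed bindings.

The helper names and the brute-force solver are illustrative assumptions.

```python
from itertools import product


def simplify(formula, units):
    """Apply a set of true literals; return None if some clause becomes empty."""
    result = []
    for clause in formula:
        if any(lit in units for lit in clause):
            continue
        reduced = [lit for lit in clause if -lit not in units]
        if not reduced:
            return None
        result.append(reduced)
    return result


def brute_force_model(formula):
    """Return a total model over the formula's variables, or None (tiny inputs only)."""
    variables = sorted({abs(l) for c in formula for l in c})
    for bits in product([False, True], repeat=len(variables)):
        model = {v if b else -v for v, b in zip(variables, bits)}
        if all(any(l in model for l in c) for c in formula):
            return model
    return None


def combine_with_partial_model(m1, f2, cut_vars, solve=brute_force_model):
    """Impose on F2 only the cut bindings present in the partial model m1 of F1.

    Unbound ("don't care") cut variables stay free while F2 is solved, so any
    value chosen for them cannot conflict with m1.
    """
    bound = {lit for lit in m1 if abs(lit) in cut_vars}
    f2_reduced = simplify(f2, bound)
    if f2_reduced is None:
        return None
    m2 = solve(f2_reduced)
    return None if m2 is None else set(m1) | m2


F1 = [[1], [-1, 3], [1, 4]]          # {1, 3} satisfies F1; cut variable 4 is a don't care
F2 = [[-3, 5], [4, -5], [-4, 6]]
print(sorted(combine_with_partial_model({1, 3}, F2, {3, 4})))  # [1, 3, 4, 5, 6]
print(combine_with_partial_model({1, 3, -4}, F2, {3, 4}))       # None
```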
Delaying Cut Variable Binding
The second heuristic, which we call MSAT, is described below. The idea is to try to satisfy the partitioned formulas while binding as few cut variables as possible. If no satisfying assignment exists with those few bindings, search among those variables, and broaden the search only when forced to do so.
1. Determine the satisfiability of F1 and F2. If either F1 or F2 is unsatisfiable, return unsatisfiable and halt. If both are satisfiable, let M1 be the bindings made to the cut variables while satisfying F1, and let M2 be the bindings made to the cut variables while satisfying F2. Depending on the numbers of elements in M1 and M2, the two subformulas and their cut bindings are renamed so that |M1| ≤ |M2|.

If an existing tester can output a partial assignment when the input formula is satisfiable, it is well suited for use with our second heuristic. MSAT can employ such a tester as a subroutine for determining satisfiability in steps 1 and 3.
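Step 1 of MSAT can be sketched as follows. The sketch assumes a DPLL-style subroutine that stops as soon as every clause is satisfied and returns the bindings made so far, i.e., the partial assignment that the modified tester outputs. Only step 1 is sketched, since the remaining steps are not detailed above. The ordering |M1| ≤ |M2| follows our reading of the renaming rule, and all names are hypothetical.

```python
def assign(formula, lit):
    """Bind lit to true: delete satisfied clauses and remove ¬lit from the rest."""
    return [[l for l in c if l != -lit] for c in formula if lit not in c]


def partial_model(formula, partial=frozenset()):
    """DPLL-style search that stops once every clause is satisfied and returns
    only the literals bound so far (unbound variables are don't cares)."""
    while True:
        if any(not c for c in formula):
            return None
        if not formula:
            return partial
        unit = next((c[0] for c in formula if len(c) == 1), None)
        if unit is None:
            break
        formula, partial = assign(formula, unit), partial | {unit}
    q = formula[0][0]
    for guess in (q, -q):
        model = partial_model(assign(formula, guess), partial | {guess})
        if model is not None:
            return model
    return None


def msat_step1(f1, f2, cut_vars):
    """Solve F1 and F2 separately and collect their cut bindings.

    Returns None if either subformula is unsatisfiable; otherwise returns
    (F1, M1, F2, M2), renamed so that |M1| <= |M2|.
    """
    m_f1, m_f2 = partial_model(f1), partial_model(f2)
    if m_f1 is None or m_f2 is None:
        return None
    m1 = {l for l in m_f1 if abs(l) in cut_vars}
    m2 = {l for l in m_f2 if abs(l) in cut_vars}
    if len(m1) > len(m2):
        f1, m1, f2, m2 = f2, m2, f1, m1
    return f1, m1, f2, m2


A = [[-3, 5], [4, -5], [-4, 6]]
B = [[1], [-1, 3], [1, 4]]
_, m1, _, m2 = msat_step1(A, B, {3, 4})
print(sorted(m1), sorted(m2))  # [3] [-3, 4]
```

In this example the cut bindings of the two subformulas conflict on variable 3, which is exactly the situation the later steps of MSAT must resolve.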
A Modified Tester
MSAT was implemented by modifying 2cl of Tsuji and Van Gelder [VGT95]. 2cl can detect equivalent literals. If two unassigned cut variables are equivalent, the equivalence must be added to the subformula being tested, in addition to the unit clause constraints. Otherwise, MSAT may find an M_F2 that violates the equivalence, in which case combining M_F1 and M_F2 does not yield a model for F. Therefore, the program adds the equivalence relation (two binary clauses) between unassigned cut variables to M1 or M1''. When the program backtracks, an equivalence relation is negated by complementing one of the literals involved in it.
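The sketch below shows how an equivalence between two unassigned cut variables can be encoded as clauses and then negated on backtracking. The helpers are hypothetical and are not 2cl's code.

- Complementing one literal turns q ≡ p into q ≡ ¬p.
- The two branches q = p and q ≠ p split the four assignments of (p, q) into two disjoint halves.
- This split is what lets the scheme replace the two separate tests (q = true, p = false) and (q = false, p = true).

```python
from itertools import product


def equivalence_clauses(p, q):
    """Encode p ≡ q as the two binary clauses (p ∨ ¬q) and (¬p ∨ q)."""
    return [[p, -q], [-p, q]]


def negate_equivalence(p, q):
    """Backtracking alternative: complement one literal, giving p ≡ ¬q (q ≠ p)."""
    return equivalence_clauses(p, -q)


def satisfies(assignment, clauses):
    """assignment maps each variable to a bool."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def models_of(clauses):
    """Assignments (value of variable 1, value of variable 2) satisfying `clauses`."""
    return [
        bits for bits in product([False, True], repeat=2)
        if satisfies({1: bits[0], 2: bits[1]}, clauses)
    ]


same = models_of(equivalence_clauses(1, 2))  # branch q = p
diff = models_of(negate_equivalence(1, 2))   # branch q ≠ p
print(same, diff)
```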
To reduce the number of assigned cut variables, we introduce a priority concept: V1 variables have a higher priority than Vc variables. The modified 2cl is restricted to select branching variables among the V1 variables first; only when no V1 variables are left does 2cl branch among the Vc variables.
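The priority rule amounts to a restricted choice of branching variable. This sketch makes two assumptions:

- The formula holds only unresolved clauses, so every variable in it is unassigned.
- The lowest-index tie-break is a placeholder for 2cl's actual branching heuristic, which is not described here.

The function name is hypothetical.

```python
def choose_branch_literal(formula, v1, v_c):
    """Pick a branching variable from V_1 first, and from V_c only when no
    unassigned V_1 variable remains. Returns a positive literal (the forward
    guess) or None if no variable is left."""
    present = {abs(l) for c in formula for l in c}
    for group in (v1, v_c):
        candidates = sorted(present & set(group))
        if candidates:
            return candidates[0]  # placeholder tie-break: lowest variable index
    remaining = sorted(present)
    return remaining[0] if remaining else None


F = [[1, 5], [-5, 2], [3, -6]]
print(choose_branch_literal(F, v1={5, 6}, v_c={1, 2, 3}))         # 5
print(choose_branch_literal([[1, 2]], v1={5, 6}, v_c={1, 2, 3}))  # 1
```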
Experimental Results
This section reports experimental results on difficult formulas from a variety of sources. Nemesis, a test-pattern generation program described by Larrabee [Lar92], produced the first five formulas in Figure 3 from the ISCAS-85 benchmark circuits. The first two are circuit diagnosis formulas, and the other three are single stuck-at formulas. Most of the formulas generated by Nemesis were relatively easy for 2cl and did not require partitioning. The last two formulas in Figure 3 are DIMACS benchmark formulas from the second DIMACS Implementation Challenge. All seven formulas are unsatisfiable.
Before partitioning, all test formulas were simplified using the simplification rules; the resulting formulas are shown in Figure 3. The partitioning results, also shown in Figure 3, were obtained with our implementation of the linear-time algorithm of Fiduccia and Mattheyses [FM82]. If one of the resulting subformulas is unsatisfiable by itself, we count the number of branches taken as 0. The number of branches taken is shown in column 9 of the table in Figure 3. Both c5315-1 and c5315-3 were determined to be unsatisfiable in the initial step of MSAT.
The cut set for ssa2670-127 has 10 variables, so there are 2^10 = 1024 possible combinations of truth assignments to the cut variables. However, MSAT actually inspected only 90 of them. Although c2670-18 required fewer branches than c2670-16, its total CPU time was about 5 times greater. This indicates that the subformulas of c2670-18 are more difficult than those of c2670-16.
Figure 4. (Table; recoverable column headings: formula, n, m, n_Vc, n_V1, n_V2, m_F1, ...)
In general, MSAT resulted in significant speed gains. For example, 2cl spent about 200 CPU hours determining the satisfiability of c5315-3, whereas MSAT took less than 5 CPU minutes.
The extreme increase in efficiency for some formulas was possible because the partitioning step extracts structural information from the input formulas, and MSAT avoids forcing unnecessary combinations of truth assignments to the cut set.
The formula pret150-75 took about 4 CPU hours for MSAT, although MSAT made only 9 branch alternations. From these data, we observe that the partitioned subformulas are hard. However, we also expect that MSAT could do better if the subformulas of pret150-75 were decomposed further. Currently, the input formula is partitioned into only two subformulas, but a subformula could itself be partitioned if it is hard to solve. Therefore, we have some control over hard formulas when they are encountered.
Conclusion
We have introduced two heuristics that are based on partitioning an input formula. These two heuristics are control programs that use an existing SAT tester as a subroutine. For some of the circuit formulas, the two heuristics showed significant gains in efficiency with little or no modification of the existing SAT tester. This supports our intuition that subformulas can be within the reach of existing SAT testers even when the original formula is not.
