Partitioning Methods for Satisfiability Testing on Large Formulas  by Park, Tai Joon & Van Gelder, Allen
Information and Computation 162, 179184 (2000)
NOTE
Partitioning Methods for Satisfiability Testing
on Large Formulas
Tai Joon Park and Allen Van Gelder
Computer Science Department, 237 BE, University of California, Santa Cruz, California 95064
E-mail: avgcse.ucsc.edu
Methods for partitioning large propositional formulas are investigated,
with the goal of producing a set of smaller formulas whose satisfiability
can be determined within reasonable time frames by known algorithms.
CNF formula partitioning can be viewed as hypergraph partitioning,
which has been studied extensively in VLSI design. Although CNF for-
mulas have been considered as hypergraphs before, we found that this
viewpoint was not productive for partitioning, and we introduce a new
viewpoint in the dual hypergraph. Hypergraph partitioning technology
from VLSI design is adapted to this problem. The overall goal of
satisfiability testing requires criteria different from those used in VLSI
design. Several heuristics are described and investigated experimentally.
Some formulas from circuit applications that were extremely difficult or
impossible for existing algorithms have been solved. However, the
method is not useful on formulas with little or no structure, such as ran-
domly generated formulas.  2000 Academic Press
1. INTRODUCTION
The propositional satisfiability decision problem arises frequently as a sub-
problem in other applications, such as automated verification and automated
theorem proving. Such applications may generate very large formulas, some of
which are beyond the capabilities of known algorithms. Typically, these applica-
tions incorporate a satisfiability tester, as a subroutine, that performs well for most
of the formulas generated by the application. However, for some formulas it keeps
on running beyond acceptable time limits. What can the application do? Some
applications can afford to give up and try something else. In other cases, failure to
solve this formula is critical and the whole application fails. Our research is directed
toward providing a satisfiability tester of last resort to be brought in on critical
formulas where standard methods have failed.
This paper summarizes results presented at CADE-13 [PVG96]. The main idea
is to partition a large difficult formula into smaller formulas that (in the worst case)
must each be solved. However, due to the exponential behavior of all known
satisfiability decision algorithms, the smaller formulas may be many orders of
doi:10.1006inco.1999.2860, available online at http:www.idealibrary.com on
179 0890-540100 35.00
Copyright  2000 by Academic Press
All rights of reproduction in any form reserved.
magnitude easier for the standard satisfiability subroutine. Because of the overhead
of formula partitioning, this method would only be invoked when the standard sub-
routine was unable to solve a problem within reasonable resource limits.
The two partitioning methods (see Section 3) incorporate an existing satisfiability
tester as a subroutine. The first heuristic can be combined with any complete
satisfiability algorithm. However, the second heuristic requires limited interaction
with the underlying satisfiability algorithm and can be combined with most model-
searching algorithms, such as variants of the DavisPutnamLovelandLogemann
(DPLL) scheme [DP60, DLL62]. Our study combined with an existing tester
program showed greatly increased efficiency on several circuit formulas that were
extremely difficult or impossible for other known methods (see Fig. 3).
Both heuristics are based on partitioning the input formula into two or more
subformulas. Partitioning an input formula naturally fits into the hypergraph cut
problem, and it represents a process that analyzes the input formula structure.
Methods from VLSI design have been adapted to this problem effectively. To be
useful, the cut must achieve some degree of balance in the resulting connected com-
ponents and must be small in some sense. Except for the hyperedges that occur in
multiple subformulas, the structural analysis of the input formula results in subfor-
mulas that are independent of each other.
CNF formulas have been studied as hypergraphs before [GU89, GLP93]. The
normal approach is to define each clause as a hyperedge connecting all the
variables, or perhaps the literals, that occur in the clause. From this viewpoint the
hypergraph cut problem consists of finding a favorable set of ‘‘cut’’ clauses, such
that, if these clauses are removed from the formula, the remaining variables (the
vertices of the hypergraph) fall into two or more groups (connected components)
that are not related by any remaining clause. This natural method has not proven
successful on large formulas, for reasons discussed in Section 2.
The approach introduced here considers the dual of the above hypergraph, which is
also a hypergraph. In this new viewpoint, each variable is defined as a hyperedge con-
necting all the clauses in which it occurs. Each clause is a vertex now. In this context the
hypergraph cut problem consists of finding a favorable set of cut variables, such that, if
these variables are removed from the formula, the remaining clauses fall into two or
more groups (connected components) that are not related by any remaining variable.
2. CNF FORMULA PARTITIONING
The reason to prefer the dual view of a hypergraph over the normal approach lies
in the eventual application. At a high level, the partition is used as follows: For
each partial assignment required by the cut set, apply the assignment to the induced
subformulas F1 and F2 , making them independent. Now try to find models of F1
and F2 independently. A model in this context is a partial truth assignment that
satisfies the formula. If this process ever succeeds, a model for the entire formula
has been found. However, to demonstrate unsatisfiability it is necessary to show
that the process fails for all required partial assignments.
The difference between the two hypergraph views lies in what partial assignments are
required. For the usual view, the cut set is a set of clauses, and all partial assignments that
180 PARK AND VAN GELDER
satisfy this set of clauses are required. The number of variables in this cut set can be
significantly larger than the number of clauses, and the number of satisfying partial
assignments can be exponential in the number of variables involved. The number of
required partial assignments is not directly related to the cardinality of the cut set.
For the new view, the cut set is a set of variables. The required partial
assignments are all partial assignments to these variables that satisfy the clauses (if
any) that consist entirely of variables (positive or negative) in the cut set. While this
number is exponential also, it is directly related to the cardinality of the cut set, so
an algorithm to find small cut sets is more likely to achieve a useful partition.
As further motivation for the new hypergraph view, consider that a formula typi-
cally has more clauses than variables. In VLSI design, there are many more gates,
which correspond to vertices, than wires, which correspond to hyperedges. Thus we
expected that partitioning algorithms from that domain would transfer more
effectively for the new hypergraph view.
Given an input formula F, all the variables in F are grouped into the following
three classes: Vc , V1 , and V2 . The resulting classification of variables must guaran-
tee that there exists no clause that contains both V1 and V2 variables.
Example 1. The input formula is
F=[(v1 , v2)
C1
, (v1 , v4)
C2
, (&v1 , v2 , v4)
C3
, (&v1 , v3)
C4
, (v1 , &v3)
C5
].
The derived dual hypergraph from F is shown in Fig. 1. One possible partition of
F is
F1 =[(v1 , v2)
C1
, (v1 , v4)
C2
, (&v1 , v2 , v4)
C3
]
F2=[(&v1 , v3)
C4
, (v1 , &v5)
C5
].
The resulting status of the variables are V1=[v2 , v4], Vc=[v1], and V2=[v3].
FIG. 1. The pictorial representation of the derived hypergraph.
181PARTITIONING METHODS FOR SATISFIABILITY
The partition of F into F1 and F2 can be viewed as a hypergraph cut problem,
and it has been studied extensively in VLSIPCB CAD. Among the many available
hypergraph partitioning algorithms (see [PVG96] for detail), we implemented the
hypergraph min-cut algorithm by Fiduccia and Mattheyses [FM82].
3. THE TWO PARTITIONING HEURISTICS
The motivation behind the two partitioning heuristics is presented in Section 3.1,
and we briefly discuss how the two heuristics explore the search space imposed by
the cut set in Section 3.2. For further details, see [PVG96].
3.1. Motivation
Assume that a SAT tester S typically determines satisfiability of a formula F after
T(N) running time, where F has N variables. Assume that T(N) is given by
T(N)=A2:N
for some constants A and :. For convenience, the time unit is chosen to make
A=1. The : value indicates hardness of the formula class.
Both heuristics require an input formula F to be partitioned into F1 and F2 . Let
N1 and N2 be the number of variables in F1 and F2 , respectively, and NC be the
size of the cut set. Then, the expected running time of the two heuristics with S is
following:
T $(N)=2NC(2:N1+2:N2).
Assuming that the size of the two subformulas is balanced, T $(N) approaches
approximately 2NC - T(N), which is much smaller than T(N) when NC is small.
3.2. Exploring the Cut-Set Search Space
For each assignment to the cut variables, F1 and F2 simplify into formulas that
have no variables in common. They can then be tested independently. F is
satisfiable if and only if there is some compatible assignment to the cut variables
that makes the resulting simplifications of F1 and F2 satisfiable.
The first heuristic exhaustively generates assignments to the cut variables until a
compatible assignment is found or no more assignments can be generated. However,
the second heuristicthe main innovation presentedbegins by trying to satisfy
one of the partitioned formulas while delaying the bindings to cut variables. When
the formula can be satisfied with just a few cut variables bound, there is a potential
to greatly reduce the search space for a compatible assignment. When the subfor-
mula F1 is satisfiable, the model of F1 may not have bindings to all cut variables.
Then the don’t-care variables (unassigned cut variables) can have any binding when
searching for a model of F2 . No matter what truth assignments are made to the
don’t-care variables in F2 , those assignments cannot be conflict variables between
182 PARK AND VAN GELDER
FIG. 2. Test formulas and the resulting partitions.
F1 and F2 since there are no truth assignments made to don’t-care variables in F1 .
Thus, only the bound cut variables in the model of F1 are forced on F2 as a set of
unit clauses constraints.
4. EXPERIMENTAL RESULTS
The result of partitioning test formulas is shown in Fig. 2 (see [PVG96] for more
detail). All of the test formulas are unsatisfiable instances. A comparison of the
running time between 2cl (a model-searching method [VGT96]) and MSAT (our
partitioning method, built over 2cl, and using the second heuristic in Section 3.2)
is shown in Fig. 3. In general, MSAT resulted in significant speed gain. For exam-
ple, 2cl spent about 200 CPU hours to determine the satisfiability of c5315-3, but
for MSAT it took less than 5 CPU minutes. The extreme increase of efficiency for
some formulas was possible because the partitioning step extracts the structural
information of the input formulas, and MSAT avoids forcing unnecessary combina-
tions of truth assignments on the cut set.
5. CONCLUSION
We have introduced two heuristics that are based on partitioning an input for-
mula. These two heuristics are control programs that can incorporate an existing
SAT tester as a subroutine. For some of the circuit formulas, the two heuristics
showed a significant gain of efficiency with little (or no) modification of the existing
SAT tester. This supports our intuition that dealing with subformulas can be within
the reach of existing SAT testers although the original formula may not be.
FIG. 3. Comparison of the run time between 2cl and MSAT programs. The > symbol under the
2cl runtime column denotes that the program did not finish and was cancelled after this amount of
time. Times are CPU seconds on a Sun SPARCsystem 1041.
183PARTITIONING METHODS FOR SATISFIABILITY
From the result shown in Fig. 3, we observe that the partitioned subformulas of
pret150-75 are still hard. Since the size of the cut set is only 5, we predict that
MSAT can perform better if we decompose the subformulas further. The further
decomposition of these subformulas is feasible because of the small size of the cut
set, and its efficiency is a future research issue.
ACKNOWLEDGMENTS
Both of the authors were supported in part by NSF Grant CCR-9503830. We thank Tracy Larrabee
and her research group for providing test formulas and Yumi Tsuji for helping us in the process of
modifying her 2cl program.
Received July 27, 1998; published online September 6, 2000
REFERENCES
[DLL62] Davis, M., Logemann, G., and Loveland, D. (1962), A machine program for theorem-
proving, Commun. Assoc. Comput. Mach. 5, 394397.
[DP60] Davis, M., and Putnam, H. (1960), A computing procedure for quantification theory,
J. Assoc. Comput. Mach. 7, 201215.
[FM82] Fiduccia, C., and Mattheyses, R. (1982), A linear-time heuristic for improving network
partition, in ‘‘ACM IEEE 19th Design Automation Conference Proceedings,’’ pp. 175181.
[GLP93] Gallo, G., Longo, G., and Pallottino, S. (1993), Directed hypergraph and applications,
Discrete Appl. Math. 42, 177201.
[GU89] Gallo, G., and Urbani, G. (1989), Algorithms for testing the satisfiability of propositional
formulae, J. Logic Progr. 7, 4561.
[PVG96] Park, T. J., and Van Gelder, A. (1996), Partitioning methods for satisfiability testing on large
formulas, in ‘‘Proceedings 13th International Conference on Automated Deduction, New
Brunswick,’’ Lecture Notes in Artificial Intelligence, Vol. 1104, pp. 748762, Springer-Verlag,
Berlin.
[VGT96] Van Gelder, A., and Tsuji, Y. K. (1996), Satisfiability testing with more reasoning and less
guessing, in ‘‘Cliques, Coloring, and Satisfiability: Second DIMACS Implementation and
Challenge’’ (D. S. Johnson and M. Trick, Eds.), DIMACS Series in Discrete Mathematics
and Theoretical Computer Science, Amer. Math. Soc., New York.
184 PARK AND VAN GELDER
