Community structure in industrial SAT instances
Modern SAT solvers have made remarkable progress in solving industrial instances. It is believed that most of these successful techniques exploit the underlying structure of industrial instances. Recently, there have been some attempts to analyze the structure of industrial SAT instances in terms of complex networks, with the aim of explaining the success of SAT solving techniques and possibly improving them.
In this paper, we study the community structure, or modularity, of industrial SAT instances. In a graph with clear community structure, or high modularity, we can find a partition of its nodes into communities such that most edges connect variables of the same community. Representing SAT instances as graphs, we show that most application benchmarks are characterized by high modularity. In contrast, random SAT instances are closer to the classical Erdős–Rényi random graph model, where no structure can be observed. We also analyze how this structure evolves during the execution of a CDCL SAT solver, and observe that the new clauses learned by the solver during the search contribute to destroying the original structure of the formula. Motivated by this observation, we finally present an application that exploits the community structure to detect relevant learned clauses, and we show empirically that detecting these clauses improves the performance of several SAT solvers on industrial SAT formulas, especially on satisfiable instances.
Peer Reviewed. Postprint (published version).
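To make the notion of modularity concrete, the sketch below builds an unweighted variable incidence graph (one node per variable, an edge between any two variables that share a clause) and evaluates Newman's modularity Q for a given partition. This is a minimal illustration under assumptions: the graph encoding used in the paper may differ (for instance, in how edges are weighted), and it only scores a partition rather than searching for the best one. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def variable_incidence_graph(clauses):
    """Unweighted VIG: one node per variable, one edge per pair of variables
    that co-occur in at least one clause (literal signs are ignored)."""
    edges = set()
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        edges.update(combinations(variables, 2))
    return edges

def modularity(edges, community):
    """Newman modularity Q = sum_c [e_c / m - (d_c / 2m)^2], where e_c counts
    edges inside community c and d_c is the total degree of its nodes."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two tightly knit groups of variables joined by a single bridging clause.
clauses = [[1, -2, 3], [-1, 2, 3], [4, 5, -6], [-4, -5, 6], [3, -4]]
edges = variable_incidence_graph(clauses)
partition = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, partition)  # 5/14, about 0.357
```

Putting every variable in a single community gives Q = 0, so a clearly positive score for the two-group partition is what "high modularity" means in this toy case.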
Hide and Seek: Scaling Machine Learning for Combinatorial Optimization via the Probabilistic Method
Applying deep learning to solve real-life instances of hard combinatorial
problems has tremendous potential. Research in this direction has focused on
the Boolean satisfiability (SAT) problem, both because of its theoretical
centrality and practical importance. A major roadblock, though, is that
training sets are restricted to random formulas of size several orders of
magnitude smaller than formulas of practical interest, raising serious concerns
about generalization. This is because labeling random formulas of increasing
size rapidly becomes intractable. By exploiting the probabilistic method in a
fundamental way, we remove this roadblock entirely: we show how to generate
correctly labeled random formulas of any desired size, without having to solve
the underlying decision problem. Moreover, the difficulty of the classification
task for the formulas produced by our generator is tunable by varying a simple
scalar parameter. This opens up an entirely new level of sophistication for the
machine learning methods that can be brought to bear on Satisfiability. Using
our generator, we train existing state-of-the-art models for the task of
predicting satisfiability on formulas with 10,000 variables. We find that they
do no better than random guessing. As a first indication of what can be
achieved with the new generator, we present a novel classifier that performs
significantly better than random guessing (99%) on the same datasets, for most
difficulty levels. Crucially, unlike past approaches that learn based on
syntactic features of a formula, our classifier performs its learning on a
short prefix of a solver's computation, an approach that we expect to be of
independent interest.
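The abstract does not spell out how the generator certifies its labels. As a hedged illustration of the kind of probabilistic-method argument that can label a random formula without solving it, the sketch below uses the classical first-moment bound for uniform random k-SAT; it is not claimed to be the paper's construction. A fixed assignment satisfies a random k-clause with probability 1 - 2^-k, so the expected number of satisfying assignments is 2^n (1 - 2^-k)^m, and by Markov's inequality this also bounds the probability that the formula is satisfiable. The function names are hypothetical.

```python
import math

def log_expected_solutions(n, m, k):
    """ln E[#satisfying assignments] for uniform random k-SAT with n variables
    and m clauses (each clause: k distinct variables, independent random signs)."""
    return n * math.log(2) + m * math.log1p(-2.0 ** -k)

def first_moment_density(k):
    """Clause/variable ratio above which E[#solutions] tends to 0 as n grows."""
    return math.log(2) / -math.log1p(-2.0 ** -k)

# Markov's inequality: P(satisfiable) <= E[#solutions].
bound = math.exp(log_expected_solutions(n=100, m=600, k=3))
r3 = first_moment_density(3)  # about 5.19 for 3-SAT
```

Note that this kind of bound only labels formulas as unsatisfiable with high probability, and it says nothing below the threshold; a generator that also needs reliable satisfiable labels requires a separate argument.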
Characterizing the Temperature of SAT Formulas
The remarkable advances in SAT solving achieved in recent years have made it possible to use this technology to solve problems in many real-world applications, such as planning, formal verification and cryptography, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness is still a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features of these application problems to better understand the success of SAT solving techniques on them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion of the expected structure of the formula, ranging from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the popularity–similarity random model for SAT, which was recently presented to reproduce two crucial features of application SAT benchmarks: scale-free and community structures. This model controls the hardness of the generated formula by introducing some randomization into the expected structure. Using our regression model, we observe that the estimated temperature of the application benchmarks used in recent SAT Competitions correlates with their hardness in most cases.
Juan de la Cierva program, fellowship IJC2019-040489-I, funded by MCIN and AE
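The abstract describes temperature as a knob between fully structured and structureless formulas. One common way such a knob enters popularity–similarity style models is through a smoothed, Fermi-Dirac shaped connection rule; the sketch below assumes that form, borrowed from the hyperbolic random graph literature, and the exact rule in the SAT model may differ. The function name and the distance/radius parameters are hypothetical.

```python
import math

def connection_probability(distance, radius, temperature):
    """Assumed Fermi-Dirac link rule: p = 1 / (1 + exp((distance - radius) / T))."""
    if temperature <= 0:
        # Zero temperature: deterministic, connect exactly the pairs within radius.
        return 1.0 if distance <= radius else 0.0
    x = (distance - radius) / temperature
    # Evaluate the logistic function in a form that cannot overflow math.exp.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))
```

At low temperature the rule keeps only close pairs, preserving the geometric (community-like) structure; as the temperature grows, every probability approaches 1/2 and distance stops mattering, which mimics the loss of structure seen in classical random SAT.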
HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline
Industrial SAT formula generation is a critical yet challenging task.
Existing SAT generation approaches struggle to capture the global structural
properties and maintain plausible computational hardness at the same time. We
first present an in-depth analysis of the limitations of previous learning
methods in reproducing the computational hardness of the original instances,
which may stem from the inherent homogeneity of their split-merge procedure.
Building on the observations that industrial formulae exhibit clear community
structure and that over-split substructures hinder the semantic formation of
logical structures, we propose HardSATGEN, which introduces a fine-grained
control mechanism into the neural split-merge paradigm for SAT formula
generation to better recover the structural and computational properties of
industrial benchmarks. Experiments, including evaluations on a private and
practical corporate testbed, show that HardSATGEN is the only method that
successfully augments formulae while both maintaining similar computational
hardness and capturing the global structural properties. Compared to the best
previous methods, the average performance gains reach 38.5% in structural
statistics, 88.4% in computational metrics, and over 140.7% in the
effectiveness of guiding solver tuning with our generated instances. Source
code is available at http://github.com/Thinklab-SJTU/HardSATGEN
Comment: Published at SIGKDD 2023, see
http://dl.acm.org/doi/10.1145/3580305.359983
Scale-Free Random SAT Instances
We focus on the random generation of SAT instances that have properties similar to
real-world instances. It is known that many industrial instances, even with a large number of
variables, can be solved by a clever solver in a reasonable amount of time. This is not possible,
in general, with classical randomly generated instances. We provide a different generation model
of SAT instances, called scale-free random SAT instances. It is based on the use of a non-uniform
probability distribution $P(i) \sim i^{-\beta}$ to select variable $i$, where $\beta$ is a parameter of the model.
This results in formulas where the number of occurrences $k$ of variables follows a power-law
distribution $P(k) \sim k^{-\delta}$, where $\delta = 1 + 1/\beta$. This property has been observed in most
real-world SAT instances. Our model extends classical random SAT instances, which are recovered
for $\beta = 0$. We prove the existence of a SAT–UNSAT phase transition phenomenon for scale-free
random 2-SAT instances with $\beta < 1/2$ when the clause/variable ratio is
$m/n = \frac{1-2\beta}{(1-\beta)^2}$. We also prove that scale-free random k-SAT instances are
unsatisfiable with high probability when the number of clauses exceeds $\omega(n^{(1-\beta)k})$. The proof
of this result suggests that, when $\beta > 1 - 1/k$, the unsatisfiability of most formulas may be due to
small cores of clauses. Finally, we show how this model allows us to generate random instances similar
to industrial instances, of interest for testing purposes.
This research was supported by the project PROOFS, Grant PID2019-109137GB-C21 funded by MCIN/AEI/10.13039/501100011033
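A minimal sketch of the generation model described above, together with the two closed-form quantities the abstract states. Assumptions not taken from the abstract: each clause has k distinct variables (clauses that repeat a variable are simply resampled), literal signs are chosen uniformly, and the function names are hypothetical.

```python
import random

def scale_free_ksat(n, m, k, beta, seed=None):
    """Sample m k-clauses over variables 1..n, picking variable i with
    probability proportional to i**(-beta); beta = 0 gives uniform choice."""
    if k > n:
        raise ValueError("need k <= n distinct variables per clause")
    rng = random.Random(seed)
    variables = list(range(1, n + 1))
    weights = [i ** (-beta) for i in variables]
    clauses = []
    while len(clauses) < m:
        chosen = rng.choices(variables, weights=weights, k=k)
        if len(set(chosen)) < k:
            continue  # assumption: reject clauses that repeat a variable
        # Negate each literal independently with probability 1/2.
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses

def two_sat_threshold(beta):
    """Clause/variable ratio (1 - 2*beta) / (1 - beta)**2 of the SAT-UNSAT
    transition for scale-free random 2-SAT, as stated for beta < 1/2."""
    if not 0 <= beta < 0.5:
        raise ValueError("the result is stated for 0 <= beta < 1/2")
    return (1 - 2 * beta) / (1 - beta) ** 2

def occurrence_exponent(beta):
    """delta = 1 + 1/beta in the occurrence power law P(k) ~ k**(-delta)."""
    return 1 + 1 / beta

formula = scale_free_ksat(n=200, m=2000, k=3, beta=0.8, seed=1)
```

As a sanity check, `two_sat_threshold(0)` returns 1.0, the classical random 2-SAT threshold, consistent with the model reducing to classical random SAT at beta = 0. With beta = 0.8, low-index variables such as variable 1 appear far more often than high-index ones, which is the heavy-tailed occurrence pattern the model is designed to produce.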
The impact of heterogeneity and geometry on the proof complexity of random satisfiability
Satisfiability is considered the canonical NP-complete problem and is used as a starting point for hardness reductions in theory, while in practice heuristic SAT solving algorithms can solve large-scale industrial SAT instances very efficiently. This disparity between theory and practice is believed to be a result of inherent properties of industrial SAT instances that make them tractable. Two characteristic properties seem to be prevalent in the majority of real-world SAT instances: heterogeneous degree distribution and locality. To understand the impact of these two properties on SAT, we study the proof complexity of random k-SAT models that allow controlling heterogeneity and locality. Our findings show that heterogeneity alone does not make SAT easy, as heterogeneous random k-SAT instances have superpolynomial resolution size. This implies that these instances are intractable for modern SAT solvers. In contrast, modeling locality with an underlying geometry leads to small unsatisfiable subformulas, which can be found in polynomial time.
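To illustrate what "locality with underlying geometry" can mean for a random formula, the sketch below places variables at evenly spaced points on a circle and draws each clause from a short window of neighboring positions. This is a hypothetical toy model chosen for clarity, not the geometric model studied in the paper; the function name and the `window` parameter are assumptions.

```python
import random

def geometric_ksat(n, m, k, window, seed=None):
    """Hypothetical local random k-SAT: variables 1..n sit at evenly spaced
    points on a circle, and each clause draws k distinct variables from
    `window` consecutive positions starting at a random offset."""
    if not k <= window <= n:
        raise ValueError("need k <= window <= n")
    rng = random.Random(seed)
    clauses = []
    for _ in range(m):
        start = rng.randrange(n)
        # 0-based positions of the window, wrapping around the circle.
        block = [(start + j) % n for j in range(window)]
        chosen = rng.sample(block, k)
        # Shift to 1-based DIMACS variables and pick a random sign for each.
        clauses.append([(v + 1) if rng.random() < 0.5 else -(v + 1) for v in chosen])
    return clauses

local_formula = geometric_ksat(n=50, m=100, k=3, window=5, seed=7)
```

With `window = n` every clause is a uniformly random set of k distinct variables, so the sketch reduces to classical random k-SAT; shrinking the window concentrates clauses on small neighborhoods, which is the kind of structure that can give rise to small unsatisfiable subformulas.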