13 research outputs found

    Community structure in industrial SAT instances

    Modern SAT solvers have made remarkable progress in solving industrial instances. It is believed that most of the successful techniques exploit the underlying structure of these instances. Recently, there have been some attempts to analyze the structure of industrial SAT instances in terms of complex networks, with the aim of explaining the success of SAT solving techniques, and possibly improving them. In this paper, we study the community structure, or modularity, of industrial SAT instances. In a graph with clear community structure, or high modularity, we can find a partition of its nodes into communities such that most edges connect variables of the same community. Representing SAT instances as graphs, we show that most application benchmarks are characterized by a high modularity. In contrast, random SAT instances are closer to the classical Erdős-Rényi random graph model, where no structure can be observed. We also analyze how this structure evolves during the execution of a CDCL SAT solver, and observe that new clauses learned by the solver during the search contribute to destroying the original structure of the formula. Motivated by this observation, we finally present an application that exploits the community structure to detect relevant learned clauses. Empirically, we observe that detecting these clauses improves the performance of several SAT solvers on industrial SAT formulas, especially on satisfiable instances. Peer Reviewed. Postprint (published version)
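
    As an illustration of the modularity measure described in this abstract, the following minimal sketch builds a weighted variable graph from a CNF formula and scores a given partition. It is not the authors' implementation: the function names, the DIMACS-style integer literals, and the choice to weight each clause's edges by 1 / C(|c|, 2) are assumptions made for the example, and the partition is supplied by hand rather than found by a community detection algorithm.

```python
from collections import defaultdict
from itertools import combinations


def variable_incidence_graph(clauses):
    """Return {(u, v): weight} with u < v, linking variables that share a clause.

    Assumption: each clause spreads a total weight of 1 over its variable pairs.
    """
    weights = defaultdict(float)
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        if len(variables) < 2:
            continue  # unit clauses add no edges
        pairs = len(variables) * (len(variables) - 1) / 2
        for u, v in combinations(variables, 2):
            weights[(u, v)] += 1.0 / pairs
    return dict(weights)


def modularity(weights, community):
    """Newman modularity of a partition given as {variable: community id}."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for (u, v), w in weights.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)


# Two tightly knit groups of variables joined by a single bridging clause.
example_clauses = [[1, 2, 3], [-1, 2, -3], [4, 5, 6], [-4, -5, 6], [3, 4]]
example_graph = variable_incidence_graph(example_clauses)
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {v: 0 for v in range(1, 7)}
print(modularity(example_graph, two_groups))
print(modularity(example_graph, one_group))
```

    Splitting the example into its two natural groups scores higher than placing every variable in one community, which is the sense in which a formula has "clear community structure".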

    Hide and Seek: Scaling Machine Learning for Combinatorial Optimization via the Probabilistic Method

    Applying deep learning to solve real-life instances of hard combinatorial problems has tremendous potential. Research in this direction has focused on the Boolean satisfiability (SAT) problem, both because of its theoretical centrality and practical importance. A major roadblock, though, is that training sets are restricted to random formulas several orders of magnitude smaller than formulas of practical interest, raising serious concerns about generalization. This is because labeling random formulas of increasing size rapidly becomes intractable. By exploiting the probabilistic method in a fundamental way, we remove this roadblock entirely: we show how to generate correctly labeled random formulas of any desired size, without having to solve the underlying decision problem. Moreover, the difficulty of the classification task for the formulas produced by our generator is tunable by varying a simple scalar parameter. This opens up an entirely new level of sophistication for the machine learning methods that can be brought to bear on Satisfiability. Using our generator, we train existing state-of-the-art models for the task of predicting satisfiability on formulas with 10,000 variables. We find that they do no better than random guessing. As a first indication of what can be achieved with the new generator, we present a novel classifier that performs significantly better than random guessing: 99% on the same datasets, for most difficulty levels. Crucially, unlike past approaches that learn based on syntactic features of a formula, our classifier performs its learning on a short prefix of a solver's computation, an approach that we expect to be of independent interest.

    Characterizing the Temperature of SAT Formulas

    The remarkable advances in SAT solving achieved in recent years have made it possible to use this technology in many real-world applications, such as planning, formal verification and cryptography, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness is still a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features of these application problems, to better understand the success of SAT solving techniques on them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion of the expected structure of the formula, from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the popularity–similarity random model for SAT, which was recently presented to reproduce two crucial features of application SAT benchmarks: scale-free and community structures. This model is able to control the hardness of the generated formula by introducing some randomization into the expected structure. Using our regression model, we observe that the estimated temperature of the application benchmarks used in the last SAT Competitions correlates with their hardness in most cases. Juan de la Cierva program, fellowship IJC2019-040489-I, funded by MCIN and AE

    Generating Random Instances of Weighted Model Counting: An Empirical Analysis with Varying Primal Treewidth

    HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline

    Industrial SAT formula generation is a critical yet challenging task. Existing SAT generation approaches can hardly capture the global structural properties and maintain plausible computational hardness at the same time. We first present an in-depth analysis of the limitations of previous learning methods in reproducing the computational hardness of original instances, which may stem from the inherent homogeneity in their adopted split-merge procedure. Building on the observations that industrial formulae exhibit clear community structure and that oversplit substructures hinder the semantic formation of logical structures, we propose HardSATGEN, which introduces a fine-grained control mechanism to the neural split-merge paradigm for SAT formula generation to better recover the structural and computational properties of the industrial benchmarks. Experiments, including evaluations on a private, practical corporate testbed, show the superiority of HardSATGEN, which is the only method to successfully augment formulae while both maintaining similar computational hardness and capturing the global structural properties. Compared to the best previous methods, the average performance gains reach 38.5% in structural statistics, 88.4% in computational metrics, and over 140.7% in the effectiveness of guiding solver tuning by our generated instances. Source code is available at http://github.com/Thinklab-SJTU/HardSATGEN. Comment: Published at SIGKDD 2023, see http://dl.acm.org/doi/10.1145/3580305.359983

    Scale-Free Random SAT Instances

    We focus on the random generation of SAT instances that have properties similar to real-world instances. It is known that many industrial instances, even with a great number of variables, can be solved by a clever solver in a reasonable amount of time. This is not possible, in general, with classical randomly generated instances. We provide a different generation model of SAT instances, called scale-free random SAT instances. This is based on the use of a non-uniform probability distribution $P(i) \sim i^{-\beta}$ to select variable $i$, where $\beta$ is a parameter of the model. This results in formulas where the number of occurrences $k$ of variables follows a power-law distribution $P(k) \sim k^{-\delta}$, where $\delta = 1 + 1/\beta$. This property has been observed in most real-world SAT instances. For $\beta = 0$, our model extends classical random SAT instances. We prove the existence of a SAT-UNSAT phase transition phenomenon for scale-free random 2-SAT instances with $\beta < 1/2$ when the clause/variable ratio is $m/n = \frac{1-2\beta}{(1-\beta)^2}$. We also prove that scale-free random $k$-SAT instances are unsatisfiable with a high probability when the number of clauses exceeds $\omega(n^{(1-\beta)k})$. The proof of this result suggests that, when $\beta > 1 - 1/k$, the unsatisfiability of most formulas may be due to small cores of clauses. Finally, we show how this model will allow us to generate random instances similar to industrial instances, of interest for testing purposes. This research was supported by the project PROOFS, Grant PID2019-109137GB-C21 funded by MCIN/AEI/10.13039/501100011033
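
    The variable-selection rule in this abstract can be sketched as follows. This is a hedged illustration rather than the authors' generator: the function name, the seed parameter, the requirement that a clause contain k distinct variables, and the fair coin used for literal signs are all assumptions made for the example.

```python
import random


def scale_free_ksat(n, m, k, beta, seed=0):
    """Generate m clauses of k distinct variables over variables 1..n.

    Variable i is drawn with probability proportional to i ** (-beta);
    beta = 0 gives the uniform choice of the classical random model.
    Literals are integers, negative meaning negated (DIMACS style).
    """
    if k > n:
        raise ValueError("a clause cannot hold more distinct variables than n")
    rng = random.Random(seed)
    variables = list(range(1, n + 1))
    weights = [i ** (-beta) for i in variables]
    formula = []
    for _ in range(m):
        chosen = set()
        while len(chosen) < k:  # redraw until k distinct variables are held
            chosen.add(rng.choices(variables, weights=weights)[0])
        # Assumption: each literal is negated independently with probability 1/2.
        formula.append([v if rng.random() < 0.5 else -v for v in sorted(chosen)])
    return formula


formula = scale_free_ksat(n=50, m=200, k=3, beta=0.8)
print(formula[:3])
```

    Larger values of beta concentrate occurrences on low-index variables, which is what produces the heavy-tailed occurrence counts the abstract describes.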

    Proceedings of SAT Competition 2017 : Solver and Benchmark Descriptions

    The impact of heterogeneity and geometry on the proof complexity of random satisfiability

    Satisfiability is considered the canonical NP-complete problem and is used as a starting point for hardness reductions in theory, while in practice heuristic SAT solving algorithms can solve large-scale industrial SAT instances very efficiently. This disparity between theory and practice is believed to be a result of inherent properties of industrial SAT instances that make them tractable. Two characteristic properties seem to be prevalent in the majority of real-world SAT instances: heterogeneous degree distribution and locality. To understand the impact of these two properties on SAT, we study the proof complexity of random k-SAT models that allow control of heterogeneity and locality. Our findings show that heterogeneity alone does not make SAT easy, as heterogeneous random k-SAT instances have superpolynomial resolution size. This implies intractability of these instances for modern SAT solvers. In contrast, modeling locality with underlying geometry leads to small unsatisfiable subformulas, which can be found within polynomial time.