    Phase Transition and Network Structure in Realistic SAT Problems

    A fundamental question in Computer Science is understanding when a specific class of problems goes from being computationally easy to hard. Because of its generality and applications, the problem of Boolean Satisfiability (SAT) is often used as a vehicle for investigating this question. A signal result from these studies is that the hardness of SAT problems exhibits a dramatic easy-to-hard phase transition with respect to the problem constrainedness. Past studies have, however, focused mostly on SAT instances generated using uniform random distributions, where all constraints are independently generated and the problem variables are all considered of equal importance. These assumptions are unfortunately not satisfied by most real problems. Our project aims for a deeper understanding of the hardness of SAT problems that arise in practice. We study two key questions: (i) How does the easy-to-hard transition change with more realistic distributions that capture neighborhood sensitivity and rich-get-richer aspects of real problems? and (ii) Can these changes be explained in terms of the network properties (such as node centrality and small-worldness) of the clausal networks of the SAT problems? Our results, based on extensive empirical studies and network analyses, provide important structural and computational insights into realistic SAT problems. Our empirical studies show that SAT instances from realistic distributions do exhibit a phase transition, but the transition occurs sooner (at lower values of constrainedness) than for instances from the uniform random distribution. We show that this behavior can be explained in terms of their clausal network properties, such as eigenvector centrality and small-worldness (measured indirectly in terms of the clustering coefficients and average node distance).

    The Fractal Dimension of SAT Formulas

    Modern SAT solvers have made remarkable progress in solving industrial instances. Most of the techniques have been developed through intensive experimental testing. Recently, there have been some attempts to analyze the structure of these formulas in terms of complex networks, with the long-term aim of explaining the success of these SAT solving techniques, and possibly improving them. We study the fractal dimension of SAT formulas, and show that most industrial families of formulas are self-similar, with a small fractal dimension. We also show that this dimension is not affected by the addition of learnt clauses. We explore how the dimension of a formula, together with other graph properties, can be used to characterize SAT instances. Finally, we give empirical evidence that these graph properties can be used in state-of-the-art portfolios.

    Community Structure in Industrial SAT Instances

    Modern SAT solvers have made remarkable progress in solving industrial instances. Most of the techniques have been developed through an intensive experimental process. It is believed that these techniques exploit the underlying structure of industrial instances. However, few works try to characterize the main features of this structure exactly. The research community on complex networks has developed analysis techniques and algorithms for studying real-world graphs that can be used by the SAT community. Recently, there have been some attempts to analyze the structure of industrial SAT instances in terms of complex networks, with the aim of explaining the success of SAT solving techniques, and possibly improving them. In this paper, inspired by the results on complex networks, we study the community structure, or modularity, of industrial SAT instances. In a graph with clear community structure, or high modularity, we can find a partition of its nodes into communities such that most edges connect variables of the same community. In our analysis, we represent SAT instances as graphs, and we show that most application benchmarks are characterized by a high modularity. In contrast, random SAT instances are closer to the classical Erdős-Rényi random graph model, where no structure can be observed. We also analyze how this structure evolves under the execution of a CDCL SAT solver. In particular, we use the community structure to detect that new clauses learned by the solver during the search contribute to destroying the original structure of the formula. That is, learned clauses tend to contain variables of distinct communities.
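
    As a rough illustration of the modularity notion described in the abstract above, the following sketch builds a graph whose nodes are the variables of a CNF formula and computes the modularity of a given partition. It is a minimal sketch under stated assumptions, not the paper's method: the graph here is unweighted (an edge joins two variables that share at least one clause), the clause list and the partition are made-up toy data, and the function names `variable_incidence_graph` and `modularity` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def variable_incidence_graph(clauses):
    """Return the edge set of an unweighted variable graph.

    Assumption: two variables are linked if they occur together in at
    least one clause; literal signs are ignored.
    """
    edges = set()
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        edges.update(combinations(variables, 2))
    return edges


def modularity(edges, community):
    """Modularity Q = sum over communities c of (e_c / m - (d_c / 2m)^2).

    e_c is the number of edges inside community c, d_c the sum of the
    degrees of its nodes, and m the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy formula (hypothetical): two tight groups of variables joined by one clause.
clauses = [[1, 2, 3], [-1, -2, 3], [4, 5, 6], [-4, 5, -6], [3, 4]]
edges = variable_incidence_graph(clauses)
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {v: 0 for v in range(1, 7)}
print(modularity(edges, two_groups))  # 5/14, roughly 0.357
print(modularity(edges, one_group))   # 0.0
```

    Splitting the toy formula into its two variable groups gives a clearly positive modularity, while putting every variable into a single community gives zero, which is the contrast the abstract draws between structured and unstructured instances.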

    Finding community structure in networks using the eigenvectors of matrices

    We consider the problem of detecting communities or modules in networks, groups of vertices with a higher-than-average density of edges connecting them. Previous work indicates that a robust approach to this problem is the maximization of the benefit function known as "modularity" over possible divisions of a network. Here we show that this maximization process can be written in terms of the eigenspectrum of a matrix we call the modularity matrix, which plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations. This result leads us to a number of possible algorithms for detecting community structure, as well as several other results, including a spectral measure of bipartite structure in networks and a new centrality measure that identifies those vertices that occupy central positions within the communities to which they belong. The algorithms and measures proposed are illustrated with applications to a variety of real-world complex networks.
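
    The link between modularity and the eigenspectrum mentioned in the abstract above can be written out for a division into two groups. The abstract does not fix any notation, so the symbols below are assumptions: $A$ is the adjacency matrix, $k_i$ the degree of vertex $i$, $m$ the number of edges, and $s_i = \pm 1$ records the group of vertex $i$.

```latex
B_{ij} = A_{ij} - \frac{k_i k_j}{2m},
\qquad
Q = \frac{1}{4m}\, \mathbf{s}^{T} B\, \mathbf{s},
\qquad
s_i \in \{+1, -1\}.

% Expanding s in the orthonormal eigenvectors u_i of B (eigenvalues \beta_i):
\mathbf{s} = \sum_i a_i \mathbf{u}_i
\quad\Longrightarrow\quad
Q = \frac{1}{4m} \sum_i a_i^{2}\, \beta_i .
```

    Under this formulation, a natural heuristic is to put most of the weight on the largest eigenvalue by choosing each $s_i$ to match the sign of the corresponding entry of the leading eigenvector of $B$. If $B$ has no positive eigenvalue, no division of this kind improves on leaving the network undivided.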

    Scale-Free Random SAT Instances

    We focus on the random generation of SAT instances that have properties similar to real-world instances. It is known that many industrial instances, even with a great number of variables, can be solved by a clever solver in a reasonable amount of time. This is not possible, in general, with classical randomly generated instances. We provide a different generation model of SAT instances, called \emph{scale-free random SAT instances}. It is based on the use of a non-uniform probability distribution $P(i)\sim i^{-\beta}$ to select variable $i$, where $\beta$ is a parameter of the model. This results in formulas where the number of occurrences $k$ of variables follows a power-law distribution $P(k)\sim k^{-\delta}$, where $\delta = 1 + 1/\beta$. This property has been observed in most real-world SAT instances. For $\beta=0$, our model extends classical random SAT instances. We prove the existence of a SAT-UNSAT phase transition phenomenon for scale-free random 2-SAT instances with $\beta<1/2$ when the clause/variable ratio is $m/n=\frac{1-2\beta}{(1-\beta)^2}$. We also prove that scale-free random $k$-SAT instances are unsatisfiable with high probability when the number of clauses exceeds $\omega(n^{(1-\beta)k})$. The proof of this result suggests that, when $\beta>1-1/k$, the unsatisfiability of most formulas may be due to small cores of clauses. Finally, we show how this model will allow us to generate random instances similar to industrial instances, of interest for testing purposes.
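
    The generation model described in the abstract above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' generator: variables are drawn with weight $i^{-\beta}$, repeated variables within a clause are rejected and redrawn, each literal is negated with probability 1/2, and the function names `scale_free_ksat` and `two_sat_threshold` are hypothetical.

```python
import random


def scale_free_ksat(n, m, k, beta, seed=None):
    """Generate m clauses of k distinct variables over variables 1..n.

    Variable i is selected with probability proportional to i**(-beta).
    Assumptions: duplicates inside a clause are redrawn, and each literal
    is negated independently with probability 1/2.
    """
    if k > n:
        raise ValueError("k must not exceed n")
    rng = random.Random(seed)
    variables = list(range(1, n + 1))
    weights = [i ** (-beta) for i in variables]
    clauses = []
    for _ in range(m):
        chosen = set()
        while len(chosen) < k:
            chosen.add(rng.choices(variables, weights=weights)[0])
        clauses.append([v if rng.random() < 0.5 else -v
                        for v in sorted(chosen)])
    return clauses


def two_sat_threshold(beta):
    """Clause/variable ratio (1 - 2*beta) / (1 - beta)**2 from the abstract.

    The abstract states this ratio for scale-free random 2-SAT with beta < 1/2.
    """
    if not 0 <= beta < 0.5:
        raise ValueError("the stated result requires 0 <= beta < 1/2")
    return (1 - 2 * beta) / (1 - beta) ** 2


formula = scale_free_ksat(n=50, m=100, k=3, beta=0.8, seed=1)
print(formula[:3])
print(two_sat_threshold(0.0))  # 1.0
```

    With `beta=0` all weights are equal, so the sampler falls back to uniform variable selection, and the threshold formula gives a clause/variable ratio of 1 for 2-SAT.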

    Applications of Bee Colony Optimization

    Many computationally difficult problems are attacked using non-exact algorithms, such as approximation algorithms and heuristics. This thesis investigates an example of the latter, Bee Colony Optimization, on both an established optimization problem in the form of the Quadratic Assignment Problem and the FireFighting problem, which has not been studied before as an optimization problem. Bee Colony Optimization is a swarm intelligence algorithm, a paradigm that has increased in popularity in recent years, and many of these algorithms are based on natural processes. We tested the Bee Colony Optimization algorithm on the QAPLIB library of Quadratic Assignment Problem instances, which have either optimal or best known solutions readily available; this enabled us to compare the quality of solutions found by the algorithm. In addition, we implemented a couple of other well-known algorithms for the Quadratic Assignment Problem, so that we could analyse the runtime of our algorithm in comparison. We introduce the Bee Colony Optimization algorithm for the FireFighting problem. We also implement some greedy algorithms and an Ant Colony Optimization algorithm for the FireFighting problem, and compare the results obtained on some randomly generated instances. We conclude that Bee Colony Optimization finds good solutions for the Quadratic Assignment Problem; however, further investigation of speedup methods is needed to bring its performance up to that of other algorithms. In addition, Bee Colony Optimization is effective on small instances of the FireFighting problem; however, as instance size increases, the results worsen in comparison to the greedy algorithms, and more work is needed to improve the decisions made on these instances.

    CDCL SAT solver heuristics: Clause management, instance structure, and decisions

    The Boolean satisfiability problem or SAT is the problem of deciding if a Boolean formula has a satisfying assignment. It was the first problem shown to be NP-complete, and remains one of the most well-known and studied NP-complete problems. We do not expect to find a polynomial time algorithm that solves all SAT problems, as this would imply equivalence of the complexity classes P and NP, which seems unlikely. However, there are algorithms and heuristics to solve SAT problems that are often effective in practice. A SAT solver is a program that takes as input a Boolean formula and tries to find a satisfying assignment for it. The most-used algorithm in SAT solvers intended for solving real-world problems is known as Conflict Driven Clause Learning, abbreviated CDCL. Due to its broad usage, improving the performance of these solvers can have a large impact on other fields that use SAT solvers and also make SAT solving a useful tool for more applications. The practical performance of CDCL SAT solvers depends critically on a small number of key heuristic mechanisms, and work on these heuristics over the past 20 years has improved CDCL solver performance significantly. This dissertation contributes to our understanding of two of the key heuristic mechanisms, known as the decision heuristic and the clause database management scheme. There are two main foci, which are closely related. First, we focus on developing lightweight methods to use measures of instance structure in solver heuristics. The structure of instances arising from real-world problems seems to be one of the main features that makes them special, but there is little work exploiting structural properties within CDCL solvers. We introduce a new structural measure for SAT instances, called Centrality, and show that this measure can be used in both decision and clause management heuristics to improve solver performance.
    Second, we study different components of clause database management schemes in order to understand and improve them. We categorize clauses as permanent and temporary, show that the permanent set is key to solver performance, and propose modifications to the criteria for permanent clauses to improve performance. In recent years, clause database management strategies used in high-performance solvers have become complex, making their study and refinement difficult. We introduce a new clause reduction scheme, called online deletion, which is simple to implement and results in comparable performance.