3,448 research outputs found
Community structure in industrial SAT instances
Modern SAT solvers have experienced a remarkable progress on solving industrial instances. It is believed that most of these successful techniques exploit the underlying structure of industrial instances. Recently, there have been some attempts to analyze the structure of industrial SAT instances in terms of complex networks, with the aim of explaining the success of SAT solving techniques, and possibly improving them.
In this paper, we study the community structure, or modularity, of industrial SAT instances. In a graph with clear community structure, or high modularity, we can find a partition of its nodes into communities such that most edges connect variables of the same community. Representing SAT instances as graphs, we show that most application benchmarks are characterized by a high modularity. On the contrary, random SAT instances are closer to the classical Erdös-Rényi random graph model, where no structure can be observed. We also analyze how this structure evolves by the effects of the execution of a CDCL SAT solver, and observe that new clauses learned by the solver during the search contribute to destroy the original structure of the formula. Motivated by this observation, we finally present an application that exploits the community structure to detect relevant learned clauses, and we show that detecting these clauses results in an improvement on the performance of the SAT solver. Empirically, we observe that this improves the performance of several SAT solvers on industrial SAT formulas, especially on satisfiable instances.Peer ReviewedPostprint (published version
Community Structure in Industrial SAT Instances
Modern SAT solvers have experienced a remarkable progress on solving
industrial instances. Most of the techniques have been developed after an
intensive experimental process. It is believed that these techniques exploit
the underlying structure of industrial instances. However, there are few works
trying to exactly characterize the main features of this structure.
The research community on complex networks has developed techniques of
analysis and algorithms to study real-world graphs that can be used by the SAT
community. Recently, there have been some attempts to analyze the structure of
industrial SAT instances in terms of complex networks, with the aim of
explaining the success of SAT solving techniques, and possibly improving them.
In this paper, inspired by the results on complex networks, we study the
community structure, or modularity, of industrial SAT instances. In a graph
with clear community structure, or high modularity, we can find a partition of
its nodes into communities such that most edges connect variables of the same
community. In our analysis, we represent SAT instances as graphs, and we show
that most application benchmarks are characterized by a high modularity. On the
contrary, random SAT instances are closer to the classical Erd\"os-R\'enyi
random graph model, where no structure can be observed. We also analyze how
this structure evolves by the effects of the execution of a CDCL SAT solver. In
particular, we use the community structure to detect that new clauses learned
by the solver during the search contribute to destroy the original structure of
the formula. This is, learned clauses tend to contain variables of distinct
communities
On the Hardness of SAT with Community Structure
Recent attempts to explain the effectiveness of Boolean satisfiability (SAT)
solvers based on conflict-driven clause learning (CDCL) on large industrial
benchmarks have focused on the concept of community structure. Specifically,
industrial benchmarks have been empirically found to have good community
structure, and experiments seem to show a correlation between such structure
and the efficiency of CDCL. However, in this paper we establish hardness
results suggesting that community structure is not sufficient to explain the
success of CDCL in practice. First, we formally characterize a property shared
by a wide class of metrics capturing community structure, including
"modularity". Next, we show that the SAT instances with good community
structure according to any metric with this property are still NP-hard.
Finally, we study a class of random instances generated from the
"pseudo-industrial" community attachment model of Gir\'aldez-Cru and Levy. We
prove that, with high probability, instances from this model that have
relatively few communities but are still highly modular require exponentially
long resolution proofs and so are hard for CDCL. We also present experimental
evidence that our result continues to hold for instances with many more
communities. This indicates that actual industrial instances easily solved by
CDCL may have some other relevant structure not captured by the community
attachment model.Comment: 23 pages. Full version of a SAT 2016 pape
Beyond the structure of SAT formulas
Hoy en dÃa, muchos problemas del mundo real son codificados en instancias SAT y resueltos eficientemente por modernos SAT solvers. Estos solvers, usualmente conocidos como Conflict-Driven Clause Learning (CDCL: Aprendizaje de cláusulas guiado por conflictos) SAT solvers, incluyen una variedad de sofisticadas técnicas, como el aprendizaje de cláusulas, estructuras de datos perezosas, heurÃsticas de ramificación adaptativas basadas en los conflictos, o reinicios aleatorios, entre otros. Sin embargo, las razones de su eficiencia resolviendo problemas SAT del mundo real, o industriales, son todavÃa desconocidas. La creencia común en la comunidad SAT es que estas técnicas explotan alguna estructura oculta de los problemas del mundo real. En esta tesis, primeramente se caracteriza algunas importantes caracterÃsticas de la estructura subyacente de las instancias SAT industriales. EspecÃficamente, estas son la estructura de comunidades y la estructura auto-similar. Se observa que la mayorÃa de las fórmulas SAT industriales, vistas como grafos, tienen estas dos propiedades. Esto significa que (i) en un grafo con una estructura de comunidades clara; es decir, alta modularidad, se puede encontrar una partición de sus nodos en comunidades de tal forma que la mayorÃa de las aristas conectan nodos de la misma comunidad; y (ii) en un grafo con el patrón de auto-similitud; es decir, siendo fractal, su forma se mantiene después de re-escalados (agrupando conjuntos de nodos en uno). Se analiza también cómo estas estructuras están afectadas por los efectos de las técnicas CDCL durante la búsqueda. Usando los estudios estructurales previos, se proponen tres aplicaciones. Primero, se aborda el problema de la generación aleatoria de instancias SAT pseudo-industriales usando la noción de modularidad. Nuestro modelo genera instancias similares a las (clásicas) fórmulas SAT aleatorias cuando la modularidad es baja, pero cuando este valor es alto, nuestro modelo también es adecuado para modelar problemas pseudo-industriales realÃsticamente. Segundo, se propone un método basado en la estructura en comunidades de la instancia para detectar cláusulas aprendidas relevantes. Nuestra técnica aumenta la instancia original con un conjunto de cláusulas relevantes, y esto resulta en una mejora general de la eficiencia de varios CDCL SAT solvers. Finalmente, se analiza la clasificación de instancias SAT industrial en familias usando las caracterÃsticas estructurales previamente analizadas, y se comparan con otros clasificadores comúnmente usados en aproximaciones SAT portfolio. En resumen, esta disertación extiende nuestro conocimiento sobre la estructura de las instancias SAT, con el objetivo de explicar mejor el éxito de las técnicas CDCL, con la posibilidad de mejorarlas; y propone una serie de aplicaciones basadas en este análisis de la estructura subyacente de las fórmulas SAT.Nowadays, many real-world problems are encoded into SAT instances and efficiently solved by modern SAT solvers. These solvers, usually known as Conflict-Driven Clause Learning (CDCL) SAT solvers, include a variety of sophisticated techniques, such as clause learning, lazy data structures, conflict-based adaptive branching heuristics, or random restarts, among others. However, the reasons of their efficiency in solving real-world, or industrial, SAT instances are still unknown. The common wisdom in the SAT community is that these technique exploit some hidden structure of real-world problems. In this thesis, we characterize some important features of the underlying structure of industrial SAT instances. Namely, they are the community structure and the self-similar structure. We observe that most industrial SAT formulas, viewed as graphs, have these two properties. This means that~(i) in a graph with a clear community structure, i.e. having high modularity, we can find a partition of its nodes into communities such that most edges connect nodes of the same community; and~(ii) in a graph with a self-similar pattern, i.e. being fractal, its shape is kept after re-scalings, i.e., grouping sets of nodes into a single node. We also analyze how these structures are affected by the effects of CDCL techniques during the search. Using the previous structural studies, we propose three applications. First, we face the problem of generating pseudo-industrial random SAT instances using the notion of modularity. Our model generates instances similar to (classical) random SAT formulas when the modularity is low, but when this value is high, our model is also adequate to model realistic pseudo-industrial problems. Second, we propose a method based on the community structure of the instance to detect relevant learnt clauses. Our technique augments the original instance with this set of relevant clauses, and this results into an overall improvement of the efficiency of several state-of-the-art CDCL SAT solvers. Finally, we analyze the classification of industrial SAT instances into families using the previously analyzed structure features, and we compare them to other classifiers commonly used in portfolio SAT approaches. In summary, this \dissertation extends the understandings of the structure of SAT instances, with the aim of better explaining the success of CDCL techniques and possibly improve them, and propose a number of applications based on this analysis of the underlying structure of SAT formulas
The Fractal Dimension of SAT Formulas
Modern SAT solvers have experienced a remarkable progress on solving
industrial instances. Most of the techniques have been developed after an
intensive experimental testing process. Recently, there have been some attempts
to analyze the structure of these formulas in terms of complex networks, with
the long-term aim of explaining the success of these SAT solving techniques,
and possibly improving them.
We study the fractal dimension of SAT formulas, and show that most industrial
families of formulas are self-similar, with a small fractal dimension. We also
show that this dimension is not affected by the addition of learnt clauses. We
explore how the dimension of a formula, together with other graph properties
can be used to characterize SAT instances. Finally, we give empirical evidence
that these graph properties can be used in state-of-the-art portfolios.Comment: 20 pages, 11 Postscript figure
Exploiting Resolution-based Representations for MaxSAT Solving
Most recent MaxSAT algorithms rely on a succession of calls to a SAT solver
in order to find an optimal solution. In particular, several algorithms take
advantage of the ability of SAT solvers to identify unsatisfiable subformulas.
Usually, these MaxSAT algorithms perform better when small unsatisfiable
subformulas are found early. However, this is not the case in many problem
instances, since the whole formula is given to the SAT solver in each call. In
this paper, we propose to partition the MaxSAT formula using a resolution-based
graph representation. Partitions are then iteratively joined by using a
proximity measure extracted from the graph representation of the formula. The
algorithm ends when only one partition remains and the optimal solution is
found. Experimental results show that this new approach further enhances a
state of the art MaxSAT solver to optimally solve a larger set of industrial
problem instances
Scale-Free Random SAT Instances
We focus on the random generation of SAT instances that have properties
similar to real-world instances. It is known that many industrial instances,
even with a great number of variables, can be solved by a clever solver in a
reasonable amount of time. This is not possible, in general, with classical
randomly generated instances. We provide a different generation model of SAT
instances, called \emph{scale-free random SAT instances}. It is based on the
use of a non-uniform probability distribution to select
variable , where is a parameter of the model. This results into
formulas where the number of occurrences of variables follows a power-law
distribution where . This property
has been observed in most real-world SAT instances. For , our model
extends classical random SAT instances.
We prove the existence of a SAT-UNSAT phase transition phenomenon for
scale-free random 2-SAT instances with when the clause/variable
ratio is . We also prove that scale-free
random k-SAT instances are unsatisfiable with high probability when the number
of clauses exceeds . %This implies that the SAT/UNSAT
phase transition phenomena vanishes when , and formulas are
unsatisfiable due to a small core of clauses. The proof of this result suggests
that, when , the unsatisfiability of most formulas may be due to
small cores of clauses. Finally, we show how this model will allow us to
generate random instances similar to industrial instances, of interest for
testing purposes
Automated design of boolean satisfiability solvers employing evolutionary computation
Modern society gives rise to complex problems which sometimes lend themselves to being transformed into Boolean satisfiability (SAT) decision problems; this thesis presents an example from the program understanding domain. Current conflict-driven clause learning (CDCL) SAT solvers employ all-purpose heuristics for making decisions when finding truth assignments for arbitrary logical expressions called SAT instances. The instances derived from a particular problem class exhibit a unique underlying structure which impacts a solver\u27s effectiveness. Thus, tailoring the solver heuristics to a particular problem class can significantly enhance the solver\u27s performance; however, manual specialization is very labor intensive. Automated development may apply hyper-heuristics to search program space by utilizing problem-derived building blocks. This thesis demonstrates the potential for genetic programming (GP) powered hyper-heuristic driven automated design of algorithms to create tailored CDCL solvers, in this case through custom variable scoring and learnt clause scoring heuristics, with significantly better performance on targeted classes of SAT problem instances. As the run-time of GP is often dominated by fitness evaluation, evaluating multiple offspring in parallel typically reduces the time incurred by fitness evaluation proportional to the number of parallel processing units. The naive synchronous approach requires an entire generation to be evaluated before progressing to the next generation; as such, heterogeneity in the evaluation times will degrade the performance gain, as parallel processing units will have to idle until the longest evaluation has completed. This thesis shows empirical evidence justifying the employment of an asynchronous parallel model for GP powered hyper-heuristics applied to SAT solver space, rather than the generational synchronous alternative, for gaining speed-ups in evolution time. Additionally, this thesis explores the use of a multi-objective GP to reveal the trade-off surface between multiple CDCL attributes --Abstract, page iii
- …