Understanding and Enhancing CDCL-based SAT Solvers
Modern conflict-driven clause-learning (CDCL) Boolean satisfiability (SAT) solvers routinely
solve formulas from industrial domains with millions of variables and clauses, despite the Boolean
satisfiability problem being NP-complete and widely regarded as intractable in general. At the
same time, even very small crafted or randomly generated formulas often remain intractable for CDCL
solvers. A commonly proposed explanation is that these solvers somehow exploit the underlying
structure inherent in industrial instances. A better understanding of the structure of Boolean
formulas not only enables improvements to modern SAT solvers, but also lends insight into why
solvers perform well or poorly on certain types of instances. Even further, examining solvers
through the lens of these underlying structures can help to distinguish the behavior of different
solving heuristics, both in theory and practice.
The first issue we address relates to the representation of SAT formulas. A given Boolean
satisfiability problem can be represented in arbitrarily many ways, and the type of encoding can
have significant effects on SAT solver performance. Further, in some cases, a direct encoding
to SAT may not be the best choice. We introduce a new system that integrates SAT solving
with computer algebra systems (CAS) to address representation issues for several graph-theoretic
problems. We use this system to improve the bounds on several finitely-verified conjectures
related to graph-theoretic problems. We demonstrate how our approach is more appropriate for
these problems than other off-the-shelf SAT-based tools.
For more typical SAT formulas, a better understanding of their underlying structural properties,
and how they relate to SAT solving, can deepen our understanding of SAT. We perform a large-scale
evaluation of many of the popular structural measures of formulas, such as community
structure, treewidth, and backdoors. We investigate how these parameters correlate with CDCL
solving time, and whether they can effectively be used to distinguish formulas from different
domains. We demonstrate how these measures can be used as a means to understand the behavior
of solvers during search. A common theme is that the solver exhibits locality during search
through the lens of these underlying structures, and that the choice of solving heuristic can greatly
influence this locality. We posit that this local behavior of modern SAT solvers is crucial to their
performance.
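As an illustration of how such structural measures are defined, community structure is typically computed on the variable incidence graph (VIG) of a formula. The sketch below is a minimal, invented example: real community-structure measures use modularity-based clustering on the VIG, whereas here connected components stand in for "communities".

```python
from itertools import combinations
from collections import defaultdict

def variable_incidence_graph(clauses):
    """Variable incidence graph (VIG) of a CNF: variables are nodes,
    adjacent iff they occur together in at least one clause."""
    adj = defaultdict(set)
    for clause in clauses:
        for u, v in combinations(sorted({abs(l) for l in clause}), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def components(adj, nodes):
    """Connected components -- a crude stand-in for the modularity-based
    community detection actually used on VIGs."""
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Invented 5-variable formula whose VIG splits into two obvious "communities":
# variables 1-3 interact only with each other, as do variables 4-5.
cnf = [[1, -2], [2, 3], [-3, 1], [4, 5], [-4, -5]]
vig = variable_incidence_graph(cnf)
```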
The remaining contributions dive deeper into two new measures of SAT formulas. We first
consider a simple measure, denoted “mergeability,” which characterizes the proportion of input
clause pairs that can resolve and merge. We develop a formula generator that takes as input a seed
formula and creates a sequence of increasingly mergeable formulas, while maintaining many
of the properties of the original formula. Experiments over randomly-generated industrial-like
instances suggest that mergeability strongly negatively correlates with CDCL solving time, i.e., as
the mergeability of formulas increases, the solving time decreases, particularly for unsatisfiable
instances.
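Under one common reading of these definitions (a clause pair resolves when exactly one variable occurs with opposite polarity in the two clauses, and merges when the pair additionally shares an identical literal), mergeability can be sketched as the fraction of resolvable pairs that also merge. Both the example formula and the normalization below are illustrative assumptions, not necessarily the thesis's exact definition:

```python
from itertools import combinations

def resolvable(c1, c2):
    """A pair resolves iff exactly one variable occurs with opposite
    polarity across the two clauses."""
    clashes = {abs(l) for l in c1 if -l in c2}
    return len(clashes) == 1

def merges(c1, c2):
    """A resolvable pair merges iff it also shares an identical literal."""
    return bool(set(c1) & set(c2))

def mergeability(clauses):
    """Fraction of resolvable clause pairs that also merge (one plausible
    normalization; the thesis may normalize differently)."""
    res = mer = 0
    for c1, c2 in combinations(clauses, 2):
        if resolvable(c1, c2):
            res += 1
            if merges(c1, c2):
                mer += 1
    return mer / res if res else 0.0

# Invented 4-clause formula: all 6 pairs resolve, 3 of them also merge.
cnf = [[1, 2, 3], [-1, 2, 4], [-2, 3], [1, -3]]
```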
Our final contribution considers whether one of the aforementioned measures, namely backdoor
size, is influenced by solver heuristics in theory. Starting from the notion of learning-sensitive
(LS) backdoors, we consider various extensions of LS backdoors by incorporating different branching
heuristics and restart policies. We introduce learning-sensitive with restarts (LSR) backdoors
and show that, when backjumping is disallowed, LSR backdoors may be exponentially smaller
than LS backdoors. We further demonstrate that the size of LSR backdoors is dependent on the
learning scheme used during search. Finally, we present new algorithms to compute upper bounds
on LSR backdoors that intrinsically rely upon restarts and can be computed with a single run of
a SAT solver. We empirically demonstrate that this can often produce smaller backdoors than
previous approaches to computing LS backdoors.
A Tree Locality-Sensitive Hash for Secure Software Testing
Bugs in software that make it through testing can cost tens of millions of dollars each year, and in some cases can even result in the loss of human life. In order to eliminate bugs, developers may use symbolic execution to search through possible program states looking for anomalous states. Most of the computational effort to search through these states is spent solving path constraints in order to determine the feasibility of entering each state. State merging can make this search more efficient by combining program states, allowing multiple execution paths to be analyzed at the same time. However, a merge with dissimilar path constraints dramatically increases the time necessary to solve the path constraint. Currently, there are no distance measures for path constraints, and pairwise comparison of program states is not scalable. A hashing method is presented that clusters constraints in such a way that similar constraints are placed in the same cluster without requiring pairwise comparisons between queries. When combined with other state-of-the-art state merging techniques, the hashing method allows the symbolic executor to execute more instructions per second and find more terminal execution states than the other techniques alone, without decreasing the high path coverage achieved by merging many states together.
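One way such a clustering hash could work is sketched below with invented details (operator-multiset keys over constraint ASTs, rather than the tree locality-sensitive hash actually proposed): constraints whose syntax trees use the same operators land in the same bucket, so no pairwise comparison between queries is needed.

```python
from collections import Counter, defaultdict

def op_profile(expr):
    """Recursively count operators in a constraint AST given as nested
    tuples, e.g. ('and', ('<', 'x', 5), ('<', 'y', 9)). Leaves (variable
    names, constants) contribute nothing."""
    counts = Counter()
    if isinstance(expr, tuple):
        counts[expr[0]] += 1
        for child in expr[1:]:
            counts += op_profile(child)
    return counts

def bucket_key(expr):
    """Hash key: the operator multiset, ignoring variable names and
    constants -- one simple, hypothetical notion of 'similarity'."""
    return frozenset(op_profile(expr).items())

def cluster(constraints):
    """Single pass over the constraints; similar ones share a bucket."""
    buckets = defaultdict(list)
    for c in constraints:
        buckets[bucket_key(c)].append(c)
    return buckets

# Two structurally similar constraints and one dissimilar one (invented).
c1 = ('and', ('<', 'x', 5), ('<', 'y', 9))
c2 = ('and', ('<', 'a', 1), ('<', 'b', 2))
c3 = ('or', ('=', 'x', 0), ('<', 'y', 3))
```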
Limits of CDCL Learning via Merge Resolution
In their seminal work, Atserias et al. and independently Pipatsrisawat and
Darwiche in 2009 showed that CDCL solvers can simulate resolution proofs with
polynomial overhead. However, previous work does not address the tightness of
the simulation, i.e., the question of how large this overhead needs to be. In
this paper, we address this question by focusing on an important property of
proofs generated by CDCL solvers that employ standard learning schemes, namely
that the derivation of a learned clause has at least one inference where a
literal appears in both premises (aka, a merge literal). Specifically, we show
that proofs of this kind can simulate resolution proofs with at most a linear
overhead, but that such overhead is sometimes necessary; more precisely, there
exist formulas with resolution proofs of linear length that require quadratic
CDCL proofs.
A Survey of Symbolic Execution Techniques
Many security and software testing applications require checking whether
certain properties of a program hold for any possible usage scenario. For
instance, a tool for identifying software vulnerabilities may need to rule out
the existence of any backdoor to bypass a program's authentication. One
approach would be to test the program using different, possibly random inputs.
As the backdoor may only be hit for very specific program workloads, automated
exploration of the space of possible inputs is of the essence. Symbolic
execution provides an elegant solution to the problem, by systematically
exploring many possible execution paths at the same time without necessarily
requiring concrete inputs. Rather than taking on fully specified input values,
the technique abstractly represents them as symbols, resorting to constraint
solvers to construct actual instances that would cause property violations.
Symbolic execution has been incubated in dozens of tools developed over the
last four decades, leading to major practical breakthroughs in a number of
prominent software reliability applications. The goal of this survey is to
provide an overview of the main ideas, challenges, and solutions developed in
the area, distilling them for a broad audience.
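The core loop described above (fork at each branch, accumulate a path constraint, ask a solver for a concrete witness input) can be sketched for a toy branching program. The brute-force `solve` below is a stand-in for a real SMT solver, and the program encoding and the two-input assumption are invented for illustration:

```python
from itertools import product

def paths(prog, pc=()):
    """Enumerate the execution paths of a tiny branching program.
    prog is either a return value or ('if', cond, then_b, else_b);
    conditions are (op, var, const) triples. Yields (path_condition, result)."""
    if not (isinstance(prog, tuple) and prog[0] == 'if'):
        yield pc, prog
        return
    _, cond, then_b, else_b = prog
    yield from paths(then_b, pc + ((cond, True),))
    yield from paths(else_b, pc + ((cond, False),))

def holds(cond, env):
    op, var, const = cond
    val = env[var]
    return {'>': val > const, '==': val == const}[op]

def solve(pc, domain=range(8)):
    """Stand-in constraint solver: brute-force a witness assignment over
    inputs x and y. A real symbolic executor would query an SMT solver."""
    for x, y in product(domain, repeat=2):
        env = {'x': x, 'y': y}
        if all(holds(c, env) == want for c, want in pc):
            return env
    return None

# if x > 3: (if y == 5: "bug" else: "deep") else: "shallow"
prog = ('if', ('>', 'x', 3),
        ('if', ('==', 'y', 5), 'bug', 'deep'),
        'shallow')
```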
The present survey has been accepted for publication at ACM Computing
Surveys. This is the authors' pre-print copy. If you are considering citing
this survey, we would appreciate if you could use the following BibTeX entry:
http://goo.gl/Hf5Fvc
Recognition and Exploitation of Gate Structure in SAT Solving
In theoretical computer science, the SAT problem is the archetypal representative of the class of NP-complete problems, which is why efficient SAT solving is generally considered impossible.
Nevertheless, remarkable results are often achieved in practice, where some applications generate problems with millions of variables that recent SAT solvers can solve in reasonable time.
The success of SAT solving in practice is due to current implementations of the conflict-driven clause-learning (CDCL) algorithm, whose performance largely depends on the heuristics used, which implicitly exploit the structure of the instances generated in industrial practice.
In this work, we present a new generic algorithm for the efficient recognition of gate structure in CNF encodings of SAT instances, along with three approaches that exploit this structure explicitly.
Our contributions also include the implementation of these approaches in our SAT solver Candy and the development of a tool for the distributed management of benchmark instances and their attributes, the Global Benchmark Database (GBD)
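Gate recognition in CNF typically means spotting the clause patterns left behind by Tseitin-style encodings. The sketch below, with an invented formula, detects only the 2-input AND pattern o <-> (a AND b), i.e. the clause triple {-o, a}, {-o, b}, {o, -a, -b}; the actual algorithm is generic over gate types and far less naive:

```python
def and_gate_outputs(clauses):
    """Scan a CNF for the Tseitin pattern of a 2-input AND gate
    o <-> (a AND b): clauses {-o, a}, {-o, b}, {o, -a, -b}.
    Purely syntactic and deliberately simplistic."""
    clause_set = {frozenset(c) for c in clauses}
    gates = {}
    for c in clause_set:
        if len(c) != 3:
            continue
        for o in c:                      # try each literal as the output
            a, b = (-l for l in c - {o})
            if {frozenset({-o, a}), frozenset({-o, b})} <= clause_set:
                gates[o] = (a, b)
    return gates

# Invented CNF: variable 5 is defined as AND(1, 2); [3, 4] is unrelated.
cnf = [[-5, 1], [-5, 2], [5, -1, -2],
       [3, 4]]
```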
FuncTeller: How Well Does eFPGA Hide Functionality?
Hardware intellectual property (IP) piracy is an emerging threat to the
global supply chain. Correspondingly, various countermeasures aim to protect
hardware IPs, such as logic locking, camouflaging, and split manufacturing.
However, these countermeasures cannot always guarantee IP security. A malicious
attacker can access the layout/netlist of the hardware IP protected by these
countermeasures and further retrieve the design. To eliminate/bypass these
vulnerabilities, a recent approach redacts the design's IP to an embedded
field-programmable gate array (eFPGA), disabling the attacker's access to the
layout/netlist. eFPGAs can be programmed with arbitrary functionality. Without
the bitstream, the attacker cannot recover the functionality of the protected
IP. Consequently, state-of-the-art attacks are inapplicable to pirate the
redacted hardware IP. In this paper, we challenge the assumed security of
eFPGA-based redaction. We present an attack to retrieve the hardware IP with
only black-box access to a programmed eFPGA. We observe the effect of modern
electronic design automation (EDA) tools on practical hardware circuits and
leverage the observation to guide our attack. Thus, our proposed method
FuncTeller selects minterms to query, recovering the circuit function within a
reasonable time. We demonstrate the effectiveness and efficiency of FuncTeller
on multiple circuits, including academic benchmark circuits, Stanford MIPS
processor, IBEX processor, Common Evaluation Platform GPS, and Cybersecurity
Awareness Worldwide competition circuits. Our results show that FuncTeller
achieves an average accuracy greater than 85% over these tested circuits in
retrieving the design's functionality.
Comment: To be published in the proceedings of the 32nd USENIX Security
Symposium, 202
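The black-box setting can be illustrated with a toy oracle. The hidden function, the minterm selection, and the accuracy metric below are all invented stand-ins for FuncTeller's EDA-guided minterm-selection strategy; the point is only the query-and-reconstruct workflow:

```python
from itertools import product

def oracle(bits):
    """Stand-in for the programmed eFPGA: a hidden 3-input function
    (here, majority) the attacker can only query as a black box."""
    return sum(bits) >= 2

def query_minterms(oracle, n_inputs, minterms):
    """Query the black box on a chosen set of minterms and default the
    rest to 0, building an approximate truth table of the hidden IP."""
    table = {}
    for m in product((0, 1), repeat=n_inputs):
        table[m] = oracle(m) if m in minterms else 0
    return table

def accuracy(table, oracle):
    """Fraction of input points on which the recovered table agrees
    with the hidden function."""
    return sum(table[m] == oracle(m) for m in table) / len(table)

# Hypothetically well-chosen minterms: exactly the ON-set of the oracle.
targeted = {(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)}
recovered = query_minterms(oracle, 3, targeted)
```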
Improving Model Finding for Integrated Quantitative-qualitative Spatial Reasoning With First-order Logic Ontologies
Many spatial standards are developed to harmonize the semantics and specifications of GIS data and for sophisticated reasoning. All these standards include some types of simple and complex geometric features, and some of them incorporate simple mereotopological relations. But the relations, as used in these standards, only allow the extraction of qualitative information from geometric data and lack formal semantics that link geometric representations with mereotopological or other qualitative relations. This impedes integrated reasoning over qualitative data obtained from geometric sources and “native” topological information – for example as provided from textual sources where precise locations or spatial extents are unknown or unknowable. To address this issue, the first contribution in this dissertation is a first-order logical ontology that treats geometric features (e.g. polylines, polygons) and relations between them as specializations of more general types of features (e.g. any kind of 2D or 1D features) and mereotopological relations between them. Key to this endeavor is the use of a multidimensional theory of space wherein, unlike traditional logical theories of mereotopology (like RCC), spatial entities of different dimensions can co-exist and be related. However, terminating or tractable reasoning with such an expressive ontology and potentially large amounts of data is a challenging AI problem. Model finding tools used to verify FOL ontologies with data usually employ a SAT solver to determine the satisfiability of the propositional instantiations (SAT problems) of the ontology. These solvers often experience scalability issues with an increasing number of objects and the size and complexity of the ontology, limiting their use to ontologies with small signatures and to building small models with fewer than 20 objects.
To investigate how an ontology influences the size of its SAT translation and consequently the model finder’s performance, we develop a formalization of FOL ontologies with data. We theoretically identify parameters of an ontology that significantly contribute to the dramatic growth in size of the SAT problem. The search space of the SAT problem is exponential in the signature of the ontology (the number of predicates in the axiomatization and any additional predicates from skolemization) and the number of distinct objects in the model. Axiomatizations that contain many definitions lead to a large number of propositional SAT clauses, owing to the conversion of biconditionals to clausal form. We therefore postulate that optional definitions are ideal sentences that can be eliminated from an ontology to boost the model finder’s performance. We then formalize optional definition elimination (ODE) as an FOL ontology preprocessing step and test the simplification on a set of spatial benchmark problems to generate smaller SAT problems (with fewer clauses and variables) without changing the satisfiability and semantic meaning of the problem. We experimentally demonstrate that the reduction in SAT problem size also leads to improved model finding with state-of-the-art model finders, with speedups of 10-99%. Altogether, this dissertation improves spatial reasoning capabilities using FOL ontologies – in terms of a formal framework for integrated qualitative-geometric reasoning, and specific ontology preprocessing steps that can be built into automated reasoners to achieve better speedups in model finding times, and scalability with moderately-sized datasets.
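The effect of eliminating an optional definition can be illustrated on a toy propositional example. The clauses below are invented; the point is only that the definitional clauses and the fresh predicate disappear after substitution, shrinking the SAT problem while preserving satisfiability:

```python
from itertools import product

def satisfiable(clauses, n_vars):
    """Brute-force satisfiability of a CNF over variables 1..n_vars,
    clauses given as lists of signed integers (DIMACS style)."""
    for bits in product((False, True), repeat=n_vars):
        env = dict(enumerate(bits, start=1))
        if all(any(env[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

# A definition p <-> (a AND b), with p = 3, a = 1, b = 2, costs three
# CNF clauses plus the fresh variable p.
with_def = [[-3, 1], [-3, 2], [3, -1, -2],   # the definition itself
            [3],                              # an axiom that uses p
            [-1, 2]]                          # an unrelated axiom
# Eliminating the optional definition substitutes (a AND b) for p:
without_def = [[1], [2],                      # p replaced by a AND b
               [-1, 2]]
```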
The 1st Verified Software Competition, Extended Experience Report
We, the organizers and participants, report our experiences
from the 1st Verified Software Competition, held in August 2010 in Edinburgh
at the VSTTE 2010 conference.