826 research outputs found

    Understanding and Enhancing CDCL-based SAT Solvers

    Get PDF
    Modern conflict-driven clause-learning (CDCL) Boolean satisfiability (SAT) solvers routinely solve formulas from industrial domains with millions of variables and clauses, despite the Boolean satisfiability problem being NP-complete and widely regarded as intractable in general. At the same time, very small crafted or randomly generated formulas are often infeasible for CDCL solvers. A commonly proposed explanation is that these solvers somehow exploit the underlying structure inherent in industrial instances. A better understanding of the structure of Boolean formulas not only enables improvements to modern SAT solvers, but also lends insight as to why solvers perform well or poorly on certain types of instances. Even further, examining solvers through the lens of these underlying structures can help to distinguish the behavior of different solving heuristics, both in theory and practice. The first issue we address relates to the representation of SAT formulas. A given Boolean satisfiability problem can be represented in arbitrarily many ways, and the type of encoding can have significant effects on SAT solver performance. Further, in some cases, a direct encoding to SAT may not be the best choice. We introduce a new system that integrates SAT solving with computer algebra systems (CAS) to address representation issues for several graph-theoretic problems. We use this system to improve the bounds on several finitely-verified conjectures related to graph-theoretic problems. We demonstrate how our approach is more appropriate for these problems than other off-the-shelf SAT-based tools. For more typical SAT formulas, a better understanding of their underlying structural properties, and how they relate to SAT solving, can deepen our understanding of SAT. We perform a largescale evaluation of many of the popular structural measures of formulas, such as community structure, treewidth, and backdoors. We investigate how these parameters correlate with CDCL solving time, and whether they can effectively be used to distinguish formulas from different domains. We demonstrate how these measures can be used as a means to understand the behavior of solvers during search. A common theme is that the solver exhibits locality during search through the lens of these underlying structures, and that the choice of solving heuristic can greatly influence this locality. We posit that this local behavior of modern SAT solvers is crucial to their performance. The remaining contributions dive deeper into two new measures of SAT formulas. We first consider a simple measure, denoted “mergeability,” which characterizes the proportion of input clauses pairs that can resolve and merge. We develop a formula generator that takes as input a seed formula, and creates a sequence of increasingly more mergeable formulas, while maintaining many of the properties of the original formula. Experiments over randomly-generated industrial-like instances suggest that mergeability strongly negatively correlates with CDCL solving time, i.e., as the mergeability of formulas increases, the solving time decreases, particularly for unsatisfiable instances. Our final contribution considers whether one of the aforementioned measures, namely backdoor size, is influenced by solver heuristics in theory. Starting from the notion of learning-sensitive (LS) backdoors, we consider various extensions of LS backdoors by incorporating different branching heuristics and restart policies. We introduce learning-sensitive with restarts (LSR) backdoors and show that, when backjumping is disallowed, LSR backdoors may be exponentially smaller than LS backdoors. We further demonstrate that the size of LSR backdoors are dependent on the learning scheme used during search. Finally, we present new algorithms to compute upper-bounds on LSR backdoors that intrinsically rely upon restarts, and can be computed with a single run of a SAT solver. We empirically demonstrate that this can often produce smaller backdoors than previous approaches to computing LS backdoors

    A Tree Locality-Sensitive Hash for Secure Software Testing

    Get PDF
    Bugs in software that make it through testing can cost tens of millions of dollars each year, and in some cases can even result in the loss of human life. In order to eliminate bugs, developers may use symbolic execution to search through possible program states looking for anomalous states. Most of the computational effort to search through these states is spent solving path constraints in order to determine the feasibility of entering each state. State merging can make this search more efficient by combining program states, allowing multiple execution paths to be analyzed at the same time. However, a merge with dissimilar path constraints dramatically increases the time necessary to solve the path constraint. Currently, there are no distance measures for path constraints, and pairwise comparison of program states is not scalable. A hashing method is presented that clusters constraints in such a way that similar constraints are placed in the same cluster without requiring pairwise comparisons between queries. When combined with other state-of-the-art state merging techniques, the hashing method allows the symbolic executor to execute more instructions per second and find more terminal execution states than the other techniques alone, without decreasing the high path coverage achieved by merging many states together

    Limits of CDCL Learning via Merge Resolution

    Full text link
    In their seminal work, Atserias et al. and independently Pipatsrisawat and Darwiche in 2009 showed that CDCL solvers can simulate resolution proofs with polynomial overhead. However, previous work does not address the tightness of the simulation, i.e., the question of how large this overhead needs to be. In this paper, we address this question by focusing on an important property of proofs generated by CDCL solvers that employ standard learning schemes, namely that the derivation of a learned clause has at least one inference where a literal appears in both premises (aka, a merge literal). Specifically, we show that proofs of this kind can simulate resolution proofs with at most a linear overhead, but there also exist formulas where such overhead is necessary or, more precisely, that there exist formulas with resolution proofs of linear length that require quadratic CDCL proofs

    A Survey of Symbolic Execution Techniques

    Get PDF
    Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience. The present survey has been accepted for publication at ACM Computing Surveys. If you are considering citing this survey, we would appreciate if you could use the following BibTeX entry: http://goo.gl/Hf5FvcComment: This is the authors pre-print copy. If you are considering citing this survey, we would appreciate if you could use the following BibTeX entry: http://goo.gl/Hf5Fv

    Recognition and Exploitation of Gate Structure in SAT Solving

    Get PDF
    In der theoretischen Informatik ist das SAT-Problem der archetypische Vertreter der Klasse der NP-vollständigen Probleme, weshalb effizientes SAT-Solving im Allgemeinen als unmöglich angesehen wird. Dennoch erzielt man in der Praxis oft erstaunliche Resultate, wo einige Anwendungen Probleme mit Millionen von Variablen erzeugen, die von neueren SAT-Solvern in angemessener Zeit gelöst werden können. Der Erfolg von SAT-Solving in der Praxis ist auf aktuelle Implementierungen des Conflict Driven Clause-Learning (CDCL) Algorithmus zurückzuführen, dessen Leistungsfähigkeit weitgehend von den verwendeten Heuristiken abhängt, welche implizit die Struktur der in der industriellen Praxis erzeugten Instanzen ausnutzen. In dieser Arbeit stellen wir einen neuen generischen Algorithmus zur effizienten Erkennung der Gate-Struktur in CNF-Encodings von SAT Instanzen vor, und außerdem drei Ansätze, in denen wir diese Struktur explizit ausnutzen. Unsere Beiträge umfassen auch die Implementierung dieser Ansätze in unserem SAT-Solver Candy und die Entwicklung eines Werkzeugs für die verteilte Verwaltung von Benchmark-Instanzen und deren Attribute, der Global Benchmark Database (GBD)

    FuncTeller: How Well Does eFPGA Hide Functionality?

    Full text link
    Hardware intellectual property (IP) piracy is an emerging threat to the global supply chain. Correspondingly, various countermeasures aim to protect hardware IPs, such as logic locking, camouflaging, and split manufacturing. However, these countermeasures cannot always guarantee IP security. A malicious attacker can access the layout/netlist of the hardware IP protected by these countermeasures and further retrieve the design. To eliminate/bypass these vulnerabilities, a recent approach redacts the design's IP to an embedded field-programmable gate array (eFPGA), disabling the attacker's access to the layout/netlist. eFPGAs can be programmed with arbitrary functionality. Without the bitstream, the attacker cannot recover the functionality of the protected IP. Consequently, state-of-the-art attacks are inapplicable to pirate the redacted hardware IP. In this paper, we challenge the assumed security of eFPGA-based redaction. We present an attack to retrieve the hardware IP with only black-box access to a programmed eFPGA. We observe the effect of modern electronic design automation (EDA) tools on practical hardware circuits and leverage the observation to guide our attack. Thus, our proposed method FuncTeller selects minterms to query, recovering the circuit function within a reasonable time. We demonstrate the effectiveness and efficiency of FuncTeller on multiple circuits, including academic benchmark circuits, Stanford MIPS processor, IBEX processor, Common Evaluation Platform GPS, and Cybersecurity Awareness Worldwide competition circuits. Our results show that FuncTeller achieves an average accuracy greater than 85% over these tested circuits retrieving the design's functionality.Comment: To be published in the proceedings of the 32st USENIX Security Symposium, 202

    Improving Model Finding for Integrated Quantitative-qualitative Spatial Reasoning With First-order Logic Ontologies

    Get PDF
    Many spatial standards are developed to harmonize the semantics and specifications of GIS data and for sophisticated reasoning. All these standards include some types of simple and complex geometric features, and some of them incorporate simple mereotopological relations. But the relations as used in these standards, only allow the extraction of qualitative information from geometric data and lack formal semantics that link geometric representations with mereotopological or other qualitative relations. This impedes integrated reasoning over qualitative data obtained from geometric sources and “native” topological information – for example as provided from textual sources where precise locations or spatial extents are unknown or unknowable. To address this issue, the first contribution in this dissertation is a first-order logical ontology that treats geometric features (e.g. polylines, polygons) and relations between them as specializations of more general types of features (e.g. any kind of 2D or 1D features) and mereotopological relations between them. Key to this endeavor is the use of a multidimensional theory of space wherein, unlike traditional logical theories of mereotopology (like RCC), spatial entities of different dimensions can co-exist and be related. However terminating or tractable reasoning with such an expressive ontology and potentially large amounts of data is a challenging AI problem. Model finding tools used to verify FOL ontologies with data usually employ a SAT solver to determine the satisfiability of the propositional instantiations (SAT problems) of the ontology. These solvers often experience scalability issues with increasing number of objects and size and complexity of the ontology, limiting its use to ontologies with small signatures and building small models with less than 20 objects. To investigate how an ontology influences the size of its SAT translation and consequently the model finder’s performance, we develop a formalization of FOL ontologies with data. We theoretically identify parameters of an ontology that significantly contribute to the dramatic growth in size of the SAT problem. The search space of the SAT problem is exponential in the signature of the ontology (the number of predicates in the axiomatization and any additional predicates from skolemization) and the number of distinct objects in the model. Axiomatizations that contain many definitions lead to large number of SAT propositional clauses. This is from the conversion of biconditionals to clausal form. We therefore postulate that optional definitions are ideal sentences that can be eliminated from an ontology to boost model finder’s performance. We then formalize optional definition elimination (ODE) as an FOL ontology preprocessing step and test the simplification on a set of spatial benchmark problems to generate smaller SAT problems (with fewer clauses and variables) without changing the satisfiability and semantic meaning of the problem. We experimentally demonstrate that the reduction in SAT problem size also leads to improved model finding with state-of-the-art model finders, with speedups of 10-99%. Altogether, this dissertation improves spatial reasoning capabilities using FOL ontologies – in terms of a formal framework for integrated qualitative-geometric reasoning, and specific ontology preprocessing steps that can be built into automated reasoners to achieve better speedups in model finding times, and scalability with moderately-sized datasets
    • …
    corecore