
    BinGold: Towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (SFGs)

    Binary analysis is useful in many practical applications, such as the detection of malware or vulnerable software components. However, our survey of the literature shows that most existing binary analysis tools and frameworks rely on assumptions about specific compilers and compilation settings. It is well known that techniques such as refactoring and light obfuscation can significantly alter the structure of code, even for simple programs. Applying such techniques or changing the compiler and compilation settings can significantly affect the accuracy of available binary analysis tools, which severely limits their practicality, especially when applied to malware. To address these issues, we propose a novel technique that extracts the semantics of binary code in terms of both data and control flow. Our technique allows more robust binary analysis because the extracted semantics of the binary code is largely immune to light obfuscation, refactoring, and changes of compiler or compilation settings. Specifically, we apply data-flow analysis to extract the semantic flow of the registers as well as the semantic components of the control flow graph, which are then synthesized into a novel representation called the semantic flow graph (SFG). Subsequently, various properties, such as reflexive, symmetric, antisymmetric, and transitive relations, are extracted from the SFG and applied to binary analysis. We implement our system in a tool called BinGold and evaluate it against thirty binary code applications. Our evaluation shows that BinGold successfully determines the similarity between binaries, yielding results that are highly robust against light obfuscation and refactoring. In addition, we demonstrate the application of BinGold to two important binary analysis tasks: binary code authorship attribution, and the detection of clone components across program executables. The promising results suggest that BinGold can be used to enhance existing techniques, making them more robust and practical.
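
    A minimal sketch of the data-flow idea behind the SFG, assuming a toy instruction encoding; sfg_edges, sfg_similarity, and the edge format are illustrative inventions, not BinGold's actual representation or metric:

```python
# Hypothetical toy model (not BinGold's format): an instruction is
# (mnemonic, registers written, registers read); an SFG-style edge links
# the instruction that last wrote a register to the one that reads it.

def sfg_edges(instructions):
    """Collect def-use edges (def_mnemonic, register, use_mnemonic)."""
    last_def = {}   # register -> index of the instruction that last wrote it
    edges = set()
    for i, (op, defs, uses) in enumerate(instructions):
        for reg in uses:
            if reg in last_def:
                edges.add((instructions[last_def[reg]][0], reg, op))
        for reg in defs:
            last_def[reg] = i
    return edges

def sfg_similarity(a, b):
    """Jaccard similarity of the two edge sets."""
    ea, eb = sfg_edges(a), sfg_edges(b)
    return len(ea & eb) / len(ea | eb) if ea | eb else 1.0

f = [("mov", {"eax"}, set()),
     ("add", {"eax"}, {"eax", "ebx"}),
     ("ret", set(), {"eax"})]
# The same routine after light obfuscation: a junk instruction inserted.
g = [("mov", {"eax"}, set()),
     ("add", {"eax"}, {"eax", "ebx"}),
     ("nop", set(), set()),
     ("ret", set(), {"eax"})]
print(sfg_similarity(f, g))   # 1.0 -- the register data flow is unchanged
```

    Because the comparison is over register flow rather than instruction sequences, inserted junk code leaves the score untouched, which is the robustness property the abstract describes.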

    Computational complexity of the landscape I

    We study the computational complexity of the physical problem of finding vacua of string theory which agree with data, such as the cosmological constant, and show that such problems are typically NP-hard. In particular, we prove that in the Bousso-Polchinski model, the problem is NP-complete. We discuss the issues this raises and the possibility that, even if we were to find compelling evidence that some vacuum of string theory describes our universe, we might never be able to find that vacuum explicitly. In a companion paper, we apply this point of view to the question of how early cosmology might select a vacuum.
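
    A sketch of why the search is hard, assuming the standard statement of the Bousso-Polchinski model, in which Lambda = Lambda_0 + (1/2) * sum_i (n_i * q_i)^2 over integer fluxes n_i: matching a tiny observed cosmological constant means landing a sum of quadratic terms in a narrow window, essentially a subset-sum problem, and naive enumeration is exponential in the number of flux cycles K. The toy instance below is illustrative, not from the paper:

```python
# Brute-force vacuum search in a toy Bousso-Polchinski landscape.
# The loop visits (2*nmax + 1)**K flux vectors, so the cost blows up
# exponentially in the number of cycles K -- the combinatorial core
# of the NP-completeness result.

from itertools import product

def find_vacuum(lambda0, charges, eps, nmax):
    """Exhaustively search flux vectors n in {-nmax..nmax}^K for |Lambda| < eps."""
    for n in product(range(-nmax, nmax + 1), repeat=len(charges)):
        lam = lambda0 + 0.5 * sum((ni * qi) ** 2 for ni, qi in zip(n, charges))
        if abs(lam) < eps:
            return n, lam
    return None

# Toy instance: a negative bare Lambda0 that some flux combination nearly cancels.
print(find_vacuum(lambda0=-10.0, charges=[1.1, 2.0, 3.0], eps=0.5, nmax=3))
```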

    Symbolic execution of verification languages and floating-point code

    The focus of this thesis is a program analysis technique named symbolic execution. We present three main contributions to this field. First, an investigation into comparing several state-of-the-art program analysis tools at the level of an intermediate verification language over a large set of benchmarks, and improvements to the state of the art of symbolic execution for this language. This is explored via a new tool, Symbooglix, that operates on the Boogie intermediate verification language. Second, an investigation into performing symbolic execution of floating-point programs via a standardised theory of floating-point arithmetic that is supported by several existing constraint solvers. This is investigated via two independent extensions of the KLEE symbolic execution engine to support reasoning about floating-point operations (one of them developed by the thesis author). Third, an investigation into the use of coverage-guided fuzzing as a means of solving constraints over finite data types, inspired by the difficulties associated with solving floating-point constraints. The associated prototype tool, JFS, which builds on the LibFuzzer project, can at present be applied to a wide range of SMT queries over bit-vector and floating-point variables, and shows promise on floating-point constraints.
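
    A minimal example of the kind of query the standardised SMT floating-point theory makes expressible, here using the z3 solver's Python bindings (an illustration of the theory, not code from Symbooglix, the KLEE extensions, or JFS):

```python
# Ask for an IEEE-754 single-precision x with x + 1.0 == x. Over the
# reals this is unsatisfiable, but under round-to-nearest-even it holds
# for any finite float of magnitude >= 2**24, so the solver finds a model.

from z3 import FP, FPVal, Float32, Not, RNE, Solver, fpAdd, fpEQ, fpIsInf, sat

x = FP('x', Float32())
s = Solver()
s.add(fpEQ(fpAdd(RNE(), x, FPVal(1.0, Float32())), x))  # x + 1.0 == x
s.add(Not(fpIsInf(x)))  # exclude the trivial +/- infinity models
if s.check() == sat:
    print(s.model()[x])  # e.g. a finite float with |x| >= 2**24
```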

    Scalable Synthesis and Verification: Towards Reliable Autonomy

    We have seen the growing deployment of autonomous systems in our daily life, ranging from safety-critical self-driving cars to dialogue agents. While impactful and impressive, these systems do not often come with guarantees and are not rigorously evaluated for failure cases. This is in part due to the limited scalability of the tools available for designing correct-by-construction systems or verifying them post hoc. Another key limitation is the lack of models of the complex environments with which autonomous systems often have to interact. To overcome these bottlenecks to designing reliable autonomous systems, this thesis makes contributions along three fronts. First, we develop an approach for parallelized synthesis from linear-time temporal logic specifications corresponding to the generalized reactivity (1) fragment. We begin by identifying a special case corresponding to singleton liveness goals that allows for a decomposition of the synthesis problem, which facilitates parallelized synthesis. Based on the intuition from this special case, we propose a more general approach for parallelized synthesis that relies on identifying equicontrollable states. Second, we consider learning-based approaches to enable verification at scale for complex systems, and for autonomous systems that interact with black-box environments. For the former, we propose a new abstraction refinement procedure based on machine learning to improve the performance of nonlinear constraint solving algorithms on large-scale problems. For the latter, we present a data-driven approach based on chance-constrained optimization that allows a system to be evaluated for specification conformance without an accurate model of the environment. We demonstrate this approach on several tasks, including a lane-change scenario with real-world driving data. Lastly, we consider the problem of interpreting and verifying learning-based components such as neural networks. We introduce a new method based on Craig interpolants for computing compact symbolic abstractions of pre-images for neural networks. Our approach relies on iteratively computing approximations that provably overapproximate and underapproximate the pre-images at all layers. Further, building on existing work for training neural networks for verifiability in the classification setting, we propose extensions that generalize the approach to more general architectures and temporal specifications.
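
    The following is not the interpolant-based pre-image method from the thesis; it is a much simpler interval bound propagation sketch, included only to show what a provably sound overapproximation of a ReLU network's behaviour looks like (and in the forward, image direction rather than the pre-image direction the thesis addresses):

```python
# Sound interval overapproximation of a ReLU network: every concrete
# output for inputs in the given box is guaranteed to lie inside the
# returned interval, though the bounds may be loose.

import numpy as np

def interval_affine(lo, hi, W, b):
    """Tight interval image of x -> W @ x + b for x in [lo, hi]."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def interval_forward(lo, hi, layers):
    """Propagate an input box through a stack of affine + ReLU layers."""
    for W, b in layers:
        lo, hi = interval_affine(lo, hi, W, b)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
    return lo, hi

# Toy 2-2-1 network; the printed interval encloses all true outputs
# for inputs in the box [-1, 1] x [-1, 1].
layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -1.0])),
          (np.array([[1.0, 1.0]]), np.array([0.5]))]
print(interval_forward(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), layers))
```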