69,778 research outputs found

    CryptOpt: Verified Compilation with Random Program Search for Cryptographic Primitives

    Full text link
    Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations for the relatively new Intel i9 12G, of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1

    Compiler verification for fun and profit

    Get PDF
    International audienceFormal verification of software or hardware systems — be it by model checking, deductive verification, abstract interpretation, type checking, or any other kind of static analysis — is generally conducted over high-level programming or description languages, quite remote from the actual machine code and circuits that execute in the system. To bridge this particular gap, we all rely on compilers and other code generators to automatically produce the executable artifact. Compilers are, however, vulnerable to miscompilation: bugs in the compiler that cause incorrect code to be generated from a correct source code, possibly invalidating the guarantees so painfully obtained by source-level formal verification. Recent experimental studies show that many widely-used production-quality compilers suffer from miscompilation.The formal verification of compilers and related code generators is a radical, mathematically-grounded answer to the miscompilation issue. By applying formal verification (typically, interactive theorem proving) to the compiler itself, it is possible to guarantee that the compiler preserves the semantics of the source programs it transforms, or at least preserves the properties of interest that were formally verified over the source programs. Proving the correctness of compilers is an old idea that took a long time to scale all the way to realistic compilers. In the talk, I give an overview of CompCert C, a moderately-optimizing compiler for almost all of the ISO C 99 language that has been formally verified using the Coq proof assistant.The CompCert project is one point in a space of code generators whose verification deserves attention. For example, functional languages and object-oriented languages raise the issue of jointly verifying the compiler and the run-time system (memory management, exception handling, etc) that the generated code depends on. At the other end of the expressiveness spectrum, synchronous languages and hardware description languages also raise interesting verified generation issues, as exemplified by Pnueli’s seminal work on translation validation for Signal and Braibant and Chlipala’s recent work on verified hardware synthesis.Orthogonally, the integration of verification tools and compilers that are both verified against a shared formal semantics opens fascinating opportunities for “super-optimizations” that generate better code by exploiting the properties of the source code that were formally verified

    The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

    Full text link
    This paper presents the FormAI dataset, a large collection of 112, 000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some programs handle complicated tasks like network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name. This is accomplished by employing a formal verification method using the Efficient SMT-based Bounded Model Checker (ESBMC), which uses model checking, abstract interpretation, constraint programming, and satisfiability modulo theories to reason over safety/security properties in programs. This approach definitively detects vulnerabilities and offers a formal model known as a counterexample, thus eliminating the possibility of generating false positive reports. We have associated the identified vulnerabilities with Common Weakness Enumeration (CWE) numbers. We make the source code available for the 112, 000 programs, accompanied by a separate file containing the vulnerabilities detected in each program, making the dataset ideal for training LLMs and machine learning algorithms. Our study unveiled that according to ESBMC, 51.24% of the programs generated by GPT-3.5 contained vulnerabilities, thereby presenting considerable risks to software safety and security.Comment: https://github.com/FormAI-Datase

    Source Code Matching for Reuse of Formal Specifications

    Get PDF
    Although Software Verification technology is rapidly advancing, the process of formally specifying the intended behaviour of a program can still be difficult and time consuming as the program increases in size and complexity. In this project we focus on the source code matching module of ArĂ­s (Analogical Reasoning for reuse of Implementation & Specification) platform in which we aim to increase the number of verified programs by reducing the effort of writing specifications. Our approach promotes the advantages of code reuse and the possibility of transferring specifications between similar implementations. In order to effectively compare two source code files we represent them using Conceptual Graphs that allow us to explore the semantic content of the code while also analysing its structural properties using graph-based techniques. For comparing two conceptual graphs, we propose to use an incremental matching algorithm based on IAM (the Incremental Analogy Machine (Keane, et al., 1994)) and find the best mapping between isomorphic (exact matches) or homomorphic (non-identical) sub-graphs. We further develop analogical inferences from the acquired mapping using the CWSG (Copy With Substitution and Generation) algorithm for pattern completion and generate new specifications into our target/problem code. Finally, we present our evaluation and show that between structurally similar programs, the formal specifications can be fully transferred and successfully verified. Our overall results are very encouraging and clearly show the potential of reusing formal specifications in creating more dependable software systems

    Function extraction

    Get PDF
    AbstractLow-level imperative programming languages typically have complex operational semantics (e.g. derived from an underlying processor architecture). In this paper, we describe an automatic method for extracting recursive functions from such low-level programs. The functions are derived by formal deduction from the semantics of the programming language. For each function extracted, a proof of correspondence to the original program is automatically constructed. Subsequent program verification can then be done without referring to the details of the low-level programming language semantics at all: it suffices to prove properties of the extracted function. The technique is explained for simple while programs and also for the machine code of a widely used processor. We show how heuristics can enhance the output from the function extractor/decompiler and how the technique aids implementation of a trustworthy compiler. Our tools have been implemented in the HOL4 theorem prover

    Mechanized semantics

    Get PDF
    The goal of this lecture is to show how modern theorem provers---in this case, the Coq proof assistant---can be used to mechanize the specification of programming languages and their semantics, and to reason over individual programs and over generic program transformations, as typically found in compilers. The topics covered include: operational semantics (small-step, big-step, definitional interpreters); a simple form of denotational semantics; axiomatic semantics and Hoare logic; generation of verification conditions, with application to program proof; compilation to virtual machine code and its proof of correctness; an example of an optimizing program transformation (dead code elimination) and its proof of correctness

    Matrix Code

    Full text link
    Matrix Code gives imperative programming a mathematical semantics and heuristic power comparable in quality to functional and logic programming. A program in Matrix Code is developed incrementally from a specification in pre/post-condition form. The computations of a code matrix are characterized by powers of the matrix when it is interpreted as a transformation in a space of vectors of logical conditions. Correctness of a code matrix is expressed in terms of a fixpoint of the transformation. The abstract machine for Matrix Code is the dual-state machine, which we present as a variant of the classical finite-state machine.Comment: 39 pages, 19 figures; extensions and minor correction
    • …
    corecore