127 research outputs found

    HERMIT: Mechanized Reasoning during Compilation in the Glasgow Haskell Compiler

    Get PDF
    It is difficult to write programs which are both correct and fast. A promising approach, functional programming, is based on the idea of using pure, mathematical functions to construct programs. With effort, it is possible to establish a connection between a specification written in a functional language, which has been proven correct, and a fast implementation, via program transformation. When practiced in the functional programming community, this style of reasoning is still typically performed by hand, by either modifying the source code or using pen-and-paper. Unfortunately, performing such semi-formal reasoning by directly modifying the source code often obfuscates the program, and pen-and-paper reasoning becomes outdated as the program changes over time. Even so, this semi-formal reasoning prevails because formal reasoning is time-consuming, and requires considerable expertise. Formal reasoning tools often only work for a subset of the target language, or require programs to be implemented in a custom language for reasoning. This dissertation investigates a solution, called HERMIT, which mechanizes reasoning during compilation. HERMIT can be used to prove properties about programs written in the Haskell functional programming language, or transform them to improve their performance. Reasoning in HERMIT proceeds in a style familiar to practitioners of pen-and-paper reasoning, and mechanization allows these techniques to be applied to real-world programs with greater confidence. HERMIT can also re-check recorded reasoning steps on subsequent compilations, enforcing a connection with the program as the program is developed. HERMIT is the first system capable of directly reasoning about the full Haskell language. The design and implementation of HERMIT, motivated both by typical reasoning tasks and HERMIT's place in the Haskell ecosystem, is presented in detail. Three case studies investigate HERMIT's capability to reason in practice. These case studies demonstrate that semi-formal reasoning with HERMIT lowers the barrier to writing programs which are both correct and fast

    Doctor of Philosophy

    Get PDF
    dissertationTrusted computing base (TCB) of a computer system comprises components that must be trusted in order to support its security policy. Research communities have identified the well-known minimal TCB principle, namely, the TCB of a system should be as small as possible, so that it can be thoroughly examined and verified. This dissertation is an experiment showing how small the TCB for an isolation service is based on software fault isolation (SFI) for small multitasking embedded systems. The TCB achieved by this dissertation includes just the formal definitions of isolation properties, instruction semantics, program logic, and a proof assistant, besides hardware. There is not a compiler, an assembler, a verifier, a rewriter, or an operating system in the TCB. To the best of my knowledge, this is the smallest TCB that has ever been shown for guaranteeing nontrivial properties of real binary programs on real hardware. This is accomplished by combining SFI techniques and high-confidence formal verification. An SFI implementation inserts dynamic checks before dangerous operations, and these checks provide necessary invariants needed by the formal verification to prove theorems about the isolation properties of ARM binary programs. The high-confidence assurance of the formal verification comes from two facts. First, the verification is based on an existing realistic semantics of the ARM ISA that is independently developed by Cambridge researchers. Second, the verification is conducted in a higher-order proof assistant-the HOL theorem prover, which mechanically checks every verification step by rigorous logic. In addition, the entire verification process, including both specification generation and verification, is automatic. To support proof automation, a novel program logic has been designed, and an automatic reasoning framework for verifying shallow safety properties has been developed. The program logic integrates Hoare-style reasoning and Floyd's inductive assertion reasoning together in a small set of definitions, which overcomes shortcomings of Hoare logic and facilitates proof automation. All inference rules of the logic are proven based on the instruction semantics and the logic definitions. The framework leverages abstract interpretation to automatically find function specifications required by the program logic. The results of the abstract interpretation are used to construct the function specifications automatically, and the specifications are proven without human interaction by utilizing intermediate theorems generated during the abstract interpretation. All these work in concert to create the very small TCB

    Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation

    Get PDF
    In modernen, universellen Programmiersprachen sind Abfragen auf Speicher-basierten Kollektionen oft rechenintensiver als erforderlich. Während Datenbankenabfragen vergleichsweise einfach optimiert werden können, fällt dies bei Speicher-basierten Kollektionen oft schwer, denn universelle Programmiersprachen sind in aller Regel ausdrucksstärker als Datenbanken. Insbesondere unterstützen diese Sprachen meistens verschachtelte, rekursive Datentypen und Funktionen höherer Ordnung. Kollektionsabfragen können per Hand optimiert und inkrementalisiert werden, jedoch verringert dies häufig die Modularität und ist oft zu fehleranfällig, um realisierbar zu sein oder um Instandhaltung von entstandene Programm zu gewährleisten. Die vorliegende Doktorarbeit demonstriert, wie Abfragen auf Kollektionen systematisch und automatisch optimiert und inkrementalisiert werden können, um Programmierer von dieser Last zu befreien. Die so erzeugten Programme werden in derselben Kernsprache ausgedrückt, um weitere Standardoptimierungen zu ermöglichen. Teil I entwickelt eine Variante der Scala API für Kollektionen, die Staging verwendet um Abfragen als abstrakte Syntaxbäume zu reifizieren. Auf Basis dieser Schnittstelle werden anschließend domänenspezifische Optimierungen von Programmiersprachen und Datenbanken angewandt; unter anderem werden Abfragen umgeschrieben, um vom Programmierer ausgewählte Indizes zu benutzen. Dank dieser Indizes kann eine erhebliche Beschleunigung der Ausführungsgeschwindigkeit gezeigt werden; eine experimentelle Auswertung zeigt hierbei Beschleunigungen von durchschnittlich 12x bis zu einem Maximum von 12800x. Um Programme mit Funktionen höherer Ordnung durch Programmtransformation zu inkrementalisieren, wird in Teil II eine Erweiterung der Finite-Differenzen-Methode vorgestellt [Paige and Koenig, 1982; Blakeley et al., 1986; Gupta and Mumick, 1999] und ein erster Ansatz zur Inkrementalisierung durch Programmtransformation für Programme mit Funktionen höherer Ordnung entwickelt. Dabei werden Programme zu Ableitungen transformiert, d.h. zu Programmen die Eingangsdifferenzen in Ausgangdifferenzen umwandeln. Weiterhin werden in den Kapiteln 12–13 die Korrektheit des Inkrementalisierungsansatzes für einfach-getypten und ungetypten λ-Kalkül bewiesen und Erweiterungen zu System F besprochen. Ableitungen müssen oft Ergebnisse der ursprünglichen Programme wiederverwenden. Um eine solche Wiederverwendung zu ermöglichen, erweitert Kapitel 17 die Arbeit von Liu and Teitelbaum [1995] zu Programmen mit Funktionen höherer Ordnung und entwickeln eine Programmtransformation solcher Programme im Cache-Transfer-Stil. Für eine effiziente Inkrementalisierung ist es weiterhin notwendig, passende Grundoperationen auszuwählen und manuell zu inkrementalisieren. Diese Arbeit deckt einen Großteil der wichtigsten Grundoperationen auf Kollektionen ab. Die Durchführung von Fallstudien zeigt deutliche Laufzeitverbesserungen sowohl in Praxis als auch in der asymptotischen Komplexität.In modern programming languages, queries on in-memory collections are often more expensive than needed. While database queries can be readily optimized, it is often not trivial to use them to express collection queries which employ nested data and first-class functions, as enabled by functional programming languages. Collection queries can be optimized and incrementalized by hand, but this reduces modularity, and is often too error-prone to be feasible or to enable maintenance of resulting programs. To free programmers from such burdens, in this thesis we study how to optimize and incrementalize such collection queries. Resulting programs are expressed in the same core language, so that they can be subjected to other standard optimizations. To enable optimizing collection queries which occur inside programs, we develop a staged variant of the Scala collection API that reifies queries as ASTs. On top of this interface, we adapt domain-specific optimizations from the fields of programming languages and databases; among others, we rewrite queries to use indexes chosen by programmers. Thanks to the use of indexes we show significant speedups in our experimental evaluation, with an average of 12x and a maximum of 12800x. To incrementalize higher-order programs by program transformation, we extend finite differencing [Paige and Koenig, 1982; Blakeley et al., 1986; Gupta and Mumick, 1999] and develop the first approach to incrementalization by program transformation for higher-order programs. Base programs are transformed to derivatives, programs that transform input changes to output changes. We prove that our incrementalization approach is correct: We develop the theory underlying incrementalization for simply-typed and untyped λ-calculus, and discuss extensions to System F. Derivatives often need to reuse results produced by base programs: to enable such reuse, we extend work by Liu and Teitelbaum [1995] to higher-order programs, and develop and prove correct a program transformation, converting higher-order programs to cache-transfer-style. For efficient incrementalization, it is necessary to choose and incrementalize by hand appropriate primitive operations. We incrementalize a significant subset of collection operations and perform case studies, showing order-of-magnitude speedups both in practice and in asymptotic complexity

    Formal Verification of an SSA-based Middle-end for CompCert

    Get PDF
    CompCert is a formally verified compiler that generates compact and efficient PowerPC, ARM and x86 code for a large and realistic subset of the C language. However, CompCert foregoes using Static Single Assignment (SSA), an intermediate representation that allows for writing simpler and faster optimizers, and that is used by many compilers. In fact, it has remained an open problem to verify formally an SSA-based compiler middle-end. We report on a formally verified, SSA-based, middle-end for CompCert. Our middle-end performs conversion from CompCert intermediate form to SSA form, optimization of SSA programs, including Global Value Numbering, and transforming out of SSA to intermediate form. In addition to provide the first formally verified SSA-based middle-end, we address two problems raised by Leroy in 2009: giving a simple and intuitive formal semantics to SSA, and leveraging the global properties of SSA to reason locally about program optimizations

    Compiler verification meets cross-language linking via data abstraction

    Get PDF
    Many real programs are written in multiple different programming languages, and supporting this pattern creates challenges for formal compiler verification. We describe our Coq verification of a compiler for a high-level language, such that the compiler correctness theorem allows us to derive partial-correctness Hoare-logic theorems for programs built by linking the assembly code output by our compiler and assembly code produced by other means. Our compiler supports such tricky features as storable cross-language function pointers, without giving up the usual benefits of being able to verify different compiler phases (including, in our case, two classic optimizations) independently. The key technical innovation is a mixed operational and axiomatic semantics for the source language, with a built-in notion of abstract data types, such that compiled code interfaces with other languages only through axiomatically specified methods that mutate encapsulated private data, represented in whatever formats are most natural for those languages.National Science Foundation (U.S.) (Grant CCF-1253229)United States. Defense Advanced Research Projects Agency (Agreement FA8750-12-2-0293)United States. Dept. of Energy. Office of Science (Award DE-SC0008923

    Formalizing and Verifying a Modern Build Language

    Get PDF
    CLOUDMAKE is a software utility that automatically builds executable programs and libraries from source code—a modern MAKE utility. Its design gives rise to a number of possible optimizations, like cached builds, and the exe-cutables to be built are described using a functional programming language. This paper formally and mechanically verifies the correctness of central CLOUDMAKE algorithms. The paper defines the CLOUDMAKE language using an operational semantics, but with a twist: the central operation exec is defined axiomatically, making it pluggable so that it can be replaced by calls to compilers, linkers, and other tools. The formalization and proofs of the central CLOUDMAKE algorithms are done entirely in DAFNY, the proof engine of which is an SMT-based program verifier

    A formal approach to contract verification for high-integrity applications

    Get PDF
    Doctor of PhilosophyDepartment of Computing and Information SciencesJohn M. HatcliffHigh-integrity applications are safety- and security-critical applications developed for a variety of critical tasks. The correctness of these applications must be thoroughly tested or formally verified to ensure their reliability and robustness. The major properties to be verified for the correctness of applications include: (1) functional properties, capturing the expected behaviors of a software, (2) dataflow property, tracking data dependency and preventing secret data from leaking to the public, and (3) robustness property, the ability of a program to deal with errors during execution. This dissertation presents and explores formal verification and proof technique, a promising technique using rigorous mathematical methods, to verify critical applications from the above three aspects. Our research is carried out in the context of SPARK, a programming language designed for development of safety- and security-critical applications. First, we have formalized in the Coq proof assistant the dynamic semantics for a significant subset of the SPARK 2014 language, which includes run-time checks as an integral part of the language, as any formal methods for program specification and verification depend on the unambiguous semantics of the language. Second, we have formally defined and proved the correctness of run-time checks generation and optimization based on SPARK reference semantics, and have built the certifying tools within the mechanized proof infrastructure to certify the run-time checks inserted by the GNAT compiler frontend to guarantee the absence of run-time errors. Third, we have proposed a language-based information security policy framework and the associated enforcement algorithm, which is proved to be sound with respect to the formalized program semantics. We have shown how the policy framework can be integrated into SPARK 2014 for more advanced information security analysis

    CryptOpt: Verified Compilation with Random Program Search for Cryptographic Primitives

    Full text link
    Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations for the relatively new Intel i9 12G, of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1
    corecore