139 research outputs found
Development of a Translator from LLVM to ACL2
In our current work a library of formally verified software components is to
be created, and assembled, using the Low-Level Virtual Machine (LLVM)
intermediate form, into subsystems whose top-level assurance relies on the
assurance of the individual components. We have thus undertaken a project to
build a translator from LLVM to the applicative subset of Common Lisp accepted
by the ACL2 theorem prover. Our translator produces executable ACL2 formal
models, allowing us to both prove theorems about the translated models as well
as validate those models by testing. The resulting models can be translated and
certified without user intervention, even for code with loops, thanks to the
use of the def::ung macro which allows us to defer the question of termination.
Initial measurements of concrete execution for translated LLVM functions
indicate that performance is nearly 2.4 million LLVM instructions per second on
a typical laptop computer. In this paper we overview the translation process
and illustrate the translator's capabilities by way of a concrete example,
including both a functional correctness theorem as well as a validation test
for that example.Comment: In Proceedings ACL2 2014, arXiv:1406.123
Formalizing the SSA-based Compiler for Verified Advanced Program Transformations
Compilers are not always correct due to the complexity of language semantics and transformation algorithms, the trade-offs between compilation speed and verifiability,etc.The bugs of compilers can undermine the source-level verification efforts (such as type systems, static analysis, and formal proofs) and produce target programs with different meaning from source programs. Researchers have used mechanized proof tools to implement verified compilers that are guaranteed to preserve program semantics and proved to be more robust than ad-hoc non-verified compilers.
The goal of the dissertation is to make a step towards verifying an industrial strength modern compiler--LLVM, which has a typed, SSA-based, and general-purpose intermediate representation, therefore allowing more advanced program transformations than existing approaches. The dissertation formally defines the sequential semantics of the LLVM intermediate representation with its type system, SSA properties, memory model, and operational semantics. To design and reason about program transformations in the LLVM IR, we provide tools for interacting with the LLVM infrastructure and metatheory for SSA properties, memory safety, dynamic semantics, and control-flow-graphs. Based on the tools and metatheory, the dissertation implements verified and extractable applications for LLVM that include an interpreter for the LLVM IR, a transformation for enforcing memory safety, translation validators for local optimizations, and verified SSA construction transformation.
This dissertation shows that formal models of SSA-based compiler intermediate representations can be used to verify low-level program transformations, thereby enabling the construction of high-assurance compiler passes
A Formal Semantics of the GraalVM Intermediate Representation
The optimization phase of a compiler is responsible for transforming an
intermediate representation (IR) of a program into a more efficient form.
Modern optimizers, such as that used in the GraalVM compiler, use an IR
consisting of a sophisticated graph data structure that combines data flow and
control flow into the one structure. As part of a wider project on the
verification of optimization passes of GraalVM, this paper describes a
semantics for its IR within Isabelle/HOL. The semantics consists of a big-step
operational semantics for data nodes (which are represented in a graph-based
static single assignment (SSA) form) and a small-step operational semantics for
handling control flow including heap-based reads and writes, exceptions, and
method calls. We have proved a suite of canonicalization optimizations and
conditional elimination optimizations with respect to the semantics.Comment: 16 pages, 8 figures, to be published to ATVA 202
Generating Verified LLVM from Isabelle/HOL
We present a framework to generate verified LLVM programs from Isabelle/HOL. It is based on a code generator that generates LLVM text from a simplified fragment of LLVM, shallowly embedded into Isabelle/HOL. On top, we have developed a separation logic, a verification condition generator, and an LLVM backend to the Isabelle Refinement Framework.
As case studies, we have produced verified LLVM implementations of binary search and the Knuth-Morris-Pratt string search algorithm. These are one order of magnitude faster than the Standard-ML implementations produced with the original Refinement Framework, and on par with unverified C implementations. Adoption of the original correctness proofs to the new LLVM backend was straightforward.
The trusted code base of our approach is the shallow embedding of the LLVM fragment and the code generator, which is a pretty printer combined with some straightforward compilation steps
Static analysis of energy consumption for LLVM IR programs
Energy models can be constructed by characterizing the energy consumed by
executing each instruction in a processor's instruction set. This can be used
to determine how much energy is required to execute a sequence of assembly
instructions, without the need to instrument or measure hardware.
However, statically analyzing low-level program structures is hard, and the
gap between the high-level program structure and the low-level energy models
needs to be bridged. We have developed techniques for performing a static
analysis on the intermediate compiler representations of a program.
Specifically, we target LLVM IR, a representation used by modern compilers,
including Clang. Using these techniques we can automatically infer an estimate
of the energy consumed when running a function under different platforms, using
different compilers.
One of the challenges in doing so is that of determining an energy cost of
executing LLVM IR program segments, for which we have developed two different
approaches. When this information is used in conjunction with our analysis, we
are able to infer energy formulae that characterize the energy consumption for
a particular program. This approach can be applied to any languages targeting
the LLVM toolchain, including C and XC or architectures such as ARM Cortex-M or
XMOS xCORE, with a focus towards embedded platforms. Our techniques are
validated on these platforms by comparing the static analysis results to the
physical measurements taken from the hardware. Static energy consumption
estimation enables energy-aware software development, without requiring
hardware knowledge
Lasagne : a static binary translator for weak memory model architectures
Funding: This work was supported by a UK RISE Grant.The emergence of new architectures create a recurring challenge to ensure that existing programs still work on them. Manually porting legacy code is often impractical. Static binary translation (SBT) is a process where a program’s binary is automatically translated from one architecture to another, while preserving their original semantics. However, these SBT tools have limited support to various advanced architectural features. Importantly, they are currently unable to translate concurrent binaries. The main challenge arises from the mismatches of the memory consistency model specified by the different architectures, especially when porting existing binaries to a weak memory model architecture. In this paper, we propose Lasagne, an end-to-end static binary translator with precise translation rules between x86 and Arm concurrency semantics. First, we propose a concurrency model for Lasagne’s intermediate representation (IR) and formally proved mappings between the IR and the two architectures. The memory ordering is preserved by introducing fences in the translated code. Finally, we propose optimizations focused on raising the level of abstraction of memory address calculations and reducing the number offences. Our evaluation shows that Lasagne reduces the number of fences by up to about 65%, with an average reduction of 45.5%, significantly reducing their runtime overhead.Postprin
A Linear First-Order Functional Intermediate Language for Verified Compilers
We present the linear first-order intermediate language IL for verified
compilers. IL is a functional language with calls to a nondeterministic
environment. We give IL terms a second, imperative semantic interpretation and
obtain a register transfer language. For the imperative interpretation we
establish a notion of live variables. Based on live variables, we formulate a
decidable property called coherence ensuring that the functional and the
imperative interpretation of a term coincide. We formulate a register
assignment algorithm for IL and prove its correctness. The algorithm translates
a functional IL program into an equivalent imperative IL program. Correctness
follows from the fact that the algorithm reaches a coherent program after
consistently renaming local variables. We prove that the maximal number of live
variables in the initial program bounds the number of different variables in
the final coherent program. The entire development is formalized in Coq.Comment: Addressed comments from reviewers (ITP 2015): (1) Added discussion of
a paper in related work (2) Added definition of renamed-apart in appendix (3)
Formulation changes in a coupe of place
- …