64,471 research outputs found
Translation validation for compilation verification
Modern optimizing compilers such as LLVM and GCC are huge and complex, and mature releases routinely have uncaught bugs. Beyond harm to software development, the lack of formal correctness guarantees for the compilation process seriously limits the guarantees other software systems can provide, since the compiler that generates the final executable cannot be trusted. These circumstances have motivated broad interest in compilation verification: providing a formal guarantee that a compilation of a program is correct.
Translation Validation is a commonly used compilation verification technique that aims to prove the correctness of a single instance of compilation, by considering only the specific input and output programs and treating the compiler mostly as a black box. Translation Validation techniques are well-suited to the compilation verification problem because they can be composed to validate a sequence of compilation steps, they can easily be retrofitted to existing compilers, and they can be maintained independently from the compiler itself by a separate team of formal methods experts.
The basic components of a Translation Validation system are (1) a formal notion of program equivalence, (2) a verification condition generator that generates a relation between program points and variables in the input and output programs, and (3) a proof system that accepts the verification conditions, generates a machine-checkable equivalence proof, and checks the proof for correctness.
Ideally, such a system is completely agnostic to the specifics of the transformation from the input to the output as well as independent of the input/output languages. This allows the same system to be reused across the many transformation and translation passes found in modern compilers. However, this is not true in the state of the art: most existing systems are custom-tailored for a particular sequence of transformations, and moreover, specialized for a specific, common intermediate language for the input and output programs.
The overall goal of this work is to show that it is possible to develop a (mostly) language-independent, transformation-agnostic translation validation system with support for different input/output languages for an optimizing, production-quality compiler. In this thesis, we present such a system as well as the theoretical and practical advances needed to arrive at it.
First, we present a formal framework for program equivalence checking that is transformation-agnostic and language-independent. This framework can serve as-is as the proof system for any number of Translation Validation systems targeting different transformation and/or translation phases within an existing compiler. The basis of the framework is a rigorous formalization, namely cut-bisimulation, for weak bisimulation variants that serve as a generalization of the various (sometimes ad-hoc) notions of program equivalence found in the literature. We develop a program equivalence checking algorithm that proves two programs equivalent by reducing a proposed relation between corresponding program states to a cut-bisimulation relation. We implement this algorithm in KEQ, a new tool for checking program equivalence that accepts the operational semantics of the input and output languages as parameters, and is independent of the transformation used to generate the output. This is the first program equivalence checking tool known to the authors that is language-parametric instead of containing hard-coded language semantics as is the norm in the literature.
Then, we use KEQ as the equivalence checker for two different Translation Validation systems targeting two phases of the LLVM compiler: the Instruction Selection phase and the Register Allocation phase. The two systems share the same notion of equivalence (cut-bisimulation), the same proof system (KEQ), as well as the semantic definitions for the input/output languages (LLVM IR and x86-64 based Machine IR), which are separate artifacts and not hardcoded into the logic of the systems. The only components that are transformation-specific are the two verification condition generators. The Instruction Selection generator requires minimal support from the compiler in the form of compiler-generated hints, while the Register Allocation generator employs a novel inference algorithm for register allocation and related optimizations. These systems were evaluated on the GCC SPEC 2006 benchmark, where they correctly validated 4331 / 4732 (91.52%) and 4574 / 4732 (96.67%) functions with supported features, respectively.
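The idea of checking a proposed relation between corresponding program states can be illustrated with a minimal Python sketch. It is not KEQ's algorithm: it assumes two tiny deterministic programs given as explicit successor tables, whereas KEQ is parameterized by operational semantics. It also reads cut-bisimulation in a simplified way, where related cut states must lead to related cut states regardless of how many intermediate steps each side takes. All names here (`next_cut`, `is_cut_bisimulation`, the state names) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Deterministic program: state -> next state (None or missing means "halted").
Step = Dict[str, Optional[str]]


def next_cut(state: str, step: Step, cuts: Set[str], bound: int = 1000) -> Optional[str]:
    """Follow `step` from `state` until the next cut state.

    Returns None if the program halts before reaching another cut state.
    """
    cur = step.get(state)
    for _ in range(bound):
        if cur is None or cur in cuts:
            return cur
        cur = step.get(cur)
    raise RuntimeError("no cut state reached within bound; cuts must break every cycle")


def is_cut_bisimulation(
    rel: Set[Tuple[str, str]],
    step_a: Step, cuts_a: Set[str],
    step_b: Step, cuts_b: Set[str],
) -> bool:
    """Check that every related pair of cut states steps to a related pair."""
    for p, q in rel:
        p_next = next_cut(p, step_a, cuts_a)
        q_next = next_cut(q, step_b, cuts_b)
        if p_next is None and q_next is None:
            continue  # both programs halt together
        if p_next is None or q_next is None or (p_next, q_next) not in rel:
            return False
    return True


# Hypothetical input program: two instructions between entry and exit.
source: Step = {"s_entry": "s_mul", "s_mul": "s_add", "s_add": "s_exit", "s_exit": None}
# Hypothetical output program: the two instructions were fused into one.
target: Step = {"t_entry": "t_fused", "t_fused": "t_exit", "t_exit": None}

source_cuts = {"s_entry", "s_exit"}
target_cuts = {"t_entry", "t_exit"}
relation = {("s_entry", "t_entry"), ("s_exit", "t_exit")}

print(is_cut_bisimulation(relation, source, source_cuts, target, target_cuts))
```

The intermediate states `s_mul`, `s_add`, and `t_fused` never appear in the relation, which is why the two programs may differ in the number of steps they take between synchronization points.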
Trustworthy Refactoring via Decomposition and Schemes: A Complex Case Study
Widely used complex code refactoring tools lack solid reasoning about the
correctness of the transformations they implement, whilst interest in proven
correct refactoring is ever increasing as only formal verification can provide
true confidence in applying tool-automated refactoring to industrial-scale
code. By using our strategic rewriting based refactoring specification
language, we present the decomposition of a complex transformation into smaller
steps that can be expressed as instances of refactoring schemes, then we
demonstrate the semi-automatic formal verification of the components based on a
theoretical understanding of the semantics of the programming language. The
extensible and verifiable refactoring definitions can be executed in our
interpreter built on top of a static analyser framework.
Comment: In Proceedings VPT 2017, arXiv:1708.0688
Computational reverse mathematics and foundational analysis
Reverse mathematics studies which subsystems of second order arithmetic are
equivalent to key theorems of ordinary, non-set-theoretic mathematics. The main
philosophical application of reverse mathematics proposed thus far is
foundational analysis, which explores the limits of different foundations for
mathematics in a formally precise manner. This paper gives a detailed account
of the motivations and methodology of foundational analysis, which have
heretofore been largely left implicit in the practice. It then shows how this
account can be fruitfully applied in the evaluation of major foundational
approaches by a careful examination of two case studies: a partial realization
of Hilbert's program due to Simpson [1988], and predicativism in the extended
form due to Feferman and Schütte.
Shore [2010, 2013] proposes that equivalences in reverse mathematics be
proved in the same way as inequivalences, namely by considering only
$\omega$-models of the systems in question. Shore refers to this approach as
computational reverse mathematics. This paper shows that despite some
attractive features, computational reverse mathematics is inappropriate for
foundational analysis, for two major reasons. Firstly, the computable
entailment relation employed in computational reverse mathematics does not
preserve justification for the foundational programs above. Secondly,
computable entailment is a $\Pi^1_1$ complete relation, and hence employing it
commits one to theoretical resources which outstrip those available within any
foundational approach that is proof-theoretically weaker than
$\Pi^1_1$-$\mathsf{CA}_0$.
Comment: Submitted. 41 pages
Scheduler-specific Confidentiality for Multi-Threaded Programs and Its Logic-Based Verification
Observational determinism has been proposed in the literature as a way to ensure confidentiality for multi-threaded programs. Intuitively, a program is observationally deterministic if the behavior of the public variables is deterministic, i.e., independent of the private variables and the scheduling policy. Several formal definitions of observational determinism exist, but all of them have shortcomings; for example, they accept insecure programs or they reject too many innocuous programs. Besides, the role of schedulers was ignored in all the proposed definitions. A program that is secure under one kind of scheduler might not be secure when executed with a different scheduler. The existing definitions do not ensure that an accepted program behaves securely under the scheduler that is used to deploy the program. Therefore, this paper proposes a new formalization of scheduler-specific observational determinism. It accepts programs that are secure when executed under a specific scheduler. Moreover, it is less restrictive on harmless programs under a particular scheduling policy. In addition, we discuss how compliance with our definition can be verified using model checking. We use the idea of self-composition and we rephrase the observational determinism property for a single program as a temporal logic formula over the program executed in parallel with an independent copy of itself. Thus two states reachable during the execution of the program are combined into a reachable program state of the self-composed program. This allows two program executions to be compared in a single temporal logic formula. The actual characterization is done in two steps. First we discuss how stuttering equivalence can be characterized as a temporal logic formula. Observational determinism is then expressed in terms of the stuttering equivalence characterization. This results in a conjunction of an LTL and a CTL formula, which is amenable to model checking.
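The intuition of comparing two runs of the same program up to stuttering can be sketched in a few lines of Python. This is a toy under strong assumptions and not the paper's method: programs are modeled as hypothetical deterministic functions that return a finite trace of one public variable, there is no scheduler, and the check is brute force over small input domains rather than model checking of an LTL/CTL formula. The names `destutter`, `padded`, and `leaky` are made up for illustration.

```python
from itertools import groupby, product
from typing import Callable, Iterable, List

Trace = List[int]
Program = Callable[[int, int], Trace]  # (private input, public input) -> public trace


def destutter(trace: Trace) -> Trace:
    """Collapse consecutive repetitions of the same value."""
    return [value for value, _ in groupby(trace)]


def stutter_equivalent(t1: Trace, t2: Trace) -> bool:
    """Two finite traces are stuttering equivalent if they agree once repeats are collapsed."""
    return destutter(t1) == destutter(t2)


def observationally_deterministic(prog: Program, highs: Iterable[int], lows: Iterable[int]) -> bool:
    """Pair two runs that share the public input but may differ in the private input."""
    for h1, h2, low in product(list(highs), repeat=2, r=None) if False else (
        (h1, h2, low) for h1, h2 in product(list(highs), repeat=2) for low in list(lows)
    ):
        if not stutter_equivalent(prog(h1, low), prog(h2, low)):
            return False
    return True


def padded(high: int, low: int) -> Trace:
    # The number of steps depends on `high`, but the sequence of distinct
    # public values does not, so only stuttering differs between runs.
    return [low] * (high + 1) + [low + 1]


def leaky(high: int, low: int) -> Trace:
    # The final public value depends on the private input.
    return [low, low + high]


print(observationally_deterministic(padded, range(3), range(3)))
print(observationally_deterministic(leaky, range(3), range(3)))
```

Running both copies on the same public input and comparing their traces in one place is the role the self-composed program plays in the abstract; the stuttering check is what lets `padded` pass even though its runs differ in length.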
Advanced Probabilistic Couplings for Differential Privacy
Differential privacy is a promising formal approach to data privacy, which
provides a quantitative bound on the privacy cost of an algorithm that operates
on sensitive information. Several tools have been developed for the formal
verification of differentially private algorithms, including program logics and
type systems. However, these tools do not capture fundamental techniques that
have emerged in recent years, and cannot be used for reasoning about
cutting-edge differentially private algorithms. Existing techniques fail to
handle three broad classes of algorithms: 1) algorithms where privacy depends
on accuracy guarantees, 2) algorithms that are analyzed with the advanced
composition theorem, which shows slower growth in the privacy cost, 3)
algorithms that interactively accept adaptive inputs.
We address these limitations with a new formalism extending apRHL, a
relational program logic that has been used for proving differential privacy of
non-interactive algorithms, and incorporating aHL, a (non-relational) program
logic for accuracy properties. We illustrate our approach through a single
running example, which exemplifies the three classes of algorithms and explores
new variants of the Sparse Vector technique, a well-studied algorithm from the
privacy literature. We implement our logic in EasyCrypt, and formally verify
privacy. We also introduce a novel coupling technique called optimal
subset coupling that may be of independent interest.
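For readers unfamiliar with the Sparse Vector technique mentioned above, here is a minimal Python sketch of its textbook "above threshold" form. It is an illustration only, not the variants studied in the paper and not a verified implementation: it assumes every query has sensitivity 1 and uses the commonly cited noise scales of 2/ε for the threshold and 4/ε for each query answer. The function names are hypothetical.

```python
import random
from typing import Optional, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def above_threshold(
    answers: Sequence[float],
    threshold: float,
    epsilon: float,
    rng: random.Random,
) -> Optional[int]:
    """Return the index of the first query whose noisy answer clears the noisy threshold.

    Assumes each query has sensitivity 1. Returns None if no query clears it.
    """
    noisy_threshold = threshold + laplace(2.0 / epsilon, rng)
    for index, answer in enumerate(answers):
        if answer + laplace(4.0 / epsilon, rng) >= noisy_threshold:
            return index  # halt at the first report, as in the textbook version
    return None


rng = random.Random(0)
print(above_threshold([0.0, 0.0, 1000.0, 0.0], threshold=500.0, epsilon=1.0, rng=rng))
```

The threshold is perturbed once and reused for every comparison, and the algorithm stops at the first positive report; both choices are part of the textbook privacy argument for this mechanism.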
Deciding KAT and Hoare Logic with Derivatives
Kleene algebra with tests (KAT) is an equational system for program
verification, which is the combination of Boolean algebra (BA) and Kleene
algebra (KA), the algebra of regular expressions. In particular, KAT subsumes
the propositional fragment of Hoare logic (PHL), a formal system for the
specification and verification of programs that currently underlies most tools
for checking program correctness. Both the equational theory of
KAT and the encoding of PHL in KAT are known to be decidable. In this paper we
present a new decision procedure for the equivalence of two KAT expressions
based on the notion of partial derivatives. We also introduce the notion of
derivative modulo particular sets of equations. With this we extend the
previous procedure for deciding PHL. Some experimental results are also
presented.
Comment: In Proceedings GandALF 2012, arXiv:1210.202
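To give a feel for derivative-based equivalence checking, the following Python sketch decides equivalence of plain regular expressions, i.e. only the Kleene algebra part of KAT, with no Boolean tests. It is a simplification and not the procedure from the paper: it uses Brzozowski derivatives instead of partial derivatives, and it normalizes sums as sets so that only finitely many derivatives arise. The tuple encoding and all function names are assumptions made for this example.

```python
from typing import Iterable, Tuple

Regex = Tuple  # ("0",) | ("1",) | ("sym", a) | ("alt", frozenset) | ("cat", r, s) | ("star", r)
ZERO: Regex = ("0",)
ONE: Regex = ("1",)


def sym(a: str) -> Regex:
    return ("sym", a)


def alt(r: Regex, s: Regex) -> Regex:
    """Sum, flattened into a set so it is associative, commutative, and idempotent."""
    parts = set()
    for x in (r, s):
        if x[0] == "alt":
            parts |= x[1]
        elif x != ZERO:
            parts.add(x)
    if not parts:
        return ZERO
    return next(iter(parts)) if len(parts) == 1 else ("alt", frozenset(parts))


def cat(r: Regex, s: Regex) -> Regex:
    if r == ZERO or s == ZERO:
        return ZERO
    if r == ONE:
        return s
    return r if s == ONE else ("cat", r, s)


def star(r: Regex) -> Regex:
    if r in (ZERO, ONE):
        return ONE
    return r if r[0] == "star" else ("star", r)


def nullable(r: Regex) -> bool:
    """Does the language of r contain the empty word?"""
    tag = r[0]
    if tag in ("1", "star"):
        return True
    if tag in ("0", "sym"):
        return False
    if tag == "alt":
        return any(nullable(x) for x in r[1])
    return nullable(r[1]) and nullable(r[2])  # cat


def deriv(r: Regex, a: str) -> Regex:
    """Brzozowski derivative of r with respect to the letter a."""
    tag = r[0]
    if tag in ("0", "1"):
        return ZERO
    if tag == "sym":
        return ONE if r[1] == a else ZERO
    if tag == "alt":
        out = ZERO
        for x in r[1]:
            out = alt(out, deriv(x, a))
        return out
    if tag == "cat":
        left = cat(deriv(r[1], a), r[2])
        return alt(left, deriv(r[2], a)) if nullable(r[1]) else left
    return cat(deriv(r[1], a), r)  # star


def equivalent(r: Regex, s: Regex, alphabet: Iterable[str]) -> bool:
    """Explore pairs of derivatives; fail as soon as one side accepts and the other does not."""
    letters = list(alphabet)
    seen, todo = set(), [(r, s)]
    while todo:
        pair = todo.pop()
        if pair in seen:
            continue
        seen.add(pair)
        x, y = pair
        if nullable(x) != nullable(y):
            return False
        todo.extend((deriv(x, a), deriv(y, a)) for a in letters)
    return True


a, b = sym("a"), sym("b")
print(equivalent(star(alt(a, b)), star(cat(star(a), star(b))), "ab"))
```

The set of visited pairs acts as a candidate bisimulation between the two expressions: if exploration finishes without finding a pair that disagrees on the empty word, the expressions denote the same language.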