80 research outputs found
Modular Collaborative Program Analysis
With our world increasingly relying on computers, it is important to ensure the quality, correctness, security, and performance of software systems. Static analysis that computes properties of computer programs without executing them has been an important method to achieve this for decades. However, static analysis faces major chal-
lenges in increasingly complex programming languages and software systems and increasing and sometimes conflicting demands for soundness, precision, and scalability. In order to cope with these challenges, it is necessary to build static analyses for complex problems from small, independent, yet collaborating modules that can be developed in isolation and combined in a plug-and-play manner.
So far, no generic architecture to implement and combine a broad range of dissimilar static analyses exists. The goal of this thesis is thus to design such an architecture and implement it as a generic framework for developing modular, collaborative static analyses. We use several, diverse case-study analyses from which we systematically derive requirements to guide the design of the framework. Based on this, we propose the use of a blackboard-architecture style collaboration of analyses that we implement in the OPAL framework. We also develop a formal model of our architectures core concepts and show how it enables freely composing analyses while retaining their soundness guarantees.
We showcase and evaluate our architecture using the case-study analyses, each of which shows how important and complex problems of static analysis can be addressed using a modular, collaborative implementation style. In particular, we show how a modular architecture for the construction of call graphs ensures consistent soundness of different algorithms. We show how modular analyses for different aspects of immutability mutually benefit each other. Finally, we show how the analysis of method purity can benefit from the use of other complex analyses in a collaborative manner and from exchanging different analysis implementations that exhibit different characteristics. Each of these case studies improves over the respective state of the art in terms of soundness, precision, and/or scalability and shows how our architecture enables experimenting with and fine-tuning trade-offs between these qualities
Compiler-Based Approach to Enhance BliMe Hardware Usability
Outsourced computing has emerged as an efficient platform for data processing, but it has raised security concerns due to potential exposure of sensitive data through runtime and side-channel attacks. To address these concerns, the BliMe hardware extensions offer a hardware-enforced taint tracking policy to prevent secret-dependent data exposure. However, such strict policies can hinder software usability on BliMe hardware.
While existing solutions can transform software to make it constant-time and more compatible with BliMe policies, they are not fully compatible with BliMe hardware. To strengthen the usability of BliMe hardware, we propose a compiler-based tool to detect and transform policy violations, ensuring constant-time compliance with BliMe. Our tool employs static analysis for taint tracking and employs transformation techniques including array access expansion, control-flow linearization and branchless select. We have implemented the tool on LLVM-11 to automatically convert existing source code.
We then conducted experiments on WolfSSL and OISA to examine the accuracy of the analysis and the effect of the transformations. Our evaluation indicates that our tool can successfully transform multiple code patterns. However, we acknowledge that certain code patterns are challenging to transform. Therefore, we also discuss manual approaches and explore potential future work to expand the coverage of our automatic transformations
A Type System for Julia
The Julia programming language was designed to fill the needs of scientific
computing by combining the benefits of productivity and performance languages.
Julia allows users to write untyped scripts easily without needing to worry
about many implementation details, as do other productivity languages. If one
just wants to get the work done-regardless of how efficient or general the
program might be, such a paradigm is ideal. Simultaneously, Julia also allows
library developers to write efficient generic code that can run as fast as
implementations in performance languages such as C or Fortran. This combination
of user-facing ease and library developer-facing performance has proven quite
attractive, and the language has increasing adoption.
With adoption comes combinatorial challenges to correctness. Multiple
dispatch -- Julia's key mechanism for abstraction -- allows many libraries to
compose "out of the box." However, it creates bugs where one library's
requirements do not match what another provides. Typing could address this at
the cost of Julia's flexibility for scripting.
I developed a "best of both worlds" solution: gradual typing for Julia. My
system forms the core of a gradual type system for Julia, laying the foundation
for improving the correctness of Julia programs while not getting in the way of
script writers. My framework allows methods to be individually typed or
untyped, allowing users to write untyped code that interacts with typed library
code and vice versa. Typed methods then get a soundness guarantee that is
robust in the presence of both dynamically typed code and dynamically generated
definitions. I additionally describe protocols, a mechanism for typing
abstraction over concrete implementation that accommodates one common pattern
in Julia libraries, and describe its implementation into my typed Julia
framework.Comment: PhD thesi
Computer Aided Verification
The open access two-volume set LNCS 11561 and 11562 constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers presented together with 13 tool papers and 2 case studies, were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections: Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems, runtime techniques; dynamical, hybrid, and reactive systems; Part II: logics, decision procedures; and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency
Protecting applications using trusted execution environments
While cloud computing has been broadly adopted, companies that deal with sensitive data are still reluctant to do so due to privacy concerns or legal restrictions. Vulnerabilities in complex cloud infrastructures, resource sharing among tenants, and malicious insiders pose a real threat to the confidentiality and integrity of sensitive customer data. In recent years trusted execution environments (TEEs), hardware-enforced isolated regions that can protect code and data from the rest of the system, have become available as part of commodity CPUs. However, designing applications for the execution within TEEs requires careful consideration of the elevated threats that come with running in a fully untrusted environment. Interaction with the environment should be minimised, but some cooperation with the untrusted host is required, e.g. for disk and network I/O, via a host interface. Implementing this interface while maintaining the security of sensitive application code and data is a fundamental challenge.
This thesis addresses this challenge and discusses how TEEs can be leveraged to secure existing applications efficiently and effectively in untrusted environments. We explore this in the context of three systems that deal with the protection of TEE applications and their host interfaces:
SGX-LKL is a library operating system that can run full unmodified applications within TEEs with a minimal general-purpose host interface. By providing broad system support inside the TEE, the reliance on the untrusted host can be reduced to a minimal set of low-level operations that cannot be performed inside the enclave. SGX-LKL provides transparent protection of the host interface and for both disk and network I/O.
Glamdring is a framework for the semi-automated partitioning of TEE applications into an untrusted and a trusted compartment. Based on source-level annotations, it uses either dynamic or static code analysis to identify sensitive parts of an application. Taking into account the objectives of a small TCB size and low host interface complexity, it defines an application-specific host interface and generates partitioned application code.
EnclaveDB is a secure database using Intel SGX based on a partitioned in-memory database engine. The core of EnclaveDB is its logging and recovery protocol for transaction durability. For this, it relies on the database log managed and persisted by the untrusted database server. EnclaveDB protects against advanced host interface attacks and ensures the confidentiality, integrity, and freshness of sensitive data.Open Acces
Understanding and evolving the Rust programming language
Rust is a young systems programming language that aims to fill the gap between high-level languages—which provide strong static guarantees like memory and thread safety—and low-level languages—which give the programmer fine-grained control over data layout and memory management. This dissertation presents two projects establishing the first formal foundations for Rust, enabling us to better understand and evolve this important language: RustBelt and Stacked Borrows. RustBelt is a formal model of Rust’s type system, together with a soundness proof establishing memory and thread safety. The model is designed to verify the safety of a number of intricate APIs from the Rust standard library, despite the fact that the implementations of these APIs use unsafe language features. Stacked Borrows is a proposed extension of the Rust specification, which enables the compiler to use the strong aliasing information in Rust’s types to better analyze and optimize the code it is compiling. The adequacy of this specification is evaluated not only formally, but also by running real Rust code in an instrumented version of Rust’s Miri interpreter that implements the Stacked Borrows semantics. RustBelt is built on top of Iris, a language-agnostic framework, implemented in the Coq proof assistant, for building higher-order concurrent separation logics. This dissertation begins by giving an introduction to Iris, and explaining how Iris enables the derivation of complex high-level reasoning principles from a few simple ingredients. In RustBelt, this technique is exploited crucially to introduce the lifetime logic, which provides a novel separation-logic account of borrowing, a key distinguishing feature of the Rust type system.Rust ist eine junge systemnahe Programmiersprache, die es sich zum Ziel gesetzt hat, die Lücke zu schließen zwischen Sprachen mit hohem Abstraktionsniveau, die vor Speicher- und Nebenläufigkeitsfehlern schützen, und Sprachen mit niedrigem Abstraktionsniveau, welche dem Programmierer detaillierte Kontrolle über die Repräsentation von Daten und die Verwaltung des Speichers ermöglichen. Diese Dissertation stellt zwei Projekte vor, welche die ersten formalen Grundlagen für Rust zum Zwecke des besseren Verständnisses und der weiteren Entwicklung dieser wichtigen Sprache legen: RustBelt und Stacked Borrows. RustBelt ist ein formales Modell des Typsystems von Rust einschließlich eines Korrektheitsbeweises, welcher die Sicherheit von Speicherzugriffen und Nebenläufigkeit zeigt. Das Modell ist darauf ausgerichtet, einige komplexe Komponenten der Standardbibliothek von Rust zu verifizieren, obwohl die Implementierung dieser Komponenten unsichere Sprachkonstrukte verwendet. Stacked Borrows ist eine Erweiterung der Spezifikation von Rust, die es dem Compiler ermöglicht, den Quelltext mit Hilfe der im Typsystem kodierten Alias-Informationen besser zu analysieren und zu optimieren. Die Tauglichkeit dieser Spezifikation wird nicht nur formal belegt, sondern auch an echten Programmen getestet, und zwar mit Hilfe einer um Stacked Borrows erweiterten Version des Interpreters Miri. RustBelt basiert auf Iris, welches die Konstruktion von Separationslogiken für beliebige Programmiersprachen im Beweisassistenten Coq ermöglicht. Diese Dissertation beginnt mit einer Einführung in Iris und erklärt, wie komplexe Beweismethoden mit Hilfe weniger einfacher Bausteine hergeleitet werden können. In RustBelt wird diese Technik für die Umsetzung der „Lebenszeitlogik“ verwendet, einer Erweiterung der Separationslogik mit dem Konzept von „Leihgaben“ (borrows), welche eine wichtige Rolle im Typsystem von Rust spielen.This research was supported in part by a European Research Council (ERC) Consolidator Grant for the project "RustBelt", funded under the European Union’s Horizon 2020 Framework Programme (grant agreement no. 683289)
Pinpointing Software Inefficiencies With Profiling
Complex codebases with several layers of abstractions have abundant inefficiencies that affect the performance. These inefficiencies arise due to various causes such as developers\u27 inattention to performance, inappropriate choice of algorithms and inefficient code generation among others. To eliminate the redundancies, lots of work has been done during the compiling phase. However, not all redundancies can be easily detected or eliminated with compiler optimization passes due to aliasing, limited optimization scopes, and insensitivity to input and execution contexts act as severe deterrents to static program analysis. There are also profiling tools which can reveal how resources are used. However, they can hard to distinguish whether the resource is worth fully used. More profiling tools are in needed to diagnose resource wastage and pinpoint inefficiencies. We have developed three tools to pinpoint different types of inefficiencies in different granularity. We build Runtime Value Numbering (RVN), a dynamic fine-grained profiler to pinpoint and quantify redundant computations in an execution. It is based on the classical value numbering technique but works at runtime instead of compile-time. We developed RedSpy, a fine-grained profiler to pinpoint and quantify value redundancies in program executions. Value redundancy may happen overtime at the same locations or in adjacent locations, and thus it has temporal and spatial locality. RedSpy identifies both temporal and spatial value locality. Furthermore, RedSpy is capable of identifying values that are approximately the same, enabling optimization opportunities in HPC codes that often use floating-point computations. RVN and RedSpy are both instrumentation based tools. They provide comprehensive result while introducing high space and time overhead. Our lightweight framework, Witch, samples consecutive accesses to the same memory location by exploiting two ubiquitous hardware features: the performance monitoring units (PMU) and debug registers. Witch performs no instrumentation. Hence, witchcraft - tools built atop Witch - can detect a variety of software inefficiencies while introducing negligible slowdown and insignificant memory consumption and yet maintaining accuracy comparable to exhaustive instrumentation tools. Witch allowed us to scale our analysis to a large number of codebases. All the tools work on fully optimized binary executable and provide insightful optimization guidance by apportioning redundancies to their provenance - source lines and full calling contexts. We apply RVN, RedSpy, and Witch on programs that were optimization targets for decades and guided by the tools, we were able to eliminate redundancies that resulted in significant speedups
- …