27 research outputs found
Towards A Practical High-Assurance Systems Programming Language
Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make the matter worse, formally reasoning about their correctness properties introduces yet another level of complexity to the task. It requires considerable expertise in both systems programming and formal verification. The development can be extremely costly due to the sheer complexity of the systems and the nuances in them, if not assisted with appropriate tools that provide abstraction and automation.
Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code.
To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof with a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process
Software Engineering with Incomplete Information
Information may be the common currency of the universe, the stuff of creation. As the physicist John Wheeler claimed, we get ``it from bit''. Measuring information, however, is a hard problem. Knowing the meaning of information is a hard problem. Directing the movement of information is a hard problem. This hardness comes when our information about information is incomplete. Yet we need to offer decision making guidance, to the computer or developer, when facing this incompleteness. This work addresses this insufficiency within the universe of software engineering.
This thesis addresses the first problem by demonstrating that obtaining the relative magnitude of information flow is computationally less expensive than an exact measurement. We propose ranked information flow, or RIF, where different flows are ordered according to their FlowForward, a new measure designed for ease of ordering. To demonstrate the utility of FlowForward, we introduce information contour maps: heatmapped callgraphs of information flow within software. These maps serve multiple engineering uses, such as security and refactoring.
By mixing a type system with RIF, we address the problem of meaning. Information security is a common concern in software engineering. We present OaST, the world's first gradual security type system that replaces dynamic monitoring with information theoretic risk assessment. OaST now contextualises FlowForward within a formally verified framework: secure program components communicate over insecure channels ranked by how much information flows through them. This context helps the developer interpret the flows and enables security policy discovery, adaptation and refactoring.
Finally, we introduce safestrings, a type-based system for controlling how the information embedded within a string moves through a program. This takes a structural approach, whereby a string subtype is a more precise, information limited, subset of string, ie a string that contains an email address, rather than anything else
Modular and type-safe definition of Attribute Grammars with AspectAG
AspectAG is a Haskell-embedded domain-specific language (EDSL) that encodes first-class attribute grammars (AGs). AspectAG ensures the wellformedness of AGs at compile time by using extensible records and predicates encoded using old-fashioned type-level programming features, such as multiparameter type classes and functional dependencies. AspectAG suffers the usual drawbacks of EDSLs: when type errors occur they usually do not deliver error messages that refer to domain terms, but to the host language. Often, implementation details of the EDSL are leaked in those messages. The use of type-level programming techniques makes the situation worse since type-level abstraction mechanisms are quite poor. Additionally, old-fashioned type-level programs are untyped at type-level, which is inconsistent with the general approach of strongly-typed functional programming. By using modern Haskell extensions and techniques we propose a reworked version of AspectAG that tackles those weaknesses. New AG definitions are safer, both at the level of types and at the level of kinds. Furthemore, a set of identified domain-specific errors are reported with DSL-oriented messages. To achieve this, we define and use a framework for manipulating type errors that can be used in any EDSL. We show the pragmatics of AspectAG by defining languages and extending them both with new syntax and semantics. We use MateFun, a purelyfunctional language used to teach mathematics as a case study.AspectAG es un lenguaje de dominio especÃfico embebido (EDSL) que codifica gramáticas de atributos (AGs) como ciudadanos de primera clase. AspectAG garantiza la buena formación de las AGs en tiempo de compilación por medio del uso de registros extensibles y predicados, codificados gracias al uso de caracterÃsticas antiguas de programación a nivel de tipos, como clases multiparámetro y dependencias funcionales. AspectAG sufre las desventajas usuales de los EDSLs: cuando ocurren errores de tipado, los mensajes de error reportados no se expresan en términos del dominio, sino del lenguaje anfitrión. También es usual que detalles de implementación del EDSL se vean filtrados en estos mensajes. El uso de técnicas de programación a nivel de tipos agrava la situación porque los mecanismos de abstracción a nivel de tipos son pobres. Ademas, las técnicas de programación a nivel de tipos usadas en AspectAG son esencialmente no tipadas, lo que es inconsistente con nuestro enfoque de tipado fuerte. Usando extensiones modernas al sistema de tipos de Haskell, proponemos una nueva versión de la biblioteca AspectAG, abordando los problemas antes mencionados. Las nuevas definiciones de AGs son mas seguras tanto a nivel de tipado como a nivel de kinds (tipado a nivel de tipos). Ademas, un conjunto identificado de errores especÃficos del dominio son reportados con mensajes referentes al mismo. Para lograr esto, definimos y utilizamos un framework para manipular errores de tipado, que puede ser aplicado a cualquier EDSL. Mostramos la pragmática de AspectAG definiendo lenguajes y extendiéndoles con nueva sintaxis y con nueva semántica. Utilizamos el lenguaje MateFun, un lenguaje funcional puro utilizado para enseñar matemáticas como caso de estudio
A functional approach to heterogeneous computing in embedded systems
Developing programs for embedded systems presents quite a challenge; not only should programs be resource efficient, as they operate under memory and timing constraints, but they should also take full advantage of the hardware to achieve maximum performance. Since performance is such a significant factor in the design of embedded systems, modern systems typically incorporate more than one kind of processing element to benefit from specialized processing capabilities. For such heterogeneous systems the challenge in developing programs is even greater.In this thesis we explore a functional approach to heterogeneous system development as a means to address many of the modularity problems that are typically found in the application of low-level imperative programming for embedded systems. In particular, we explore a staged hardware software co-design language that we name Co-Feldspar and embed in Haskell. The staged approach enables designers to build their applications from reusable components and skeletons while retaining control over much of the generated source code. Furthermore, by embedding the language in Haskell we can exploit its type classes to write not only hardware and software programs, but also generic programs with overloaded instructions and expressions. We demonstrate the usefulness of the functional approach for co-design on a cryptographic example and signal processing filters, and benchmark software and mixed hardware-software implementations. Co-Feldspar currently adopts a monadic interface, which provides an imperative functional programming style that is suitable for explicit memory management and algorithms that rely on a certain evaluation order. For algorithms that are better defined as pure functions operating on immutable values, we provide a signal and array library that extends a monadic language, like Co-Feldspar. These extensions permit a functional style of programming by composing high-level combinators. Our compiler transforms such high-level code into efficient programs with mutating code. In particular, we show how to execute an FFT safely in-place, and how to describe a FIR and IIR filter efficiently as streams. Co-Feldspar’s monadic interface is however quite invasive; not only is the burden of explicit memory management quite heavy on the user, it is also quite easy to shoot on eself in the foot. It is for these reasons that we also explore a dynamic memory management discipline that is based on regions but predictable enough to be of use for embedded systems. Specifically, this thesis introduces a program analysis which annotates values with dynamically allocated memory regions. By limiting our efforts to functional languages that target embedded software, we manage to define a region inference algorithm that is considerably simpler than traditional approaches
Compilation and Code Optimization for Data Analytics
The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance.
The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation.
As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive.
The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code
Compilation and Code Optimization for Data Analytics
The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance.
The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation.
As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive.
The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code