Software refactoring and rewriting: from the perspective of code transformations
To refactor already-working code while preserving reliability, compatibility, and
perhaps security, we can borrow ideas from micropass/nanopass compilers. By
treating software refactoring as the composition of code transformations, and by
compressing repetitive transformations with automation tools, we can often
obtain representations of refactoring processes short enough that their
correctness can be analysed manually. Unlike in compilers, in refactoring we
usually only need to consider the codebase in question, so regular text
processing can be used extensively, fully exploiting patterns present only in
that codebase. Aside from the direct application of code transformations from
compilers, many other kinds of equivalence properties may also be exploited. In
this paper, two refactoring projects are given as the main examples, in which a
10-100-fold simplification was achieved through the application of a few kinds
of useful transformations.
Comment: 10 pages, 4 figures
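The paper's central idea, refactoring as a composition of small, individually reviewable transformations, can be sketched in Python. The toy refactoring and all function names below are invented for illustration; they are not taken from the paper:

```python
import re

# Hypothetical refactoring expressed as a pipeline of small text
# transformations (micropasses), each simple enough to review in isolation.
def rename_api(src: str) -> str:
    # Pass 1: rename a deprecated function to its replacement.
    return src.replace("old_open(", "open_file(")

def drop_redundant_flag(src: str) -> str:
    # Pass 2: remove a flag argument that the new API already defaults to.
    return re.sub(r"open_file\((.*?), mode='r'\)", r"open_file(\1)", src)

def refactor(src: str) -> str:
    # The whole refactoring is just the composition of the passes, so its
    # correctness can be analysed one pass at a time.
    passes = [rename_api, drop_redundant_flag]
    for p in passes:
        src = p(src)
    return src
```

Because each pass exploits patterns specific to one codebase, plain text processing suffices where a compiler would need a full parser.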
Catala: A Programming Language for the Law
Law at large underpins modern society, codifying and governing many aspects
of citizens' daily lives. Oftentimes, law is subject to interpretation, debate
and challenges throughout various courts and jurisdictions. But in some other
areas, law leaves little room for interpretation, and essentially aims to
rigorously describe a computation, a decision procedure or, simply said, an
algorithm. Unfortunately, prose remains a woefully inadequate tool for the job.
The lack of formalism leaves room for ambiguities; the structure of legal
statutes, with many paragraphs and sub-sections spread across multiple pages,
makes it hard to compute the intended outcome of the algorithm underlying a
given text; and, as with any other piece of poorly-specified critical software,
the use of informal language leaves corner cases unaddressed. We introduce
Catala, a new programming language that we specifically designed to allow a
straightforward and systematic translation of statutory law into an executable
implementation. Catala aims to bring together lawyers and programmers through a
shared medium, which together they can understand, edit and evolve, bridging a
gap that often results in dramatically incorrect implementations of the law. We
have implemented a compiler for Catala, and have proven the correctness of its
core compilation steps using the F* proof assistant. We evaluate Catala on
several legal texts that are algorithms in disguise, notably section 121 of the
US federal income tax and the byzantine French family benefits; in doing so, we
uncover a bug in the official implementation. We observe as a consequence of
the formalization process that using Catala enables rich interactions between
lawyers and programmers, leading to a greater understanding of the original
legislative intent, while producing a correct-by-construction executable
specification reusable by the greater software ecosystem.
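The "base rule plus scattered exceptions" structure that Catala targets can be hinted at with a plain-Python sketch. The thresholds below are illustrative only, loosely inspired by the Section 121 example in the abstract; they are not the actual statutory parameters, and this is not Catala syntax:

```python
# Hedged sketch: statutory text often states a general rule and then
# amends it with exceptions. All figures here are invented placeholders.
def excluded_gain(gain: int, years_owned: float, years_used: float,
                  joint_return: bool) -> int:
    # Eligibility test: no exclusion unless ownership and use tests are met.
    if years_owned < 2 or years_used < 2:
        return 0
    # General rule: the exclusion is capped at a statutory ceiling.
    cap = 250_000
    # Exception: a higher ceiling applies to joint returns.
    if joint_return:
        cap = 500_000
    return min(gain, cap)
```

Catala's contribution is that such default-and-exception logic is written in a form both lawyers and programmers can read, rather than buried in ad hoc conditionals like these.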
Higher-Order, Data-Parallel Structured Deduction
State-of-the-art Datalog engines include expressive features such as ADTs
(structured heap values), stratified aggregation and negation, various
primitive operations, and the opportunity for further extension using FFIs.
Current parallelization approaches for state-of-the-art Datalog engines target
shared-memory locking data structures using conventional multi-threading, or
use the map-reduce model for distributed computing. Furthermore, current
state-of-the-art approaches cannot scale to formal systems that pervasively
manipulate structured data, due to their lack of indexing for structured data
stored in the heap.
In this paper, we describe a new approach to data-parallel structured
deduction that involves a key semantic extension of Datalog to permit
first-class facts and higher-order relations via defunctionalization, an
implementation approach that enables parallelism uniformly both across sets of
disjoint facts and over individual facts with nested structure. We detail a
core language whose key invariant (subfact closure) ensures that each
subfact is materialized as a first-class fact. We extend this core language to
Slog, a fully-featured language whose forms facilitate leveraging subfact
closure to
rapidly implement expressive, high-performance formal systems. We demonstrate
Slog by building a family of control-flow analyses from abstract machines,
systematically, along with several implementations of classical type systems
(such as STLC and LF). We performed experiments on EC2, Azure, and ALCF's Theta
at up to 1000 threads, showing orders-of-magnitude scalability improvements
versus competing state-of-the-art systems.
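The subfact-closure invariant can be illustrated with a toy semi-naive-free fixpoint engine. Facts are tagged tuples; every nested subterm is materialized as a fact of its own, so rules can match on structured data directly. The engine, the `trans` rule, and all fact names are invented for this sketch and are unrelated to Slog's actual implementation:

```python
# Toy illustration of subfact closure: nested facts become top-level facts.
def subfacts(fact):
    # Yield the fact itself and, recursively, every nested tuple subterm.
    yield fact
    for arg in fact[1:]:
        if isinstance(arg, tuple):
            yield from subfacts(arg)

def trans(db):
    # One Datalog-style rule: path(a,b) :- edge(a,b).
    #                         path(a,c) :- edge(a,b), path(b,c).
    edges = [f for f in db if f[0] == "edge"]
    paths = [f for f in db if f[0] == "path"]
    out = {("path", a, b) for (_, a, b) in edges}
    out |= {("path", a, c) for (_, a, b) in edges
            for (_, x, c) in paths if x == b}
    return out

def saturate(facts, rules):
    # Naive fixpoint: apply rules until no new fact (or subfact) appears.
    db = set()
    for f in facts:
        db.update(subfacts(f))
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for derived in rule(db):
                for f in subfacts(derived):
                    if f not in db:
                        db.add(f)
                        changed = True
    return db
```

Note how an edge buried inside a structured fact such as `("labeled", ("edge", 1, 2))` still participates in deduction, because subfact closure surfaces it as a fact in its own right.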
Generic Programming with Extensible Data Types; Or, Making Ad Hoc Extensible Data Types Less Ad Hoc
We present a novel approach to generic programming over extensible data
types. Row types capture the structure of records and variants, and can be used
to express record and variant subtyping, record extension, and modular
composition of case branches. We extend row typing to capture generic
programming over rows themselves, capturing patterns including the lifting of
operations to records and variants from their component types, and the
duality between cases blocks over variants and records of labeled functions,
without placing specific requirements on the fields or constructors present in
the records and variants. We formalize our approach in System Rω, an
extension of Fω with row types, and give a denotational semantics for
(stratified) Rω in Agda.
Comment: To appear at the International Conference on Functional Programming 2023
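The duality the abstract mentions, between a cases block over a variant and a record of labeled functions, can be shown concretely even in an untyped setting. The encoding below (variants as tagged pairs, records as dicts) is an invented illustration, not the paper's formalism:

```python
# A case analysis over a variant is the same data as a record of labeled
# functions, one per constructor.
def cases(handlers, variant):
    # 'variant' is a (label, payload) pair; 'handlers' maps each label
    # to the function for that branch.
    label, payload = variant
    return handlers[label](payload)

# A "record of labeled functions": one computation per constructor.
area = {
    "circle": lambda r: 3.14159 * r * r,
    "square": lambda s: s * s,
}

# Extending the record is modular composition of case branches: no
# requirement is placed on which constructors were already present.
area_ext = {**area, "rect": lambda wh: wh[0] * wh[1]}
```

Row typing is what lets this pattern be expressed generically and safely: the type tracks exactly which labels a record of handlers covers, so `cases` can demand that the handlers match the variant's constructors.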
Pattern eliminating transformations
Program transformation is a common practice in computer science, and its many applications can have a range of different objectives. For example, a program written in a high-level source language could be translated either into machine code for execution, or into a language suitable for formal verification. Such compilations are split into several so-called passes, each of which generally aims at eliminating certain constructions of the original language, yielding intermediate languages and finally the target code. Rewriting is a widely established formalism for describing the mechanism and the logic behind such transformations. In a typed context featuring type-preserving rewrite rules, the underlying type system can be used to give syntactic guarantees on the shape of the results obtained after each pass, but this approach can lead to an accumulation of (auxiliary) types to be considered. We propose in this paper an approach in which the function symbols corresponding to the transformations performed in a pass are annotated with the (anti-)patterns they are supposed to eliminate, and we show how to check that a transformation is consistent with its annotations and thus that it eliminates the respective patterns.
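The annotation idea can be sketched dynamically: a pass is paired with the pattern it promises to eliminate, and the output is checked against that promise. The mini-language, the pass, and the checker below are invented for illustration; the paper's contribution is checking such annotations statically, not at runtime:

```python
# Hypothetical pass over terms encoded as tagged tuples, annotated with
# the anti-pattern it must eliminate: no ("not", ...) node may survive.
def elim_not(expr):
    # Rewrite ("not", ("eq", a, b)) into ("neq", a, b).
    if isinstance(expr, tuple):
        if expr[0] == "not" and expr[1][0] == "eq":
            _, (_, a, b) = expr
            return ("neq", a, b)
        return tuple(elim_not(e) if isinstance(e, tuple) else e
                     for e in expr)
    return expr

def contains(expr, tag):
    # Checker for the annotation: does any 'tag' node survive?
    if isinstance(expr, tuple):
        return expr[0] == tag or any(
            contains(e, tag) for e in expr if isinstance(e, tuple))
    return False
```

If `elim_not` were incomplete, say it missed `("not", ("lt", a, b))`, the checker would flag the surviving `"not"` node, which is exactly the kind of inconsistency between a transformation and its annotation that the paper's approach detects.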
A Type and Scope Safe Universe of Syntaxes with Binding: Their Semantics and Proofs
Almost every programming language's syntax includes a notion of binder and corresponding bound occurrences, along with the accompanying notions of α-equivalence, capture-avoiding substitution, typing contexts, runtime environments, and so on. In the past, implementing and reasoning about programming languages required careful handling to maintain the correct behaviour of bound variables. Modern programming languages include features that enable constraints like scope safety to be expressed in types. Nevertheless, the programmer is still forced to write the same boilerplate over and over again for each new implementation of a scope-safe operation (e.g., renaming, substitution, desugaring, printing), and then again for the correctness proofs. We present an expressive universe of syntaxes with binding and demonstrate how to (1) implement scope-safe traversals once and for all by generic programming; and (2) derive properties of these traversals by generic proving. Our universe description, generic traversals and proofs, and our examples have all been formalised in Agda and are available in the accompanying material.
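The "implement traversals once and for all" idea can be sketched without dependent types using de Bruijn indices: one generic traversal is written, and individual operations only supply the variable case. The term encoding and function names below are invented for the sketch, not the paper's Agda universe:

```python
# Terms: ("var", i) | ("lam", body) | ("app", f, a), de Bruijn indices.
def shift(term, d=1, cutoff=0):
    # Increment free-variable indices by d (indices below cutoff are bound).
    tag = term[0]
    if tag == "var":
        i = term[1]
        return ("var", i + d if i >= cutoff else i)
    if tag == "lam":
        return ("lam", shift(term[1], d, cutoff + 1))
    return ("app", shift(term[1], d, cutoff), shift(term[2], d, cutoff))

def traverse(var_case, term, depth=0):
    # The one generic, scope-respecting traversal: 'var_case' decides what
    # each variable becomes, so renaming and substitution share this code.
    tag = term[0]
    if tag == "var":
        return var_case(term[1], depth)
    if tag == "lam":
        return ("lam", traverse(var_case, term[1], depth + 1))
    return ("app", traverse(var_case, term[1], depth),
                   traverse(var_case, term[2], depth))

def subst(term, image):
    # Substitute de Bruijn index 0 by 'image', as an instance of traverse.
    def var_case(i, depth):
        if i == depth:
            return shift(image, depth)  # avoid capture under binders
        return ("var", i - 1 if i > depth else i)
    return traverse(var_case, term)
```

What the paper adds over a sketch like this is that scope (and type) safety is enforced by the host type system, and that properties of every operation derived from the generic traversal can likewise be proved once, generically.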
A nanopass framework for commercial compiler development
Contemporary commercial compilers typically handle sophisticated high-level source languages, generate efficient assembly or machine code for multiple hardware architectures, run under (and generate code to run under) multiple operating systems, and support source-level debugging, profiling, and other program development tools. As a result, commercial compilers tend to be among the most complex of software systems. Nanopass frameworks are designed to help make this complexity manageable. A nanopass framework is a domain-specific language, embedded in a general-purpose programming language, that aids in compiler development. A nanopass compiler comprises many small passes, each of which performs a single task and specifies only the interesting transformations to be performed by the pass. Intermediate languages are formally specified by the compiler writer, which allows the infrastructure both to verify that the output of each pass is well-formed and to fill in the uninteresting boilerplate parts of each pass. Prior nanopass frameworks were prototype systems aimed at educational use, but we believe that a suitable nanopass framework can support the development of commercial compilers. We have created such a framework and have demonstrated its effectiveness by using it to create a new commercial compiler that is a “plug replacement” for an existing commercial compiler. The new compiler uses a more sophisticated, although slower, register allocator and implements nearly all of the optimizations of the original compiler, along with several “new and improved” optimizations. When compared to the original compiler on a set of benchmarks, code generated by the new compiler runs, on average, 21.5% faster. The average compile time for these benchmarks is less than twice as long as with the original compiler.
This dissertation provides a description of the new framework, the new compiler, and several experiments that demonstrate the performance and effectiveness of both, as well as a presentation of several optimizations performed by the new compiler and facilitated by the infrastructure.
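The nanopass discipline described above, many small passes with formally specified output languages that the infrastructure can check, can be miniaturized in Python. The mini-IR, the pass, and the well-formedness predicate are invented for illustration and are far simpler than a real nanopass framework's generated checkers:

```python
# One nanopass over a tagged-tuple IR: its single task is to eliminate
# ("and", a, b) in favor of ("if", a, b, False).
def remove_and(expr):
    if isinstance(expr, tuple) and expr[0] == "and":
        return ("if", remove_and(expr[1]), remove_and(expr[2]), False)
    if isinstance(expr, tuple):
        # Boilerplate: rebuild all other nodes unchanged. A nanopass
        # framework would generate this part automatically.
        return (expr[0],) + tuple(remove_and(e) for e in expr[1:])
    return expr

def well_formed_no_and(expr):
    # The output-language specification for the pass: 'and' must not occur.
    if isinstance(expr, tuple):
        return expr[0] != "and" and all(
            well_formed_no_and(e) for e in expr[1:])
    return True
```

A full compiler then chains dozens of such passes, checking each intermediate result against its language specification, which is what makes the per-pass simplicity scale to a commercial compiler.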