13 research outputs found
Sidekick compilation with xDSL
Traditionally, compiler researchers either conduct experiments within an
existing production compiler or develop their own prototype compiler; both
options come with trade-offs. On one hand, prototyping in a production compiler
can be cumbersome, as they are often optimized for program compilation speed at
the expense of software simplicity and development speed. On the other hand,
the transition from a prototype compiler to production requires significant
engineering work. To bridge this gap, we introduce the concept of sidekick
compiler frameworks, an approach that uses multiple frameworks that
interoperate with each other by leveraging textual interchange formats and
declarative descriptions of abstractions. Each such compiler framework is
specialized for specific use cases, such as performance or prototyping.
Abstractions are by design shared across frameworks, simplifying the transition
from prototyping to production. We demonstrate this idea with xDSL, a sidekick
for MLIR focused on prototyping and teaching. xDSL interoperates with MLIR
through a shared textual IR and the exchange of IRs through an IR Definition
Language. The benefits of sidekick compiler frameworks are evaluated by showing
on three use cases how xDSL impacts their development: teaching, DSL
compilation, and rewrite system prototyping. We also investigate the trade-offs
that xDSL offers, and demonstrate how we simplify the transition between
frameworks using the IRDL dialect. With sidekick compilation, we envision a
future in which engineers minimize the cost of development by choosing a
framework built for their immediate needs, and later transitioning to
production with minimal overhead.
A Case Study in Modular Programming: Using AspectJ and OCaml in an Undergraduate Compiler Project
We report our experience in using two different languages to build the same software project. Specifically, we have converted an entire undergraduate compiler course from using AspectJ, an aspect-oriented language, to using OCaml, a functional language. The course has evolved over a period of eight years with, on average, 60 students completing it every year. In this article, we analyze our usage of the two programming languages and we compare and contrast the two software projects on a number of parameters, including how they enable students to write and test individual compiler phases in a modular way.
An extension-oriented compiler
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (leaves 107-110). The thesis of this dissertation is that compilers can and should allow programmers to extend programming languages with new syntax, features, and restrictions by writing extension modules that act as plugins for the compiler. We call compilers designed around this idea extension-oriented. The central challenge in designing and building an extension-oriented compiler is the creation of extension interfaces that are simultaneously powerful, to allow effective extensions; convenient, to make these extensions easy to write; and composable, to make it possible to use independently-written extensions together. This dissertation proposes and evaluates extension-oriented syntax trees (XSTs) as a way to meet these challenges. The key interfaces to XSTs are grammar statements, a convenient, composable interface to extend the input parser; syntax patterns, a way to manipulate XSTs in terms of the original program syntax; canonicalizers, which put XSTs into a canonical form to extend the reach of syntax patterns; and attributes, a lazy computation mechanism to structure analyses on XSTs and allow extensions to cooperate. We have implemented these interfaces in a small procedural language called zeta. Using zeta, we have built an extension-oriented compiler for C called xoc and then 13 extensions to C ranging in size from 16 lines to 245 lines. To evaluate XSTs and xoc, this dissertation examines two examples in detail: a reimplementation of the programming language Alef and a reimplementation of the Linux kernel checker Sparse. Both of these examples consist of a handful of small extensions. By Russell Stensby Cox. Ph.D.
Implementing an Embedded Compiler using Program Transformation Rules
(To appear) Domain-specific languages (DSLs) are well-recognized to ease programming and improve robustness for a specific domain, by providing high-level domain-specific notations and verifications of domain-specific properties. The compiler of a DSL, however, is often difficult to develop and maintain, due to the need to define a specific treatment for a large and potentially increasing number of language constructs. To address this issue, we propose an approach for specifying a DSL compiler and verifier using control-flow sensitive concrete-syntax based matching rules. These rules either collect information about the source code to carry out verifications or perform transformations to carry out compilation. Because rules only mention the relevant constructs, using their concrete syntax, and hide the complexity of control-flow graph traversal, it is easy to understand the purpose of each rule. Furthermore, new compilation steps can be added using only a small number of lines of code. We explore this approach in the context of the z2z DSL for network gateway development, and show that the core of its compiler and verifier can be implemented in this manner.
Programming Language Techniques for Natural Language Applications
It is easy to imagine machines that can communicate in natural language. Constructing such machines is more difficult. The aim of this thesis is to demonstrate
how declarative grammar formalisms that distinguish between abstract and concrete syntax make it easier to develop natural language applications.
We describe how the type-theoretical grammar formalism Grammatical
Framework (GF) can be used as a high-level language for natural language
applications. By taking advantage of techniques from the field of programming
language implementation, we can use GF grammars to perform portable
and efficient parsing and linearization, generate speech recognition language
models, implement multimodal fusion and fission, generate support code for
abstract syntax transformations, generate dialogue managers, and implement
speech translators and web-based syntax-aware editors.
By generating application components from a declarative grammar, we can
reduce duplicated work, ensure consistency, make it easier to build multilingual
systems, improve linguistic quality, enable re-use across system domains, and
make systems more portable.
Generic Programming with Extensible Data Types; Or, Making Ad Hoc Extensible Data Types Less Ad Hoc
We present a novel approach to generic programming over extensible data
types. Row types capture the structure of records and variants, and can be used
to express record and variant subtyping, record extension, and modular
composition of case branches. We extend row typing to capture generic
programming over rows themselves, capturing patterns including lifting
operations to records and variants from their component types, and the
duality between case blocks over variants and records of labeled functions,
without placing specific requirements on the fields or constructors present in
the records and variants. We formalize our approach in System Rω, an
extension of Fω with row types, and give a denotational semantics for
(stratified) Rω in Agda. Comment: To appear at: International Conference on Functional Programming 2023.
Corrected citations from previous version.
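The duality between case blocks over variants and records of labeled functions can be illustrated with a minimal, untyped Python sketch. It does not model row typing or the system formalized in the paper; the names `case`, `area_handlers`, and `extended_handlers` are hypothetical and chosen only for illustration.

```python
# A variant is modeled as a (label, payload) pair.
# A "case block" is modeled as a record (dict) of labeled functions.

def case(handlers, variant):
    """Eliminate a variant by looking up its label in a record of functions."""
    label, payload = variant
    return handlers[label](payload)

# Record of labeled functions covering two constructors.
area_handlers = {
    "square": lambda side: side * side,
    "rect": lambda dims: dims[0] * dims[1],
}

# Modular composition of case branches: extend the record with a new label
# without touching the existing branches.
extended_handlers = {**area_handlers, "unit": lambda _: 1}

square_area = case(area_handlers, ("square", 3))
unit_area = case(extended_handlers, ("unit", None))
```

In a row-typed language the record extension above would be checked statically; here a missing label only surfaces as a runtime `KeyError`.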
Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations
Recent work showed that compiling functional programs to use dense,
serialized memory representations for recursive algebraic datatypes can yield
significant constant-factor speedups for sequential programs. But serializing
data in a maximally dense format consequently serializes the processing of that
data, yielding a tension between density and parallelism. This paper shows that
a disciplined, practical compromise is possible. We present Parallel Gibbon, a
compiler that obtains the benefits of dense data formats and parallelism. We
formalize the semantics of the parallel location calculus underpinning this
novel implementation strategy, and show that it is type-safe. Parallel Gibbon
exceeds the parallel performance of existing compilers for purely functional
programs that use recursive algebraic datatypes, including, notably,
abstract-syntax-tree traversals as in compilers.
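The tension between density and parallelism can be seen in a minimal Python sketch, which is not Gibbon's implementation. A binary tree is serialized in preorder into a flat, pointer-free byte buffer, and a traversal sums its leaves. The one-byte tags, the 8-byte leaf encoding, and the names `pack_tree` and `sum_tree` are assumptions made for illustration.

```python
import struct

LEAF, NODE = 0, 1  # hypothetical one-byte constructor tags

def pack_tree(tree, out):
    """Append a preorder, pointer-free encoding of `tree` to bytearray `out`.

    A tree is either an int (leaf) or a pair (left, right).
    """
    if isinstance(tree, int):
        out.append(LEAF)
        out += struct.pack("<q", tree)  # 8-byte little-endian integer
    else:
        left, right = tree
        out.append(NODE)
        pack_tree(left, out)
        pack_tree(right, out)
    return out

def sum_tree(buf, pos=0):
    """Sum the leaves of the tree starting at `pos`; return (total, next_pos)."""
    if buf[pos] == LEAF:
        (value,) = struct.unpack_from("<q", buf, pos + 1)
        return value, pos + 9  # 1 tag byte + 8 payload bytes
    left_total, pos = sum_tree(buf, pos + 1)
    # The right subtree's offset is only known once the left one is consumed.
    right_total, pos = sum_tree(buf, pos)
    return left_total + right_total, pos

buf = pack_tree(((1, 2), (3, 4)), bytearray())
total, end = sum_tree(buf)
```

Because the start of the right subtree is discovered only by reading past the left subtree, the two recursive calls cannot be started independently in this fully dense layout, which is the serialization of processing that the abstract describes.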
Simple optimizing JIT compilation of higher-order dynamic programming languages
Efficiently implementing dynamic programming languages requires a significant development
effort. Over the years, compilers have become more complex. Today, they typically include
an interpretation phase, several compilation phases, several intermediate representations and
code analyses. These techniques allow efficiently implementing these programming languages
but are difficult to implement in contexts in which development resources are limited. We
propose a new approach and new techniques to build optimizing just-in-time compilers for
dynamic languages with relatively good performance and low development effort.
We present a simple just-in-time compilation approach to implement a language with
a single compilation phase, without the need to use code transformations to intermediate
representations. We explain how basic block versioning, an existing compilation technique,
can be extended, without significant development effort, to work interprocedurally with higher-order
programming languages, allowing interprocedural optimizations on these languages. We
also explain how basic block versioning allows removing operations used to implement dynamic
languages that degrade performance, such as type checks, and how compilers can use Tagging
and NaN-boxing to optimize the generated code with low development effort. We present our
experience of building a JIT compiler using these techniques for the Scheme programming
language to show that they indeed allow building compilers with less development effort
than other implementations and that they allow generating efficient code that competes with
current mature implementations of the Scheme language.
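As a rough illustration of NaN-boxing, the following Python sketch stores either a double or a 32-bit integer in a single 64-bit word by placing the integer in the payload of a quiet NaN. It is not taken from the thesis; the tag bit, the payload width, and the function names are assumptions made for illustration.

```python
import struct

QNAN = 0x7FF8000000000000     # bit pattern of a quiet NaN
TAG_INT = 0x0001000000000000  # hypothetical tag bit marking a boxed integer
PAYLOAD_MASK = 0xFFFFFFFF     # low 32 bits carry the integer payload

def box_float(x):
    """Reinterpret a double as its raw 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def box_int(n):
    """Store a 32-bit signed integer in the payload of a quiet NaN."""
    return QNAN | TAG_INT | (n & PAYLOAD_MASK)

def is_int(word):
    """A word is a boxed integer if the NaN bits and the tag bit are all set."""
    return (word & (QNAN | TAG_INT)) == (QNAN | TAG_INT)

def unbox(word):
    """Recover the Python value stored in a 64-bit word."""
    if is_int(word):
        n = word & PAYLOAD_MASK
        return n - (1 << 32) if n & 0x80000000 else n  # sign-extend
    return struct.unpack("<d", struct.pack("<Q", word))[0]

boxed_values = [box_int(42), box_int(-5), box_float(1.5)]
unboxed_values = [unbox(w) for w in boxed_values]
```

The type check reduces to a mask and a compare on the word itself, which is the kind of cheap dynamic test that a compiler can then try to eliminate when the type is already known.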