Search CORE

9 research outputs found

Functional pearl: a SQL to C compiler in 500 lines of code

Author: Jones N. D.
Rompf T.
Stonebraker M.
Svenningsson J.
Zukowski M.
Publication venue: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming
Publication date: 29/08/2015

We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code - a convincing case study that high-level functional programming can handily beat C for systems-level programming where the last drop of performance matters. The key enabler is a shift in perspective towards generative programming. The core of the query engine is an interpreter for relational algebra operations, written in Scala. Using the open-source LMS Framework (Lightweight Modular Staging), we turn this interpreter into a query compiler with very low effort. To do so, we capitalize on an old and widely known result from partial evaluation, the Futamura projections, which state that a program that can specialize an interpreter to any given input program is equivalent to a compiler. In this pearl, we discuss LMS programming patterns such as mixed-stage data structures (e.g., data records with static schema and dynamic field components) and techniques to generate low-level C code, including specialized data structures and data loading primitives.
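The interpreter-to-compiler idea can be illustrated on a toy example. The Python sketch below is neither LMS nor the paper's code. It pairs a hypothetical relational-algebra interpreter (`interpret`) with a hand-written generator (`gen`, `compile_plan`) that walks the same plan but emits source text specialized to it, so operator dispatch happens once, at generation time, rather than on every run. The plan encoding (nested tuples with `scan`, `filter` and `project`) is an assumption made for illustration. The paper instead derives the compiler from the interpreter itself using LMS.

```python
# Toy plans (assumed encoding): ("scan", table) | ("filter", field, value, child)
#                               | ("project", fields, child)

def interpret(plan, db):
    """Interpreter: dispatches on the plan's operators every time it runs."""
    op = plan[0]
    if op == "scan":
        return list(db[plan[1]])
    if op == "filter":
        _, field, value, child = plan
        return [row for row in interpret(child, db) if row[field] == value]
    if op == "project":
        _, fields, child = plan
        return [{f: row[f] for f in fields} for row in interpret(child, db)]
    raise ValueError(f"unknown operator {op!r}")


def gen(plan, consume, indent):
    """Generator: emits source lines for `plan`; `consume(row_var, indent)`
    emits the code that handles each row the operator produces."""
    op = plan[0]
    if op == "scan":
        return [f"{indent}for row in db[{plan[1]!r}]:"] + consume("row", indent + "    ")
    if op == "filter":
        _, field, value, child = plan

        def consume_filtered(row, ind):
            return [f"{ind}if {row}[{field!r}] == {value!r}:"] + consume(row, ind + "    ")

        return gen(child, consume_filtered, indent)
    if op == "project":
        _, fields, child = plan

        def consume_projected(row, ind):
            items = ", ".join(f"{f!r}: {row}[{f!r}]" for f in fields)
            return [f"{ind}projected = {{{items}}}"] + consume("projected", ind)

        return gen(child, consume_projected, indent)
    raise ValueError(f"unknown operator {op!r}")


def compile_plan(plan):
    """Specialize to one plan: the operator dispatch happens here, not per row."""
    body = gen(plan, lambda row, ind: [f"{ind}out.append({row})"], "    ")
    source = "\n".join(["def query(db):", "    out = []", *body, "    return out"])
    namespace = {}
    exec(source, namespace)
    return namespace["query"], source


db = {"emp": [{"name": "ann", "dept": "eng"}, {"name": "bob", "dept": "ops"}]}
plan = ("project", ["name"], ("filter", "dept", "eng", ("scan", "emp")))
query, src = compile_plan(plan)
print(src)
```

For the sample plan, the generated `query` function is a single loop with the filter and projection inlined. It returns the same rows as `interpret`, without re-examining the plan at run time.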

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Apollo (Cambridge)


A SQL to C compiler in 500 lines of code

Author: Amin Nada
Rompf Tiark
Publication venue: Journal of Functional Programming
Publication date: 01/01/2019

We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code – a convincing case study that high-level functional programming can handily beat C for systems-level programming where the last drop of performance matters. The key enabler is a shift in perspective toward generative programming. The core of the query engine is an interpreter for relational-algebra operations, written in Scala. Using the open-source lightweight modular staging framework, we turn this interpreter into a query compiler with very low effort. To do so, we capitalize on an old and widely known result from partial evaluation: the first Futamura projection, which states that a process that can specialize an interpreter to any given input program is equivalent to a compiler. In this context, we discuss lightweight modular staging programming patterns such as mixed-stage data structures (e.g., data records with static schema and dynamic field components) and techniques to generate low-level C code, including specialized data structures and data loading primitives.
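A mixed-stage record can be approximated in plain Python. In this sketch the schema is an ordinary Python value that is available while code is generated, and each field value is a string of C code that is evaluated only when the generated program runs. `StagedRecord`, `scan_columns` and `gen_sum_where` are hypothetical names, not part of LMS. The columnar layout, with one C array per field named `<table>_<field>` and a row count such as `orders_n` declared elsewhere, is also an assumption. The output is a C fragment, not a complete program.

```python
class StagedRecord:
    """Mixed-stage record: the schema (field names) is static, known while
    generating code; each field value is dynamic, a C expression that is only
    evaluated when the generated program runs."""

    def __init__(self, fields):
        self.fields = dict(fields)  # static field name -> dynamic C expression

    def __getitem__(self, name):
        # Field lookup happens at generation time, so neither field names nor
        # record objects survive into the generated C code. An unknown field
        # raises KeyError here, before any C code exists.
        return self.fields[name]


def scan_columns(table, schema, index_var="i"):
    # Hypothetical columnar layout: one C array per field, named <table>_<field>.
    return StagedRecord((field, f"{table}_{field}[{index_var}]") for field in schema)


def gen_sum_where(table, schema, n_rows_var, filter_field, threshold, sum_field):
    """Emit a C fragment computing SUM(sum_field) WHERE filter_field > threshold."""
    rec = scan_columns(table, schema)
    return "\n".join([
        "double total = 0.0;",
        f"for (long i = 0; i < {n_rows_var}; i++) {{",
        f"    if ({rec[filter_field]} > {threshold}) total += {rec[sum_field]};",
        "}",
    ])


print(gen_sum_where("orders", ["id", "qty", "price"], "orders_n", "qty", 10, "price"))
```

The record exists only while code is being generated. The emitted loop touches just the two columns the query uses, and the unused `id` column never appears in it.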

Apollo (Cambridge)

Making an Embedded DBMS JIT-friendly

Author: Bolz Carl Friedrich
Kurilova Darya
Tratt Laurence
Publication date: 01/01/2016

While database management systems (DBMSs) are highly optimized, interactions across the boundary between the programming language (PL) and the DBMS are costly, even for in-process embedded DBMSs. In this paper, we show that programs that interact with the popular embedded DBMS SQLite can be significantly optimized (by a factor of 3.4 in our benchmarks) by inlining across the PL/DBMS boundary. We achieved this speed-up by replacing parts of SQLite's C interpreter with RPython code and composing the resulting meta-tracing virtual machine (VM), called SQPyte, with the PyPy VM. SQPyte does not compromise stand-alone SQL performance and is 2.2% faster than SQLite on the widely used TPC-H benchmark suite.

Comment: 24 pages, 18 figures.
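The PL/DBMS boundary can be observed with CPython's standard `sqlite3` module. A Python function registered with `Connection.create_function` is called back from SQLite's C engine once per row, so each row crosses the boundary. This sketch shows only those crossings. It does not reproduce SQPyte, whose point is that composing an RPython version of SQLite's interpreter with PyPy lets the tracing JIT inline across such calls. The table, the `discount` function and the data are hypothetical.

```python
import sqlite3

calls = 0


def discount(price):
    """Python UDF: SQLite's C engine calls back into Python once per row."""
    global calls
    calls += 1
    return price * 0.9


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (price REAL)")
conn.executemany("INSERT INTO items VALUES (?)", [(float(p),) for p in range(1, 1001)])
conn.create_function("discount", 1, discount)

(total,) = conn.execute("SELECT SUM(discount(price)) FROM items").fetchone()
print(f"{calls} PL/DBMS boundary crossings, total = {total:.2f}")
conn.close()
```

Each of the 1000 rows triggers one call from C into the Python interpreter. A conventional setup cannot optimize across these calls as a single unit.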

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

King's Research Portal

Deductive Optimization of Relational Data Storage

Author: Feser John K.
Madden Samuel
Solar-Lezama Armando
Tang Nan
Publication date: 05/02/2020

Optimizing the physical storage of data and its retrieval are two key problems in database management. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based layouts widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation that operates on a specialized layout of the dataset. We build a compiler for this language and conduct experiments using a popular database benchmark; they show that the performance of these specialized queries is competitive with a state-of-the-art in-memory compiled database system.
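The dependence of a query's implementation on the physical layout can be shown with a small Python sketch. This is not the paper's layout language or its deductive synthesis procedure. It implements one hypothetical query, "total price for a region", against a row layout, a column layout, and a layout specialized to that query, in which prices are pre-grouped by region. In the specialized layout the filter becomes a single lookup instead of a scan. The data and function names are assumptions.

```python
from collections import defaultdict

# Hypothetical relation: (region, price) records.
rows = [("EU", 10.0), ("US", 5.0), ("EU", 2.5), ("APAC", 7.0)]


# Row layout: one tuple per record; the query scans every record.
def total_row_layout(rows, region):
    return sum(price for r, price in rows if r == region)


# Column layout: one list per attribute; the query zips the two columns it needs.
columns = {"region": [r for r, _ in rows], "price": [p for _, p in rows]}


def total_column_layout(cols, region):
    return sum(p for r, p in zip(cols["region"], cols["price"]) if r == region)


# Layout specialized to this query: prices pre-grouped by region,
# so the filter becomes a single dictionary lookup instead of a scan.
prices_by_region = defaultdict(list)
for r, p in rows:
    prices_by_region[r].append(p)


def total_grouped_layout(groups, region):
    return sum(groups.get(region, []))
```

All three functions compute the same answer. Only the specialized layout avoids touching records from other regions, which is the kind of trade-off a layout language makes explicit.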

arXiv.org e-Print Archive

DSpace@MIT


Collapsing towers of interpreters

Author: Amin Nada
Rompf Tiark
Publication venue: Proceedings of the ACM on Programming Languages
Publication date: 27/12/2017

Given a tower of interpreters, i.e., a sequence of multiple interpreters interpreting one another as input programs, we aim to collapse this tower into a compiler that removes all interpretive overhead and runs in a single pass. In the real world, a use case might be Python code executed by an x86 runtime, on a CPU emulated in a JavaScript VM, running on an ARM CPU. Collapsing such a tower can not only improve runtime performance exponentially, but also enable the use of base-language tools for interpreted programs, e.g., for analysis and verification. In this paper, we lay the foundations in an idealized but realistic setting. We present a multi-level lambda calculus that features staging constructs and stage polymorphism: based on runtime parameters, an evaluator either executes source code (thereby acting as an interpreter) or generates code (thereby acting as a compiler). We identify stage polymorphism, a programming model from the domain of high-performance program generators, as the key mechanism to make such interpreters compose in a collapsible way. We present Pink, a meta-circular Lisp-like evaluator on top of this calculus, and demonstrate that we can collapse arbitrarily many levels of self-interpretation, including levels with semantic modifications. We discuss several examples: compiling regular expressions through an interpreter to base code, building program transformers from modified interpreters, and others. We develop these ideas further to include reflection and reification, culminating in Purple, a reflective language inspired by Brown, Blond, and Black, which realizes a conceptually infinite tower, where every aspect of the semantics can change dynamically. Addressing an open challenge, we show how user programs can be compiled and recompiled under user-modified semantics.

Parts of this research were supported by ERC grant 321217, NSF awards 1553471 and 1564207, and DOE award DE-SC0018050.
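Stage polymorphism, in its simplest single-level form, means that one evaluator is parameterized by a stage object, so the same code either computes values or emits code. The Python sketch below shows this on a tiny arithmetic language. `Direct`, `Codegen` and `evaluate` are hypothetical names, and the tuple encoding of expressions is an assumption. The sketch does not model the paper's multi-level lambda calculus, Pink, or the collapse of a multi-level tower.

```python
class Direct:
    """Evaluation stage: operations compute values immediately."""

    def lit(self, n):
        return n

    def add(self, a, b):
        return a + b

    def mul(self, a, b):
        return a * b


class Codegen:
    """Generation stage: the same operations build source text instead."""

    def lit(self, n):
        return repr(n)

    def add(self, a, b):
        return f"({a} + {b})"

    def mul(self, a, b):
        return f"({a} * {b})"


def evaluate(expr, env, stage):
    """One evaluator for both stages; `stage` decides whether to compute or generate.
    expr: ("lit", n) | ("var", name) | ("add", e1, e2) | ("mul", e1, e2)"""
    tag = expr[0]
    if tag == "lit":
        return stage.lit(expr[1])
    if tag == "var":
        return env[expr[1]]  # a value (Direct) or a code fragment (Codegen)
    if tag in ("add", "mul"):
        left = evaluate(expr[1], env, stage)
        right = evaluate(expr[2], env, stage)
        return stage.add(left, right) if tag == "add" else stage.mul(left, right)
    raise ValueError(f"unknown expression {tag!r}")


expr = ("add", ("mul", ("var", "x"), ("lit", 3)), ("lit", 1))
value = evaluate(expr, {"x": 4}, Direct())   # interpret directly
code = evaluate(expr, {"x": "x"}, Codegen())  # generate code instead
compiled = eval(f"lambda x: {code}")
```

The evaluator is written once. Passing `Codegen` turns it into a compiler whose output, `((x * 3) + 1)`, no longer contains any dispatch on expression tags.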

Apollo (Cambridge)

Dependent Object Types

Author: Amin Nada
Publication venue: Lausanne, EPFL
Publication date: 05/12/2016

A scalable programming language is one in which the same concepts can describe small as well as large parts. Towards this goal, Scala unifies concepts from object and module systems. In particular, objects can contain type members, which can be selected as types, called path-dependent types. Focusing on path-dependent types, we develop a type-theoretic foundation for Scala: the calculus of Dependent Object Types (DOT). We derive DOT from System F<: in a sequence of steps: (1) we add a lower bound to each type variable, in addition to its usual upper bound; (2) in System D, we turn each type variable into a regular term variable containing a type; (3) for a full subtyping lattice, we add intersection and union types; (4) for objects, we consolidate all values into records; (5) for objects that close over a self, we introduce a recursive type, binding a self term variable; (6) for recursive types, we first extend the theory in typing and then also in subtyping. Through this bottom-up exploration, we discover a sound, uniform, yet powerful design for DOT. We devise strategies and techniques for proving soundness that scale through this iterative step-by-step process: (1) "pushback" of subtyping transitivity or subsumption, to concisely capture inversion of subtyping or typing; (2) a distinction between concrete and abstract context variables, to resolve the tension between preservation of types and preservation of type abstractions; and (3), specifically for big-step semantics, a type that closes over an environment, to relate context-dependent types across closures. Although we ultimately developed sound models of DOT in both big-step and small-step operational semantics, the shift to big-step semantics was historically helpful in focusing the requirements. In particular, by developing a novel big-step soundness proof for System F<:, calculi like System D<: emerge as straightforward generalizations, almost like removing artificial restrictions.
Interesting in their own right, our type soundness techniques for definitional interpreters extend to mutable references without the use of co-induction. The DOT calculus finally grounds languages like Scala in firm theory. It helps in finding bugs in Scala and in better understanding feature interactions and requirements, and it serves as a good basis for future work that studies extensions or encodings on top of the core, bridging the gap from DOT to Dotty / Scala.
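The role of the added lower bound can be made concrete with the two type-selection subtyping rules found in the usual presentations of DOT. Notation and rule names vary between papers, and the thesis's formulation may differ in detail. If a path x has a type member A with lower bound S and upper bound U, then the path-dependent type x.A lies between the two bounds:

$$
\frac{\Gamma \vdash x : \{A : S\,..\,U\}}{\Gamma \vdash S <: x.A}
\qquad\qquad
\frac{\Gamma \vdash x : \{A : S\,..\,U\}}{\Gamma \vdash x.A <: U}
$$

Choosing S = U = T makes x.A an alias that is both a subtype and a supertype of T. With the bounds ⊥ and ⊤, x.A remains fully abstract.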

Infoscience - École polytechnique fédérale de Lausanne

Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation

Author: Giarrusso Paolo Giosuè
Publication venue: Universität Tübingen
Publication date: 14/02/2020

In modern programming languages, queries on in-memory collections are often more expensive than needed. While database queries can be readily optimized, it is often not trivial to use them to express collection queries that employ nested data and first-class functions, as enabled by functional programming languages. Collection queries can be optimized and incrementalized by hand, but this reduces modularity and is often too error-prone to be feasible or to keep the resulting programs maintainable.
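One remedy, reifying a collection query as an AST and rewriting it to use a programmer-chosen index, can be sketched in a few lines of Python. The node classes `Source`, `FilterEq` and `IndexLookup`, the `optimize` and `run` functions, and the data are all hypothetical and are not the thesis's staged Scala collection API. Only one rewrite is shown: an equality filter directly over an indexed collection becomes a dictionary lookup.

```python
from dataclasses import dataclass

# Hypothetical query AST nodes for illustration; not the thesis's API.


@dataclass(frozen=True)
class Source:
    name: str


@dataclass(frozen=True)
class FilterEq:
    key: str      # record field compared for equality
    value: object
    child: object


@dataclass(frozen=True)
class IndexLookup:
    index: str
    value: object


def optimize(query, indexes):
    """Rewrite FilterEq(key, v, Source(s)) into an IndexLookup when the
    programmer registered an index on (s, key); otherwise leave it unchanged."""
    if isinstance(query, FilterEq) and isinstance(query.child, Source):
        index_name = indexes.get((query.child.name, query.key))
        if index_name is not None:
            return IndexLookup(index_name, query.value)
    return query


def run(query, env):
    """Execute a (possibly rewritten) query against named collections and indexes."""
    if isinstance(query, Source):
        return env[query.name]
    if isinstance(query, FilterEq):
        return [r for r in run(query.child, env) if r[query.key] == query.value]
    if isinstance(query, IndexLookup):
        return env[query.index].get(query.value, [])
    raise TypeError(f"unknown query node {query!r}")


orders = [{"user": 1, "total": 20.0}, {"user": 2, "total": 5.0}, {"user": 1, "total": 7.5}]
orders_by_user = {}
for order in orders:
    orders_by_user.setdefault(order["user"], []).append(order)

env = {"orders": orders, "orders_by_user": orders_by_user}
query = FilterEq("user", 1, Source("orders"))
fast = optimize(query, {("orders", "user"): "orders_by_user"})
```

Because the query is available as a data structure rather than an opaque closure, the optimizer can inspect it and replace a linear scan with a lookup while leaving the result unchanged.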
To free programmers from such burdens, in this thesis we study how to optimize and incrementalize such collection queries. Resulting programs are expressed in the same core language, so that they can be subjected to other standard optimizations. To enable optimizing collection queries which occur inside programs, we develop a staged variant of the Scala collection API that reifies queries as ASTs. On top of this interface, we adapt domain-specific optimizations from the fields of programming languages and databases; among others, we rewrite queries to use indexes chosen by programmers. Thanks to the use of indexes, we show significant speedups in our experimental evaluation, with an average of 12x and a maximum of 12800x. To incrementalize higher-order programs by program transformation, we extend finite differencing [Paige and Koenig, 1982; Blakeley et al., 1986; Gupta and Mumick, 1999] and develop the first approach to incrementalization by program transformation for higher-order programs. Base programs are transformed to derivatives, programs that transform input changes to output changes. We prove that our incrementalization approach is correct: we develop the theory underlying incrementalization for simply-typed and untyped λ-calculus, and discuss extensions to System F. Derivatives often need to reuse results produced by base programs: to enable such reuse, we extend work by Liu and Teitelbaum [1995] to higher-order programs, and develop and prove correct a program transformation converting higher-order programs to cache-transfer style. For efficient incrementalization, it is necessary to choose appropriate primitive operations and incrementalize them by hand. We incrementalize a significant subset of collection operations and perform case studies, showing order-of-magnitude speedups both in practice and in asymptotic complexity.
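The idea of a derivative, a program that maps an input change to an output change, can be shown on a small case. The Python sketch below is hand-written and is not produced by the thesis's transformation. Bags are `collections.Counter` objects, and a change is an `(inserted, removed)` pair; this encoding is an assumption made here. The function argument of the higher-order base program `map_sum` is held fixed, whereas the thesis's approach also covers changes to function arguments. The final check is the usual correctness condition for a derivative: applying the output change to the old output yields the new output.

```python
from collections import Counter


def apply_change(bag, change):
    """Apply a bag change (inserted, removed); assumes `removed` is contained in `bag`."""
    inserted, removed = change
    result = Counter(bag)
    result.update(inserted)
    result.subtract(removed)
    return +result  # unary plus drops elements whose count fell to zero


def map_sum(f, bag):
    """Base program: sum of f(x) over a bag, counting multiplicities."""
    return sum(f(x) * n for x, n in bag.items())


def d_map_sum(f, bag, change):
    """Hand-written derivative of map_sum for an unchanged f. It receives the
    base input like any derivative, but its cost depends only on the change."""
    inserted, removed = change
    return map_sum(f, inserted) - map_sum(f, removed)


def square(x):
    return x * x


bag = Counter({1: 2, 3: 1, 5: 1})
change = (Counter({2: 1}), Counter({1: 1}))  # insert one 2, remove one 1
old_result = map_sum(square, bag)
new_result = map_sum(square, apply_change(bag, change))
```

Here the derivative inspects two elements instead of re-traversing the whole bag. This asymptotic gap is what incrementalization aims to exploit.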

Publikationsserver der Universität Tübingen