46 research outputs found
Language-Integrated Updatable Views
Relational lenses are a modern approach to the view update problem in relational databases. As introduced by Bohannon et al. [5], relational lenses allow the definition of updatable views by the composition of lenses performing individual transformations. Horn et al. [20] provided the first implementation of incremental relational lenses, which demonstrated that relational lenses can be implemented efficiently by propagating changes to the database rather than replacing the entire database state.
However, neither approach proposes a concrete language design; consequently, it is unclear how to integrate lenses into a general-purpose programming language, or how to check that lenses satisfy the well-formedness conditions needed for predictable behaviour. In this paper, we propose the first full account of relational lenses in a functional programming language, by extending the Links web programming language. We provide support for higher-order predicates, and provide the first account of typechecking relational lenses which is amenable to implementation. We prove the soundness of our typing rules, and illustrate our approach by implementing a curation interface for a scientific database application
Structural abstraction: a mechanism for modular program construction
Abstraction mechanisms in programming languages aim to allow orthogonal pieces of functionality to be developed separately; complex software can then be constructed through the composition of these pieces. The effectiveness of such mechanisms lies in their support for modularity and reusability: The behavior of a piece of code should be reasoned about modularly---independently of the specific compositions it may participate in; the computation of a piece of code should allow specialization, so that it is reusable for different compositions. This dissertation introduces structural abstraction: a mechanism that advances the state of the art by allowing the writing of highly reusable code---code whose structure can be specialized per composition, while maintaining a high level of modularity.
Structural abstraction provides a disciplined way for code to inspect the structure of its clients in composition, and declare its own structure accordingly. The hallmark feature of structural abstraction is that, despite its emphasis on greater reusability, it still allows modular type checking: A piece of structurally abstract code can be type-checked independently of its uses in compositions---an invaluable feature for highly reusable components that will be statically composed by other programmers.
This dissertation introduces two structural abstraction techniques: static type conditions, and morphing. Static type conditions allow code to be conditionally declared based on subtyping constraints. A client of a piece of code can configure a desirable set of features by composing the code with types that satisfy the appropriate subtyping conditions. Morphing allows code to be iteratively declared, by statically reflecting over the structural members of code that it would be composed with. A morphing piece of code can mimic the structure of its clients in composition, or change its shape according to its clients in a pattern-based manner. Using either static type conditions or morphing, the structure of a piece of code is not statically determined, but can be automatically specialized by clients. Static type conditions and morphing both guarantee the modular type-safety of code: regardless of specific client configurations, code is guaranteed to be well-typed.Ph.D.Committee Chair: Yannis Smaragdakis; Committee Member: Oege de Moor; Committee Member: Richard LeBlanc; Committee Member: Santosh Pande; Committee Member: Spencer Rugabe
PG-Schema: Schemas for Property Graphs
Property graphs have reached a high level of maturity, witnessed by multiple
robust graph database systems as well as the ongoing ISO standardization effort
aiming at creating a new standard Graph Query Language (GQL). Yet, despite
documented demand, schema support is limited both in existing systems and in
the first version of the GQL Standard. It is anticipated that the second
version of the GQL Standard will include a rich DDL. Aiming to inspire the
development of GQL and enhance the capabilities of graph database systems, we
propose PG-Schema, a simple yet powerful formalism for specifying property
graph schemas. It features PG-Types with flexible type definitions supporting
multi-inheritance, as well as expressive constraints based on the recently
proposed PG-Keys formalism. We provide the formal syntax and semantics of
PG-Schema, which meet principled design requirements grounded in contemporary
property graph management scenarios, and offer a detailed comparison of its
features with those of existing schema languages and graph database systems.Comment: 25 page
Recommended from our members
Using Formal Methods to Verify Transactional Abstract Concurrency Control
Concurrent application design and implementation is more important than ever in today\u27s multi-core processor world. Transactional Memory (TM) Concurrent application design and implementation is more important than ever in today\u27s multi-core processor world. Transactional Memory (TM). Each has its own particular advantages and disadvantages. However, these techniques each need some extra information to `glue\u27 the non-transactional operation into a transactional context. At the most general level, non-transactional code must be decorated in such a way that the TM run-time can determine how those non-transactional operations commute with one another, and how to `undo\u27 the non-transactional operations in case the run-time needs to abort a software transaction. The TM run-time trusts that these programmer-provided annotations are correct. Therefore, if an implementor needs to employ one of these transactional `escape hatches\u27, it is crucially important that their concurrency control annotations be correct. However, reasoning about the commutativity of data structure operations is often challenging, and increasing the burden on the programmer with a proof requirement does not simplify the task of concurrent programming. There is a way to leverage the structure that these TM extensions require to reduce greatly the burden on the programmer. If the programmer could describe the abstract state of the data structure and then reason about it with as much machine assistance as possible, then there would be much less opportunity for error. Abstract state is preferable to a more concrete state, because it permits the programmer to use different concrete implementations of the same abstract data type. Also, some TM extensions such as open nesting can handle concrete state conflicts without programmer intervention (making the abstract state the appropriate state for reasoning about commutativity). A solution to the problem of specifying and verifying the concurrency properties of abstract data structures is the subject of this thesis. We will describe a new language, ACCLAM, for describing the abstract state of a data structure and reasoning about its concurrency control properties. This thesis also describes a tool that can process ACCLAM descriptions into a machine verifiable form (they are converted to a SAT problem). We will also provides a more detailed overview of transactional memory and the more popular extensions, a detailed semantic description of ACCLAM and a set of example data structure models and the results of processing those examples with the language processing tool
Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
In modernen, universellen Programmiersprachen sind Abfragen auf Speicher-basierten Kollektionen oft rechenintensiver als erforderlich. Während Datenbankenabfragen vergleichsweise einfach optimiert werden können, fällt dies bei Speicher-basierten Kollektionen oft schwer, denn universelle Programmiersprachen sind in aller Regel ausdrucksstärker als Datenbanken. Insbesondere unterstützen diese Sprachen meistens verschachtelte, rekursive Datentypen und Funktionen höherer Ordnung.
Kollektionsabfragen können per Hand optimiert und inkrementalisiert werden, jedoch verringert dies häufig die Modularität und ist oft zu fehleranfällig, um realisierbar zu sein oder um Instandhaltung von entstandene Programm zu gewährleisten. Die vorliegende Doktorarbeit demonstriert, wie Abfragen auf Kollektionen systematisch und automatisch optimiert und inkrementalisiert werden können, um Programmierer von dieser Last zu befreien. Die so erzeugten Programme werden in derselben Kernsprache ausgedrückt, um weitere Standardoptimierungen zu ermöglichen.
Teil I entwickelt eine Variante der Scala API für Kollektionen, die Staging verwendet um Abfragen als abstrakte Syntaxbäume zu reifizieren. Auf Basis dieser Schnittstelle werden anschließend domänenspezifische Optimierungen von Programmiersprachen und Datenbanken angewandt; unter anderem werden Abfragen umgeschrieben, um vom Programmierer ausgewählte Indizes zu benutzen. Dank dieser Indizes kann eine erhebliche Beschleunigung der Ausführungsgeschwindigkeit gezeigt werden; eine experimentelle Auswertung zeigt hierbei Beschleunigungen von durchschnittlich 12x bis zu einem Maximum von 12800x.
Um Programme mit Funktionen höherer Ordnung durch Programmtransformation zu inkrementalisieren, wird in Teil II eine Erweiterung der Finite-Differenzen-Methode vorgestellt [Paige and Koenig, 1982; Blakeley et al., 1986; Gupta and Mumick, 1999] und ein erster Ansatz zur Inkrementalisierung durch Programmtransformation für Programme mit Funktionen höherer Ordnung entwickelt. Dabei werden Programme zu Ableitungen transformiert, d.h. zu Programmen die Eingangsdifferenzen in Ausgangdifferenzen umwandeln. Weiterhin werden in den Kapiteln 12–13 die Korrektheit des Inkrementalisierungsansatzes für einfach-getypten und ungetypten λ-Kalkül bewiesen und Erweiterungen zu System F besprochen.
Ableitungen müssen oft Ergebnisse der ursprünglichen Programme wiederverwenden. Um eine solche Wiederverwendung zu ermöglichen, erweitert Kapitel 17 die Arbeit von Liu and Teitelbaum [1995] zu Programmen mit Funktionen höherer Ordnung und entwickeln eine Programmtransformation solcher Programme im Cache-Transfer-Stil.
Für eine effiziente Inkrementalisierung ist es weiterhin notwendig, passende Grundoperationen auszuwählen und manuell zu inkrementalisieren. Diese Arbeit deckt einen Großteil der wichtigsten Grundoperationen auf Kollektionen ab. Die Durchführung von Fallstudien zeigt deutliche Laufzeitverbesserungen sowohl in Praxis als auch in der asymptotischen Komplexität.In modern programming languages, queries on in-memory collections are often more expensive than needed. While database queries can be readily optimized, it is often not trivial to use them to express collection queries which employ nested data and first-class functions, as enabled by functional programming languages.
Collection queries can be optimized and incrementalized by hand, but this reduces modularity, and is often too error-prone to be feasible or to enable maintenance of resulting programs. To free programmers from such burdens, in this thesis we study how to optimize and incrementalize such collection queries. Resulting programs are expressed in the same core language, so that they can be subjected to other standard optimizations.
To enable optimizing collection queries which occur inside programs, we develop a staged variant of the Scala collection API that reifies queries as ASTs. On top of this interface, we adapt domain-specific optimizations from the fields of programming languages and databases; among others, we rewrite queries to use indexes chosen by programmers. Thanks to the use of indexes we show significant speedups in our experimental evaluation, with an average of 12x and a maximum of 12800x.
To incrementalize higher-order programs by program transformation, we extend finite differencing [Paige and Koenig, 1982; Blakeley et al., 1986; Gupta and Mumick, 1999] and develop the first approach to incrementalization by program transformation for higher-order programs. Base programs are transformed to derivatives, programs that transform input changes to output changes. We prove that our incrementalization approach is correct: We develop the theory underlying incrementalization for simply-typed and untyped λ-calculus, and discuss extensions to System F.
Derivatives often need to reuse results produced by base programs: to enable such reuse, we extend work by Liu and Teitelbaum [1995] to higher-order programs, and develop and prove correct a program transformation, converting higher-order programs to cache-transfer-style.
For efficient incrementalization, it is necessary to choose and incrementalize by hand appropriate primitive operations. We incrementalize a significant subset of collection operations and perform case studies, showing order-of-magnitude speedups both in practice and in asymptotic complexity
Compilation and Code Optimization for Data Analytics
The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance.
The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation.
As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive.
The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code
Compilation and Code Optimization for Data Analytics
The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance.
The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation.
As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive.
The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code