Object Graph Programming
We introduce Object Graph Programming (OGO), which enables reading and
modifying an object graph (i.e., the entire state of the object heap) via
declarative queries. OGO models the objects and their relations in the heap as
an object graph, thereby treating the heap as a graph database: each node in the
graph is an object (e.g., an instance of a class or an instance of a metadata
class) and each edge is a relation between objects (e.g., a field of one object
references another object). We leverage Cypher, the most popular query language
for graph databases, as OGO's query language. Unlike LINQ, which uses
collections (e.g., List) as a source of data, OGO views the entire object graph
as a single "collection". OGO is ideal for querying collections (just like
LINQ), introspecting the runtime system state (e.g., finding all instances of a
given class or accessing fields via reflection), and writing assertions that
have access to the entire program state. We prototyped OGO for Java in two
ways: (a) by translating an object graph into a Neo4j database on which we run
Cypher queries, and (b) by implementing our own in-memory graph query engine
that directly queries the object heap. We used OGO to rewrite hundreds of
statements in large open-source projects into OGO queries. We report our
experience and the performance of our prototypes.
Comment: 13 pages, ICSE 202
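OGO itself works on the Java heap and uses Cypher (or its own in-memory engine) as the query language. The sketch below is only a small Python analogue of the idea, written for illustration. It turns objects into nodes and field references into labelled edges, then answers two kinds of query the abstract mentions: "all instances of a class" and a pattern roughly like the Cypher query `MATCH (c:Customer)-[:orders]->(o:Order) WHERE o.total > 100 RETURN c, o`. The classes `Customer` and `Order` and the helpers `build_object_graph`, `instances_of` and `match` are hypothetical. Also, unlike OGO, this sketch walks only the objects reachable from some given roots rather than the entire heap.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    total: int


@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)


def build_object_graph(roots):
    """Walk every object reachable from `roots` and return (nodes, edges).

    Nodes are user-defined objects keyed by id(); an edge (src, field_name, dst)
    records that a field of `src` (or an element of a list stored in that field)
    references `dst`. Primitive values such as int and str stay as node properties.
    """
    nodes, edges = {}, []
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in nodes:  # already visited
            continue
        nodes[id(obj)] = obj
        for name, value in vars(obj).items():
            targets = value if isinstance(value, list) else [value]
            for target in targets:
                if hasattr(target, "__dict__"):  # a user-defined object, not a primitive
                    edges.append((obj, name, target))
                    stack.append(target)
    return nodes, edges


def instances_of(nodes, cls):
    """All nodes whose class is exactly `cls` (like MATCH (n:Cls) RETURN n)."""
    return [obj for obj in nodes.values() if type(obj) is cls]


def match(edges, src_cls, rel, dst_cls, where=lambda s, d: True):
    """Pattern (s:src_cls)-[:rel]->(d:dst_cls) WHERE where(s, d)."""
    return [(s, d) for (s, r, d) in edges
            if type(s) is src_cls and r == rel and type(d) is dst_cls and where(s, d)]


# Usage: two customers with three orders between them.
alice = Customer("alice", [Order(120), Order(30)])
bob = Customer("bob", [Order(80)])
nodes, edges = build_object_graph([alice, bob])
big = match(edges, Customer, "orders", Order, where=lambda c, o: o.total > 100)
print([(c.name, o.total) for c, o in big])  # [('alice', 120)]
print(len(instances_of(nodes, Order)))      # 3
```

Keeping the graph as explicit (source, field, target) triples is what lets each query be a declarative filter over edges instead of hand-written traversal code.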
Comprehending Ringads for Phil Wadler, on the occasion of his 60th birthday
List comprehensions are a widely used programming construct, in languages such as Haskell and Python and in technologies such as Microsoft's Language Integrated Query. They generalize from lists to arbitrary monads, yielding a lightweight idiom of imperative programming in a pure functional language. When the monad has the additional structure of a so-called ringad, corresponding to 'empty' and 'union' operations, it can be seen as some kind of collection type, and the comprehension notation can also be extended to incorporate aggregations. Ringad comprehensions represent a convenient notation for expressing database queries. The ringad structure alone provides neither a good explanation nor an efficient implementation of relational joins; but by allowing heterogeneous comprehensions, involving both bag and indexed table ringads, we show how to accommodate these too.
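To make the ringad structure concrete, here is a minimal Python sketch that models bags as `collections.Counter` values. The function names `empty`, `single`, `union`, `bind` and `total` are our own, not the paper's notation. A comprehension such as `[(x, y) | x <- xs, y <- ys, x < y]` is desugared into nested `bind` calls with `single` and `empty`, and an aggregation (`total`, here a sum) is a function that sends `empty` to 0 and `union` to `+`.

```python
from collections import Counter

# A bag (multiset) is a Counter mapping each element to its multiplicity.

def empty():
    return Counter()

def single(x):
    # Unit of the monad: the bag containing just x.
    return Counter({x: 1})

def union(a, b):
    # Bag union: associative, with empty() as its identity.
    return a + b

def bind(bag, f):
    # Apply f to every element and union the resulting bags,
    # multiplying multiplicities along the way.
    out = empty()
    for x, n in bag.items():
        for y, m in f(x).items():
            out[y] += n * m
    return out

def total(bag):
    # Aggregation: total(empty()) == 0, total(union(a, b)) == total(a) + total(b).
    return sum(x * n for x, n in bag.items())


xs = Counter([1, 2, 2])
ys = Counter([2, 3])

# Desugared form of the comprehension [(x, y) | x <- xs, y <- ys, x < y].
pairs = bind(xs, lambda x: bind(ys, lambda y: single((x, y)) if x < y else empty()))
print(pairs)  # Counter({(2, 3): 2, (1, 2): 1, (1, 3): 1})

# Aggregate over the comprehension: the sum of x + y over all pairs.
print(total(bind(pairs, lambda p: single(p[0] + p[1]))))  # 17
```

The result agrees with Python's own generator-expression syntax over the expanded bags, `Counter((x, y) for x in xs.elements() for y in ys.elements() if x < y)`, which is the sense in which comprehension notation carries over from lists to bags.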
Relational Algebra by Way of Adjunctions
Bulk types such as sets, bags, and lists are monads, and therefore support a notation for database queries based on comprehensions. This fact is the basis of much work on database query languages. The monadic structure easily explains most of standard relational algebra (specifically, selections and projections), allowing for an elegant mathematical foundation for those aspects of database query language design. Most, but not all: monads do not immediately offer an explanation of relational join or grouping, and hence important foundations for those crucial aspects of relational algebra are missing. The best they can offer is Cartesian product followed by selection. Adjunctions come to the rescue: like any monad, bulk types also arise from certain adjunctions; we show that by paying due attention to other important adjunctions, we can elegantly explain the rest of standard relational algebra. In particular, graded monads provide a mathematical foundation for indexing and grouping, which leads directly to an efficient implementation, even of joins.
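The difference between "Cartesian product followed by selection" and an indexed join can be illustrated operationally. This Python sketch is not the paper's graded-monad construction; it assumes rows are plain tuples and that indexing means grouping rows into a dictionary keyed by the join attribute, so the join only pairs rows within the same group instead of comparing every pair. The helper names and the sample tables are hypothetical.

```python
from collections import defaultdict

def product_join(rs, ss, key_r, key_s):
    # The join a plain monad gives you: Cartesian product, then selection.
    return [(r, s) for r in rs for s in ss if key_r(r) == key_s(s)]

def index(rows, key):
    # Grouping: a finite map from each key to the list of rows having that key.
    table = defaultdict(list)
    for row in rows:
        table[key(row)].append(row)
    return table

def indexed_join(rs, ss, key_r, key_s):
    # Index both sides, then pair up only rows that share a key.
    by_r, by_s = index(rs, key_r), index(ss, key_s)
    return [(r, s)
            for k in by_r.keys() & by_s.keys()
            for r in by_r[k]
            for s in by_s[k]]


def customer_id(c):
    return c[0]

def order_customer(o):
    return o[1]

customers = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 1, 120), (102, 1, 30), (103, 2, 80)]

slow = product_join(customers, orders, customer_id, order_customer)
fast = indexed_join(customers, orders, customer_id, order_customer)
print(sorted(slow) == sorted(fast))  # True
```

Both functions return the same pairs, possibly in a different order. The product version always inspects `len(rs) * len(ss)` candidate pairs, whereas the indexed version's work is roughly proportional to the input sizes plus the number of matches, assuming constant-time hashing.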
Language-integrated provenance
Provenance is metadata about the where, the why, and the how of data. It is
evidence which can answer questions such as: Where exactly did this piece of
data come from? Why is this row in my result? How was it produced? Answers
to these questions are useful for judging the trustworthiness of data, and for
finding and correcting mistakes.
Most programs that use a database at all already use one crude form of
provenance: they manually propagate row identifiers together with database
values, just in case they need to be updated later. More sophisticated forms
of provenance are exceedingly rare, because they are more difficult to implement
manually. Tools to calculate data provenance systematically only exist
as research prototypes. Even standard database systems are hard to set up, as
evidenced by the rise of hosted database services, so it is little surprise that
prototypes of provenance systems are not used much.
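The "crude form of provenance" mentioned above looks roughly like this in practice. The sketch uses Python's built-in `sqlite3` module with a hypothetical `people` table: the program fetches the row identifier alongside the values it actually needs, solely so that it can update that row later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany("INSERT INTO people (name, phone) VALUES (?, ?)",
                 [("alice", "555-0100"), ("bob", "555-0200")])

# The program only needs name and phone, but it also fetches the row id ...
row_id, name, phone = conn.execute(
    "SELECT id, name, phone FROM people WHERE name = ?", ("alice",)).fetchone()

# ... "just in case": later it can write a correction back to exactly that row.
conn.execute("UPDATE people SET phone = ? WHERE id = ?", ("555-0101", row_id))
conn.commit()
```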
This dissertation shows how a programming language can provide support
for provenance. Based on language-integrated query technology, it can systematically
rewrite queries to produce various forms of provenance. We describe
such query transformations for where-provenance and lineage, and discuss
how to enable programmers to define their own forms of provenance. Thanks
to query normalization, the resulting queries still execute efficiently on mainstream
database systems. A programming language can help further by giving
provenance metadata precise types to ensure that it is handled appropriately.
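As a rough illustration of what rewriting a query to produce provenance means, the sketch below writes one query three ways by hand: as a plain query; as a lineage query whose results carry the set of source rows they depend on; and as a where-provenance query whose copied values remember the table, column and row they came from. The tables, the row-id layout and the `Where` type are hypothetical. The dissertation derives such rewrites systematically, whereas here they are written out manually.

```python
from dataclasses import dataclass

# Hypothetical source tables: each row is (row_id, {column: value}).
AGENCIES = [(1, {"name": "Alpha", "city": "Edinburgh"}),
            (2, {"name": "Beta", "city": "Glasgow"})]
TOURS = [(10, {"agency": "Alpha", "dest": "Skye"}),
         (11, {"agency": "Beta", "dest": "Arran"}),
         (12, {"agency": "Alpha", "dest": "Iona"})]


@dataclass(frozen=True)
class Where:
    """Where-provenance of one value: the table, column and row it was copied from."""
    value: object
    table: str
    column: str
    row: int


def query(agencies, tours):
    # Plain query: destinations offered by agencies based in Edinburgh.
    return [t["dest"]
            for _, a in agencies if a["city"] == "Edinburgh"
            for _, t in tours if t["agency"] == a["name"]]


def query_lineage(agencies, tours):
    # Lineage: each result is paired with the ids of every row it was computed from.
    return [(t["dest"], {("agencies", ai), ("tours", ti)})
            for ai, a in agencies if a["city"] == "Edinburgh"
            for ti, t in tours if t["agency"] == a["name"]]


def query_where(agencies, tours):
    # Where-provenance: each copied value remembers its source cell.
    return [Where(t["dest"], "tours", "dest", ti)
            for _, a in agencies if a["city"] == "Edinburgh"
            for ti, t in tours if t["agency"] == a["name"]]


print(query(AGENCIES, TOURS))        # ['Skye', 'Iona']
print(query_where(AGENCIES, TOURS)[0])
```

Dropping the annotations from `query_lineage` gives back exactly the result of `query`, which is the property one would expect such a rewrite to preserve. Wrapping where-provenance in its own `Where` type rather than a bare tuple echoes the point about giving provenance precise types, although Python checks such annotations only with an external type checker, not at run time.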
Language-integrated queries make it easy to write programs that deal with
data, with no special query language needed. Language-integrated provenance
makes it just as easy to deal with data provenance, with no special database needed.