35 research outputs found
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the conver-gence of two trends in data management: in-memory database sys-tems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mismatch ’ problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also to use the same querying language to query an application’s in-memory collections. The lat-ter offers further transparency to developers as the query language and all data is represented in the data model of the host program-ming language. However, compared to IMDBs, this additional free-dom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying and we leverage query compilation to im-prove query processing on application objects. We explore dif-ferent query compilation strategies and study how they improve the performance of query processing over application data. We take C] as the host programming language as it supports language-integrated query through the LINQ framework. Our techniques de-liver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying. 1
Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI 2015)
The goal of the DSLDI workshop is to bring together researchers and
practitioners interested in sharing ideas on how DSLs should be designed,
implemented, supported by tools, and applied in realistic application contexts.
We are both interested in discovering how already known domains such as graph
processing or machine learning can be best supported by DSLs, but also in
exploring new domains that could be targeted by DSLs. More generally, we are
interested in building a community that can drive forward the development of
modern DSLs. These informal post-proceedings contain the submitted talk
abstracts to the 3rd DSLDI workshop (DSLDI'15), and a summary of the panel
discussion on Language Composition
InterPoll: Crowd-Sourced Internet Polls
Crowd-sourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little to help the survey-maker or pollster in obtaining statistically significant results devoid of even the obvious selection biases.
This paper proposes InterPoll, a platform for programming of crowd-sourced polls. Pollsters express polls as embedded LINQ queries and the runtime correctly reasons about uncertainty in those polls, only polling as many people as required to meet statistical guarantees. To optimize the cost of polls, InterPoll performs query optimization, as well as bias correction and power analysis. The goal of InterPoll is to provide a system that can be reliably used for research into marketing, social and political science questions.
This paper highlights some of the existing challenges and how InterPoll is designed to address most of them.
In this paper we summarize some of the work we have already done and give an outline for future work
Query Lifting: Language-integrated query for heterogeneous nested collections
Language-integrated query based on comprehension syntax is a powerful
technique for safe database programming, and provides a basis for advanced
techniques such as query shredding or query flattening that allow efficient
programming with complex nested collections. However, the foundations of these
techniques are lacking: although SQL, the most widely-used database query
language, supports heterogeneous queries that mix set and multiset semantics,
these important capabilities are not supported by known correctness results or
implementations that assume homogeneous collections. In this paper we study
language-integrated query for a heterogeneous query language
that combines set and multiset constructs. We show how
to normalize and translate queries to SQL, and develop a novel approach to
querying heterogeneous nested collections, based on the insight that ``local''
query subexpressions that calculate nested subcollections can be ``lifted'' to
the top level analogously to lambda-lifting for local function definitions.Comment: Full version of ESOP 2021 conference pape
Efficient query processing in managed runtimes
This thesis presents strategies to improve the query evaluation performance over
huge volumes of relational-like data that is stored in the memory space of managed
applications. Storing and processing application data in the memory space of managed
applications is motivated by the convergence of two recent trends in data management.
First, dropping DRAM prices have led to memory capacities that allow the entire working
set of an application to fit into main memory and to the emergence of in-memory
database systems (IMDBs). Second, language-integrated query transparently integrates
query processing syntax into programming languages and, therefore, allows complex
queries to be composed in the application. IMDBs typically serve as data stores to applications
written in an object-oriented language running on a managed runtime. In
this thesis, we propose a deeper integration of the two by storing all application data in
the memory space of the application and using language-integrated query, combined
with query compilation techniques, to provide fast query processing.
As a starting point, we look into storing data as runtime-managed objects in collection
types provided by the programming language. Queries are formulated using
language-integrated query and dynamically compiled to specialized functions that produce
the result of the query in a more efficient way by leveraging query compilation
techniques similar to those used in modern database systems. We show that the generated
query functions significantly improve query processing performance compared to
the default execution model for language-integrated query. However, we also identify
additional inefficiencies that can only be addressed by processing queries using low-level
techniques which cannot be applied to runtime-managed objects. To address this,
we introduce a staging phase in the generated code that makes query-relevant managed
data accessible to low-level query code. Our experiments in .NET show an improvement
in query evaluation performance of up to an order of magnitude over the default
language-integrated query implementation.
Motivated by additional inefficiencies caused by automatic garbage collection, we
introduce a new collection type, the black-box collection. Black-box collections integrate
the in-memory storage layer of a relational database system to store data and hide
the internal storage layout from the application by employing existing object-relational
mapping techniques (hence, the name black-box). Our experiments show that black-box
collections provide better query performance than runtime-managed collections
by allowing the generated query code to directly access the underlying relational in-memory
data store using low-level techniques. Black-box collections also outperform
a modern commercial database system. By removing huge volumes of collection data
from the managed heap, black-box collections further improve the overall performance
and response time of the application and improve the application’s scalability when
facing huge volumes of collection data.
To enable a deeper integration of the data store with the application, we introduce
self-managed collections. Self-managed collections are a new type of collection for
managed applications that, in contrast to black-box collections, store objects. As the
data elements stored in the collection are objects, they are directly accessible from the
application using references which allows for better integration of the data store with
the application. Self-managed collections manually manage the memory of objects
stored within them in a private heap that is excluded from garbage collection. We introduce
a special collection syntax and a novel type-safe manual memory management
system for this purpose. As was the case for black-box collections, self-managed collections
improve query performance by utilizing a database-inspired data layout and
allowing the use of low-level techniques. By also supporting references between collection
objects, they outperform black-box collections
Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI'15)
The goal of the DSLDI workshop is to bring together researchers and practitioners interested in sharing ideas on how DSLs should be designed, implemented, supported by tools, and applied in realistic application contexts. We are both interested in discovering how already known domains such as graph processing or machine learning can be best supported by DSLs, but also in exploring new domains that could be targeted by DSLs. More generally, we are interested in building a community that can drive forward the development of modern DSLs. These informal post-proceedings contain the submitted talk abstracts to the 3rd DSLDI workshop (DSLDI'15), and a summary of the panel discussion on Language Composition
Cross-tier web programming for curated databases: a case study
Curated databases have become important sources of information across several scientific disciplines, and as the result of manual work of experts, often become important reference works. Features such as provenance tracking, archiving, and data citation are widely regarded as important features for the curated databases, but implementing such features is challenging, and small database projects often lack the resources to do so.
A scientific database application is not just the relational database itself, but also an ecosystem of web applications to display the data, and applications which allow data curation. Supporting advanced curation features requires changing all of these components, and there is currently no way to provide such capabilities in a reusable way.
Cross-tier programming languages have been proposed to simplify the creation of web applications, where developers can write an application in a single, uniform language. Consequently, database queries and updates can be written in the same language as the rest of the program, and at least in principle, it should be possible to provide curation features reusably via program transformations. As a first step towards this goal, it is important to establish that realistic curated databases can be implemented in a cross-tier programming language.
In this paper, we describe such a case study: reimplementing the web front end of a real world scientific database, the IUPHAR/BPS Guide to Pharmacology (GtoPdb), in the Links cross-tier programming language. We show how programming language features such as language-integrated query simplify the development process, and rule out common errors. Through a comparative performance evaluation, we show that the Links implementation performs fewer database queries, while the time needed to handle the queries is comparable to the Java version. Furthermore, while there is some overhead to using Links because of its comparative immaturity compared to Java, the Links version is usable as a proof-of-concept case study of cross-tier programming for curated databases.
[ This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review. The most up-to-date version of the paper can be found on arXiv https://arxiv.org/abs/2003.03845