231 research outputs found

    Making an Embedded DBMS JIT-friendly

    While database management systems (DBMSs) are highly optimized, interactions across the boundary between the programming language (PL) and the DBMS are costly, even for in-process embedded DBMSs. In this paper, we show that programs that interact with the popular embedded DBMS SQLite can be significantly optimized - by a factor of 3.4 in our benchmarks - by inlining across the PL / DBMS boundary. We achieved this speed-up by replacing parts of SQLite's C interpreter with RPython code and composing the resulting meta-tracing virtual machine (VM) - called SQPyte - with the PyPy VM. SQPyte does not compromise stand-alone SQL performance and is 2.2% faster than SQLite on the widely used TPC-H benchmark suite. Comment: 24 pages, 18 figures

    Cross-tier web programming for curated databases: a case study

    Curated databases have become important sources of information across several scientific disciplines, and as the result of manual work by experts, often become important reference works. Features such as provenance tracking, archiving, and data citation are widely regarded as important for curated databases, but implementing such features is challenging, and small database projects often lack the resources to do so. A scientific database application is not just the relational database itself, but also an ecosystem of web applications to display the data, and applications which allow data curation. Supporting advanced curation features requires changing all of these components, and there is currently no way to provide such capabilities in a reusable way. Cross-tier programming languages have been proposed to simplify the creation of web applications, where developers can write an application in a single, uniform language. Consequently, database queries and updates can be written in the same language as the rest of the program, and at least in principle, it should be possible to provide curation features reusably via program transformations. As a first step towards this goal, it is important to establish that realistic curated databases can be implemented in a cross-tier programming language. In this paper, we describe such a case study: reimplementing the web front end of a real-world scientific database, the IUPHAR/BPS Guide to Pharmacology (GtoPdb), in the Links cross-tier programming language. We show how programming language features such as language-integrated query simplify the development process, and rule out common errors. Through a comparative performance evaluation, we show that the Links implementation performs fewer database queries, while the time needed to handle the queries is comparable to the Java version.
Furthermore, while there is some overhead to using Links because of its relative immaturity compared to Java, the Links version is usable as a proof-of-concept case study of cross-tier programming for curated databases. [This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review. The most up-to-date version of the paper can be found on arXiv: https://arxiv.org/abs/2003.03845]

    Query Lifting: Language-integrated query for heterogeneous nested collections

    Language-integrated query based on comprehension syntax is a powerful technique for safe database programming, and provides a basis for advanced techniques such as query shredding or query flattening that allow efficient programming with complex nested collections. However, the foundations of these techniques are lacking: although SQL, the most widely-used database query language, supports heterogeneous queries that mix set and multiset semantics, these important capabilities are not supported by known correctness results or implementations that assume homogeneous collections. In this paper we study language-integrated query for a heterogeneous query language $NRC_\lambda(Set,Bag)$ that combines set and multiset constructs. We show how to normalize and translate queries to SQL, and develop a novel approach to querying heterogeneous nested collections, based on the insight that "local" query subexpressions that calculate nested subcollections can be "lifted" to the top level analogously to lambda-lifting for local function definitions. Comment: Full version of ESOP 2021 conference paper

    A Practical Theory of Language-integrated Query

    Language-integrated query is receiving renewed attention, in part because of its support through Microsoft’s LINQ framework. We present a practical theory of language-integrated query based on quotation and normalisation of quoted terms. Our technique supports join queries, abstraction over values and predicates, composition of queries, dynamic generation of queries, and queries with nested intermediate data. Higher-order features prove useful even for constructing first-order queries. We prove a theorem characterising when a host query is guaranteed to generate a single SQL query. We present experimental results confirming our technique works, even in situations where Microsoft’s LINQ framework either fails to produce an SQL query or, in one case, produces an avalanche of SQL queries

    Practical Normalization by Evaluation for EDSLs

    Embedded domain-specific languages (eDSLs) are typically implemented in a rich host language, such as Haskell, using a combination of deep and shallow embedding techniques. While such a combination enables programmers to exploit the execution mechanism of Haskell to build and specialize eDSL programs, it blurs the distinction between the host language and the eDSL. As a consequence, extension with features such as sums and effects requires a significant amount of ingenuity from the eDSL designer. In this paper, we demonstrate that Normalization by Evaluation (NbE) provides a principled framework for building, extending, and customizing eDSLs. We present a comprehensive treatment of NbE for deeply embedded eDSLs in Haskell that involves a rich set of features such as sums, arrays, exceptions and state, while addressing practical concerns about normalization such as code expansion and the addition of domain-specific features

    Query Flattening and the Nested Data Parallelism Paradigm

    This work is based on the observation that languages for two seemingly distant domains are closely related. Orthogonal query languages based on comprehension syntax admit various forms of query nesting to construct nested query results and express complex predicates. Languages for nested data parallelism allow parallel iterators to be nested and thereby admit the parallel evaluation of computations that are themselves parallel. Both kinds of languages center around the application of side-effect-free functions to each element of a collection. The motivation for this work is the seamless integration of relational database queries with programming languages. In frameworks for language-integrated database queries, a host language's native collection-programming API is used to express queries. To mediate between native collection programming and relational queries, we define an expressive, orthogonal query calculus that supports nesting and order. The challenge of query flattening is to translate this calculus to bundles of efficient relational queries restricted to flat, unordered multisets. Prior approaches to query flattening either support only query languages that lack expressiveness or employ a complex, monolithic translation that is hard to comprehend and generates inefficient code that is hard to optimize. To improve on those approaches, we draw on the similarity to nested data parallelism. Blelloch's flattening transformation is a static program transformation that translates nested data parallelism to flat data parallel programs over flat arrays. Based on the flattening transformation, we describe a pipeline of small, comprehensible lowering steps that translates our nested query calculus to a bundle of relational queries. The pipeline is based on a number of well-defined intermediate languages. Our translation adopts the key concepts of the flattening transformation but is designed with specifics of relational query processing in mind.
Based on this translation, we revisit all aspects of query flattening. Our translation is fully compositional and can translate any term of the input language. Like prior work, the translation by itself produces inefficient code due to compositionality that is not fit for execution without optimization. In contrast to prior work, we show that query optimization is orthogonal to flattening and can be performed before flattening. We employ well-known work on logical query optimization for nested query languages and demonstrate that this body of work integrates well with our approach. Furthermore, we describe an improved encoding of ordered and nested collections in terms of flat, unordered multisets. Our approach emits idiomatic relational queries in which the effort required to maintain the non-relational semantics of the source language (order and nesting) is minimized. A set of experiments provides evidence that our approach to query flattening can handle complex, list-based queries with nested results and nested intermediate data well. We apply our approach to a number of flat and nested benchmark queries and compare their runtime with hand-written SQL queries. In these experiments, our SQL code generated from a list-based nested query language usually performs as well as hand-written queries

    Machine Learning on Large Databases: Transforming Hidden Markov Models to SQL Statements

    Machine Learning is a research field with substantial relevance for many applications in different areas. Because of technical improvements in sensor technology, its value for real-life applications has increased further in recent years. Nowadays, it is possible to gather massive amounts of data at any time at comparatively little cost. While this availability of data could be used to develop complex models, the implementation of such models is often constrained by limited computing power. In order to overcome performance problems, developers have several options, such as improving their hardware, optimizing their code, or using parallelization techniques like the MapReduce framework. However, these options might be too costly, unsuitable, or too time-consuming to learn and realize. Following the premise that developers usually are not SQL experts, we would like to discuss another approach in this paper: using transparent database support for Big Data Analytics. Our aim is to automatically transform Machine Learning algorithms to parallel SQL database systems. In particular, we show how a Hidden Markov Model, given in the analytics language R, can be transformed to a sequence of SQL statements. These SQL statements will be the basis for an (inter-operator and intra-operator) parallel execution on parallel DBMS as a second step of our research, which is not part of this paper

    Relational Algebra by Way of Adjunctions

    Bulk types such as sets, bags, and lists are monads, and therefore support a notation for database queries based on comprehensions. This fact is the basis of much work on database query languages. The monadic structure easily explains most of standard relational algebra—specifically, selections and projections—allowing for an elegant mathematical foundation for those aspects of database query language design. Most, but not all: monads do not immediately offer an explanation of relational join or grouping, and hence important foundations for those crucial aspects of relational algebra are missing. The best they can offer is cartesian product followed by selection. Adjunctions come to the rescue: like any monad, bulk types also arise from certain adjunctions; we show that by paying due attention to other important adjunctions, we can elegantly explain the rest of standard relational algebra. In particular, graded monads provide a mathematical foundation for indexing and grouping, which leads directly to an efficient implementation, even of joins

    Centrality and content creation in networks: the case of German Wikipedia

    When contributing content on large online platforms, producers of user-generated content have to decide where to contribute. On a complex and dynamic platform like Wikipedia, this decision is expected to depend on the way the content is organized. We analyse whether the hyperlinks on Wikipedia channel the attention of producers towards more central articles. We observe a sample of 7,635 articles belonging to the category "Economics" on German Wikipedia over 153 weeks and measure their centrality both within this category and in the network of over one million German Wikipedia articles. Our analysis reveals that an additional link from the observed category is associated with around 140 bytes of additional content and with an increase in the number of authors by nearly 0.5. Moreover, we observe that the rate of content generation increases notably when previously unlinked articles get connected to the main cluster in the category

    City of South Portland Annual Report 2014
