4,818 research outputs found

    Building Efficient Query Engines in a High-Level Language

    Get PDF
    Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of complicated low-level code that is required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines

    DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

    Get PDF
    Integrated data analysis (IDA) pipelines—that combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring—become increasingly common in practice. Interestingly, systems of these areas share many compilation and runtime techniques, and the used—increasingly heterogeneous—hardware infrastructure converges as well. Yet, the programming paradigms, cluster resource management, data formats and representations, as well as execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines, describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results

    On Fast Large-Scale Program Analysis in Datalog

    Get PDF
    Designing and crafting a static program analysis is challenging due to the complexity of the task at hand. Among the challenges are modelling the semantics of the input language, finding suitable abstractions for the analysis, and handwriting efficient code for the analysis in a traditional imperative language such as C++. Hence, the development of static program analysis tools is costly in terms of development time and resources for real world languages. To overcome, or at least alleviate the costs of developing a static program analysis, Datalog has been proposed as a domain specific language (DSL).With Datalog, a designer expresses a static program analysis in the form of a logical specification. While a domain specific language approach aids in the ease of development of program analyses, it is commonly accepted that such an approach has worse runtime performance than handcrafted static analysis tools. In this work, we introduce a new program synthesis methodology for Datalog specifications to produce highly efficient monolithic C++ analyzers. The synthesis technique requires the re-interpretation of the semi-naĂŻve evaluation as a scaffolding for translation using partial evaluation. To achieve high-performance, we employ staged compilation techniques and specialize the underlying relational data structures for a given Datalog specification. Experimentation on benchmarks for large-scale program analysis validates the superior performance of our approach over available Datalog tools and demonstrates our competitiveness with state-of-the-art handcrafted tools

    Foundations of B2B electronic contracting

    Get PDF
    Nowadays, flexible electronic cooperation paradigms are required for core business processes to meet the speed and flexibility requirements dictated by fast-changing markets. These paradigms should include the functionality to establish the formal business relationship required by the importance of these core processes. The business relationship should be established in an automated, electronic way in order to match the speed and flexibility requirements mentioned above. As such, it should considerably improve on the ineffectiveness and inefficiency of traditional contracting in this context. The result of the establishment should be a detailed electronic contract that contains a complete specification of the intended cooperation between organizations. Electronic contracts should contain a precise and unambiguous specification of the collaboration at both the conceptual and technological level. Existing commercial software solutions for business-to-business contracting provide low level of automation and concentrate solely on the automated management of the contract enactment. However, in the modern, dynamic, business settings, an econtracting system has to support high automation of the e-contract establishment, enactment, and management. In the thesis, the business, legal, and technological requirements for the development of a highly automated e-contracting system are investigated. Models that satisfy these requirements and that can be used as a foundation for the implementation of an electronic contracting system are defined. First, the thesis presents the business benefits introduced to companies by highly automated electronic contracting. Next, a data and process analysis of electronic contracting is presented. The specification of electronic contracts and the required process support for electronic contract establishment and enactment are investigated. The business benefits and data and process models defined in the thesis are validated on the basis of two business cases from on-line advertising, namely the cases of online advertising in "De Telegraaf" and "Google". Finally, the thesis presents a specification of the functionalities that must be provided by an e-contracting system. A conceptual reference architecture that can be used as a starting point in the design and implementation of an electronic contracting system is defined. The work in the thesis is conducted on the intersection of the scientific areas of conceptual information and process modeling and specification on the one hand and distributed information system architecture modeling on the other hand

    Building Efficient Query Engines in a High-Level Language

    Get PDF
    In this paper we advocate that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming language Scala. The key technique to regain efficiency is to apply generative programming: the Scala code that constitutes the query engine, despite its high-level appearance, is actually a program generator that emits specialized, low-level C code. We show how the combination of high-level and generative programming allows to easily implement a wide spectrum of optimizations that are difficult to achieve with existing low-level query compilers, and how it can continuously optimize the query engine. We evaluate our approach with the TPC-H benchmark and show that: (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler, (b) these performance improvements require programming just a few hundred lines of high-level code instead of complicated low-level code that is required by existing query compilers and, finally, that (c) the compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for efficiently compiling query engines
    • …
    corecore