76,683 research outputs found

    Building Efficient Query Engines in a High-Level Language

    Get PDF
    Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of complicated low-level code that is required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines

    Parameterized Compilation Lower Bounds for Restricted CNF-formulas

    Full text link
    We show unconditional parameterized lower bounds in the area of knowledge compilation, more specifically on the size of circuits in decomposable negation normal form (DNNF) that encode CNF-formulas restricted by several graph width measures. In particular, we show that - there are CNF formulas of size nn and modular incidence treewidth kk whose smallest DNNF-encoding has size nΩ(k)n^{\Omega(k)}, and - there are CNF formulas of size nn and incidence neighborhood diversity kk whose smallest DNNF-encoding has size nΩ(k)n^{\Omega(\sqrt{k})}. These results complement recent upper bounds for compiling CNF into DNNF and strengthen---quantitatively and qualitatively---known conditional low\-er bounds for cliquewidth. Moreover, they show that, unlike for many graph problems, the parameters considered here behave significantly differently from treewidth

    On the Role of Canonicity in Bottom-up Knowledge Compilation

    Get PDF
    We consider the problem of bottom-up compilation of knowledge bases, which is usually predicated on the existence of a polytime function for combining compilations using Boolean operators (usually called an Apply function). While such a polytime Apply function is known to exist for certain languages (e.g., OBDDs) and not exist for others (e.g., DNNF), its existence for certain languages remains unknown. Among the latter is the recently introduced language of Sentential Decision Diagrams (SDDs), for which a polytime Apply function exists for unreduced SDDs, but remains unknown for reduced ones (i.e. canonical SDDs). We resolve this open question in this paper and consider some of its theoretical and practical implications. Some of the findings we report question the common wisdom on the relationship between bottom-up compilation, language canonicity and the complexity of the Apply function

    Improvements to Inference Compilation for Probabilistic Programming in Large-Scale Scientific Simulators

    Full text link
    We consider the problem of Bayesian inference in the family of probabilistic models implicitly defined by stochastic generative models of data. In scientific fields ranging from population biology to cosmology, low-level mechanistic components are composed to create complex generative models. These models lead to intractable likelihoods and are typically non-differentiable, which poses challenges for traditional approaches to inference. We extend previous work in "inference compilation", which combines universal probabilistic programming and deep learning methods, to large-scale scientific simulators, and introduce a C++ based probabilistic programming library called CPProb. We successfully use CPProb to interface with SHERPA, a large code-base used in particle physics. Here we describe the technical innovations realized and planned for this library.Comment: 7 pages, 2 figure

    On the Complexity of Optimization Problems based on Compiled NNF Representations

    Full text link
    Optimization is a key task in a number of applications. When the set of feasible solutions under consideration is of combinatorial nature and described in an implicit way as a set of constraints, optimization is typically NP-hard. Fortunately, in many problems, the set of feasible solutions does not often change and is independent from the user's request. In such cases, compiling the set of constraints describing the set of feasible solutions during an off-line phase makes sense, if this compilation step renders computationally easier the generation of a non-dominated, yet feasible solution matching the user's requirements and preferences (which are only known at the on-line step). In this article, we focus on propositional constraints. The subsets L of the NNF language analyzed in Darwiche and Marquis' knowledge compilation map are considered. A number of families F of representations of objective functions over propositional variables, including linear pseudo-Boolean functions and more sophisticated ones, are considered. For each language L and each family F, the complexity of generating an optimal solution when the constraints are compiled into L and optimality is to be considered w.r.t. a function from F is identified

    Incremental Recompilation of Knowledge

    Full text link
    Approximating a general formula from above and below by Horn formulas (its Horn envelope and Horn core, respectively) was proposed by Selman and Kautz (1991, 1996) as a form of ``knowledge compilation,'' supporting rapid approximate reasoning; on the negative side, this scheme is static in that it supports no updates, and has certain complexity drawbacks pointed out by Kavvadias, Papadimitriou and Sideri (1993). On the other hand, the many frameworks and schemes proposed in the literature for theory update and revision are plagued by serious complexity-theoretic impediments, even in the Horn case, as was pointed out by Eiter and Gottlob (1992), and is further demonstrated in the present paper. More fundamentally, these schemes are not inductive, in that they may lose in a single update any positive properties of the represented sets of formulas (small size, Horn structure, etc.). In this paper we propose a new scheme, incremental recompilation, which combines Horn approximation and model-based updates; this scheme is inductive and very efficient, free of the problems facing its constituents. A set of formulas is represented by an upper and lower Horn approximation. To update, we replace the upper Horn formula by the Horn envelope of its minimum-change update, and similarly the lower one by the Horn core of its update; the key fact which enables this scheme is that Horn envelopes and cores are easy to compute when the underlying formula is the result of a minimum-change update of a Horn formula by a clause. We conjecture that efficient algorithms are possible for more complex updates.Comment: See http://www.jair.org/ for any accompanying file
    • …
    corecore