Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming makes it easy to implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines.
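To make the generative-programming idea concrete, the following minimal Scala sketch builds a tiny query IR as ordinary high-level objects and then specializes it into low-level C text, in the same spirit as (but far simpler than) LegoBase; the Expr/ScanFilter IR and the CCodegen object are illustrative assumptions, not LegoBase's actual API.

sealed trait Expr
case class Col(name: String)    extends Expr
case class IntLit(v: Int)       extends Expr
case class Lt(l: Expr, r: Expr) extends Expr

// A trivial "query plan": scan one table, keep rows matching a predicate.
case class ScanFilter(table: String, pred: Expr)

object CCodegen {
  def genExpr(e: Expr): String = e match {
    case Col(n)    => s"row->$n"
    case IntLit(v) => v.toString
    case Lt(l, r)  => s"(${genExpr(l)} < ${genExpr(r)})"
  }

  // Specialize the plan into a C loop: the predicate is inlined, so the
  // generated code pays no interpretation overhead at runtime.
  def genPlan(p: ScanFilter): String =
    s"""for (int i = 0; i < ${p.table}_count; i++) {
       |  struct ${p.table}_row *row = &${p.table}[i];
       |  if (${genExpr(p.pred)}) emit(row);
       |}""".stripMargin
}

object Demo extends App {
  val plan = ScanFilter("lineitem", Lt(Col("l_quantity"), IntLit(24)))
  println(CCodegen.genPlan(plan))
}

Because the predicate is inlined into the emitted loop, the generated C carries no interpretation overhead; optimizations such as switching from a row to a column layout would amount to further rewrites over the same high-level IR.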
Parameterized Compilation Lower Bounds for Restricted CNF-formulas
We show unconditional parameterized lower bounds in the area of knowledge
compilation, more specifically on the size of circuits in decomposable negation
normal form (DNNF) that encode CNF-formulas restricted by several graph width
measures. In particular, we show that
- there are CNF formulas of size $n$ and modular incidence treewidth $k$
  whose smallest DNNF-encoding has size $n^{\Omega(k)}$, and
- there are CNF formulas of size $n$ and incidence neighborhood diversity $k$
  whose smallest DNNF-encoding has size $n^{\Omega(\sqrt{k})}$.
These results complement recent upper bounds for compiling CNF into DNNF and
strengthen, quantitatively and qualitatively, known conditional lower
bounds for cliquewidth. Moreover, they show that, unlike for many graph
problems, the parameters considered here behave significantly differently from
treewidth.
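To see why the closing remark matters, the contrast with treewidth can be made explicit; the upper bound below is standard background on treewidth-based DNNF compilation, not a result of this paper:

% Background fact: every CNF of size n with (primal) treewidth k has a
% DNNF encoding of size at most
\[
  2^{O(k)} \cdot n,
\]
% i.e. the parameter stays out of the exponent of n. By the lower bounds
% above, modular incidence treewidth admits no such bound: the best
% guarantee degrades to
\[
  n^{\Omega(k)},
\]
% with the parameter appearing in the exponent of n itself.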
On the Role of Canonicity in Bottom-up Knowledge Compilation
We consider the problem of bottom-up compilation of knowledge bases, which is
usually predicated on the existence of a polytime function for combining
compilations using Boolean operators (usually called an Apply function). While
such a polytime Apply function is known to exist for certain languages (e.g.,
OBDDs) and not exist for others (e.g., DNNF), its existence for certain
languages remains unknown. Among the latter is the recently introduced language
of Sentential Decision Diagrams (SDDs), for which a polytime Apply function
exists for unreduced SDDs, but remains unknown for reduced ones (i.e., canonical
SDDs). We resolve this open question in this paper and consider some of its
theoretical and practical implications. Some of the findings we report question
the common wisdom on the relationship between bottom-up compilation, language
canonicity, and the complexity of the Apply function.
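For readers unfamiliar with the Apply operation discussed here, the sketch below shows it for OBDDs, the classic case where it is known to be polytime: memoizing on pairs of nodes bounds the work by the product of the two diagram sizes. The BDD encoding is an illustrative assumption; SDDs and the paper's actual constructions are not shown.

import scala.collection.mutable

sealed trait BDD
case object True  extends BDD
case object False extends BDD
// Internal node testing variable `v`; lo/hi are the false/true branches.
// All diagrams share one global variable order (smaller v nearer the root).
case class Node(v: Int, lo: BDD, hi: BDD) extends BDD

object BddApply {
  // Combine two OBDDs under a Boolean connective. Memoizing on node pairs
  // bounds the recursion by |f| * |g| subproblems: the polytime Apply.
  def apply(f: BDD, g: BDD, op: (Boolean, Boolean) => Boolean): BDD = {
    val memo = mutable.Map.empty[(BDD, BDD), BDD]

    def leaf(b: BDD): Option[Boolean] = b match {
      case True  => Some(true)
      case False => Some(false)
      case _     => None
    }

    def go(a: BDD, b: BDD): BDD = memo.getOrElseUpdate((a, b),
      (leaf(a), leaf(b)) match {
        case (Some(x), Some(y)) => if (op(x, y)) True else False
        case _ =>
          // Branch on the smallest variable tested at either root.
          val v = (a, b) match {
            case (Node(va, _, _), Node(vb, _, _)) => math.min(va, vb)
            case (Node(va, _, _), _)              => va
            case (_, Node(vb, _, _))              => vb
            case _                                => sys.error("unreachable")
          }
          def cof(x: BDD, takeHi: Boolean): BDD = x match {
            case Node(`v`, lo, hi) => if (takeHi) hi else lo
            case other             => other // x does not test v at its root
          }
          val lo = go(cof(a, takeHi = false), cof(b, takeHi = false))
          val hi = go(cof(a, takeHi = true),  cof(b, takeHi = true))
          if (lo == hi) lo else Node(v, lo, hi) // local reduction step
      })
    go(f, g)
  }
}

Usage is, e.g., BddApply(f, g, _ && _) to conjoin two compiled diagrams; the abstract's open question is whether an analogous polytime Apply can coexist with canonicity for reduced SDDs.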
Improvements to Inference Compilation for Probabilistic Programming in Large-Scale Scientific Simulators
We consider the problem of Bayesian inference in the family of probabilistic
models implicitly defined by stochastic generative models of data. In
scientific fields ranging from population biology to cosmology, low-level
mechanistic components are composed to create complex generative models. These
models lead to intractable likelihoods and are typically non-differentiable,
which poses challenges for traditional approaches to inference. We extend
previous work in "inference compilation", which combines universal
probabilistic programming and deep learning methods, to large-scale scientific
simulators, and introduce a C++ based probabilistic programming library called
CPProb. We successfully use CPProb to interface with SHERPA, a large code-base
used in particle physics. Here we describe the technical innovations realized
and planned for this library.
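As a rough illustration of the inference scheme being compiled, here is a self-contained importance-sampling sketch in Scala (CPProb itself is C++; Scala is used for consistency with the other sketches in this listing). The one-latent Gaussian model and the fixed proposal standing in for a trained neural network are assumptions for illustration only.

import scala.util.Random

// One latent z ~ N(0,1); the "simulator" observes y ~ N(z, 0.5^2).
object ICSketch extends App {
  val rng = new Random(0)

  def gaussLogPdf(x: Double, mu: Double, sigma: Double): Double =
    -0.5 * math.log(2 * math.Pi * sigma * sigma) -
      (x - mu) * (x - mu) / (2 * sigma * sigma)

  val obs = 1.3

  // In inference compilation, the proposal is a neural network trained on
  // simulator traces and conditioned on obs; a fixed Gaussian stands in here.
  def proposalSample(): Double = obs + 0.5 * rng.nextGaussian()
  def proposalLogPdf(z: Double): Double = gaussLogPdf(z, obs, 0.5)

  // Importance weights: prior * likelihood / proposal, kept in log space.
  val samples = Seq.fill(10000) {
    val z  = proposalSample()
    val lw = gaussLogPdf(z, 0, 1) + gaussLogPdf(obs, z, 0.5) - proposalLogPdf(z)
    (z, lw)
  }
  val m    = samples.map(_._2).max
  val ws   = samples.map { case (_, lw) => math.exp(lw - m) }
  val mean = samples.zip(ws).map { case ((z, _), w) => z * w }.sum / ws.sum
  println(f"posterior mean ~ $mean%.3f") // analytic value: 1.3 * 4/5 = 1.04
}

The better the proposal matches the true posterior, the lower the variance of the weights; training that proposal offline is the "compilation" step the abstract scales up to large simulators such as SHERPA.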
On the Complexity of Optimization Problems based on Compiled NNF Representations
Optimization is a key task in a number of applications. When the set of
feasible solutions under consideration is of combinatorial nature and described
in an implicit way as a set of constraints, optimization is typically NP-hard.
Fortunately, in many problems, the set of feasible solutions does not change
often and is independent of the user's request. In such cases, compiling the
set of constraints describing the set of feasible solutions during an off-line
phase makes sense, provided this compilation step makes it computationally
easier to generate a non-dominated, yet feasible solution matching the user's
requirements and preferences (which are only known at the on-line step). In
this article, we focus on propositional constraints. The subsets L of the NNF
language analyzed in Darwiche and Marquis' knowledge compilation map are
considered. A number of families F of representations of objective functions
over propositional variables, including linear pseudo-Boolean functions and
more sophisticated ones, are considered. For each language L and each family F,
the complexity of generating an optimal solution when the constraints are
compiled into L and optimality is to be considered w.r.t. a function from F is
identified.
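To make the off-line/on-line split concrete, here is a hedged Scala sketch of the kind of tractability compilation can buy: once the constraints sit in a smooth, decomposable NNF, a linear pseudo-Boolean objective is minimized in one bottom-up pass (sum at and-nodes, min at or-nodes). The circuit ADT is an illustrative assumption, not the paper's framework.

object CompiledOpt extends App {
  sealed trait NNF
  case class Lit(v: Int, positive: Boolean) extends NNF
  case class And(children: Seq[NNF]) extends NNF // decomposable: disjoint vars
  case class Or(children: Seq[NNF])  extends NNF // smooth: same vars in all children

  // Minimize the sum of w(v) over variables set to true, across all models.
  def minCost(c: NNF, w: Map[Int, Double]): Double = c match {
    case Lit(v, pos) => if (pos) w.getOrElse(v, 0.0) else 0.0
    case And(cs)     => cs.map(minCost(_, w)).sum
    case Or(cs)      => cs.map(minCost(_, w)).min
  }

  // (x1 AND x2) OR (x1 AND NOT x2): every model sets x1, the optimum avoids x2.
  val circuit = Or(Seq(
    And(Seq(Lit(1, positive = true), Lit(2, positive = true))),
    And(Seq(Lit(1, positive = true), Lit(2, positive = false)))))
  println(minCost(circuit, Map(1 -> 2.0, 2 -> 5.0))) // prints 2.0
}

The expensive work of reaching such a circuit happens off-line; the on-line pass is linear in the circuit size, which is the trade-off the article's complexity map charts across languages L and objective families F.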
Incremental Recompilation of Knowledge
Approximating a general formula from above and below by Horn formulas (its
Horn envelope and Horn core, respectively) was proposed by Selman and Kautz
(1991, 1996) as a form of "knowledge compilation," supporting rapid
approximate reasoning; on the negative side, this scheme is static in that it
supports no updates, and has certain complexity drawbacks pointed out by
Kavvadias, Papadimitriou and Sideri (1993). On the other hand, the many
frameworks and schemes proposed in the literature for theory update and
revision are plagued by serious complexity-theoretic impediments, even in the
Horn case, as was pointed out by Eiter and Gottlob (1992), and is further
demonstrated in the present paper. More fundamentally, these schemes are not
inductive, in that they may lose in a single update any positive properties of
the represented sets of formulas (small size, Horn structure, etc.). In this
paper we propose a new scheme, incremental recompilation, which combines Horn
approximation and model-based updates; this scheme is inductive and very
efficient, free of the problems facing its constituents. A set of formulas is
represented by an upper and lower Horn approximation. To update, we replace the
upper Horn formula by the Horn envelope of its minimum-change update, and
similarly the lower one by the Horn core of its update; the key fact which
enables this scheme is that Horn envelopes and cores are easy to compute when
the underlying formula is the result of a minimum-change update of a Horn
formula by a clause. We conjecture that efficient algorithms are possible for
more complex updates.
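To fix intuition for the envelope/core bracketing the scheme maintains, here is a small worked example of our own (not from the paper):

% Micro-example: over variables {a, b}, take \varphi = a \lor b, with models
% \{10, 01, 11\}. Horn theories are exactly those whose model sets are closed
% under componentwise intersection, and 10 \wedge 01 = 00 \notin
% \mathrm{Mod}(\varphi), so \varphi is not Horn.
\[
  \underbrace{a}_{\text{a Horn core}}
  \;\models\;
  \underbrace{a \lor b}_{\varphi}
  \;\models\;
  \underbrace{\top}_{\text{Horn envelope}}
\]
% Envelope: closing \mathrm{Mod}(\varphi) under intersection adds 00, giving
% all of \{00, 01, 10, 11\}, i.e. \top. Core: \mathrm{Mod}(a) = \{10, 11\}
% is intersection-closed and maximal among Horn subsets, and b works equally
% well, so Horn cores need not be unique; the update scheme maintains one
% such core together with the envelope.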