29 research outputs found
Distribution Policies for Datalog
Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds
Static Analysis of Graph Database Transformations
We investigate graph transformations, defined using Datalog-like rules based
on acyclic conjunctive two-way regular path queries (acyclic C2RPQs), and we
study two fundamental static analysis problems: type checking and equivalence
of transformations in the presence of graph schemas. Additionally, we
investigate the problem of target schema elicitation, which aims to construct a
schema that closely captures all outputs of a transformation over graphs
conforming to the input schema. We show all these problems are in EXPTIME by
reducing them to C2RPQ containment modulo schema; we also provide matching
lower bounds. We use cycle reversing to reduce query containment to the problem
of unrestricted (finite or infinite) satisfiability of C2RPQs modulo a theory
expressed in a description logic
On the Optimization of Iterative Programming with Distributed Data Collections
Big data programming frameworks are becoming increasingly important for the development of applications for which performance and scalability are critical. In those complex frameworks, optimizing code by hand is hard and time-consuming, making automated optimization particularly necessary. In order to automate optimization, a prerequisite is to find suitable abstractions to represent programs; for instance, algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way which allows for analyzing or rewriting them. In this paper, we extend a monoid algebra with a fixpoint operator for representing recursion as a first class citizen and show how it enables new optimizations. Experiments with the Spark platform illustrate performance gains brought by these systematic optimizations
Non-polynomial Worst-Case Analysis of Recursive Programs
We study the problem of developing efficient approaches for proving
worst-case bounds of non-deterministic recursive programs. Ranking functions
are sound and complete for proving termination and worst-case bounds of
nonrecursive programs. First, we apply ranking functions to recursion,
resulting in measure functions. We show that measure functions provide a sound
and complete approach to prove worst-case bounds of non-deterministic recursive
programs. Our second contribution is the synthesis of measure functions in
nonpolynomial forms. We show that non-polynomial measure functions with
logarithm and exponentiation can be synthesized through abstraction of
logarithmic or exponentiation terms, Farkas' Lemma, and Handelman's Theorem
using linear programming. While previous methods obtain worst-case polynomial
bounds, our approach can synthesize bounds of the form
as well as where is not an integer. We present
experimental results to demonstrate that our approach can obtain efficiently
worst-case bounds of classical recursive algorithms such as (i) Merge-Sort, the
divide-and-conquer algorithm for the Closest-Pair problem, where we obtain
worst-case bound, and (ii) Karatsuba's algorithm for
polynomial multiplication and Strassen's algorithm for matrix multiplication,
where we obtain bound such that is not an integer and
close to the best-known bounds for the respective algorithms.Comment: 54 Pages, Full Version to CAV 201
Stochastic Invariants for Probabilistic Termination
Termination is one of the basic liveness properties, and we study the
termination problem for probabilistic programs with real-valued variables.
Previous works focused on the qualitative problem that asks whether an input
program terminates with probability~1 (almost-sure termination). A powerful
approach for this qualitative problem is the notion of ranking supermartingales
with respect to a given set of invariants. The quantitative problem
(probabilistic termination) asks for bounds on the termination probability. A
fundamental and conceptual drawback of the existing approaches to address
probabilistic termination is that even though the supermartingales consider the
probabilistic behavior of the programs, the invariants are obtained completely
ignoring the probabilistic aspect.
In this work we address the probabilistic termination problem for
linear-arithmetic probabilistic programs with nondeterminism. We define the
notion of {\em stochastic invariants}, which are constraints along with a
probability bound that the constraints hold. We introduce a concept of {\em
repulsing supermartingales}. First, we show that repulsing supermartingales can
be used to obtain bounds on the probability of the stochastic invariants.
Second, we show the effectiveness of repulsing supermartingales in the
following three ways: (1)~With a combination of ranking and repulsing
supermartingales we can compute lower bounds on the probability of termination;
(2)~repulsing supermartingales provide witnesses for refutation of almost-sure
termination; and (3)~with a combination of ranking and repulsing
supermartingales we can establish persistence properties of probabilistic
programs.
We also present results on related computational problems and an experimental
evaluation of our approach on academic examples.Comment: Full version of a paper published at POPL 2017. 20 page