1,081 research outputs found
Tractable Query Answering and Optimization for Extensions of Weakly-Sticky Datalog+-
We consider a semantic class, weakly-chase-sticky (WChS), and a syntactic
subclass, jointly-weakly-sticky (JWS), of Datalog+- programs. Both extend that
of weakly-sticky (WS) programs, which appear in our applications to data
quality. For WChS programs we propose a practical, polynomial-time query
answering algorithm (QAA). We establish that the two classes are closed under
magic-sets rewritings. As a consequence, QAA can be applied to the optimized
programs. QAA takes as inputs the program (including the query) and semantic
information about the "finiteness" of predicate positions. For the syntactic
subclasses JWS and WS of WChS, this additional information is computable.Comment: To appear in Proc. Alberto Mendelzon WS on Foundations of Data
Management (AMW15
A survey of parallel execution strategies for transitive closure and logic programs
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods
Magic Sets for Disjunctive Datalog Programs
In this paper, a new technique for the optimization of (partially) bound
queries over disjunctive Datalog programs with stratified negation is
presented. The technique exploits the propagation of query bindings and extends
the Magic Set (MS) optimization technique.
An important feature of disjunctive Datalog is nonmonotonicity, which calls
for nondeterministic implementations, such as backtracking search. A
distinguishing characteristic of the new method is that the optimization can be
exploited also during the nondeterministic phase. In particular, after some
assumptions have been made during the computation, parts of the program may
become irrelevant to a query under these assumptions. This allows for dynamic
pruning of the search space. In contrast, the effect of the previously defined
MS methods for disjunctive Datalog is limited to the deterministic portion of
the process. In this way, the potential performance gain by using the proposed
method can be exponential, as could be observed empirically.
The correctness of MS is established thanks to a strong relationship between
MS and unfounded sets that has not been studied in the literature before. This
knowledge allows for extending the method also to programs with stratified
negation in a natural way.
The proposed method has been implemented in DLV and various experiments have
been conducted. Experimental results on synthetic data confirm the utility of
MS for disjunctive Datalog, and they highlight the computational gain that may
be obtained by the new method w.r.t. the previously proposed MS methods for
disjunctive Datalog programs. Further experiments on real-world data show the
benefits of MS within an application scenario that has received considerable
attention in recent years, the problem of answering user queries over possibly
inconsistent databases originating from integration of autonomous sources of
information.Comment: 67 pages, 19 figures, preprint submitted to Artificial Intelligenc
- …