18 research outputs found
Taking I/O seriously: resolution reconsidered for disk
Journal ArticleModern compilation techniques can give Prolog programs, in the best cases, a speed comparable to C. However, Prolog has proven to be unacceptable for data-oriented queries for two major reasons: its poor termination and complexity properties for Datalog, and its tuple-at-a-time strategy. A number of tabling frameworks and systems have addressed the first problem, including the XSB system which has achieved Prolog speeds for tabled programs. Yet tabling systems such as XSB continue to use the tuple-at-a-time paradigm. As a result, these systems are not amenable to a tight interconnection with disk-resident data. However, in a tabling framework the difference between tuple-at-a-time behavior and set-at-a-time can be viewed as one of scheduling. Accordingly, we define a breadth-first set-at-a-time tabling strategy and prove it iteration equivalent to a form of semi-naive magic evaluation. That is, we extend the well-known asymptotic results of Seki [10] by proving that each iteration of the tabling strategy produces the same information as semi-naive magic. Further, this set-at-a-time scheduling is amenable to implementation in an engine that uses Prolog compilation. We describe both the engine and its performance, which is comparable with the tuple-at-a-time strategy even for in-memory Datalog queries. Because of its performance and its fine level of integration of Prolog with a database-style search, the set-at-a-time engine appears as an important key to linking logic programming and deductive databases
Finding Cross-rule Optimization Bugs in Datalog Engines
Datalog is a popular and widely-used declarative logic programming language.
Datalog engines apply many cross-rule optimizations; bugs in them can cause
incorrect results. To detect such optimization bugs, we propose an automated
testing approach called Incremental Rule Evaluation (IRE), which
synergistically tackles the test oracle and test case generation problem. The
core idea behind the test oracle is to compare the results of an optimized
program and a program without cross-rule optimization; any difference indicates
a bug in the Datalog engine. Our core insight is that, for an optimized,
incrementally-generated Datalog program, we can evaluate all rules individually
by constructing a reference program to disable the optimizations that are
performed among multiple rules. Incrementally generating test cases not only
allows us to apply the test oracle for every new rule generated-we also can
ensure that every newly added rule generates a non-empty result with a given
probability and eschew recomputing already-known facts. We implemented IRE as a
tool named Deopt, and evaluated Deopt on four mature Datalog engines, namely
Souffl\'e, CozoDB, Z, and DDlog, and discovered a total of 30 bugs. Of
these, 13 were logic bugs, while the remaining were crash and error bugs. Deopt
can detect all bugs found by queryFuzz, a state-of-the-art approach. Out of the
bugs identified by Deopt, queryFuzz might be unable to detect 5. Our
incremental test case generation approach is efficient; for example, for test
cases containing 60 rules, our incremental approach can produce 1.17
(for DDlog) to 31.02 (for Souffl\'e) as many valid test cases with
non-empty results as the naive random method. We believe that the simplicity
and the generality of the approach will lead to its wide adoption in practice.Comment: The ACM SIGPLAN Conference on Object Oriented Programming, Systems,
Languages, and Applications (2024), Pasadena, California, United State
Incorporating Stratified Negation into Query-Subquery Nets for Evaluating Queries to Stratified Deductive Databases
Most of the previously known evaluation methods for deductive databases are either breadth-first or depth-first (and recursive). There are cases when these strategies are not the best ones. It is desirable to have an evaluation framework for stratified DatalogN that is goal-driven, set-at-a-time (as opposed to tuple-at-a-time) and adjustable w.r.t. flow-of-control strategies. These properties are important for efficient query evaluation on large and complex deductive databases. In this paper, by incorporating stratified negation into so-called query-subquery nets, we develop an evaluation framework, called QSQNSTR, with such properties for evaluating queries to stratified DatalogN databases. A variety of flow-of-control strategies can be used for QSQNSTR. The generic evaluation method QSQNSTR for stratified DatalogN is sound, complete and has a PTIME data complexity
Magic Sets for Disjunctive Datalog Programs
In this paper, a new technique for the optimization of (partially) bound
queries over disjunctive Datalog programs with stratified negation is
presented. The technique exploits the propagation of query bindings and extends
the Magic Set (MS) optimization technique.
An important feature of disjunctive Datalog is nonmonotonicity, which calls
for nondeterministic implementations, such as backtracking search. A
distinguishing characteristic of the new method is that the optimization can be
exploited also during the nondeterministic phase. In particular, after some
assumptions have been made during the computation, parts of the program may
become irrelevant to a query under these assumptions. This allows for dynamic
pruning of the search space. In contrast, the effect of the previously defined
MS methods for disjunctive Datalog is limited to the deterministic portion of
the process. In this way, the potential performance gain by using the proposed
method can be exponential, as could be observed empirically.
The correctness of MS is established thanks to a strong relationship between
MS and unfounded sets that has not been studied in the literature before. This
knowledge allows for extending the method also to programs with stratified
negation in a natural way.
The proposed method has been implemented in DLV and various experiments have
been conducted. Experimental results on synthetic data confirm the utility of
MS for disjunctive Datalog, and they highlight the computational gain that may
be obtained by the new method w.r.t. the previously proposed MS methods for
disjunctive Datalog programs. Further experiments on real-world data show the
benefits of MS within an application scenario that has received considerable
attention in recent years, the problem of answering user queries over possibly
inconsistent databases originating from integration of autonomous sources of
information.Comment: 67 pages, 19 figures, preprint submitted to Artificial Intelligenc