
    Provenance, Incremental Evaluation, and Debugging in Datalog

    The Datalog programming language has recently found increasing traction in research and industry. Driven by its clean declarative semantics, along with its conciseness and ease of use, Datalog has been adopted for a wide range of important applications, such as program analysis, graph problems, and networking. To enable this adoption, modern Datalog engines have implemented advanced language features and high-performance evaluation of Datalog programs. Unfortunately, critical infrastructure and tooling to support Datalog users and developers are still missing. For example, only limited tools address the crucial debugging problem, even though developers can spend up to 30% of their time finding and fixing bugs. This thesis addresses Datalog’s tooling gaps, with the ultimate goal of improving the productivity of Datalog programmers. The first contribution is centered on the critical problem of debugging: we develop a new debugging approach that explains the execution steps taken to produce a faulty output. Crucially, our debugging method can be applied to large-scale applications without substantially sacrificing performance. The second contribution addresses the problem of incremental evaluation, which is necessary when program inputs change slightly and results need to be recomputed. Incremental evaluation allows this recomputation to happen more efficiently, without discarding the previous results and recomputing from scratch. Finally, the last contribution provides a new incremental debugging approach that identifies the root causes of faulty outputs that occur after an incremental evaluation. Incremental debugging focuses on the relationship between input and output and can suggest amendments to the inputs so that faults no longer occur. These techniques, in combination, form a corpus of critical infrastructure and tooling developments for Datalog, allowing developers and users to use Datalog more productively.
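To make the idea of "explaining the execution steps taken to produce an output" concrete, the following is a minimal sketch and not the thesis's actual method. It assumes a toy two-rule reachability program and hypothetical helper names (`evaluate_with_provenance`, `explain`). It records the first derivation found for each fact during bottom-up evaluation, then prints that derivation as a proof tree.

```python
def evaluate_with_provenance(edges):
    """Bottom-up evaluation of the (assumed) toy program
         path(x, y) :- edge(x, y).              [rule "base"]
         path(x, z) :- path(x, y), edge(y, z).  [rule "step"]
    Returns a dict mapping each fact to (rule name, premises) for the
    first derivation found; input facts get the rule name "input"."""
    why = {("edge", x, y): ("input", ()) for (x, y) in edges}
    frontier = []
    for (x, y) in sorted(edges):
        fact = ("path", x, y)
        if fact not in why:
            why[fact] = ("base", (("edge", x, y),))
            frontier.append(fact)
    # Only facts derived in the previous round are joined with edge/2.
    while frontier:
        next_frontier = []
        for (_, x, y) in frontier:
            for (y2, z) in sorted(edges):
                if y2 == y:
                    fact = ("path", x, z)
                    if fact not in why:
                        why[fact] = ("step", (("path", x, y), ("edge", y, z)))
                        next_frontier.append(fact)
        frontier = next_frontier
    return why


def explain(why, fact, depth=0):
    """Render the recorded derivation of `fact` as indented lines.
    Premises were always derived before their conclusion, so this terminates."""
    rule, premises = why[fact]
    lines = ["  " * depth + f"{fact[0]}({fact[1]}, {fact[2]}) [{rule}]"]
    for premise in premises:
        lines.extend(explain(why, premise, depth + 1))
    return lines


if __name__ == "__main__":
    why = evaluate_with_provenance({("a", "b"), ("b", "c")})
    print("\n".join(explain(why, ("path", "a", "c"))))
```

Storing a single derivation per fact keeps the bookkeeping proportional to the number of facts, which is one plausible way to keep such explanations cheap; whether the thesis does this is not stated in the abstract.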

    Maintenance of datalog materialisations revisited

    Datalog is a rule-based formalism that can axiomatise recursive properties such as reachability and transitive closure. Datalog implementations often materialise (i.e., precompute and store) all facts entailed by a datalog program and a set of explicit facts. Queries can thus be answered directly over the materialised facts, which benefits query answering performance, but the materialised facts must be updated whenever the explicit facts change. Rematerialising all facts ‘from scratch’ can be very inefficient, so numerous materialisation maintenance algorithms have been developed that aim to efficiently identify the facts that require updating and thus reduce the overall work. Most such approaches are variants of the counting or Delete/Rederive (DRed) algorithms. Algorithms in the former group maintain additional data structures and are usually applicable only if datalog rules are not recursive, which limits their applicability in practice. Algorithms in the latter group do not require additional data structures and can handle recursive rules, but they can be inefficient when facts have multiple derivations. Finally, to the best of our knowledge, these approaches have not been compared and their practical applicability has not been investigated. Datalog is becoming increasingly important in practice, so a more comprehensive understanding of the tradeoffs between different approaches to materialisation maintenance is needed. In this paper we present three such algorithms for datalog with stratified negation: a new counting algorithm that can handle recursive rules, an optimised variant of the DRed algorithm that does not repeat derivations, and a new Forward/Backward/Forward (FBF) algorithm that extends DRed to better handle facts with multiple derivations. Furthermore, we study the worst-case performance of these algorithms and compare the algorithms' behaviour on several examples. Finally, we present the results of an extensive, first-of-a-kind empirical evaluation in which we investigate the robustness and the scaling behaviour of our algorithms. We thus offer important theoretical and practical insights into all three algorithms that will provide invaluable guidance to future implementors of datalog systems.
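The Delete/Rederive idea described above can be illustrated with a small sketch. This is a textbook-style DRed for a single assumed program (transitive closure over `edge` facts), not the optimised variant or the FBF algorithm from the paper; the function names `closure` and `dred_delete` are hypothetical. It overdeletes every `path` fact that has some derivation using a deleted edge, then rederives the overdeleted facts that still have an alternative derivation.

```python
def closure(edges):
    """Materialise path/2 from scratch for the (assumed) program
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z)."""
    path = set(edges)
    frontier = set(edges)
    while frontier:
        new = {(x, z) for (x, y) in frontier for (y2, z) in edges if y == y2} - path
        path |= new
        frontier = new
    return path


def dred_delete(edges, path, removed):
    """Update the materialisation `path` after deleting `removed` edges."""
    old_edges = set(edges)
    new_edges = old_edges - set(removed)

    # Phase 1 (overdelete): collect every path fact with at least one
    # derivation that uses a deleted edge, evaluated over the OLD state.
    frontier = {e for e in removed if e in old_edges}
    frontier |= {(x, z) for (x, y) in path for (y2, z) in removed if y == y2}
    over = set()
    while frontier:
        over |= frontier
        frontier = {(x, z) for (x, y) in frontier
                    for (y2, z) in old_edges if y == y2} - over
    remaining = set(path) - over

    # Phase 2 (rederive): an overdeleted fact comes back if one rule
    # instance still derives it from surviving facts.
    def has_derivation(fact):
        x, z = fact
        if fact in new_edges:
            return True
        return any((x, y) in remaining for (y, z2) in new_edges if z2 == z)

    frontier = {f for f in over if has_derivation(f)}
    # Propagate rederived facts forward, as for ordinary insertions.
    while frontier:
        remaining |= frontier
        frontier = {(x, z) for (x, y) in frontier
                    for (y2, z) in new_edges if y == y2} - remaining
    return new_edges, remaining


if __name__ == "__main__":
    edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
    path = closure(edges)
    new_edges, new_path = dred_delete(edges, path, {("b", "c")})
    print(sorted(new_path))
```

In the usage example, `path(a, c)` and `path(a, d)` are overdeleted and then rederived because the edge `(a, c)` survives. This is the extra work on facts with multiple derivations that the abstract identifies as a weakness of DRed.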
