873 research outputs found

    Shared Arrangements: practical inter-query sharing for streaming dataflows

    Full text link
    Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch before they can begin to emit their first results. This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling. We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams, while also significantly improving performance in other domains including business analytics, graph processing, and program analysis

    A Differential Datalog Interpreter

    Full text link
    The core reasoning task for datalog engines is materialization, the evaluation of a datalog program over a database alongside its physical incorporation into the database itself. The de-facto method of computing it, is through the recursive application of inference rules. Due to it being a costly operation, it is a must for datalog engines to provide incremental materialization, that is, to adjust the computation to new data, instead of restarting from scratch. One of the major caveats, is that deleting data is notoriously more involved than adding, since one has to take into account all possible data that has been entailed from what is being deleted. Differential Dataflow is a computational model that provides efficient incremental maintenance, notoriously with equal performance between additions and deletions, and work distribution, of iterative dataflows. In this paper we investigate the performance of materialization with three reference datalog implementations, out of which one is built on top of a lightweight relational engine, and the two others are differential-dataflow and non-differential versions of the same rewrite algorithm, with the same optimizations

    A survey of parallel execution strategies for transitive closure and logic programs

    Get PDF
    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

    Redacted by arXiv

    Full text link
    Redacted by arXiv.Comment: This article has been removed by arXiv due a copyright claim by a 3rd part

    Incremental Static Analysis with Differential Datalog

    Get PDF
    Πολλές εφαρμογές ενημερώνουν τον κώδικα τους με αρκετούς μετασχηματισμούς συντήρησης καθ 'όλη τη διάρκεια ζωής της εφαρμογής. Επομένως, τα αποτελέσματα της ανάλυσης μιας εφαρμογής μπορεί να χρειαστεί να αξιολογηθούν σταδιακά. Στην παρούσα πτυχιακή, διερευνούμε τις δυνατότητες σταδιακής αύξησης της στατικής ανάλυσης προγράμματος, χρησιμοποιώντας τη βιβλιοθήκη Doop και τη μηχανή Datalog της DDlog. Το Doop είναι ένα στατικό πλαίσιο ανάλυσης και η DDlog (Differential Datalog) είναι ένας μηχανισμός για αυξητική αξιολόγηση Datalog, βασισμένη σε μια βιβλιοθήκη παραλληλισμού δεδομένων, Differential Dataflow. Διαπιστώνουμε ότι οι στατικές αναλύσεις που βασίζονται σε Doop μπορούν να αξιολογηθούν αυξητικά μέσω της DDlog, η οποία απαιτεί ελάχιστες παρεμβάσεις στη λογική ανάλυσης. Παρουσιάζουμε την απόδοση της DDlog σε σύγκριση με το μηχανισμό Soufflé Datalog που το Doop ενσωματώνει.Many applications have their code updated by several maintenance transformations throughout the application's functioning lifetime. Therefore, the results of analyzing an application may need to be evaluated incrementally. In this thesis, we explore the possibilities of incrementality in static program analysis, using the Doop framework and the DDlog incremental Datalog engine. Doop is a static analysis framework and DDlog (Differential Datalog) is an engine for incremental Datalog evaluation, based on a data-parallel library, Differential Dataflow. We find that Doop-based static analyses can be incrementally evaluated via DDlog requiring minimum interventions to the analysis logic. We illustrate DDlog's performance compared to the Soufflé Datalog engine that Doop integrates

    Metamorphic Testing of Datalog Engines

    Get PDF
    corecore