53 research outputs found

    Shared Arrangements: practical inter-query sharing for streaming dataflows

    Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch before they can begin to emit their first results. This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling. We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams, while also significantly improving performance in other domains including business analytics, graph processing, and program analysis.

    A Differential Datalog Interpreter

    The core reasoning task for datalog engines is materialization, the evaluation of a datalog program over a database alongside its physical incorporation into the database itself. The de facto method of computing it is the recursive application of inference rules. Because materialization is costly, datalog engines must provide incremental materialization, that is, adjust the computation to new data instead of restarting from scratch. One major caveat is that deleting data is notoriously more involved than adding it, since one has to take into account all data that has been entailed from what is being deleted. Differential Dataflow is a computational model that provides efficient incremental maintenance of iterative dataflows, notably with equal performance for additions and deletions, as well as work distribution. In this paper we investigate the performance of materialization with three reference datalog implementations: one is built on top of a lightweight relational engine, and the other two are differential-dataflow and non-differential versions of the same rewrite algorithm, with the same optimizations.

    Foundations of Differential Dataflow

    Abstract. Differential dataflow is a recent approach to incremental computation that relies on a partially ordered set of differences. In the present paper, we aim to develop its foundations. We define a small programming language whose types are abelian groups equipped with linear inverses, and provide both a standard and a differential denotational semantics. The two semantics coincide in that the differential semantics is the differential of the standard one. Möbius inversion, a well-known idea from combinatorics, permits a systematic treatment of various operators and constructs.

    Redacted by arXiv

    Redacted by arXiv. Comment: This article has been removed by arXiv due to a copyright claim by a 3rd party

    Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

    We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machine-learning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).

    LINVIEW: Incremental View Maintenance for Complex Analytical Queries

    Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude. Comment: 14 pages, SIGMO

    Incremental Static Analysis with Differential Datalog

    Many applications have their code updated by several maintenance transformations throughout the application's functioning lifetime. Therefore, the results of analyzing an application may need to be evaluated incrementally. In this thesis, we explore the possibilities of incrementality in static program analysis, using the Doop framework and the DDlog incremental Datalog engine. Doop is a static analysis framework and DDlog (Differential Datalog) is an engine for incremental Datalog evaluation, based on a data-parallel library, Differential Dataflow. We find that Doop-based static analyses can be incrementally evaluated via DDlog while requiring minimal interventions to the analysis logic. We illustrate DDlog's performance compared to the Soufflé Datalog engine that Doop integrates.