53 research outputs found

Shared Arrangements: practical inter-query sharing for streaming dataflows

Author: Lattuada Andrea
McSherry Frank
Roscoe Timothy
Schwarzkopf Malte
Publication date: 01/06/2020

Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch before they can begin to emit their first results. This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling. We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams, while also significantly improving performance in other domains including business analytics, graph processing, and program analysis.

arXiv.org e-Print Archive

Repository for Publications and Research Data

A Differential Datalog Interpreter

Author: Apinis Kalmer
de Lima Bruno Rucy Carneiro Alves
Kramer Merlin
Publication date: 04/08/2023

The core reasoning task for datalog engines is materialization: the evaluation of a datalog program over a database, together with the physical incorporation of the results into the database itself. The de facto method of computing it is the recursive application of inference rules. Because this is a costly operation, datalog engines must provide incremental materialization, that is, adjust the computation to new data instead of restarting from scratch. One major caveat is that deleting data is notoriously more involved than adding it, since one has to take into account all data that has been entailed from what is being deleted. Differential Dataflow is a computational model that provides efficient incremental maintenance of iterative dataflows, notably with equal performance for additions and deletions, as well as work distribution. In this paper we investigate the performance of materialization with three reference datalog implementations: one is built on top of a lightweight relational engine, and the other two are differential-dataflow and non-differential versions of the same rewrite algorithm, with the same optimizations.

arXiv.org e-Print Archive
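
To illustrate what "incremental materialization" means for additions, here is a minimal Python sketch. It is not taken from the paper and does not use Differential Dataflow; it assumes a single hypothetical program, `path(x, y) :- edge(x, y).` and `path(x, z) :- path(x, y), edge(y, z).`, and the function names `extend`, `add_edges`, and `materialize` are made up for this example. Deletions, which the abstract notes are harder, are deliberately not handled.

```python
def extend(paths, edges):
    """Apply the recursive rule once: path(x, z) :- path(x, y), edge(y, z)."""
    return {(x, z) for (x, y) in paths for (y2, z) in edges if y == y2}


def add_edges(paths, edges, new_edges):
    """Update an already materialized `paths` relation after inserting `new_edges`.

    Only facts that were not known before are propagated (semi-naive style),
    so previously derived paths are never recomputed.
    """
    new_edges = set(new_edges)
    all_edges = set(edges) | new_edges
    # Seed: the new edges themselves, plus old paths extended by a new edge.
    delta = (new_edges | extend(paths, new_edges)) - paths
    paths = set(paths)
    while delta:
        paths |= delta
        # Only the newly found facts need to be extended further.
        delta = extend(delta, all_edges) - paths
    return paths, all_edges


def materialize(edges):
    """Materialize from scratch: incremental insertion into an empty database."""
    return add_edges(set(), set(), edges)


# Hypothetical usage: materialize once, then add a single edge.
paths, edges = materialize({(1, 2), (2, 3)})
paths, edges = add_edges(paths, edges, {(3, 4)})
```

After the update, `paths` matches what a from-scratch run over all three edges would produce, but only the paths that use the new edge were derived in the second step.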

Foundations of Differential Dataflow

Author: Abadi Martín
McSherry Frank
Plotkin Gordon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Differential dataflow is a recent approach to incremental computation that relies on a partially ordered set of differences. In the present paper, we aim to develop its foundations. We define a small programming language whose types are abelian groups equipped with linear inverses, and provide both a standard and a differential denotational semantics. The two semantics coincide in that the differential semantics is the differential of the standard one. Möbius inversion, a well-known idea from combinatorics, permits a systematic treatment of various operators and constructs.

CiteSeerX

Crossref

Edinburgh Research Explorer
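
As a rough illustration of the "set of differences" idea in the abstract above, the following Python sketch stores a sequence of collection versions as signed differences and rebuilds any version by summing them. It covers only the special case of totally ordered versions (0, 1, 2, ...), where Möbius inversion reduces to subtracting the previous version; the paper treats general partial orders. The representation (plain dicts mapping records to integer multiplicities) and the names `combine`, `to_differences`, and `accumulate` are assumptions made for this example, not the paper's notation.

```python
def combine(a, b, sign=1):
    """Return a + sign * b over signed multiplicities, dropping zero entries."""
    out = dict(a)
    for record, count in b.items():
        out[record] = out.get(record, 0) + sign * count
    return {record: count for record, count in out.items() if count != 0}


def to_differences(versions):
    """Difference at time t is versions[t] - versions[t - 1] (empty before t = 0)."""
    deltas, previous = [], {}
    for version in versions:
        deltas.append(combine(version, previous, sign=-1))
        previous = version
    return deltas


def accumulate(deltas, t):
    """Rebuild the collection at time t by summing all differences up to t."""
    total = {}
    for delta in deltas[: t + 1]:
        total = combine(total, delta)
    return total


# Hypothetical usage: three versions of a small multiset.
versions = [{"a": 1}, {"a": 1, "b": 2}, {"b": 1}]
deltas = to_differences(versions)
latest = accumulate(deltas, 2)
```

Negative multiplicities in a difference record retractions, which is why additions and deletions can be handled by the same summation.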

Redacted by arXiv

Author: Stephenson Matthew
Publication date: 27/07/2023

Redacted by arXiv. Comment: This article has been removed by arXiv due to a copyright claim by a 3rd party.

arXiv.org e-Print Archive

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

Author: Bartoszkiewicz Michal
Chorowski Jan
Kosowski Adrian
Kowalski Jakub
Kulik Sergey
Lewandowski Mateusz
Nowicki Krzysztof
Piechowiak Kamil
Ruas Olivier
Stamirowska Zuzanna
Uznanski Przemyslaw
Publication date: 12/07/2023

We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machine-learning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).

arXiv.org e-Print Archive

LINVIEW: Incremental View Maintenance for Complex Analytical Queries

Author: Abadi D.
Arasu A.
Deng L.
Grama A.
Kamvar S.
Kraska T.
McSherry F.
Motwani R.
Press W.
Seeger M.
Stonebraker M.
Venkataraman S.
Whaley C.
Zaharia M.
Zhang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/05/2014

Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude. Comment: 14 pages, SIGMO

arXiv.org e-Print Archive

Crossref

Incremental Static Analysis with Differential Datalog

Author: RITSOGIANNI ARGYRO
Publication date: 01/01/2019

Many applications have their code updated by several maintenance transformations throughout the application's functioning lifetime. Therefore, the results of analyzing an application may need to be evaluated incrementally. In this thesis, we explore the possibilities of incrementality in static program analysis, using the Doop framework and the DDlog incremental Datalog engine. Doop is a static analysis framework and DDlog (Differential Datalog) is an engine for incremental Datalog evaluation, based on a data-parallel library, Differential Dataflow. We find that Doop-based static analyses can be evaluated incrementally via DDlog with minimal changes to the analysis logic. We illustrate DDlog's performance compared to the Soufflé Datalog engine that Doop integrates.

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens