Search CORE

1,085 research outputs found

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

Author: Bartoszkiewicz Michal
Chorowski Jan
Kosowski Adrian
Kowalski Jakub
Kulik Sergey
Lewandowski Mateusz
Nowicki Krzysztof
Piechowiak Kamil
Ruas Olivier
Stamirowska Zuzanna
Uznanski Przemyslaw
Publication venue
Publication date: 12/07/2023
Field of study

We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machinelearning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.)

arXiv.org e-Print Archive

In support of workload-aware streaming state management

Author: Kalavri Vasiliki
Liagouris John
Publication venue: USENIX Association
Publication date: 13/07/2020
Field of study

Modern distributed stream processors predominantly rely on LSM-based key-value stores to manage the state of long-running computations. We question the suitability of such general-purpose stores for streaming workloads and argue that they incur unnecessary overheads in exchange for state management capabilities. Since streaming operators are instantiated once and are long-running, state types, sizes, and access patterns, can either be inferred at compile time or learned during execution. This paper surfaces the limitations of established practices for streaming state management and advocates for configurable streaming backends, tailored to the state requirements of each operator. Using workload-aware state management, we achieve an order of magnitude improvement in p99 latency and 2x higher throughput.https://www.usenix.org/system/files/hotstorage20_paper_kalavri.pdfPublished versio

Boston University Institutional Repository (OpenBU)

Numerical aerodynamic simulation facility

Author: Bailey F. R.
Hathaway A. W.
Publication venue
Publication date
Field of study

Critical to the advancement of computational aerodynamics capability is the ability to simulate flows about three-dimensional configurations that contain both compressible and viscous effects, including turbulence and flow separation at high Reynolds numbers. Analyses were conducted of two solution techniques for solving the Reynolds averaged Navier-Stokes equations describing the mean motion of a turbulent flow with certain terms involving the transport of turbulent momentum and energy modeled by auxiliary equations. The first solution technique is an implicit approximate factorization finite-difference scheme applied to three-dimensional flows that avoids the restrictive stability conditions when small grid spacing is used. The approximate factorization reduces the solution process to a sequence of three one-dimensional problems with easily inverted matrices. The second technique is a hybrid explicit/implicit finite-difference scheme which is also factored and applied to three-dimensional flows. Both methods are applicable to problems with highly distorted grids and a variety of boundary conditions and turbulence models

NASA Technical Reports Server

Towards Implicit Parallel Programming for Systems

Author: Ertel Sebastian
Publication venue
Publication date: 30/12/2019
Field of study

Multi-core processors require a program to be decomposable into independent parts that can execute in parallel in order to scale performance with the number of cores. But parallel programming is hard especially when the program requires state, which many system programs use for optimization, such as for example a cache to reduce disk I/O. Most prevalent parallel programming models do not support a notion of state and require the programmer to synchronize state access manually, i.e., outside the realms of an associated optimizing compiler. This prevents the compiler to introduce parallelism automatically and requires the programmer to optimize the program manually. In this dissertation, we propose a programming language/compiler co-design to provide a new programming model for implicit parallel programming with state and a compiler that can optimize the program for a parallel execution. We define the notion of a stateful function along with their composition and control structures. An example implementation of a highly scalable server shows that stateful functions smoothly integrate into existing programming language concepts, such as object-oriented programming and programming with structs. Our programming model is also highly practical and allows to gradually adapt existing code bases. As a case study, we implemented a new data processing core for the Hadoop Map/Reduce system to overcome existing performance bottlenecks. Our lambda-calculus-based compiler automatically extracts parallelism without changing the program's semantics. We added further domain-specific semantic-preserving transformations that reduce I/O calls for microservice programs. The runtime format of a program is a dataflow graph that can be executed in parallel, performs concurrent I/O and allows for non-blocking live updates

Technische Universität Dresden: Qucosa