1,243 research outputs found

    Blazes: Coordination Analysis for Distributed Programs

    Full text link
    Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language.Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label

    Metadata-Aware Query Processing over Data Streams

    Get PDF
    Many modern applications need to process queries over potentially infinite data streams to provide answers in real-time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query-rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination. The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, to at runtime identify and purge no-longer-needed data from the state based on punctuations. Exploitations of the combination of punctuation and commonly-used window constraints are also studied. Extensive experimental evaluations demonstrate both reduction on memory usage and improvements on execution time due to the proposed strategies. The second part proposes herald-driven runtime query plan optimization techniques. We identify four query optimization techniques, design a lightweight algorithm to efficiently detect the optimization opportunities at runtime upon receiving heralds. We propose a novel execution paradigm to support multiple concurrent logical plans by maintaining one physical plan. Extensive experimental study confirms that our techniques significantly reduce query execution times. The third part deals with the shared execution of parameterized queries instantiated from a query template. We design a lightweight index mechanism to provide multiple access paths to data to facilitate a wide range of parameterized queries. To withstand workload fluctuations, we propose an index tuning framework to tune the index configurations in a timely manner. Extensive experimental evaluations demonstrate the effectiveness of the proposed strategies. The last part proposes event query optimization techniques by exploiting event constraints such as exclusiveness or ordering relationships among events extracted from workflows. Significant performance gains are shown to be achieved by our proposed constraint-aware event processing techniques

    Analysing Temporal Relations – Beyond Windows, Frames and Predicates

    Get PDF
    This article proposes an approach to rely on the standard operators of relational algebra (including grouping and ag- gregation) for processing complex event without requiring window specifications. In this way the approach can pro- cess complex event queries of the kind encountered in appli- cations such as emergency management in metro networks. This article presents Temporal Stream Algebra (TSA) which combines the operators of relational algebra with an analy- sis of temporal relations at compile time. This analysis de- termines which relational algebra queries can be evaluated against data streams, i. e. the analysis is able to distinguish valid from invalid stream queries. Furthermore the analysis derives functions similar to the pass, propagation and keep invariants in Tucker's et al. \Exploiting Punctuation Seman- tics in Continuous Data Streams". These functions enable the incremental evaluation of TSA queries, the propagation of punctuations, and garbage collection. The evaluation of TSA queries combines bulk-wise and out-of-order processing which makes it tolerant to workload bursts as they typically occur in emergency management. The approach has been conceived for efficiently processing complex event queries on top of a relational database system. It has been deployed and tested on MonetDB

    Temporal Stream Algebra

    Get PDF
    Data stream management systems (DSMS) so far focus on event queries and hardly consider combined queries to both data from event streams and from a database. However, applications like emergency management require combined data stream and database queries. Further requirements are the simultaneous use of multiple timestamps after different time lines and semantics, expressive temporal relations between multiple time-stamps and exible negation, grouping and aggregation which can be controlled, i. e. started and stopped, by events and are not limited to fixed-size time windows. Current DSMS hardly address these requirements. This article proposes Temporal Stream Algebra (TSA) so as to meet the afore mentioned requirements. Temporal streams are a common abstraction of data streams and data- base relations; the operators of TSA are generalizations of the usual operators of Relational Algebra. A in-depth 'analysis of temporal relations guarantees that valid TSA expressions are non-blocking, i. e. can be evaluated incrementally. In this respect TSA differs significantly from previous algebraic approaches which use specialized operators to prevent blocking expressions on a "syntactical" level

    Wire-Speed Implementation of Sliding-Window Aggregate Operator over Out-of-Order Data Streams

    Get PDF
    This paper shows the design and evaluation of an FPGA-based accelerator for sliding-window aggregation over data streams with out-of-order data arrival. We propose an order-agnostic hardware implementation technique for windowing operators based on a one-pass query evaluation strategy called Window-ID, which is originally proposed for software implementation. The proposed implementation succeeds to process out-of-order data items, or tuples, at wire speed due to the simultaneous evaluations of overlapping sliding-windows. In order to verify the effectiveness of the proposed approach, we have also implemented an experimental system as a case study. Our experiments demonstrate that the proposed accelerator with a network interface achieves an effective throughput around 760 Mbps or equivalently nearly 6 million tuples per second, by fully utilizing the available bandwidth of the network interface

    Parallel Continuous Preference Queries over Out-of-Order and Bursty Data Streams

    Get PDF
    Techniques to handle traffic bursts and out-of-order arrivals are of paramount importance to provide real-time sensor data analytics in domains like traffic surveillance, transportation management, healthcare and security applications. In these systems the amount of raw data coming from sensors must be analyzed by continuous queries that extract value-added information used to make informed decisions in real-time. To perform this task with timing constraints, parallelism must be exploited in the query execution in order to enable the real-time processing on parallel architectures. In this paper we focus on continuous preference queries, a representative class of continuous queries for decision making, and we propose a parallel query model targeting the efficient processing over out-of-order and bursty data streams. We study how to integrate punctuation mechanisms in order to enable out-of-order processing. Then, we present advanced scheduling strategies targeting scenarios with different burstiness levels, parameterized using the index of dispersion quantity. Extensive experiments have been performed using synthetic datasets and real-world data streams obtained from an existing real-time locating system. The experimental evaluation demonstrates the efficiency of our parallel solution and its effectiveness in handling the out-of-orderness degrees and burstiness levels of real-world applications

    Generic windowing support for extensible stream processing systems

    Get PDF
    Cataloged from PDF version of article.Stream processing applications process high volume, continuous feeds from live data sources, employ data-in-motion analytics to analyze these feeds, and produce near real-time insights with low latency. One of the fundamental characteristics of such applications is the on-the-fly nature of the computation, which does not require access to disk resident data. Stream processing applications store the most recent history of streams in memory and use it to perform the necessary modeling and analysis tasks. This recent history is often managed using windows. All data stream management systems provide some form of windowing functionality. Windowing makes it possible to implement streaming versions of the traditionally blocking relational operators, such as streaming aggregations, joins, and sorts, as well as any other analytic operator that requires keeping the most recent tuples as state, such as time series analysis operators and signal processing operators. In this paper, we provide a categorization of different window types and policies employed in stream processing applications and give detailed operational semantics for various window configurations. We describe an extensibility mechanism that makes it possible to integrate windowing support into user-defined operators, enabling consistent syntax and semantics across system-provided and third-party toolkits of streaming operators. We describe the design and implementation of a runtime windowing library that significantly simplifies the construction of window-based operators by decoupling the handling of window policies and operator logic from each other. We present our experience using the windowing library to implement a relational operators toolkit and compare the efficacy of the solution to an earlier implementation that did not employ a common windowing library. Copyright (c) 2013 John Wiley & Sons, Ltd
    • …
    corecore