506 research outputs found

    S-Store: Streaming Meets Transaction Processing

    Get PDF
    Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system. We chose to build S-Store as an extension of H-Store, an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already supports, and we can concentrate on the additional implementation features that are needed to support streaming. Similar implementations could be done using other main-memory OLTP platforms. We show that we can actually achieve higher throughput for streaming workloads in S-Store than an equivalent deployment in H-Store alone. We also show how this can be achieved within H-Store with the addition of a modest amount of new functionality. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm, and show how S-Store matches and sometimes exceeds their performance while providing stronger transactional guarantees

    Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

    Full text link
    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs a long recovery latency especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework, which is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks while only a selected set of tasks will be actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs will be generated before the completion of the recovery process. We also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach

    Recovery From Node Failure in Distributed Query Processing

    Get PDF
    While distributed query processing has many advantages, the use of many independent, physically widespread computers almost universally leads to reliability issues. Several techniques have been developed to provide redundancy and the ability to recover from node failure during query processing. In this survey, we examine three techniques--upstream backup, active standby, and passive standby--that have been used in both distributed stream data processing and the distributed processing of static data. We also compare several recent systems that use these techniques, and explore which recovery techniques work well under various conditions

    A Survey on the Evolution of Stream Processing Systems

    Full text link
    Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems.Comment: 34 pages, 15 figures, 5 table

    Speculation in Parallel and Distributed Event Processing Systems

    Get PDF
    Event stream processing (ESP) applications enable the real-time processing of continuous flows of data. Algorithmic trading, network monitoring, and processing data from sensor networks are good examples of applications that traditionally rely upon ESP systems. In addition, technological advances are resulting in an increasing number of devices that are network enabled, producing information that can be automatically collected and processed. This increasing availability of on-line data motivates the development of new and more sophisticated applications that require low-latency processing of large volumes of data. ESP applications are composed of an acyclic graph of operators that is traversed by the data. Inside each operator, the events can be transformed, aggregated, enriched, or filtered out. Some of these operations depend only on the current input events, such operations are called stateless. Other operations, however, depend not only on the current event, but also on a state built during the processing of previous events. Such operations are, therefore, named stateful. As the number of ESP applications grows, there are increasingly strong requirements, which are often difficult to satisfy. In this dissertation, we address two challenges created by the use of stateful operations in a ESP application: (i) stateful operators can be bottlenecks because they are sensitive to the order of events and cannot be trivially parallelized by replication; and (ii), if failures are to be tolerated, the accumulated state of an stateful operator needs to be saved, saving this state traditionally imposes considerable performance costs. Our approach is to evaluate the use of speculation to address these two issues. For handling ordering and parallelization issues in a stateful operator, we propose a speculative approach that both reduces latency when the operator must wait for the correct ordering of the events and improves throughput when the operation in hand is parallelizable. In addition, our approach does not require that user understand concurrent programming or that he or she needs to consider out-of-order execution when writing the operations. For fault-tolerant applications, traditional approaches have imposed prohibitive performance costs due to pessimistic schemes. We extend such approaches, using speculation to mask the cost of fault tolerance.:1 Introduction 1 1.1 Event stream processing systems ......................... 1 1.2 Running example ................................. 3 1.3 Challenges and contributions ........................... 4 1.4 Outline ...................................... 6 2 Background 7 2.1 Event stream processing ............................. 7 2.1.1 State in operators: Windows and synopses ............................ 8 2.1.2 Types of operators ............................ 12 2.1.3 Our prototype system........................... 13 2.2 Software transactional memory.......................... 18 2.2.1 Overview ................................. 18 2.2.2 Memory operations............................ 19 2.3 Fault tolerance in distributed systems ...................................... 23 2.3.1 Failure model and failure detection ...................................... 23 2.3.2 Recovery semantics............................ 24 2.3.3 Active and passive replication ...................... 24 2.4 Summary ..................................... 26 3 Extending event stream processing systems with speculation 27 3.1 Motivation..................................... 27 3.2 Goals ....................................... 28 3.3 Local versus distributed speculation ....................... 29 3.4 Models and assumptions ............................. 29 3.4.1 Operators................................. 30 3.4.2 Events................................... 30 3.4.3 Failures .................................. 31 4 Local speculation 33 4.1 Overview ..................................... 33 4.2 Requirements ................................... 35 4.2.1 Order ................................... 35 4.2.2 Aborts................................... 37 4.2.3 Optimism control ............................. 38 4.2.4 Notifications ............................... 39 4.3 Applications.................................... 40 4.3.1 Out-of-order processing ......................... 40 4.3.2 Optimistic parallelization......................... 42 4.4 Extensions..................................... 44 4.4.1 Avoiding unnecessary aborts ....................... 44 4.4.2 Making aborts unnecessary........................ 45 4.5 Evaluation..................................... 47 4.5.1 Overhead of speculation ......................... 47 4.5.2 Cost of misspeculation .......................... 50 4.5.3 Out-of-order and parallel processing micro benchmarks ........... 53 4.5.4 Behavior with example operators .................... 57 4.6 Summary ..................................... 60 5 Distributed speculation 63 5.1 Overview ..................................... 63 5.2 Requirements ................................... 64 5.2.1 Speculative events ............................ 64 5.2.2 Speculative accesses ........................... 69 5.2.3 Reliable ordered broadcast with optimistic delivery .................. 72 5.3 Applications .................................... 75 5.3.1 Passive replication and rollback recovery ................................ 75 5.3.2 Active replication ............................. 80 5.4 Extensions ..................................... 82 5.4.1 Active replication and software bugs ..................................... 82 5.4.2 Enabling operators to output multiple events ........................ 87 5.5 Evaluation .................................... 87 5.5.1 Passive replication ............................ 88 5.5.2 Active replication ............................. 88 5.6 Summary ..................................... 93 6 Related work 95 6.1 Event stream processing engines ......................... 95 6.2 Parallelization and optimistic computing ................................ 97 6.2.1 Speculation ................................ 97 6.2.2 Optimistic parallelization ......................... 98 6.2.3 Parallelization in event processing .................................... 99 6.2.4 Speculation in event processing ..................... 99 6.3 Fault tolerance .................................. 100 6.3.1 Passive replication and rollback recovery ............................... 100 6.3.2 Active replication ............................ 101 6.3.3 Fault tolerance in event stream processing systems ............. 103 7 Conclusions 105 7.1 Summary of contributions ............................ 105 7.2 Challenges and future work ............................ 106 Appendices Publications 107 Pseudocode for the consensus protocol 10

    Towards the optimization of a parallel streaming engine for telco applications

    Get PDF
    Parallel and distributed computing is becoming essential to process in real time the increasingly massive volume of data collected by telecommunications companies. Existing computational paradigms such as MapReduce (and its popular open-source implementation Hadoop) provide a scalable, fault tolerant mechanism for large scale batch computations. However, many applications in the telco ecosystem require a real time, incremental streaming approach to process data in real time and enable proactive care. Storm is a scalable, fault tolerant framework for the analysis of real time streaming data. In this paper we provide a motivation for the use of real time streaming analytics in the telco ecosystem. We perform an experimental investigation into the performance of Storm, focusing in particular on the impact of parameter configuration. This investigation reveals that optimal parameter choice is highly non-trivial and we use this as motivation to create a parameter configuration engine. As first steps towards the creation of this engine we provide a deep analysis of the inner workings of Storm and provide a set of models describing data flow cost, central processing unit (CPU) cost, and system management cost. ©2014 Alcatel-Lucent

    Fault-tolerance and load management in a distributed stream processing system

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 187-199).Advances in monitoring technology (e.g., sensors) and an increased demand for online information processing have given rise to a new class of applications that require continuous, low-latency processing of large-volume data streams. These "stream processing applications" arise in many areas such as sensor-based environment monitoring, financial services, network monitoring, and military applications. Because traditional database management systems are ill-suited for high-volume, low-latency stream processing, new systems, called stream processing engines (SPEs), have been developed. Furthermore, because stream processing applications are inherently distributed, and because distribution can improve performance and scalability, researchers have also proposed and developed distributed SPEs. In this dissertation, we address two challenges faced by a distributed SPE: (1) faulttolerant operation in the face of node failures, network failures, and network partitions, and (2) federated load management. For fault-tolerance, we present a replication-based scheme, called Delay, Process, and Correct (DPC), that masks most node and network failures.(cont.) When network partitions occur, DPC addresses the traditional availability-consistency trade-off by maintaining, when possible, a desired availability specified by the application or user, but eventually also delivering the correct results. While maintaining the desired availability bounds, DPC also strives to minimize the number of inaccurate results that must later be corrected. In contrast to previous proposals for fault tolerance in SPEs, DPC simultaneously supports a variety of applications that differ in their preferred trade-off between availability and consistency. For load management, we present a Bounded-Price Mechanism (BPM) that enables autonomous participants to collaboratively handle their load without individually owning the resources necessary for peak operation. BPM is based on contracts that participants negotiate offline. At runtime, participants move load only to partners with whom they have a contract and pay each other the contracted price. We show that BPM provides incentives that foster participation and leads to good system-wide load distribution. In contrast to earlier proposals based on computational economies, BPM is lightweight, enables participants to develop and exploit preferential relationships, and provides stability and predictability.(cont.) Although motivated by stream processing, BPM is general and can be applied to any federated system. We have implemented both schemes in the Borealis distributed stream processing engine. They will be available with the next release of the system.by Magdalena Balazinska.Ph.D

    Speculation in Parallel and Distributed Event Processing Systems

    Get PDF
    Event stream processing (ESP) applications enable the real-time processing of continuous flows of data. Algorithmic trading, network monitoring, and processing data from sensor networks are good examples of applications that traditionally rely upon ESP systems. In addition, technological advances are resulting in an increasing number of devices that are network enabled, producing information that can be automatically collected and processed. This increasing availability of on-line data motivates the development of new and more sophisticated applications that require low-latency processing of large volumes of data. ESP applications are composed of an acyclic graph of operators that is traversed by the data. Inside each operator, the events can be transformed, aggregated, enriched, or filtered out. Some of these operations depend only on the current input events, such operations are called stateless. Other operations, however, depend not only on the current event, but also on a state built during the processing of previous events. Such operations are, therefore, named stateful. As the number of ESP applications grows, there are increasingly strong requirements, which are often difficult to satisfy. In this dissertation, we address two challenges created by the use of stateful operations in a ESP application: (i) stateful operators can be bottlenecks because they are sensitive to the order of events and cannot be trivially parallelized by replication; and (ii), if failures are to be tolerated, the accumulated state of an stateful operator needs to be saved, saving this state traditionally imposes considerable performance costs. Our approach is to evaluate the use of speculation to address these two issues. For handling ordering and parallelization issues in a stateful operator, we propose a speculative approach that both reduces latency when the operator must wait for the correct ordering of the events and improves throughput when the operation in hand is parallelizable. In addition, our approach does not require that user understand concurrent programming or that he or she needs to consider out-of-order execution when writing the operations. For fault-tolerant applications, traditional approaches have imposed prohibitive performance costs due to pessimistic schemes. We extend such approaches, using speculation to mask the cost of fault tolerance.:1 Introduction 1 1.1 Event stream processing systems ......................... 1 1.2 Running example ................................. 3 1.3 Challenges and contributions ........................... 4 1.4 Outline ...................................... 6 2 Background 7 2.1 Event stream processing ............................. 7 2.1.1 State in operators: Windows and synopses ............................ 8 2.1.2 Types of operators ............................ 12 2.1.3 Our prototype system........................... 13 2.2 Software transactional memory.......................... 18 2.2.1 Overview ................................. 18 2.2.2 Memory operations............................ 19 2.3 Fault tolerance in distributed systems ...................................... 23 2.3.1 Failure model and failure detection ...................................... 23 2.3.2 Recovery semantics............................ 24 2.3.3 Active and passive replication ...................... 24 2.4 Summary ..................................... 26 3 Extending event stream processing systems with speculation 27 3.1 Motivation..................................... 27 3.2 Goals ....................................... 28 3.3 Local versus distributed speculation ....................... 29 3.4 Models and assumptions ............................. 29 3.4.1 Operators................................. 30 3.4.2 Events................................... 30 3.4.3 Failures .................................. 31 4 Local speculation 33 4.1 Overview ..................................... 33 4.2 Requirements ................................... 35 4.2.1 Order ................................... 35 4.2.2 Aborts................................... 37 4.2.3 Optimism control ............................. 38 4.2.4 Notifications ............................... 39 4.3 Applications.................................... 40 4.3.1 Out-of-order processing ......................... 40 4.3.2 Optimistic parallelization......................... 42 4.4 Extensions..................................... 44 4.4.1 Avoiding unnecessary aborts ....................... 44 4.4.2 Making aborts unnecessary........................ 45 4.5 Evaluation..................................... 47 4.5.1 Overhead of speculation ......................... 47 4.5.2 Cost of misspeculation .......................... 50 4.5.3 Out-of-order and parallel processing micro benchmarks ........... 53 4.5.4 Behavior with example operators .................... 57 4.6 Summary ..................................... 60 5 Distributed speculation 63 5.1 Overview ..................................... 63 5.2 Requirements ................................... 64 5.2.1 Speculative events ............................ 64 5.2.2 Speculative accesses ........................... 69 5.2.3 Reliable ordered broadcast with optimistic delivery .................. 72 5.3 Applications .................................... 75 5.3.1 Passive replication and rollback recovery ................................ 75 5.3.2 Active replication ............................. 80 5.4 Extensions ..................................... 82 5.4.1 Active replication and software bugs ..................................... 82 5.4.2 Enabling operators to output multiple events ........................ 87 5.5 Evaluation .................................... 87 5.5.1 Passive replication ............................ 88 5.5.2 Active replication ............................. 88 5.6 Summary ..................................... 93 6 Related work 95 6.1 Event stream processing engines ......................... 95 6.2 Parallelization and optimistic computing ................................ 97 6.2.1 Speculation ................................ 97 6.2.2 Optimistic parallelization ......................... 98 6.2.3 Parallelization in event processing .................................... 99 6.2.4 Speculation in event processing ..................... 99 6.3 Fault tolerance .................................. 100 6.3.1 Passive replication and rollback recovery ............................... 100 6.3.2 Active replication ............................ 101 6.3.3 Fault tolerance in event stream processing systems ............. 103 7 Conclusions 105 7.1 Summary of contributions ............................ 105 7.2 Challenges and future work ............................ 106 Appendices Publications 107 Pseudocode for the consensus protocol 10
    • …
    corecore