86 research outputs found

    Optimistic atomic broadcast: a pragmatic viewpoint

    Get PDF
    Optimistic atomic broadcast: a pragmatic viewpoint F.Pedone and A.Schiper This paper presents the Optimistic Atomic Broadcast algorithm (OPT-ABcast) which exploits the spontaneous total-order property experienced in local-area networks in order to allow fast delivery of messages. The OPT-ABcast algorithm is based on a sequence of stages, and messages can be delivered during a stage or at the end of a stage. During a stage, processes deliver messages fast. Whenever the spontaneous total-order property does not hold, processes terminate the current stage and start a new one by solving a Consensus problem which may lead to the delivery of some messages. We evaluate the efficiency of the OPT-ABcast algorithm using the notion of delivery latency. Keywords: Optimistic algorithms; Atomic broadcast; Efficient algorithms; Consensus; Asynchronous systems

    Processing Transactions over Optimistic Atomic Broadcast Protocols

    Get PDF
    Atomic broadcast primitives allow fault-tolerant cooperation between sites in adistributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high performance, scalable applications on top of atomic broadcast primitives. Recently, a new approach has been proposed which, based on optimistic assumptions about the communication system, reduces the average delay for message delivery. In this paper, we develop this idea further and present a replicated database architecture that employs the new atomic broadcast primitive in such a way that the coordination phase of the atomic broadcast is fully overlapped with th

    Using Optimistic Atomic Broadcast in Transaction Processing Systems

    Get PDF
    Atomic broadcast primitives are often proposed as a mechanism to allow fault-tolerant cooperation between sites in a distributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high performance, scalable applications on top of atomic broadcast primitives. Recently, a new approach has been proposed for atomic broadcast which, based on optimistic assumptions about the communication system, reduces the average delay for message delivery to the application. In this paper, we develop this idea further and show how applications can take even more advantage of the optimistic assumption by overlapping the coordination phase of the atomic broadcast algorithm with the processing of delivered messages. In particular, we present a replicated database architecture that employs the new atomic broadcast primitive in such a way that communication and transaction processing are fully overlapped, providing high performance without relaxing transaction correctness

    Optimistic Parallel State-Machine Replication

    Full text link
    State-machine replication, a fundamental approach to fault tolerance, requires replicas to execute commands deterministically, which usually results in sequential execution of commands. Sequential execution limits performance and underuses servers, which are increasingly parallel (i.e., multicore). To narrow the gap between state-machine replication requirements and the characteristics of modern servers, researchers have recently come up with alternative execution models. This paper surveys existing approaches to parallel state-machine replication and proposes a novel optimistic protocol that inherits the scalable features of previous techniques. Using a replicated B+-tree service, we demonstrate in the paper that our protocol outperforms the most efficient techniques by a factor of 2.4 times

    Optimistic total order in wide area networks

    Get PDF
    Total order multicast greatly simplifies the implementa- tion of fault-tolerant services using the replicated state ma- chine approach. The additional latency of total ordering can be masked by taking advantage of spontaneous order- ing observed in LANs: A tentative delivery allows the ap- plication to proceed in parallel with the ordering protocol. The effectiveness of the technique rests on the optimistic as- sumption that a large share of correctly ordered tentative deliveries offsets the cost of undoing the effect of mistakes. This paper proposes a simple technique which enables the usage of optimistic delivery also in WANs with much larger transmission delays where the optimistic assumption does not normally hold. Our proposal exploits local clocks and the stability of network delays to reduce the mistakes in the ordering of tentative deliveries. An experimental evalu- ation of a modified sequencer-based protocol is presented, illustrating the usefulness of the approach in fault-tolerant database management

    The database state machine and group communication issues

    Get PDF
    Distributed computing is reshaping the way people think about and do daily life activities. On-line ticket reservation, electronic commerce, and telebanking are examples of services that would be hardly imaginable without distributed computing. Nevertheless, widespread use of computers has some implications. As we become more depend on computers, computer malfunction increases in importance. Until recently, discussions about fault tolerant computer systems were restricted to very specific contexts, but this scenario starts to change, though. This thesis is about the design of fault tolerant computer systems. More specifically, this thesis focuses on how to develop database systems that behave correctly even in the event of failures. In order to achieve this objective, this work exploits the notions of data replication and group communication. Data replication is an intuitive way of dealing with failures: if one copy of the data is not available, access another one. However, guaranteeing the consistency of replicated data is not an easy task. Group communication is a high level abstraction that defines patterns on the communication of computer sites. The present work advocates the use of group communication in order to enforce data consistency. This thesis makes four major contributions. In the database domain, it introduces the Database State Machine and the Reordering technique. The Database State Machine is an approach to executing transactions in a cluster of database servers that communicate by message passing, and do not have access to shared memory nor to a common clock. In the Database State Machine, read-only transactions are processed locally on a database site, and update transactions are first executed locally on a database site, and them broadcast to the other database sites for certification and possibly commit. The certification test, necessary to commit update transactions, may result in aborts. In order to increase the number of transactions that successfully pass the certification test, we introduce the Reordering technique, which reorders transactions before they are committed. In the distributed system domain, the Generic Broadcast problem and the Optimistic Atomic Broadcast algorithm are proposed. Generic Broadcast is a group communication primitive that allows applications to define any order requirement they need. Reliable Broadcast, which does not guarantee any order on the delivery of messages, and Atomic Broadcast, which guarantees total order on the delivery of all messages, are special cases of Generic Broadcast. Using Generic Broadcast, we define a group communication primitive that guarantees the exact order needs of the Database State Machine. We also present an algorithm that solves Generic Broadcast. Optimistic Atomic Broadcast algorithms exploit system properties in order to implement total order delivery fast. These algorithms are based on system properties that do not always hold. However, it they hold for a certain period, ensuring total order delivery of messages is done faster than with traditional Atomic Broadcast algorithms. This thesis discusses optimism in the implementation of Atomic Broadcast primitives, and presents in detail the Optimistic Atomic Broadcast algorithm. The optimistic broadcast approach presented in this thesis is based on the spontaneous total order message reception property, which holds with high probability in local area networks under normal execution conditions (e.g., moderate load)

    Speculation in Parallel and Distributed Event Processing Systems

    Get PDF
    Event stream processing (ESP) applications enable the real-time processing of continuous flows of data. Algorithmic trading, network monitoring, and processing data from sensor networks are good examples of applications that traditionally rely upon ESP systems. In addition, technological advances are resulting in an increasing number of devices that are network enabled, producing information that can be automatically collected and processed. This increasing availability of on-line data motivates the development of new and more sophisticated applications that require low-latency processing of large volumes of data. ESP applications are composed of an acyclic graph of operators that is traversed by the data. Inside each operator, the events can be transformed, aggregated, enriched, or filtered out. Some of these operations depend only on the current input events, such operations are called stateless. Other operations, however, depend not only on the current event, but also on a state built during the processing of previous events. Such operations are, therefore, named stateful. As the number of ESP applications grows, there are increasingly strong requirements, which are often difficult to satisfy. In this dissertation, we address two challenges created by the use of stateful operations in a ESP application: (i) stateful operators can be bottlenecks because they are sensitive to the order of events and cannot be trivially parallelized by replication; and (ii), if failures are to be tolerated, the accumulated state of an stateful operator needs to be saved, saving this state traditionally imposes considerable performance costs. Our approach is to evaluate the use of speculation to address these two issues. For handling ordering and parallelization issues in a stateful operator, we propose a speculative approach that both reduces latency when the operator must wait for the correct ordering of the events and improves throughput when the operation in hand is parallelizable. In addition, our approach does not require that user understand concurrent programming or that he or she needs to consider out-of-order execution when writing the operations. For fault-tolerant applications, traditional approaches have imposed prohibitive performance costs due to pessimistic schemes. We extend such approaches, using speculation to mask the cost of fault tolerance.:1 Introduction 1 1.1 Event stream processing systems ......................... 1 1.2 Running example ................................. 3 1.3 Challenges and contributions ........................... 4 1.4 Outline ...................................... 6 2 Background 7 2.1 Event stream processing ............................. 7 2.1.1 State in operators: Windows and synopses ............................ 8 2.1.2 Types of operators ............................ 12 2.1.3 Our prototype system........................... 13 2.2 Software transactional memory.......................... 18 2.2.1 Overview ................................. 18 2.2.2 Memory operations............................ 19 2.3 Fault tolerance in distributed systems ...................................... 23 2.3.1 Failure model and failure detection ...................................... 23 2.3.2 Recovery semantics............................ 24 2.3.3 Active and passive replication ...................... 24 2.4 Summary ..................................... 26 3 Extending event stream processing systems with speculation 27 3.1 Motivation..................................... 27 3.2 Goals ....................................... 28 3.3 Local versus distributed speculation ....................... 29 3.4 Models and assumptions ............................. 29 3.4.1 Operators................................. 30 3.4.2 Events................................... 30 3.4.3 Failures .................................. 31 4 Local speculation 33 4.1 Overview ..................................... 33 4.2 Requirements ................................... 35 4.2.1 Order ................................... 35 4.2.2 Aborts................................... 37 4.2.3 Optimism control ............................. 38 4.2.4 Notifications ............................... 39 4.3 Applications.................................... 40 4.3.1 Out-of-order processing ......................... 40 4.3.2 Optimistic parallelization......................... 42 4.4 Extensions..................................... 44 4.4.1 Avoiding unnecessary aborts ....................... 44 4.4.2 Making aborts unnecessary........................ 45 4.5 Evaluation..................................... 47 4.5.1 Overhead of speculation ......................... 47 4.5.2 Cost of misspeculation .......................... 50 4.5.3 Out-of-order and parallel processing micro benchmarks ........... 53 4.5.4 Behavior with example operators .................... 57 4.6 Summary ..................................... 60 5 Distributed speculation 63 5.1 Overview ..................................... 63 5.2 Requirements ................................... 64 5.2.1 Speculative events ............................ 64 5.2.2 Speculative accesses ........................... 69 5.2.3 Reliable ordered broadcast with optimistic delivery .................. 72 5.3 Applications .................................... 75 5.3.1 Passive replication and rollback recovery ................................ 75 5.3.2 Active replication ............................. 80 5.4 Extensions ..................................... 82 5.4.1 Active replication and software bugs ..................................... 82 5.4.2 Enabling operators to output multiple events ........................ 87 5.5 Evaluation .................................... 87 5.5.1 Passive replication ............................ 88 5.5.2 Active replication ............................. 88 5.6 Summary ..................................... 93 6 Related work 95 6.1 Event stream processing engines ......................... 95 6.2 Parallelization and optimistic computing ................................ 97 6.2.1 Speculation ................................ 97 6.2.2 Optimistic parallelization ......................... 98 6.2.3 Parallelization in event processing .................................... 99 6.2.4 Speculation in event processing ..................... 99 6.3 Fault tolerance .................................. 100 6.3.1 Passive replication and rollback recovery ............................... 100 6.3.2 Active replication ............................ 101 6.3.3 Fault tolerance in event stream processing systems ............. 103 7 Conclusions 105 7.1 Summary of contributions ............................ 105 7.2 Challenges and future work ............................ 106 Appendices Publications 107 Pseudocode for the consensus protocol 10
    • …
    corecore