146 research outputs found

    On the Cost of Modularity in Atomic Broadcast

    Get PDF
    Modularity is a desirable property of complex software systems, since it simplifies code reuse, verification, maintenance, etc. However, the use of loosely coupled modules introduces a performance overhead. This overhead is often considered negligible, but this is not always the case. This paper aims at casting some light on the cost, in terms of performance, that is incurred when designing a relevant group communication protocol with modularity in mind: atomic broadcast. We conduct our experiments using two versions of atomic broadcast: a modular version and a monolithic one. We then measure the performance of both implementations under different system loads. Our results show that the overhead introduced by modularity is strongly related to the level of stress to which the system is subjected, and in the worst cases, reaches approximately 50%

    Modeling and validating the performance of atomic broadcast algorithms in high-latency networks

    Get PDF
    The performance of consensus and atomic broadcast algorithms using failure detectors is often affected by a trade-off between the number of communication steps and the number of messages needed to reach a decision. In this paper, we model the performance of three consensus and atomic broadcast algorithms using failure detectors in the oft-neglected setting of wide area networks and validate this model by experimentally evaluating the algorithms in several different setups

    Middleware-based Database Replication: The Gaps between Theory and Practice

    Get PDF
    The need for high availability and performance in data management systems has been fueling a long running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. This has created over time a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    Totally Ordered Broadcast and Multicast Algorithms: A Comprehensive Survey

    Get PDF
    Total order multicast algorithms constitute an important class of problems in distributed systems, especially in the context of fault-tolerance. In short, the problem of total order multicast consists in sending messages to a set of processes, in such a way that all messages are delivered by all correct destinations in the same order. However, the huge amount of literature on the subject and the plethora of solutions proposed so far make it difficult for practitioners to select a solution adapted to their specific problem. As a result, naive solutions are often used while better solutions are ignored. This paper proposes a classification of total order multicast algorithms based on the ordering mechanism of the algorithms, and describes a set of common characteristics (e.g., assumptions, properties) with which to evaluate them. In this classification, more than fifty total order broadcast and multicast algorithms are surveyed. The presentation includes asynchronous algorithms as well as algorithms based on the more restrictive synchronous model. Fault-tolerance issues are also considered as the paper studies the properties and behavior of the different algorithms with respect to failures

    Supporting multiple isolation levels in replicated environments

    Full text link
    Replication is used by databases to implement reliability and provide scalability. However, achieving transparent replication is not an easy task. A replicated database is transparent if it can seamlessly replace a standard stand-alone database without requiring any changes to the components of the system. Database replication transparency can be achieved if: (a) replication protocols remain hidden for all other components of the system; and (b) the functionality of a stand-alone database is provided. The ability to simultaneously execute transactions under different isolation levels is a functionality offered by all stand-alone databases but not by their replicated counterparts. Allowing different isolation levels may improve overall system performance. For example, the TPC-C benchmark specification tolerates execution of some transactions at weaker isolation levels in order to increase throughput of committed transactions. In this paper, we show how replication protocols can be extended to enable transactions to be executed under different isolation levels. © 2012 Elsevier B.V. All rights reserved.This work has been supported by the Spanish Ministerio de Ciencia e Innovation (MICINN) and the European Regional Development Fund (ERDF/FEDER) under research grants TIN2009-14460-C03-01 and TIN2010-17193. The translation of this paper was funded by the Universitat Politecnica de Valencia, Spain.Bernabe Gisbert, JM.; Muñoz Escoí, FD. (2012). Supporting multiple isolation levels in replicated environments. Data and Knowledge Engineering. 79-80:1-16. doi:10.1016/j.datak.2012.05.001S11679-8

    Agreement-related problems:from semi-passive replication to totally ordered broadcast

    Get PDF
    Agreement problems constitute a fundamental class of problems in the context of distributed systems. All agreement problems follow a common pattern: all processes must agree on some common decision, the nature of which depends on the specific problem. This dissertation mainly focuses on three important agreements problems: Replication, Total Order Broadcast, and Consensus. Replication is a common means to introduce redundancy in a system, in order to improve its availability. A replicated server is a server that is composed of multiple copies so that, if one copy fails, the other copies can still provide the service. Each copy of the server is called a replica. The replicas must all evolve in manner that is consistent with the other replicas. Hence, updating the replicated server requires that every replica agrees on the set of modifications to carry over. There are two principal replication schemes to ensure this consistency: active replication and passive replication. In Total Order Broadcast, processes broadcast messages to all processes. However, all messages must be delivered in the same order. Also, if one process delivers a message m, then all correct processes must eventually deliver m. The problem of Consensus gives an abstraction to most other agreement problems. All processes initiate a Consensus by proposing a value. Then, all processes must eventually decide the same value v that must be one of the proposed values. These agreement problems are closely related to each other. For instance, Chandra and Toueg [CT96] show that Total Order Broadcast and Consensus are equivalent problems. In addition, Lamport [Lam78] and Schneider [Sch90] show that active replication needs Total Order Broadcast. As a result, active replication is also closely related to the Consensus problem. The first contribution of this dissertation is the definition of the semi-passive replication technique. Semi-passive replication is a passive replication scheme based on a variant of Consensus (called Lazy Consensus and also defined here). From a conceptual point of view, the result is important as it helps to clarify the relation between passive replication and the Consensus problem. In practice, this makes it possible to design systems that react more quickly to failures. The problem of Total Order Broadcast is well-known in the field of distributed systems and algorithms. In fact, there have been already more than fifty algorithms published on the problem so far. Although quite similar, it is difficult to compare these algorithms as they often differ with respect to their actual properties, assumptions, and objectives. The second main contribution of this dissertation is to define five classes of total order broadcast algorithms, and to relate existing algorithms to those classes. The third contribution of this dissertation is to compare the expected performance of the various classes of total order broadcast algorithms. To achieve this goal, we define a set of metrics to predict the performance of distributed algorithms
    • …
    corecore