126 research outputs found

    Improving the Latency and Throughput of ZooKeeper Atomic Broadcast

    ZooKeeper is a crash-tolerant system that offers fundamental services to Internet-scale applications, thereby reducing the development and hosting effort for the latter. It consists of ≥3 servers that form a replicated state machine. Maintaining these replicas in a mutually consistent state requires executing an atomic broadcast protocol, Zab, so that concurrent requests for state changes are serialised identically at all replicas before being acted upon. Thus, ZooKeeper performance for update operations is determined by Zab performance. We contribute by presenting two easy-to-implement Zab variants, called ZabAC and ZabAA. They are designed to offer small atomic-broadcast latencies and to reduce the processing load on the primary node that plays a leading role in Zab. The former improves ZooKeeper performance and the latter enables ZooKeeper to face more challenging load conditions.

    Mechanisms for improving ZooKeeper Atomic Broadcast performance

    PhD Thesis. Coordination services are essential for building higher-level primitives that are often used in today’s data-center infrastructures, as they greatly facilitate the operation of distributed client applications. Examples of typical functionalities offered by coordination services include the provision of group membership, support for leader election, distributed synchronization, as well as reliable low-volume storage and naming. To provide reliable services to client applications, coordination services are generally replicated for fault tolerance and should deliver high performance so that they do not become bottlenecks for dependent applications. Apache ZooKeeper, for example, is a well-known coordination service that applies a primary-backup approach in which the leader server processes all state-modifying requests and then forwards the corresponding state updates to a set of follower servers using an atomic broadcast protocol called Zab. Having analyzed state-of-the-art coordination services, we identified two main limitations that prevent existing systems such as Apache ZooKeeper from achieving higher write performance. First, while this approach prevents the data stored by client applications from being lost as a result of server crashes, it comes at the cost of a performance penalty. In particular, because it relies on a leader-based protocol, its performance becomes bottlenecked when the leader server has to handle increased message traffic as the number of client requests and replicas grows. Second, Zab requires significant communication between instances (it entails three communication steps). This can lead to performance overhead and consumes more computing resources, resulting in fewer guarantees for users, who must then build more complex applications to handle these issues. To address these limitations, the work makes four contributions.
First, we implement ZooKeeper atomic broadcast, extracting it from ZooKeeper so that other developers can build their applications on top of Zab without the complexity of integrating the entire ZooKeeper codebase. Second, we propose three variations of Zab, all capable of reaching agreement in fewer communication steps than Zab. The variations are built under the restrictive assumptions that server crashes are independent and that a server quorum remains operative at all times. The first variation offers excellent performance but can only be used for 3-server systems; the other two are built without this limitation. We then redesigned the latter two Zab variations to operate under the least-restricted Zab fault assumptions. Third, we design and implement a ZooKeeper coin-tossing protocol, called ZabCT, which addresses the above concerns by having the non-leader server replicas toss a coin and broadcast their acknowledgment of a leader’s proposal only if the toss results in Head. We model the ZabCT process and derive analytical expressions for estimating the coin-tossing probability of Head for a given arrival rate of service requests, such that the dual objectives of performance gains and traffic reduction can be accomplished. If ZabCT is judged not to offer performance benefits over Zab, processes should be able to switch autonomously to Zab. We design protocol switching so that processes can switch between ZabCT and Zab without stopping message delivery. Finally, an extensive performance evaluation is provided for Zab and the Zab-variant protocols.
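
The coin-tossing rule described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `acking_followers` and `expected_ack_broadcasts`, the follower identifiers, and the use of a fixed probability `p_head` are all assumptions made for illustration. The thesis instead derives the probability of Head from the request arrival rate, which is not reproduced here.

```python
import random


def acking_followers(follower_ids, p_head, rng):
    """Return the followers whose coin toss came up Head.

    Hypothetical helper: each non-leader replica tosses a biased coin
    and broadcasts its acknowledgment only on Head.
    """
    if not 0.0 <= p_head <= 1.0:
        raise ValueError("p_head must be in [0, 1]")
    # rng.random() is in [0.0, 1.0), so p_head=1.0 always yields Head
    # and p_head=0.0 never does.
    return [f for f in follower_ids if rng.random() < p_head]


def expected_ack_broadcasts(n_followers, p_head):
    """Expected number of followers that broadcast an acknowledgment."""
    return n_followers * p_head


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    followers = ["f1", "f2", "f3", "f4"]
    print(acking_followers(followers, 0.5, rng))
    print(expected_ack_broadcasts(len(followers), 0.5))
```

Lowering `p_head` reduces acknowledgment traffic, since fewer followers broadcast per proposal. This is the traffic-reduction side of the trade-off the abstract mentions; the sketch does not model its effect on latency.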

    Incremental Consistency Guarantees for Replicated Objects

    Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head-on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary result (fast, possibly inconsistent) and a final, consistent result that arrives later. We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%. Comment: 16 total pages, 12 figures. OSDI'16 (to appear)
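
Speculating on a preliminary value, as described in this abstract, can be sketched in a few lines of Python. This is not the Correctables API: `incremental_read`, `speculate`, and the `weak_read` / `strong_read` callables are hypothetical names introduced for illustration. The sketch also runs the two reads one after the other, whereas a real system would overlap the speculative work with the slower consistent read.

```python
def incremental_read(key, weak_read, strong_read):
    """Yield successive views of one read: preliminary first, then final."""
    yield "preliminary", weak_read(key)   # fast, possibly inconsistent
    yield "final", strong_read(key)       # consistent, arrives later


def speculate(key, weak_read, strong_read, compute):
    """Compute on the preliminary value and keep the result if it was right.

    Returns (result, speculation_succeeded).
    """
    views = incremental_read(key, weak_read, strong_read)
    _, preliminary = next(views)
    speculative_result = compute(preliminary)  # work done before the final view
    _, final = next(views)
    if final == preliminary:
        return speculative_result, True        # speculation paid off
    return compute(final), False               # redo the work on the final value


if __name__ == "__main__":
    stale_replica = {"tickets_left": 3}
    consistent_store = {"tickets_left": 2}
    result, hit = speculate(
        "tickets_left",
        stale_replica.get,
        consistent_store.get,
        lambda n: f"{n} tickets remaining",
    )
    print(result, hit)
```

When the preliminary and final values agree, the speculative result is reused and the application only waits for confirmation. When they differ, the computation is repeated on the final value, which is the extra work that the abstract's throughput and bandwidth overheads account for.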

    HT-Paxos: High Throughput State-Machine Replication Protocol for Large Clustered Data Centers

    Paxos is a prominent theory of state machine replication. Recent data-intensive systems that implement state machine replication generally require high throughput. Earlier versions of Paxos, such as classical Paxos, Fast Paxos and Generalized Paxos, focus mainly on fault tolerance and latency but fall short in throughput and scalability. A major reason for this is the heavyweight leader. By offloading the leader, we can further increase the throughput of the system. Ring Paxos, Multi-Ring Paxos and S-Paxos are a few prominent attempts in this direction for clustered data centers. In this paper, we propose HT-Paxos, a variant of Paxos that is best suited to any large clustered data center. HT-Paxos offloads the leader still further and hence increases the throughput and scalability of the system. At the same time, among high-throughput state-machine replication protocols, HT-Paxos provides reasonably low latency and response time.

    FairLedger: A Fair Blockchain Protocol for Financial Institutions

    Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a Byzantine fault tolerant (BFT) consensus engine that orders transactions. However, currently available BFT solutions in Hyperledger (as well as in the literature at large) are inadequate for financial settings; they are not designed to ensure fairness or to tolerate selfish behavior that arises when financial institutions strive to maximize their own profit. We present FairLedger, a permissioned blockchain BFT protocol, which is fair, designed to deal with rational behavior, and, no less important, easy to understand and implement. The secret sauce of our protocol is a new communication abstraction, called detectable all-to-all (DA2A), which allows us to detect participants (Byzantine or rational) that deviate from the protocol, and punish them. We implement FairLedger in the Hyperledger open source project, using the Iroha framework, one of the biggest projects therein. To evaluate FairLedger's performance, we also implement it in the PBFT framework and compare the two protocols. Our results show that in failure-free scenarios FairLedger achieves better throughput than both Iroha's implementation and PBFT in wide-area settings.

    Arquitetura de elevada disponibilidade para bases de dados na cloud (High-availability architecture for cloud databases)

    Master's dissertation in Computer Science. With the constant expansion of computational systems across application areas, the amount of data that requires persistence increases exponentially. To tolerate faults and guarantee data availability, replication techniques must therefore be implemented. Currently, there are numerous approaches and replication protocols, each targeting different types of applications. There are two main families of replication protocols: generic protocols, suitable for any service, and database-specific ones. The two main techniques associated with generic replication protocols are active and passive replication. Although generic replication techniques are fully matured and widely used, they have inherent limitations, namely performance problems caused by saturation of the primary replica in passive replication and the determinism required by active replication. Some of these disadvantages are mitigated by database-specific replication protocols (e.g., using multi-master), but those protocols do not allow a separation between the replication logic and the data, and they cannot be decoupled from the database engine. Moreover, recent strategies rely on highly scalable and fault-tolerant distributed logging mechanisms, allowing for newer replication designs based purely on logs.
To mitigate the shortcomings found in both active and passive replication mechanisms, as well as in their derivations, this dissertation presents a hybrid replication middleware, SQLware. The cornerstone of the approach lies in the decoupling between the logical replication layer and the data store, together with the use of a highly scalable distributed log that provides fault tolerance and high availability.
We validated the prototype by conducting a benchmarking campaign to evaluate the overall system’s performance on two distinct infrastructures, namely a private mid-range server and a private high-performance computing cluster. Throughout the evaluation campaign, we used TPC-C, an industry-standard benchmark widely used to evaluate online transaction processing (OLTP) database systems. Results show that SQLware achieved 150 times more throughput than the native replication mechanism of the underlying data store considered as baseline, PostgreSQL. This work was partially funded by FCT - Fundação para a Ciência e a Tecnologia, I.P., (Portuguese Foundation for Science and Technology) within project UID/EEA/50014/201