6 research outputs found

    Mechanisms for improving ZooKeeper Atomic Broadcast performance

    Get PDF
    PhD ThesisCoordination services are essential for building higher-level primitives that are often used in today’s data-center infrastructures, as they greatly facilitate the operation of distributed client applications. Examples of typical functionalities offered by coordination services include the provision of group membership, support for leader election, distributed synchronization, as well as reliable low-volume storage and naming. To provide reliable services to the client applications, coordination services in general are replicated for fault tolerance and should deliver high performance to ensure that they do not become bottlenecks for dependent applications. Apache ZooKeeper, for example, is a well-known coordination service and applies a primary-backup approach in which the leader server processes all state-modifying requests and then forwards the corresponding state updates to a set of follower servers using an atomic broadcast protocol called Zab. Having analyzed state-of-the-art coordination services, we identified two main limitations that prevent existing systems such as Apache ZooKeeper from achieving a higher write performance: First, while this approach prevents the data stored by client applications from being lost as a result of server crashes, it also comes at the cost of a performance penalty. In particular, the fact that it relies on a leader-based protocol, means that its performance becomes bottlenecked when the leader server has to handle an increased message traffic as the number of client requests and replicas increases. Second, Zab requires significant communication between instances (as it entails three communication steps). This can potentially lead to performance overhead and uses up more computer resources, resulting in less guarantees for users who must then build more complex applications to handle these issues. To this end, the work makes four contributions. First, we implement ZooKeeper atomic broadcast, extracting from ZooKeeper in order to make it easier for other developers to build their applications on top of Zab without the complexity of integrating the entire ZooKeeper codebase. Second, we propose three variations of Zab, which are all capable of reaching an agreement in fewer communication steps than Zab. The v variations are built with restriction assumptions that server crashes are independent and a server quorum remains operative at all times. The first variation offers excellent performance but can only be used for 3-server systems; the other two are built without this limitation. Then, we redesigned the latest two Zab variations to operate under the least-restricted Zab fault assumptions. Third, we design and implement a ZooKeeper coin-tossing protocol, called ZabCT which addresses the above concerns by having the other, non-leader server replicas toss a coin and broadcast their acknowledgment of a leader’s proposal only if the toss results in an outcome of Head. We model the ZabCT process and derive analytical expressions for estimating the coin-tossing probability of Head for a given arrival rate of service requests such that the dual objectives of performance gains and traffic reduction can be accomplished. If a coin-tossing protocol, ZabCT is judged not to offer performance benefits over Zab, processes should be able to switch autonomously to Zab. We design protocol switching by letting processes switch between ZabCT and Zab without stopping message delivery. Finally, an extensive performance evaluation is provided for Zab and Zab-variant protocols

    Scalable coordination of distributed in-memory transactions

    Get PDF
    Phd ThesisCoordinating transactions involves ensuring serializability in the presence of concurrent data accesses. Accomplishing it in an scalable manner for distributed in-memory transactions is the aim of this thesis work. To this end, the work makes three contributions. It first experimentally demonstrates that transaction latency and throughput scale considerably well when an atomic multicast service is offered to transaction nodes by a crash-tolerant ensemble of dedicated nodes and that using such a service is the most scalable approach compared to practices advocated in the literature. Secondly, we design, implement and evaluate a crash-tolerant and non-blocking atomic broadcast protocol, called ABcast, which is then used as the foundation for building the aforementioned multicast service. ABcast is a hybrid protocol, which consists of a pair of primary and backup protocols executing in parallel. The primary protocol is a deterministic atomic broadcast protocol that provides high performance when node crashes are absent, but blocks in their presence until a group membership service detects such failures. The backup protocol, Aramis, is a probabilistic protocol that does not block in the event of node crashes and allows message delivery to continue post-crash until the primary protocol is able to resume. Aramis design avoids blocking by assuming that message delays remain within a known bound with a high probability that can be estimated in advance, provided that recent delay estimates are used to (i) continually adjust that bound and (ii) regulate flow control. Aramis delivery of broadcasts preserve total order with a probability that can be tuned to be close to 1. Comprehensive evaluations show that this probability can be 99.99% or more. Finally, we assess the effect of low-probability order violations on implementing various isolation levels commonly considered in transaction systems. These three contributions together advance the state-of-art in two major ways: (i) identifying a service based approach to transactional scalability and (ii) establishing a practical alternative to the complex PAXOSiii style approach to building such a service, by using novel but simple protocols and open-source software frameworks.Red Ha
    corecore