HT-Paxos: High Throughput State-Machine Replication Protocol for Large Clustered Data Centers
Paxos is a prominent theory of state machine replication. Recent
data-intensive systems that implement state machine replication generally
require high throughput. Earlier versions of Paxos, such as classical Paxos,
Fast Paxos, and Generalized Paxos, focus mainly on fault tolerance and
latency but fall short in throughput and scalability. A major reason for
this is the heavyweight leader. By offloading work from the leader, we can
increase the throughput of the system. Ring Paxos, Multi-Ring Paxos, and S-Paxos
are a few prominent attempts in this direction for clustered data centers. In
this paper, we propose HT-Paxos, a variant of Paxos that is best suited to
any large clustered data center. HT-Paxos offloads the leader significantly
further and hence increases the throughput and scalability of the system. At
the same time, among high-throughput state-machine replication protocols,
HT-Paxos provides reasonably low latency and response time.
Mechanisms for improving ZooKeeper Atomic Broadcast performance
PhD Thesis
Coordination services are essential for building higher-level primitives that are often
used in today’s data-center infrastructures, as they greatly facilitate the operation of
distributed client applications. Examples of typical functionalities offered by coordination
services include the provision of group membership, support for leader election,
distributed synchronization, as well as reliable low-volume storage and naming.
To provide reliable services to the client applications, coordination services in general
are replicated for fault tolerance and should deliver high performance to ensure that
they do not become bottlenecks for dependent applications. Apache ZooKeeper, for
example, is a well-known coordination service and applies a primary-backup approach
in which the leader server processes all state-modifying requests and then forwards
the corresponding state updates to a set of follower servers using an atomic broadcast
protocol called Zab.
Having analyzed state-of-the-art coordination services, we identified two main
limitations that prevent existing systems such as Apache ZooKeeper from achieving a
higher write performance: First, while this approach prevents the data stored by client
applications from being lost as a result of server crashes, it also comes at the cost of a
performance penalty. In particular, because it relies on a leader-based protocol,
its performance becomes bottlenecked when the leader server has to handle
increased message traffic as the number of client requests and replicas grows.
Second, Zab requires significant communication between instances (as it entails three
communication steps). This can lead to performance overhead and consumes
more computing resources, resulting in fewer guarantees for users, who must then build
more complex applications to handle these issues.
To this end, the work makes four contributions. First, we implement ZooKeeper
atomic broadcast, extracting it from ZooKeeper in order to make it easier for other
developers to build their applications on top of Zab without the complexity of integrating
the entire ZooKeeper codebase. Second, we propose three variations of Zab, which
are all capable of reaching an agreement in fewer communication steps than Zab. The
variations are built on the restrictive assumptions that server crashes are independent
and a server quorum remains operative at all times. The first variation offers excellent
performance but can only be used for 3-server systems; the other two are built without
this limitation. Then, we redesigned the latter two Zab variations to operate under the
less restrictive fault assumptions of Zab. Third, we design and implement a ZooKeeper
coin-tossing protocol, called ZabCT, which addresses the above concerns by having the
other, non-leader server replicas toss a coin and broadcast their acknowledgment of a
leader’s proposal only if the toss results in an outcome of Head. We model the ZabCT
process and derive analytical expressions for estimating the coin-tossing probability
of Head for a given arrival rate of service requests such that the dual objectives of
performance gains and traffic reduction can be accomplished. If the coin-tossing protocol
ZabCT is judged not to offer performance benefits over Zab, processes should be able to
switch autonomously to Zab. We design protocol switching by letting processes switch
between ZabCT and Zab without stopping message delivery. Finally, an extensive
performance evaluation is provided for Zab and Zab-variant protocols.
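The coin-tossing idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each follower tosses an independent coin with the same Head probability, so the number of acknowledgments per proposal is binomially distributed. The names `follower_acks`, `prob_at_least`, and `expected_acks` are hypothetical and are not taken from the thesis, whose analytical model may differ. The sketch also does not model what happens when too few acknowledgments arrive.

```python
import random
from math import comb


def follower_acks(p_head: float, rng: random.Random) -> bool:
    """Toss a biased coin; the follower broadcasts its ack only on Head."""
    return rng.random() < p_head


def prob_at_least(k: int, n: int, p_head: float) -> float:
    """Probability that at least k of n independent followers toss Head.

    Assumption of this sketch: tosses are independent and identically
    distributed, so the ack count follows a binomial distribution.
    """
    return sum(
        comb(n, i) * p_head**i * (1 - p_head) ** (n - i)
        for i in range(k, n + 1)
    )


def expected_acks(n: int, p_head: float) -> float:
    """Expected number of ack broadcasts per proposal (the traffic side)."""
    return n * p_head


if __name__ == "__main__":
    rng = random.Random(7)
    n_followers = 4
    p = 0.5
    tosses = [follower_acks(p, rng) for _ in range(n_followers)]
    print("acks sent this round:", sum(tosses))
    print("P(at least 2 acks):", prob_at_least(2, n_followers, p))
    print("expected acks:", expected_acks(n_followers, p))
```

Lowering `p_head` reduces the expected number of ack broadcasts but also lowers the chance that enough acks arrive, which is the trade-off between traffic reduction and performance that the abstract describes.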
Multi-Shot Distributed Transaction Commit
The Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control, and their executions for different transactions are interdependent. As an alternative, we introduce the Transaction Certification Service (TCS), a new formal problem that captures the safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves a better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication.
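To make the notion of a service parameterized by a certification function more concrete, here is a minimal single-process sketch. It is not the paper's formal definition or protocol. It assumes a simplified model in which each transaction reads from a snapshot identified by an integer version and installs its writes at a later version; the `Txn` fields and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Txn:
    snapshot: int            # version of the snapshot the transaction read from
    reads: FrozenSet[str]    # objects read
    writes: FrozenSet[str]   # objects written
    commit_version: int      # version installed if the transaction commits


def overwritten(committed: Sequence[Txn], t: Txn, objects: FrozenSet[str]) -> bool:
    """True if a committed transaction wrote any of `objects` after t's snapshot."""
    return any(
        c.commit_version > t.snapshot and bool(c.writes & objects)
        for c in committed
    )


def certify_serializable(committed: Sequence[Txn], t: Txn) -> bool:
    """Commit only if nothing t read has been overwritten since its snapshot."""
    return not overwritten(committed, t, t.reads)


def certify_snapshot_isolation(committed: Sequence[Txn], t: Txn) -> bool:
    """Commit only if nothing t writes has been overwritten since its snapshot."""
    return not overwritten(committed, t, t.writes)


Certifier = Callable[[Sequence[Txn], Txn], bool]


def certify_all(f: Certifier, txns: Sequence[Txn]) -> List[bool]:
    """Certify transactions one after another (multi-shot): each decision
    depends on the set of transactions committed so far."""
    committed: List[Txn] = []
    decisions: List[bool] = []
    for t in txns:
        ok = f(committed, t)
        decisions.append(ok)
        if ok:
            committed.append(t)
    return decisions


if __name__ == "__main__":
    t1 = Txn(0, frozenset({"x"}), frozenset({"x"}), 1)
    t2 = Txn(0, frozenset({"x"}), frozenset({"y"}), 2)
    print(certify_all(certify_serializable, [t1, t2]))
    print(certify_all(certify_snapshot_isolation, [t1, t2]))
```

In this sketch, swapping the function passed to `certify_all` changes the isolation level without touching the surrounding loop, which mirrors the sense in which the abstract describes TCS as parameterized by a certification function.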