16 research outputs found
Impact of Message Losses on Push-Sum Protocol in Chosen Topologies
In this paper, we examine the natural robustness of the push-sum protocol to message losses in a tree, a star, a mesh, a ring and a link topology. We experimentally verify the impact of this failure on the character of the estimations, the deviation of the final estimation from the real value and the impact on the change of the convergence rate
APUS: Fast and Scalable PAXOS on RDMA
State machine replication (SMR) uses Paxos to enforce the same
inputs for a program (e.g., Redis) replicated on a number of hosts,
tolerating various types of failures. Unfortunately, traditional Paxos
protocols incur prohibitive performance overhead on server programs
due to their high consensus latency on TCP/IP. Worse, the
consensus latency of extant Paxos protocols increases drastically
when more concurrent client connections or hosts are added. This
paper presents APUS, the first RDMA-based Paxos protocol that
aims to be fast and scalable to client connections and hosts. APUS
intercepts inbound socket calls of an unmodified server program,
assigns a total order for all input requests, and uses fast RDMA
primitives to replicate these requests concurrently.
We evaluated APUS on nine widely-used server programs (e.g.,
Redis and MySQL). APUS incurred a mean overhead of 4.3% in
response time and 4.2% in throughput. We integrated APUS with an
SMR system Calvin. Our Calvin-APUS integration was 8.2X faster
than the extant Calvin-ZooKeeper integration. The consensus
latency of APUS outperformed an RDMA-based consensus protocol
by 4.9X. APUS source code and raw results are released on github.
com/hku-systems/apus.published_or_final_versio
An Analysis of Partial Network Partitioning Failures in Modern Distributed Systems
We present a comprehensive study of system failures from 12 popular systems caused by a peculiar type of network partitioning faults: partial partitions. Partial partitions isolate a set of nodes from some, but not all, nodes in the cluster. Our study reveals the studied failures are catastrophic; they lead to data loss, complete system unavailability, or stale and dirty reads. Furthermore, our study reveals that these failures are easy to manifest, they are deterministic, they can be triggered by isolating a single node, and without any interaction with the system’s clients.
We dissected the implemented fault tolerance techniques in eight popular systems. We identified four principled approaches for building a fault tolerance mechanism for partial partitions and identified the shortcomings of the current approaches. The currently implemented fault tolerance techniques are either specific to a particular protocol or implementation or may lead to a complete cluster shut down despite the availability of alternative network paths between the nodes.
Finally, we present NIFTY, a generic communication layer that leverages the capabilities of modern software-defined networking to monitor and recover the connectivity of the cluster in case of partial network partitions. NIFTY is transparent to the application running on top of it. We built NiftyDB, a database system atop NIFTY. NiftyDB implements a set of optimizations that reduce the network overhead and operation latency in case of partial network partitioning. Our analysis and evaluation show that the proposed approach can effectively mask partial network partitioning faults without incurring additional overheads