Paxos Made Switch-y
This paper describes an implementation of the well-known consensus protocol,
Paxos, in the P4 programming language. P4 is a language for programming the
behavior of network forwarding devices (i.e., the network data plane). Moving
consensus logic into network devices could significantly improve the
performance of the core infrastructure and services in data centers. Moreover,
implementing Paxos in P4 provides a critical use case and set of requirements
for data plane language designers. In the long term, we imagine that consensus
could someday be offered as a network service, just as point-to-point
communication is provided today.
Network Hardware-Accelerated Consensus
Consensus protocols are the foundation for building many fault-tolerant
distributed systems and services. This paper posits that there are significant
performance benefits to be gained by offering consensus as a network service
(CAANS). CAANS leverages recent advances in commodity networking hardware
design and programmability to implement consensus protocol logic in network
devices. CAANS provides a complete Paxos protocol, is a drop-in replacement for
software-based implementations of Paxos, makes no restrictions on network
topologies, and is implemented in a higher-level, data-plane programming
language, allowing for portability across a range of target devices. At the
same time, CAANS significantly increases throughput and reduces latency for
consensus operations. Consensus logic executing in hardware can transmit
consensus messages at line speed, with latency only slightly higher than simply
forwarding packets.
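
The acceptor role that such a system would move into a network device can be sketched as follows. This is a minimal Python illustration of the standard single-instance Paxos acceptor rules, not the paper's data-plane code; the class name, field names, and message tuples are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """State one acceptor keeps for a single consensus instance (illustrative)."""
    promised: int = -1                    # highest round this acceptor has promised
    accepted_round: int = -1              # round in which a value was last accepted
    accepted_value: Optional[str] = None  # value accepted in that round, if any

    def on_prepare(self, rnd: int) -> Optional[Tuple]:
        # Phase 1: promise only if the round is newer than any seen so far.
        if rnd > self.promised:
            self.promised = rnd
            return ("promise", rnd, self.accepted_round, self.accepted_value)
        return None  # stale round: ignore

    def on_accept(self, rnd: int, value: str) -> Optional[Tuple]:
        # Phase 2: accept unless a higher round has already been promised.
        if rnd >= self.promised:
            self.promised = rnd
            self.accepted_round = rnd
            self.accepted_value = value
            return ("accepted", rnd, value)
        return None  # stale round: ignore


acceptor = Acceptor()
print(acceptor.on_prepare(1))
print(acceptor.on_accept(1, "x"))
print(acceptor.on_prepare(2))
```

The per-instance state is three small fields updated by simple comparisons, which is one reason this role is a plausible fit for a packet-processing pipeline.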
The Performance of Paxos and Fast Paxos
Paxos and Fast Paxos are optimal consensus algorithms that are simple and
elegant, while suitable for efficient implementation. In this paper, we compare
the performance of both algorithms in failure-free and failure-prone runs using
Treplica, a general replication toolkit that implements these algorithms in a
modular and efficient manner. We have found that Paxos outperforms Fast Paxos
for a small number of replicas and that collisions are not the cause of this
performance difference.
Comment: 14 pages, published in the Proc. of the 27th Brazilian Symposium on
Computer Networks, Recife, Brazil, May 200
Ring Paxos: High-Throughput Atomic Broadcast
Atomic broadcast is an important communication primitive often used to
implement state-machine replication. Despite the large number of atomic
broadcast algorithms proposed in the literature, few papers have discussed how
to turn these algorithms into efficient executable protocols. This paper
focuses on a class of atomic broadcast algorithms based on Paxos, with its
corresponding desirable properties: safety under asynchrony assumptions,
liveness under weak synchrony assumptions, and resiliency-optimality. The paper
presents two protocols, M-Ring Paxos and U-Ring Paxos, derived from Paxos. The
protocols inherit the properties of Paxos and can be implemented very
efficiently. We report a detailed performance analysis of M-Ring Paxos and
U-Ring Paxos and compare them to other atomic broadcast protocols.
Spectrum: A Framework for Adapting Consensus Protocols
There exists a plethora of consensus protocols in the literature. The reason is
that there is no one-size-fits-all solution, since every protocol is unique and
its performance is directly tied to the deployment settings and workload
configurations. Some protocols are well suited for geographical scale
environments, e.g., leaderless, while others provide high performance under
workloads with high contention, e.g., single leader-based. Thus, existing
protocols seldom adapt to changing workload conditions. To overcome this
limitation, we propose Spectrum, a consensus framework that is able to switch
consensus protocols at run-time, to enable a dynamic reaction to changes in the
workload characteristics and deployment scenarios. With this framework, we
provide transparent instantiation of various consensus protocols, and a
completely asynchronous switching mechanism with zero downtime. We assess the
effectiveness of Spectrum via an extensive experimental evaluation, which shows
that Spectrum is able to limit the increase of the user perceived latency when
switching among consensus protocols.
NetChain: Scale-Free Sub-RTT Coordination (Extended Version)
Coordination services are a fundamental building block of modern cloud
systems, providing critical functionalities like configuration management and
distributed locking. The major challenge is to achieve low latency and high
throughput while providing strong consistency and fault-tolerance. Traditional
server-based solutions require multiple round-trip times (RTTs) to process a
query. This paper presents NetChain, a new approach that provides scale-free
sub-RTT coordination in datacenters. NetChain exploits recent advances in
programmable switches to store data and process queries entirely in the network
data plane. This eliminates the query processing at coordination servers and
cuts the end-to-end latency to as little as half of an RTT---clients only
experience processing delay from their own software stack plus network delay,
which in a datacenter setting is typically much smaller. We design new
protocols and algorithms based on chain replication to guarantee strong
consistency and to efficiently handle switch failures. We implement a prototype
with four Barefoot Tofino switches and four commodity servers. Evaluation
results show that compared to traditional server-based solutions like
ZooKeeper, our prototype provides orders of magnitude higher throughput and
lower latency, and handles failures gracefully.
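
The chain replication idea that the abstract builds on can be illustrated with a small sketch. The code below shows classic chain replication only (writes enter at the head and are acknowledged by the tail, reads are served by the tail); it is not NetChain's switch-based variant, and the names `ChainNode`, `build_chain`, and the sample key are hypothetical.

```python
class ChainNode:
    """One node in a replication chain holding a key-value store (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.successor = None  # next node in the chain, None for the tail

    def write(self, key, value):
        # Apply the write locally, then forward it down the chain.
        # The tail's name is returned to stand in for the acknowledgement.
        self.store[key] = value
        if self.successor is None:
            return self.name
        return self.successor.write(key, value)

    def read(self, key):
        # In classic chain replication, reads are served by the tail, which
        # only holds writes that every node in the chain has applied.
        return self.store.get(key)


def build_chain(names):
    """Link nodes in order and return (head, tail)."""
    nodes = [ChainNode(n) for n in names]
    for node, nxt in zip(nodes, nodes[1:]):
        node.successor = nxt
    return nodes[0], nodes[-1]


head, tail = build_chain(["s0", "s1", "s2"])
ack_from = head.write("lock/a", "client-7")
print(ack_from, tail.read("lock/a"))
```

Because a write is acknowledged only after it reaches the tail, any value the tail returns has been applied by every node, which is the property that gives the scheme strong consistency.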
Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection
Distributed storage employs replication to mask failures and improve
availability. However, these systems typically exhibit a hard tradeoff between
consistency and performance. Ensuring consistency introduces coordination
overhead, and as a result the system throughput does not scale with the number
of replicas. We present Harmonia, a replicated storage architecture that
exploits the capability of new-generation programmable switches to obviate this
tradeoff by providing near-linear scalability without sacrificing consistency.
To achieve this goal, Harmonia detects read-write conflicts in the network,
which enables any replica to serve reads for objects with no pending writes.
Harmonia implements this functionality at line rate, thus imposing no
performance overhead. We have implemented a prototype of Harmonia on a cluster
of commodity servers connected by a Barefoot Tofino switch, and have integrated
it with Redis. We demonstrate the generality of our approach by supporting a
variety of replication protocols, including primary-backup, chain replication,
Viewstamped Replication, and NOPaxos. Experimental results show that Harmonia
improves the throughput of these protocols by up to 10X for a replication
factor of 10, providing near-linear scalability up to the limit of our testbed.
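
The routing rule described in the abstract (any replica may serve a read for an object with no pending writes) can be sketched as below. This is a deliberately simplified Python model, not Harmonia's switch implementation: the class `ConflictDetector`, the pending-write counter, the round-robin choice, and the fallback to a single designated replica are all assumptions made for illustration.

```python
class ConflictDetector:
    """Tracks objects with in-flight writes and routes reads accordingly."""

    def __init__(self, replicas, primary):
        self.replicas = list(replicas)
        self.primary = primary   # replica that handles writes and conflicting reads
        self.pending = {}        # key -> number of in-flight writes
        self._next = 0           # round-robin cursor for conflict-free reads

    def on_write(self, key):
        # Mark the object as having a pending write and send it to the primary.
        self.pending[key] = self.pending.get(key, 0) + 1
        return self.primary

    def on_write_complete(self, key):
        # Clear the mark once the last in-flight write for the key has finished.
        remaining = self.pending.get(key, 0) - 1
        if remaining > 0:
            self.pending[key] = remaining
        else:
            self.pending.pop(key, None)

    def route_read(self, key):
        # Reads that conflict with a pending write follow the normal path.
        if key in self.pending:
            return self.primary
        # Conflict-free reads can be spread over all replicas.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica


detector = ConflictDetector(["r0", "r1", "r2"], primary="r0")
detector.on_write("user:1")
print(detector.route_read("user:1"))  # conflicting read goes to the primary
print(detector.route_read("user:2"))  # conflict-free read goes to some replica
```

Spreading conflict-free reads over all replicas is what lets read throughput grow with the replication factor in this model.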
Seamless Paxos Coordinators
The Paxos algorithm requires a single correct coordinator process to operate.
After a failure, the replacement of the coordinator may lead to a temporary
unavailability of the application implemented atop Paxos. So far, this
unavailability has been addressed by reducing the coordinator replacement rate
through the use of stable coordinator selection algorithms. We have observed
that the cost of recovery of the newly elected coordinator's state is at the
core of this unavailability problem. In this paper we present a new technique
to manage coordinator replacement that allows the recovery to occur
concurrently with new consensus rounds. Experimental results show that our
seamless approach effectively solves the temporary unavailability problem: its
adoption entails uninterrupted execution of the application. Our solution
removes the restriction that the occurrence of coordinator replacements is
something to be avoided, allowing the decoupling of the application execution
from the accuracy of the mechanism used to choose a coordinator. This result
increases the performance of the application even in the presence of failures
and is of special importance to the autonomous operation of replicated
applications that have to adapt to varying network conditions and partial
failures.
Comment: 11 pages, final published version, with correct experimental data
Response Time and Availability Study of RAFT Consensus in Distributed SDN Control Plane
Software defined networking (SDN) promises unprecedented flexibility and ease
of network operations. While flexibility is an important factor when leveraging
advantages of a new technology, critical infrastructure networks also have
stringent requirements on network robustness and control plane delays.
Robustness in the SDN control plane is realized by deploying multiple
distributed controllers, formed into clusters for durability and fast-failover
purposes. However, the effect of the controller clustering on the total system
response time is not well investigated in current literature. Hence, in this
work we provide a detailed analytical study of the distributed consensus
algorithm RAFT, implemented in OpenDaylight and ONOS SDN controller platforms.
In those controllers, RAFT implements the data-store replication, leader
election after controller failures and controller state recovery on successful
repairs. To evaluate its performance, we introduce a framework for numerical
analysis of various SDN cluster organizations w.r.t. their response time and
availability metrics. We use Stochastic Activity Networks for modeling the RAFT
operations, failure injection and cluster recovery processes, and using
real-world experiments, we collect the rate parameters to provide realistic
inputs for a representative cluster recovery model. We also show how a fast
rejuvenation mechanism for the treatment of failures induced by software errors
can minimize the total response time experienced by the controller clients,
while guaranteeing a higher system availability in the long term.
Comment: 14 pages
Exploiting Commutativity For Practical Fast Replication
Traditional approaches to replication require client requests to be ordered
before making them durable by copying them to replicas. As a result, clients
must wait for two round-trip times (RTTs) before updates complete. In this
paper, we show that this entanglement of ordering and durability is unnecessary
for strong consistency. Consistent Unordered Replication Protocol (CURP) allows
clients to replicate requests that have not yet been ordered, as long as they
are commutative. This strategy allows most operations to complete in 1 RTT (the
same as an unreplicated system). We implemented CURP in the Redis and RAMCloud
storage systems. In RAMCloud, CURP improved write latency by ~2x (13.8 us ->
7.3 us) and write throughput by 4x. Compared to unreplicated RAMCloud, CURP's
latency overhead for 3-way replication is just 0.4 us (6.9 us vs 7.3 us). CURP
transformed a non-durable Redis cache into a consistent and durable storage
system with only a small performance overhead.
Comment: 16 pages, 13 figures
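
The commutativity condition in the abstract can be illustrated with a small sketch. It assumes, purely for the example, that two operations commute when they touch disjoint sets of keys, and that a component called `Witness` here keeps the not-yet-ordered requests; the class and method names are hypothetical and the code is not taken from the CURP implementation.

```python
class Witness:
    """Holds unordered requests and accepts only mutually commutative ones."""

    def __init__(self):
        self.unsynced = {}  # key -> id of the unordered request touching it

    def record(self, request_id, keys):
        # Reject if any key is already touched by an unordered request:
        # under the disjoint-keys assumption, such requests do not commute.
        if any(key in self.unsynced for key in keys):
            return False
        for key in keys:
            self.unsynced[key] = request_id
        return True

    def gc(self, keys):
        # Drop entries once the corresponding request has been ordered and
        # made durable through the normal replication path.
        for key in keys:
            self.unsynced.pop(key, None)


witness = Witness()
print(witness.record("req-1", ["x"]))  # accepted: nothing else touches "x"
print(witness.record("req-2", ["y"]))  # accepted: commutes with req-1
print(witness.record("req-3", ["x"]))  # rejected: conflicts with req-1
```

When a request is rejected, a client in this model would fall back to the ordered path, which costs the extra round trip that the fast path avoids.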