562 research outputs found

Paxos Made Switch-y

Author: Canini Marco
Dang Huynh Tu
Pedone Fernando
Soulé Robert
Publication date: 16/11/2015

This paper describes an implementation of the well-known consensus protocol, Paxos, in the P4 programming language. P4 is a language for programming the behavior of network forwarding devices (i.e., the network data plane). Moving consensus logic into network devices could significantly improve the performance of the core infrastructure and services in data centers. Moreover, implementing Paxos in P4 provides a critical use case and set of requirements for data plane language designers. In the long term, we imagine that consensus could someday be offered as a network service, just as point-to-point communication is provided today.

arXiv.org e-Print Archive

Network Hardware-Accelerated Consensus

Author: Bressana Pietro
Canini Marco
Dang Huynh Tu
Lee Ki Suh
Pedone Fernando
Soulé Robert
Wang Han
Weatherspoon Hakim
Publication date: 18/05/2016

Consensus protocols are the foundation for building many fault-tolerant distributed systems and services. This paper posits that there are significant performance benefits to be gained by offering consensus as a network service (CAANS). CAANS leverages recent advances in commodity networking hardware design and programmability to implement consensus protocol logic in network devices. CAANS provides a complete Paxos protocol, is a drop-in replacement for software-based implementations of Paxos, makes no restrictions on network topologies, and is implemented in a higher-level, data-plane programming language, allowing for portability across a range of target devices. At the same time, CAANS significantly increases throughput and reduces latency for consensus operations. Consensus logic executing in hardware can transmit consensus messages at line speed, with latency only slightly higher than simply forwarding packets.

arXiv.org e-Print Archive

The Performance of Paxos and Fast Paxos

Author: Buzato Luiz E.
Vieira Gustavo M. D.
Publication date: 06/08/2013

Paxos and Fast Paxos are optimal consensus algorithms that are simple and elegant, while suitable for efficient implementation. In this paper, we compare the performance of both algorithms in failure-free and failure-prone runs using Treplica, a general replication toolkit that implements these algorithms in a modular and efficient manner. We have found that Paxos outperforms Fast Paxos for a small number of replicas and that collisions are not the cause of this performance difference. Comment: 14 pages, published in the Proc. of the 27th Brazilian Symposium on Computer Networks, Recife, Brazil, May 200

arXiv.org e-Print Archive

Ring Paxos: High-Throughput Atomic Broadcast

Author: Marandi Parisa Jalili
Pedone Fernando
Primi Marco
Schiper Nicolas
Publication date: 23/01/2014

Atomic broadcast is an important communication primitive often used to implement state-machine replication. Despite the large number of atomic broadcast algorithms proposed in the literature, few papers have discussed how to turn these algorithms into efficient executable protocols. This paper focuses on a class of atomic broadcast algorithms based on Paxos, with its corresponding desirable properties: safety under asynchrony assumptions, liveness under weak synchrony assumptions, and resiliency-optimality. The paper presents two protocols, M-Ring Paxos and U-Ring Paxos, derived from Paxos. The protocols inherit the properties of Paxos and can be implemented very efficiently. We report a detailed performance analysis of M-Ring Paxos and U-Ring Paxos and compare them to other atomic broadcast protocols.

arXiv.org e-Print Archive

Spectrum: A Framework for Adapting Consensus Protocols

Author: Arun Balaji
Peluso Sebastiano
Ravindran Binoy
Publication date: 15/02/2019

There exists a plethora of consensus protocols in the literature. The reason is that there is no one-size-fits-all solution, since every protocol is unique and its performance is directly tied to the deployment settings and workload configurations. Some protocols are well suited for geographical-scale environments, e.g., leaderless, while others provide high performance under workloads with high contention, e.g., single leader-based. Thus, existing protocols seldom adapt to changing workload conditions. To overcome this limitation, we propose Spectrum, a consensus framework that is able to switch consensus protocols at run-time, to enable a dynamic reaction to changes in the workload characteristics and deployment scenarios. With this framework, we provide transparent instantiation of various consensus protocols, and a completely asynchronous switching mechanism with zero downtime. We assess the effectiveness of Spectrum via an extensive experimental evaluation, which shows that Spectrum is able to limit the increase of the user-perceived latency when switching among consensus protocols.

arXiv.org e-Print Archive

NetChain: Scale-Free Sub-RTT Coordination (Extended Version)

Author: Foster Nate
Jin Xin
Kim Changhoon
Lee Jeongkeun
Li Xiaozhou
Soulé Robert
Stoica Ion
Zhang Haoyu
Publication date: 22/02/2018

Coordination services are a fundamental building block of modern cloud systems, providing critical functionalities like configuration management and distributed locking. The major challenge is to achieve low latency and high throughput while providing strong consistency and fault-tolerance. Traditional server-based solutions require multiple round-trip times (RTTs) to process a query. This paper presents NetChain, a new approach that provides scale-free sub-RTT coordination in datacenters. NetChain exploits recent advances in programmable switches to store data and process queries entirely in the network data plane. This eliminates the query processing at coordination servers and cuts the end-to-end latency to as little as half of an RTT: clients only experience processing delay from their own software stack plus network delay, which in a datacenter setting is typically much smaller. We design new protocols and algorithms based on chain replication to guarantee strong consistency and to efficiently handle switch failures. We implement a prototype with four Barefoot Tofino switches and four commodity servers. Evaluation results show that compared to traditional server-based solutions like ZooKeeper, our prototype provides orders of magnitude higher throughput and lower latency, and handles failures gracefully.

arXiv.org e-Print Archive

Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection

Author: Bai Zhihao
Jin Xin
Li Jialin
Michael Ellis
Ports Dan
Stoica Ion
Zhu Hang
Publication date: 18/04/2019

Distributed storage employs replication to mask failures and improve availability. However, these systems typically exhibit a hard tradeoff between consistency and performance. Ensuring consistency introduces coordination overhead, and as a result the system throughput does not scale with the number of replicas. We present Harmonia, a replicated storage architecture that exploits the capability of new-generation programmable switches to obviate this tradeoff by providing near-linear scalability without sacrificing consistency. To achieve this goal, Harmonia detects read-write conflicts in the network, which enables any replica to serve reads for objects with no pending writes. Harmonia implements this functionality at line rate, thus imposing no performance overhead. We have implemented a prototype of Harmonia on a cluster of commodity servers connected by a Barefoot Tofino switch, and have integrated it with Redis. We demonstrate the generality of our approach by supporting a variety of replication protocols, including primary-backup, chain replication, Viewstamped Replication, and NOPaxos. Experimental results show that Harmonia improves the throughput of these protocols by up to 10X for a replication factor of 10, providing near-linear scalability up to the limit of our testbed.

arXiv.org e-Print Archive
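
The idea in the Harmonia abstract, that any replica may serve a read when the object has no pending writes, can be illustrated with a small sketch. This is not the paper's switch implementation: the class `ConflictDetector`, its method names, and the choice to send conflicting reads to the first replica are all hypothetical, chosen only to show the routing decision.

```python
import random
from collections import Counter


class ConflictDetector:
    """Toy stand-in for an in-network conflict detector (hypothetical names)."""

    def __init__(self, replicas, seed=0):
        self.replicas = list(replicas)
        self.pending = Counter()        # object key -> number of in-flight writes
        self.rng = random.Random(seed)  # seeded so the sketch is reproducible

    def write_started(self, key):
        # A write to `key` has entered the replication protocol.
        self.pending[key] += 1

    def write_completed(self, key):
        # The write is fully replicated; forget the key once nothing is pending.
        if self.pending[key] > 0:
            self.pending[key] -= 1
        if self.pending[key] == 0:
            del self.pending[key]       # Counter ignores missing keys here

    def route_read(self, key):
        # Reads that conflict with a pending write take the protocol's normal
        # path (assumed here to be the first replica); all other reads may be
        # served by any replica.
        if key in self.pending:
            return self.replicas[0]
        return self.rng.choice(self.replicas)
```

Under these assumptions, reads of uncontended objects spread over all replicas, which is the source of the read scalability the abstract describes, while reads of objects with in-flight writes stay on the consistent path.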

Seamless Paxos Coordinators

Author: Buzato Luiz E.
Garcia Islene C.
Vieira Gustavo M. D.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/10/2017

The Paxos algorithm requires a single correct coordinator process to operate. After a failure, the replacement of the coordinator may lead to a temporary unavailability of the application implemented atop Paxos. So far, this unavailability has been addressed by reducing the coordinator replacement rate through the use of stable coordinator selection algorithms. We have observed that the cost of recovery of the newly elected coordinator's state is at the core of this unavailability problem. In this paper we present a new technique to manage coordinator replacement that allows the recovery to occur concurrently with new consensus rounds. Experimental results show that our seamless approach effectively solves the temporary unavailability problem; its adoption entails uninterrupted execution of the application. Our solution removes the restriction that the occurrence of coordinator replacements is something to be avoided, allowing the decoupling of the application execution from the accuracy of the mechanism used to choose a coordinator. This result increases the performance of the application even in the presence of failures; it is of special importance to the autonomous operation of replicated applications that have to adapt to varying network conditions and partial failures. Comment: 11 pages, final published version, with correct experimental data

arXiv.org e-Print Archive

Response Time and Availability Study of RAFT Consensus in Distributed SDN Control Plane

Author: Kellerer Wolfgang
Sakic Ermin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/02/2019

Software defined networking (SDN) promises unprecedented flexibility and ease of network operations. While flexibility is an important factor when leveraging advantages of a new technology, critical infrastructure networks also have stringent requirements on network robustness and control plane delays. Robustness in the SDN control plane is realized by deploying multiple distributed controllers, formed into clusters for durability and fast-failover purposes. However, the effect of the controller clustering on the total system response time is not well investigated in current literature. Hence, in this work we provide a detailed analytical study of the distributed consensus algorithm RAFT, implemented in OpenDaylight and ONOS SDN controller platforms. In those controllers, RAFT implements the data-store replication, leader election after controller failures and controller state recovery on successful repairs. To evaluate its performance, we introduce a framework for numerical analysis of various SDN cluster organizations w.r.t. their response time and availability metrics. We use Stochastic Activity Networks for modeling the RAFT operations, failure injection and cluster recovery processes, and using real-world experiments, we collect the rate parameters to provide realistic inputs for a representative cluster recovery model. We also show how a fast rejuvenation mechanism for the treatment of failures induced by software errors can minimize the total response time experienced by the controller clients, while guaranteeing a higher system availability in the long term. Comment: 14 pages

arXiv.org e-Print Archive

Exploiting Commutativity For Practical Fast Replication

Author: Ousterhout John
Park Seo Jin
Publication date: 26/10/2017

Traditional approaches to replication require client requests to be ordered before making them durable by copying them to replicas. As a result, clients must wait for two round-trip times (RTTs) before updates complete. In this paper, we show that this entanglement of ordering and durability is unnecessary for strong consistency. Consistent Unordered Replication Protocol (CURP) allows clients to replicate requests that have not yet been ordered, as long as they are commutative. This strategy allows most operations to complete in 1 RTT (the same as an unreplicated system). We implemented CURP in the Redis and RAMCloud storage systems. In RAMCloud, CURP improved write latency by ~2x (13.8 us -> 7.3 us) and write throughput by 4x. Compared to unreplicated RAMCloud, CURP's latency overhead for 3-way replication is just 0.4 us (6.9 us vs 7.3 us). CURP transformed a non-durable Redis cache into a consistent and durable storage system with only a small performance overhead. Comment: 16 pages, 13 figures

arXiv.org e-Print Archive
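
The commutativity condition in the CURP abstract can be made concrete with a rough sketch. It assumes, as a simplification not stated in the abstract, that two requests commute when they touch disjoint keys, and that a helper component holds the not-yet-ordered requests. The class `Witness` and its methods `record` and `gc` are hypothetical names used only for illustration.

```python
class Witness:
    """Holds requests that are durable but not yet ordered (hypothetical sketch)."""

    def __init__(self):
        self.unsynced = {}  # key -> id of the unordered request touching it

    def record(self, request_id, keys):
        # Accept only if the request commutes with everything already held,
        # approximated here as "shares no key with any unordered request".
        if any(k in self.unsynced for k in keys):
            return False    # caller must fall back to the ordered (slower) path
        for k in keys:
            self.unsynced[k] = request_id
        return True

    def gc(self, request_id):
        # Once the request has been ordered and replicated, drop its keys.
        self.unsynced = {
            k: r for k, r in self.unsynced.items() if r != request_id
        }
```

In this sketch, an accepted request corresponds to the 1 RTT case and a rejected one to the slower ordered path. The abstract's own figures are consistent with each other: 13.8 / 7.3 is roughly 1.89, in line with the stated ~2x, and 7.3 - 6.9 gives the stated 0.4 us overhead.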