Performance Engineering of a Lightweight Fault Tolerance Framework

Author: Chai Hua
Publication venue: EngagedScholarship@CSU
Publication date: 01/01/2009

It is well known that the Paxos algorithm can be used to build provably correct, practical fault-tolerant systems. This thesis presents a lightweight consensus framework, the Paxos-Based Fault Tolerance (PFT) framework, together with its practical implementation. It also describes how the system tolerates faults under practical conditions where the replicas might not be strictly homogeneous due to the asynchrony of their deployment environment. A comprehensive performance evaluation of the PFT framework is carried out. Approaches that can optimize the fault tolerance mechanisms under various practical scenarios are also discussed.

Cleveland-Marshall College of Law

OhioLINK Electronic Thesis and Dissertation Center

ShallowForest: Optimizing All-to-All Data Transmission in WANs

Author: Tan Hao
Publication venue: University of Waterloo
Publication date: 16/05/2019

All-to-all data transmission is a typical data transmission pattern in both consensus protocols and blockchain systems. An optimization scheme that provides high-throughput, low-latency data transmission can significantly improve the performance of those systems. This thesis investigates the problem of optimizing all-to-all data transmission in a wide area network (WAN) using overlay multicast. I first prove that, in a congestion-free core network model, shallow tree overlays with height up to two are sufficient for all-to-all data transmission to achieve the optimal throughput allowed by the available network resources. Based on this finding, I build ShallowForest, a data plane optimization for consensus protocols and blockchain systems. The goal of ShallowForest is to improve consensus protocols' resilience to skewed client load distribution. Experiments with skewed client load across replicas in the Amazon cloud demonstrate that ShallowForest can improve the commit throughput of the EPaxos consensus protocol by up to 100%, with up to 60% reduction in commit latency.

University of Waterloo's Institutional Repository

The Next 700 BFT Protocols

Authors: Guerraoui Rachid, Knezevic Nikola, Quema Vivien, Vukolic Marko
Publication venue: HAL CCSD
Publication date: 13/04/2010

This article presents a framework that facilitates the development of Byzantine fault-tolerant state machine replication protocols.

Hal - Université Grenoble Alpes

Incremental Consistency Guarantees for Replicated Objects

Authors: Guerraoui Rachid, Pavlovic Matej, Seredinschi Dragos-Adrian
Publication date: 08/09/2016

Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary (fast, possibly inconsistent) result and a final (consistent) result that arrives later. We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%.
Comment: 16 total pages, 12 figures. OSDI'16 (to appear)
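The idea of speculating on a preliminary value and confirming it against the final one can be illustrated with a minimal Python sketch. This is not the Correctables API: the names `incremental_read` and `speculate`, the view labels, and the callables passed in are all hypothetical, and the sketch runs sequentially, whereas a real system would overlap the speculative computation with the slower, strongly consistent read.

```python
from typing import Callable, Iterator, Tuple, TypeVar

V = TypeVar("V")
R = TypeVar("R")

_MISSING = object()  # sentinel: no preliminary view has been seen yet


def incremental_read(
    weak_read: Callable[[], V], strong_read: Callable[[], V]
) -> Iterator[Tuple[str, V]]:
    """Hypothetical helper: yield a fast preliminary view, then the final view."""
    yield "preliminary", weak_read()
    yield "final", strong_read()


def speculate(
    views: Iterator[Tuple[str, V]], compute: Callable[[V], R]
) -> Tuple[R, bool]:
    """Run `compute` on the preliminary value and keep that result if the
    final value matches; otherwise recompute on the final value.

    Returns (result, recomputed_on_final).
    """
    guess = _MISSING
    result = None
    for level, value in views:
        if level == "preliminary":
            guess = value
            result = compute(value)  # speculative work on the fast view
        elif level == "final":
            if guess is _MISSING or value != guess:
                return compute(value), True  # misspeculation: redo the work
            return result, False  # speculation confirmed
    raise ValueError("no final view received")


if __name__ == "__main__":
    # Hypothetical reads: the weak replica is stale in the second case.
    print(speculate(incremental_read(lambda: 3, lambda: 3), lambda x: x * 2))
    print(speculate(incremental_read(lambda: 3, lambda: 4), lambda x: x * 2))
```

When the two views agree, the speculative result is returned without further work; when they differ, the extra computation is the cost of misspeculation.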

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

High-performance state-machine replication

Authors: Jalili Marandi Parisa, Pedone Fernando
Publication date: 23/09/2014

Replication, a common approach to protecting applications against failures, refers to maintaining several copies of a service on independent machines (replicas). Unlike a stand-alone service, a replicated service remains available to its clients despite the failure of some of its copies. Consistency among replicas is an immediate concern raised by replication. In effect, an important factor for providing the illusion of an uninterrupted service to clients is to preserve consistency among the multiple copies. State-machine replication is a popular replication technique that ensures consistency by ordering client requests and making all the replicas execute them deterministically and sequentially. The overhead of ordering the requests, and the sequentiality of request execution, the two essential requirements in realizing state-machine replication, are also the two major obstacles that prevent the performance of state-machine replication from scaling. In this thesis we concentrate on the performance of state-machine replication and enhance it by overcoming the two aforementioned bottlenecks, the overhead of ordering and the overhead of sequentially executing commands. To realize a truly scalable system, one must iteratively examine and analyze all the layers and components of a system and avoid or eliminate potential performance obstructions and congestion points. In this dissertation, we iterate between optimizing the ordering of requests and the strategies of replicas at request execution, in order to stretch the performance boundaries of state-machine replication. To eliminate the negative implications of the ordering layer on performance, we devise and implement several novel and highly efficient ordering protocols. Our proposals are based on practical observations we make after closely assessing and identifying the shortcomings of existing approaches. 
Communication is one of the most important components of any distributed system, so selecting efficient communication patterns is a must in designing scalable systems. We base our protocols on the most suitable communication patterns and extend their design with additional features that together account for our protocols' high efficiency. The outcome of this phase is the design and implementation of the Ring Paxos family of protocols. According to our evaluations, these protocols are highly scalable and efficient. We then assess the performance ramifications of sequential execution of requests on the replicas of state-machine replication. We use known techniques such as state partitioning and speculative execution, and thoroughly examine their advantages when combined with our ordering protocols. We then exploit the features of multicore hardware and propose our final solution, a parallelized form of state-machine replication built on top of the Ring Paxos protocols that is capable of achieving high performance. Given the popularity of state-machine replication in designing fault-tolerant systems, we hope this thesis provides useful and practical guidelines for enhancing existing fault-tolerant systems and designing future ones that share similar performance goals.
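The core property of state-machine replication described in this abstract (replicas that execute the same ordered requests deterministically and sequentially end in the same state) can be shown with a small Python sketch. It is not taken from the thesis: the `Replica` class, the `set`/`add` command format, and the `replicate` helper are hypothetical, and the ordered log is assumed to come from some ordering protocol that is not modeled here.

```python
from typing import Dict, Iterable, List, Tuple

Command = Tuple[str, str, int]  # hypothetical format: (operation, key, value)


class Replica:
    """A toy deterministic state machine: an integer key-value store."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.applied = 0  # number of commands executed so far

    def apply(self, command: Command) -> None:
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied += 1


def replicate(ordered_log: Iterable[Command], replicas: List[Replica]) -> None:
    """Deliver every command to every replica in the same (agreed) order."""
    for command in ordered_log:
        for replica in replicas:
            replica.apply(command)


if __name__ == "__main__":
    log = [("set", "x", 1), ("add", "x", 2), ("set", "y", 5)]
    replicas = [Replica() for _ in range(3)]
    replicate(log, replicas)
    print([r.state for r in replicas])

    # Executing the same commands in a different order can diverge,
    # which is why the ordering layer is essential.
    reordered = Replica()
    replicate([("add", "x", 2), ("set", "x", 1), ("set", "y", 5)], [reordered])
    print(reordered.state)
```

The sequential loop in `replicate` also makes the two bottlenecks named in the abstract visible: every command must first be ordered, and each replica executes commands one at a time.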

RERO DOC Digital Library