18 research outputs found

Rethinking State-Machine Replication for Parallelism

Author: Bezerra Carlos Eduardo
Marandi Parisa Jalili
Pedone Fernando
Publication date: 24/11/2013

State-machine replication, a fundamental approach to designing fault-tolerant services, requires commands to be executed in the same order by all replicas. Moreover, command execution must be deterministic: each replica must produce the same output upon executing the same sequence of commands. These requirements usually result in single-threaded replicas, which hinders service performance. This paper introduces Parallel State-Machine Replication (P-SMR), a new approach to parallelism in state-machine replication. P-SMR scales better than previous proposals since no component plays a centralizing role in the execution of independent commands, that is, those that can be executed concurrently, as defined by the service. The paper introduces P-SMR, describes a "commodified architecture" to implement it, and compares its performance to other proposals using a key-value store and a networked file system.
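The two requirements named in this abstract (same order, deterministic execution) and the idea of independent commands can be illustrated with a minimal sketch. This is not the P-SMR architecture: the key-value commands, the `run_replica` helper, and the `independent` conflict rule are assumptions made for illustration only.

```python
def apply(state, command):
    """Apply one deterministic key-value command and return its output."""
    op, key, *value = command
    if op == "put":
        state[key] = value[0]
        return "ok"
    return state.get(key)  # "get"


def run_replica(commands):
    """Execute commands sequentially; return the final state and all outputs."""
    state, outputs = {}, []
    for command in commands:
        outputs.append(apply(state, command))
    return state, outputs


def independent(a, b):
    """Assumed conflict rule: two commands are independent unless they
    touch the same key and at least one of them writes."""
    return a[1] != b[1] or (a[0] == "get" and b[0] == "get")


log = [("put", "x", 1), ("put", "y", 2), ("get", "x")]

# Two replicas fed the same sequence end in the same state with the same outputs.
replica_a = run_replica(log)
replica_b = run_replica(log)

# Reordering two independent commands does not change the final state.
reordered = run_replica([log[1], log[0], log[2]])
```

Under this assumed rule, the two `put` commands on different keys could run concurrently, while the `get` on `x` must stay ordered after the `put` on `x`.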

arXiv.org e-Print Archive

Crossref

Dynamic group communication

Author: Schiper André
Publication date: 18/06/2018

Group communication is the basic infrastructure for implementing fault-tolerant replicated servers. While group communication is well understood in the context of static groups (in which the membership does not change), current specifications of dynamic group communication (in which processes can join and leave groups during the computation) have not yet reached the same level of maturity. The paper proposes new specifications, in the primary partition model, for dynamic reliable broadcast (simply called "reliable multicast"), dynamic atomic broadcast (simply called "atomic multicast") and group membership. In the special case of a static system, the new specifications are identical to the well-known static specifications. Interestingly, not only are these new specifications "syntactically" close to the static specifications, but they are also "semantically" close to the dynamic specifications proposed in the literature. We believe that this should help clarify a topic that has always been difficult for outsiders to understand. Finally, the paper shows how to solve atomic multicast, group membership and reliable broadcast. The solution of atomic multicast is close to the (static) atomic broadcast solution based on reduction to consensus. Group membership is solved using atomic multicast. Reliable multicast can be efficiently solved by relying on a thrifty generic multicast algorithm.

RERO DOC Digital Library

Dynamic Group Communication

Author: Schiper André
Publication date: 13/07/2005

Group communication is the basic infrastructure for implementing fault-tolerant replicated servers. While group communication is well understood in the context of static systems (in which all processes are created at the start of the computation), current specifications of dynamic group communication (in which processes can be added and removed during the computation) are not satisfactory. The paper proposes new specifications for dynamic reliable broadcast (which we call reliable multicast), dynamic atomic broadcast (which we call atomic multicast) and group membership. In the special case of a static system, our specifications are identical to the well-known static specifications. The specification of group membership is derived from the specification of atomic multicast. The paper also shows how to solve atomic multicast, group membership and reliable broadcast. The solution of atomic multicast is close to the (static) atomic broadcast solution based on reduction to consensus. Group membership is solved using atomic multicast. In the context of reliable multicast, we introduce the notion of a thrifty solution, and show that such a solution can be obtained by relying on a thrifty generic multicast algorithm.

Infoscience - École polytechnique fédérale de Lausanne

Scalable service-oriented replication with flexible consistency guarantee in the cloud

Author: Abdel-Rahman H. Tawil
Amjad
Andronikou
Bernstein
Birman
Chang
Coulouris
DeCandia
Défago
Ekwall
Hsiao
Lamport
Li
Lloret
Mansouri
Osrael
Powell
Rami Bahsoon
Saltzer
Schneider
Sheng
Tao Chen
Villegas
Vogels
Wei
Zhang
Zhu
Publication venue: 'Elsevier BV'
Publication date: 28/11/2013

Replication techniques are widely applied in the cloud to improve scalability and availability. In this context, a well-understood problem is how to guarantee consistency amongst different replicas and govern the trade-off between consistency and scalability requirements. Such requirements are often related to specific services and can vary considerably in the cloud. However, a major drawback of existing service-oriented replication approaches is that they allow either restricted consistency or none at all. Consequently, service-oriented systems based on such replication techniques may violate consistency requirements or not scale well. In this paper, we present a Scalable Service Oriented Replication (SSOR) solution, a middleware that is capable of satisfying applications' consistency requirements when replicating cloud-based services. We introduce a new formalism for describing services in service-oriented replication. We propose the notion of consistency regions and relevant service-oriented requirements policies, by which trading between consistency and scalability requirements can be handled within regions. We solve the associated sub-problem of atomic broadcasting by introducing a Multi-fixed Sequencers Protocol (MSP), which is a requirements-aware variation of the traditional fixed sequencer approach. We also present a Region-based Election Protocol (REP) that elastically balances the workload amongst sequencers. Finally, we experimentally evaluate our approach under different loads, to show that the proposed approach achieves better scalability with more flexible consistency constraints when compared with the state-of-the-art replication technique.
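For readers unfamiliar with the traditional fixed sequencer approach that this abstract builds on, the following is a minimal single-process sketch. It is not MSP or REP: the `Sequencer` and `Replica` classes and their method names are hypothetical, and failures, networking, and sequencer election are all left out.

```python
class Sequencer:
    """A single fixed process that assigns consecutive sequence numbers."""

    def __init__(self):
        self.next_seq = 0

    def order(self, message):
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Replica:
    """Buffers sequenced messages and delivers them strictly in sequence order."""

    def __init__(self):
        self.pending = {}
        self.next_to_deliver = 0
        self.delivered = []

    def receive(self, seq, message):
        self.pending[seq] = message
        # Deliver every message whose predecessors have all been delivered.
        while self.next_to_deliver in self.pending:
            self.delivered.append(self.pending.pop(self.next_to_deliver))
            self.next_to_deliver += 1


sequencer = Sequencer()
ordered = [sequencer.order(m) for m in ["a", "b", "c"]]

r1, r2 = Replica(), Replica()
for seq, msg in ordered:            # r1 receives in sequence order
    r1.receive(seq, msg)
for seq, msg in reversed(ordered):  # r2 receives in reverse arrival order
    r2.receive(seq, msg)
```

Both replicas deliver the messages in the same total order regardless of arrival order, which is the property atomic broadcast provides. The single sequencer is also the scalability bottleneck that a multi-sequencer design would aim to relieve.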

Crossref

Birmingham City University Open Access Repository

University of Birmingham Research Portal

Nottingham Trent Institutional Repository (IRep)

BCU Open Access

Solving atomic multicast when groups crash

Author: C. Delporte-Gallet
F. Pedone
F.B. Schneider
G.V. Chockler
K.P. Birman
L. Lamport
L. Rodrigues
M. Aguilera
M.K. Aguilera
N. Schiper
R. Guerraoui
T.D. Chandra
U. Fritzke Jr.
U. Fritzke
V. Hadzilacos
X. Défago
Publication date: 01/01/2008

In this paper, we study the atomic multicast problem, a fundamental abstraction for building fault-tolerant systems. In the atomic multicast problem, the system is divided into non-empty and disjoint groups of processes. Multicast messages may be addressed to any subset of groups, each message possibly being multicast to a different subset. Several papers previously studied this problem either in local area networks [3, 9, 20] or wide area networks [13, 21]. However, none of them considered atomic multicast when groups may crash. We present two atomic multicast algorithms that tolerate the crash of groups. The first algorithm tolerates an arbitrary number of failures, is genuine (i.e., to deliver a message m, only addressees of m are involved in the protocol), and uses the perfect failure detector P. We show that among realistic failure detectors, i.e., those that do not predict the future, P is necessary to solve genuine atomic multicast if we do not bound the number of processes that may fail. Thus, P is the weakest realistic failure detector for solving genuine atomic multicast when an arbitrary number of processes may crash. Our second algorithm is non-genuine and less resilient to process failures than the first algorithm but has several advantages: (i) it requires perfect failure detection within groups only, and not across the system, (ii) as we show in the paper, it can be modified to rely on unreliable failure detection at the cost of a weaker liveness guarantee, and (iii) it is fast: messages addressed to multiple groups may be delivered within two inter-group message delays only.

Crossref

RERO DOC Digital Library

Hermes: a Fast, Fault-Tolerant and Linearizable Replication Protocol

Author: Adya Atul
Aguilera Marcos
Aleksandar Dragojević
Anwar Ali
Baker Jason
Balakrishnan Mahesh
Behrens Jonathan
Brian
Bronson Nathan
Burrows Mike
Consistent
DeCandia Giuseppe
Gray Jim
Hunt Patrick
István Zsolt
Jha Sagar
Jin Xin
Kalia Anuj
Leslie Lamport
Li Jialin
Lim Hyeontaek
Lu Yuanwei
Mao Yanhua
Nightingale Edmund B.
Ongaro Diego
Poke Marius
Reed Benjamin
Renesse Robbert Van
Terrace Jeff
van Renesse Robbert
Wei Michael
Woo Shinae
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/01/2020

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee an intuitive behavior for the clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency. This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault-tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6X lower than that of CRAQ and ZAB. Comment: Accepted in ASPLOS 202
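The abstract's point about resolving write conflicts locally with logical timestamps can be illustrated with a small sketch. This is not the Hermes protocol: it omits invalidations, acknowledgements, and write replays entirely, and the `(version, node_id)` timestamp layout and the `apply_write` helper are assumptions made for illustration.

```python
def apply_write(store, key, value, timestamp):
    """Keep the write with the highest logical timestamp for each key.

    `timestamp` is an assumed (version, node_id) pair; Python compares
    tuples lexicographically, so node_id breaks ties between equal versions.
    """
    current = store.get(key)
    if current is None or timestamp > current[0]:
        store[key] = (timestamp, value)


# Two concurrent writes to the same key, issued by node 1 and node 2.
writes = [
    ("k", "from-node-1", (1, 1)),
    ("k", "from-node-2", (1, 2)),
]

replica_a, replica_b = {}, {}
for key, value, ts in writes:            # replica A sees node 1's write first
    apply_write(replica_a, key, value, ts)
for key, value, ts in reversed(writes):  # replica B sees node 2's write first
    apply_write(replica_b, key, value, ts)
```

Because every replica applies the same deterministic comparison, both converge on the same winner without consulting a central ordering point, and neither write has to abort.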

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Solving Agreement Problems with Weak Ordering Oracles

Author: Cavin David
Pedone Fernando
Schiper André
Urbán Péter
Publication date: 20/05/2005

Agreement problems, such as consensus, atomic broadcast, and group membership, are central to the implementation of fault-tolerant distributed systems. Despite the diversity of algorithms that have been proposed for solving agreement problems in the past years, almost all solutions are crash detection based (CDB). We say that an algorithm is CDB if it uses some information about the crashed/not-crashed status of processes. Randomized consensus algorithms are rare examples of non-CDB algorithms. In this paper, we revisit the issue of non-CDB algorithms. Instead of randomization, we consider ordering oracles. Ordering oracles have a theoretical interest (e.g., they extend the state of the art of non-CDB algorithms) as well as a practical interest (e.g., they remove altogether the burden involved in tuning timeout mechanisms). To illustrate their use, we present solutions to consensus and atomic broadcast, and evaluate the performance of the atomic broadcast algorithm in a cluster of workstations.

Infoscience - École polytechnique fédérale de Lausanne

Group Communication: From Practice to Theory

Author: Schiper André
Publication date: 26/05/2008

Infoscience - École polytechnique fédérale de Lausanne

Handling message semantics with Generic Broadcast protocols

Author: Pedone F.
Schiper A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/05/2005

Infoscience - École polytechnique fédérale de Lausanne

Dependable Systems

Author: Schiper André
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/02/2012

Improving the dependability of computer systems is a critical and essential task. In this context, the paper surveys techniques that achieve fault tolerance in distributed systems by replication. The main replication techniques are first explained. Then group communication is introduced as the communication infrastructure that allows the implementation of the different replication techniques. Finally, the difficulty of implementing group communication is discussed, and the most important algorithms are presented.

Infoscience - École polytechnique fédérale de Lausanne