Hermes: a Fast, Fault-Tolerant and Linearizable Replication Protocol

Authors: Adya Atul; Aguilera Marcos; Anwar Ali; Baker Jason; Balakrishnan Mahesh; Behrens Jonathan; Brian; Bronson Nathan; Burrows Mike; DeCandia Giuseppe; Dragojević Aleksandar; Gray Jim; Hunt Patrick; István Zsolt; Jha Sagar; Jin Xin; Kalia Anuj; Lamport Leslie; Li Jialin; Lim Hyeontaek; Lu Yuanwei; Mao Yanhua; Nightingale Edmund B.; Ongaro Diego; Poke Marius; Reed Benjamin; van Renesse Robbert; Terrace Jeff; Wei Michael; Woo Shinae

Publication date: 27 January 2020
Publisher: Association for Computing Machinery (ACM)

Abstract

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models, which cannot guarantee intuitive behavior for clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency. This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort), and provide fault tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6× lower than that of CRAQ and ZAB.

Comment: Accepted in ASPLOS 202
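
The abstract's combination of logical timestamps and invalidations can be illustrated with a per-key state machine. The following is a simplified, single-process Python sketch, not the authors' implementation. It assumes no failures, instantaneous reliable delivery, and elided ACK collection, and it omits write replay. It also reduces the per-key states to Valid and Invalid. The names `Replica`, `Entry`, `State`, `start_write`, `on_inv`, `on_val`, and `replicate` are hypothetical. Timestamps are assumed to be `(version, node_id)` pairs compared lexicographically, so every replica resolves concurrent writes to the same key identically without consulting a central ordering point.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple

# Hypothetical logical timestamp (version, node_id), compared lexicographically:
# equal versions are tie-broken by node id, so concurrent writes to the same key
# get a total order without a centralized sequencer.
Timestamp = Tuple[int, int]


class State(Enum):
    VALID = "valid"      # local reads may be served
    INVALID = "invalid"  # a write is in flight; local reads must stall


@dataclass
class Entry:
    value: Optional[str] = None
    ts: Timestamp = (0, 0)
    state: State = State.VALID


@dataclass
class Replica:
    node_id: int
    store: Dict[str, Entry] = field(default_factory=dict)

    def _entry(self, key: str) -> Entry:
        return self.store.setdefault(key, Entry())

    def read(self, key: str) -> Optional[str]:
        """Local read; returns None to signal 'stall' while the key is Invalid."""
        e = self._entry(key)
        return e.value if e.state is State.VALID else None

    def start_write(self, key: str, value: str) -> Timestamp:
        """Coordinator: bump the version, apply locally, return the INV timestamp."""
        e = self._entry(key)
        if e.state is not State.VALID:  # sketch assumption: writes wait for Valid
            raise RuntimeError("key is Invalid; retry after it is validated")
        ts = (e.ts[0] + 1, self.node_id)
        e.value, e.ts, e.state = value, ts, State.INVALID
        return ts

    def on_inv(self, key: str, ts: Timestamp, value: str) -> None:
        """Follower: keep whichever write has the higher timestamp. The lower one
        is ordered earlier rather than aborted; an ACK would be sent either way."""
        e = self._entry(key)
        if ts > e.ts:
            e.value, e.ts, e.state = value, ts, State.INVALID

    def on_val(self, key: str, ts: Timestamp) -> None:
        """VAL: re-enable local reads only if no newer write has overwritten the key."""
        e = self._entry(key)
        if e.ts == ts:
            e.state = State.VALID


def replicate(coord: Replica, replicas: List[Replica], key: str, value: str) -> Timestamp:
    """Broadcast INV, assume every replica ACKs, then broadcast VAL."""
    ts = coord.start_write(key, value)
    for r in replicas:
        if r is not coord:
            r.on_inv(key, ts, value)
    for r in replicas:  # ACK collection elided: no failures in this sketch
        r.on_val(key, ts)
    return ts


if __name__ == "__main__":
    nodes = [Replica(i) for i in range(3)]
    replicate(nodes[0], nodes, "x", "v1")
    # Two concurrent writes; node 1 receives the INVs in the opposite order.
    ts_a = nodes[0].start_write("x", "A")  # (2, 0)
    ts_b = nodes[2].start_write("x", "B")  # (2, 2)
    nodes[1].on_inv("x", ts_b, "B")
    nodes[1].on_inv("x", ts_a, "A")
    nodes[0].on_inv("x", ts_b, "B")
    nodes[2].on_inv("x", ts_a, "A")
    for n in nodes:
        n.on_val("x", ts_a)
        n.on_val("x", ts_b)
    print([n.read("x") for n in nodes])  # ['B', 'B', 'B']
```

In the demo, all three replicas converge on `B` regardless of delivery order, because `(2, 2) > (2, 0)`. Write `A` is not aborted. It is simply ordered before `B`, and its VAL has no effect once `B` has overwritten the key.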
