Scaling the Infrastructure of Practical Blockchain Systems

Abstract

196 pages

The infrastructure of a blockchain system consists of a replication service that tolerates limited adversarial behavior among participants. It requires both a Byzantine fault tolerant (BFT) replication protocol to defend against adversaries and an underlying storage system to preserve state. This dissertation explores two designs for BFT replication and one design for persistent storage.

We first present HotStuff, a leader-based BFT replication protocol for the partially synchronous system model. Once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of actual (rather than maximum) network delay, a property called responsiveness, with communication complexity that is linear in the number of replicas. To the best of our knowledge, HotStuff is the first partially synchronous BFT replication protocol to exhibit both properties. HotStuff is built around a novel framework that bridges classical BFT foundations and blockchains; it allows other known protocols (DLS, PBFT, Tendermint, Casper), as well as ours, to be expressed in a common framework. Our deployment of HotStuff over a network of more than 100 replicas achieves throughput and latency comparable to those of BFT-SMaRt, while incurring only linear communication during leader failover (versus cubic for BFT-SMaRt).

Then, we introduce a family of leaderless BFT protocols that exploit the metastable properties of network subsampling. These protocols provide a strong probabilistic safety guarantee in the presence of Byzantine adversaries, while their concurrent and leaderless nature lets them achieve high throughput and scalability. Unlike blockchains that rely on Proof-of-Work, blockchains built on these protocols are quiescent and green.
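The subsampling idea can be made concrete with a small simulation. The sketch below follows the spirit of a Snowflake-style rule from the Snow family: each undecided node repeatedly queries k randomly sampled peers, adopts a color when at least alpha of them agree, and accepts after beta consecutive successful queries. The function name `snowflake`, the parameter values (k=10, alpha=7, beta=15), and the exact update rule are assumptions for illustration. Every node is correct (no Byzantine behavior is modeled), so this is a toy model, not the dissertation's protocol or its safety analysis.

```python
import random
from typing import List, Optional


def snowflake(initial: List[int], k: int = 10, alpha: int = 7, beta: int = 15,
              max_rounds: int = 10_000, seed: int = 0) -> List[Optional[int]]:
    """Toy Snowflake-style binary vote among n correct nodes (hypothetical helper).

    initial[u] is node u's starting color (0 or 1). Returns each node's
    accepted color, or None if it had not decided within max_rounds.
    """
    n = len(initial)
    # alpha > k/2 guarantees at most one color can win a single query.
    if not (k <= n - 1 and k / 2 < alpha <= k):
        raise ValueError("need k <= n - 1 and k/2 < alpha <= k")
    rng = random.Random(seed)
    pref = list(initial)                       # current preferred color per node
    streak = [0] * n                           # consecutive successful queries
    decided: List[Optional[int]] = [None] * n

    for _ in range(max_rounds):
        if all(d is not None for d in decided):
            break
        for u in range(n):
            if decided[u] is not None:
                continue                       # decided nodes still answer queries
            peers = rng.sample([v for v in range(n) if v != u], k)
            ones = sum(pref[v] for v in peers)
            counts = {0: k - ones, 1: ones}
            winner = next((c for c in (0, 1) if counts[c] >= alpha), None)
            if winner is None:
                streak[u] = 0                  # no alpha-majority: reset the streak
            elif winner != pref[u]:
                pref[u], streak[u] = winner, 1  # flip to the sampled majority
            else:
                streak[u] += 1                 # majority confirms current preference
            if streak[u] >= beta:
                decided[u] = pref[u]
    return decided
```

Each query contacts only k peers, a number that does not grow with the population size, which is how sampling-based voting avoids all-to-all message exchange. Starting from a 90/10 split, for example, the simulated population typically converges on the majority color within a few dozen rounds.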
Unlike traditional consensus protocols, in which one or more nodes typically must process a number of bits linear in the total number of nodes for each decision, in these protocols no node processes more than a logarithmic number of bits. They do not require accurate knowledge of all participants, and they expose new possible tradeoffs and improvements in safety and liveness for building consensus protocols. We describe the Snow protocol family and show how it can be used to construct the core of an internet-scale electronic payment system, Avalanche, which we evaluate in a large-scale deployment. Experiments demonstrate that the system can achieve high throughput, provide low confirmation latency, and scale well compared to existing systems that deliver similar functionality. In our implementation and setup, the bottleneck of the system is transaction verification.

Finally, we propose a new in-memory index that is also storage-friendly. A “lazy-trie” is a variant of the hash-trie data structure that achieves near-optimal height, has practical storage overhead, and can be maintained on disk with standard write-ahead logging. We present CedrusDB, a persistent key-value store based on a lazy-trie. The lazy-trie is kept on disk and made available in memory through standard memory mapping. Organizing the lazy-trie in virtual memory allows CedrusDB to leverage concurrent processing better than other on-disk index schemes (LSMs, B+-trees). CedrusDB achieves performance comparable or superior to that of recent log-based in-memory key-value stores on mixed workloads, while being able to recover quickly from failures.
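The abstract does not spell out how the lazy-trie differs from an ordinary hash trie, so the sketch below shows only the baseline structure it varies: a plain in-memory hash trie. Keys are routed by successive 4-bit digits of a SHA-256 hash, and a leaf is split into an internal node only when a second key lands in its slot. The fanout, the hash function, and the names `HashTrie`, `Leaf`, and `Node` are hypothetical choices for illustration; there is no laziness, disk layout, memory mapping, or write-ahead logging here, so this is not CedrusDB's index.

```python
import hashlib
from typing import Dict, Optional, Union

FANOUT_BITS = 4                      # 16 children per internal node (assumed)
MASK = (1 << FANOUT_BITS) - 1


def key_hash(key: bytes) -> int:
    """Route keys by a 256-bit SHA-256 digest rather than by key order."""
    return int.from_bytes(hashlib.sha256(key).digest(), "big")


class Leaf:
    def __init__(self, h: int, key: bytes, value: bytes):
        self.h, self.key, self.value = h, key, value


class Node:
    def __init__(self) -> None:
        self.children: Dict[int, Union["Node", Leaf]] = {}


class HashTrie:
    def __init__(self) -> None:
        self.root = Node()
        self.size = 0

    @staticmethod
    def _slot(h: int, depth: int) -> int:
        """The 4-bit digit of the hash that selects a child at this depth."""
        return (h >> (depth * FANOUT_BITS)) & MASK

    def put(self, key: bytes, value: bytes) -> None:
        # Assumes no two distinct keys share a full 256-bit hash.
        h = key_hash(key)
        node, depth = self.root, 0
        while True:
            s = self._slot(h, depth)
            child = node.children.get(s)
            if child is None:                      # empty slot: store a new leaf
                node.children[s] = Leaf(h, key, value)
                self.size += 1
                return
            if isinstance(child, Node):            # descend one level
                node, depth = child, depth + 1
                continue
            if child.key == key:                   # same key: overwrite in place
                child.value = value
                return
            # Two keys share this slot: push the old leaf one level down.
            branch = Node()
            branch.children[self._slot(child.h, depth + 1)] = child
            node.children[s] = branch
            node, depth = branch, depth + 1

    def get(self, key: bytes) -> Optional[bytes]:
        h = key_hash(key)
        node, depth = self.root, 0
        while True:
            child = node.children.get(self._slot(h, depth))
            if child is None:
                return None
            if isinstance(child, Leaf):
                return child.value if child.key == key else None
            node, depth = child, depth + 1

    def height(self) -> int:
        """Number of levels from the root down to the deepest leaf."""
        def walk(node: Node, depth: int) -> int:
            return max(
                (walk(c, depth + 1) if isinstance(c, Node) else depth + 1
                 for c in node.children.values()),
                default=depth,
            )
        return walk(self.root, 0)
```

Because keys are placed by hash bits rather than by key order, the height depends on the hash distribution and grows roughly with the base-16 logarithm of the number of keys. Inserting 10,000 keys, for instance, needs at least 4 levels (three 16-way levels hold only 4,096 leaves) and in practice only a few more.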
