333 research outputs found
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains
Fabric is a modular and extensible open-source system for deploying and
operating permissioned blockchains and one of the Hyperledger projects hosted
by the Linux Foundation (www.hyperledger.org).
Fabric is the first truly extensible blockchain system for running
distributed applications. It supports modular consensus protocols, which allows
the system to be tailored to particular use cases and trust models. Fabric is
also the first blockchain system that runs distributed applications written in
standard, general-purpose programming languages, without systemic dependency on
a native cryptocurrency. This stands in sharp contrast to existing blockchain
platforms that require "smart-contracts" to be written in domain-specific
languages or rely on a cryptocurrency. Fabric realizes the permissioned model
using a portable notion of membership, which may be integrated with
industry-standard identity management. To support such flexibility, Fabric
introduces an entirely novel blockchain design and revamps the way blockchains
cope with non-determinism, resource exhaustion, and performance attacks.
This paper describes Fabric, its architecture, the rationale behind various
design decisions, its most prominent implementation aspects, as well as its
distributed application programming model. We further evaluate Fabric by
implementing and benchmarking a Bitcoin-inspired digital currency. We show that
Fabric achieves end-to-end throughput of more than 3500 transactions per second
in certain popular deployment configurations, with sub-second latency, scaling
well to over 100 peers. (Appears in the proceedings of the EuroSys 2018 conference.)
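The execute-order-validate flow that lets Fabric run chaincode in general-purpose languages while still coping with non-determinism can be illustrated with a toy simulation. This is a hypothetical sketch of the idea only, not the Fabric API: peers simulate a transaction against a snapshot to produce a read/write set, an ordering service sequences the sets, and validation rejects any transaction whose read versions have gone stale (multi-version concurrency control).

```python
# Toy sketch of an execute-order-validate pipeline (illustrative names,
# not the actual Hyperledger Fabric API).

class Ledger:
    def __init__(self):
        self.state = {}  # key -> (value, version)

    def execute(self, tx):
        """Endorsement phase: simulate tx, record the read and write sets."""
        reads = {k: self.state.get(k, (None, 0))[1] for k in tx["reads"]}
        return {"reads": reads, "writes": dict(tx["writes"])}

    def validate_and_commit(self, rwset):
        """Validation phase: commit only if all read versions are unchanged."""
        for k, ver in rwset["reads"].items():
            if self.state.get(k, (None, 0))[1] != ver:
                return False  # stale read: transaction invalidated
        for k, v in rwset["writes"].items():
            _, ver = self.state.get(k, (None, 0))
            self.state[k] = (v, ver + 1)
        return True

ledger = Ledger()
# Two concurrent transactions are executed against the same snapshot.
rw1 = ledger.execute({"reads": ["a"], "writes": {"a": 1}})
rw2 = ledger.execute({"reads": ["a"], "writes": {"a": 2}})
# Ordering sequences rw1 before rw2; validation catches the conflict.
assert ledger.validate_and_commit(rw1) is True
assert ledger.validate_and_commit(rw2) is False  # rw2's read of "a" is stale
```

Because execution happens before ordering, a non-deterministic chaincode simply produces divergent read/write sets that fail validation, rather than forking the ledger.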
Byzantine-Resilient Federated Learning with Heterogeneous Data Distribution
For mitigating Byzantine behaviors in federated learning (FL), most
state-of-the-art approaches, such as Bulyan, tend to leverage the similarity of
updates from the benign clients. However, in many practical FL scenarios, data
is non-IID across clients, thus the updates received from even the benign
clients are quite dissimilar. Hence, similarity-based methods waste
opportunities to train a model on interesting non-IID data and also slow
model convergence. We propose DiverseFL to overcome this challenge in
heterogeneous data distribution settings. Rather than comparing each client's
update with other client updates to detect Byzantine clients, DiverseFL
compares each client's update with a guiding update of that client. Any client
whose update diverges from its associated guiding update is then tagged as a
Byzantine node. The FL server in DiverseFL computes the guiding update in every
round for each client over a small sample of the client's local data that is
received only once before the start of training. However, sharing even a small
sample of a client's data with the FL server can compromise the client's data
privacy needs. To tackle this challenge, DiverseFL creates a Trusted Execution
Environment (TEE)-based enclave to receive each client's sample and to compute
its guiding updates. The TEE provides hardware-assisted verification and
attestation to each client that its data is not leaked outside the TEE. Through
experiments involving neural networks, benchmark datasets and popular Byzantine
attacks, we demonstrate that DiverseFL not only performs Byzantine mitigation
effectively but also nearly matches the performance of OracleSGD, where
the server aggregates only the updates from the benign clients.
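The per-client comparison at the heart of this scheme can be sketched in a few lines. This is an illustrative toy, not the DiverseFL implementation: the detection criterion, the cosine-similarity measure, and the zero threshold are all assumptions made here for clarity; the server-side guiding updates would in practice be computed inside the TEE enclave.

```python
# Hypothetical sketch of guiding-update-based Byzantine detection
# (invented threshold and similarity measure, not DiverseFL's exact rule).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flag_byzantine(client_updates, guiding_updates, threshold=0.0):
    """Tag a client as Byzantine when its update diverges from its OWN
    guiding update, so benign non-IID clients are not penalized for
    merely being dissimilar to each other."""
    return {cid: cosine(u, guiding_updates[cid]) < threshold
            for cid, u in client_updates.items()}

# A benign client roughly follows its guiding direction; an attacker inverts it.
guides = {"benign": [1.0, 2.0], "attacker": [1.0, 2.0]}
updates = {"benign": [0.9, 2.1], "attacker": [-1.0, -2.0]}
print(flag_byzantine(updates, guides))  # {'benign': False, 'attacker': True}
```

Note the contrast with Bulyan-style defenses: nothing here compares one client's update against another's, which is why heterogeneous (non-IID) benign clients survive the filter.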
The Bedrock of Byzantine Fault Tolerance: A Unified Platform for BFT Protocol Design and Implementation
Byzantine Fault-Tolerant (BFT) protocols have recently been extensively used
by decentralized data management systems with non-trustworthy infrastructures,
e.g., permissioned blockchains. BFT protocols cover a broad spectrum of design
dimensions from infrastructure settings such as the communication topology, to
more technical features such as commitment strategy and even fundamental social
choice properties like order-fairness. The proliferation of different BFT
protocols has rendered it difficult to navigate the BFT landscape, let alone
determine the protocol that best meets application needs. This paper presents
Bedrock, a unified platform for BFT protocol design, analysis, implementation,
and experiments. Bedrock proposes a design space consisting of a set of design
choices capturing the trade-offs between different design space dimensions and
providing fundamentally new insights into the strengths and weaknesses of BFT
protocols. Bedrock enables users to analyze and experiment with BFT protocols
within the space of plausible choices, evolve current protocols to design new
ones, and even uncover previously unknown protocols. Our experimental results
demonstrate the capability of Bedrock to uniformly evaluate BFT protocols in
new ways that were not possible before due to the diverse assumptions made by
these protocols. The results validate Bedrock's ability to analyze and derive
BFT protocols.
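The design-space idea can be made concrete with a small sketch. The dimension names and option sets below are invented for illustration and are not Bedrock's actual API: each protocol is a point in a space of design choices, so a user can validate a configuration, or derive a new protocol by changing one dimension at a time.

```python
# Hypothetical illustration of a BFT design space (names invented,
# not Bedrock's real dimensions or interface).

DESIGN_SPACE = {
    "topology":   {"star", "clique", "chain"},
    "commitment": {"linear", "speculative", "optimistic"},
    "fairness":   {"none", "order-fair"},
}

def validate(protocol):
    """Check that a protocol's choices lie inside the plausible design space."""
    return all(protocol[d] in opts for d, opts in DESIGN_SPACE.items())

def derive(base, **changes):
    """Evolve an existing protocol into a new one by altering dimensions."""
    new = dict(base)
    new.update(changes)
    assert validate(new), "derived protocol leaves the design space"
    return new

pbft_like = {"topology": "clique", "commitment": "linear", "fairness": "none"}
variant = derive(pbft_like, commitment="speculative")
print(variant["commitment"])  # speculative
```

Enumerating protocols as points in such a space is also what makes "uncovering previously unknown protocols" a search problem over unexplored combinations of choices.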
Keeping checkpoint/restart viable for exascale systems
Next-generation exascale systems, those capable of performing a quintillion operations per second, are expected to be delivered in the next 8-10 years. These systems, which will be 1,000 times faster than current systems, will be of unprecedented scale. As these systems continue to grow in size, faults will become increasingly common, even over the course of small calculations. Therefore, issues such as fault tolerance and reliability will limit application scalability. Current techniques to ensure progress across faults, like checkpoint/restart, the dominant fault tolerance mechanism for the last 25 years, are increasingly problematic at the scales of future systems due to their excessive overheads. In this work, we evaluate a number of techniques to decrease the overhead of checkpoint/restart and keep this method viable for future exascale systems. More specifically, this work evaluates state-machine replication to dramatically increase the checkpoint interval (the time between successive checkpoints) and hash-based, probabilistic incremental checkpointing using graphics processing units to decrease the checkpoint commit time (the time to save one checkpoint). Using a combination of empirical analysis, modeling, and simulation, we study the costs and benefits of these approaches on a wide range of parameters. These results, which cover a number of high-performance computing capability workloads, different failure distributions, hardware mean times to failure, and I/O bandwidths, show the potential benefits of these techniques for meeting the reliability demands of future exascale platforms.
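The trade-off between checkpoint interval and commit time can be made concrete with the standard back-of-the-envelope relation, Young's first-order approximation t_opt = sqrt(2 * C * MTBF), where C is the checkpoint commit time. The paper's own modeling is more detailed; this is only the classic rule of thumb, shown here to illustrate why increasing the effective MTBF (e.g. via replication) or decreasing C (e.g. via GPU-accelerated incremental checkpointing) both stretch the optimal interval.

```python
# Young's approximation for the optimal checkpoint interval
# (a textbook first-order model, not the paper's simulation).
import math

def young_interval(commit_time_s, mtbf_s):
    """Optimal time between checkpoints: sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * commit_time_s * mtbf_s)

# A 10-minute checkpoint commit on a machine with a 1-day MTBF:
t = young_interval(600, 86_400)
print(round(t))  # 10182 seconds, i.e. checkpoint roughly every 2.8 hours
```

Doubling the MTBF multiplies the interval by sqrt(2), which is why techniques that raise the effective MTBF or shrink the commit time compound to keep checkpoint/restart viable at scale.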
Sui Lutris: A Blockchain Combining Broadcast and Consensus
Sui Lutris is the first smart-contract platform to sustainably achieve
sub-second finality. It achieves this significant decrease in latency by
employing consensusless agreement not only for simple payments but for a large
variety of transactions. Unlike prior work, Sui Lutris compromises neither
expressiveness nor throughput and can run perpetually without restarts. Sui
Lutris achieves this by safely integrating consensusless agreement with a
high-throughput consensus protocol that is invoked outside the critical finality
path, but ensures that, when a transaction is at risk of inconsistent
concurrent accesses, its settlement is delayed until the total ordering is
resolved. Building such a hybrid architecture is especially delicate during
reconfiguration events, where the system needs to preserve the safety of the
consensusless path without compromising the long-term liveness of potentially
misconfigured clients. We thus develop a novel reconfiguration protocol, the
first to show the safe and efficient reconfiguration of a consensusless
blockchain. Sui Lutris is currently running in production as part of a major
smart-contract platform. Combined with the Move programming language, it enables
the safe execution of smart contracts that expose objects as a first-class
resource. In our experiments Sui Lutris achieves latency lower than 0.5 seconds
for throughput up to 5,000 certificates per second (150k ops/s with bundling),
compared to the state-of-the-art real-world consensus latencies of 3 seconds.
Furthermore, it gracefully handles validator crash-recovery and suffers no
visible performance degradation during reconfiguration.
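The hybrid architecture's routing decision can be sketched as a toy function. All names here are invented for illustration and are not the Sui Lutris code: transactions touching only single-owner objects finalize on the consensusless broadcast path, while any transaction at risk of inconsistent concurrent access (one touching a shared object) waits for the total ordering.

```python
# Illustrative path-selection sketch for a broadcast/consensus hybrid
# (hypothetical object model, not the actual Sui Lutris implementation).

def route(tx, shared_objects):
    """Pick the settlement path for a transaction."""
    if any(obj in shared_objects for obj in tx["objects"]):
        return "consensus"   # settle only after total ordering resolves
    return "fast-path"       # consensusless broadcast, sub-second finality

shared = {"amm_pool"}
pay = {"objects": ["alice_coin"]}                # owned object only
swap = {"objects": ["alice_coin", "amm_pool"]}   # touches a shared object
print(route(pay, shared))   # fast-path
print(route(swap, shared))  # consensus
```

Keeping consensus out of the critical path for the common (owned-object) case is what yields the sub-second finality, while the consensus fallback preserves safety for contended state.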