1,027 research outputs found
TANDEM: taming failures in next-generation datacenters with emerging memory
The explosive growth of online services, leading to unforeseen scales, has made modern datacenters highly prone to failures. Taming these failures hinges on fast and correct recovery, minimizing service interruptions.
Applications, owing to recovery, entail additional measures to maintain a recoverable state of data and computation logic during their failure-free execution. However, these precautionary measures have
severe implications on performance, correctness, and programmability, making recovery incredibly challenging to realize in practice.
Emerging memory, particularly non-volatile memory (NVM) and disaggregated memory (DM), offers a promising opportunity to achieve fast recovery with maximum performance. However, incorporating these technologies into datacenter architecture presents significant challenges; Their distinct architectural attributes, differing significantly from traditional memory devices, introduce new semantic challenges for
implementing recovery, complicating correctness and programmability.
Can emerging memory enable fast, performant, and correct recovery in the datacenter? This thesis aims to answer this question while addressing the associated challenges.
When architecting datacenters with emerging memory, system architects face four key challenges: (1) how to guarantee correct semantics; (2) how to efficiently enforce correctness with optimal performance; (3) how to validate end-to-end correctness including recovery; and (4) how to preserve programmer productivity (Programmability).
This thesis aims to address these challenges through the following approaches: (a)
defining precise consistency models that formally specify correct end-to-end semantics
in the presence of failures (consistency models also play a crucial role in programmability); (b) developing new low-level mechanisms to efficiently enforce the prescribed models given the capabilities of emerging memory; and (c) creating robust testing frameworks to validate end-to-end correctness and recovery.
We start our exploration with non-volatile memory (NVM), which offers fast persistence capabilities directly accessible through the processor’s load-store (memory) interface. Notably, these capabilities can be leveraged to enable fast recovery for Log-Free Data Structures (LFDs) while maximizing performance. However, due to the complexity of modern cache hierarchies, data hardly persist in any specific order, jeop-
ardizing recovery and correctness. Therefore, recovery needs primitives that explicitly control the order of updates to NVM (known as persistency models). We outline the precise specification of a novel persistency model – Release Persistency (RP) – that provides a consistency guarantee for LFDs on what remains in non-volatile memory upon failure. To efficiently enforce RP, we propose a novel microarchitecture mechanism,
lazy release persistence (LRP). Using standard LFDs benchmarks, we show that LRP achieves fast recovery while incurring minimal overhead on performance.
We continue our discussion with memory disaggregation which decouples memory from traditional monolithic servers, offering a promising pathway for achieving very high availability in replicated in-memory data stores. Achieving such availability hinges on transaction protocols that can efficiently handle recovery in this setting, where
compute and memory are independent. However, there is a challenge: disaggregated memory (DM) fails to work with RPC-style protocols, mandating one-sided transaction protocols. Exacerbating the problem, one-sided transactions expose critical low-level
ordering to architects, posing a threat to correctness. We present a highly available transaction protocol, Pandora, that is specifically designed to achieve fast recovery in disaggregated key-value stores (DKVSes).
Pandora is the first one-sided transactional protocol that ensures correct, non-blocking, and fast recovery in DKVS. Our experimental implementation artifacts demonstrate that Pandora achieves fast recovery and high availability while causing minimal disruption to services.
Finally, we introduce a novel target litmus-testing framework – DART – to validate the end-to-end correctness of transactional protocols with recovery. Using DART’s target testing capabilities, we have found several critical bugs in Pandora, highlighting the need for robust end-to-end testing methods in the design loop to iteratively fix correctness bugs. Crucially, DART is lightweight and black-box, thereby eliminating
any intervention from the programmers
Language Design for Reactive Systems: On Modal Models, Time, and Object Orientation in Lingua Franca and SCCharts
Reactive systems play a crucial role in the embedded domain. They continuously interact with their environment, handle concurrent operations, and are commonly expected to provide deterministic behavior to enable application in safety-critical systems. In this context, language design is a key aspect, since carefully tailored language constructs can aid in addressing the challenges faced in this domain, as illustrated by the various concurrency models that prevent the known pitfalls of regular threads. Today, many languages exist in this domain and often provide unique characteristics that make them specifically fit for certain use cases. This thesis evolves around two distinctive languages: the actor-oriented polyglot coordination language Lingua Franca and the synchronous statecharts dialect SCCharts. While they take different approaches in providing reactive modeling capabilities, they share clear similarities in their semantics and complement each other in design principles. This thesis analyzes and compares key design aspects in the context of these two languages. For three particularly relevant concepts, it provides and evaluates lean and seamless language extensions that are carefully aligned with the fundamental principles of the underlying language. Specifically, Lingua Franca is extended toward coordinating modal behavior, while SCCharts receives a timed automaton notation with an efficient execution model using dynamic ticks and an extension toward the object-oriented modeling paradigm
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
Auditable and performant Byzantine consensus for permissioned ledgers
Permissioned ledgers allow users to execute transactions against a data store, and retain proof of their execution in a replicated ledger. Each replica verifies the transactions’ execution and ensures that, in perpetuity, a committed transaction cannot be removed from the ledger. Unfortunately, this is not guaranteed by today’s permissioned ledgers, which can be re-written if an arbitrary number of replicas collude. In addition, the transaction throughput of permissioned ledgers is low, hampering real-world deployments, by not taking advantage of multi-core CPUs and hardware accelerators.
This thesis explores how permissioned ledgers and their consensus protocols can be made auditable in perpetuity; even when all replicas collude and re-write the ledger. It also addresses how Byzantine consensus protocols can be changed to increase the execution throughput of complex transactions. This thesis makes the following contributions:
1. Always auditable Byzantine consensus protocols. We present a permissioned ledger system that can assign blame to individual replicas regardless of how many of them misbehave. This is achieved by signing and storing consensus protocol messages in the ledger and providing clients with signed, universally-verifiable receipts.
2. Performant transaction execution with hardware accelerators. Next, we describe a cloud-based ML inference service that provides strong integrity guarantees, while staying compatible with current inference APIs. We change the Byzantine consensus protocol to execute machine learning (ML) inference computation on GPUs to optimize throughput and latency of ML inference computation.
3. Parallel transactions execution on multi-core CPUs. Finally, we introduce a permissioned ledger that executes transactions, in parallel, on multi-core CPUs. We separate the execution of transactions between the primary and secondary replicas. The primary replica executes transactions on multiple CPU cores and creates a dependency graph of the transactions that the backup replicas utilize to execute transactions in parallel.Open Acces
Distributed System Fuzzing
Grey-box fuzzing is the lightweight approach of choice for finding bugs in
sequential programs. It provides a balance between efficiency and effectiveness
by conducting a biased random search over the domain of program inputs using a
feedback function from observed test executions. For distributed system
testing, however, the state-of-practice is represented today by only black-box
tools that do not attempt to infer and exploit any knowledge of the system's
past behaviours to guide the search for bugs.
In this work, we present Mallory: the first framework for grey-box
fuzz-testing of distributed systems. Unlike popular black-box distributed
system fuzzers, such as Jepsen, that search for bugs by randomly injecting
network partitions and node faults or by following human-defined schedules,
Mallory is adaptive. It exercises a novel metric to learn how to maximize the
number of observed system behaviors by choosing different sequences of faults,
thus increasing the likelihood of finding new bugs. The key enablers for our
approach are the new ideas of timeline-driven testing and timeline abstraction
that provide the feedback function guiding a biased random search for failures.
Mallory dynamically constructs Lamport timelines of the system behaviour,
abstracts these timelines into happens-before summaries, and introduces faults
guided by its real-time observation of the summaries.
We have evaluated Mallory on a diverse set of widely-used industrial
distributed systems. Compared to the start-of-the-art black-box fuzzer Jepsen,
Mallory explores more behaviours and takes less time to find bugs. Mallory
discovered 22 zero-day bugs (of which 18 were confirmed by developers),
including 10 new vulnerabilities, in rigorously-tested distributed systems such
as Braft, Dqlite, and Redis. 6 new CVEs have been assigned
Mining Butterflies in Streaming Graphs
This thesis introduces two main-memory systems sGrapp and sGradd for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, initially, the growth patterns of bipartite streaming graphs are mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection.
sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves mean absolute percentage error up to 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution and a processing throughput of 1.5 million data records per second. The throughput and estimation error of sGrapp are 160x higher and 0.02x lower than baselines. sGradd demonstrates an improving performance over time, achieves zero false detection rates when there is not any drift and when drift is already detected, and detects sequential drifts in zero to a few seconds after their occurrence regardless of drift intervals
Colordag: An Incentive-Compatible Blockchain
We present , a blockchain protocol where following the prescribed strategy is, with high probability, a best response as long as all miners have less than of the mining power. We prove the correctness of Colordag even if there is an extremely powerful adversary who knows future actions of the scheduler: specifically, when agents will generate blocks and when messages will arrive. The state-of-the-art protocol, Fruitchain, is an -Nash equilibrium as long as all miners have less than of the mining power. However, there is a simple deviation that guarantees that deviators are never worse off than they would be by following Fruitchain, and can sometimes do better. Thus, agents are motivated to deviate. Colordag implements a solution concept that we call and does not suffer from this problem. Because it is an -sure Nash equilibrium, Colordag is an -Nash equilibrium and with probability is a best response
Design and Real-World Evaluation of Dependable Wireless Cyber-Physical Systems
The ongoing effort for an efficient, sustainable, and automated interaction between humans, machines, and our environment will make cyber-physical systems (CPS) an integral part of the industry and our daily lives. At their core, CPS integrate computing elements, communication networks, and physical processes that are monitored and controlled through sensors and actuators. New and innovative applications become possible by extending or replacing static and expensive cable-based communication infrastructures with wireless technology. The flexibility of wireless CPS is a key enabler for many envisioned scenarios, such as intelligent factories, smart farming, personalized healthcare systems, autonomous search and rescue, and smart cities.
High dependability, efficiency, and adaptivity requirements complement the demand for wireless and low-cost solutions in such applications. For instance, industrial and medical systems should work reliably and predictably with performance guarantees, even if parts of the system fail. Because emerging CPS will feature mobile and battery-driven devices that can execute various tasks, the systems must also quickly adapt to frequently changing conditions. Moreover, as applications become ever more sophisticated, featuring compact embedded devices that are deployed densely and at scale, efficient designs are indispensable to achieve desired operational lifetimes and satisfy high bandwidth demands.
Meeting these partly conflicting requirements, however, is challenging due to imperfections of wireless communication and resource constraints along several dimensions, for example, computing, memory, and power constraints of the devices. More precisely, frequent and correlated message losses paired with very limited bandwidth and varying delays for the message exchange significantly complicate the control design. In addition, since communication ranges are limited, messages must be relayed over multiple hops to cover larger distances, such as an entire factory. Although the resulting mesh networks are more robust against interference, efficient communication is a major challenge as wireless imperfections get amplified, and significant coordination effort is needed, especially if the networks are dynamic.
CPS combine various research disciplines, which are often investigated in isolation, ignoring their complex interaction. However, to address this interaction and build trust in the proposed solutions, evaluating CPS using real physical systems and wireless networks paired with formal guarantees of a system’s end-to-end behavior is necessary. Existing works that take this step can only satisfy a few of the abovementioned requirements. Most notably, multi-hop communication has only been used to control slow physical processes while providing no guarantees. One of the reasons is that the current communication protocols are not suited for dynamic multi-hop networks.
This thesis closes the gap between existing works and the diverse needs of emerging wireless CPS. The contributions address different research directions and are split into two parts. In the first part, we specifically address the shortcomings of existing communication protocols and make the following contributions to provide a solid networking foundation:
• We present Mixer, a communication primitive for the reliable many-to-all message exchange in dynamic wireless multi-hop networks. Mixer runs on resource-constrained low-power embedded devices and combines synchronous transmissions and network coding for a highly scalable and topology-agnostic message exchange. As a result, it supports mobile nodes and can serve any possible traffic patterns, for example, to efficiently realize distributed control, as required by emerging CPS applications.
• We present Butler, a lightweight and distributed synchronization mechanism with formally guaranteed correctness properties to improve the dependability of synchronous transmissions-based protocols. These protocols require precise time synchronization provided by a specific node. Upon failure of this node, the entire network cannot communicate. Butler removes this single point of failure by quickly synchronizing all nodes in the network without affecting the protocols’ performance.
In the second part, we focus on the challenges of integrating communication and various control concepts using classical time-triggered and modern event-based approaches. Based on the design, implementation, and evaluation of the proposed solutions using real systems and networks, we make the following contributions, which in many ways push the boundaries of previous approaches:
• We are the first to demonstrate and evaluate fast feedback control over low-power wireless multi-hop networks. Essential for this achievement is a novel co-design and integration of communication and control. Our wireless embedded platform tames the imperfections impairing control, for example, message loss and varying delays, and considers the resulting key properties in the control design. Furthermore, the careful orchestration of control and communication tasks enables real-time operation and makes our system amenable to an end-to-end analysis. Due to this, we can provably guarantee closed-loop stability for physical processes with linear time-invariant dynamics.
• We propose control-guided communication, a novel co-design for distributed self-triggered control over wireless multi-hop networks. Self-triggered control can save energy by transmitting data only when needed. However, there are no solutions that bring those savings to multi-hop networks and that can reallocate freed-up resources, for example, to other agents. Our control system informs the communication system of its transmission demands ahead of time so that communication resources can be allocated accordingly. Thus, we can transfer the energy savings from the control to the communication side and achieve an end-to-end benefit.
• We present a novel co-design of distributed control and wireless communication that resolves overload situations in which the communication demand exceeds the available bandwidth. As systems scale up, featuring more agents and higher bandwidth demands, the available bandwidth will be quickly exceeded, resulting in overload. While event-triggered control and self-triggered control approaches reduce the communication demand on average, they cannot prevent that potentially all agents want to communicate simultaneously. We address this limitation by dynamically allocating the available bandwidth to the agents with the highest need. Thus, we can formally prove that our co-design guarantees closed-loop stability for physical systems with stochastic linear time-invariant dynamics.:Abstract
Acknowledgements
List of Abbreviations
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Application Requirements
1.3 Challenges
1.4 State of the Art
1.5 Contributions and Road Map
2 Mixer: Efficient Many-to-All Broadcast in Dynamic Wireless Mesh Networks
2.1 Introduction
2.2 Overview
2.3 Design
2.4 Implementation
2.5 Evaluation
2.6 Discussion
2.7 Related Work
3 Butler: Increasing the Availability of Low-Power Wireless Communication Protocols
3.1 Introduction
3.2 Motivation and Background
3.3 Design
3.4 Analysis
3.5 Implementation
3.6 Evaluation
3.7 Related Work
4 Feedback Control Goes Wireless: Guaranteed Stability over Low-Power Multi-Hop Networks
4.1 Introduction
4.2 Related Work
4.3 Problem Setting and Approach
4.4 Wireless Embedded System Design
4.5 Control Design and Analysis
4.6 Experimental Evaluation
4.A Control Details
5 Control-Guided Communication: Efficient Resource Arbitration and Allocation in Multi-Hop Wireless Control Systems
5.1 Introduction
5.2 Problem Setting
5.3 Co-Design Approach
5.4 Wireless Communication System Design
5.5 Self-Triggered Control Design
5.6 Experimental Evaluation
6 Scaling Beyond Bandwidth Limitations: Wireless Control With Stability Guarantees Under Overload
6.1 Introduction
6.2 Problem and Related Work
6.3 Overview of Co-Design Approach
6.4 Predictive Triggering and Control System
6.5 Adaptive Communication System
6.6 Integration and Stability Analysis
6.7 Testbed Experiments
6.A Proof of Theorem 4
6.B Usage of the Network Bandwidth for Control
7 Conclusion and Outlook
7.1 Contributions
7.2 Future Directions
Bibliography
List of Publication
Concurrent Asynchronous Byzantine Agreement in Expected-Constant Rounds, Revisited
It is well known that without randomization, Byzantine agreement (BA) requires a linear number of rounds in the synchronous setting, while it is flat out impossible in the asynchronous setting. The primitive which allows to bypass the above limitation is known as oblivious common coin (OCC). It allows parties to agree with constant probability on a random coin, where agreement is oblivious, i.e., players are not aware whether or not agreement has been achieved.
The starting point of our work is the observation that no known protocol exists for information-theoretic multi-valued OCC---i.e., OCC where the coin might take a value from a domain of cardinality larger than 2---with optimal resiliency in the asynchronous (with eventual message delivery) setting. This apparent hole in the literature is particularly problematic, as multi-valued OCC is implicitly or explicitly used in several constructions. (In fact, it is often falsely attributed to the asynchronous BA result by Canetti and Rabin [STOC ’93], which, however, only achieves binary OCC and does not translate to a multi-valued OCC protocol.)
In this paper, we present the first information-theoretic multi-valued OCC protocol in the asynchronous setting with optimal resiliency, i.e., tolerating corruptions, thereby filling this important gap. Further, our protocol efficiently implements OCC with an exponential-size domain, a property which is not even achieved by known constructions in the simpler, synchronous setting.
We then turn to the problem of round-preserving parallel composition of asynchronous BA. A protocol for this task was proposed by Ben-Or and El-Yaniv [Distributed Computing ’03].
Their construction, however, is flawed in several ways: For starters, it relies on multi-valued OCC instantiated by Canetti and Rabin\u27s result (which, as mentioned above, only provides binary OCC).
This shortcoming can be repaired by plugging in our above multi-valued OCC construction. However, as we show, even with this fix it remains unclear whether the protocol of Ben-Or and El-Yaniv achieves its goal of expected-constant-round parallel asynchronous BA, as the proof is incorrect. Thus, as a second contribution, we provide a simpler, more modular protocol for the above task. Finally, and as a contribution of independent interest, we provide proofs in Canetti\u27s Universal Composability framework; this makes our work the first one offering composability guarantees, which are important as BA is a core building block of secure multi-party computation protocols
Breaking the Consensus Bound: Asynchronous Dynamic Proactive Secret Sharing under Honest Majority
A proactive secret sharing scheme (PSS), expressed in the dynamic-membership setting, enables a committee of n holders of secret-shares, dubbed as players, to securely hand-over new shares of the same secret to a new committee. We dub such a sub-protocol as a Refresh. All existing PSS under an honest majority, require the use of a broadcast (BC) in each refresh. BC is costly to implement, and its security relies on timing assumptions on the network. So the privacy of the secret and/or its guaranteed delivery, either depend on network assumptions, or, on the reliability of a public ledger.
By contrast, PSS over asynchronous channels do not have these constraints. However, all of them (but one, with exponential complexity) use asynchronous verifiable secret sharing (AVSS) and consensus (MVBA and/or ACS), which are impossible under asynchrony beyond t<n/3 corruptions, whatever the setup.
We present a PSS, named asynchronous-proactive secret sharing (APSS), which is the first PSS under honest majority with guaranteed output delivery in a completely asynchronous network. More generally, APSS allows any flexible threshold , such that privacy and correctness are guaranteed up to t corruptions, and liveness as soon as players behave honestly.
Correctness can be lifted to any number of corruptions, provided a linearly homomorphic commitment scheme.
Moreover, each refresh completes at the record speed of , where is the actual message delivery delay.
APSS demonstrates that proactive refreshes are possible as long as players of the initial committee only, have a common view on a set of (publicly committed or encrypted) shares.
Despite not providing consensus on a unique set of shares, APSS surprisingly enables the opening of any linear map over secrets { non-interactively, without consensus }. This, in turn, applies to threshold signing, decryption and randomness generation.
APSS can also be directly integrated into the asynchronous Schnorr threshold signing scheme Roast [CCS\u2722].
Of independent interest, we:
- provide the first UC formalization (and proof) of proactive AVSS, furthermore for arbitrary thresholds;
- provide additional mechanisms enabling players of a committee to start a refresh then erase their old shares, synchronously up to from each other;
- improve by 50x the verification speed of the NIZKs of encrypted re-sharing of [Cascudo et al, Asiacrypt\u2722], by using novel optimizations of batch Schnorr proofs of knowledge.
We demonstrate efficiency of APSS with an implementation which uses this optimization as baseline
- …