The Impact of RDMA on Agreement
Remote Direct Memory Access (RDMA) is becoming widely available in data
centers. This technology allows a process to directly read and write the memory
of a remote host, with a mechanism to control access permissions. In this
paper, we study the fundamental power of these capabilities. We consider the
well-known problem of achieving consensus despite failures, and find that RDMA
can improve the inherent trade-off in distributed computing between failure
resilience and performance. Specifically, we show that RDMA allows algorithms
that simultaneously achieve high resilience and high performance, while
traditional algorithms had to choose one or the other. With Byzantine failures,
we give an algorithm that only requires 2f+1 processes (where f
is the maximum number of faulty processes) and decides in two (network)
delays in common executions. With crash failures, we give an algorithm that
only requires f+1 processes and also decides in two delays. Both
algorithms tolerate a minority of memory failures inherent to RDMA, and they
provide safety in asynchronous systems and liveness with standard additional
assumptions. (This is the full version of the PODC'19 paper, with a strengthened broadcast algorithm.)
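The capability the abstract emphasizes is that RDMA lets the owner of a memory region change remote access permissions dynamically. The toy model below (plain Python, not real RDMA; the class and method names are illustrative, not from the paper) sketches why this helps: a new leader can "fence off" a deposed one by revoking its write permission, so stale writes are rejected by the memory itself.

```python
# Toy model of an RDMA-style memory region with dynamic write permissions.
# Illustrative only -- real RDMA permissions are enforced by the NIC.

class RDMARegion:
    """A shared register whose owner can grant or revoke remote write access."""

    def __init__(self):
        self.value = None
        self.writers = set()          # processes currently permitted to write

    def grant_write(self, pid):
        self.writers.add(pid)

    def revoke_all_except(self, pid):
        # Dynamically changing permissions is the extra power over plain
        # shared memory: a new leader can fence off all earlier writers.
        self.writers = {pid}

    def write(self, pid, value):
        if pid not in self.writers:
            return False              # access rejected at the memory
        self.value = value
        return True

    def read(self, pid):
        return self.value             # reads are always permitted in this toy


region = RDMARegion()
region.grant_write(1)
assert region.write(1, "v1")          # old leader writes successfully
region.revoke_all_except(2)           # leader change fences process 1
assert not region.write(1, "v1'")     # the stale write is rejected
assert region.write(2, "v2")
assert region.read(3) == "v2"
```

The point of the sketch is only the interface: a write can fail because permissions changed, which a plain read/write shared-memory model cannot express.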
The FIDS Theorems: Tensions between Multinode and Multicore Performance in Transactional Systems
Traditionally, distributed and parallel transactional systems have been
studied in isolation, as they targeted different applications and experienced
different bottlenecks. However, modern high-bandwidth networks have made the
study of systems that are both distributed (i.e., employ multiple nodes) and
parallel (i.e., employ multiple cores per node) necessary to truly make use of
the available hardware.
In this paper, we study the performance of these combined systems and show
that there are inherent tradeoffs between a system's ability to have fast and
robust distributed communication and its ability to scale to multiple cores.
More precisely, we formalize the notions of a \emph{fast deciding} path of
communication to commit transactions quickly in good executions, and
\emph{seamless fault tolerance} that allows systems to remain robust to server
failures. We then show that there is an inherent tension between these two
natural distributed properties and well-known multicore scalability properties
in transactional systems. Finally, we show positive results: it is possible to
construct a parallel distributed transactional system if any one of the
properties we study is removed.
Brief Announcement: Survey of Persistent Memory Correctness Conditions
In this brief paper, we survey existing correctness definitions for concurrent persistent programs.
Implicit Decomposition for Write-Efficient Connectivity Algorithms
The future of main memory appears to lie in the direction of new technologies
that provide strong capacity-to-performance ratios, but have write operations
that are much more expensive than reads in terms of latency, bandwidth, and
energy. Motivated by this trend, we propose sequential and parallel algorithms
to solve graph connectivity problems using significantly fewer writes than
conventional algorithms. Our primary algorithmic tool is the construction of an
o(n)-sized "implicit decomposition" of a bounded-degree graph G on n
nodes, which combined with read-only access to G enables fast answers to
connectivity and biconnectivity queries on G. The construction breaks the
linear-write "barrier", resulting in costs that are asymptotically lower than
conventional algorithms while adding only a modest cost to querying time. For
general non-sparse graphs on m edges, we also provide the first o(m)-write,
O(m)-operation parallel algorithms for connectivity and biconnectivity.
These algorithms provide insight into how applications can efficiently process
computations on large graphs in systems with read-write asymmetry.
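The read-write asymmetry the abstract discusses can be made concrete with an instrumented union-find (a standard connectivity structure, not the paper's implicit-decomposition algorithm): connectivity queries cost only reads, while writes are bounded by the number of successful unions.

```python
# Illustrative only: counting reads vs. writes in a plain union-find,
# to show that connectivity queries need no writes at all.

class CountingUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.reads = 0
        self.writes = 0

    def find(self, x):
        # Walk to the root; no path compression, so finds are read-only.
        while True:
            p = self.parent[x]
            self.reads += 1
            if p == x:
                return x
            x = p

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb      # each successful union costs one write
            self.writes += 1


uf = CountingUnionFind(8)
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)]
for a, b in edges:
    uf.union(a, b)
assert uf.find(0) == uf.find(5)       # connectivity query: reads only
assert uf.writes == len(edges)        # writes bounded by #unions, not #queries
```

In a memory where writes cost far more than reads, keeping queries write-free is exactly the kind of saving the paper pushes much further, to sublinear total writes.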
Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism
Over the past two decades, many concurrent data structures have been designed and implemented. Nearly all such work analyzes concurrent data structures empirically, omitting asymptotic bounds on their efficiency, partly because of the complexity of the analysis needed, and partly because of the difficulty of obtaining relevant asymptotic bounds: when the analysis takes into account important practical factors, such as contention, it is difficult or even impossible to prove desirable bounds. In this paper, we show that considering structured concurrency or relaxed concurrency models can enable establishing strong bounds, even accounting for contention. To this end, we first present a dynamic relaxed counter data structure that indicates the non-zero status of the counter. Our data structure extends a recently proposed data structure, called SNZI, allowing our structure to grow dynamically in response to the increasing degree of concurrency in the system. Using the dynamic SNZI data structure, we then present a concurrent data structure for series-parallel directed acyclic graphs (sp-dags), a key data structure widely used in the implementation of modern parallel programming languages. The key component of sp-dags is an in-counter data structure that is an instance of our dynamic SNZI. We analyze the efficiency of our concurrent sp-dags and in-counter data structures under the nested-parallel computing paradigm. This paradigm offers a structured model for concurrency. Under this model, we prove that our data structures require amortized O(1) shared memory steps, including contention. We present an implementation and an experimental evaluation that suggests that the sp-dags data structure is practical and can perform well in practice.
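The SNZI interface the abstract builds on can be stated as a tiny sequential sketch (the real structure is a concurrent tree; this toy only captures the specification, and the class name is illustrative):

```python
# Minimal sequential sketch of the SNZI interface: arrive / depart / query.
# SNZI deliberately exposes only zero vs. non-zero status, not the count;
# that weaker interface is what admits low-contention concurrent versions.

class ToyNonZeroIndicator:
    def __init__(self):
        self._count = 0

    def arrive(self):
        self._count += 1

    def depart(self):
        assert self._count > 0, "depart without matching arrive"
        self._count -= 1

    def query(self):
        # Only the zero / non-zero status is observable.
        return self._count > 0


snzi = ToyNonZeroIndicator()
assert not snzi.query()
snzi.arrive()
snzi.arrive()
assert snzi.query()
snzi.depart()
snzi.depart()
assert not snzi.query()
```

The paper's in-counter for sp-dags is, per the abstract, an instance of a dynamic version of this abstraction that grows with the degree of concurrency.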
Efficient and Adaptively Secure Asynchronous Binary Agreement via Binding Crusader Agreement
We present a new abstraction based on crusader agreement called binding crusader agreement (BCA) for solving binary consensus in the asynchronous setting against an adaptive adversary. BCA has the validity, agreement, and termination properties of crusader agreement in addition to a new property called binding. Binding states that before the first non-faulty party terminates, there is a value v such that no non-faulty party can output the value v in any continuation of the execution.
We believe that reasoning about binding explicitly, as a first order goal, greatly helps algorithm design, clarity, and analysis.
Using our framework, we solve several versions of asynchronous binary agreement against an adaptive adversary in a simple and modular manner that either improves or matches the efficiency of state-of-the-art solutions. We do this via new BCA protocols, given a strong common coin, and via new Graded BCA protocols, given an ε-good common coin.
For crash failures, we reduce the expected time to terminate and we provide termination bounds that are linear in the goodness of the common coin.
For Byzantine failures, we improve the expected time to terminate in the computational setting with threshold signatures, and match the state of the art in the information-theoretic setting, both with a strong common coin and with an ε-good common coin.
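The crusader agreement properties named in the abstract can be written down as a small specification checker (a spec-level sketch under my reading of the standard definitions, not a protocol): each party outputs 0, 1, or "bottom" (undecided), validity forces the common input through, and agreement forbids two parties from deciding opposite values.

```python
# Spec-level sketch of crusader agreement over {0, 1, bottom}.
# check_crusader is a checker for one execution, not an algorithm.

BOT = None  # stands for the "bottom" / undecided output


def check_crusader(inputs, outputs):
    # Validity: if every party starts with the same value v, all output v
    # (bottom is not allowed in that case).
    if len(set(inputs)) == 1 and any(o != inputs[0] for o in outputs):
        return False
    # Agreement: no two parties output the two different non-bottom values.
    decided = {o for o in outputs if o is not BOT}
    return len(decided) <= 1


assert check_crusader([1, 1, 1], [1, 1, 1])          # unanimous input
assert check_crusader([0, 1, 0], [0, BOT, 0])        # mixed input, bottom ok
assert not check_crusader([0, 1, 1], [0, 1, BOT])    # 0 and 1 both decided
assert not check_crusader([1, 1, 1], [1, BOT, 1])    # violates validity
```

Binding, the paper's new property, is about the adversary's remaining freedom across continuations of an execution, so it does not reduce to a per-execution check like the two above.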
Frugal Byzantine Computing
Traditional techniques for handling Byzantine failures are expensive: digital signatures are too costly, while using 3f+1 replicas is uneconomical (f denotes the maximum number of Byzantine processes). We seek algorithms that reduce the number of replicas to 2f+1 and minimize the number of signatures. While the first goal can be achieved in the message-and-memory model, accomplishing the second goal simultaneously is challenging. We first address this challenge for the problem of broadcasting messages reliably. We study two variants of this problem, Consistent Broadcast and Reliable Broadcast, typically considered very close. Perhaps surprisingly, we establish a separation between them in terms of signatures required. In particular, we show that Consistent Broadcast requires at least 1 signature in some execution, while Reliable Broadcast requires O(n) signatures in some execution. We present matching upper bounds for both primitives within constant factors. We then turn to the problem of consensus and argue that this separation matters for solving consensus with Byzantine failures: we present a practical consensus algorithm that uses Consistent Broadcast as its main communication primitive. This algorithm works for n = 2f+1 and avoids signatures in the common case - properties that have not been simultaneously achieved previously. Overall, our work approaches Byzantine computing in a frugal manner and motivates the use of Consistent Broadcast - rather than Reliable Broadcast - as a key primitive for reaching agreement.
On the Round Complexity of Asynchronous Crusader Agreement
We present new lower and upper bounds on the number of communication rounds required for asynchronous Crusader Agreement (CA) and Binding Crusader Agreement (BCA), two primitives that are used for solving binary consensus. We show results for the information theoretic and authenticated settings. In doing so, we present a generic model for proving round complexity lower bounds in the asynchronous setting.
In some settings, our attempts to prove lower bounds on round complexity fail. Instead, we show new, tight, rather surprising round complexity upper bounds for Byzantine fault tolerant BCA with and without a PKI setup.
Multiversion Concurrency with Bounded Delay and Precise Garbage Collection
In this paper we are interested in bounding the number of instructions taken
to process transactions. The main result is a multiversion transactional system
that supports constant delay (extra instructions beyond running in isolation)
for all read-only transactions, delay equal to the number of processes for
writing transactions that are not concurrent with other writers, and
lock-freedom for concurrent writers. The system supports precise garbage
collection in that versions are identified for collection as soon as the last
transaction releases them. As far as we know, these are the first results that
bound delays for multiple readers and even a single writer. The approach is
particularly useful in situations where read-transactions dominate write
transactions, or where write transactions come in as streams or batches and can
be processed by a single writer (possibly in parallel).
The approach is based on using functional data structures to support multiple
versions, and an efficient solution to the Version Maintenance (VM) problem for
acquiring, updating and releasing versions. Our solution to the VM problem is
precise, safe and wait-free (PSWF).
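The multiversioning idea described above can be sketched in a few lines: each committed write installs a new immutable version, so read-only transactions run on a pinned snapshot without blocking the writer. This is a hedged toy (the class and method names are illustrative); the version list below stands in for the paper's precise, wait-free Version Maintenance solution, and the copy-on-write dict stands in for its functional tree structures.

```python
# Toy multiversion map: immutable snapshots give readers constant-delay,
# interference-free reads; old versions become collectable once released.

class MultiVersionMap:
    def __init__(self):
        self._versions = [{}]          # immutable snapshots, oldest first

    def acquire(self):
        # A read-only transaction pins the latest version as its snapshot.
        return self._versions[-1]

    def commit(self, updates):
        # Functional update: copy-on-write keeps earlier snapshots valid.
        new = dict(self._versions[-1])
        new.update(updates)
        self._versions.append(new)

    def release_before(self, snapshot):
        # Precise-GC stand-in: drop every version older than the oldest
        # snapshot still held by a reader.
        i = self._versions.index(snapshot)
        del self._versions[:i]


mv = MultiVersionMap()
mv.commit({"x": 1})
snap = mv.acquire()                    # read-only txn starts here
mv.commit({"x": 2})                    # a concurrent writer commits
assert snap["x"] == 1                  # the reader still sees its snapshot
assert mv.acquire()["x"] == 2          # new readers see the latest version
```

The real system replaces the whole-map copy with functional balanced trees (so commits are logarithmic, not linear) and replaces the version list with the precise, safe, wait-free VM algorithm.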
We experimentally validate our approach by applying it to balanced tree data
structures for maintaining ordered maps. We test the transactional system using
multiple algorithms for the VM problem, including our PSWF VM algorithm, and
implementations with weaker guarantees based on epochs, hazard pointers, and
read-copy-update. To evaluate the functional data structure for concurrency and
multi-versioning, we implement batched updates for functional tree structures
and compare the performance with state-of-the-art concurrent data structures
for balanced trees. The experiments indicate our approach works well in
practice over a broad set of criteria.