Sharing Memory between Byzantine Processes using Policy-enforced Tuple Spaces
Abstract: Despite the large number of Byzantine fault-tolerant algorithms for message-passing systems designed through the years, algorithms for the coordination of processes subject to Byzantine failures using shared memory have appeared only recently. This paper presents a new computing model in which shared memory objects are protected by fine-grained access policies, and a new shared memory object, the Policy-Enforced Augmented Tuple Space (PEATS). We show the benefits of this model by providing simple and efficient consensus algorithms. These algorithms are much simpler and require fewer shared memory operations and fewer memory bits than previous algorithms based on access control lists (ACLs) and sticky bits. We also prove that PEATS objects are universal, i.e., that they can be used to implement any other shared memory object, and present lock-free and wait-free universal constructions. Index Terms: Byzantine fault-tolerance, shared memory algorithms, tuple spaces, consensus, universal constructions.
Toward self-stabilizing wait-free shared memory objects
Past research on fault tolerant distributed systems has focussed on either processor failures, ranging from benign crash failures to the malicious Byzantine failure types, or on transient memory failures, which can suddenly corrupt the state of the system. An interesting question in the theory of distributed computing is whether one can devise highly fault tolerant protocols which can tolerate both processor failures and transient errors. To answer this question we consider the construction of self-stabilizing wait-free shared memory objects. These objects occur naturally in distributed systems in which both processors and memory may be faulty. Our contribution in this paper is threefold. First, we propose a general definition of a self-stabilizing wait-free shared memory object that expresses safety guarantees even in the face of processor failures. Second, we show that within this framework one cannot construct a self-stabilizing single-reader single-writer regular bit from single-reader single-writer safe bits. This result leads us to postulate a self-stabilizing dual-reader single-writer safe bit with which, as a third contribution, we construct self-stabilizing regular and atomic registers.
On the Complexity of Implementing Certain Classes of Shared Objects
We consider shared memory systems in which asynchronous processes cooperate with each other by communicating via shared data objects, such as counters, queues, stacks, and priority queues. The common approach to implementing such shared objects is based on locking: To perform an operation on a shared object, a process obtains a lock, accesses the object, and then releases the lock. Locking, however, has several drawbacks, including convoying, priority inversion, and deadlocks. Furthermore, lock-based implementations are not fault-tolerant: if a process crashes while holding a lock, other processes can end up waiting forever for the lock.
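The lock-based pattern described above (obtain a lock, access the object, release the lock) can be sketched as follows. This is a minimal illustration, not code from the thesis: the names `LockedCounter` and `run_workers` are hypothetical, and Python threads stand in for the asynchronous processes.

```python
import threading


class LockedCounter:
    """Shared counter whose operations are protected by a single lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        # Obtain the lock, access the object, release the lock.
        # A thread that stalls inside this block delays every other
        # caller, which is the drawback the abstract points out.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old


def run_workers(counter, workers=4, increments=1000):
    """Hypothetical driver: each worker adds 1 `increments` times."""

    def work():
        for _ in range(increments):
            counter.fetch_and_add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Adding 0 reads the current value without changing it.
    return counter.fetch_and_add(0)
```

Calling `run_workers(LockedCounter())` should return the total number of increments, since every update is serialized by the lock.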
Wait-free linearizable implementations were conceived to overcome most of the above drawbacks of locking. A wait-free implementation guarantees that if a process repeatedly takes steps, then its operation on the implemented data object will eventually complete, regardless of whether other processes are slow, or fast, or have crashed.
In this thesis, we first present an efficient wait-free linearizable implementation of a class of object types, called closed and closable types, and then prove time and space lower bounds on wait-free linearizable implementations of another class of object types, called perturbable types.
(1) We present a wait-free linearizable implementation of n-process closed and closable types (such as swap, fetch&add, fetch&multiply, and fetch&L, where L is any of the boolean operations and, or, or complement) using registers that support load-link (LL) and store-conditional (SC) as base objects.
The time complexity of the implementation grows linearly with contention, but is never more than O(log^2 n). We believe that this is the first implementation of a class of types (as opposed to a specific type) to achieve a sub-linear time complexity.
(2) We prove linear time and space lower bounds on the wait-free linearizable implementations of n-process perturbable types (such as increment, fetch&add, modulo k counter, LL/SC bit, k-valued compare&swap (for any k >= n), single-writer snapshot) that use resettable consensus and historyless objects (such as registers that support read and write) as base objects.
This improves on some previously known Omega(sqrt{n}) space complexity lower bounds. It also shows the near space optimality of some known wait-free linearizable implementations.
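To make the LL/SC base objects mentioned above concrete, here is a small sketch of fetch&add built from a simulated LL/SC register with a retry loop. It is only an assumption-laden illustration: the class `LLSCRegister` and the function `fetch_and_add` are hypothetical names, the internal lock merely models the atomicity that hardware would give each primitive, and the retry loop is lock-free rather than wait-free, so it is not the implementation presented in the thesis.

```python
import threading


class LLSCRegister:
    """Simulated register supporting load-link (LL) and store-conditional (SC)."""

    def __init__(self, value=0):
        self._value = value
        self._version = 0                # incremented on every successful SC
        self._links = {}                 # thread id -> version seen at last LL
        self._atomic = threading.Lock()  # models atomicity of one primitive only

    def load_link(self):
        with self._atomic:
            self._links[threading.get_ident()] = self._version
            return self._value

    def store_conditional(self, new_value):
        with self._atomic:
            linked = self._links.pop(threading.get_ident(), None)
            # SC fails if this thread has no link or the register was
            # written since the thread's last LL.
            if linked != self._version:
                return False
            self._value = new_value
            self._version += 1
            return True


def fetch_and_add(reg, delta):
    """Retry LL/SC until it succeeds; returns the value before the add."""
    while True:
        old = reg.load_link()
        if reg.store_conditional(old + delta):
            return old
```

In this sketch a caller can be forced to retry whenever another thread's SC succeeds first, so an individual operation has no bound on its number of steps; bounding that cost is what a wait-free implementation has to add.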
Randomized protocols for asynchronous consensus
The famous Fischer, Lynch, and Paterson impossibility proof shows that it is
impossible to solve the consensus problem in a natural model of an asynchronous
distributed system if even a single process can fail. Since its publication,
two decades of work on fault-tolerant asynchronous consensus algorithms have
evaded this impossibility result by using extended models that provide (a)
randomization, (b) additional timing assumptions, (c) failure detectors, or (d)
stronger synchronization mechanisms than are available in the basic model.
Concentrating on the first of these approaches, we illustrate the history and
structure of randomized asynchronous consensus protocols by giving detailed
descriptions of several such protocols.
Comment: 29 pages; survey paper written for PODC 20th anniversary issue of
Distributed Computing
The Impact of RDMA on Agreement
Remote Direct Memory Access (RDMA) is becoming widely available in data
centers. This technology allows a process to directly read and write the memory
of a remote host, with a mechanism to control access permissions. In this
paper, we study the fundamental power of these capabilities. We consider the
well-known problem of achieving consensus despite failures, and find that RDMA
can improve the inherent trade-off in distributed computing between failure
resilience and performance. Specifically, we show that RDMA allows algorithms
that simultaneously achieve high resilience and high performance, while
traditional algorithms had to choose one or another. With Byzantine failures,
we give an algorithm that only requires n >= 2f_P + 1 processes (where f_P
is the maximum number of faulty processes) and decides in two (network)
delays in common executions. With crash failures, we give an algorithm that
only requires n >= f_P + 1 processes and also decides in two delays. Both
algorithms tolerate a minority of memory failures inherent to RDMA, and they
provide safety in asynchronous systems and liveness with standard additional
assumptions.
Comment: Full version of PODC'19 paper, strengthened broadcast algorithm