14 research outputs found
The Weakest Failure Detector for Eventual Consistency
In its classical form, a consistent replicated service requires all replicas
to witness the same evolution of the service state. Assuming a message-passing
environment with a majority of correct processes, the necessary and sufficient
information about failures for implementing a general state machine replication
scheme ensuring consistency is captured by the {\Omega} failure detector. This
paper shows that in such a message-passing environment, {\Omega} is also the
weakest failure detector to implement an eventually consistent replicated
service, where replicas are expected to agree on the evolution of the service
state only after some (a priori unknown) time. In fact, we show that {\Omega}
is the weakest to implement eventual consistency in any message-passing
environment, i.e., under any assumption on when and where failures might occur.
Ensuring (strong) consistency in any environment requires, in addition to
{\Omega}, the quorum failure detector {\Sigma}. Our paper thus captures, for
the first time, an exact computational difference be- tween building a
replicated state machine that ensures consistency and one that only ensures
eventual consistency
Self-stabilizing virtual synchrony
Virtual synchrony (VS) is an important abstraction that is proven to be extremely useful when implemented over asynchronous, typically large, message-passing distributed systems. Fault tolerant design is critical for the success of such implementations since large distributed systems can be highly available as long as they do not depend on the full operational status of every system participant. Self-stabilizing systems can tolerate transient faults that drive the system to an arbitrary unpredictable configuration. Such systems automatically regain consistency from any such configuration, and then produce the desired system behavior ensuring it for practically infinite number of successive steps, e.g., 264 steps. We present a new multi-purpose self-stabilizing counter algorithm establishing an efficient practically unbounded counter, that can directly yield a self-stabilizing Multiple-Writer Multiple-Reader (MWMR) register emulation. We use our counter algorithm, together with a selfstabilizing group membership and a self-stabilizing multicast service to devise the first practically stabilizing VS algorithm and a self-stabilizing VS-based emulation of state machine replication (SMR). As we base the SMR implementation on VS, rather than consensus, the system progresses in more extreme asynchronous settings in relation to consensusbased SMR
Self-stabilizing Byzantine Multivalued Consensus
Consensus, abstracting a myriad of problems in which processes have to agree
on a single value, is one of the most celebrated problems of fault-tolerant
distributed computing. Consensus applications include fundamental services for
the environments of the Cloud and Blockchain, and in such challenging
environments, malicious behaviors are often modeled as adversarial Byzantine
faults.
At OPODIS 2010, Mostefaoui and Raynal (in short MR) presented a
Byzantine-tolerant solution to consensus in which the decided value cannot be a
value proposed only by Byzantine processes. MR has optimal resilience coping
with up to t < n/3 Byzantine nodes over n processes. MR provides this
multivalued consensus object (which accepts proposals taken from a finite set
of values) assuming the availability of a single Binary consensus object (which
accepts proposals taken from the set {0,1}).
This work, which focuses on multivalued consensus, aims at the design of an
even more robust solution than MR. Our proposal expands MR's fault-model with
self-stabilization, a vigorous notion of fault-tolerance. In addition to
tolerating Byzantine, self-stabilizing systems can automatically recover after
the occurrence of arbitrary transient-faults. These faults represent any
violation of the assumptions according to which the system was designed to
operate (provided that the algorithm code remains intact).
To the best of our knowledge, we propose the first self-stabilizing solution
for intrusion-tolerant multivalued consensus for asynchronous message-passing
systems prone to Byzantine failures. Our solution has a O(t) stabilization time
from arbitrary transient faults.Comment: arXiv admin note: text overlap with arXiv:2110.0859
Near-optimal self-stabilising counting and firing squads
Consider a fully-connected synchronous distributed system consisting of n nodes, where up to f nodes may be faulty and every node starts in an arbitrary initial state. In the synchronous C-counting problem, all nodes need to eventually agree on a counter that is increased by one modulo C in each round for given C>1. In the self-stabilising firing squad problem, the task is to eventually guarantee that all non-faulty nodes have simultaneous responses to external inputs: if a subset of the correct nodes receive an external “go” signal as input, then all correct nodes should agree on a round (in the not-too-distant future) in which to jointly output a “fire” signal. Moreover, no node should generate a “fire” signal without some correct node having previously received a “go” signal as input. We present a framework reducing both tasks to binary consensus at very small cost. For example, we obtain a deterministic algorithm for self-stabilising Byzantine firing squads with optimal resilience f<n/3, asymptotically optimal stabilisation and response time O(f), and message size O(log f). As our framework does not restrict the type of consensus routines used, we also obtain efficient randomised solutions
self-stabilizing
Consider a fully-connected synchronous distributed system consisting of n nodes, where up to f nodes may be faulty and every node starts in an arbitrary initial state. In the synchronous C-counting problem, all nodes need to eventually agree on a counter that is increased by one modulo C in each round for given C>1. In the self-stabilising firing squad problem, the task is to eventually guarantee that all non-faulty nodes have simultaneous responses to external inputs: if a subset of the correct nodes receive an external “go” signal as input, then all correct nodes should agree on a round (in the not-too-distant future) in which to jointly output a “fire” signal. Moreover, no node should generate a “fire” signal without some correct node having previously received a “go” signal as input. We present a framework reducing both tasks to binary consensus at very small cost. For example, we obtain a deterministic algorithm for self-stabilising Byzantine firing squads with optimal resilience f<n/3, asymptotically optimal stabilisation and response time O(f), and message size O(log f). As our framework does not restrict the type of consensus routines used, we also obtain efficient randomised solutions
Self-Stabilizing and Private Distributed Shared Atomic Memory in Seldomly Fair Message Passing Networks
We study the problem of privately emulating shared memory in message-passing networks. The system includes clients that store and retrieve replicated information on N servers, out of which e are data-corrupting malicious. When a client accesses a data-corrupting malicious server, the data field of that server response might be different from the value it originally stored. However, all other control variables in the server reply and protocol actions are according to the server algorithm. For the coded atomic storage algorithms by Cadambe et al., we present an enhancement that ensures no information leakage and data-corrupting malicious fault-tolerance. We also consider recovery after the occurrence of transient faults that violate the assumptions according to which the system was designed to operate. After their last occurrence, transient faults leave the system in an arbitrary state (while the program code stays intact). We present a self-stabilizing algorithm, which recovers after the occurrence of transient faults. This addition to Cadambe et al. considers asynchronous settings as long as no transient faults occur. The recovery from transient faults that bring the system counters (close) to their maximal values may include the use of a global reset procedure, which requires the system run to be controlled by a fair scheduler. After the recovery period, the safety properties are provided for asynchronous system runs that are not necessarily controlled by fair schedulers. Since the recovery period is bounded and the occurrence of transient faults is extremely rare, we call this design criteria self-stabilization in the presence of seldom fairness. Our self-stabilizing algorithm uses a bounded amount of storage during asynchronous executions (that are not necessarily controlled by fair schedulers). To the best of our knowledge, we are the first to address privacy, data-corrupting malicious behavior, and self-stabilization in the context of emulating atomic shared memory in message-passing systems