1,380 research outputs found
The Weakest Failure Detector for Eventual Consistency
In its classical form, a consistent replicated service requires all replicas
to witness the same evolution of the service state. Assuming a message-passing
environment with a majority of correct processes, the necessary and sufficient
information about failures for implementing a general state machine replication
scheme ensuring consistency is captured by the {\Omega} failure detector. This
paper shows that in such a message-passing environment, {\Omega} is also the
weakest failure detector to implement an eventually consistent replicated
service, where replicas are expected to agree on the evolution of the service
state only after some (a priori unknown) time. In fact, we show that {\Omega}
is the weakest to implement eventual consistency in any message-passing
environment, i.e., under any assumption on when and where failures might occur.
Ensuring (strong) consistency in any environment requires, in addition to
{\Omega}, the quorum failure detector {\Sigma}. Our paper thus captures, for
the first time, an exact computational difference between building a
replicated state machine that ensures consistency and one that only ensures
eventual consistency.
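The role of the majority assumption and of the quorum detector in the abstract above can be illustrated with a small check. This is a minimal sketch, not taken from the paper: the helper names `majorities` and `all_pairs_intersect` are hypothetical, and it only shows the intersection property that quorums must satisfy (any two quorums share at least one process), which majorities provide for free.

```python
from itertools import combinations


def majorities(processes):
    """Yield every subset containing a strict majority of the processes."""
    processes = list(processes)
    smallest = len(processes) // 2 + 1
    for size in range(smallest, len(processes) + 1):
        for members in combinations(processes, size):
            yield frozenset(members)


def all_pairs_intersect(quorums):
    """Return True if every two quorums share at least one process."""
    return all(a & b for a, b in combinations(quorums, 2))


# With 5 processes, any two majorities overlap.
quorums = list(majorities(range(5)))
majority_ok = all_pairs_intersect(quorums)

# Exact halves of 4 processes are not majorities and can be disjoint,
# e.g. {0, 1} and {2, 3}.
halves = [frozenset(c) for c in combinations(range(4), 2)]
halves_ok = all_pairs_intersect(halves)
```

When a majority of correct processes cannot be assumed, this overlap no longer comes from counting alone, which is the gap the quorum failure detector is described as filling.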
Enhanced Failure Detection Mechanism in MapReduce
The popularity of the MapReduce programming model has increased interest in the research community in improving it. Among the possible directions, fault tolerance, and failure detection in particular, appears to be crucial but has not yet reached a satisfactory level. Motivated by this, I decided to devote my main research during this period to a prototype system architecture for the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work will lead the way for further contributions to failure detection in NoSQL application frameworks, and in cloud storage systems in general.
Using Oracle to Solve ZooKeeper on Two-Replica Problems
The project introduces an Oracle, a failure detector, into Apache ZooKeeper and makes it fault-tolerant in a two-node system. The project demonstrates that the Oracle authorizes the primary process to maintain liveness when the majority's rule becomes an obstacle to continuing the Apache ZooKeeper service. In addition to the properties of accuracy and completeness from Chandra et al.'s research, the project proposes the property of see to avoid losing transactions and the property of mutual exclusion to avoid split-brain issues. The hybrid properties provide not only sounder flexibility in the implementation but also stronger guarantees on safety. Thus, the Oracle complements Apache ZooKeeper's availability.
Distributed eventual leader election in the crash-recovery and general omission failure models.
102 p. Distributed applications are present in many aspects of everyday life. Banking, healthcare and transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and the occurrence of failures. Distributed systems must ensure that the delivered service is trustworthy. Agreement problems compose a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most agreement problems can be considered a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, namely FLP, states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is to use unreliable failure detectors. A failure detector encapsulates the synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega provides an eventual leader election mechanism.
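The eventual leader idea behind Omega can be sketched in a few lines. This is a minimal illustration, not the thesis's algorithm: it assumes a crash-stop model with eventually timely heartbeats (not the crash-recovery or omission models the thesis addresses), and the class `OmegaSketch` and all of its fields are hypothetical names. Each process trusts the smallest-id process it has heard from recently, and doubles a process's timeout whenever a suspicion turns out to be premature.

```python
from dataclasses import dataclass, field


@dataclass
class OmegaSketch:
    """Toy eventual-leader oracle as seen by one process (assumed design)."""
    self_id: int
    process_ids: list
    initial_timeout: float = 1.0          # assumed unit: seconds
    last_heard: dict = field(default_factory=dict)
    timeouts: dict = field(default_factory=dict)

    def __post_init__(self):
        for pid in self.process_ids:
            self.last_heard[pid] = 0.0
            self.timeouts[pid] = self.initial_timeout

    def suspected(self, pid, now):
        """A process is suspected once its heartbeat is overdue."""
        if pid == self.self_id:
            return False                  # a process never suspects itself
        return now - self.last_heard[pid] > self.timeouts[pid]

    def on_heartbeat(self, pid, now):
        # A heartbeat from a suspected process shows the suspicion was
        # premature, so that process gets a longer timeout from now on.
        if self.suspected(pid, now):
            self.timeouts[pid] *= 2
        self.last_heard[pid] = now

    def leader(self, now):
        """Trust the smallest id among the processes not suspected."""
        trusted = [p for p in self.process_ids if not self.suspected(p, now)]
        return min(trusted)


fd = OmegaSketch(self_id=3, process_ids=[1, 2, 3])
fd.on_heartbeat(1, now=0.5)
fd.on_heartbeat(2, now=0.5)
first = fd.leader(now=1.0)    # everyone trusted, smallest id wins
second = fd.leader(now=2.0)   # 1 and 2 overdue, falls back to self
fd.on_heartbeat(1, now=2.0)   # late heartbeat: timeout for 1 doubles
third = fd.leader(now=2.5)
```

The output may be wrong for a while (process 3 briefly elects itself), which matches the "possibly incorrect" information the abstract mentions; under the stated timing assumption the timeouts stop growing and all correct processes settle on the same leader.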
Anonymous Asynchronous Systems: The Case of Failure Detectors
Due to the multiplicity of loci of control, a main issue distributed systems have to cope with lies in the uncertainty on the system state created by the adversaries that are asynchrony, failures, dynamicity, mobility, etc. Considering message-passing systems, this paper considers the uncertainty created by the net effect of three of these adversaries, namely, asynchrony, failures, and anonymity. This means that, in addition to being asynchronous and crash-prone, the processes have no identity. Trivially, agreement problems (e.g., consensus) that cannot be solved in the presence of asynchrony and failures cannot be solved either when adding anonymity. The paper consequently proposes anonymous failure detectors to circumvent these impossibilities. It has several contributions. First, it presents three classes of failure detectors (denoted AP, AΩ and AΣ) and shows that they are the anonymous counterparts of the classes of perfect failure detectors, eventual leader failure detectors and quorum failure detectors, respectively. The class AΣ is new, and showing that it is the anonymous counterpart of the class Σ is not trivial. Then, the paper presents and proves correct a genuinely anonymous consensus algorithm based on the pair of anonymous failure detector classes (AΩ, AΣ) ("genuinely" means that, not only do processes have no identity, but no process is aware of the total number of processes). This new algorithm is not a "straightforward extension" of an algorithm designed for non-anonymous systems. To benefit from AΣ, it uses a novel message exchange pattern where each phase of every round is made up of sub-rounds in which appropriate control information is exchanged. Finally, the paper discusses the notions of failure detector class hierarchy and weakest failure detector class for a given problem in the context of anonymous systems.
Interactive Consistency in practical, mostly-asynchronous systems
Interactive consistency is the problem in which n nodes, where up to t may be
byzantine, each with its own private value, run an algorithm that allows all
non-faulty nodes to infer the values of each other node. This problem is
relevant to critical applications that rely on the combination of the opinions
of multiple peers to provide a service. Examples include monitoring a content
source to prevent equivocation or to track variability in the content provided,
and resolving divergent state amongst the nodes of a distributed system.
Previous works assume a fully synchronous system, where one can make strong
assumptions such as negligible message delivery delays and/or detection of
absent messages. However, practical, real-world systems are mostly
asynchronous, i.e., they exhibit only some periods of synchrony during which
message delivery is timely, thus requiring a different approach. In this paper,
we present a thorough study on practical interactive consistency. We leverage
the vast prior work on broadcast and byzantine consensus algorithms to design,
implement and evaluate a set of algorithms, with varying timing assumptions and
message complexity, that can be used to achieve interactive consistency in
real-world distributed systems. We provide a complete, open-source
implementation of each proposed interactive consistency algorithm by building a
multi-layered stack of protocols that include several broadcast protocols, as
well as a binary and a multi-valued consensus protocol. Most of these protocols
have never been implemented and evaluated in a real system before. We analyze
the performance of our suite of algorithms experimentally by engaging in both
single instance and multiple parallel instances of each alternative. Comment: 13 pages, 10 figures
Perspectives on the CAP Theorem
Almost twelve years ago, in 2000, Eric Brewer introduced the idea that there is a fundamental trade-off between consistency, availability, and partition tolerance. This trade-off, which has become known as the CAP Theorem, has been widely discussed ever since. In this paper, we review the CAP Theorem and situate it within the broader context of distributed computing theory. We then discuss the practical implications of the CAP Theorem, and explore some general techniques for coping with the inherent trade-offs that it implies.