Enhanced Failure Detection Mechanism in MapReduce
The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among other directions, fault tolerance, and failure detection in particular, appears to be crucial but has not yet reached a satisfactory level. Motivated by this, I decided to devote my main research during this period to building a prototype system architecture of the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work should pave the way for further contributions to failure detection in NoSQL application frameworks and in cloud storage systems in general.
The Failure Detector Abstraction
This paper surveys the failure detector concept through two dimensions. First, we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions.
Optimistic fair transaction processing in mobile ad-hoc networks
Mobile ad-hoc networks (MANETs) are unstable. Link errors, which are
considered an exception in fixed-wired networks, must be assumed to be the
default case in MANETs. Hence, designing fault-tolerant systems that efficiently
offer transactional guarantees in these unstable environments is
considerably more complex. Efficient support for such guarantees is
essential for business applications, e.g. for the exchange of electronic
goods. This class of applications demands transactional properties such as
money and goods atomicity. In this technical report we present an
architecture that allows for fair and atomic transaction processing in
MANETs, together with an associated application that enables the exchange of
electronic tokens.
Distributed eventual leader election in the crash-recovery and general omission failure models.
102 p. Distributed applications are present in many aspects of everyday life. Banking, healthcare and transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate with one another to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and the occurrence of failures. Distributed systems must ensure that the delivered service is trustworthy. Agreement problems form a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most agreement problems can be considered a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, known as FLP, states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. One way to circumvent this obstacle is to use unreliable failure detectors. A failure detector encapsulates the synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega provides an eventual leader election mechanism.
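The eventual leader election property of Omega can be illustrated with a minimal sketch. It assumes the textbook construction in which each process trusts the lowest-numbered process it does not currently suspect, and it covers only crash failures. It is not the algorithm developed in this work for the crash-recovery and general omission models. The function name `elect_leader` and the example suspicion sets are hypothetical.

```python
def elect_leader(processes, suspected):
    """Return the smallest process id that is not suspected, or None.

    Assumption: ids are totally ordered and every process applies the
    same rule to its own local suspicion set.
    """
    trusted = sorted(p for p in processes if p not in suspected)
    return trusted[0] if trusted else None


processes = [1, 2, 3, 4]  # process 1 is assumed to have crashed

# Early on, local suspicion sets may be wrong and may differ per process.
views_early = {2: {3}, 3: set(), 4: {1, 2}}
# Eventually, every correct process suspects exactly the crashed process.
views_late = {2: {1}, 3: {1}, 4: {1}}

early = {p: elect_leader(processes, s) for p, s in views_early.items()}
late = {p: elect_leader(processes, s) for p, s in views_late.items()}

print(early)  # processes disagree, and some still trust the crashed process
print(late)   # all correct processes agree on the same correct leader
```

The leader output may be wrong for an arbitrary but finite time. Omega only requires that, eventually, all correct processes permanently trust the same correct process.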
The Failure Detector Abstraction
A failure detector is a fundamental abstraction in distributed computing. This paper surveys this abstraction through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions
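To make concrete how a failure detector factors timing assumptions out of the algorithm that uses it, here is a minimal sketch of a heartbeat-based detector with an adaptive timeout. It is an illustrative assumption, not code from the surveyed paper: the class name `TimeoutFailureDetector`, its parameters, and the doubling back-off are all hypothetical choices. Time is passed in explicitly so the example stays deterministic.

```python
class TimeoutFailureDetector:
    """Suspects a process when no heartbeat arrives within its timeout."""

    def __init__(self, processes, initial_timeout=1.0, backoff=2.0):
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heartbeat = {p: 0.0 for p in processes}
        self.suspected = set()
        self.backoff = backoff

    def on_heartbeat(self, process, now):
        """Record a heartbeat received from `process` at time `now`."""
        self.last_heartbeat[process] = now
        if process in self.suspected:
            # The suspicion was false: trust the process again and
            # wait longer before suspecting it next time.
            self.suspected.discard(process)
            self.timeout[process] *= self.backoff

    def check(self, now):
        """Return the set of processes suspected at time `now`."""
        for p, last in self.last_heartbeat.items():
            if now - last > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


fd = TimeoutFailureDetector(["a", "b"], initial_timeout=1.0)
fd.on_heartbeat("a", 0.5)
fd.on_heartbeat("b", 0.5)
fd.on_heartbeat("a", 1.2)
print(fd.check(2.0))       # "b" has been silent for 1.5 > 1.0
fd.on_heartbeat("b", 2.1)  # false suspicion: timeout for "b" doubles
print(fd.check(3.0))       # "a": 1.8 > 1.0 is suspected; "b": 0.9 <= 2.0 is not
```

An agreement algorithm built on top of such a module only queries `check` and never reasons about clocks or message delays itself, which is the separation the abstraction is meant to provide.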
Strong Equivalence Relations for Iterated Models
The Iterated Immediate Snapshot model (IIS), due to its elegant geometrical
representation, has become standard for applying topological reasoning to
distributed computing. Its modular structure makes it easier to analyze than
the more realistic (non-iterated) read-write Atomic-Snapshot memory model (AS).
It is known that AS and IIS are equivalent with respect to wait-free
task computability: a distributed task is solvable in AS if and only if it
is solvable in IIS. We observe, however, that this equivalence is not sufficient
to explore solvability of tasks in sub-models of AS (i.e.
proper subsets of its runs) or computability of long-lived objects, and
a stronger equivalence relation is needed. In this paper, we consider
adversarial sub-models of AS and IIS specified by the sets of processes
that can be correct in a model run. We show that AS and IIS are
equivalent in a strong way: a (possibly long-lived) object is implementable in
AS under a given adversary if and only if it is implementable in IIS under the
same adversary.
Therefore, the computability of any object in shared memory under an
adversarial AS scheduler can be equivalently investigated in IIS
Consensus on Transaction Commit
The distributed transaction commit problem requires reaching agreement on
whether a transaction is committed or aborted. The classic Two-Phase Commit
protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms
also reach agreement, but do not block whenever any majority of the processes
are working. Running a Paxos consensus algorithm on the commit/abort decision
of each participant yields a transaction commit protocol that uses 2F + 1
coordinators and makes progress if at least F + 1 of them are working. In the
fault-free case, this algorithm requires one extra message delay but has the
same stable-storage write delay as Two-Phase Commit. The classic Two-Phase
Commit algorithm is obtained as the special F = 0 case of the general Paxos
Commit algorithm. Comment: Original at
http://research.microsoft.com/research/pubs/view.aspx?tr_id=70
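The coordinator counts stated in this abstract can be checked with a small sketch. The function names are hypothetical. The outcome rule (commit only if every participant's decision is to prepare) is the standard atomic commit rule and is stated here as an assumption, since the abstract does not spell it out. The sketch models only the arithmetic, not the Paxos message exchange.

```python
def coordinators_needed(f):
    """Number of coordinators used to tolerate f coordinator failures."""
    return 2 * f + 1


def can_make_progress(f, working):
    """Progress requires at least f + 1 working coordinators (a majority)."""
    return working >= f + 1


def transaction_outcome(decisions):
    """Commit only if every participant decided 'prepared' (assumed rule)."""
    return "commit" if all(d == "prepared" for d in decisions) else "abort"


for f in (0, 1, 2):
    n = coordinators_needed(f)
    print(f"F={f}: {n} coordinators, progress with at least {f + 1} working")

print(can_make_progress(2, 3))                       # 3 of 5 working
print(can_make_progress(2, 2))                       # 2 of 5 working
print(transaction_outcome(["prepared", "prepared"]))
print(transaction_outcome(["prepared", "aborted"]))
```

With F = 0 the count reduces to a single coordinator, which matches the abstract's remark that Two-Phase Commit is the special F = 0 case.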