656 research outputs found

    Enhanced Failure Detection Mechanism in MapReduce

    The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among the open directions, fault tolerance, and failure detection in particular, appears to be crucial but has not yet reached a satisfactory level. Motivated by this, I devoted my main research during this period to a prototype system architecture for the MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work should lead the way for further contributions to failure detection in NoSQL application frameworks, and in cloud storage systems in general.

    The Failure Detector Abstraction

    This paper surveys the failure detector concept through two dimensions. First, we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. More specifically, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions.
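
    To make the "factor out timing assumptions" idea concrete, the following is a minimal sketch of a heartbeat-based detector, not the survey's own construction. The class name, the timeout values, and the explicit `now` argument (a simulated clock) are all assumptions made for illustration. The detector may suspect a live process by mistake; when a heartbeat later arrives from a suspected process, it withdraws the suspicion and lengthens that process's timeout, which is the usual way timing guesses are kept inside the detector rather than in the agreement algorithm.

```python
class HeartbeatFailureDetector:
    """Illustrative timeout-based failure detector (hypothetical API).

    Time is passed in explicitly so the example is deterministic.
    """

    def __init__(self, processes, initial_timeout=2.0, backoff=1.0):
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heartbeat = {p: 0.0 for p in processes}
        self.suspected = set()
        self.backoff = backoff

    def on_heartbeat(self, process, now):
        """Record a heartbeat; undo a false suspicion if there was one."""
        self.last_heartbeat[process] = now
        if process in self.suspected:
            # The earlier suspicion was wrong: trust the process again
            # and wait longer before suspecting it next time.
            self.suspected.discard(process)
            self.timeout[process] += self.backoff

    def check(self, now):
        """Suspect every process whose last heartbeat is older than its timeout."""
        for p, last in self.last_heartbeat.items():
            if now - last > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


fd = HeartbeatFailureDetector(["p1", "p2"])
fd.on_heartbeat("p1", 1.0)
fd.on_heartbeat("p2", 1.0)
fd.on_heartbeat("p1", 3.0)
suspects_at_4 = fd.check(4.0)   # p2 has been silent for 3.0 > 2.0
fd.on_heartbeat("p2", 4.5)      # late heartbeat: suspicion is withdrawn
suspects_at_5 = fd.check(5.0)
```

    An algorithm built on top of such a detector only queries the current suspect set and never reasons about clocks or delays directly.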

    Optimistic fair transaction processing in mobile ad-hoc networks

    Mobile ad-hoc networks (MANETs) are unstable. Link errors, which are considered an exception in fixed-wired networks, must be assumed to be the default case in MANETs. Hence, designing fault-tolerant systems that efficiently offer transactional guarantees in these unstable environments is considerably more complex. Efficient support for such guarantees is essential for business applications, e.g. for the exchange of electronic goods. This class of applications demands transactional properties such as money and goods atomicity. In this technical report we present an architecture that allows for fair and atomic transaction processing in MANETs, together with an associated application that enables the exchange of electronic tokens.

    Distributed eventual leader election in the crash-recovery and general omission failure models.

    102 p. Distributed applications are present in many aspects of everyday life. Banking, healthcare and transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and the occurrence of failures. Distributed systems must ensure that the delivered service is trustworthy. Agreement problems form a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most agreement problems can be considered a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, known as FLP, states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. One way to circumvent this obstacle is to use unreliable failure detectors. A failure detector encapsulates the synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega provides an eventual leader election mechanism.
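
    The eventual-leader idea can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of the thesis: it assumes the simple crash model, and it assumes each process already has a local set of suspected processes that eventually agrees with everyone else's (a stronger input than Omega itself requires). The function name `elect_leader` and the process identifiers are hypothetical. Each process trusts the smallest identifier it does not suspect; while suspicions differ, processes may disagree, but once the suspect sets converge they all trust the same process.

```python
def elect_leader(processes, suspected):
    """Return the smallest process id not currently suspected, or None."""
    candidates = sorted(p for p in processes if p not in suspected)
    return candidates[0] if candidates else None


processes = [1, 2, 3]

# Early on, the two observers hold different (possibly wrong) suspicions,
# so they may trust different leaders.
early_a = elect_leader(processes, suspected={1})
early_b = elect_leader(processes, suspected=set())

# Later, process 1 has really crashed and both observers suspect exactly
# that process, so they agree on the same leader.
late_a = elect_leader(processes, suspected={1})
late_b = elect_leader(processes, suspected={1})
```

    The crash-recovery and general omission models named in the title need more care than this, because a process can come back or can lose messages without having crashed.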

    The Failure Detector Abstraction

    A failure detector is a fundamental abstraction in distributed computing. This paper surveys this abstraction through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of the dimensions

    Strong Equivalence Relations for Iterated Models

    The Iterated Immediate Snapshot model (IIS), due to its elegant geometrical representation, has become standard for applying topological reasoning to distributed computing. Its modular structure makes it easier to analyze than the more realistic (non-iterated) read-write Atomic-Snapshot memory model (AS). It is known that AS and IIS are equivalent with respect to wait-free task computability: a distributed task is solvable in AS if and only if it is solvable in IIS. We observe, however, that this equivalence is not sufficient to explore the solvability of tasks in sub-models of AS (i.e. proper subsets of its runs) or the computability of long-lived objects, and that a stronger equivalence relation is needed. In this paper, we consider adversarial sub-models of AS and IIS specified by the sets of processes that can be correct in a model run. We show that AS and IIS are equivalent in a strong way: a (possibly long-lived) object is implementable in AS under a given adversary if and only if it is implementable in IIS under the same adversary. Therefore, the computability of any object in shared memory under an adversarial AS scheduler can be equivalently investigated in IIS.

    Consensus on Transaction Commit

    The distributed transaction commit problem requires reaching agreement on whether a transaction is committed or aborted. The classic Two-Phase Commit protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms also reach agreement, but do not block whenever any majority of the processes are working. Running a Paxos consensus algorithm on the commit/abort decision of each participant yields a transaction commit protocol that uses 2F + 1 coordinators and makes progress if at least F + 1 of them are working. In the fault-free case, this algorithm requires one extra message delay but has the same stable-storage write delay as Two-Phase Commit. The classic Two-Phase Commit algorithm is obtained as the special F = 0 case of the general Paxos Commit algorithm. Comment: Original at http://research.microsoft.com/research/pubs/view.aspx?tr_id=70
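
    The coordinator arithmetic in this abstract can be checked with a few lines. The sketch below only encodes the two counts stated above (2F + 1 coordinators, progress with at least F + 1 working); the function names are hypothetical and nothing here implements the Paxos Commit protocol itself.

```python
def coordinators_needed(f):
    """Number of coordinators used when tolerating f coordinator failures."""
    return 2 * f + 1


def can_make_progress(f, working):
    """Progress requires at least f + 1 working coordinators."""
    return working >= f + 1


# F = 0 is the Two-Phase Commit case: a single coordinator,
# and the protocol blocks as soon as that coordinator fails.
single = coordinators_needed(0)
blocked_2pc = not can_make_progress(0, working=single - 1)

# F = 2: five coordinators, and progress survives two failures but not three.
five = coordinators_needed(2)
survives_two = can_make_progress(2, working=five - 2)
survives_three = can_make_progress(2, working=five - 3)
```

    Since (2F + 1) - F = F + 1, losing up to F coordinators always leaves exactly the required majority.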