    Power and limits of distributed computing shared memory models

    Due to the advent of multicore machines, shared memory distributed computing models that take asynchrony and process crashes into account are becoming more and more important. This paper visits some of the models for these systems and analyses their properties from a computability point of view. Among them, the snapshot model and the iterated model are particularly investigated. The paper also visits several approaches that have been proposed to model crash failures. Among them, the wait-free case, where any number of processes can crash, is fundamental. The paper also considers models where up to t processes can crash, and where the crashes are not independent. The aim of this survey is to help the reader better understand recent advances on what is known about the power and limits of distributed computing shared memory models and their underlying mathematics. This report is an introduction to asynchronous computation models for shared-memory systems.
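    To make the snapshot model mentioned above concrete, here is a minimal Python sketch of the single-writer atomic snapshot interface: each process updates only its own component, and a scan returns all components as if read at a single instant. The sequential class below only illustrates the semantics; real wait-free implementations achieve the same interface without locks, and in the iterated model a fresh one-shot snapshot object is used in each round.

        class SnapshotObject:
            """Sequential stand-in for an n-component atomic snapshot object."""

            def __init__(self, n):
                self.regs = [None] * n   # one single-writer register per process

            def update(self, i, value):
                # Process i writes only to its own component.
                self.regs[i] = value

            def scan(self):
                # Returns a view of all components as if read atomically.
                return list(self.regs)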

    The notion of Timed Registers and its application to Indulgent Synchronization

    A new type of shared object, called timed register, is proposed and used to design indulgent timing-based algorithms. A timed register generalizes the notion of an atomic register as follows: if a process invokes two consecutive operations on the same timed register which are a read followed by a write, then the write operation is executed only if it is invoked at most d time units after the read operation, where d is defined as part of the read operation. In this context, a timing-based algorithm is an algorithm whose correctness relies on the existence of a bound Δ such that any pair of consecutive constrained read and write operations issued by the same process on the same timed register are separated by at most Δ time units. An indulgent algorithm is an algorithm that always guarantees the safety properties, and ensures the liveness property as soon as the timing assumptions are satisfied. The usefulness of this new type of shared object is demonstrated by presenting simple and elegant indulgent timing-based algorithms that solve the mutual exclusion, ℓ-exclusion, adaptive renaming, test&set, and consensus problems. Interestingly, timed registers are universal objects in systems with process crashes and transient timing failures (i.e., they allow building any concurrent object with a sequential specification). The paper also suggests connections with schedulers and contention managers.
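    The read-then-write semantics described above can be captured in a few lines. The following Python sketch is a single-site illustration only, with assumed method names and a boolean return convention that is not in the paper; it ignores the atomicity and shared-memory aspects of the real object.

        import time

        class TimedRegister:
            def __init__(self, initial=None):
                self.value = initial
                self.deadline = {}   # process id -> expiry of its write window

            def read(self, pid, d):
                # The read declares d: the window within which pid's next
                # write on this register will take effect.
                self.deadline[pid] = time.monotonic() + d
                return self.value

            def write(self, pid, v):
                # The write executes only if invoked at most d time units
                # after pid's preceding read (assumed convention: return
                # True iff the write took effect).
                if time.monotonic() <= self.deadline.get(pid, float('-inf')):
                    self.value = v
                    return True
                return False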

    On the Power of Rounds: Explorations of the Heard-Of Model

    Distributed computing studies which problems can be solved by communicating processes -- computers, people,.... Because communication can take many shapes, and because of its uncertainty, lots of different models exist, so many that it's easy to get lost. One way to deal with this overabundance constrains processes to use rounds: they repeatedly broadcast a message tagged with their current round number, wait for messages with this same round number, and then use them to compute their next state and change round. The Heard-Of model leverages this idea through heard-of predicates, which constrain which messages are received at which round. Yet this model lacks the attention that it deserves from the research community. I believe the reason lies in the following three unsolved problems: how to find the heard-of predicate corresponding to a given model, whether anything is lost in this translation, and how to prove general results on heard-of predicates. This thesis addresses all three.
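    The round structure described above translates directly into a generic loop. In the Python skeleton below, network and compute are assumed interfaces, not definitions from the thesis: the heard-of predicate of a given model determines which round-r messages receive_round actually delivers.

        def heard_of_loop(state, my_id, network, compute):
            r = 0
            while not state.decided:
                # Broadcast a message tagged with the current round number.
                network.broadcast((r, my_id, state.message()))
                # heard = HO(p, r): the messages heard of in round r, as
                # constrained by the model's heard-of predicate.
                heard = network.receive_round(r)
                # Deterministic transition to the next state and round.
                state = compute(state, r, heard)
                r += 1
            return state.decision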

    Characterizing Consensus in the Heard-Of Model

    The Heard-Of model is a simple and relatively expressive model of distributed computation. Because of this, it has gained considerable attention from the verification community. We give a characterization of all algorithms solving consensus in a fragment of this model. The fragment is large enough to cover many prominent consensus algorithms. The characterization is purely syntactic: it is expressed in terms of conditions on the text of the algorithm.

    A Prescription for Partial Synchrony

    Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assumes (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors.
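    For readers unfamiliar with failure detectors, the Python sketch below shows one classic way such a service can be realized: heartbeats plus adaptive timeouts. This is not the thesis's construction; all names and the transport interface are illustrative assumptions. Doubling a timeout after each false suspicion is what eventually yields accurate hints once partial-synchrony bounds start to hold.

        import time

        class HeartbeatDetector:
            def __init__(self, my_id, peers, transport, initial_timeout=1.0):
                self.my_id = my_id
                self.peers = peers                  # ids of monitored processes
                self.transport = transport          # assumed: has .broadcast(msg)
                self.timeout = {p: initial_timeout for p in peers}
                self.last_seen = {p: time.monotonic() for p in peers}
                self.suspected = set()

            def tick(self):
                # Called periodically: announce liveness, re-evaluate suspicions.
                self.transport.broadcast(('heartbeat', self.my_id))
                now = time.monotonic()
                for p in self.peers:
                    if now - self.last_seen[p] > self.timeout[p]:
                        self.suspected.add(p)

            def on_heartbeat(self, p):
                self.last_seen[p] = time.monotonic()
                if p in self.suspected:
                    self.suspected.discard(p)   # false suspicion detected:
                    self.timeout[p] *= 2        # back off so it stops recurring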

    k-Set Agreement in Communication Networks with Omission Faults

    We consider an arbitrary communication network G where at most f messages can be lost at each round, and consider the classical k-set agreement problem in this setting. We characterize exactly for which f the k-set agreement problem can be solved on G. The case k = 1, that is, the consensus problem, was first introduced by Santoro and Widmayer in 1989, and its characterization is already known from [Coulouma/Godard/Peters, TCS, 2015]. As a first contribution, we present a detailed and complete characterization for the 2-set agreement problem. The proof of the impossibility result uses topological methods. We introduce a new subdivision approach for these topological methods that is of independent interest. In the second part, we show how to extend the result to the general case of arbitrary k ∈ ℕ. This characterization is the first complete characterization for this kind of synchronous message passing model, a model that is a subclass of the family of oblivious message adversaries.
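    As a reminder of what k-set agreement asks (at most k distinct values decided system-wide), here is the classic synchronous baseline under at most f crash faults in Python; note that this is not the omission-fault model the paper characterizes. round_exchange is an assumed primitive that broadcasts a value and returns the values received in that round, the caller's own included.

        def k_set_agreement(my_input, f, k, round_exchange):
            # Flood the smallest value seen for floor(f/k) + 1 rounds.
            estimate = my_input
            for r in range(f // k + 1):
                received = round_exchange(r, estimate)
                estimate = min(received)
            # At most k distinct values are decided across all processes.
            return estimate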

    Notes on Theory of Distributed Systems

    Notes for the Yale course CPSC 465/565 Theory of Distributed Systems

    Randomized versus Deterministic Implementations of Concurrent Data Structures

    One of the key trends in computing over the past two decades has been increased distribution, both at the processor level, where multi-core architectures are now the norm, and at the system level, where many key services are currently distributed over multiple machines. Thus, understanding the power and limitations of computing in a concurrent, distributed setting is one of the major challenges in Computer Science. In this thesis, we analyze the complexity of implementing concurrent data structures in asynchronous shared memory systems. We focus on the complexity of a classic distributed coordination task called renaming, in which a set of processes need to pick distinct names from a small set of identifiers. We present the first tight bounds for the time complexity of this problem, both for deterministic and randomized implementations, solving a long-standing open problem in the field. For deterministic algorithms, we prove a tight linear lower bound; for randomized solutions, we provide logarithmic upper and lower bounds on time complexity. Together, these results show an exponential separation between deterministic and randomized renaming solutions. Importantly, the lower bounds extend to implementations of practical shared-memory data structures, such as queues, stacks, and counters. From a technical perspective, this thesis highlights new connections between the distributed renaming problem and other fundamental objects, such as sorting networks, mutual exclusion, and counters. In particular, we show that sorting networks can be used to obtain optimal randomized solutions to renaming, and that, in turn, the existence of these solutions implies a linear lower bound on the complexity of the problem. In sum, the results in this thesis suggest that deterministic implementations of shared-memory data structures do not scale well in terms of worst-case time complexity. On the positive side, we emphasize randomization as a natural alternative, which can circumvent the deterministic lower bounds with high probability. Thus, a promising direction for future work is to extend our randomized renaming techniques to obtain efficient implementations of concurrent data structures.
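    To fix intuition for the renaming task itself, here is a minimal Python sketch (not from the thesis): each process scans an array of one-shot test&set objects and takes the index of the first one it wins. With at least as many slots as participants, every process terminates with a distinct name, at a worst-case cost linear in the number of slots, consistent with the linear deterministic lower bound mentioned above. The test_and_set method is an assumed primitive that returns True for exactly one caller per slot.

        def rename(tas_slots):
            # Claim the first free slot; its index becomes my new name.
            for name, slot in enumerate(tas_slots):
                if slot.test_and_set():
                    return name
            raise RuntimeError('more participants than slots')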