109 research outputs found

    Nontrivial and Universal Helping for Wait-Free Queues and Stacks

    This paper studies two approaches to formalizing helping in wait-free implementations of shared objects. The first approach is based on operation valency, and it allows us to make the important distinction between trivial and nontrivial helping. We show that a wait-free implementation of a queue from Common2 objects (e.g., Test&Set) requires nontrivial helping. In contrast, there is a wait-free implementation of a stack from Common2 objects with only trivial helping. This separation might shed light on the difficulty of implementing a queue from Common2 objects. The other approach formalizes the helping mechanism employed by Herlihy's universal wait-free construction and is based on having an operation by one process restrict the possible linearizations of operations by other processes. We show that objects possessing such universal helping can be used to solve consensus.
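    As a concrete illustration of the kind of object this result concerns, the sketch below shows the textbook construction of 2-process consensus from a single Test&Set bit plus registers, which is the reason Test&Set has consensus number 2 and sits in the Common2 family. This is standard material, not the paper's construction; class and method names are illustrative.

        import java.util.concurrent.atomic.AtomicBoolean;
        import java.util.concurrent.atomic.AtomicReferenceArray;

        // 2-process consensus from one Test&Set bit and two registers.
        class TwoProcessConsensus<T> {
            private final AtomicBoolean tas = new AtomicBoolean(false);
            private final AtomicReferenceArray<T> proposed = new AtomicReferenceArray<>(2);

            // id is 0 or 1; both processes return the same decided value.
            T decide(int id, T value) {
                proposed.set(id, value);               // announce before competing
                boolean winner = !tas.getAndSet(true); // atomic test&set
                return winner ? value : proposed.get(1 - id); // loser adopts winner's proposal
            }
        }

    The winner of the test&set decides its own value; the loser, which must have lost after the winner announced, adopts the winner's proposal, so both decide the same value.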

    On Helping and Stacks

    A concurrent algorithm exhibits helping when one process performs work on behalf of other processes. More formally, helping is observed when the order of some operation in a linearization is fixed by a step of another process. In this paper, we show that no wait-free linearizable implementation of a stack using read, write, compare&swap, and fetch&add operations can be help-free, correcting a mistake in an earlier proof by Censor-Hillel et al.
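    For context, the sketch below is the classic Treiber stack, built from read and compare&swap, two of the primitives named in the result. It is lock-free and help-free, but not wait-free; the theorem says that the wait-free case cannot remain help-free.

        import java.util.concurrent.atomic.AtomicReference;

        // Treiber stack: lock-free and help-free, but not wait-free.
        class TreiberStack<T> {
            private static final class Node<T> {
                final T item; final Node<T> next;
                Node(T item, Node<T> next) { this.item = item; this.next = next; }
            }
            private final AtomicReference<Node<T>> top = new AtomicReference<>();

            void push(T item) {
                Node<T> oldTop, newTop;
                do {
                    oldTop = top.get();
                    newTop = new Node<>(item, oldTop);
                } while (!top.compareAndSet(oldTop, newTop)); // retry under contention
            }

            T pop() {
                Node<T> oldTop;
                do {
                    oldTop = top.get();
                    if (oldTop == null) return null; // empty stack
                } while (!top.compareAndSet(oldTop, oldTop.next));
                return oldTop.item;
            }
        }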

    Impossibility of Strongly-Linearizable Message-Passing Objects via Simulation by Single-Writer Registers

    A key way to construct complex distributed systems is through modular composition of linearizable concurrent objects. A prominent example is shared registers, which have crash-tolerant implementations on top of message-passing systems, allowing the advantages of shared memory to carry over to message-passing. Yet linearizable registers do not always behave properly when used inside randomized programs. A strengthening of linearizability, called strong linearizability, has been shown to preserve probabilistic behavior, as well as other "hypersafety" properties. In order to exploit composition and abstraction in message-passing systems, it is crucial to know whether there exist strongly-linearizable implementations of registers in message-passing. This paper answers the question in the negative: there are no strongly-linearizable fault-tolerant message-passing implementations of multi-writer registers, max-registers, snapshots, or counters. This result is proved by reduction from the corresponding result by Helmi et al. The reduction is a novel extension of the BG simulation that connects shared-memory and message-passing, supports long-lived objects, and preserves strong linearizability. The main technical challenge arises from the discrepancy between the potentially minuscule fraction of failures to be tolerated in the simulated message-passing algorithm and the large fraction of failures that can afflict the simulating shared-memory system. The reduction is general and can be viewed as the inverse of the ABD simulation of shared memory in message-passing.
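    For reference, strong linearizability (introduced by Golab, Higham, and Woelfel) strengthens linearizability by requiring the linearization to be chosen prefix-consistently. A standard formulation: an implementation is strongly linearizable if there exists a function $f$ mapping each history to one of its valid linearizations such that

    \[
      G \text{ is a prefix of } H \;\Longrightarrow\; f(G) \text{ is a prefix of } f(H).
    \]

    Intuitively, once an operation's place in the linearization is fixed, no future steps (or an adversary reacting to random choices) can reorder it, which is what preserves probabilistic behavior under composition.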

    Novel Non-Blocking Approach for a Concurrent Heap

    We present a non-blocking algorithm for a concurrent heap in asynchronous shared-memory multiprocessors. Processors in these machines often execute instructions at varying speeds and are subject to arbitrarily long delays. Our implementation supports non-blocking Insert, DeleteMin, and FindMin operations on a heap. Non-blocking techniques avoid the drawbacks associated with mutual exclusion and also admit improved parallelism. Insert and DeleteMin operations on a heap take more than one atomic instruction to complete; thus, a new operation may be started before the previous one completes, leaving the heap inconsistent between operations. We represent the heap as an array of pointers. Any modification to the heap is done using single-word Compare&Swap instructions. If all update operations modify different parts of the heap, they run completely concurrently. Our approach guarantees lock-freedom for all concurrent threads, and wait-freedom if threads execute operations on different nodes of the heap.
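    The fragment below only illustrates the representation the abstract describes (a heap held as an array of pointers, with each modification a single-word Compare&Swap); it is not the paper's full algorithm, and all names are illustrative.

        import java.util.concurrent.atomic.AtomicReferenceArray;

        // Illustrative fragment: heap slots hold pointers to immutable nodes,
        // and every slot update is a single-word CAS.
        class HeapSlots {
            static final class Node {
                final int key;
                Node(int key) { this.key = key; }
            }
            private final AtomicReferenceArray<Node> slots;
            HeapSlots(int capacity) { slots = new AtomicReferenceArray<>(capacity); }

            // Publish a new node into a currently empty slot; a failed CAS
            // means another operation got there first, and the caller retries.
            boolean tryPublish(int index, Node node) {
                return slots.compareAndSet(index, null, node);
            }

            // Replace a slot's node only if it has not changed since it was read,
            // so updates to disjoint slots proceed fully concurrently.
            boolean tryReplace(int index, Node expected, Node replacement) {
                return slots.compareAndSet(index, expected, replacement);
            }
        }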

    Strongly Linearizable Linked List and Queue


    Synchronization Costs in Parallel Programs and Concurrent Data Structures

    To use the computational power of modern computing machines, we have to deal with concurrent programs. Writing efficient concurrent programs is notoriously difficult, primarily due to the need to harness synchronization costs. In this thesis, we focus on synchronization costs in parallel programs and concurrent data structures. First, we present a novel granularity control technique for parallel programs designed for the dynamic multithreading environment. Then, in the context of concurrent data structures, we consider the notion of concurrency-optimality and propose the first implementation of a concurrency-optimal binary search tree that, intuitively, accepts a concurrent schedule if and only if the schedule is correct. We also propose parallel combining, a technique that enables efficient implementations of concurrent data structures from their parallel batched counterparts. We validate the proposed techniques via experimental evaluations showing superior or comparable performance with respect to state-of-the-art algorithms. From a more formal perspective, we consider the phenomenon of helping in concurrent data structures. Intuitively, helping is observed when the order of some operation in a linearization is fixed by a step of another process. We show that no wait-free linearizable implementation of a stack using read, write, compare&swap, and fetch&add primitives can be help-free, correcting a mistake in an earlier proof by Censor-Hillel et al. Finally, we propose a simple way to analytically predict the throughput of data structures based on coarse-grained locking.
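    As a first-order illustration of the last point (a back-of-envelope approximation, not the thesis's actual model): under a coarse-grained lock only one thread at a time executes the critical section, so with $N$ threads, per-operation local work $t_{\mathrm{local}}$, and critical-section length $t_{\mathrm{cs}}$, throughput behaves roughly as

    \[
      \text{Throughput} \;\approx\; \min\!\left( \frac{N}{t_{\mathrm{local}} + t_{\mathrm{cs}}},\; \frac{1}{t_{\mathrm{cs}}} \right),
    \]

    scaling with $N$ at low thread counts and saturating at $1/t_{\mathrm{cs}}$ once the lock becomes the bottleneck.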

    Empowering Cloud Data Centers with Network Programmability

    Cloud data centers are a critical infrastructure for modern Internet services such as web search, social networking, and e-commerce. However, the gradual slow-down of Moore’s law has put a burden on the growth of data centers’ performance and energy efficiency. In addition, the increasing prevalence of millisecond-scale and microsecond-scale tasks places higher throughput and latency requirements on cloud applications. Today’s server-based solutions struggle to meet the performance requirements of many scenarios, such as resource management, scheduling, and high-speed traffic monitoring and testing. In this dissertation, we study these problems from a network perspective. We investigate a new architecture that leverages the programmability of new-generation network switches to improve the performance and reliability of clouds. As programmable switches provide only very limited memory and functionality, we exploit compact data structures and deeply co-design software and hardware to make the best use of these resources. More specifically, this dissertation presents four systems: (i) NetLock: A new centralized lock management architecture that co-designs programmable switches and servers to simultaneously achieve high performance and rich policy support. It provides orders-of-magnitude higher throughput than existing systems with microsecond-level latency, and supports many commonly-used policies such as performance isolation. (ii) HCSFQ: A scalable and practical solution to implement hierarchical fair queueing on commodity hardware at line rate. Instead of relying on a hierarchy of queues with complex queue management, HCSFQ does not keep per-flow state and uses only one queue to achieve hierarchical fair queueing. (iii) AIFO: A new approach for programmable packet scheduling that only uses a single FIFO queue. AIFO utilizes an admission control mechanism to approximate PIFO, which is theoretically ideal but hard to implement with commodity devices. (iv) Lumina: A tool that enables fine-grained analysis of hardware network stacks. By exploiting network programmability to emulate various network scenarios, Lumina is able to help users understand the micro-behaviors of hardware network stacks.
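    As a rough illustration of the AIFO idea only (a simplified host-side sketch, not the paper's P4 data-plane implementation; the window size and the exact admission rule here are assumptions): keep a small sliding window of recently seen packet ranks, and admit a packet into the single FIFO only if its rank-quantile fits within the space the queue has left.

        // Simplified AIFO-style admission gate: approximate PIFO with one
        // FIFO by dropping packets whose rank is too poor for the current
        // queue occupancy. Illustrative only.
        class AifoGate {
            private final int[] window; // sliding window of recent ranks
            private int head = 0;

            AifoGate(int windowSize) { window = new int[windowSize]; }

            // rank: lower is higher priority; occupancy in [0, 1].
            boolean admit(int rank, double occupancy) {
                window[head] = rank;                // record this packet's rank
                head = (head + 1) % window.length;
                int smaller = 0;
                for (int r : window) if (r < rank) smaller++;
                double quantile = (double) smaller / window.length;
                // Admit while the packet's quantile fits the remaining space:
                // an empty queue admits everything, a full one only top ranks.
                return quantile <= 1.0 - occupancy;
            }
        }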