19 research outputs found

    How a rainbow coloring function can simulate wait-free handshaking

    Get PDF
    How to construct shared data objects is a fundamental issue in asynchronous concurrent systems, since these objects provide the means for communication and synchronization between processes in these systems. Constructions which guarantee that concurrent access to the shared object by processes is free from waiting are of particular interest, since they may help to increase the amount of parallelism in such systems. The problem of constructing a k-valued wait-free shared register out of binary subregisters of the same type where each write access consists of one subwrite (constructions with one-write) has received some attention, since it lies at the heart of studying lower bounds of the complexities of register constructions and trade-offs between them. The first such construction was for the safe register case which uses k binary safe registers and exploits the properties of a rainbow coloring function of a hypercube. The best known construction for the regular/atomic case uses (formula presented) binary regular/atomic registers. In this work we show how the rainbow coloring function can be extended to simulate a handshaking mechanism between the reader and the writer of the register, thus offering a solution for the atomic register case with one reader, which uses only 3k-2 binary registers. The lower bound for such a construction is k−1

    Toward self-stabilizing wait-free shared memory objects

    Get PDF
    Past research on fault tolerant distributed systems has focussed on either processor failures, ranging from benign crash failures to the malicious byzantine failure types, or on transient memory failures, which can suddenly corrupt the state of the system. An interesting question in the theory of distributed computing is whether one can device highly fault tolerant protocols which can tolerate both processor failures as well as transient errors. To answer this question we consider the construction of self-stabilizing wait-free shared memory objects. These objects occur naturally in distributed systems in which both processors and memory may be faulty. Our contribution in this paper is threefold. First, we propose a general definition of a self-stabilizing wait-free shared memory object that expresses safety guarantees even in the face of processor failures. Second, we show that within this framework one cannot construct a self-stabilizing single-reader single-writer regular bit from single-reader single-writer safe bits. This result leads us to postulate a self-stabilizing dual-reader single-writer safe bit with which, as a third contribution, we construct self-stabilizing regular and atomic registers

    How a rainbow coloring function can simulate wait-free handshaking

    No full text
    How to construct shared data objects is a fundamental issue in asynchronous concurrent systems, since these objects provide the means for communication and synchronization between processes in these systems. Constructions which guarantee that concurrent access to the shared object by processes is free from waiting are of particular interest, since they may help to increase the amount of parallelism in such systems. The problem of constructing a k-valued wait-free shared register out of binary subregisters of the same type where each write access consists of one subwrite (constructions with one-write) has received some attention, since it lies at the heart of studying lower bounds of the complexities of register constructions and trade-offs between them. The first such construction was for the safe register case which uses k binary safe registers and exploits the properties of a rainbow coloring function of a hypercube. The best known construction for the regular/atomic case uses (formula presented) binary regular/atomic registers. In this work we show how the rainbow coloring function can be extended to simulate a handshaking mechanism between the reader and the writer of the register, thus offering a solution for the atomic register case with one reader, which uses only 3k-2 binary registers. The lower bound for such a construction is k−1

    Wait-Free Programming for General Purpose Computations on Graphics Processors

    No full text
    The fact that graphics processors (GPUs) are today's most powerful computational hardware for the dollar has motivated researchers to utilize the ubiquitous and powerful GPUs for general-purpose computing. However, unlike CPUs, GPUs are optimized for processing 3D graphics (e.g., graphics rendering), a kind of data-parallel applications, and consequently, several GPUs do not support strong synchronization primitives to coordinate their cores. This prevents the GPUs from being deployed more widely for general-purpose computing. This paper aims at bridging the gap between the lack of strong synchronization primitives in the GPUs and the need for strong synchronization mechanisms in parallel applications. Based on the intrinsic features of typical GPU architectures, we construct strong synchronization objects such as wait-free and t-resilient read-modify-write objects for a general model of GPU architectures without hardware synchronization primitives such as test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. The wait-free objects have the optimal space complexity O(N-2) . Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without strong synchronization primitives in hardware and that wait-free programming is possible for such GPUs

    How lock-free data structures perform in dynamic environments: Models and analyses

    Get PDF
    \ua9 Aras Atalar, Paul Renaud-Goud, and Philippas Tsigas.In this paper we present two analytical frameworks for calculating the performance of lock-free data structures. Lock-free data structures are based on retry loops and are called by application-specific routines. In contrast to previous work, we consider in this paper lock-free data structures in dynamic environments. The size of each of the retry loops, and the size of the application routines invoked in between, are not constant but may change dynamically. The new frameworks follow two different approaches. The first framework, the simplest one, is based on queuing theory. It introduces an average-based approach that facilitates a more coarse-grained analysis, with the benefit of being ignorant of size distributions. Because of this independence from the distribution nature it covers a set of complicated designs. The second approach, instantiated with an exponential distribution for the size of the application routines, uses Markov chains, and is tighter because it constructs stochastically the execution, step by step. Both frameworks provide a performance estimate which is close to what we observe in practice. We have validated our analysis on (i) several fundamental lock-free data structures such as stacks, queues, deques and counters, some of them employing helping mechanisms, and (ii) synthetic tests covering a wide range of possible lock-free designs. We show the applicability of our results by introducing new back-off mechanisms, tested in application contexts, and by designing an efficient memory management scheme that typical lock-free algorithms can utilize

    The Synchronization Power of Coalesced Memory Accesses

    No full text
    Multicore architectures have established themselves as the new generation of computer architectures. As part of the one core to many cores evolution, memory access mechanisms have advanced rapidly. Several new memory access mechanisms have been implemented in many modern commodity multicore architectures. By specifying how processing cores access shared memory, memory access mechanisms directly influence the synchronization capabilities of multicore architectures. Therefore, it is crucial to investigate the synchronization power of these new memory access mechanisms. This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures such as the Compute Unified Device Architecture (CUDA). We first define three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore architectures, without the need of synchronization primitives other than reads and writes. In the case of the contemporary CUDA processors, our results imply that the coalesced memory access mechanisms have consensus numbers up to 64

    Wait-Free Programming for General Purpose Computations on Graphics Processors

    No full text
    The fact that graphics processors (GPUs) are today\u27s most powerful computational hardware for the dollar has motivated researchers to utilize the ubiquitous and powerful GPUs for general-purpose computing. However, unlike CPUs, GPUs are optimized for processing 3D graphics (e.g., graphics rendering), a kind of data-parallel applications, and consequently, several GPUs do not support strong synchronization primitives to coordinate their cores. This prevents the GPUs from being deployed more widely for general-purpose computing. This paper aims at bridging the gap between the lack of strong synchronization primitives in the GPUs and the need for strong synchronization mechanisms in parallel applications. Based on the intrinsic features of typical GPU architectures, we construct strong synchronization objects such as wait-free and t-resilient read-modify-write objects for a general model of GPU architectures without hardware synchronization primitives such as test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. The wait-free objects have the optimal space complexity O(N-2) . Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without strong synchronization primitives in hardware and that wait-free programming is possible for such GPUs
    corecore