8 research outputs found

    Scalable hashing for shared memory supercomputers

    Scheduling Irregular Workloads on GPUs

    This doctoral research aims to understand the nature of the overhead for data-irregular GPU workloads, to propose a solution, and to examine the consequences of the result. We propose a novel, retry-free GPU workload scheduler for irregular workloads. When used in a Breadth First Search (BFS) algorithm, the proposed simple, monolithic concurrent queue scales to within 10% of ideal scalability on AMD’s Fiji GPU with 14,336 active threads. The dissertation presents an important finding: the retry overhead associated with Compare and Swap (CAS) operations is the principal reason why concurrent queues do not scale well as the number of clients increases in a massively multi-threaded environment.
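
    To make the bottleneck concrete, the following is a minimal sketch in C++ of the classic CAS retry loop (a Treiber-style concurrent stack push), not the dissertation's scheduler; the names Node, head, and push are illustrative:

        #include <atomic>

        // Minimal sketch of the CAS retry pattern the abstract identifies
        // as the scalability bottleneck; not the dissertation's scheduler.
        struct Node {
            int   value;
            Node* next;
        };

        std::atomic<Node*> head{nullptr};

        void push(Node* n) {
            Node* old_head = head.load(std::memory_order_relaxed);
            do {
                n->next = old_head;
                // Under heavy contention the CAS below fails and the loop
                // retries; each failed attempt is pure overhead, which is
                // the cost a retry-free design eliminates.
            } while (!head.compare_exchange_weak(old_head, n,
                                                 std::memory_order_release,
                                                 std::memory_order_relaxed));
        }

    With thousands of GPU threads contending on a single location, failed CAS attempts dominate the execution time; this is the retry overhead the proposed retry-free scheduler avoids.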

    Verifying linearisability: A comparative survey

    Linearisability is a key correctness criterion for concurrent data structures, ensuring that each history of the concurrent object under consideration is consistent with respect to a history of the corresponding abstract data structure. Linearisability allows concurrent (i.e., overlapping) operation calls to take effect in any order, but requires the real-time order of nonoverlapping calls to be preserved. The sophisticated nature of concurrent objects means that linearisability is difficult to judge, and hence, over the years, numerous techniques for verifying linearisability have been developed using a variety of formal foundations such as data refinement, shape analysis, reduction, etc. However, because the underlying framework, nomenclature, and terminology of each method are different, it has become difficult for practitioners to evaluate the differences between the approaches, and hence to select the methodology most appropriate for verifying the data structure at hand. In this article, we compare the major methods for verifying linearisability, describe the main contribution of each method, and compare their advantages and limitations.
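
    As a concrete illustration of the definition (our sketch, not taken from the survey), consider a shared counter in C++ whose increment takes effect atomically at a single linearisation point:

        #include <atomic>

        // A linearisable counter: each increment appears to take effect
        // instantaneously at its linearisation point (the fetch_add).
        std::atomic<long> counter{0};

        long increment() {
            // Linearisation point. Overlapping calls may be ordered either
            // way, but once a call returns, every later (nonoverlapping)
            // call must observe its effect, so every concurrent history is
            // consistent with some sequential history of an abstract counter.
            return counter.fetch_add(1) + 1;
        }

    Verification methods surveyed in the article differ chiefly in how they establish that such a point exists (or that an equivalent abstraction relation holds) for every operation of a far more intricate data structure.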

    A general lock-free algorithm using compare-and-swap

    The compare-and-swap register (CAS) is a synchronization primitive for lock-free algorithms. Most uses of it, however, suffer from the so-called ABA problem. The simplest and most efficient solution to the ABA problem is to include a tag with the memory location such that the tag is incremented with each update of the target location. This solution, however, is theoretically unsound and has limited applicability. This paper presents a general lock-free pattern that is based on the synchronization primitive CAS without causing the ABA problem or problems with wrap-around. It can be used to provide lock-free functionality for any data type. Our algorithm is a CAS variation of an LL/SC methodology for lock-free transformation. The basis of our technique is to poll different locations on reading and writing objects in such a way that the consistency of an object can be checked by its location instead of its tag. It consists of simple code that can be easily implemented using C-like languages. A real difficulty with lock-free algorithms is that they are hard to design correctly, which holds even for apparently straightforward algorithms. We therefore develop a reduction theorem that enables us to reason about the general lock-free algorithm to be designed on a higher level than the synchronization primitives. The reduction theorem is based on refinement mappings, and has been verified with the higher-order interactive theorem prover PVS. Using the reduction theorem, fewer invariants are required and some invariants are easier to discover and formulate without considering the internal structure of the final implementation.
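
    For context, the tag-based workaround the paper critiques can be sketched as follows (a minimal C++ illustration, not the paper's own pattern; Node, TaggedPtr, and pop are illustrative names, and safe memory reclamation is assumed):

        #include <atomic>
        #include <cstdint>

        // Tag-based ABA workaround: pairing the pointer with a counter
        // makes a CAS fail even if the address was freed and reused (the
        // ABA scenario). Wrap-around of the tag is the theoretical
        // unsoundness the abstract refers to. Safe memory reclamation is
        // assumed so old_top.ptr->next below is not a use-after-free.
        struct Node { Node* next; };

        struct TaggedPtr {
            Node*         ptr;
            std::uint64_t tag;   // incremented on every successful update
        };

        std::atomic<TaggedPtr> top{TaggedPtr{nullptr, 0}};

        Node* pop() {
            TaggedPtr old_top = top.load();
            while (old_top.ptr != nullptr) {
                TaggedPtr new_top{old_top.ptr->next, old_top.tag + 1};
                // A stale tag makes this CAS fail even when ptr compares
                // equal, which is exactly what defeats ABA.
                if (top.compare_exchange_weak(old_top, new_top))
                    return old_top.ptr;
            }
            return nullptr;
        }

    The paper's pattern instead checks an object's consistency by the location being polled rather than by a tag, avoiding both the wrap-around issue and the widened-word requirement of tagged CAS.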