8 research outputs found

    Scalable hashing for shared memory supercomputers

    Scheduling Irregular Workloads on GPUs

    This doctoral research aims to understand the nature of the overhead for data-irregular GPU workloads, to propose a solution, and to examine the consequences of the result. We propose a novel, retry-free GPU workload scheduler for irregular workloads. When used in a Breadth First Search (BFS) algorithm, the proposed simple, monolithic concurrent queue scales to within 10% of ideal scalability on AMD’s Fiji GPU with 14,336 active threads. The dissertation presents an important finding: the retry overhead associated with Compare and Swap (CAS) operations is the principal reason why concurrent queues do not scale well as the number of clients increases in a massively multi-threaded environment.
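
    To make the bottleneck concrete, the following is a minimal sketch in C++ of the classic CAS retry loop (a Treiber-style concurrent stack push), not the dissertation's scheduler; the names Node, head, and push are illustrative:

        #include <atomic>

        // Minimal sketch of the CAS retry pattern the abstract identifies
        // as the scalability bottleneck; not the dissertation's scheduler.
        struct Node {
            int   value;
            Node* next;
        };

        std::atomic<Node*> head{nullptr};

        void push(Node* n) {
            Node* old_head = head.load(std::memory_order_relaxed);
            do {
                n->next = old_head;
                // Under heavy contention the CAS below fails and the loop
                // retries; each failed attempt is pure overhead, which is
                // the cost a retry-free design eliminates.
            } while (!head.compare_exchange_weak(old_head, n,
                                                 std::memory_order_release,
                                                 std::memory_order_relaxed));
        }

    With thousands of GPU threads contending on a single location, failed CAS attempts dominate the execution time; this is the retry overhead the proposed retry-free scheduler avoids.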

    Verifying linearisability: A comparative survey

    Linearisability is a key correctness criterion for concurrent data structures, ensuring that each history of the concurrent object under consideration is consistent with respect to a history of the corresponding abstract data structure. Linearisability allows concurrent (i.e., overlapping) operation calls to take effect in any order, but requires the real-time order of nonoverlapping calls to be preserved. The sophisticated nature of concurrent objects means that linearisability is difficult to judge, and hence, over the years, numerous techniques for verifying linearisability have been developed using a variety of formal foundations such as data refinement, shape analysis, reduction, etc. However, because the underlying framework, nomenclature, and terminology of each method are different, it has become difficult for practitioners to evaluate the differences between the approaches, and hence to select the methodology most appropriate for verifying the data structure at hand. In this article, we compare the major methods for verifying linearisability, describe the main contribution of each method, and compare their advantages and limitations.
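
    As a concrete illustration of the definition (our sketch, not taken from the survey), consider a shared counter in C++ whose increment takes effect atomically at a single linearisation point:

        #include <atomic>

        // A linearisable counter: each increment appears to take effect
        // instantaneously at its linearisation point (the fetch_add).
        std::atomic<long> counter{0};

        long increment() {
            // Linearisation point. Overlapping calls may be ordered either
            // way, but once a call returns, every later (nonoverlapping)
            // call must observe its effect, so every concurrent history is
            // consistent with some sequential history of an abstract counter.
            return counter.fetch_add(1) + 1;
        }

    Verification methods surveyed in the article differ chiefly in how they establish that such a point exists (or that an equivalent abstraction relation holds) for every operation of a far more intricate data structure.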

    A general lock-free algorithm using compare-and-swap

    The compare-and-swap register (CAS) is a synchronization primitive for lock-free algorithms. Most uses of it, however, suffer from the so-called ABA problem. The simplest and most efficient solution to the ABA problem is to include a tag with the memory location such that the tag is incremented with each update of the target location. This solution, however, is theoretically unsound and has limited applicability. This paper presents a general lock-free pattern that is based on the synchronization primitive CAS without causing the ABA problem or problems with wrap-around. It can be used to provide lock-free functionality for any data type. Our algorithm is a CAS variation of an LL/SC methodology for lock-free transformation. The basis of our technique is to poll different locations on reading and writing objects in such a way that the consistency of an object can be checked by its location instead of its tag. It consists of simple code that can be easily implemented using C-like languages. A real difficulty with lock-free algorithms is that they are hard to design correctly, which holds even for apparently straightforward algorithms. We therefore develop a reduction theorem that enables us to reason about the general lock-free algorithm to be designed on a higher level than the synchronization primitives. The reduction theorem is based on refinement mappings, and has been verified with the higher-order interactive theorem prover PVS. Using the reduction theorem, fewer invariants are required and some invariants are easier to discover and formulate without considering the internal structure of the final implementation.
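
    For context, the tag-based workaround the paper critiques can be sketched as follows (a minimal C++ illustration, not the paper's own pattern; Node, TaggedPtr, and pop are illustrative names, and safe memory reclamation is assumed):

        #include <atomic>
        #include <cstdint>

        // Tag-based ABA workaround: pairing the pointer with a counter
        // makes a CAS fail even if the address was freed and reused (the
        // ABA scenario). Wrap-around of the tag is the theoretical
        // unsoundness the abstract refers to. Safe memory reclamation is
        // assumed so old_top.ptr->next below is not a use-after-free.
        struct Node { Node* next; };

        struct TaggedPtr {
            Node*         ptr;
            std::uint64_t tag;   // incremented on every successful update
        };

        std::atomic<TaggedPtr> top{TaggedPtr{nullptr, 0}};

        Node* pop() {
            TaggedPtr old_top = top.load();
            while (old_top.ptr != nullptr) {
                TaggedPtr new_top{old_top.ptr->next, old_top.tag + 1};
                // A stale tag makes this CAS fail even when ptr compares
                // equal, which is exactly what defeats ABA.
                if (top.compare_exchange_weak(old_top, new_top))
                    return old_top.ptr;
            }
            return nullptr;
        }

    The paper's pattern instead checks an object's consistency by the location being polled rather than by a tag, avoiding both the wrap-around issue and the widened-word requirement of tagged CAS.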