Lock cohorting: A general technique for designing NUMA locks
Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machines' non-uniform memory and caching hierarchy, ever more important. This paper presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful.
Lock cohorting allows one to transform any spin-lock algorithm, with minimal non-intrusive changes, into a scalable NUMA-aware spin-lock. Our new cohorting technique allows us to easily create NUMA-aware versions of the TATAS-Backoff, CLH, MCS, and ticket locks, to name a few. Moreover, it allows us to derive an abortable CLH-based cohort lock, the first NUMA-aware queue lock to support abortability.
We empirically compared the performance of cohort locks with prior NUMA-aware and classic NUMA-oblivious locks on a synthetic micro-benchmark, a real-world key-value store application (memcached), and the libc memory allocator. Our results demonstrate that cohort locks perform as well as or better than known locks when the load is low, and significantly outperform them as the load increases.
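The construction can be illustrated with a small sketch. The following is a minimal, hypothetical cohort lock built from two simple test-and-set spinlocks (one global, one per NUMA node); CohortLock, kHandoffLimit, and the waiter-count-based alone() hint are assumptions for illustration, not the paper's code. A thread first acquires its node's local lock; on release it hands the global lock to a waiting same-node thread when the hand-off bound allows, and otherwise releases both locks.

```cpp
// Minimal sketch of lock cohorting, assuming simple test-and-set spinlocks for
// both the global and the per-node local locks (the paper's construction is
// generic over the component locks). CohortLock, kHandoffLimit, and the
// waiter-count-based alone() hint are illustrative names, not the paper's code.
#include <atomic>
#include <thread>

constexpr int kNumNodes = 2;        // number of NUMA nodes (assumed)
constexpr int kHandoffLimit = 64;   // bound on consecutive in-cohort hand-offs (assumed)

struct TasLock {
    std::atomic<bool> held{false};
    std::atomic<int>  waiters{0};
    void lock() {
        waiters.fetch_add(1, std::memory_order_relaxed);
        while (held.exchange(true, std::memory_order_acquire)) std::this_thread::yield();
        waiters.fetch_sub(1, std::memory_order_relaxed);
    }
    void unlock() { held.store(false, std::memory_order_release); }
    bool alone() const { return waiters.load(std::memory_order_relaxed) == 0; } // cohort detection
};

class CohortLock {
    TasLock global_;                        // thread-oblivious global lock
    struct alignas(64) Node {
        TasLock local;                      // per-NUMA-node local lock
        bool global_held_by_cohort = false; // protected by `local`
        int  handoffs = 0;                  // protected by `local`
    } nodes_[kNumNodes];

public:
    void lock(int node) {
        Node& n = nodes_[node];
        n.local.lock();                      // contend only with same-node threads
        if (n.global_held_by_cohort) return; // global lock was handed to our cohort
        global_.lock();                      // otherwise compete for the global lock
    }

    void unlock(int node) {
        Node& n = nodes_[node];
        // If a same-node thread is waiting and the hand-off bound allows, keep the
        // global lock inside the cohort and release only the local lock.
        if (!n.local.alone() && n.handoffs < kHandoffLimit) {
            ++n.handoffs;
            n.global_held_by_cohort = true;
            n.local.unlock();
            return;
        }
        n.handoffs = 0;
        n.global_held_by_cohort = false;
        global_.unlock();
        n.local.unlock();
    }
};
```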
Efficient Multi-Word Compare and Swap
Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes required to implement MCAS.
We first prove in this paper that it is impossible to "pack" the information required to perform a k-word CAS (k-CAS) into fewer than k locations to be CASed. Then we present the first algorithm that requires k+1 CASes per call to k-CAS in the common uncontended case. We implement our algorithm and show that it outperforms a state-of-the-art baseline in most considered workloads across a variety of benchmarks. We also present a durably linearizable (persistent-memory-friendly) version of our MCAS algorithm that uses only 2 persistence fences per call, while still requiring only k+1 CASes per k-CAS.
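To make the interface concrete, here is a hedged sketch of what a k-word CAS call looks like and how a concurrent algorithm might use it. The mcas() function below is a semantics-only stand-in (it serializes with a mutex and is not lock-free); McasEntry, DNode, and unlink() are illustrative names, not the paper's API.

```cpp
// Hedged sketch of the multi-word CAS interface and a typical use. mcas() below
// is a semantics-only stand-in (it serializes with a mutex and is NOT lock-free);
// McasEntry, DNode, and unlink() are illustrative names, not the paper's API.
#include <atomic>
#include <cstdint>
#include <mutex>
#include <vector>

struct McasEntry {
    std::atomic<uint64_t>* addr;  // word to update
    uint64_t expected;            // value it must currently hold
    uint64_t desired;             // value to install
};

// Atomically: if every addr holds its expected value, install all desired values.
// A real implementation (such as the paper's) is lock-free and needs only k+1
// CASes per k-CAS in the uncontended case.
inline bool mcas(std::vector<McasEntry>& entries) {
    static std::mutex m;                       // illustration only
    std::lock_guard<std::mutex> g(m);
    for (auto& e : entries)
        if (e.addr->load(std::memory_order_relaxed) != e.expected) return false;
    for (auto& e : entries)
        e.addr->store(e.desired, std::memory_order_relaxed);
    return true;
}

// Example use: atomically splice a node out of a doubly linked list by updating
// both neighbour pointers in a single 2-CAS.
struct DNode { std::atomic<uint64_t> next{0}, prev{0}; };

bool unlink(DNode* pred, DNode* node, DNode* succ) {
    std::vector<McasEntry> ops = {
        { &pred->next, reinterpret_cast<uint64_t>(node), reinterpret_cast<uint64_t>(succ) },
        { &succ->prev, reinterpret_cast<uint64_t>(node), reinterpret_cast<uint64_t>(pred) },
    };
    return mcas(ops);  // succeeds only if both words still point at `node`
}
```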
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit concurrency to utilize the available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets.

Since there will always be VMs optimized for special purposes, our goal is to develop a methodology for designing instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to inform the design of such instruction sets.

As a first experiment, we implemented one instruction set extension for shared-memory concurrency and one for non-shared-memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.
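As a purely illustrative sketch of what "concurrency operations in a VM instruction set" can mean, the toy interpreter below exposes one shared-memory instruction (an atomic compare-and-swap on a VM cell) and one non-shared-memory pair (send/receive on a mailbox). All opcode names and data structures here are assumptions, not the paper's instruction set extensions.

```cpp
// Toy illustration (not the paper's instruction set) of exposing concurrency
// operations as VM instructions: one shared-memory opcode (atomic compare-and-swap
// on a VM cell) and one non-shared-memory pair (send/receive on a mailbox).
// All opcode names and structures are hypothetical.
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

enum class Op : uint8_t { ATOMIC_CAS, SEND, RECEIVE, HALT };

struct Instr { Op op; int a = 0, b = 0, c = 0; };   // operands index cells, mailboxes, registers

struct Mailbox {                                    // non-shared-memory (message-passing) side
    std::mutex m; std::condition_variable cv; std::deque<int64_t> q;
    void send(int64_t v) { { std::lock_guard<std::mutex> g(m); q.push_back(v); } cv.notify_one(); }
    int64_t receive() {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [&]{ return !q.empty(); });
        int64_t v = q.front(); q.pop_front(); return v;
    }
};

struct VM {
    std::vector<std::atomic<int64_t>> cells;        // shared-memory side
    std::vector<Mailbox> mailboxes;
    VM(size_t ncells, size_t nboxes) : cells(ncells), mailboxes(nboxes) {}

    void run(const std::vector<Instr>& code, std::vector<int64_t>& regs) {
        for (const Instr& i : code) {
            switch (i.op) {
            case Op::ATOMIC_CAS: {  // if cells[a] == regs[b], set cells[a] = regs[c]; regs[b] <- success flag
                int64_t expected = regs[i.b];
                regs[i.b] = cells[i.a].compare_exchange_strong(expected, regs[i.c]) ? 1 : 0;
                break;
            }
            case Op::SEND:    mailboxes[i.a].send(regs[i.b]); break;       // post regs[b] to mailbox a
            case Op::RECEIVE: regs[i.b] = mailboxes[i.a].receive(); break; // blocking receive into regs[b]
            case Op::HALT:    return;
            }
        }
    }
};
```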
Efficient Nonblocking Software Transactional Memory
Foundational transactional memory research grew out of research …
Transactional memory is a powerful programming abstraction that enables a programmer to turn a complex, composite collection of statements into an atomic operation. Previous work usually expresses this abstraction as an atomic block, which offers mutual …
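For readers unfamiliar with the abstraction, the following hedged sketch shows the shape of such an atomic block. The stm::atomic helper here is a hypothetical stand-in that merely serializes transactions with one global lock to convey the all-or-nothing semantics; it is not the paper's nonblocking STM, which runs transactions speculatively and commits or retries them.

```cpp
// Hedged sketch of the atomic-block abstraction. stm::atomic below is a
// hypothetical stand-in that serializes transactions with one global lock to
// convey the all-or-nothing semantics; it is not the paper's nonblocking STM.
#include <mutex>

namespace stm {
    inline std::mutex global_tx_lock;                  // stand-in for real STM machinery
    template <class F> void atomic(F&& body) {
        std::lock_guard<std::mutex> g(global_tx_lock);
        body();                                        // composite statements appear atomic
    }
}

struct Account { long balance = 0; };

// The read-modify-write on two accounts executes as one atomic operation.
void transfer(Account& from, Account& to, long amount) {
    stm::atomic([&] {
        from.balance -= amount;
        to.balance   += amount;
    });
}
```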
Toward High Performance Nonblocking Software Transactional Memory
Substantial advances in STM performance in recent years have mostly focused on blocking systems. We describe our work integrating the most important techniques and optimizations emerging from the recent work on blocking STMs into several variants of a nonblocking STM. In particular, our design is based on the philosophy of keeping the common, contention-free execution path as simple (and consequently as fast) as possible, while resorting to the more expensive data displacement and metadata management only in situations where transactions have problems making forward progress. We employ novel ownership “stealing” and metadata management techniques in our nonblocking STM to enable several recent blocking STM optimizations, such as timestamp-based validation and ownership release via store instructions, all leading to a more streamlined and efficient fast path. We present an undo log (eager versioning) variant of our STM, as well as two redo log (lazy versioning) variants, the latter based on the two ownership acquisition techniques (namely eager and lazy) for writes made by transactions. Experimental results show that our efforts have improved the performance of nonblocking STMs to the point of being competitive with state-of-the-art blocking STMs such as TL2.
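The ownership machinery the abstract refers to can be sketched as follows. This is a simplified illustration of eager (encounter-time, undo-log) versus lazy (commit-time, redo-log) ownership acquisition; TxDesc, OwnershipRec, and the helper functions are assumed names, and the paper's stealing, validation, and nonblocking-progress details are omitted.

```cpp
// Simplified sketch (not the paper's algorithm) of per-location ownership records
// and the two acquisition styles named above: eager (encounter-time, undo log)
// and lazy (commit-time, redo log). TxDesc, OwnershipRec, and the helpers are
// assumed names; stealing, validation, and nonblocking progress are omitted.
#include <atomic>
#include <cstdint>
#include <utility>
#include <vector>

enum class TxStatus : uint32_t { ACTIVE, COMMITTED, ABORTED };

struct TxDesc { std::atomic<TxStatus> status{TxStatus::ACTIVE}; };

struct OwnershipRec { std::atomic<TxDesc*> owner{nullptr}; };

// Eager (encounter-time) acquisition: claim the ownership record at the first
// write, log the old value for undo, and update the location in place.
bool eager_write(TxDesc* tx, OwnershipRec& orec,
                 std::atomic<uint64_t>& word, uint64_t new_val,
                 std::vector<std::pair<std::atomic<uint64_t>*, uint64_t>>& undo_log) {
    TxDesc* expected = nullptr;
    if (!orec.owner.compare_exchange_strong(expected, tx) && expected != tx)
        return false;                                   // owned by another transaction: abort, help, or steal
    undo_log.emplace_back(&word, word.load());          // eager versioning: remember old value
    word.store(new_val);                                // write in place
    return true;
}

// Lazy (commit-time) acquisition: buffer writes in a redo log, claim the
// ownership records only while committing, then write back the buffered values.
struct RedoEntry { OwnershipRec* orec; std::atomic<uint64_t>* word; uint64_t new_val; };

bool lazy_commit(TxDesc* tx, std::vector<RedoEntry>& redo_log) {
    for (auto& e : redo_log) {
        TxDesc* expected = nullptr;
        if (!e.orec->owner.compare_exchange_strong(expected, tx) && expected != tx)
            return false;                               // conflict: caller aborts or retries
    }
    tx->status.store(TxStatus::COMMITTED);
    for (auto& e : redo_log) e.word->store(e.new_val);  // lazy versioning: write back at commit
    return true;
}
```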