Search CORE

14 research outputs found

Brief Announcement: Jiffy: A Fast, Memory Efficient, Wait-Free Multi-Producers Single-Consumer Queue

Author: Adas Dolev
Friedman Roy
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th International Symposium on Distributed Computing (DISC 2020)
Publication date: 01/01/2020
Field of study

In applications such as sharded data processing systems, data flow programming and load sharing applications, multiple concurrent data producers are feeding requests into the same data consumer. This can be naturally realized through concurrent queues, where each consumer pulls its tasks from its dedicated queue. For scalability, wait-free queues are often preferred over lock based structures. The vast majority of wait-free queue implementations, and even lock-free ones, support the multi-producer multi-consumer model. Yet, this comes at a premium, since implementing wait-free multi-producer multi-consumer queues requires utilizing complex helper data structures. The latter increases the memory consumption of such queues and limits their performance and scalability. Additionally, many such designs employ (hardware) cache unfriendly memory access patterns. In this work we study the implementation of wait-free multi-producer single-consumer queues. Specifically, we propose Jiffy, an efficient memory frugal novel wait-free multi-producer single-consumer queue and formally prove its correctness. We then compare the performance and memory requirements of Jiffy with other state of the art lock-free and wait-free queues. We show that indeed Jiffy can maintain good performance with up to 128 threads, delivers better throughput than other constructions we compared against, and consumes less memory

Dagstuhl Research Online Publication Server

Mutual Exclusion Algorithms with Constant RMR Complexity and Wait-Free Exit Code

Author: Dvir Rotem
Taubenfeld Gadi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Conference on Principles of Distributed Systems (OPODIS 2017)
Publication date: 01/01/2018
Field of study

Dagstuhl Research Online Publication Server

Recoverable, Abortable, and Adaptive Mutual Exclusion with Sublogarithmic RMR Complexity

Author: Katzan Daniel
Morrison Adam
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Principles of Distributed Systems (OPODIS 2020)
Publication date: 17/11/2020
Field of study

We present the first recoverable mutual exclusion (RME) algorithm that is simultaneously abortable, adaptive to point contention, and with sublogarithmic RMR complexity. Our algorithm has O(min(K,log_W N)) RMR passage complexity and O(F + min(K,log_W N)) RMR super-passage complexity, where K is the number of concurrent processes (point contention), W is the size (in bits) of registers, and F is the number of crashes in a super-passage. Under the standard assumption that W = ?(log N), these bounds translate to worst-case O((log N)/(log log N)) passage complexity and O(F + (log N)/(log log N)) super-passage complexity. Our key building blocks are: - A D-process abortable RME algorithm, for D ? W, with O(1) passage complexity and O(1+F) super-passage complexity. We obtain this algorithm by using the Fetch-And-Add (FAA) primitive, unlike prior work on RME that uses Fetch-And-Store (FAS/SWAP). - A generic transformation that transforms any abortable RME algorithm with passage complexity of B < W, into an abortable RME lock with passage complexity of O(min(K,B))

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

An Almost Tight RMR Lower Bound for Abortable Test-And-Set

Author: Eghbali Aryaz
Woelfel Philipp
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 32nd International Symposium on Distributed Computing (DISC 2018)
Publication date: 01/01/2018
Field of study

We prove a lower bound of Omega(log n/log log n) for the remote memory reference (RMR) complexity of abortable test-and-set (leader election) in the cache-coherent (CC) and the distributed shared memory (DSM) model. This separates the complexities of abortable and non-abortable test-and-set, as the latter has constant RMR complexity [Wojciech Golab et al., 2010]. Golab, Hendler, Hadzilacos and Woelfel [Wojciech M. Golab et al., 2012] showed that compare-and-swap can be implemented from registers and test-and-set objects with constant RMR complexity. We observe that a small modification to that implementation is abortable, provided that the used test-and-set objects are atomic (or abortable). As a consequence, using existing efficient randomized wait-free implementations of test-and-set [George Giakkoupis and Philipp Woelfel, 2012], we obtain randomized abortable compare-and-swap objects with almost constant (O(log^* n)) RMR complexity

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Abortable Reader-Writer Locks are No More Complex Than Abortable Mutex Locks

Author: Liu Zhiyu
Publication venue: Dartmouth Digital Commons
Publication date: 01/06/2012
Field of study

When a process attempts to acquire a mutex lock, it may be forced to wait if another process currently holds the lock. In certain applications, such as real-time operating systems and databases, indefinite waiting can cause a process to miss an important deadline. Hence, there has been research on designing abortable mutual exclusion locks, and fairly efficient algorithms of O(log n) RMR complexity have been discovered (n denotes the number of processes for which the algorithm is designed). The abort feature is just as important for a reader-writer lock as it is for a mutual exclusion lock, but to the best of our knowledge there are currently no abortable reader-writer locks that are starvation-free. We show the surprising result that any abortable, starvation-free mutual exclusion algorithm of RMR complexity t(n) can be transformed into an abortable, starvation-free reader-writer exclusion algorithm of RMR complexity O(t(n)). Thus, we obtain the first abortable, starvation-free reader-writer exclusion algorithm of O(log n) RMR complexity. Our results apply to the Cache-Coherent (CC) model of multiprocessors

Dartmouth Digital Commons (Dartmouth College)

Lock cohorting: A general technique for designing NUMA locks

Author: Dice David
Marathe Virendra J.
Shavit Nir N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machines' non-uniform memory and caching hierarchy, ever more important. This paper presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful. Lock cohorting allows one to transform any spin-lock algorithm, with minimal non-intrusive changes, into scalable NUMA-aware spin-locks. Our new cohorting technique allows us to easily create NUMA-aware versions of the TATAS-Backoff, CLH, MCS, and ticket locks, to name a few. Moreover, it allows us to derive a CLH-based cohort abortable lock, the first NUMA-aware queue lock to support abortability. We empirically compared the performance of cohort locks with prior NUMA-aware and classic NUMA-oblivious locks on a synthetic micro-benchmark, a real world key-value store application memcached, as well as the libc memory allocator. Our results demonstrate that cohort locks perform as well or better than known locks when the load is low and significantly out-perform them as the load increases

CiteSeerX

DSpace@MIT

Crossref

Aether: A scalable approach to logging

Author: Ailamaki Anastasia
Athanassoulis Manos
Johnson Ryan
Pandis Ippokratis
Stoica Radu
Publication venue: 'VLDB Endowment'
Publication date: 22/06/2010
Field of study

The shift to multi-core hardware brings new challenges to database systems, as the software parallelism determines performance. Even though database systems traditionally accommodate simultaneous requests, a multitude of synchronization barriers serialize execution. Write-ahead logging is a fundamental, omnipresent component in ARIES-style concurrency and recovery, and one of the most important yet-to-be addressed potential bottlenecks, especially in OLTP workloads with small and frequent changes to data. In this paper, we identify four logging-related impediments to database system scalability. Each issue challenges different level in the software architecture: (a) the high volume of small-sized I/O requests may saturate the disk, (b) transactions hold locks while waiting for the log flush, (c) extensive context switching overwhelms the OS scheduler with threads executing log I/Os, and (d) contention appears as transactions serialize accesses to in-memory log data structures. We demonstrate these problems and address them with techniques that, when combined, comprise a holistic, scalable approach to logging. Our solution achieves a 20%-69% speedup over a modern database system when running log-intensive workloads, such as the TPC-B and TATP benchmarks. Moreover, it achieves a log insert throughput of small-sized log records of over 1.8GB/s on a modern single socket server, an order of magnitude higher than the traditional way of accessing the log using a single mutex

Infoscience - École polytechnique fédérale de Lausanne

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Author: Fernandez Ivan
Giannoula Christina
Goumas Georgios
Gómez-Luna Juan
Karakostas Vasileios
Koziris Nectarios
Mutlu Onur
Orosa Lois
Papadopoulou Nikela
Vijaykumar Nandita
Publication venue
Publication date: 13/02/2021
Field of study

Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units, each including multiple simple cores placed close to memory. To fully leverage the benefits of NDP and achieve high performance for parallel workloads, efficient synchronization among the NDP cores of a system is necessary. However, supporting synchronization in many NDP systems is challenging because they lack shared caches and hardware cache coherence support, which are commonly used for synchronization in multicore systems, and communication across different NDP units can be expensive. This paper comprehensively examines the synchronization problem in NDP systems, and proposes SynCron, an end-to-end synchronization solution for NDP systems. SynCron adds low-cost hardware support near memory for synchronization acceleration, and avoids the need for hardware cache coherence support. SynCron has three components: 1) a specialized cache memory structure to avoid memory accesses for synchronization and minimize latency overheads, 2) a hierarchical message-passing communication protocol to minimize expensive communication across NDP units of the system, and 3) a hardware-only overflow management scheme to avoid performance degradation when hardware resources for synchronization tracking are exceeded. We evaluate SynCron using a variety of parallel workloads, covering various contention scenarios. SynCron improves performance by 1.27