12 research outputs found

    Implementing and Breaking Load-Link / Store-Conditional on an ARM-Based System

    Manufacturers of modern electronic devices constantly attempt to add features to increasingly complex and performance-demanding systems. This race has historically been driven by improvements in processor clock speed, but as power consumption and real-estate concerns in the embedded space pose a growing challenge, multithreading approaches have become more prevalent and relied upon. Synchronization is essential to multithreaded systems: it ensures that threads do not interfere with each other's operations and that they produce reliable, consistent outputs while maximizing performance and efficiency. One of the primary mechanisms guaranteeing synchronization in RISC architectures is the load-link/store-conditional (LL/SC) routine, which implements an atomic operation that allows a thread to obtain a lock. In this study, we implement, test, and manipulate an LL/SC routine in a multithreading environment using GDB. After examining the routine's mechanics, we propose a concise implementation in ARMv7l and demonstrate the importance of register integrity and the vulnerabilities that occur when integrity is violated under a limited threat model. This work sheds light on LL/SC operations and related lock routines used for multithreading.
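    As a rough illustration of the kind of lock routine this abstract describes, the sketch below acquires a lock with a single atomic compare-and-swap in portable C++. It is not the paper's ARMv7l routine: the names `SpinLock` and `run_locked_counter` are hypothetical, and the assumption is only that a compiler targeting an LL/SC architecture typically lowers the compare-and-swap to a load-exclusive/store-exclusive retry loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration, not the paper's code: a spin lock whose acquire
// step is one atomic compare-and-swap on a lock word (0 = free, 1 = held).
class SpinLock {
public:
    void lock() {
        int expected = 0;
        // Retry until this thread is the one that changes the word from 0 to 1.
        while (!word_.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
            expected = 0;  // on failure, expected was overwritten with the current value
            std::this_thread::yield();
        }
    }

    void unlock() { word_.store(0, std::memory_order_release); }

private:
    std::atomic<int> word_{0};
};

// Starts `threads` workers that each increment a plain (non-atomic) counter
// `iters` times while holding the lock, then returns the final count.
long run_locked_counter(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

    If the lock is correct, no increment is lost, so `run_locked_counter(4, 10000)` should return 40000.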

    A Template for Implementing Fast Lock-free Trees Using HTM

    Algorithms that use hardware transactional memory (HTM) must provide a software-only fallback path to guarantee progress. The design of the fallback path can have a profound impact on performance. If the fallback path is allowed to run concurrently with hardware transactions, then hardware transactions must be instrumented, adding significant overhead. Otherwise, hardware transactions must wait for any processes on the fallback path, causing concurrency bottlenecks, or move to the fallback path. We introduce an approach that combines the best of both worlds. The key idea is to use three execution paths: an HTM fast path, an HTM middle path, and a software fallback path, such that the middle path can run concurrently with each of the other two. The fast path and fallback path do not run concurrently, so the fast path incurs no instrumentation overhead. Furthermore, fast path transactions can move to the middle path instead of waiting or moving to the software path. We demonstrate our approach by producing an accelerated version of the tree update template of Brown et al., which can be used to implement fast lock-free data structures based on down-trees. We used the accelerated template to implement two lock-free trees: a binary search tree (BST), and an (a,b)-tree (a generalization of a B-tree). Experiments show that, with 72 concurrent processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many operations per second as an implementation obtained using the original tree update template.

    Concurrent Data Structures Using Multiword Compare and Swap

    To maximize the performance of concurrent data structures, researchers have turned to highly complex fine-grained techniques. The resulting algorithms are often extremely difficult to understand and prove correct, so that even highly cited works can contain correctness bugs that go undetected for long periods of time. This complexity is perceived as a necessary sacrifice: simpler, more general techniques cannot attain competitive performance with these fine-grained implementations. To challenge this perception, this work presents three data structures created using multi-word compare-and-swap (KCAS), version numbering, and double-collect searches that showcase the power of using a more coarse-grained approach. First, a novel lock-free binary search tree (BST) is presented that is both fully internal and balanced, and that achieves competitive performance with state-of-the-art fine-grained concurrent BSTs while being significantly simpler. Next, the first concurrent implementation of an Euler-tour data structure is outlined, solving fully dynamic graph connectivity. Finally, a KCAS variant of an (a,b)-tree implementation is presented, which shows significant performance improvements in certain workloads when compared to the original.

    Extremely fast (a,b)-trees at all contention levels

    Many concurrent dictionary implementations are designed and evaluated with only low-contention workloads in mind. This thesis presents several concurrent linearizable (a,b)-tree implementations with the overarching goal of performing well on both low- and high-contention workloads, and especially update-heavy workloads. The OCC-ABtree uses optimistic concurrency control to achieve state-of-the-art low-contention performance. However, under high contention, cache coherence traffic begins to affect its performance. This is addressed by replacing its test-and-compare-and-swap locks with MCS queue locks. The resulting MCS-ABtree scales well under both low- and high-contention workloads. This thesis also introduces two coalescing-based trees, the CoMCS-ABtree and the CoPub-ABtree, that achieve substantially better performance under high contention by reordering and coalescing concurrent inserts and deletes. Comparing these algorithms against the state of the art in concurrent search trees, we find that the fastest algorithm, the CoPub-ABtree, outperforms the next fastest competitor by up to 2x. This thesis then describes persistent versions of the four trees, whose implementations use fewer sfence instructions than a leading competitor (the FPTree). The persistent trees are proved to be strictly linearizable. Experimentally, the persistent trees are only slightly slower than their volatile counterparts, suggesting that they are well suited to in-memory databases that need to be able to recover after a crash.
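    The MCS queue lock mentioned in this abstract can be sketched as follows. This is a generic textbook-style version, not the thesis's implementation: `MCSNode`, `MCSLock`, and `run_mcs_counter` are hypothetical names, and default sequentially consistent atomics are used for simplicity. The point it illustrates is that each waiting thread spins on a flag in its own queue node rather than on a shared lock word, which is what reduces cache coherence traffic under contention.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration of an MCS queue lock. Each thread brings its own
// node and, while waiting, spins only on that node's `locked` flag.
struct MCSNode {
    std::atomic<MCSNode*> next{nullptr};
    std::atomic<bool> locked{false};
};

class MCSLock {
public:
    void lock(MCSNode& me) {
        me.next.store(nullptr);
        me.locked.store(true);
        // Append this node to the queue; the previous tail is our predecessor.
        MCSNode* prev = tail_.exchange(&me);
        if (prev != nullptr) {
            prev->next.store(&me);  // link in behind the predecessor
            while (me.locked.load()) {
                std::this_thread::yield();  // local spin until handed the lock
            }
        }
    }

    void unlock(MCSNode& me) {
        MCSNode* succ = me.next.load();
        if (succ == nullptr) {
            // No visible successor: try to reset the queue to empty.
            MCSNode* expected = &me;
            if (tail_.compare_exchange_strong(expected, nullptr)) return;
            // A successor is mid-enqueue; wait for it to finish linking in.
            while ((succ = me.next.load()) == nullptr) {
                std::this_thread::yield();
            }
        }
        succ->locked.store(false);  // hand the lock to the successor
    }

private:
    std::atomic<MCSNode*> tail_{nullptr};
};

// Starts `threads` workers that each increment a plain counter `iters` times
// under the MCS lock, then returns the final count.
long run_mcs_counter(int threads, int iters) {
    MCSLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            MCSNode node;  // one queue node per thread
            for (int i = 0; i < iters; ++i) {
                lock.lock(node);
                ++counter;
                lock.unlock(node);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

    As with any mutual-exclusion lock, a correct implementation loses no increments, so `run_mcs_counter(4, 10000)` should return 40000.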

    Techniques for Constructing Efficient Lock-free Data Structures

    Building a library of concurrent data structures is an essential way to simplify the difficult task of developing concurrent software. Lock-free data structures, in which processes can help one another to complete operations, offer the following progress guarantee: If processes take infinitely many steps, then infinitely many operations are performed. Handcrafted lock-free data structures can be very efficient, but are notoriously difficult to implement. We introduce numerous tools that support the development of efficient lock-free data structures, and especially trees. Comment: PhD thesis, Univ Toronto (2017)