1,438 research outputs found

    Scalable Synchronization with Mindicators

    The Mindicator is a shared object that stores one value for each thread in a system, and can return the minimum of all threads' values in constant time. In this paper, we explore applications of the Mindicator in synchronization algorithms. We introduce three new algorithms, designed for scalable Read-Copy-Update (RCU), fair Readers-Writer locking, and Group Mutual Exclusion. Experimental evaluation shows these algorithms to perform well while avoiding contention.
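
    The interface described in this abstract can be illustrated with a minimal sketch. It assumes a single mutex in place of the paper's scalable design, so it shows only the semantics (one slot per thread, constant-time minimum), not the contention behavior. The class name, the method names `arrive`, `depart`, and `query`, and the use of infinity as the "no value" marker are assumptions made for illustration.

```python
import threading


class MindicatorSketch:
    """Lock-based stand-in for a Mindicator (illustrative, not the paper's algorithm).

    Values live in the leaves of a complete binary tree stored in a list.
    Each internal node caches the minimum of its two children, so the
    root always holds the global minimum.
    """

    EMPTY = float("inf")  # assumed marker for "this thread holds no value"

    def __init__(self, num_threads):
        # Round the leaf count up to a power of two.
        self._leaves = 1
        while self._leaves < num_threads:
            self._leaves *= 2
        self._tree = [self.EMPTY] * (2 * self._leaves)
        self._lock = threading.Lock()

    def _set(self, slot, value):
        with self._lock:
            i = self._leaves + slot
            self._tree[i] = value
            i //= 2
            # Recompute cached minima on the path from the leaf to the root.
            while i >= 1:
                self._tree[i] = min(self._tree[2 * i], self._tree[2 * i + 1])
                i //= 2

    def arrive(self, slot, value):
        """Publish `value` for the thread that owns `slot`."""
        self._set(slot, value)

    def depart(self, slot):
        """Withdraw the value of the thread that owns `slot`."""
        self._set(slot, self.EMPTY)

    def query(self):
        """Return the minimum published value by reading the root only."""
        return self._tree[1]
```

    Updates cost O(log n) in this sketch, while `query` touches a single cell, which is the property the abstract highlights.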

    Scalable Range Locks for Scalable Address Spaces and Beyond

    Range locks are a synchronization construct designed to provide concurrent access to multiple threads (or processes) to disjoint parts of a shared resource. Originally conceived in the file system context, range locks are gaining increasing interest in the Linux kernel community seeking to alleviate bottlenecks in the virtual memory management subsystem. The existing implementation of range locks in the kernel, however, uses an internal spin lock to protect the underlying tree structure that keeps track of acquired and requested ranges. This spin lock becomes a point of contention on its own when the range lock is frequently acquired. Furthermore, where and exactly how specific (refined) ranges can be locked remains an open question. In this paper, we make two independent, but related contributions. First, we propose an alternative approach for building range locks based on linked lists. The lists are easy to maintain in a lock-less fashion, and in fact, our range locks do not use any internal locks in the common case. Second, we show how the range of the lock can be refined in the mprotect operation through a speculative mechanism. This refinement, in turn, allows concurrent execution of mprotect operations on non-overlapping memory regions. We implement our new algorithms and demonstrate their effectiveness in user-space and kernel-space, achieving up to 9× speedup compared to the stock version of the Linux kernel. Beyond the virtual memory management subsystem, we discuss other applications of range locks in parallel software. As a concrete example, we show how range locks can be used to facilitate the design of scalable concurrent data structures, such as skip lists.
    Comment: 17 pages, 9 figures, Eurosys 202
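
    The range-lock semantics described in this abstract can be sketched as follows. This is a deliberately simple version that guards a list of held ranges with one internal condition variable, which is exactly the kind of internal lock the paper sets out to avoid; it shows only what "disjoint ranges proceed concurrently" means. The class name, the half-open `[start, end)` convention, and the `timeout` parameter are assumptions made for illustration.

```python
import threading


class RangeLockSketch:
    """Mutex-based range lock (illustrative, not the paper's lock-less list)."""

    def __init__(self):
        self._held = []                      # currently held (start, end) ranges
        self._cond = threading.Condition()   # internal lock plus wait queue

    def _overlaps(self, start, end):
        # Two half-open ranges overlap when each starts before the other ends.
        return any(s < end and start < e for s, e in self._held)

    def acquire(self, start, end, timeout=None):
        """Block until [start, end) is free; return False if `timeout` expires."""
        with self._cond:
            free = self._cond.wait_for(
                lambda: not self._overlaps(start, end), timeout
            )
            if free:
                self._held.append((start, end))
            return free

    def release(self, start, end):
        """Release a range previously acquired with the same bounds."""
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()  # waiters re-check for overlap
```

    In this sketch, `acquire(0, 10)` and `acquire(10, 20)` can both be held at once, while `acquire(5, 15)` has to wait until both are released.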

    Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided

    Constant RMR Solutions to Reader Writer Synchronization

    We study Reader-Writer Exclusion, a well-known variant of the Mutual Exclusion problem where processes are divided into two classes--readers and writers--and multiple readers can be in the Critical Section (CS) at the same time, although no process may be in the CS at the same time as a writer. Since readers don't conflict with each other, they should not obstruct each other. Specifically, the concurrent entering property must be satisfied: if all writers are in the remainder section, each reader should be able to enter the CS in a bounded number of its own steps. Three versions of the Reader-Writer Exclusion problem are commonly studied--one where writers have priority over readers, another where readers have priority, and the last where neither class has priority over the other and no process may starve. To ensure high performance on Cache-Coherent (CC) and Distributed Shared Memory (DSM) multiprocessors, algorithms should be designed to generate as few remote memory references (RMRs) as possible. The ideal would be to achieve constant RMR complexity, i.e., the worst case number of RMRs that a process generates in order to enter and exit the CS once is a constant, independent of the number of processes. Constant RMR complexity algorithms have existed for Mutual Exclusion for two decades, but none exists for Reader-Writer Exclusion. Danek and Hadzilacos' lower bound proof implies that it is impossible to achieve sublinear RMR complexity for DSM machines. For CC machines, the best existing bound, also due to Danek and Hadzilacos, is O(log n), where n is the number of processes. In this work, we present the first constant RMR complexity algorithms for all three versions of the Reader-Writer Exclusion problem (for CC machines).
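
    The exclusion rules of the writer-priority variant in this abstract can be illustrated with a textbook condition-variable lock. This is not the constant-RMR algorithm of the paper and makes no claim about RMR complexity; it only shows which combinations of readers and writers may share the critical section. All names, and the `timeout` parameter used to make the behavior observable, are assumptions made for illustration.

```python
import threading


class WriterPriorityRWLock:
    """Textbook writer-priority reader-writer lock (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0           # readers currently in the critical section
        self._writer = False        # True while a writer is in the critical section
        self._waiting_writers = 0   # writers that have announced intent to enter

    def acquire_read(self, timeout=None):
        with self._cond:
            # Readers yield to any active or waiting writer (writer priority).
            ok = self._cond.wait_for(
                lambda: not self._writer and self._waiting_writers == 0, timeout
            )
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        with self._cond:
            self._waiting_writers += 1
            try:
                # A writer needs the critical section entirely to itself.
                ok = self._cond.wait_for(
                    lambda: not self._writer and self._readers == 0, timeout
                )
            finally:
                self._waiting_writers -= 1
            if ok:
                self._writer = True
            else:
                # A writer that gave up may have been blocking readers.
                self._cond.notify_all()
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

    With no writer active or waiting, `acquire_read` never waits on other readers, which mirrors the concurrent entering property stated in the abstract.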

    Abortable Reader-Writer Locks are No More Complex Than Abortable Mutex Locks

    When a process attempts to acquire a mutex lock, it may be forced to wait if another process currently holds the lock. In certain applications, such as real-time operating systems and databases, indefinite waiting can cause a process to miss an important deadline. Hence, there has been research on designing abortable mutual exclusion locks, and fairly efficient algorithms of O(log n) RMR complexity have been discovered (n denotes the number of processes for which the algorithm is designed). The abort feature is just as important for a reader-writer lock as it is for a mutual exclusion lock, but to the best of our knowledge there are currently no abortable reader-writer locks that are starvation-free. We show the surprising result that any abortable, starvation-free mutual exclusion algorithm of RMR complexity t(n) can be transformed into an abortable, starvation-free reader-writer exclusion algorithm of RMR complexity O(t(n)). Thus, we obtain the first abortable, starvation-free reader-writer exclusion algorithm of O(log n) RMR complexity. Our results apply to the Cache-Coherent (CC) model of multiprocessors.

    Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling

    As multicore processors become increasingly prevalent, system complexity is skyrocketing. The advent of the asymmetric multicore compounds this -- it is no longer practical for an average programmer to balance the system constraints associated with today's multicores and worry about new problems like asymmetric partitioning and thread interference. Adaptive, or self-aware, computing has been proposed as one method to help application and system programmers confront this complexity. These systems take some of the burden off programmers by monitoring themselves and optimizing or adapting to meet their goals. This paper introduces an open-source self-aware synchronization library for multicores and asymmetric multicores called Smartlocks. Smartlocks is a spin-lock library that adapts its internal implementation during execution using heuristics and machine learning to optimize toward a user-defined goal, which may relate to performance, power, or other problem-specific criteria. Smartlocks builds upon adaptation techniques from prior work like reactive locks, but introduces a novel form of adaptation designed for asymmetric multicores that we term lock acquisition scheduling. When multiple threads (or processes) are spinning for a lock, lock acquisition scheduling chooses which waiter gets the lock next so as to achieve the best long-term effect. Our results demonstrate empirically that lock scheduling is important for asymmetric multicores and that Smartlocks significantly outperform conventional and reactive locks for asymmetries like dynamic variations in processor clock frequencies caused by thermal throttling events.
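
    The idea of lock acquisition scheduling in this abstract, choosing which waiter receives the lock next, can be modeled with a small single-threaded sketch. It is not the Smartlocks implementation: the fixed numeric priority stands in for whatever the library's heuristics or learning would produce, and the class and method names are hypothetical. The model tracks only the hand-off policy, not actual spinning.

```python
import heapq
import itertools


class LockSchedulerSketch:
    """Single-threaded model of a priority-based lock hand-off policy."""

    def __init__(self):
        self._waiters = []             # heap of (-priority, arrival_order, waiter)
        self._order = itertools.count()
        self.owner = None              # current lock holder, or None if free

    def request(self, waiter, priority):
        """Take the lock if it is free, else queue. Returns True if acquired."""
        if self.owner is None:
            self.owner = waiter
            return True
        # Negate the priority so the largest value is popped first;
        # the arrival counter breaks ties in FIFO order.
        heapq.heappush(self._waiters, (-priority, next(self._order), waiter))
        return False

    def release(self):
        """Hand the lock to the highest-priority waiter and return the new owner."""
        if self._waiters:
            self.owner = heapq.heappop(self._waiters)[2]
        else:
            self.owner = None
        return self.owner
```

    On an asymmetric machine, one hypothetical policy would set each waiter's priority from its core's current clock frequency, so that faster cores are served first after a release.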