150 research outputs found

    RCU Semantics: A First Attempt

    Get PDF
    There is not yet a formal statement of RCU (read-copy update) semantics. While this lack has thus far not been an impediment to adoption and use of RCU, it is quite possible that formal semantics would point the way towards tools that automatically validate uses of RCU or that permit RCU algorithms to be automatically generated by a parallel compiler. This paper is a first attempt to supply a formal definition of RCU. Or at least a semi-formal definition: although RCU does not yet wear a tux (though it does run in Linux), at least it might yet wear some clothes

    Resizable, Scalable, Concurrent Hash Tables

    Get PDF
    We present algorithms for shrinking and expanding a hash table while allowing concurrent, wait-free, linearly scalable lookups. These resize algorithms allow the hash table to maintain constant-time performance as the number of entries grows, and reclaim memory as the number of entries decreases, without delaying or disrupting readers. We implemented our algorithms in the Linux kernel, to test their performance and scalability. Benchmarks show lookup scalability improved 125x over readerwriter locking, and 56% over the current state-of-the-art for Linux, with no performance degradation for lookups during a resize. To achieve this performance, this hash table implementation uses a new concurrent programming methodology known as relativistic programming. In particular, we make use of an existing synchronization primitive which waits for all current readers to finish, with little to no reader overhead; careful use of this primitive allows ordering of updates without read-side synchronization or memory barriers

    When Do Real Time Systems Need Multiple CPUs?

    Get PDF
    Abstract Until recently, real-time systems were always single-CPU systems. The prospect of multiprocessing has arrived with the advent of low-cost and readily available multi-core systems. Now many RTOSes, perhaps most notably Linux TM , provide real-time response on multiprocessor systems. However, this begs the question as to whether your real-time application should avail itself of parallelism. Furthermore, if the answer is "yes," the next question is what form of parallelism your application should avail itself of: shared memory parallelism with locking and threads, process pipelines, multiple cooperating processes, or one of a number of other approaches. This paper will examine these questions, providing rules of thumb to help you choose whether your real-time application should be parallel, and, if so, what sort of parallelism is best for you

    Universal Wait-Free Memory Reclamation

    Full text link
    In this paper, we present a universal memory reclamation scheme, Wait-Free Eras (WFE), for deleted memory blocks in wait-free concurrent data structures. WFE's key innovation is that it is completely wait-free. Although some prior techniques provide similar guarantees for certain data structures, they lack support for arbitrary wait-free data structures. Consequently, developers are typically forced to marry their wait-free data structures with lock-free Hazard Pointers or (potentially blocking) epoch-based memory reclamation. Since both these schemes provide weaker progress guarantees, they essentially forfeit the strong progress guarantee of wait-free data structures. Though making the original Hazard Pointers scheme or epoch-based reclamation completely wait-free seems infeasible, we achieved this goal with a more recent, (lock-free) Hazard Eras scheme, which we extend to guarantee wait-freedom. As this extension is non-trivial, we discuss all challenges pertaining to the construction of universal wait-free memory reclamation. WFE is implementable on ubiquitous x86_64 and AArch64 (ARM) architectures. Its API is mostly compatible with Hazard Pointers, which allows easy transitioning of existing data structures into WFE. Our experimental evaluations show that WFE's performance is close to epoch-based reclamation and almost matches the original Hazard Eras scheme, while providing the stronger wait-free progress guarantee

    Concurrent Search Data Structures Can Be Blocking and Practically Wait-Free

    Get PDF
    We argue that there is virtually no practical situation in which one should seek a "theoretically wait-free" algorithm at the expense of a state-of-the-art blocking algorithm in the case of search data structures: blocking algorithms are simple, fast, and can be made "practically wait-free". We draw this conclusion based on the most exhaustive study of blocking search data structures to date. We consider (a) different search data structures of different sizes, (b) numerous uniform and non-uniform workloads, representative of a wide range of practical scenarios, with different percentages of update operations, (c) with and without delayed threads, (d) on different hardware technologies, including processors providing HTM instructions. We explain our claim that blocking search data structures are practically wait-free through an analogy with the birthday paradox, revealing that, in state-of-the-art algorithms implementing such data structures, the probability of conflicts is extremely small. When conflicts occur as a result of context switches and interrupts, we show that HTM-based locks enable blocking algorithms to cope with the

    Regular Topologies for Gigabit Wide-Area Networks: Congestion Avoidance Testbed Experiments

    Get PDF
    This document is Volume 3 of the final technical report on the work performed by SRI International (SRI) on SRI Project 8600. The document includes source listings for all software developed by SRI under this effort. Since some of our work involved the use of ST-II and the Sun Microsystems, Inc. (Sun) High-Speed Serial Interface (HSI/S) driver, we have included some of the source developed by LBL and BBN as well. In most cases, our decision to include source developed by other contractors depended on whether it was necessary to modify the original code. If we have modified the software in any way, it is included in this document. In the case of the Traffic Generator (TG), however, we have included all the ST-II software, even though BBN performed the integration, because the ST-II software is part of the standard TG release. It is important to note that all the code developed by other contractors is in the public domain, so that all software developed under this effort can be re-created from the source included here

    Congestion Avoidance Testbed Experiments

    Get PDF
    DARTnet provides an excellent environment for executing networking experiments. Since the network is private and spans the continental United States, it gives researchers a great opportunity to test network behavior under controlled conditions. However, this opportunity is not available very often, and therefore a support environment for such testing is lacking. To help remedy this situation, part of SRI's effort in this project was devoted to advancing the state of the art in the techniques used for benchmarking network performance. The second objective of SRI's effort in this project was to advance networking technology in the area of traffic control, and to test our ideas on DARTnet, using the tools we developed to improve benchmarking networks. Networks are becoming more common and are being used by more and more people. The applications, such as multimedia conferencing and distributed simulations, are also placing greater demand on the resources the networks provide. Hence, new mechanisms for traffic control must be created to enable their networks to serve the needs of their users. SRI's objective, therefore, was to investigate a new queueing and scheduling approach that will help to meet the needs of a large, diverse user population in a "fair" way

    Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures

    Get PDF
    We introduce "asynchronized concurrency (ASCY),'' a paradigm consisting of four complementary programming patterns. ASCY calls for the design of concurrent search data structures (CSDSs) to resemble that of their sequential counterparts. We argue that ASCY leads to implementations which are portably scalable: they scale across different types of hardware platforms, including single and multi-socket ones, for various classes of workloads, such as read-only and read-write, and according to different performance metrics, including throughput, latency, and energy. We substantiate our thesis through the most exhaustive evaluation of CSDSs to date, involving 6 platforms, 22 state-of-the-art CSDS algorithms, 10 re-engineered state-of-the-art CSDS algorithms following the ASCY patterns, and 2 new CSDS algorithms designed with ASCY in mind. We observe up to 30% improvements in throughput in the re-engineered algorithms, while our new algorithms out-perform the state-of-the-art alternatives
    corecore