RCU Semantics: A First Attempt
There is not yet a formal statement of RCU (read-copy update) semantics. While this lack has thus far not been an impediment to the adoption and use of RCU, it is quite possible that formal semantics would point the way towards tools that automatically validate uses of RCU, or that permit RCU algorithms to be automatically generated by a parallel compiler. This paper is a first attempt to supply a formal definition of RCU. Or at least a semi-formal definition: although RCU does not yet wear a tux (though it does run in Linux), at least it might yet wear some clothes.
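The informal contract being formalized (publish an updated copy, let readers subscribe to whichever version they see, and wait for pre-existing readers before reclaiming) can be illustrated with a toy model. The sketch below is a hypothetical single-threaded Python illustration; the names mirror the Linux kernel API, but this is neither the paper's formal semantics nor a real implementation:

```python
# A toy model of RCU's publish/subscribe and grace-period contract.
# Names mirror the Linux kernel API (rcu_read_lock, synchronize_rcu),
# but this is a single-threaded illustration, not a real implementation.

class ToyRCU:
    def __init__(self):
        self._active_readers = set()
        self._next_id = 0

    def rcu_read_lock(self):
        rid = self._next_id
        self._next_id += 1
        self._active_readers.add(rid)
        return rid

    def rcu_read_unlock(self, rid):
        self._active_readers.remove(rid)

    def synchronize_rcu(self):
        # A grace period ends when every reader active at its start has
        # finished.  In this toy model we simply assert that no readers
        # remain; a real implementation would wait for them.
        assert not self._active_readers, "readers still in critical section"

rcu = ToyRCU()
data = {"key": "old"}            # shared pointer (rebinding = publication)

rid = rcu.rcu_read_lock()
snapshot = data                  # subscribe: dereference the pointer once
new = dict(snapshot)             # read ...
new["key"] = "new"               # ... copy and modify ...
data = new                       # ... publish (models rcu_assign_pointer)
assert snapshot["key"] == "old"  # the old reader still sees the old version
rcu.rcu_read_unlock(rid)

rcu.synchronize_rcu()            # now it is safe to reclaim the old version
assert data["key"] == "new"
```

The point of the model is the ordering: reclamation of the old version is deferred until every reader that might hold a reference to it has finished.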
The acceptability of promotional policies to teachers and administrators in five selected communities.
Thesis (Ed.M.)--Boston University.
Resizable, Scalable, Concurrent Hash Tables
We present algorithms for shrinking and expanding a hash table while allowing concurrent, wait-free, linearly scalable lookups. These resize algorithms allow the hash table to maintain constant-time performance as the number of entries grows, and reclaim memory as the number of entries decreases, without delaying or disrupting readers.
We implemented our algorithms in the Linux kernel, to test their performance and scalability. Benchmarks show lookup scalability improved 125x over reader-writer locking, and 56% over the current state-of-the-art for Linux, with no performance degradation for lookups during a resize.
To achieve this performance, this hash table implementation uses a new concurrent programming methodology known as relativistic programming. In particular, we make use of an existing synchronization primitive which waits for all current readers to finish, with little to no reader overhead; careful use of this primitive allows ordering of updates without read-side synchronization or memory barriers.
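The invariant the resize algorithms preserve (every lookup succeeds at every intermediate step of a resize) can be illustrated with a much simpler scheme. The sketch below is a hypothetical single-threaded Python model using incremental bucket migration; it is not the paper's relativistic algorithm, which avoids the extra table check by waiting for readers instead:

```python
# A simplified incremental-resize chained hash table.  This is NOT the
# relativistic algorithm from the paper; it only illustrates the
# invariant that algorithm maintains: every key remains findable at
# every intermediate step of an expansion.

class IncrementalHashTable:
    def __init__(self, nbuckets=4):
        self.old = None                    # buckets being drained, if resizing
        self.new = [[] for _ in range(nbuckets)]
        self.migrate_idx = 0

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def insert(self, key, value):
        self._bucket(self.new, key).append((key, value))

    def lookup(self, key):
        tables = (self.new, self.old) if self.old else (self.new,)
        for table in tables:
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None

    def start_resize(self):
        # Double the bucket count; old entries stay visible via self.old.
        self.old, self.new = self.new, [[] for _ in range(2 * len(self.new))]
        self.migrate_idx = 0

    def resize_step(self):
        # Move one old bucket per step; keys stay reachable throughout.
        if self.old is None:
            return False
        for k, v in self.old[self.migrate_idx]:
            self._bucket(self.new, k).append((k, v))
        self.old[self.migrate_idx] = []
        self.migrate_idx += 1
        if self.migrate_idx == len(self.old):
            self.old = None                # resize complete
        return True

ht = IncrementalHashTable()
for i in range(32):
    ht.insert(i, i * i)
ht.start_resize()
while True:
    # Lookups remain correct in the middle of the resize.
    assert all(ht.lookup(i) == i * i for i in range(32))
    if not ht.resize_step():
        break
assert all(ht.lookup(i) == i * i for i in range(32))
```

In this toy version a lookup may have to check two tables; the contribution of the paper's algorithms is achieving the same always-findable invariant without imposing that cost on readers.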
When Do Real Time Systems Need Multiple CPUs?
Until recently, real-time systems were always single-CPU systems. The prospect of multiprocessing has arrived with the advent of low-cost and readily available multi-core systems. Now many RTOSes, perhaps most notably Linux™, provide real-time response on multiprocessor systems. However, this raises the question of whether your real-time application should avail itself of parallelism. Furthermore, if the answer is "yes," the next question is what form of parallelism your application should use: shared-memory parallelism with locking and threads, process pipelines, multiple cooperating processes, or one of a number of other approaches. This paper will examine these questions, providing rules of thumb to help you choose whether your real-time application should be parallel, and, if so, what sort of parallelism is best for you.
Universal Wait-Free Memory Reclamation
In this paper, we present a universal memory reclamation scheme, Wait-Free Eras (WFE), for deleted memory blocks in wait-free concurrent data structures. WFE's key innovation is that it is completely wait-free. Although some prior techniques provide similar guarantees for certain data structures, they lack support for arbitrary wait-free data structures. Consequently, developers are typically forced to marry their wait-free data structures with lock-free Hazard Pointers or (potentially blocking) epoch-based memory reclamation. Since both these schemes provide weaker progress guarantees, they essentially forfeit the strong progress guarantee of wait-free data structures. Though making the original Hazard Pointers scheme or epoch-based reclamation completely wait-free seems infeasible, we achieved this goal with a more recent (lock-free) Hazard Eras scheme, which we extend to guarantee wait-freedom. As this extension is non-trivial, we discuss all challenges pertaining to the construction of universal wait-free memory reclamation.
WFE is implementable on ubiquitous x86_64 and AArch64 (ARM) architectures. Its API is mostly compatible with Hazard Pointers, which allows easy transitioning of existing data structures into WFE. Our experimental evaluations show that WFE's performance is close to epoch-based reclamation and almost matches the original Hazard Eras scheme, while providing the stronger wait-free progress guarantee.
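The Hazard Pointers API that WFE is mostly compatible with follows a protect/retire/scan pattern. Below is a hypothetical single-threaded Python model of that pattern; it is not WFE itself, and it omits all of the concurrency (per-thread hazard slots, memory ordering) that makes the real problem hard:

```python
# A minimal single-threaded model of the Hazard Pointers reclamation
# pattern that WFE's API resembles.  Real implementations use per-thread
# hazard slots and careful memory ordering; this sketch only shows the
# protect/retire/scan lifecycle.

class HazardDomain:
    def __init__(self):
        self.hazards = set()     # nodes currently protected by readers
        self.retired = []        # nodes unlinked, awaiting reclamation
        self.freed = []          # stand-in for actual deallocation

    def protect(self, node):
        # A reader announces it may dereference `node`.
        self.hazards.add(node)

    def clear(self, node):
        self.hazards.discard(node)

    def retire(self, node):
        # An updater has unlinked `node`; it cannot be freed yet.
        self.retired.append(node)

    def scan(self):
        # Reclaim only retired nodes that no reader still protects.
        still_protected = [n for n in self.retired if n in self.hazards]
        for n in self.retired:
            if n not in self.hazards:
                self.freed.append(n)
        self.retired = still_protected

dom = HazardDomain()
a, b = object(), object()
dom.protect(a)               # a reader holds a hazard pointer to `a`
dom.retire(a)                # an updater unlinks both nodes
dom.retire(b)
dom.scan()
assert dom.freed == [b]      # `a` is protected, so only `b` is freed
dom.clear(a)
dom.scan()
assert dom.freed == [b, a]   # once unprotected, `a` is reclaimed too
```

The scheme is safe because a node is never freed while a hazard pointer covers it; the paper's contribution is making every step of such a scheme wait-free, which this sketch does not attempt.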
Position estimation
Acoustic position estimation is used where high accuracy navigation is required over a small area, such as for searching or for collecting gravitational or geomagnetic data. In this position estimation method, a surface ship or submersible periodically sends out a high-frequency acoustic 'ping' at a prearranged frequency. This ping is received by an array of transponders attached to the ocean floor, each of these transponders 'replies' to the ping with another ping at its own prearranged frequency. The ship records the times elapsed from when it sent out its ping to when it received each of the transponder's replies. The ship can then convert these elapsed 'round trip times' into distances, and can compute its position relative to the transponder array.
However, the relative positions of the transponders must be known. If the ship had some way of accurately determining its position when it deployed the transponders, it would not have needed to deploy them in the first place (since the only purpose of the transponders is to determine the ship's position accurately). Furthermore, even if the ship did know its position when it deployed a given transponder, there are many forces (such as ocean currents) that would prevent the transponder from descending exactly straight down.
This paper presents an algorithm that can determine the relative positions of the transponders in an array from acoustic measurements collected by the ship. This algorithm makes use of a second-order Newton's method with exact linesearch to minimize an error function whose domain is the set of coordinates of all the transponder and ship positions involved in the acoustic measurements.
This algorithm will be qualitatively compared to an older algorithm that has been used to solve the problem.
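The round-trip-time-to-range conversion and the least-squares position fit can be sketched as follows. This is a simplified 2-D illustration with an assumed nominal sound speed of 1500 m/s and made-up transponder coordinates, and it uses plain gradient descent rather than the paper's second-order Newton's method with exact linesearch:

```python
import math

# Converting round-trip ping times into ranges, then solving for the
# ship's position by least squares.  Simplified 2-D sketch: the 1500 m/s
# sound speed and the transponder coordinates are illustrative
# assumptions, and gradient descent stands in for the paper's Newton
# method with exact linesearch.

SOUND_SPEED = 1500.0  # m/s, a typical nominal value for seawater

def round_trip_to_range(t):
    return SOUND_SPEED * t / 2.0       # the ping travels out and back

def estimate_position(transponders, ranges, guess=(100.0, 100.0),
                      iters=20000, lr=0.02):
    x, y = guess
    for _ in range(iters):
        gx = gy = 0.0
        for (tx, ty), r in zip(transponders, ranges):
            d = math.hypot(x - tx, y - ty)
            if d == 0.0:
                continue
            # gradient of the residual (d - r)^2 with respect to (x, y)
            gx += 2.0 * (d - r) * (x - tx) / d
            gy += 2.0 * (d - r) * (y - ty) / d
        x -= lr * gx
        y -= lr * gy
    return x, y

# Synthetic data: three transponders and a known "true" ship position.
transponders = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
true_ship = (400.0, 300.0)
times = [2.0 * math.hypot(true_ship[0] - tx, true_ship[1] - ty) / SOUND_SPEED
         for tx, ty in transponders]
ranges = [round_trip_to_range(t) for t in times]

x, y = estimate_position(transponders, ranges)
assert abs(x - 400.0) < 1.0 and abs(y - 300.0) < 1.0
```

The paper's actual problem is harder: the transponder coordinates themselves are unknowns in the error function, which is why a full Newton's method over all positions is used rather than a per-fix solve like this one.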
Concurrent Search Data Structures Can Be Blocking and Practically Wait-Free
We argue that there is virtually no practical situation in which one should seek a "theoretically wait-free" algorithm at the expense of a state-of-the-art blocking algorithm in the case of search data structures: blocking algorithms are simple, fast, and can be made "practically wait-free". We draw this conclusion based on the most exhaustive study of blocking search data structures to date. We consider (a) different search data structures of different sizes, (b) numerous uniform and non-uniform workloads, representative of a wide range of practical scenarios, with different percentages of update operations, (c) with and without delayed threads, (d) on different hardware technologies, including processors providing HTM instructions. We explain our claim that blocking search data structures are practically wait-free through an analogy with the birthday paradox, revealing that, in state-of-the-art algorithms implementing such data structures, the probability of conflicts is extremely small. When conflicts occur as a result of context switches and interrupts, we show that HTM-based locks enable blocking algorithms to cope with the…
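The birthday-paradox analogy can be made concrete: if each of n concurrent operations lands on one of m nodes uniformly at random, the chance that any two collide is exactly the birthday probability. The sizes in the sketch below are illustrative assumptions, not measurements from the paper:

```python
# The birthday-paradox analogy made concrete: the probability that any
# two of n concurrent operations touch the same one of m nodes, assuming
# uniform random access.  The sizes below are illustrative, not figures
# from the paper.

def collision_probability(n_ops, m_nodes):
    p_no_collision = 1.0
    for i in range(n_ops):
        p_no_collision *= (m_nodes - i) / m_nodes
    return 1.0 - p_no_collision

# Sanity check: 23 people, 365 days gives the classic ~50.7% chance.
assert abs(collision_probability(23, 365) - 0.507) < 0.001

# 64 concurrent operations on a structure of one million nodes:
p = collision_probability(64, 10**6)
assert p < 0.01   # about 0.2%: conflicts are rare, so blocking is cheap
```

This is why a blocking algorithm can behave as if it were wait-free in practice: on a large structure, the window in which two operations actually contend on the same node is vanishingly small.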
Regular Topologies for Gigabit Wide-Area Networks: Congestion Avoidance Testbed Experiments
This document is Volume 3 of the final technical report on the work performed by SRI International (SRI) on SRI Project 8600. The document includes source listings for all software developed by SRI under this effort. Since some of our work involved the use of ST-II and the Sun Microsystems, Inc. (Sun) High-Speed Serial Interface (HSI/S) driver, we have included some of the source developed by LBL and BBN as well. In most cases, our decision to include source developed by other contractors depended on whether it was necessary to modify the original code. If we have modified the software in any way, it is included in this document. In the case of the Traffic Generator (TG), however, we have included all the ST-II software, even though BBN performed the integration, because the ST-II software is part of the standard TG release. It is important to note that all the code developed by other contractors is in the public domain, so that all software developed under this effort can be re-created from the source included here.
Congestion Avoidance Testbed Experiments
DARTnet provides an excellent environment for executing networking experiments. Since the network is private and spans the continental United States, it gives researchers a great opportunity to test network behavior under controlled conditions. However, this opportunity is not available very often, and therefore a support environment for such testing is lacking. To help remedy this situation, part of SRI's effort in this project was devoted to advancing the state of the art in the techniques used for benchmarking network performance. The second objective of SRI's effort in this project was to advance networking technology in the area of traffic control, and to test our ideas on DARTnet using the tools we developed for benchmarking networks. Networks are becoming more common and are being used by more and more people. Applications such as multimedia conferencing and distributed simulations are also placing greater demand on the resources that networks provide. Hence, new traffic-control mechanisms must be created to enable networks to serve the needs of their users. SRI's objective, therefore, was to investigate a new queueing and scheduling approach that will help to meet the needs of a large, diverse user population in a "fair" way.
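The report does not spell out its queueing approach here, but the classic fair-queueing idea that such traffic-control work builds on can be sketched: assign each flow's packets virtual finish times and always transmit the packet with the smallest one, so a flow of many small packets is not starved behind a flow of large ones. This is a generic illustration, not SRI's algorithm:

```python
# A sketch of the classic fair-queueing idea: packets are ordered by
# per-flow virtual finish times, so no flow can monopolize the link.
# This is a generic illustration, not the scheduling approach developed
# under this project.

def fair_schedule(flows):
    """flows: {flow_id: [packet_size, ...]}; returns transmission order."""
    events = []
    for flow, packets in flows.items():
        finish = 0.0                  # per-flow virtual finish time
        for size in packets:
            finish += size            # grows with bytes this flow has sent
            events.append((finish, flow, size))
    events.sort()                     # send the smallest finish time first
    return [(flow, size) for _, flow, size in events]

# A flow of many small packets shares the link fairly with one big packet:
order = fair_schedule({"small": [100] * 4, "big": [400]})
assert order[0] == ("small", 100)    # small flow is not stuck behind "big"
```

Under FIFO queueing, a single 400-byte packet arriving first would delay all four small packets; ordering by virtual finish time interleaves the flows in proportion to their demand, which is the "fair" behavior the abstract refers to.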
Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
We introduce "asynchronized concurrency (ASCY)," a paradigm consisting of four complementary programming patterns. ASCY calls for the design of concurrent search data structures (CSDSs) to resemble that of their sequential counterparts. We argue that ASCY leads to implementations which are portably scalable: they scale across different types of hardware platforms, including single and multi-socket ones, for various classes of workloads, such as read-only and read-write, and according to different performance metrics, including throughput, latency, and energy. We substantiate our thesis through the most exhaustive evaluation of CSDSs to date, involving 6 platforms, 22 state-of-the-art CSDS algorithms, 10 re-engineered state-of-the-art CSDS algorithms following the ASCY patterns, and 2 new CSDS algorithms designed with ASCY in mind. We observe up to 30% improvements in throughput in the re-engineered algorithms, while our new algorithms outperform the state-of-the-art alternatives.