2,496 research outputs found
Single-Producer/Single-Consumer Queues on Shared Cache Multi-Core Systems
Using efficient point-to-point communication channels is critical for
implementing fine grained parallel program on modern shared cache multi-core
architectures.
This report discusses in detail several implementations of wait-free
Single-Producer/Single-Consumer queue (SPSC), and presents a novel and
efficient algorithm for the implementation of an unbounded wait-free SPSC queue
(uSPSC). The correctness proof of the new algorithm, and several performance
measurements based on simple synthetic benchmark and microbenchmark, are also
discussed
A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue
We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performant queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized and more scalable than CAS (compare-and-set) instruction, on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable.
We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and its performance is often comparable to, or exceeding that of state-of-the-art scalable algorithms
Sprinklers: A Randomized Variable-Size Striping Approach to Reordering-Free Load-Balanced Switching
Internet traffic continues to grow exponentially, calling for switches that
can scale well in both size and speed. While load-balanced switches can achieve
such scalability, they suffer from a fundamental packet reordering problem.
Existing proposals either suffer from poor worst-case packet delays or require
sophisticated matching mechanisms. In this paper, we propose a new family of
stable load-balanced switches called "Sprinklers" that has comparable
implementation cost and performance as the baseline load-balanced switch, but
yet can guarantee packet ordering. The main idea is to force all packets within
the same virtual output queue (VOQ) to traverse the same "fat path" through the
switch, so that packet reordering cannot occur. At the core of Sprinklers are
two key innovations: a randomized way to determine the "fat path" for each VOQ,
and a way to determine its "fatness" roughly in proportion to the rate of the
VOQ. These innovations enable Sprinklers to achieve near-perfect load-balancing
under arbitrary admissible traffic. Proving this property rigorously using
novel worst-case large deviation techniques is another key contribution of this
work
Quantitative relaxation of concurrent data structures
There is a trade-off between performance and correctness in implementing concurrent data structures. Better performance may be achieved at the expense of relaxing correctness, by redefining the semantics of data structures. We address such a redefinition of data structure semantics and present a systematic and formal framework for obtaining new data structures by quantitatively relaxing existing ones. We view a data structure as a sequential specification S containing all "legal" sequences over an alphabet of method calls. Relaxing the data structure corresponds to defining a distance from any sequence over the alphabet to the sequential specification: the k-relaxed sequential specification contains all sequences over the alphabet within distance k from the original specification. In contrast to other existing work, our relaxations are semantic (distance in terms of data structure states). As an instantiation of our framework, we present two simple yet generic relaxation schemes, called out-of-order and stuttering relaxation, along with several ways of computing distances. We show that the out-of-order relaxation, when further instantiated to stacks, queues, and priority queues, amounts to tolerating bounded out-of-order behavior, which cannot be captured by a purely syntactic relaxation (distance in terms of sequence manipulation, e.g. edit distance). We give concurrent implementations of relaxed data structures and demonstrate that bounded relaxations provide the means for trading correctness for performance in a controlled way. The relaxations are monotonic which further highlights the trade-off: increasing k increases the number of permitted sequences, which as we demonstrate can lead to better performance. Finally, since a relaxed stack or queue also implements a pool, we actually have new concurrent pool implementations that outperform the state-of-the-art ones
- …