Proceedings of the International Conference on Supercomputing
Doi
Abstract
Designing high-performance software queues for fast intercore communication is challenging, but critical for maximising software parallelism. State-of-the-art single-producer / single-consumer queues for streaming applications contain multiple sections, requiring the producer and consumer to operate independently on different sections from each other. While these queues perform well for coarse-grained data transfers, they perform poorly in the fine-grained case. This paper proposes Lynx, a novel SP/SC queue, specifically tuned for fine-grained communication. Lynx is built from the ground up, reducing the generated code on the critical-path to just two operations per enqueue and dequeue. To achieve this it relies on existing commodity processor hardware and operating system exception handling support to deal with infrequent queue maintenance operations. Lynx outperforms the state-of-the art by up to 1.57× in total 64-bit throughput reaching a peak throughput of 15.7GB/s on a common desktop system. Real applications using Lynx get a performance improvement of up to 1.4×.This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), through grant reference EP/K026399/1.This is the author accepted manuscript. The final version is available from Association for Computing Machinery via http://dx.doi.org/10.1145/2925426.2926274