11 research outputs found

    The Power of Choice in Priority Scheduling

    Full text link
    Consider the following random process: we are given nn queues, into which elements of increasing labels are inserted uniformly at random. To remove an element, we pick two queues at random, and remove the element of lower label (higher priority) among the two. The cost of a removal is the rank of the label removed, among labels still present in any of the queues, that is, the distance from the optimal choice at each step. Variants of this strategy are prevalent in state-of-the-art concurrent priority queue implementations. Nonetheless, it is not known whether such implementations provide any rank guarantees, even in a sequential model. We answer this question, showing that this strategy provides surprisingly strong guarantees: Although the single-choice process, where we always insert and remove from a single randomly chosen queue, has degrading cost, going to infinity as we increase the number of steps, in the two choice process, the expected rank of a removed element is O(n)O( n ) while the expected worst-case cost is O(nlogn)O( n \log n ). These bounds are tight, and hold irrespective of the number of steps for which we run the process. The argument is based on a new technical connection between "heavily loaded" balls-into-bins processes and priority scheduling. Our analytic results inspire a new concurrent priority queue implementation, which improves upon the state of the art in terms of practical performance

    Inherent Limitations of Hybrid Transactional Memory

    Full text link
    Several Hybrid Transactional Memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort, nature of Hardware Transactional Memory (HTM) with a slow, reliable software backup. However, the fundamental limitations of building a HyTM with nontrivial concurrency between hardware and software transactions are still not well understood. In this paper, we propose a general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses, and allows us to formally quantify and analyze the amount of overhead (instrumentation) of a HyTM scheme. We prove the following: (1) it is impossible to build a strictly serializable HyTM implementation that has both uninstrumented reads and writes, even for weak progress guarantees, and (2) under reasonable assumptions, in any opaque progressive HyTM, a hardware transaction must incur instrumentation costs linear in the size of its data set. We further provide two upper bound implementations whose instrumentation costs are optimal with respect to their progress guarantees. In sum, this paper captures for the first time an inherent trade-off between the degree of concurrency a HyTM provides between hardware and software transactions, and the amount of instrumentation overhead the implementation must incur

    Designing a High Throughput Bounded Multi-Producer, Multi-Consumer Queue

    Get PDF
    Multi-Producer Multi-Consumer (MPMC) Queues are the most natural way to solve the common parallel programming Producer-Consumer Problem. The problem arises when a group of entities need work to be done, and another group of entities are responsible for doing said work. The problem is then of how should work be allocated to the workers such that neither group is hindered by the communication required for work allocation. MPMC Queues can solve the problem of allocation by functioning as a global location for work requests to be posted by the former group and later removed to be acted on by the latter group joining the two, but as the amount of work to be done by a system increases, this singular connection can easily become a bottleneck preventing work from being done. Current high performance MPMC Queue implementations strictly enforce that work posted first will be scheduled to a worker first, and while this improves the latency of a system, it can greatly decrease the overall work throughput, crippling bulk data-processing application performance. This project aims to create an MPMC Queue that is focused on overall throughput and investigate what performance optimizations can be made by sacrificing the standard latency guarantees

    On the Inherent Sequentiality of Concurrent Objects

    No full text
    We present Ω(n) lower bounds on the worst case time to perform a single instance of an operation in any nonblocking implementation of a large class of concurrent data structures shared by n processes. Time is measured by the number of stalls a process incurs as a result of contention with other processes. For standard data structures such as counters, stacks, and queues, our bounds are tight. The implementations considered may apply any primitives to a base object. No upper bounds are assumed on either the number of base objects or their size
    corecore