91 research outputs found

    The Adaptive Priority Queue with Elimination and Combining

    Full text link
    Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for parallelism of the add() operations. In addition, methods such as Flat Combining have been proposed to reduce contention by batching together multiple operations to be executed by a single thread. While this technique can decrease lock-switching overhead and the number of pointer changes required by the removeMin() operations in the priority queue, it can also create a sequential bottleneck and limit parallelism, especially for non-conflicting add() operations. In this paper, we describe a novel priority queue design, harnessing the scalability of parallel insertions in conjunction with the efficiency of batched removals. Moreover, we present a new elimination algorithm suitable for a priority queue, which further increases concurrency on balanced workloads with similar numbers of add() and removeMin() operations. We implement and evaluate our design using a variety of techniques including locking, atomic operations, hardware transactional memory, as well as employing adaptive heuristics given the workload.Comment: Accepted at DISC'14 - this is the full version with appendices, including more algorithm

    Non-blocking Priority Queue based on Skiplists with Relaxed Semantics

    Full text link
    Priority queues are data structures that store information in an orderly fashion. They are of tremendous importance because they are an integral part of many applications, like Dijkstra’s shortest path algorithm, MST algorithms, priority schedulers, and so on. Since priority queues by nature have high contention on the delete_min operation, the design of an efficient priority queue should involve an intelligent choice of the data structure as well as relaxation bounds on the data structure. Lock-free data structures provide higher scalability as well as progress guarantee than a lock-based data structure. That is another factor to be considered in the priority queue design. We present a relaxed non-blocking priority queue based on skiplists. We address all the design issues mentioned above in our priority queue. Use of skiplists allows multiple threads to concurrently access different parts of the skiplist quickly, whereas relaxing the priority queue delete_min operation distributes contention over the skiplist instead of just at the front. Furthermore, a non-blocking implementation guarantees that the system will make progress even when some process fails. Our priority queue is internally composed of several priority queues, one for each thread and one shared priority queue common to all threads. Each thread selects the best value from its local priority queue and the shared priority queue and returns the value. In case a thread is unable to delete an item, it tries to spy items from other threads\u27 local priority queues. We experimentally and theoretically show the correctness of our data structure. We also compare the performance of our data structure with other variations like priority queues based on coarse-grained skiplists for both relaxed and non-relaxed semantics

    A Heap-Based Concurrent Priority Queue with Mutable Priorities for Faster Parallel Algorithms

    Get PDF
    Existing concurrent priority queues do not allow to update the priority of an element after its insertion. As a result, algorithms that need this functionality, such as Dijkstra\u27s single source shortest path algorithm, resort to cumbersome and inefficient workarounds. We report on a heap-based concurrent priority queue which allows to change the priority of an element after its insertion. We show that the enriched interface allows to express Dijkstra\u27s algorithm in a more natural way, and that its implementation, using our concurrent priority queue, outperform existing algorithms

    The Lock-free kk-LSM Relaxed Priority Queue

    Full text link
    Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm, branch-and-bound algorithms, and prioritized schedulers. Efficient multiprocessor computing requires implementations of basic data structures that can be used concurrently and scale to large numbers of threads and cores. Lock-free data structures promise superior scalability by avoiding blocking synchronization primitives, but the \emph{delete-min} operation is an inherent scalability bottleneck in concurrent priority queues. Recent work has focused on alleviating this obstacle either by batching operations, or by relaxing the requirements to the \emph{delete-min} operation. We present a new, lock-free priority queue that relaxes the \emph{delete-min} operation so that it is allowed to delete \emph{any} of the ρ+1\rho+1 smallest keys, where ρ\rho is a runtime configurable parameter. Additionally, the behavior is identical to a non-relaxed priority queue for items added and removed by the same thread. The priority queue is built from a logarithmic number of sorted arrays in a way similar to log-structured merge-trees. We experimentally compare our priority queue to recent state-of-the-art lock-free priority queues, both with relaxed and non-relaxed semantics, showing high performance and good scalability of our approach.Comment: Short version as ACM PPoPP'15 poste

    A Bulk-Parallel Priority Queue in External Memory with STXXL

    Get PDF
    We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape
    corecore