8 research outputs found

    Brief Announcement: Jiffy: A Fast, Memory Efficient, Wait-Free Multi-Producers Single-Consumer Queue

    Get PDF
    In applications such as sharded data processing systems, data flow programming and load sharing applications, multiple concurrent data producers are feeding requests into the same data consumer. This can be naturally realized through concurrent queues, where each consumer pulls its tasks from its dedicated queue. For scalability, wait-free queues are often preferred over lock based structures. The vast majority of wait-free queue implementations, and even lock-free ones, support the multi-producer multi-consumer model. Yet, this comes at a premium, since implementing wait-free multi-producer multi-consumer queues requires utilizing complex helper data structures. The latter increases the memory consumption of such queues and limits their performance and scalability. Additionally, many such designs employ (hardware) cache unfriendly memory access patterns. In this work we study the implementation of wait-free multi-producer single-consumer queues. Specifically, we propose Jiffy, an efficient memory frugal novel wait-free multi-producer single-consumer queue and formally prove its correctness. We then compare the performance and memory requirements of Jiffy with other state of the art lock-free and wait-free queues. We show that indeed Jiffy can maintain good performance with up to 128 threads, delivers better throughput than other constructions we compared against, and consumes less memory

    A Heap-Based Concurrent Priority Queue with Mutable Priorities for Faster Parallel Algorithms

    Get PDF
    Existing concurrent priority queues do not allow to update the priority of an element after its insertion. As a result, algorithms that need this functionality, such as Dijkstra\u27s single source shortest path algorithm, resort to cumbersome and inefficient workarounds. We report on a heap-based concurrent priority queue which allows to change the priority of an element after its insertion. We show that the enriched interface allows to express Dijkstra\u27s algorithm in a more natural way, and that its implementation, using our concurrent priority queue, outperform existing algorithms

    Checking Linearizability of Concurrent Priority Queues

    Get PDF
    Efficient implementations of concurrent objects such as atomic collections are essential to modern computing. Programming such objects is error prone: in minimizing the synchronization overhead between concurrent object invocations, one risks the conformance to sequential specifications -- or in formal terms, one risks violating linearizability. Unfortunately, verifying linearizability is undecidable in general, even on classes of implementations where the usual control-state reachability is decidable. In this work we consider concurrent priority queues which are fundamental to many multi-threaded applications such as task scheduling or discrete event simulation, and show that verifying linearizability of such implementations can be reduced to control-state reachability. This reduction entails the first decidability results for verifying concurrent priority queues in the context of an unbounded number of threads, and it enables the application of existing safety-verification tools for establishing their correctness.Comment: An extended abstract is published in the Proceedings of CONCUR 201

    Towards Scalable Parallel Fibonacci Heap Implementation

    Get PDF
    With the advancement of multiple processors, the sequential algorithms are being investigated and gradually substituted for its concurrent equivalent to effectively exploit the parallel architecture. Parallel algorithms speed up the performance by dividing the task into a number of processes (or threads) that can be scheduled and executed simultaneously in independent processing units. Various well-known basic algorithms and data-structures have been explored for its efficient parallel counterparts and have been published as popular libraries. However, advanced data-structures and algorithms have not seen similar investigation mainly because they have many optimization steps mostly backed by many states and finding safe and efficient parallel implementation isn’t an easy endeavor. Safety concerns for shared-memory parallel implementation are of utmost importance as it provides a basis for consistency of any data structure and algorithm. There are well-known tools like locks, semaphores, atomic operations and so on that assist towards safe parallel implementation but using them effectively and in well-defined synchronization are key factors in the overall performance of any data-structures and algorithms. This paper explores an advanced data structure, Fibonacci Heap, and its operations to evaluate its implementation using two different synchronization mechanisms: Coarse-grained and Fine-grained. The analysis in this paper shows that a fine-grained synchronized Fibonacci Heap implementation with certainly relaxed semantics is more scalable with growing number of concurrency in comparison to the coarse-grained synchronized Fibonacci Heap implementation

    A Comparison of Concurrent Correctness Criteria for Shared Memory Based Data Structure

    Get PDF
    Developing concurrent algorithms requires safety and liveness to be defined in order to understand their proper behavior. Safety refers to the correctness criteria while liveness is the progress guarantee. Nowadays there are a variety of correctness conditions for concurrent objects. The way these correctness conditions differ and the various trade-offs they present with respect to performance, usability, and progress guarantees is poorly understood. This presents a daunting task for the developers and users of such concurrent algorithms who are trying to better understand the correctness of their code and the various trade-offs associated with their design choices and use. The purpose of this study is to explore the set of known correctness conditions for concurrent objects, find their correlations and categorize them, and provide insights regarding their implications with respect to performance and usability. In this thesis, a comparative study of Linearizability, Sequential Consistency, Quiescent Consistency and Quasi Linearizability will be presented using data structures like FIFO Queues, Stacks, and Priority Queues, and with a case study for performance of these implementations using different correctness criteria

    High-Performance Composable Transactional Data Structures

    Get PDF
    Exploiting the parallelism in multiprocessor systems is a major challenge in the post ``power wall\u27\u27 era. Programming for multicore demands a change in the way we design and use fundamental data structures. Concurrent data structures allow scalable and thread-safe accesses to shared data. They provide operations that appear to take effect atomically when invoked individually. A main obstacle to the practical use of concurrent data structures is their inability to support composable operations, i.e., to execute multiple operations atomically in a transactional manner. The problem stems from the inability of concurrent data structure to ensure atomicity of transactions composed from operations on a single or multiple data structures instances. This greatly hinders software reuse because users can only invoke data structure operations in a limited number of ways. Existing solutions, such as software transactional memory (STM) and transactional boosting, manage transaction synchronization in an external layer separated from the data structure\u27s own thread-level concurrency control. Although this reduces programming effort, it leads to significant overhead associated with additional synchronization and the need to rollback aborted transactions. In this dissertation, I address the practicality and efficiency concerns by designing, implementing, and evaluating high-performance transactional data structures that facilitate the development of future highly concurrent software systems. Firstly, I present two methodologies for implementing high-performance transactional data structures based on existing concurrent data structures using either lock-based or lock-free synchronizations. For lock-based data structures, the idea is to treat data accessed by multiple operations as resources. The challenge is for each thread to acquire exclusive access to desired resources while preventing deadlock or starvation. Existing locking strategies, like two-phase locking and resource hierarchy, suffer from performance degradation under heavy contention, while lacking a desirable fairness guarantee. To overcome these issues, I introduce a scalable lock algorithm for shared-memory multiprocessors addressing the resource allocation problem. It is the first multi-resource lock algorithm that guarantees the strongest first-in, first-out (FIFO) fairness. For lock-free data structures, I present a methodology for transforming them into high-performance lock-free transactional data structures without revamping the data structures\u27 original synchronization design. My approach leverages the semantic knowledge of the data structure to eliminate the overhead of false conflicts and rollbacks. Secondly, I apply the proposed methodologies and present a suite of novel transactional search data structures in the form of an open source library. This is interesting not only because the fundamental importance of search data structures in computer science and their wide use in real world programs, but also because it demonstrate the implementation issues that arise when using the methodologies I have developed. This library is not only a compilation of a large number of fundamental data structures for multiprocessor applications, but also a framework for enabling composable transactions, and moreover, an infrastructure for continuous integration of new data structures. By taking such a top-down approach, I am able to identify and consider the interplay of data structure interface operations as a whole, which allows for scrutinizing their commutativity rules and hence opens up possibilities for design optimizations. Lastly, I evaluate the throughput of the proposed data structures using transactions with randomly generated operations on two difference hardware systems. To ensure the strongest possible competition, I chose the best performing alternatives from state-of-the-art locking protocols and transactional memory systems in the literature. The results show that it is straightforward to build efficient transactional data structures when using my multi-resource lock as a drop-in replacement for transactional boosted data structures. Furthermore, this work shows that it is possible to build efficient lock-free transactional data structures with all perceived benefits of lock-freedom and with performance far better than generic transactional memory systems

    Scalable Concurrent Priority Queue Algorithms

    No full text
    This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms -- from the application level to lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing small-scale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256 node shared memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range. 1 Introduction Priority queues are a fundamental class of data structures used in the design of modern multiprocessor algorithms. Their uses r..
    corecore