128 research outputs found

    Almost Wait-free Resizable Hashtables

    Get PDF

    Crafting Concurrent Data Structures

    Get PDF
    Concurrent data structures lie at the heart of modern parallel programs. The design and implementation of concurrent data structures can be challenging due to the demand for good performance (low latency and high scalability) and strong progress guarantees. In this dissertation, we enrich the knowledge of concurrent data structure design by proposing new implementations, as well as general techniques to improve the performance of existing ones.
    The first part of the dissertation presents an unordered linked list implementation that supports nonblocking insert, remove, and lookup operations. The algorithm is based on a novel "enlist" technique that greatly simplifies the task of achieving wait-freedom. The value of our technique is also demonstrated in the creation of other wait-free data structures such as stacks and hash tables.
    The second data structure presented is a nonblocking hash table implementation which solves a long-standing design challenge by permitting the hash table to dynamically adjust its size in a nonblocking manner. Additionally, our hash table offers strong theoretical properties, such as support for unbounded memory. In our algorithm, we introduce a new "freezable set" abstraction which allows us to achieve atomic migration of keys during a resize. The freezable set abstraction also enables highly efficient implementations which maximally exploit processor cache locality. In experiments, we found our lock-free hash table performs consistently better than state-of-the-art implementations, such as the split-ordered list.
    The third data structure we present is a concurrent priority queue called the "mound". Our implementations include nonblocking and lock-based variants. The mound employs randomization to reduce contention on concurrent insert operations, and decomposes a remove operation into smaller atomic operations so that multiple remove operations can execute in parallel within a pipeline. In experiments, we show that the mound can provide excellent latency at low thread counts.
    Lastly, we discuss how hardware transactional memory (HTM) can be used to accelerate existing nonblocking concurrent data structure implementations. We propose optimization techniques that can significantly improve the performance (1.5x to 3x speedups) of a variety of important concurrent data structures, such as binary search trees and hash tables. The optimizations also preserve the strong progress guarantees of the original implementations.
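    The abstract only names the "freezable set" abstraction. As a rough illustration of the idea, the sketch below shows a tiny fixed-capacity set whose empty slots can be frozen with CAS, so that a resizer can take a final snapshot of its keys while concurrent inserts either complete or are redirected to the new table. The names, slot layout, and reserved sentinel values are assumptions of this sketch, not the dissertation's actual algorithm.

        #include <atomic>
        #include <cstdint>
        #include <vector>

        struct FreezableSet {
            static constexpr uint64_t kEmpty  = 0;              // slot never written (reserved value)
            static constexpr uint64_t kFrozen = ~uint64_t(0);   // slot permanently closed to inserts
            static constexpr int kSlots = 8;
            std::atomic<uint64_t> slot[kSlots];

            FreezableSet() { for (auto &s : slot) s.store(kEmpty); }

            // Try to add a key (0 and ~0 are reserved in this sketch).
            bool insert(uint64_t key) {
                for (int i = 0; i < kSlots; ++i) {
                    uint64_t cur = slot[i].load();
                    if (cur == key) return true;          // already present
                    if (cur == kFrozen) return false;     // frozen: caller retries in the new table
                    if (cur == kEmpty) {
                        if (slot[i].compare_exchange_strong(cur, key)) return true;
                        if (cur == key) return true;      // lost a race to an identical insert
                        if (cur == kFrozen) return false; // frozen under us
                        // another key claimed the slot: keep probing
                    }
                }
                return false;                             // set is full
            }

            // Mark every empty slot kFrozen so no further insert can succeed, then
            // return a snapshot of the keys. After freeze() returns, the membership
            // is final, so a resizer can migrate it to the new table atomically.
            std::vector<uint64_t> freeze() {
                std::vector<uint64_t> keys;
                for (auto &s : slot) {
                    uint64_t cur = s.load();
                    while (cur == kEmpty && !s.compare_exchange_weak(cur, kFrozen)) {}
                    if (cur != kEmpty && cur != kFrozen) keys.push_back(cur);
                }
                return keys;
            }
        };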

    Boosting Multi-Core Reachability Performance with Shared Hash Tables

    Get PDF
    This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms. Comment: preliminary report.
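    The loose requirement the paper exploits is that reachability only needs a single operation over the visited-state store: find-or-put. A minimal sketch of that operation, assuming open addressing with linear probing, a fixed table size, and 0 reserved as the empty marker, might look like the following; it is not the authors' implementation, which additionally tunes bucket layout, cache-line usage, and the probing sequence.

        #include <atomic>
        #include <cstddef>
        #include <cstdint>
        #include <functional>
        #include <stdexcept>

        class VisitedSet {
            static constexpr size_t   kBuckets = 1 << 20;  // fixed table size for the sketch
            static constexpr uint64_t kEmpty   = 0;        // 0 reserved as the empty marker
            std::atomic<uint64_t> *table;

        public:
            VisitedSet() : table(new std::atomic<uint64_t>[kBuckets]) {
                for (size_t i = 0; i < kBuckets; ++i)
                    table[i].store(kEmpty, std::memory_order_relaxed);
            }
            ~VisitedSet() { delete[] table; }

            // The only operation reachability needs: returns true if `state` was
            // already visited, false if this call marked it visited.
            bool find_or_put(uint64_t state) {
                size_t h = std::hash<uint64_t>{}(state);
                for (size_t probe = 0; probe < kBuckets; ++probe) {
                    std::atomic<uint64_t> &bucket = table[(h + probe) % kBuckets]; // linear probing
                    uint64_t cur = bucket.load(std::memory_order_acquire);
                    if (cur == state) return true;        // seen before
                    if (cur == kEmpty) {
                        if (bucket.compare_exchange_strong(cur, state,
                                                           std::memory_order_acq_rel))
                            return false;                 // we published it
                        if (cur == state) return true;    // raced with an identical insert
                        // another state claimed this bucket: keep probing
                    }
                }
                throw std::runtime_error("state table full"); // a real design would resize or abort
            }
        };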

    Lock-free atom garbage collection for multithreaded Prolog

    Get PDF
    The runtime systems of dynamic languages such as Prolog or Lisp and their derivatives contain a symbol table, in Prolog often called the atom table. A simple dynamically resizing hash table used to be an adequate way to implement this table. As Prolog becomes fashionable for 24x7 server processes, we need to deal with atom garbage collection and concurrent access to the atom table. Classical lock-based implementations to ensure consistency of the atom table scale poorly, and a stop-the-world approach to implement atom garbage collection quickly becomes a bottleneck, making Prolog unsuitable for soft real-time applications. In this article we describe a novel implementation for the atom table using lock-free techniques, where the atom table remains accessible even during atom garbage collection. Relying only on CAS (Compare And Swap) and not on external libraries, the implementation is straightforward and portable. Under consideration for acceptance in TPLP. Comment: Paper presented at the 32nd International Conference on Logic Programming (ICLP 2016), New York City, USA, 16-21 October 2016, 14 pages, LaTeX, 4 PDF figures.
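    As a rough illustration of CAS-only atom registration of the kind described, the sketch below interns a name by scanning a hash bucket's chain and, if the atom is absent, publishing a new one with a single compare-and-swap. The types, hashing, and the omission of any garbage-collection hooks are assumptions of this sketch, not SWI-Prolog's actual code.

        #include <atomic>
        #include <array>
        #include <cstddef>
        #include <functional>
        #include <string>

        struct Atom {
            std::string name;
            Atom *next = nullptr;   // immutable once the atom is published via CAS
            explicit Atom(std::string n) : name(std::move(n)) {}
        };

        class AtomTable {
            static constexpr size_t kBuckets = 4096;
            std::array<std::atomic<Atom*>, kBuckets> buckets;

        public:
            AtomTable() { for (auto &b : buckets) b.store(nullptr, std::memory_order_relaxed); }

            // Return the canonical Atom for `name`, creating it if necessary.
            Atom *intern(const std::string &name) {
                auto &head = buckets[std::hash<std::string>{}(name) % kBuckets];
                Atom *fresh = nullptr;
                for (;;) {
                    // Scan the current chain; atoms are never unlinked here, so a plain
                    // traversal is safe (reclamation by atom GC is out of scope).
                    Atom *first = head.load(std::memory_order_acquire);
                    for (Atom *a = first; a != nullptr; a = a->next)
                        if (a->name == name) { delete fresh; return a; }
                    if (fresh == nullptr) fresh = new Atom(name);
                    fresh->next = first;
                    // Publish the new atom; on failure another thread changed the
                    // chain head, so rescan from the new head and retry.
                    if (head.compare_exchange_weak(first, fresh,
                                                   std::memory_order_release,
                                                   std::memory_order_acquire))
                        return fresh;
                }
            }
        };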

    Non-Blocking Dynamic Unbounded Graphs with Worst-Case Amortized Bounds

    Get PDF

    The Design, Implementation, and Refinement of Wait-Free Algorithms and Containers

    Get PDF
    My research has been on the development of concurrent algorithms for shared memory systems that provide guarantees of progress. Research into such algorithms is important to developers implementing applications on mission-critical and time-sensitive systems. These guarantees of progress provide safety properties and freedom from many hazards, such as deadlock, livelock, and thread starvation. In addition to the safety concerns, the fine-grained synchronization used in implementing these algorithms promises to provide scalable performance in massively parallel systems. My research has resulted in the development of wait-free versions of the stack, hash map, ring buffer, and vector, as well as a multi-word compare-and-swap algorithm. Through this experience, I have learned and developed new techniques and methodologies for implementing non-blocking and wait-free algorithms. I have worked with and refined existing techniques to improve their practicality and applicability. In the creation of the aforementioned algorithms, I have developed an association model for use with descriptor-based operations. This model, originally developed for the multi-word compare-and-swap algorithm, has been applied to the design of the vector and ring buffer algorithms. To unify these algorithms and techniques, I have released Tervel, a wait-free library of common algorithms and containers. This library includes a framework that simplifies and improves the design of non-blocking algorithms. I have reimplemented several algorithms using this framework, and the resulting implementations exhibit less code duplication and fewer perceivable states. When reimplementing algorithms, I have adapted their Application Programming Interface (API) specifications to remove ambiguity and non-deterministic behavior found when using a sequential API in a concurrent environment. To improve the performance of my algorithm implementations, I extended the data collection and transport system of OVIS's Lightweight Distributed Metric Service (LDMS) to support performance monitoring using the perf_event and PAPI libraries. These libraries have provided me with deeper insights into the behavior of my algorithms, and I was able to use these insights to improve the design and performance of my algorithms.
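    A recurring ingredient behind descriptor-based wait-free designs like the ones described above is an announce-and-help pattern: an operation is published as a descriptor that any thread can drive to completion before starting its own work. The sketch below shows that pattern in isolation with a trivial one-shot operation; the class names, announcement table, and helping policy are illustrative assumptions, not Tervel's actual API.

        #include <atomic>
        #include <cstddef>

        struct OpDesc {
            std::atomic<bool> claimed{false};   // won by the single thread that applies the effect
            std::atomic<bool> done{false};      // set once the effect is visible
            virtual void apply() = 0;           // the operation's one-shot effect
            virtual ~OpDesc() = default;

            // Any thread may call this; the effect happens exactly once. The losers'
            // wait below is a simplification: a genuinely wait-free design lets
            // helpers complete the effect themselves instead of waiting.
            void help() {
                bool expected = false;
                if (claimed.compare_exchange_strong(expected, true)) {
                    apply();
                    done.store(true, std::memory_order_release);
                } else {
                    while (!done.load(std::memory_order_acquire)) { /* wait for the winner */ }
                }
            }
        };

        // A trivial concrete operation: add a value to a shared counter exactly once,
        // no matter how many threads call help() on the same descriptor.
        struct AddOp : OpDesc {
            std::atomic<long> &counter;
            long delta;
            AddOp(std::atomic<long> &c, long d) : counter(c), delta(d) {}
            void apply() override { counter.fetch_add(delta, std::memory_order_acq_rel); }
        };

        constexpr size_t kMaxThreads = 64;
        std::atomic<OpDesc*> announce[kMaxThreads];   // one announcement slot per thread

        // Before starting its own operation, a thread scans the announcement table
        // and helps any descriptor it finds there.
        void help_others() {
            for (size_t t = 0; t < kMaxThreads; ++t)
                if (OpDesc *d = announce[t].load(std::memory_order_acquire)) d->help();
        }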