75,945 research outputs found

    Efficient Lock-free Binary Search Trees

    Full text link
    In this paper we present a novel algorithm for concurrent lock-free internal binary search trees (BST) and implement a Set abstract data type (ADT) based on that. We show that in the presented lock-free BST algorithm the amortized step complexity of each set operation - {\sc Add}, {\sc Remove} and {\sc Contains} - is O(H(n)+c)O(H(n) + c), where, H(n)H(n) is the height of BST with nn number of nodes and cc is the contention during the execution. Our algorithm adapts to contention measures according to read-write load. If the situation is read-heavy, the operations avoid helping pending concurrent {\sc Remove} operations during traversal, and, adapt to interval contention. However, for write-heavy situations we let an operation help pending {\sc Remove}, even though it is not obstructed, and so adapt to tighter point contention. It uses single-word compare-and-swap (\texttt{CAS}) operations. We show that our algorithm has improved disjoint-access-parallelism compared to similar existing algorithms. We prove that the presented algorithm is linearizable. To the best of our knowledge this is the first algorithm for any concurrent tree data structure in which the modify operations are performed with an additive term of contention measure.Comment: 15 pages, 3 figures, submitted to POD

    A VLSI Architecture for Concurrent Data Structures

    Get PDF
    Concurrent data structures simplify the development of concurrent programs by encapsulating commonly used mechanisms for synchronization and communication into data structures. This thesis develops a notation for describing concurrent data structures, presents examples of concurrent data structures, and describes an architecture to support concurrent data structures. Concurrent Smalltalk (CST), a derivative of Smalltalk-80 with extensions for concurrency, is developed to describe concurrent data structures. CST allows the programmer to specify objects that are distributed over the nodes of a concurrent computer. These distributed objects have many constituent objects and thus can process many messages simultaneously. They are the foundation upon which concurrent data structures are built. The balanced cube is a concurrent data structure for ordered sets. The set is distributed by a balanced recursive partition that maps to the subcubes of a binary n-cube using a Gray code. A search algorithm, VW search, based on the distance properties of the Gray code, searches a balanced cube in O(log N) time. Because it does not have the root bottleneck that limits all tree-based data structures to O(1) concurrency, the balanced cube achieves O(~og ) concurrency. Considering graphs as concurrent data structures, graph algorithms are presented for the shortest path problem, the mix-flow problem, and graph partitioning. These algorithms introduce new synchronization techniques to achieve better performance than existing algorithms. A message-passing, concurrent architecture is developed that exploits the characteristics of VLSI technology to support concurrent data structures. Interconnection topologies are compared on the basis of dimension. It is shown that minimum latency is achieved with a very low dimensional network. A deadlock-free routing strategy is developed for this class of networks, and a prototype VLSI chip implementing this strategy is described. A message-driven processor complements the network by responding to messages with a very low latency. The processor directly executes messages, eliminating a level of interpretation. To take advantage of the performance offered by specialization while at the same time retaining flexibility, processing elements can be specialized to operate on a single class of objects. These object experts accelerate the performance of all applications using this class

    DeltaTree: A Practical Locality-aware Concurrent Search Tree

    Full text link
    As other fundamental programming abstractions in energy-efficient computing, search trees are expected to support both high parallelism and data locality. However, existing highly-concurrent search trees such as red-black trees and AVL trees do not consider data locality while existing locality-aware search trees such as those based on the van Emde Boas layout (vEB-based trees), poorly support concurrent (update) operations. This paper presents DeltaTree, a practical locality-aware concurrent search tree that combines both locality-optimisation techniques from vEB-based trees and concurrency-optimisation techniques from non-blocking highly-concurrent search trees. DeltaTree is a kk-ary leaf-oriented tree of DeltaNodes in which each DeltaNode is a size-fixed tree-container with the van Emde Boas layout. The expected memory transfer costs of DeltaTree's Search, Insert, and Delete operations are O(logBN)O(\log_B N), where N,BN, B are the tree size and the unknown memory block size in the ideal cache model, respectively. DeltaTree's Search operation is wait-free, providing prioritised lanes for Search operations, the dominant operation in search trees. Its Insert and {\em Delete} operations are non-blocking to other Search, Insert, and Delete operations, but they may be occasionally blocked by maintenance operations that are sometimes triggered to keep DeltaTree in good shape. Our experimental evaluation using the latest implementation of AVL, red-black, and speculation friendly trees from the Synchrobench benchmark has shown that DeltaTree is up to 5 times faster than all of the three concurrent search trees for searching operations and up to 1.6 times faster for update operations when the update contention is not too high

    A Unified approach to concurrent and parallel algorithms on balanced data structures

    Get PDF
    Concurrent and parallel algorithms are different. However, in the case of dictionaries, both kinds of algorithms share many common points. We present a unified approach emphasizing these points. It is based on a careful analysis of the sequential algorithm, extracting from it the more basic facts, encapsulated later on as local rules. We apply the method to the insertion algorithms in AVL trees. All the concurrent and parallel insertion algorithms have two main phases. A percolation phase, moving the keys to be inserted down, and a rebalancing phase. Finally, some other algorithms and balanced structures are discussed.Postprint (published version

    A Template for Implementing Fast Lock-free Trees Using HTM

    Full text link
    Algorithms that use hardware transactional memory (HTM) must provide a software-only fallback path to guarantee progress. The design of the fallback path can have a profound impact on performance. If the fallback path is allowed to run concurrently with hardware transactions, then hardware transactions must be instrumented, adding significant overhead. Otherwise, hardware transactions must wait for any processes on the fallback path, causing concurrency bottlenecks, or move to the fallback path. We introduce an approach that combines the best of both worlds. The key idea is to use three execution paths: an HTM fast path, an HTM middle path, and a software fallback path, such that the middle path can run concurrently with each of the other two. The fast path and fallback path do not run concurrently, so the fast path incurs no instrumentation overhead. Furthermore, fast path transactions can move to the middle path instead of waiting or moving to the software path. We demonstrate our approach by producing an accelerated version of the tree update template of Brown et al., which can be used to implement fast lock-free data structures based on down-trees. We used the accelerated template to implement two lock-free trees: a binary search tree (BST), and an (a,b)-tree (a generalization of a B-tree). Experiments show that, with 72 concurrent processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many operations per second as an implementation obtained using the original tree update template
    corecore