75,945 research outputs found
Efficient Lock-free Binary Search Trees
In this paper we present a novel algorithm for concurrent lock-free internal
binary search trees (BST) and implement a Set abstract data type (ADT) based on
that. We show that in the presented lock-free BST algorithm the amortized step
complexity of each set operation - {\sc Add}, {\sc Remove} and {\sc Contains} -
is , where, is the height of BST with number of nodes
and is the contention during the execution. Our algorithm adapts to
contention measures according to read-write load. If the situation is
read-heavy, the operations avoid helping pending concurrent {\sc Remove}
operations during traversal, and, adapt to interval contention. However, for
write-heavy situations we let an operation help pending {\sc Remove}, even
though it is not obstructed, and so adapt to tighter point contention. It uses
single-word compare-and-swap (\texttt{CAS}) operations. We show that our
algorithm has improved disjoint-access-parallelism compared to similar existing
algorithms. We prove that the presented algorithm is linearizable. To the best
of our knowledge this is the first algorithm for any concurrent tree data
structure in which the modify operations are performed with an additive term of
contention measure.Comment: 15 pages, 3 figures, submitted to POD
A VLSI Architecture for Concurrent Data Structures
Concurrent data structures simplify the development of concurrent programs by encapsulating commonly used mechanisms for synchronization and communication into data structures. This thesis develops a notation for describing concurrent data structures, presents examples of concurrent data structures, and describes an architecture to support concurrent data structures.
Concurrent Smalltalk (CST), a derivative of Smalltalk-80 with extensions for concurrency, is developed to describe concurrent data structures. CST allows the programmer to specify objects that are distributed over the nodes of a concurrent computer. These distributed objects have many constituent objects and thus can process many messages simultaneously. They are the foundation upon which concurrent data structures are built.
The balanced cube is a concurrent data structure for ordered sets. The set is distributed by a balanced recursive partition that maps to the subcubes of a binary n-cube using a Gray code. A search algorithm, VW search, based on the distance properties of the Gray code, searches a balanced cube in O(log N) time. Because it does not have the root bottleneck that limits all tree-based data structures to O(1) concurrency, the balanced cube achieves O(~og ) concurrency.
Considering graphs as concurrent data structures, graph algorithms are presented for the shortest path problem, the mix-flow problem, and graph partitioning. These algorithms introduce new synchronization techniques to achieve better performance than existing algorithms.
A message-passing, concurrent architecture is developed that exploits the characteristics of VLSI technology to support concurrent data structures. Interconnection topologies are compared on the basis of dimension. It is shown that minimum latency is achieved with a very low dimensional network. A deadlock-free routing strategy is developed for this class of networks, and a prototype VLSI chip implementing this strategy is described. A message-driven processor complements the network by responding to messages with a very low latency. The processor directly executes messages, eliminating a level of interpretation. To take advantage of the performance offered by specialization while at the same time retaining flexibility, processing elements can be specialized to operate on a single class of objects. These object experts accelerate the performance of all applications using this class
DeltaTree: A Practical Locality-aware Concurrent Search Tree
As other fundamental programming abstractions in energy-efficient computing,
search trees are expected to support both high parallelism and data locality.
However, existing highly-concurrent search trees such as red-black trees and
AVL trees do not consider data locality while existing locality-aware search
trees such as those based on the van Emde Boas layout (vEB-based trees), poorly
support concurrent (update) operations.
This paper presents DeltaTree, a practical locality-aware concurrent search
tree that combines both locality-optimisation techniques from vEB-based trees
and concurrency-optimisation techniques from non-blocking highly-concurrent
search trees. DeltaTree is a -ary leaf-oriented tree of DeltaNodes in which
each DeltaNode is a size-fixed tree-container with the van Emde Boas layout.
The expected memory transfer costs of DeltaTree's Search, Insert, and Delete
operations are , where are the tree size and the unknown
memory block size in the ideal cache model, respectively. DeltaTree's Search
operation is wait-free, providing prioritised lanes for Search operations, the
dominant operation in search trees. Its Insert and {\em Delete} operations are
non-blocking to other Search, Insert, and Delete operations, but they may be
occasionally blocked by maintenance operations that are sometimes triggered to
keep DeltaTree in good shape. Our experimental evaluation using the latest
implementation of AVL, red-black, and speculation friendly trees from the
Synchrobench benchmark has shown that DeltaTree is up to 5 times faster than
all of the three concurrent search trees for searching operations and up to 1.6
times faster for update operations when the update contention is not too high
A Unified approach to concurrent and parallel algorithms on balanced data structures
Concurrent and parallel algorithms are different. However, in the case of dictionaries, both kinds of algorithms share many
common points. We present a unified approach emphasizing these points. It is based on a careful analysis of the sequential
algorithm, extracting from it the more basic facts, encapsulated later on as local rules. We apply the method to the
insertion algorithms in AVL trees. All the concurrent and parallel insertion algorithms have two main phases. A
percolation phase, moving the keys to be inserted down, and a rebalancing phase. Finally, some other algorithms and
balanced structures are discussed.Postprint (published version
A Template for Implementing Fast Lock-free Trees Using HTM
Algorithms that use hardware transactional memory (HTM) must provide a
software-only fallback path to guarantee progress. The design of the fallback
path can have a profound impact on performance. If the fallback path is allowed
to run concurrently with hardware transactions, then hardware transactions must
be instrumented, adding significant overhead. Otherwise, hardware transactions
must wait for any processes on the fallback path, causing concurrency
bottlenecks, or move to the fallback path. We introduce an approach that
combines the best of both worlds. The key idea is to use three execution paths:
an HTM fast path, an HTM middle path, and a software fallback path, such that
the middle path can run concurrently with each of the other two. The fast path
and fallback path do not run concurrently, so the fast path incurs no
instrumentation overhead. Furthermore, fast path transactions can move to the
middle path instead of waiting or moving to the software path. We demonstrate
our approach by producing an accelerated version of the tree update template of
Brown et al., which can be used to implement fast lock-free data structures
based on down-trees. We used the accelerated template to implement two
lock-free trees: a binary search tree (BST), and an (a,b)-tree (a
generalization of a B-tree). Experiments show that, with 72 concurrent
processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many
operations per second as an implementation obtained using the original tree
update template
- …