3,848 research outputs found
DeltaTree: A Practical Locality-aware Concurrent Search Tree
As other fundamental programming abstractions in energy-efficient computing,
search trees are expected to support both high parallelism and data locality.
However, existing highly-concurrent search trees such as red-black trees and
AVL trees do not consider data locality while existing locality-aware search
trees such as those based on the van Emde Boas layout (vEB-based trees), poorly
support concurrent (update) operations.
This paper presents DeltaTree, a practical locality-aware concurrent search
tree that combines both locality-optimisation techniques from vEB-based trees
and concurrency-optimisation techniques from non-blocking highly-concurrent
search trees. DeltaTree is a -ary leaf-oriented tree of DeltaNodes in which
each DeltaNode is a size-fixed tree-container with the van Emde Boas layout.
The expected memory transfer costs of DeltaTree's Search, Insert, and Delete
operations are , where are the tree size and the unknown
memory block size in the ideal cache model, respectively. DeltaTree's Search
operation is wait-free, providing prioritised lanes for Search operations, the
dominant operation in search trees. Its Insert and {\em Delete} operations are
non-blocking to other Search, Insert, and Delete operations, but they may be
occasionally blocked by maintenance operations that are sometimes triggered to
keep DeltaTree in good shape. Our experimental evaluation using the latest
implementation of AVL, red-black, and speculation friendly trees from the
Synchrobench benchmark has shown that DeltaTree is up to 5 times faster than
all of the three concurrent search trees for searching operations and up to 1.6
times faster for update operations when the update contention is not too high
Runtime Optimizations for Prediction with Tree-Based Models
Tree-based models have proven to be an effective solution for web ranking as
well as other problems in diverse domains. This paper focuses on optimizing the
runtime performance of applying such models to make predictions, given an
already-trained model. Although exceedingly simple conceptually, most
implementations of tree-based models do not efficiently utilize modern
superscalar processor architectures. By laying out data structures in memory in
a more cache-conscious fashion, removing branches from the execution flow using
a technique called predication, and micro-batching predictions using a
technique called vectorization, we are able to better exploit modern processor
architectures and significantly improve the speed of tree-based models over
hard-coded if-else blocks. Our work contributes to the exploration of
architecture-conscious runtime implementations of machine learning algorithms
- …