The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that, by using neural nets, we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly, we believe that the idea of
replacing core components of a data management system with learned models has
far-reaching implications for future system designs, and that this work
provides just a glimpse of what might be possible.
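The premise can be made concrete with a minimal sketch: a single linear model trained on the sorted keys predicts a record's position, and the model's maximum observed error bounds a corrective binary search. This is an illustrative toy (the class and method names are invented here), not the staged neural-net architecture the paper actually proposes:

```python
import bisect

class LearnedIndex:
    """Toy learned index: a linear model predicts a key's position in a
    sorted array; the model's worst observed error bounds a corrective
    binary search. (Illustrative only; the paper uses staged models.)"""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in self.keys)
        cov = sum((x - mean_x) * (y - mean_y)
                  for y, x in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Maximum prediction error over the training keys.
        self.err = max(abs(self._predict(x) - y)
                       for y, x in enumerate(self.keys))

    def _predict(self, key):
        p = round(self.slope * key + self.intercept)
        return min(len(self.keys) - 1, max(0, p))

    def lookup(self, key):
        """Return the key's position, or None if absent."""
        p = self._predict(key)
        lo, hi = max(0, p - self.err), min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The error bound is what makes the prediction safe: however wrong the model is, the true position is within `err` of the prediction, so the corrective search window is small whenever the data is close to the model's shape.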
A Simple Method for Organizing Nearly Optimal Binary Search Trees
Improving the efficiency of retrieving information concerns users of computer systems in many applications. One way of addressing this concern is to organize a sorted sequence into a binary search tree. Knuth's Algorithm K is a bottom-up organization algorithm that always constructs a binary tree which minimizes average search time. However, the cost of executing Algorithm K is prohibitive for a large tree. The aim of this work is to find a less costly method of organizing sorted sequences into nearly-optimal binary search trees.
We present a top-down organization method which yields better average search times than top-down methods already available, specifically height-balancing and weight-balancing. The variation in access frequency among the members of a sequence is used to recommend specific values for some of the parameters in this new method of organization.
The new method improves considerably on the cost of organization compared with Algorithm K while producing trees whose average search times are close to minimal. The new algorithm yields an average search time that is usually within 1% of the minimal average search time and, in every case attempted, has been no worse than 1.5% larger than minimal.
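As a rough illustration of the top-down idea, the sketch below picks as root the key that best balances total access frequency between the two subtrees and recurses. The function names are invented here, and the thesis's tuned parameters are not modeled:

```python
def weight_balanced_bst(items):
    """Top-down weight balancing: choose as root the key that best
    balances total access frequency between the subtrees, then recurse.
    `items` is a list of (key, frequency) pairs sorted by key.
    (Function names invented; the thesis's tuned parameters omitted.)"""
    if not items:
        return None
    prefix = [0]
    for _, f in items:
        prefix.append(prefix[-1] + f)
    total = prefix[-1]
    # Minimize |weight(left subtree) - weight(right subtree)|.
    r = min(range(len(items)),
            key=lambda i: abs(prefix[i] - (total - prefix[i + 1])))
    return (items[r][0],
            weight_balanced_bst(items[:r]),
            weight_balanced_bst(items[r + 1:]))

def avg_search_cost(tree, freq, depth=1):
    """Total comparisons over all lookups, weighted by access frequency."""
    if tree is None:
        return 0.0
    key, left, right = tree
    return (freq[key] * depth
            + avg_search_cost(left, freq, depth + 1)
            + avg_search_cost(right, freq, depth + 1))
```

Unlike Algorithm K's O(n^2) dynamic program, this top-down construction does near-linear work per level, which is the cost trade the thesis analyzes.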
Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.
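For context, the sequential alias-table construction (Vose's variant of Walker's method) that this line of work improves and parallelizes can be sketched as follows; this is the textbook sequential version, not the paper's algorithm:

```python
import random

def build_alias_table(weights):
    """Sequential alias-table construction (Vose's variant). After
    construction, each bucket i holds a coin bias prob[i] and a
    fallback item alias[i], giving O(1) weighted sampling."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                    # s's leftover mass comes from l
        prob[l] += prob[s] - 1.0
        (small if prob[l] < 1.0 else large).append(l)
    # Float residue: any leftovers should have probability exactly 1.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a bucket, flip its biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Construction is O(n) and every query is O(1), which is why the alias table is the standard building block that parallel construction then targets.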
Fast Parallel Operations on Search Trees
Using (a,b)-trees as an example, we show how to perform a parallel split with
logarithmic latency, and parallel join, bulk updates, intersection, union (or
merge), and (symmetric) set difference with logarithmic latency and
information-theoretically optimal work. We present both asymptotically optimal
solutions and simplified versions that perform well in practice; they are
several times faster than previous implementations.
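To illustrate what split and join mean on a search tree, here is a sequential sketch using a treap as a stand-in for (a,b)-trees; it shows the primitives' contracts, not the paper's parallel, logarithmic-latency algorithms:

```python
import random

class Node:
    """Treap node; random priorities keep expected depth logarithmic."""
    __slots__ = ("key", "prio", "left", "right")
    def __init__(self, key):
        self.key, self.prio = key, random.random()
        self.left = self.right = None

def split(t, key):
    """Split t into (keys < key, keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        l, r = split(t.right, key)
        t.right = l
        return t, r
    l, r = split(t.left, key)
    t.left = r
    return l, t

def join(l, r):
    """Join trees l and r, where every key in l precedes every key in r."""
    if l is None:
        return r
    if r is None:
        return l
    if l.prio > r.prio:
        l.right = join(l.right, r)
        return l
    r.left = join(l, r.left)
    return r

def inorder(t):
    """Sorted key list, for checking results."""
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Union, intersection, and set difference can all be built from these two primitives, which is exactly the pair the paper makes parallel with logarithmic latency.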
Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model
In this paper we develop optimal algorithms in the binary-forking model for a
variety of fundamental problems, including sorting, semisorting, list ranking,
tree contraction, range minima, and ordered set union, intersection and
difference. In the binary-forking model, tasks can only fork into two child
tasks, but can do so recursively and asynchronously. The tasks share memory,
supporting reads, writes and test-and-sets. Costs are measured in terms of work
(total number of instructions), and span (longest dependence chain).
The binary-forking model is meant to capture both algorithm performance and
algorithm-design considerations on many existing multithreaded languages, which
are also asynchronous and rely on binary forks either explicitly or under the
covers. In contrast to the widely studied PRAM model, it does not assume
arbitrary-way forks nor synchronous operations, both of which are hard to
implement in modern hardware. While optimal PRAM algorithms are known for the
problems studied herein, it turns out that arbitrary-way forking and strict
synchronization are powerful, if unrealistic, capabilities. Natural simulations
of these PRAM algorithms in the binary-forking model (i.e., implementations in
existing parallel languages) incur an overhead in span. This
paper explores techniques for designing optimal algorithms when limited to
binary forking and assuming asynchrony. All algorithms described in this paper
are the first algorithms with optimal work and span in the binary-forking
model. Most of the algorithms are simple, and many are randomized.
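The fork/join discipline of the model can be mimicked in ordinary threaded code: each task forks exactly one child (a binary fork), continues with the other half itself, and synchronizes with a join. The sketch below shows the structure only; the names are invented, and Python's GIL prevents any real parallel speedup here:

```python
import threading

def fork_sum(a, lo, hi, out, cutoff=1024):
    """Binary-forking sketch: each task forks exactly one child thread,
    recurses on the other half itself, then joins the child. These are
    the model's only control primitives, plus shared memory. (Names are
    invented; Python's GIL means no actual parallel speedup.)"""
    if hi - lo <= cutoff:
        out[0] = sum(a[lo:hi])     # sequential base case
        return
    mid = (lo + hi) // 2
    left_out = [0]
    child = threading.Thread(target=fork_sum,
                             args=(a, lo, mid, left_out, cutoff))
    child.start()                  # binary fork: exactly one new task
    right_out = [0]
    fork_sum(a, mid, hi, right_out, cutoff)
    child.join()                   # synchronize with the forked task
    out[0] = left_out[0] + right_out[0]
```

With n items and constant cutoff this does O(n) work with an O(log n) dependence chain, the work/span accounting the model uses.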
Performance Anomalies in Concurrent Data Structure Microbenchmarks
Recent decades have witnessed a surge in the development of concurrent data
structures with an increasing interest in data structures implementing
concurrent sets (CSets). Microbenchmarking tools are frequently utilized to
evaluate and compare the performance differences across concurrent data
structures. The underlying structure and design of the microbenchmarks
themselves can play a hidden but influential role in performance results.
However, the impact of microbenchmark design has not been well investigated. In
this work, we illustrate instances where concurrent data structure performance
results reported by a microbenchmark can vary 10-100x depending on the
microbenchmark implementation details. We investigate factors leading to
performance variance across three popular microbenchmarks and outline cases in
which flawed microbenchmark design can lead to an inversion of performance
results between two concurrent data structure implementations. We further
derive a prescriptive approach for best practices in the design and utilization
of concurrent data structure microbenchmarks.
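To make the kind of knobs involved concrete, here is a toy single-threaded microbenchmark loop parameterized by key range, update ratio, and prefill; these are representative implementation details whose settings can swing reported results, and the function and its parameters are invented here (real CSet microbenchmarks run many threads):

```python
import random
import time

def run_microbenchmark(cset, key_range, update_ratio,
                       prefill_fraction, ops=100_000, seed=0):
    """Toy CSet microbenchmark loop. Each knob below is the kind of
    benchmark design detail whose setting can change reported results."""
    rng = random.Random(seed)
    # Prefill so the structure starts near its steady-state size;
    # skipping this step measures a mostly-empty structure instead.
    while len(cset) < prefill_fraction * key_range:
        cset.add(rng.randrange(key_range))
    start = time.perf_counter()
    for _ in range(ops):
        k = rng.randrange(key_range)
        r = rng.random()
        if r < update_ratio / 2:
            cset.add(k)            # insert
        elif r < update_ratio:
            cset.discard(k)        # remove
        else:
            _ = k in cset          # contains (read-dominated by default)
    return ops / (time.perf_counter() - start)   # throughput, ops/s
```

Because adds and removes are drawn with equal probability, the set size stays near its prefilled level; changing the key range or prefill changes the size and contention profile being measured, which is one mechanism behind the variance the paper reports.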