The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that, by using neural nets, we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly, we believe that the idea of
replacing core components of a data management system with learned models has
far-reaching implications for future system designs, and that this work
provides only a glimpse of what might be possible.
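The "index is a model" premise can be sketched in a few lines: over a sorted array, fit a simple linear model from key to position (an approximation of the keys' cumulative distribution), then correct the prediction with a search bounded by the model's worst-case error. The names and the least-squares fit below are illustrative assumptions, not the paper's actual architecture, which uses hierarchies of (possibly neural) models.

```python
import bisect

def fit_linear(keys):
    """Least-squares fit of position ~ a*key + b, plus the max residual."""
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    cov = sum((x - mean_x) * (y - mean_y) for y, x in enumerate(keys))
    var = sum((x - mean_x) ** 2 for x in keys)
    a = cov / var
    b = mean_y - a * mean_x
    # The worst-case prediction error bounds how far lookups must search.
    err = max(abs(y - (a * x + b)) for y, x in enumerate(keys))
    return a, b, int(err) + 1

def learned_lookup(keys, model, key):
    a, b, err = model
    guess = int(a * key + b)
    lo = max(0, guess - err)
    hi = min(len(keys), guess + err + 1)
    # Binary-search only inside the error window around the prediction.
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None

keys = sorted(range(0, 1000, 7))   # synthetic, near-linear key distribution
model = fit_linear(keys)
assert learned_lookup(keys, model, 693) == keys.index(693)
assert learned_lookup(keys, model, 5) is None
```

The better the model captures the key distribution, the smaller the error window, which is where learned indexes gain over the fixed fan-out of a B-Tree.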
Hyperion: Building the largest in-memory search tree
Indexes are essential in data management systems to increase the speed of data retrieval. Prefix tries are widespread data structures that provide fast and memory-efficient indexes. Implementations like Judy, ART, or HOT optimize their internal alignments for cache and vector-unit efficiency. While these measures usually improve performance substantially, they can have a negative impact on memory efficiency. In this paper, we present Hyperion, a trie-based main-memory key-value store achieving extreme space efficiency. In contrast to other data structures, Hyperion does not depend on CPU vector units but instead scans the data structure linearly. Combined with a custom memory allocator, Hyperion accomplishes a remarkable data density while achieving competitive point-query and exceptional range-query performance. Hyperion can significantly reduce the index memory footprint while offering a performance-to-memory ratio at least two times better than the best alternative strategies on randomized string data sets.
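The tradeoff Hyperion explores can be illustrated minimally: instead of the 256-slot child array per trie node that array-based tries use (cache-friendly but memory-hungry), each node keeps a compact list of (byte, child) pairs that lookups scan linearly, trading some point-query speed for density. This is a sketch of the general idea only; Hyperion's actual node layout and custom allocator are far more involved.

```python
class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = []   # compact list of (byte, TrieNode) pairs
        self.value = None

def trie_insert(root, key: bytes, value):
    node = root
    for b in key:
        for cb, child in node.children:   # linear scan, no vector units
            if cb == b:
                node = child
                break
        else:
            child = TrieNode()
            node.children.append((b, child))
            node = child
    node.value = value

def trie_get(root, key: bytes):
    node = root
    for b in key:
        for cb, child in node.children:
            if cb == b:
                node = child
                break
        else:
            return None
    return node.value

root = TrieNode()
trie_insert(root, b"hyperion", 1)
trie_insert(root, b"hype", 2)
assert trie_get(root, b"hype") == 2
assert trie_get(root, b"hyperion") == 1
assert trie_get(root, b"hyper") is None
```

Each node costs only as much as the children it actually has, rather than a fixed 256 slots, which is the density argument in miniature.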
FITing-Tree: A Data-aware Index Structure
Index structures are one of the most important tools that DBAs leverage to
improve the performance of analytics and transactional workloads. However,
building several indexes over large datasets can often become prohibitive and
consume valuable system resources. In fact, a recent study showed that indexes
created as part of the TPC-C benchmark can account for 55% of the total memory
available in a modern DBMS. This overhead consumes valuable and expensive main
memory, and limits the amount of space available to store new data or process
existing data.
In this paper, we present FITing-Tree, a novel form of a learned index which
uses piece-wise linear functions with a bounded error specified at construction
time. This error knob provides a tunable parameter that allows a DBA to FIT an
index to a dataset and workload by being able to balance lookup performance and
space consumption. To navigate this tradeoff, we provide a cost model that
helps determine an appropriate error parameter given either (1) a lookup
latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using
a variety of real-world datasets, we show that our index is able to provide
performance that is comparable to full index structures while reducing the
storage footprint by orders of magnitude.
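The bounded-error idea above can be sketched as follows: greedily split the sorted keys into segments, each summarized by one linear function whose predicted position never deviates from the true position by more than the chosen error bound, so a lookup only searches a window of that size. The greedy endpoint-fit rule and the linear segment scan below are simplifications for illustration, not the paper's exact segmentation algorithm or segment tree.

```python
import bisect

def fit_segments(keys, max_error):
    """Return (first_key, slope, first_pos) triples covering all keys."""
    segments = []
    start = 0
    while start < len(keys):
        end, slope = start, 0.0
        while end + 1 < len(keys):
            span = keys[end + 1] - keys[start]
            cand = (end + 1 - start) / span if span else 0.0
            # Accept the extended segment only if every covered position
            # stays within max_error of the line's prediction.
            if all(abs(start + (keys[j] - keys[start]) * cand - j) <= max_error
                   for j in range(start, end + 2)):
                end, slope = end + 1, cand
            else:
                break
        segments.append((keys[start], slope, start))
        start = end + 1
    return segments

def segment_lookup(keys, segments, key, max_error):
    # Pick the last segment whose first key is <= the probe (a linear scan
    # for brevity; the paper organizes segments in a tree).
    seg = None
    for s in segments:
        if s[0] <= key:
            seg = s
        else:
            break
    if seg is None:
        return None
    first_key, slope, first_pos = seg
    guess = first_pos + int(round((key - first_key) * slope))
    lo = max(0, guess - max_error)
    hi = min(len(keys), guess + max_error + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None

# A dense region followed by a sparse region forces multiple segments.
keys = list(range(0, 100, 2)) + list(range(100, 400, 10))
segs = fit_segments(keys, max_error=1)
assert all(segment_lookup(keys, segs, k, 1) == i for i, k in enumerate(keys))
```

Loosening `max_error` yields fewer, longer segments (less storage, more search per lookup); tightening it does the opposite, which is exactly the knob the cost model navigates.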