COAX: Correlation-Aware Indexing on Multidimensional Data with Soft Functional Dependencies
Recent work proposed learned index structures, which learn the distribution
of the underlying dataset to improve performance. The initial work on learned
indexes has shown that by learning the cumulative distribution function of the
data, index structures such as the B-Tree can improve their performance by one
order of magnitude while having a smaller memory footprint.
In this paper, we present COAX, a learned index for multidimensional data
that, instead of learning the distribution of keys, learns the correlations
between attributes of the dataset. Our approach is driven by the observation
that in many datasets, values of two (or multiple) attributes are correlated.
COAX exploits these correlations to reduce the dimensionality of the datasets.
More precisely, we learn how to infer one (or multiple) attributes from
the remaining attributes and hence no longer need to index the inferred attributes.
This reduces the dimensionality and hence makes the index smaller and more
efficient.
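
As a rough illustration of this idea, and not the authors' implementation, the sketch below assumes two numeric attributes x and y related by an approximately linear soft dependency y ≈ a·x + b. The class name `SoftFDIndex`, the error margin `eps`, the least-squares fit, and the separate outlier list are all assumptions made for the example. Only x is indexed; a range query on y is translated into a range on x, and the candidates are then filtered by their true y values.

```python
import bisect


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (assumes xs are not all equal)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


class SoftFDIndex:
    """Hypothetical index over (x, y) points that stores keys for x only."""

    def __init__(self, points, eps):
        self.a, self.b = fit_line([p[0] for p in points], [p[1] for p in points])
        self.eps = eps
        inliers, self.outliers = [], []
        for p in points:
            # Points the model predicts within eps go to the x-only index;
            # everything else is kept in a small list that is scanned separately.
            if abs(p[1] - (self.a * p[0] + self.b)) <= eps:
                inliers.append(p)
            else:
                self.outliers.append(p)
        self.inliers = sorted(inliers)
        self.keys = [p[0] for p in self.inliers]

    def query_y(self, ylo, yhi):
        """Return all points with ylo <= y <= yhi without an index on y.

        Assumes the fitted slope is non-zero, so the model can be inverted.
        """
        # Invert the model, widening the range by eps to cover prediction error.
        bounds = sorted(((ylo - self.eps - self.b) / self.a,
                         (yhi + self.eps - self.b) / self.a))
        i = bisect.bisect_left(self.keys, bounds[0])
        j = bisect.bisect_right(self.keys, bounds[1])
        hits = [p for p in self.inliers[i:j] if ylo <= p[1] <= yhi]
        hits += [p for p in self.outliers if ylo <= p[1] <= yhi]
        return sorted(hits)
```

In this sketch, a tighter dependency allows a smaller `eps`, which narrows the translated range on x and reduces the number of candidates that must be filtered; a weaker dependency pushes more points into the outlier list.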
We theoretically investigate the effectiveness of the proposed technique
based on the predictability of the functional dependency (FD) attributes. We further show
experimentally that by predicting correlated attributes in the data, we can
improve the query execution time and reduce the memory overhead of the index.
In our experiments, we reduce the execution time by 25% while reducing the
memory footprint of the index by four orders of magnitude.