The Case for Learned Index Structures

Authors: Abadi M., Armbrust M., Böhm M., Chang F., Goodfellow I., Grossi R., Lehman T. J., Litwin W., Magdon-Ismail M., Miller D. J., Moerkotte G., Sutskever I., You S.
Publication date: 30/04/2018

Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that, by using neural nets, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, though, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future system designs and that this work provides just a glimpse of what might be possible.
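To make the "index as a model" idea concrete, here is a minimal Python sketch. It is not the paper's method: it assumes a single least-squares line in place of a neural net, and the class name `LinearLearnedIndex` and its methods are hypothetical. The model predicts a position in the sorted array, and the worst prediction error seen during fitting bounds a small binary-search window around that guess.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index (illustrative only): a line maps key -> approximate position."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the window so it stays a valid slice even for far-away keys.
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < n and self.keys[pos] == key:
            return pos
        return None


keys = [3, 8, 15, 16, 23, 42, 99, 100]
index = LinearLearnedIndex(keys)
print([index.lookup(k) for k in keys])
print(index.lookup(4))
```

The closer the model tracks the key distribution, the smaller `max_err` becomes and the less work the final binary search has to do; a poorly fitting model degrades toward an ordinary binary search over the whole array.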

arXiv.org e-Print Archive

Crossref

Approximate Range Emptiness in Constant Time and Optimal Space

Authors: Goswami M., Grønlund A., Larsen K., Pagh R.
Publication date: 10/07/2014

This paper studies the $\varepsilon$-approximate range emptiness problem, where the task is to represent a set $S$ of $n$ points from $\{0,\ldots,U-1\}$ and answer emptiness queries of the form "$[a;b] \cap S \neq \emptyset$?" with a probability of false positives allowed. This generalizes the functionality of Bloom filters from single point queries to any interval length $L$. Setting the false positive rate to $\varepsilon/L$ and performing $L$ queries, Bloom filters yield a solution to this problem with space $O(n \lg(L/\varepsilon))$ bits, false positive probability bounded by $\varepsilon$ for intervals of length up to $L$, using query time $O(L \lg(L/\varepsilon))$. Our first contribution is to show that the space/error trade-off cannot be improved asymptotically: any data structure for answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\varepsilon$ must use space $\Omega(n \lg(L/\varepsilon)) - O(n)$ bits. On the positive side, we show that the query time can be improved greatly, to constant time, while matching our space lower bound up to a lower-order additive term. This result is achieved through a succinct data structure for (non-approximate 1d) range emptiness/reporting queries, which may be of independent interest.
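The Bloom-filter baseline described in the abstract (not the paper's constant-time structure) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class names `BloomFilter` and `NaiveRangeFilter` are hypothetical, the filter uses the textbook sizing formulas, and SHA-256 with an index prefix stands in for a family of independent hash functions. The point filter is built with false positive rate $\varepsilon/L$, so a union bound over at most $L$ point queries keeps the interval error at most $\varepsilon$.

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter sized for n items and a target false positive rate."""

    def __init__(self, n, fp_rate):
        # Standard sizing: m = n ln(1/p) / (ln 2)^2 bits, k = (m/n) ln 2 hashes.
        self.m = max(1, math.ceil(n * math.log(1 / fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, x):
        # Assumption: SHA-256 salted with i approximates k independent hashes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, x):
        for p in self._positions(x):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, x):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(x))


class NaiveRangeFilter:
    """Range emptiness via up to max_len point queries at rate eps / max_len."""

    def __init__(self, points, max_len, eps):
        points = list(points)
        self.max_len = max_len
        self.bf = BloomFilter(max(1, len(points)), eps / max_len)
        for p in points:
            self.bf.add(p)

    def maybe_nonempty(self, a, b):
        """True if [a, b] may intersect S; False means it certainly does not."""
        if b - a + 1 > self.max_len:
            raise ValueError("interval longer than max_len")
        return any(x in self.bf for x in range(a, b + 1))


rf = NaiveRangeFilter([10, 500, 9999], max_len=16, eps=0.01)
print(rf.maybe_nonempty(5, 12))
```

Each query here costs up to $L$ point lookups of about $\lg(L/\varepsilon)$ hash evaluations each, which is the $O(L \lg(L/\varepsilon))$ query time the abstract attributes to this approach.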

MPG.PuRe

Bloom Filters in Adversarial Environments

Authors: Naor Moni, Yogev Eylon
Publication date: 29/01/2019

Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider a data structure known as "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set $S$ of elements approximately, by using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any $x \in S$ it should always answer "Yes", and for any $x \notin S$ it should answer "Yes" only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries we show that there exists a Bloom filter for sets of size $n$ and error $\varepsilon$ that is secure against $t$ queries and uses only $O(n \log{\frac{1}{\varepsilon}} + t)$ bits of memory. In comparison, $n \log{\frac{1}{\varepsilon}}$ is the best possible under a non-adaptive adversary.
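To give a feel for the space bounds quoted above, the following Python sketch compares them numerically. It is illustrative arithmetic only: the function names are hypothetical, and the parameter `c` is an assumed stand-in for the constant hidden in the $O(\cdot)$, which the abstract does not state. For context, it also includes the textbook Bloom filter size of $n \ln(1/\varepsilon)/(\ln 2)^2$ bits, which is a factor $1/\ln 2 \approx 1.44$ above $n \log_2(1/\varepsilon)$.

```python
import math


def lower_bound_bits(n, eps):
    """n * log2(1/eps): the non-adaptive optimum cited in the abstract."""
    return n * math.log2(1 / eps)


def classic_bloom_bits(n, eps):
    """Textbook Bloom filter size: n * ln(1/eps) / (ln 2)^2 bits."""
    return n * math.log(1 / eps) / (math.log(2) ** 2)


def unbounded_adversary_bits(n, eps, t, c=1.0):
    """Shape of the O(n log(1/eps) + t) bound; c is an assumed constant."""
    return c * (n * math.log2(1 / eps) + t)


n, eps, t = 1_000_000, 1 / 1024, 100_000
print(f"non-adaptive optimum : {lower_bound_bits(n, eps):,.0f} bits")
print(f"textbook Bloom filter: {classic_bloom_bits(n, eps):,.0f} bits")
print(f"secure vs t queries  : {unbounded_adversary_bits(n, eps, t):,.0f} bits (c = 1 assumed)")
```

Under these assumptions, robustness against $t$ adaptive queries costs an additive term in $t$ rather than a multiplicative blow-up in the per-element cost.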

arXiv.org e-Print Archive

Cryptology ePrint Archive