The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that by using neural nets we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly, though, we believe that the
idea of replacing core components of a data management system with learned
models has far-reaching implications for future system designs, and that this
work provides just a glimpse of what might be possible.
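The "indexes are models" idea above can be sketched minimally: fit a model of the key distribution (here a least-squares line, a stand-in for the paper's learned CDF models), predict a record's position, and repair the prediction with a search bounded by the model's maximum training error. Class and method names are illustrative, not from the paper; the sketch assumes numeric, sorted keys.

```python
import bisect

class LearnedIndex:
    """Minimal sketch of a learned index: a linear model approximates the
    CDF of sorted keys, predicts a record's position, and a bounded local
    search corrects the prediction. Assumes numeric, sorted keys."""

    def __init__(self, keys):
        self.keys = keys
        n = len(keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        var = sum((k - mean_k) ** 2 for k in keys)
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error bounds the search window at lookup.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        # Binary search only inside the error window, not the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The interesting trade-off is visible even here: the smoother the key distribution, the smaller `err`, and the closer a lookup gets to a single model evaluation plus a constant-size probe.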
RadixSpline: A Single-Pass Learned Index
Recent research has shown that learned models can outperform state-of-the-art
index structures in size and lookup performance. While this is a very promising
result, existing learned structures are often cumbersome to implement and are
slow to build. In fact, most approaches that we are aware of require multiple
training passes over the data.
We introduce RadixSpline (RS), a learned index that can be built in a single
pass over the data and is competitive with state-of-the-art learned index
models, like RMI, in size and lookup performance. We evaluate RS using the SOSD
benchmark and show that it achieves competitive results on all datasets,
despite the fact that it only has two parameters.
Comment: Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM 2020)
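A single-pass spline build can be sketched as follows, under assumptions not spelled out in the abstract (sorted, distinct, numeric keys; at least two of them). The class name and structure are hypothetical, and the real RadixSpline uses an O(n) error-corridor test plus a radix table over spline points, whereas this sketch simply rechecks the points buffered since the last spline knot.

```python
import bisect

class SplineIndex:
    """Sketch of a single-pass spline index in the spirit of RadixSpline
    (without the radix table). Assumes sorted, distinct keys, n >= 2.
    Whenever extending the current spline segment would push some buffered
    point more than max_error from its true position, the previous point
    is finalized as a spline knot."""

    def __init__(self, keys, max_error=8):
        self.keys = keys
        self.max_error = max_error
        self.knots = [(keys[0], 0)]    # (key, position) spline points
        buffered = []                  # points seen since the last knot
        for i in range(1, len(keys)):
            x0, p0 = self.knots[-1]
            slope = (i - p0) / (keys[i] - x0)
            # Would the segment knot -> keys[i] keep every buffered point
            # within max_error of its true position?
            if any(abs(p0 + slope * (x - x0) - p) > max_error
                   for x, p in buffered):
                self.knots.append(buffered[-1])  # finalize previous point
                buffered = [(keys[i], i)]
            else:
                buffered.append((keys[i], i))
        last = (keys[-1], len(keys) - 1)
        if self.knots[-1] != last:
            self.knots.append(last)
        self._xs = [x for x, _ in self.knots]

    def lookup(self, key):
        # Find the spline segment, interpolate, then search the window.
        j = bisect.bisect_right(self._xs, key) - 1
        j = max(0, min(j, len(self.knots) - 2))
        (x0, p0), (x1, p1) = self.knots[j], self.knots[j + 1]
        p = int(round(p0 + (key - x0) * (p1 - p0) / (x1 - x0)))
        lo = max(0, p - self.max_error)
        hi = min(len(self.keys), p + self.max_error + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The two knobs mirror the paper's two parameters: the error bound (here `max_error`) trades index size against lookup window, and the (omitted) radix-table size trades memory against segment-search cost.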
Estimating Cardinalities with Deep Sketches
We introduce Deep Sketches, which are compact models of databases that allow
us to estimate the result sizes of SQL queries. Deep Sketches are powered by a
new deep learning approach to cardinality estimation that can capture
correlations between columns, even across tables. Our demonstration allows
users to define such sketches on the TPC-H and IMDb datasets, monitor the
training process, and run ad-hoc queries against trained sketches. We also
estimate query cardinalities with HyPer and PostgreSQL to visualize the gains
over traditional cardinality estimators.
Comment: To appear in SIGMOD'1
The Potential of Learned Index Structures for Index Compression
Inverted indexes are vital in providing fast keyword-based search. For every
term in the document collection, a list of identifiers of the documents in
which the term appears is stored, along with auxiliary information such as
term frequencies and position offsets. While very effective, inverted indexes have
large memory requirements for web-sized collections. Recently, the concept of
learned index structures was introduced, in which machine-learned models
replace common index structures such as B-tree indexes, hash indexes, and
Bloom filters. These learned index structures require less memory and can be
computationally much faster than their traditional counterparts. In this paper,
we consider whether such models may be applied to conjunctive Boolean querying.
First, we investigate how a learned model can replace document postings of an
inverted index, and then evaluate the compromises such an approach might have.
Second, we evaluate the potential gains that can be achieved in terms of memory
requirements. Our work shows that learned models have great potential in
inverted indexing, and this direction seems to be a promising area for future
research.
Comment: Will appear in the proceedings of ADCS'1
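One way to see how a model can stand in for document postings: since a postings list is a sorted sequence of doc IDs, a rank-to-ID model plus small per-entry corrections reconstructs it losslessly, and near-uniform lists need far fewer bits per correction than per raw ID. The functions below are an illustrative sketch (not the paper's method); they assume a sorted list with at least two entries.

```python
import math

def compress_postings(doc_ids):
    """Sketch: replace a sorted postings list with a linear model of
    rank -> doc id plus small integer corrections. Assumes len >= 2.
    Returns the model and the bit width needed per correction."""
    n = len(doc_ids)
    slope = (doc_ids[-1] - doc_ids[0]) / (n - 1)
    intercept = doc_ids[0]
    corrections = [doc_ids[i] - round(intercept + slope * i)
                   for i in range(n)]
    max_abs = max(abs(c) for c in corrections)
    # Bits per correction for a signed value in [-max_abs, max_abs].
    bits = max(1, math.ceil(math.log2(2 * max_abs + 1)))
    return slope, intercept, corrections, bits

def decode(slope, intercept, corrections):
    """Lossless reconstruction: model prediction plus stored correction."""
    return [round(intercept + slope * i) + c
            for i, c in enumerate(corrections)]
```

The memory question the abstract raises then becomes concrete: the approach wins exactly when the model fits well enough that `bits` is much smaller than the width of a raw identifier.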
Towards Better Interpretability in Deep Q-Networks
Deep reinforcement learning techniques have demonstrated superior performance
in a wide variety of environments. As improvements in training algorithms
continue at a brisk pace, theoretical and empirical studies of what these
networks actually learn lag far behind. In this paper we propose an
interpretable neural network architecture for Q-learning which provides a
global explanation of the model's behavior using key-value memories, attention
and reconstructible embeddings. With a directed exploration strategy, our model
can reach training rewards comparable to the state-of-the-art deep Q-learning
models. However, results suggest that the features extracted by the neural
network are extremely shallow, and subsequent testing with out-of-sample
examples shows that the agent can easily overfit to trajectories seen during
training.
Comment: Accepted at AAAI-19 (16 pages, 18 figures)
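The key-value-memory-with-attention mechanism can be sketched generically: Q-values are a softmax-weighted combination of stored value vectors (one Q estimate per action), and the attention weights over memory slots double as the explanation of which stored experiences drove the decision. This is a generic attention sketch under assumed shapes, not the paper's architecture.

```python
import numpy as np

def q_from_memory(state_emb, keys, values, temperature=1.0):
    """Sketch of attention over a key-value memory for Q-learning.
    keys:   (num_slots, d)   learned memory keys
    values: (num_slots, num_actions) per-slot Q estimates
    Returns Q-values and the attention weights, which serve as a
    global, inspectable explanation of the decision."""
    scores = keys @ state_emb / temperature      # similarity to each slot
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax attention
    q_values = weights @ values                  # blend stored Q vectors
    return q_values, weights
```

Inspecting `weights` for a given state shows which memory slots (and hence which prototypical situations) the agent is relying on, which is the sense in which such architectures trade a little capacity for interpretability.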
Ways of Applying Artificial Intelligence in Software Engineering
As Artificial Intelligence (AI) techniques have become more powerful and
easier to use, they are increasingly deployed as key components of modern
software systems. While this enables new functionality and often allows better
adaptation to user needs, it also creates additional problems for software
engineers and exposes companies to new risks. Some work has been done to better
understand the interaction between Software Engineering and AI, but we lack
methods to classify the ways AI is applied in software systems and to analyse
and understand the risks this poses. Only by doing so can we devise tools and
solutions to help mitigate them. This paper presents the AI in SE Application
Levels (AI-SEAL) taxonomy that categorises applications according to their
point of AI application, the type of AI technology used and the automation
level allowed. We show the usefulness of this taxonomy by classifying 15 papers
from previous editions of the RAISE workshop. Results show that the taxonomy
allows classification of distinct AI applications and provides insights
concerning the risks associated with them. We argue that this will be important
for companies when deciding how to apply AI in their software applications and
when creating strategies for its use.
Learned Cardinalities: Estimating Correlated Joins with Deep Learning
We describe a new deep learning approach to cardinality estimation. MSCN is a
multi-set convolutional network, tailored to representing relational query
plans, that employs set semantics to capture query features and true
cardinalities. MSCN builds on sampling-based estimation, addressing its
weaknesses when no sampled tuples qualify a predicate, and in capturing
join-crossing correlations. Our evaluation of MSCN using a real-world dataset
shows that deep learning significantly enhances the quality of cardinality
estimation, which is the core problem in query optimization.
Comment: CIDR 2019. https://github.com/andreaskipf/learnedcardinalitie
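The set-semantics idea behind models like MSCN can be sketched with a DeepSets-style module: a shared MLP is applied to every element of a set (e.g. each predicate of a query), and the outputs are averaged into one fixed-size, order-invariant vector. Weights and shapes below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def set_features(elements, w1, b1, w2, b2):
    """DeepSets-style set module as used in MSCN-like models: a shared
    two-layer MLP is applied to each set element, then average pooling
    yields a fixed-size representation that is invariant to element
    order and handles variable-size sets."""
    h = np.maximum(elements @ w1 + b1, 0.0)   # shared hidden layer, ReLU
    h = np.maximum(h @ w2 + b2, 0.0)          # shared output layer, ReLU
    return h.mean(axis=0)                      # average pool over the set
```

Averaging (rather than concatenating) is what lets one network handle queries with any number of tables, joins, or predicates; the pooled vectors for each set are then concatenated and fed to a final estimation network.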