Learning Multi-dimensional Indexes
Scanning and filtering over multi-dimensional tables are key operations in
modern analytical database engines. To optimize the performance of these
operations, databases often create clustered indexes over a single dimension or
multi-dimensional indexes such as R-trees, or use complex sort orders (e.g.,
Z-ordering). However, these schemes are often hard to tune and their
performance is inconsistent across different datasets and queries. In this
paper, we introduce Flood, a multi-dimensional in-memory index that
automatically adapts itself to a particular dataset and workload by jointly
optimizing the index structure and data storage. Flood achieves up to three
orders of magnitude faster performance for range scans with predicates than
state-of-the-art multi-dimensional indexes or sort orders on real-world
datasets and workloads. Our work serves as a building block towards an
end-to-end learned database system.
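To make the grid-based idea concrete, the following is a minimal sketch, not Flood's implementation: a multi-dimensional grid index whose per-dimension cell boundaries come from empirical quantiles of the data (a simple stand-in for learned per-dimension models), with range queries that visit only the cells overlapping the query box. The class name `QuantileGridIndex` and all parameters are hypothetical.

```python
# A minimal sketch of a grid-style multi-dimensional index: per-dimension cell
# boundaries are placed at empirical quantiles (a stand-in for learned models),
# and a range query scans only the overlapping cells. Not Flood's actual code.
import itertools
from collections import defaultdict

import numpy as np


class QuantileGridIndex:
    def __init__(self, points, cells_per_dim=8):
        self.points = np.asarray(points, dtype=float)
        dims = self.points.shape[1]
        # Boundaries at empirical quantiles so cells hold roughly equal counts.
        qs = np.linspace(0.0, 1.0, cells_per_dim + 1)[1:-1]
        self.bounds = [np.quantile(self.points[:, j], qs) for j in range(dims)]
        self.cells = defaultdict(list)
        for i, p in enumerate(self.points):
            self.cells[self._cell(p)].append(i)

    def _cell(self, p):
        return tuple(int(np.searchsorted(b, x)) for b, x in zip(self.bounds, p))

    def range_query(self, lo, hi):
        lo, hi = np.asarray(lo, float), np.asarray(hi, float)
        # Visit only the grid cells overlapping the query box, then filter exactly.
        spans = [range(a, b + 1) for a, b in zip(self._cell(lo), self._cell(hi))]
        hits = []
        for cell in itertools.product(*spans):
            for i in self.cells.get(cell, []):
                if np.all(self.points[i] >= lo) and np.all(self.points[i] <= hi):
                    hits.append(i)
        return hits


# Usage: index 10,000 random 2-D points and run one range scan with predicates.
rng = np.random.default_rng(0)
index = QuantileGridIndex(rng.random((10_000, 2)))
print(len(index.range_query([0.2, 0.2], [0.4, 0.5])))
```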
LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves
The recently proposed learned indexes have attracted much attention as they
can adapt to the actual data and query distributions to attain better search
efficiency. Based on this technique, several existing works build up indexes
for multi-dimensional data and achieve improved query performance. A common
paradigm of these works is to (i) map multi-dimensional data points to a
one-dimensional space using a fixed space-filling curve (SFC) or its variant
and (ii) then apply the learned indexing techniques. We notice that the first
step typically uses a fixed SFC method, such as row-major order or Z-order.
This fixed choice limits the potential of learned multi-dimensional indexes to
adapt to varying data distributions and query workloads. In this paper, we
propose a novel idea of learning a space-filling curve that is carefully
designed and actively optimized for efficient query processing. We also
identify innovative offline and online optimization opportunities common to
SFC-based learned indexes and offer optimal and/or heuristic solutions.
Experimental results demonstrate that our proposed method, LMSFC, outperforms
state-of-the-art non-learned or learned methods across three commonly used
real-world datasets and diverse experimental settings.
Comment: Extended Version. Accepted by VLDB 202
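As a concrete reference point for step (i) of the paradigm described above, here is a minimal sketch of the fixed-curve baseline: a Z-order (Morton) key computed by bit interleaving. LMSFC itself learns a monotonic SFC rather than fixing one, so this illustrates only the baseline mapping, not the proposed method; the function name is hypothetical.

```python
# A minimal sketch of mapping 2-D integer coordinates to a 1-D key with a fixed
# Z-order (Morton) curve by bit interleaving. This is the fixed-SFC baseline,
# not the learned curve proposed by LMSFC.
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x's bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y's bits go to odd positions
    return key

# Nearby 2-D points tend to receive nearby 1-D keys, so the mapped keys can be
# sorted and served by a one-dimensional (possibly learned) index.
points = [(3, 5), (3, 6), (100, 7)]
print(sorted(points, key=lambda p: z_order_key(*p)))
```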
HDIdx: High-Dimensional Indexing for Efficient Approximate Nearest Neighbor Search
Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale
data processing and analytics, particularly for analyzing multimedia contents
which are often of high dimensionality. Instead of exact NN search, extensive
research efforts have focused on approximate NN search
algorithms. In this work, we present "HDIdx", an efficient high-dimensional
indexing library for fast approximate NN search, which is open-source and
written in Python. It offers a family of state-of-the-art algorithms that
convert input high-dimensional vectors into compact binary codes, making NN
search highly efficient and scalable with very low space overhead.
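To illustrate the general approach (compact binary codes plus Hamming-distance search), the following sketch uses random-hyperplane hashing as an illustrative stand-in; it is not HDIdx's actual API or the specific algorithms the library ships.

```python
# A minimal sketch of approximate NN search over compact binary codes, using
# random-hyperplane hashing as a stand-in encoder (not HDIdx's algorithms).
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 128, 64
planes = rng.normal(size=(n_bits, dim))           # random projection directions

def encode(vectors):
    # One bit per hyperplane: the sign of the projection onto it.
    return np.asarray(vectors) @ planes.T > 0

def hamming_search(query, codes, k=5):
    q = encode(query[None, :])[0]
    dists = np.count_nonzero(codes != q, axis=1)  # Hamming distance to every code
    return np.argsort(dists)[:k]

base = rng.normal(size=(10_000, dim))
codes = encode(base)                              # 64-bit code per vector
query = base[42] + 0.01 * rng.normal(size=dim)    # a slightly perturbed base vector
print(hamming_search(query, codes))               # index 42 should rank near the top
```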
The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that, by using neural nets, we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly though, we believe that the idea
of replacing core components of a data management system through learned models
has far-reaching implications for future system designs and that this work
just provides a glimpse of what might be possible.
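A minimal sketch of the core idea follows, under the assumption of a single linear model rather than the recursive model index described in the paper: the model predicts a key's position in the sorted array, and a search bounded by the model's worst-case error recovers the exact record.

```python
# A minimal sketch of a learned range index: fit position as a function of key
# (approximating the key CDF), then search only within the model's error bound.
# Single linear model only; not the paper's recursive model index.
import numpy as np

rng = np.random.default_rng(0)
keys = np.sort(rng.choice(1_000_000, size=100_000, replace=False))
pos = np.arange(len(keys))

# "Train": fit position as a linear function of the key.
slope, intercept = np.polyfit(keys, pos, 1)
pred = np.clip(slope * keys + intercept, 0, len(keys) - 1).astype(int)
max_err = int(np.max(np.abs(pred - pos)))       # worst-case prediction error

def lookup(key):
    guess = int(np.clip(slope * key + intercept, 0, len(keys) - 1))
    lo, hi = max(0, guess - max_err), min(len(keys), guess + max_err + 1)
    i = lo + np.searchsorted(keys[lo:hi], key)  # binary search only in the error window
    return int(i) if i < len(keys) and keys[i] == key else None

print(lookup(keys[12_345]) == 12_345)           # exact position recovered
```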
Lecture Notes of Tensor Network Contractions
Tensor network (TN), a young mathematical tool of high vitality and great
potential, has been undergoing extremely rapid developments in the last two
decades, gaining tremendous success in condensed matter physics, atomic
physics, quantum information science, statistical physics, and so on. In these
lecture notes, we focus on the contraction algorithms of TNs as well as some of
their applications to the simulation of quantum many-body systems. Starting from
basic concepts and definitions, we first explain the relations between TN and
physical problems, including the TN representations of classical partition
functions, quantum many-body states (by matrix product state, tree TN, and
projected entangled pair state), time evolution simulations, etc. These
problems, which are challenging to solve, can be transformed into TN contraction
problems. We then present several paradigm algorithms based on the ideas of the
numerical renormalization group and/or boundary states, including density
matrix renormalization group, time-evolving block decimation,
coarse-graining/corner tensor renormalization group, and several distinguished
variational algorithms. Finally, we revisit the TN approaches from the
perspective of multi-linear algebra (also known as tensor algebra or tensor
decompositions) and quantum simulation. Despite the apparent differences in the
ideas and strategies of different TN algorithms, we aim at revealing the
underlying relations and resemblances in order to present a systematic picture
to understand the TN contraction approaches.
Comment: 134 pages, 68 figures. In this version, the manuscript has been
changed into the format of a book; new sections about tensor networks and
quantum circuits have been added.
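As a small, self-contained example of a TN contraction (not tied to any specific algorithm in the notes), the following sketch contracts a random matrix product state with its conjugate, sweeping site by site to obtain the squared norm; all shapes and names are illustrative.

```python
# A minimal sketch of a tensor network contraction: the squared norm of a small
# random matrix product state (MPS), contracted site by site from the left.
import numpy as np

rng = np.random.default_rng(0)
L, d, chi = 6, 2, 4                      # sites, physical dim, bond dim (illustrative)
mps = [rng.normal(size=(chi if i > 0 else 1, d, chi if i < L - 1 else 1))
       for i in range(L)]

# The left environment starts as a 1x1 tensor and absorbs one site at a time:
# E_{c,d} <- sum_{a,b,s} E_{a,b} * A_{a,s,c} * conj(A)_{b,s,d}
env = np.ones((1, 1))
for A in mps:
    env = np.einsum('ab,asc,bsd->cd', env, A, A.conj())

print(float(env.squeeze()))              # <psi|psi>, contracted exactly
```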