Estimating Cardinalities with Deep Sketches

Author: Boncz Peter
Kemper Alfons
Kipf Andreas
Kipf Thomas
Leis Viktor
Müller Jonas
Neumann Thomas
Radke Bernhard
Vorona Dimitri
Publication date: 17/04/2019

We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations between columns, even across tables. Our demonstration allows users to define such sketches on the TPC-H and IMDb datasets, monitor the training process, and run ad-hoc queries against trained sketches. We also estimate query cardinalities with HyPer and PostgreSQL to visualize the gains over traditional cardinality estimators. Comment: To appear in SIGMOD'1

arXiv.org e-Print Archive

Crossref

Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

Author: Alizadeh Mohammad
Ding Jialin
Kraska Tim
Nathan Vikram
Publication date: 23/06/2020

Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high-selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes.

arXiv.org e-Print Archive

DSpace@MIT

GLIN: A Lightweight Learned Indexing Mechanism for Complex Geometries

Author: Wang Congying
Yu Jia
Publication date: 15/07/2022

Although spatial index structures shorten the query response time, they rely on complex tree structures to narrow down the search space. Such structures in turn yield additional storage overhead and take a toll on index maintenance. Recently, there has been a flurry of works attempting to leverage machine learning (ML) models to simplify the index structures. Some follow-up works extend the idea to support geospatial point data. These approaches partition the multidimensional space into cells and assign IDs to these cells using a space-filling curve (e.g., Z-order curve) or mathematical equations. These approaches work well for geospatial points but are not able to handle complex geometries such as polygons and trajectories, which are widely available in geospatial data. This paper introduces GLIN, a lightweight learned index for spatial range queries on complex geometries. To achieve that, GLIN transforms geometries to Z-address intervals, and builds a hierarchical model to learn the cumulative distribution function between these intervals and the record positions. The lightweight hierarchical model greatly shortens the index probing time. Furthermore, GLIN augments spatial query windows using an add-on function to guarantee the query accuracy for both Contains and Intersects spatial relationships. Our experiments on real-world and synthetic datasets show that GLIN incurs 40-70 times less storage overhead than popular spatial indexes such as Quad-Tree while still showing similar query response time in medium-selectivity queries. Moreover, GLIN's maintenance speed is around 1.5 times higher on insertion and 3-5 times higher on deletion.
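
The idea of turning a geometry into a Z-address interval can be illustrated with a minimal sketch. It is not GLIN's implementation: it assumes geometries have already been reduced to integer grid bounding boxes, and the names `interleave_bits` and `z_interval` are hypothetical. Because a Z-order (Morton) code is monotone in each coordinate, every cell inside a bounding box has a code between the codes of the box's minimum and maximum corners.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) code of grid cell (x, y).

    Bit i of x is placed at position 2*i, bit i of y at position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_interval(min_x: int, min_y: int, max_x: int, max_y: int,
               bits: int = 16) -> tuple[int, int]:
    """Map an integer bounding box to a (z_min, z_max) interval.

    Hypothetical helper: the interval is conservative, so it may also
    cover cells that lie outside the box.
    """
    return (interleave_bits(min_x, min_y, bits),
            interleave_bits(max_x, max_y, bits))


# A box spanning grid cells (1, 1) through (2, 2).
lo, hi = z_interval(1, 1, 2, 2)
codes = [interleave_bits(x, y) for x in (1, 2) for y in (1, 2)]
```

A one-dimensional model can then be fitted over such intervals sorted by their lower bound. The interval is only a coarse filter, so candidate records still need an exact geometric check.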

arXiv.org e-Print Archive

Welcome to Sigmod 2019 - The 2019 ACM SIGMOD International Conference on the Management of Data!

Author: Ailamaki A. (Anastasia)
Boncz P.A. (Peter)
Manegold S. (Stefan)
Publication date: 30/06/2019

CWI's Institutional Repository

Proceedings of the 2019 International Conference on Management of Data

Publication date: 30/06/2019

CWI's Institutional Repository