Estimating Cardinalities with Deep Sketches
We introduce Deep Sketches, which are compact models of databases that allow
us to estimate the result sizes of SQL queries. Deep Sketches are powered by a
new deep learning approach to cardinality estimation that can capture
correlations between columns, even across tables. Our demonstration allows
users to define such sketches on the TPC-H and IMDb datasets, monitor the
training process, and run ad-hoc queries against trained sketches. We also
estimate query cardinalities with HyPer and PostgreSQL to visualize the gains
over traditional cardinality estimators.
Comment: To appear in SIGMOD'1
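Traditional estimators commonly assume that predicates on different columns are independent and multiply their per-column selectivities. The minimal sketch below uses a hypothetical two-column table (`city`, `state`) whose columns are perfectly correlated. It shows how that assumption underestimates a conjunctive predicate, which is the kind of error Deep Sketches aim to avoid. It illustrates the problem only and is not the Deep Sketches model.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Pred = Callable[[Row], bool]

def true_cardinality(rows: List[Row], preds: List[Pred]) -> int:
    """Ground truth: count rows that satisfy every predicate."""
    return sum(1 for r in rows if all(p(r) for p in preds))

def independence_estimate(rows: List[Row], preds: List[Pred]) -> float:
    """Estimate |T| * product of per-predicate selectivities (independence assumption)."""
    n = len(rows)
    est = float(n)
    for p in preds:
        est *= sum(1 for r in rows if p(r)) / n
    return est

# Hypothetical table in which city fully determines state.
rows: List[Row] = (
    [{"city": "Munich", "state": "Bavaria"}] * 50
    + [{"city": "Berlin", "state": "Berlin"}] * 50
)
preds: List[Pred] = [lambda r: r["city"] == "Munich", lambda r: r["state"] == "Bavaria"]

print(true_cardinality(rows, preds))       # 50
print(independence_estimate(rows, preds))  # 25.0
```

If the second predicate is changed to `state == "Berlin"`, the error reverses. The true count becomes 0, while the independence estimate remains 25.0.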
Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads
Filtering data based on predicates is one of the most fundamental operations
for any modern data warehouse. Techniques to accelerate the execution of filter
expressions include clustered indexes, specialized sort orders (e.g., Z-order),
multi-dimensional indexes, and, for high selectivity queries, secondary
indexes. However, these schemes are hard to tune and their performance is
inconsistent. Recent work on learned multi-dimensional indexes has introduced
the idea of automatically optimizing an index for a particular dataset and
workload. However, the performance of that work suffers in the presence of
correlated data and skewed query workloads, both of which are common in real
applications. In this paper, we introduce Tsunami, which addresses these
limitations to achieve up to 6X faster query performance and up to 8X smaller
index size than existing learned multi-dimensional indexes, in addition to up
to 11X faster query performance and 170X smaller index size than
optimally tuned traditional indexes.
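Clustered indexes and specialized sort orders, the first techniques listed above, can be sketched in a few lines. When rows are sorted on one column, a range predicate on that column becomes a contiguous slice that binary search can locate. Predicates on other columns still require a scan of that slice. The rows and the column layout `(a, b)` are hypothetical, and the sketch uses only Python's standard `bisect` module.

```python
import bisect
from typing import List, Tuple

Row = Tuple[int, int]  # hypothetical rows with columns (a, b)

def build_clustered(rows: List[Row]) -> Tuple[List[Row], List[int]]:
    """Sort rows on column a; keep a parallel list of a-values for binary search."""
    data = sorted(rows)
    keys = [r[0] for r in data]
    return data, keys

def range_filter(data: List[Row], keys: List[int],
                 a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> List[Row]:
    """Return rows with a_lo <= a <= a_hi and b_lo <= b <= b_hi."""
    start = bisect.bisect_left(keys, a_lo)   # first row with a >= a_lo
    end = bisect.bisect_right(keys, a_hi)    # one past the last row with a <= a_hi
    # Only data[start:end] is touched; the predicate on b needs a linear scan.
    return [r for r in data[start:end] if b_lo <= r[1] <= b_hi]

rows = [(5, 1), (1, 9), (3, 4), (8, 2), (3, 7), (6, 6)]
data, keys = build_clustered(rows)
print(range_filter(data, keys, 3, 6, 2, 7))  # [(3, 4), (3, 7), (6, 6)]
```

Only the sort column benefits from the index. This is one reason a single sort order rarely serves a whole workload well, and why such schemes are hard to tune.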
GLIN: A Lightweight Learned Indexing Mechanism for Complex Geometries
Although spatial index structures shorten the query response time, they rely
on complex tree structures to narrow down the search space. Such structures in
turn yield additional storage overhead and take a toll on index maintenance.
Recently, there has been a flurry of works attempting to leverage
machine learning (ML) models to simplify index structures. Some follow-up
works extend the idea to support geospatial point data. These approaches
partition the multidimensional space into cells and assign IDs to these cells
using a space-filling curve (e.g., the Z-order curve) or mathematical equations.
These approaches work well for geospatial points but cannot handle complex
geometries such as polygons and trajectories, which are widespread in
geospatial data.
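The point-based approaches described above can be sketched as follows. First, partition a rectangular space into a 2^k by 2^k grid. Next, map each point to its cell. Finally, label that cell with its Z-order (Morton) code, which is formed by interleaving the bits of the cell's column and row. Sorting by cell ID yields a one-dimensional key that a learned model can index. The grid resolution, the bounds, and the function names are illustrative assumptions and are not taken from any specific system.

```python
from typing import List, Tuple

def interleave_bits(x: int, y: int, bits: int) -> int:
    """Morton code: bit i of x goes to position 2i, bit i of y to position 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def cell_id(point: Tuple[float, float],
            bounds: Tuple[float, float, float, float], bits: int) -> int:
    """Map a point (assumed to lie within bounds) to a 2**bits x 2**bits grid cell, then to its Z-order ID."""
    min_x, min_y, max_x, max_y = bounds
    n = 1 << bits
    # Clamp so that points on the max edge fall into the last cell.
    cx = min(int((point[0] - min_x) / (max_x - min_x) * n), n - 1)
    cy = min(int((point[1] - min_y) / (max_y - min_y) * n), n - 1)
    return interleave_bits(cx, cy, bits)

points: List[Tuple[float, float]] = [(3.9, 3.9), (0.5, 1.5), (1.5, 0.5), (0.5, 0.5)]
ids = [cell_id(p, (0.0, 0.0, 4.0, 4.0), 2) for p in points]
print(ids)  # [15, 2, 1, 0]
```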
This paper introduces GLIN, a lightweight learned index for spatial range
queries on complex geometries. To achieve that, GLIN transforms geometries to
Z-address intervals, and builds a hierarchical model to learn the cumulative
distribution function between these intervals and the record positions. The
lightweight hierarchical model greatly shortens the index probing time.
Furthermore, GLIN augments spatial query windows using an add-on function to
guarantee query accuracy for both Contains and Intersects spatial
relationships. Our experiments on real-world and synthetic datasets show that
GLIN incurs 40-70 times less storage overhead than popular spatial indexes
such as the Quad-Tree while achieving similar query response times for
medium-selectivity queries. Moreover, GLIN's maintenance is around 1.5 times
faster for insertions and 3-5 times faster for deletions.
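The following is a simplified reading of the idea. Approximate each geometry by its minimum bounding rectangle (MBR). The MBR's Z-address interval runs from the Morton code of its lower-left cell to that of its upper-right cell. Because Morton order is monotone in each coordinate, every cell inside the rectangle has a code within this interval. Records are then sorted by interval start, and a model maps each start to its position. This sketch makes several substitutions, all assumptions:

- A single least-squares line with a recorded maximum error stands in for GLIN's hierarchical model.
- Extending the query's low end by the widest interval is a simple conservative stand-in for GLIN's add-on function.
- Only MBR intersection is tested, not exact Contains or Intersects on the geometries.

All names and data are hypothetical.

```python
import bisect
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # hypothetical MBR (min_x, min_y, max_x, max_y) in grid cells < 2**16

def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave bits: x into even positions, y into odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def z_interval(r: Rect) -> Tuple[int, int]:
    """Every cell of r has a Morton code within [code(lower-left), code(upper-right)]."""
    return morton(r[0], r[1]), morton(r[2], r[3])

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class LinearZIndex:
    """Records sorted by Z-interval start; one linear model approximates the CDF."""

    def __init__(self, mbrs: List[Rect]):
        if not mbrs:
            raise ValueError("this sketch assumes at least one record")
        entries = sorted((z_interval(r), i) for i, r in enumerate(mbrs))
        self.starts = [s for (s, _), _ in entries]
        self.ends = [e for (_, e), _ in entries]
        self.ids = [i for _, i in entries]
        self.mbrs = [mbrs[i] for i in self.ids]
        # Widest interval: extending a query's low end by this amount cannot miss an overlap.
        self.max_span = max(e - s for s, e in zip(self.starts, self.ends))
        # Least-squares fit: position ~ slope * start + intercept.
        n = len(self.starts)
        mk, mp = sum(self.starts) / n, (n - 1) / 2
        var = sum((k - mk) ** 2 for k in self.starts)
        cov = sum((k - mk) * (p - mp) for p, k in enumerate(self.starts))
        self.slope = cov / var if var else 0.0
        self.intercept = mp - self.slope * mk
        self.err = max(abs(self._predict(k) - p) for p, k in enumerate(self.starts))

    def _predict(self, key: int) -> int:
        return int(round(self.slope * key + self.intercept))

    def lower_bound(self, key: int) -> int:
        """First position whose start >= key: model guess, then bounded binary search."""
        n = len(self.starts)
        p = self._predict(key)
        lo = min(max(0, p - self.err - 1), n)
        hi = max(lo, min(n, p + self.err + 2))
        i = bisect.bisect_left(self.starts, key, lo, hi)
        # If the model window missed the true position, fall back to a full search.
        if (i > 0 and self.starts[i - 1] >= key) or (i < n and self.starts[i] < key):
            i = bisect.bisect_left(self.starts, key)
        return i

    def query(self, q: Rect) -> List[int]:
        zlo, zhi = z_interval(q)
        begin = self.lower_bound(zlo - self.max_span)  # conservatively extended low end
        end = self.lower_bound(zhi + 1)                # first start beyond the window
        return sorted(
            self.ids[j]
            for j in range(begin, end)
            if self.ends[j] >= zlo and intersects(self.mbrs[j], q)  # Z filter, then exact MBR test
        )

index = LinearZIndex([(0, 0, 1, 1), (2, 2, 3, 3), (5, 5, 6, 7), (0, 6, 1, 7)])
print(index.query((1, 1, 2, 2)))  # [0, 1]
print(index.query((0, 5, 6, 6)))  # [2, 3]
```

Z-interval overlap is necessary for MBRs to intersect, but it is not sufficient, because an interval also covers cells outside its rectangle. The final MBR test therefore removes the false positives that the Z filter lets through.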
- …