312 research outputs found

    Estimating Cardinalities with Deep Sketches

    Full text link
    We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations between columns, even across tables. Our demonstration allows users to define such sketches on the TPC-H and IMDb datasets, monitor the training process, and run ad-hoc queries against trained sketches. We also estimate query cardinalities with HyPer and PostgreSQL to visualize the gains over traditional cardinality estimators.Comment: To appear in SIGMOD'1

    Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

    Full text link
    Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes

    GLIN: A Lightweight Learned Indexing Mechanism for Complex Geometries

    Full text link
    Although spatial index structures shorten the query response time, they rely on complex tree structures to narrow down the search space. Such structures in turn yield additional storage overhead and take a toll on index maintenance. Recently, there has been a flurry on works attempting to leverage machine-Learning(ML) models to simplify the index structures. Some follow-up works extend the idea to support geospatial point data. These approaches partition the multidimensional space to cells and assign IDs to these cells using space-filling curve(e.g., Z-order curve) or mathematical equations. These approaches work well for geospatial points but are not able to handle complex geometries such as polygons and trajectories which are widely available in geospatial data. This paper introduces GLIN, a lightweight learned index for spatial range queries on complex geometries. To achieve that, GLIN transforms geometries to Z-address intervals, and builds a hierarchical model to learn the cumulative distribution function between these intervals and the record positions. The lightweight hierarchical model greatly shortens the index probing time. Furthermore, GLIN augments spatial query windows using an add-on function to guarantee the query accuracy for both Contains and Intersects spatial relationships. Our experiments on real-world and synthetic datasets show that GLIN occupies 40-70 times less storage overhead than popular spatial indexes such as Quad-Tree while still showing similar query response time in medium selectivity queries. Moreover, GLIN's maintenance speed is around 1.5 times higher on insertion and 3-5 times higher on deletion
    • …
    corecore