Search CORE

2 research outputs found

DeepSampling: Selectivity Estimation with Predicted Error and Response Time

Author: Eldawy Ahmed
Vu Tin
Publication venue
Publication date: 15/08/2020
Field of study

The rapid growth of spatial data urges the research community to find efficient processing techniques for interactive queries on large volumes of data. Approximate Query Processing (AQP) is the most prominent technique that can provide real-time answer for ad-hoc queries based on a random sample. Unfortunately, existing AQP methods provide an answer without providing any accuracy metrics due to the complex relationship between the sample size, the query parameters, the data distribution, and the result accuracy. This paper proposes DeepSampling, a deep-learning-based model that predicts the accuracy of a sample-based AQP algorithm, specially selectivity estimation, given the sample size, the input distribution, and query parameters. The model can also be reversed to measure the sample size that would produce a desired accuracy. DeepSampling is the first system that provides a reliable tool for existing spatial databases to control the accuracy of AQP.Comment: 9 pages, published in DeepSpatial 202

arXiv.org e-Print Archive

Analyzing Range Queries on Spatial Data

Author: Anand Sivasubramaniam
Ji Jin
Ning An
Publication venue
Publication date
Field of study

Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the datasets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure called a density file, whose creation is a onetime cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information is looked up. It is shown that one of the proposed schemes, called Cumulative Density (CD), gives very accurate results (usually less than 5 % error) using a diverse suite of point and rectangular datasets, that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 % of the time that it would take to execute the query, regardless of dataset or query window parameters. 1

CiteSeerX