
2 research outputs found

Consistent and Flexible Selectivity Estimation for High-dimensional Data

Authors: Ishikawa Yoshiharu, Makoto Onizuka, Mao Rui, Qin Jianbin, Wang Wei, Wang Yaoshu, Xiao Chuan, Zhang Rui
Publication date: 13/07/2020

Selectivity estimation aims to estimate the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection, query optimization, and data integration. The estimation problem is especially challenging for large-scale high-dimensional data due to the curse of dimensionality, the large variance of selectivity across different queries, and the need to make the estimator consistent (i.e., the selectivity is non-decreasing in the threshold). We propose a new deep learning-based model that learns a query-dependent piecewise linear function as a selectivity estimator, which is flexible enough to fit the selectivity curve of any query object and threshold while guaranteeing that the output is non-decreasing in the threshold. To improve the accuracy for large datasets, we propose to partition the dataset into multiple disjoint subsets and build a local model on each of them. We perform experiments on real datasets and show that the proposed model significantly outperforms state-of-the-art models in accuracy and is competitive in efficiency.

arXiv.org e-Print Archive
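The abstract combines two ideas: a piecewise linear selectivity function that is non-decreasing in the threshold, and a sum of local models built over disjoint partitions. The Python sketch below illustrates only why that combination stays consistent; it is not the authors' model. In the paper a deep network maps each query object to the function's parameters, whereas here those parameters (`raw`) are passed in directly. The fixed knot grid, the softplus parameterization, and all function names are assumptions made for illustration.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x); the result is always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def knot_values(raw: Sequence[float]) -> List[float]:
    """Turn unconstrained outputs into non-decreasing values at the knots.

    Each value is a running sum of positive increments, so
    values[0] <= values[1] <= ... holds by construction.
    """
    values, total = [], 0.0
    for r in raw:
        total += softplus(r)
        values.append(total)
    return values


def pwl_selectivity(knots: Sequence[float], raw: Sequence[float], t: float) -> float:
    """Estimate selectivity at threshold t by linear interpolation between knots.

    knots must be strictly increasing, with one raw value per knot.
    Thresholds outside [knots[0], knots[-1]] are clamped, which keeps the
    function non-decreasing everywhere.
    """
    if len(knots) != len(raw):
        raise ValueError("need one raw value per knot")
    values = knot_values(raw)
    if t <= knots[0]:
        return values[0]
    if t >= knots[-1]:
        return values[-1]
    i = bisect_right(knots, t) - 1  # segment with knots[i] <= t < knots[i + 1]
    frac = (t - knots[i]) / (knots[i + 1] - knots[i])
    return values[i] + frac * (values[i + 1] - values[i])


def partitioned_selectivity(
    knots: Sequence[float], raw_per_partition: Sequence[Sequence[float]], t: float
) -> float:
    """Sum the local estimates of disjoint partitions.

    A sum of non-decreasing functions is non-decreasing, so consistency
    survives the partitioning.
    """
    return sum(pwl_selectivity(knots, raw, t) for raw in raw_per_partition)


if __name__ == "__main__":
    knots = [0.0, 0.5, 1.0, 1.5, 2.0]
    # Hypothetical local-model outputs for one query object (two partitions).
    local_raw = [[-3.0, 0.2, 1.0, -0.5, 2.0], [-4.0, -1.0, 0.5, 0.5, 0.0]]
    thresholds = [0.0, 0.3, 0.9, 1.7, 2.5]
    estimates = [partitioned_selectivity(knots, local_raw, t) for t in thresholds]
    print(estimates)
    assert all(a <= b for a, b in zip(estimates, estimates[1:]))
```

Forcing every increment through softplus makes monotonicity a structural property rather than something the model has to learn, and interpolating between knots keeps the curve continuous in the threshold.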

Query Estimation By Adaptive Sampling

Authors: Amr El Abbadi, Divyakant Agrawal, Yi-leh Wu
Publication date: 01/01/2002

The ability to provide accurate and efficient estimates of user query results is very important for the query optimizer in database systems. In this paper, we show that traditional estimation techniques, which take a data-reduction point of view, do not produce satisfactory estimation results if query patterns change dynamically. We further show that, to reduce query estimation error, it is more effective to capture user query patterns than to accurately capture the data distribution. We propose query estimation techniques that adapt to user query patterns for more accurate estimates of the size of selection or range queries over databases.

CiteSeerX
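The abstract does not describe the adaptive technique itself, so the Python sketch below is only a hypothetical illustration of the general idea of letting past queries shape the sample. It uses importance sampling with a Hansen-Hurwitz estimator, a standard technique swapped in here and not taken from the paper. The weighting rule (one plus the number of past range queries covering a value), the workload, and all function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def query_weights(data: Sequence[float], past_queries: Sequence[Tuple[float, float]]) -> List[float]:
    """HYPOTHETICAL weighting rule: 1 + number of past range queries covering each value.

    Regions that users query often get sampled more densely.
    """
    return [1.0 + sum(lo <= x <= hi for lo, hi in past_queries) for x in data]


def draw_sample(
    data: Sequence[float], weights: Sequence[float], n: int, rng: random.Random
) -> List[Tuple[float, float]]:
    """Draw n points with replacement, with probability proportional to weight.

    Returns (value, weight) pairs; the weights are needed later to undo the bias.
    """
    idx = rng.choices(range(len(data)), weights=weights, k=n)
    return [(data[i], weights[i]) for i in idx]


def estimate_range_count(
    sample: Sequence[Tuple[float, float]], total_weight: float, lo: float, hi: float
) -> float:
    """Hansen-Hurwitz estimate of the number of values x with lo <= x <= hi.

    A point drawn with probability q = w / total_weight contributes 1 / q,
    so the average over draws is unbiased for any positive weights.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(total_weight / w for x, w in sample if lo <= x <= hi) / len(sample)


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    past = [(1.0, 1.5), (1.2, 1.8), (0.9, 1.4)]  # hypothetical query workload
    w = query_weights(data, past)
    sample = draw_sample(data, w, 500, rng)
    est = estimate_range_count(sample, sum(w), 1.1, 1.6)
    actual = sum(1.1 <= x <= 1.6 for x in data)
    print(f"estimated {est:.0f}, actual {actual}")
```

Reweighting keeps the estimate unbiased for every query, while concentrating sample points where the workload has been lowers the variance for queries that resemble past ones. That trade-off mirrors the abstract's claim that tracking query patterns can matter more than modeling the whole data distribution.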