Guaranteeing the Õ(AGM/OUT) Runtime for Uniform Sampling and OUT Size
  Estimation over Joins

Fletcher, George; Ha, Jaehyun; Han, Wook-Shin; Kim, Kyoungmin

Authors: George Fletcher, Jaehyun Ha, Wook-Shin Han, Kyoungmin Kim
Publication date: 9 April 2023

Abstract

We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all of the following:
- It supports arbitrary Q, which can be acyclic or cyclic and can contain binary and non-binary relations.
- It guarantees an arbitrarily small error with high probability, always in Õ(AGM/OUT) time, where AGM is the AGM bound of Q (an upper bound on OUT) and Õ hides a polylogarithmic factor of the input size.

We also explain previous join size estimators in a unified framework. All methods, including ours, rely on certain indexes on the relations in D, which take linear time to build offline. Additionally, we extend our method using generalized hypertree decompositions (GHDs) to achieve a lower complexity than Õ(AGM/OUT) when OUT is small, and we present optimization techniques for improving estimation efficiency and accuracy.

Comment: 19 pages
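To make the quantities AGM and OUT concrete, the following minimal Python sketch computes both for the triangle query Q(a,b,c) = R(a,b), S(b,c), T(a,c) on toy data. It is an illustration only and not the paper's algorithm: the function names `agm_bound` and `triangle_count`, the toy relations, and the choice of the fractional edge cover (1/2, 1/2, 1/2) are all assumptions made here. The cover is passed in by hand rather than obtained by solving the linear program that defines the AGM bound.

```python
from math import prod


def agm_bound(sizes, cover):
    """Return prod(|R_i| ** x_i) for relation sizes and a fractional edge cover x.

    Assumption: `cover` is a valid fractional edge cover supplied by the caller;
    this sketch does not solve the LP that picks the best cover.
    """
    return prod(n ** x for n, x in zip(sizes, cover))


def triangle_count(R, S, T):
    """Brute-force OUT for Q(a,b,c) = R(a,b), S(b,c), T(a,c)."""
    T_set = set(T)
    return sum(
        1
        for (a, b) in R
        for (b2, c) in S
        if b == b2 and (a, c) in T_set
    )


# Toy instance 1 (hypothetical data): every relation holds all pairs over {0,1,2}.
full = [(a, b) for a in range(3) for b in range(3)]
out_full = triangle_count(full, full, full)
agm_full = agm_bound([len(full)] * 3, [0.5, 0.5, 0.5])

# Toy instance 2 (hypothetical data): every relation holds the pairs a < b over {0,1,2,3}.
lt = [(a, b) for a in range(4) for b in range(4) if a < b]
out_lt = triangle_count(lt, lt, lt)
agm_lt = agm_bound([len(lt)] * 3, [0.5, 0.5, 0.5])

# The ratio AGM/OUT is the quantity the stated runtime depends on.
ratio_full = agm_full / out_full
ratio_lt = agm_lt / out_lt
```

On the first instance the bound is tight, so the ratio AGM/OUT is 1. On the second, OUT falls below the bound and the ratio grows, which is the regime where a runtime proportional to AGM/OUT becomes more expensive.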

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2304.00715

Last time updated on 08/04/2023