We propose a new method for estimating the number of answers OUT of a small
join query Q in a large database D, and for uniform sampling over joins. Our
method is the first to satisfy both of the following properties:
- Support an arbitrary Q, which can be either acyclic or cyclic and can contain binary and non-binary relations.
- Guarantee an arbitrarily small error with high probability, always in \~O(AGM/OUT) time, where AGM is the AGM bound of OUT (an upper bound of OUT) and \~O hides a polylogarithmic factor of the input size.

We
also explain previous join size estimators in a unified framework. All methods
including ours rely on certain indexes on relations in D, which take linear
time to build offline. Additionally, we extend our method using generalized
hypertree decompositions (GHDs) to achieve a lower complexity than \~O(AGM/OUT)
when OUT is small, and present optimization techniques for improving estimation
efficiency and accuracy.

Comment: 19 pages
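
As a minimal illustration of what index-based sampling for join size estimation looks like, the Python sketch below estimates the size of a two-relation join R(a, b) JOIN S(b, c). It is a simple textbook-style estimator and not the method proposed here: it carries no \~O(AGM/OUT) guarantee and handles only one binary join. The names build_index, estimate_join_size, and exact_join_size are hypothetical, and the in-memory hash index is only an assumed stand-in for the offline, linear-time indexes mentioned above.

```python
import random
from collections import defaultdict


def build_index(relation, key_pos):
    """Hash index: join-key value -> list of tuples (one linear pass)."""
    index = defaultdict(list)
    for t in relation:
        index[t[key_pos]].append(t)
    return index


def estimate_join_size(R, S_index, num_samples, rng):
    """Estimate |R(a, b) JOIN S(b, c)| by uniform sampling over R.

    Each sampled tuple r contributes the number of S tuples that share
    its b value. Scaling the sample mean by |R| is unbiased because
    E[|R| * deg(r)] = sum of deg(r) over all r in R = OUT.
    """
    if not R:
        return 0.0
    total = 0
    for _ in range(num_samples):
        r = rng.choice(R)                        # uniform tuple of R
        total += len(S_index.get(r[1], ()))      # matches in S via the index
    return len(R) * total / num_samples


def exact_join_size(R, S_index):
    """Exact OUT, computed by a full scan of R (for comparison only)."""
    return sum(len(S_index.get(r[1], ())) for r in R)


# Toy data (assumed for illustration): each b in 0..4 has b + 1 matches in S.
R = [(i, i % 5) for i in range(100)]
S = [(b, c) for b in range(5) for c in range(b + 1)]
S_index = build_index(S, key_pos=0)

exact = exact_join_size(R, S_index)              # 20 * (1 + 2 + 3 + 4 + 5) = 300
estimate = estimate_join_size(R, S_index, 20000, random.Random(0))
```

The estimate concentrates around the exact count as the number of samples grows. The number of samples such a naive estimator needs depends on how skewed the match counts are, which is the kind of cost that a bound in terms of AGM/OUT makes precise.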