Variable Skipping for Autoregressive Range Density Estimation
Deep autoregressive models compute point likelihood estimates of individual
data points. However, many applications (e.g., database cardinality estimation)
require estimating range densities, a capability that is under-explored by
current neural density estimation literature. In these applications, fast and
accurate range density estimates over high-dimensional data directly impact
user-perceived performance. In this paper, we explore a technique, variable
skipping, for accelerating range density estimation over deep autoregressive
models. This technique exploits the sparse structure of range density queries
to avoid sampling unnecessary variables during approximate inference. We show
that variable skipping provides 10-100× efficiency improvements when
targeting challenging high-quantile error metrics, enables complex applications
such as text pattern matching, and can be realized via a simple data
augmentation procedure without changing the usual maximum likelihood objective.
Comment: ICML 2020. Code released at: https://var-skip.github.io
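The idea of skipping unconstrained variables during sampling can be illustrated with a toy sketch. It is not the paper's implementation: an explicit joint table stands in for the deep autoregressive model, and the names `conditional`, `range_density`, and `exact_density` are hypothetical. A skipped variable is recorded as `None` (a mask) and the conditional marginalizes over it exactly; in a neural model, that behavior would presumably have to be learned, for example through the data augmentation the abstract mentions.

```python
import numpy as np

def conditional(joint, i, prefix):
    """P(x_i | observed prefix values); None entries in prefix are masked (skipped)."""
    # Observed values fix an axis; masked ones keep the axis so it can be summed out.
    index = tuple(slice(None) if v is None else v for v in prefix)
    sub = joint[index]
    n_masked = sum(v is None for v in prefix)
    # After indexing, variable i sits right after the masked prefix axes.
    axes = tuple(a for a in range(sub.ndim) if a != n_masked)
    probs = sub.sum(axis=axes)
    return probs / probs.sum()

def range_density(joint, ranges, n_samples, rng, skip=True):
    """Sampling-based estimate of P(x_i in ranges[i] for all i); None means unconstrained."""
    total = 0.0
    for _ in range(n_samples):
        weight, prefix = 1.0, []
        for i, allowed in enumerate(ranges):
            if allowed is None and skip:
                prefix.append(None)  # variable skipping: do not sample this variable
                continue
            probs = conditional(joint, i, prefix)
            if allowed is not None:
                mask = np.zeros_like(probs)
                mask[list(allowed)] = 1.0
                probs = probs * mask
                mass = probs.sum()   # probability of staying inside the range
                weight *= mass
                if mass == 0.0:
                    break
                probs = probs / mass
            prefix.append(int(rng.choice(len(probs), p=probs)))
        total += weight
    return total / n_samples

def exact_density(joint, ranges):
    """Ground truth by summing the joint table over the queried region."""
    sub = joint
    for axis, allowed in enumerate(ranges):
        if allowed is not None:
            sub = np.take(sub, sorted(allowed), axis=axis)
    return float(sub.sum())

# Toy usage: three discrete variables, the middle one unconstrained.
rng = np.random.default_rng(0)
joint = rng.random((4, 5, 3))
joint /= joint.sum()
query = [{0, 1}, None, {2}]
print(range_density(joint, query, 500, rng, skip=True), exact_density(joint, query))
```

With `skip=False` the same function samples every variable, including the unconstrained ones, so the two settings can be compared on the same query.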
FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation
Query optimizers rely on accurate cardinality estimation (CardEst) to produce
good execution plans. The core problem of CardEst is how to model the rich
joint distribution of attributes in an accurate and compact manner. Despite
decades of research, existing methods either oversimplify the models by using
only independent factorization, which leads to inaccurate estimates, or
overcomplicate them with lossless conditional factorization without any
independence assumption, which results in slow probability computation. In this paper, we
propose FLAT, a CardEst method that is simultaneously fast in probability
computation, lightweight in model size and accurate in estimation quality. The
key idea of FLAT is a novel unsupervised graphical model, called FSPN. It
utilizes both independent and conditional factorization to adaptively model
different levels of attribute correlation, and thus combines their
advantages. FLAT supports efficient online probability computation in
near-linear time on the underlying FSPN model, provides effective offline model
construction and enables incremental model updates. It can estimate cardinality
for both single-table queries and multi-table join queries. An extensive
experimental study demonstrates the superiority of FLAT over existing CardEst
methods on the well-known IMDB benchmarks: FLAT achieves 1 to 5 orders of magnitude
better accuracy, 1 to 3 orders of magnitude faster probability computation
speed and 1 to 2 orders of magnitude lower storage cost. We also integrate FLAT
into Postgres to perform an end-to-end test. It improves the query execution
time by 12.9% on the benchmark workload, which is very close to the optimal
improvement of 14.2% obtained using the true cardinality.
Comment: Technical Report of the FLAT Submissio
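The trade-off between independent and conditional factorization can be sketched on a toy table. This is only an illustration under stated assumptions, not FSPN or FLAT itself: the synthetic columns `a`, `b`, `c` and the functions `make_table`, `independent_estimate`, and `mixed_estimate` are hypothetical. Column `b` is strongly correlated with `a`, while `c` is independent, so the mixed estimator models `P(a) * P(b | a)` conditionally and multiplies in `P(c)` independently.

```python
import numpy as np

def make_table(n_rows, rng):
    """Synthetic table: b copies a 90% of the time, c is independent of both."""
    a = rng.integers(0, 4, size=n_rows)
    noise = rng.integers(0, 4, size=n_rows)
    b = np.where(rng.random(n_rows) < 0.9, a, noise)
    c = rng.integers(0, 3, size=n_rows)
    return a, b, c

def true_cardinality(cols, preds):
    """Count rows where every column falls in its allowed value set."""
    hit = np.ones(len(cols[0]), dtype=bool)
    for col, allowed in zip(cols, preds):
        hit &= np.isin(col, list(allowed))
    return int(hit.sum())

def independent_estimate(cols, preds):
    """Fully independent factorization: multiply per-column selectivities."""
    sel = 1.0
    for col, allowed in zip(cols, preds):
        sel *= np.isin(col, list(allowed)).mean()
    return len(cols[0]) * sel

def mixed_estimate(cols, preds):
    """Conditional factorization for the correlated pair (a, b), independent for c."""
    a, b, c = cols
    pa, pb, pc = preds
    sel_ab = 0.0
    for value in pa:
        rows = a == value
        if rows.any():
            # P(a = value) * P(b in pb | a = value)
            sel_ab += rows.mean() * np.isin(b[rows], list(pb)).mean()
    sel_c = np.isin(c, list(pc)).mean()
    return len(a) * sel_ab * sel_c

rng = np.random.default_rng(0)
cols = make_table(10_000, rng)
preds = ({0, 1}, {0, 1}, {0})  # a in {0,1} AND b in {0,1} AND c = 0
print(true_cardinality(cols, preds),
      independent_estimate(cols, preds),
      mixed_estimate(cols, preds))
```

On data like this, the fully independent estimate ignores the correlation between `a` and `b`, while the mixed estimate keeps the cheap independent product only where independence actually holds.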