Search CORE

2 research outputs found

Variable Skipping for Autoregressive Range Density Estimation

Author: Abbeel Pieter
Chen Xi
Duan Yan
Liang Eric
Stoica Ion
Yang Zongheng
Publication venue
Publication date: 10/07/2020
Field of study

Deep autoregressive models compute point likelihood estimates of individual data points. However, many applications (i.e., database cardinality estimation) require estimating range densities, a capability that is under-explored by current neural density estimation literature. In these applications, fast and accurate range density estimates over high-dimensional data directly impact user-perceived performance. In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models. This technique exploits the sparse structure of range density queries to avoid sampling unnecessary variables during approximate inference. We show that variable skipping provides 10-100

\times

efficiency improvements when targeting challenging high-quantile error metrics, enables complex applications such as text pattern matching, and can be realized via a simple data augmentation procedure without changing the usual maximum likelihood objective.Comment: ICML 2020. Code released at: https://var-skip.github.io

arXiv.org e-Print Archive

FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation

Author: Cui Bin
Han Yuxing
Pfadler Andreas
Qian Zhengping
Wu Ziniu
Zeng Kai
Zhou Jingren
Zhu Rong
Publication venue
Publication date: 05/03/2021
Field of study

Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over simplify the models only using independent factorization which leads to inaccurate estimates, or over complicate them by lossless conditional factorization without any independent assumption which results in slow probability computation. In this paper, we propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in model size and accurate in estimation quality. The key idea of FLAT is a novel unsupervised graphical model, called FSPN. It utilizes both independent and conditional factorization to adaptively model different levels of attributes correlations, and thus dovetails their advantages. FLAT supports efficient online probability computation in near liner time on the underlying FSPN model, provides effective offline model construction and enables incremental model updates. It can estimate cardinality for both single table queries and multi table join queries. Extensive experimental study demonstrates the superiority of FLAT over existing CardEst methods on well known IMDB benchmarks: FLAT achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability computation speed and 1 to 2 orders of magnitude lower storage cost. We also integrate FLAT into Postgres to perform an end to end test. It improves the query execution time by 12.9% on the benchmark workload, which is very close to the optimal result 14.2% using the true cardinality.Comment: Technical Report of the FLAT Submissio

arXiv.org e-Print Archive