VerdictDB: Universalizing Approximate Query Processing
Despite 25 years of research in academia, approximate query processing (AQP)
has had little industrial adoption. One of the major causes of this slow
adoption is the reluctance of traditional vendors to make radical changes to
their legacy codebases, and the preoccupation of newer vendors (e.g.,
SQL-on-Hadoop products) with implementing standard features. Additionally, the
few AQP engines that are available are each tied to a specific platform and
require users to completely abandon their existing databases, an unrealistic
expectation given the infancy of the AQP technology. Therefore, we argue that a
universal solution is needed: a database-agnostic approximation engine that
will widen the reach of this emerging technology across various platforms.
Our proposal, called VerdictDB, uses a middleware architecture that requires
no changes to the backend database, and thus, can work with all off-the-shelf
engines. Operating at the driver-level, VerdictDB intercepts analytical queries
issued to the database and rewrites them into another query that, if executed
by any standard relational engine, will yield sufficient information for
computing an approximate answer. VerdictDB uses the returned result set to
compute an approximate answer and error estimates, which are then passed on to
the user or application. However, lack of access to the query execution layer
introduces significant challenges in terms of generality, correctness, and
efficiency. This paper shows how VerdictDB overcomes these challenges and
delivers up to 171× speedup (18.45× on average) for a variety of
existing engines, such as Impala, Spark SQL, and Amazon Redshift, while
incurring less than 2.6% relative error. VerdictDB is open-sourced under Apache
License.
Comment: Extended technical report of the paper that appeared in Proceedings
of the 2018 International Conference on Management of Data, pp. 1461-1476.
ACM, 201
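
The query-rewriting idea described in the VerdictDB abstract can be illustrated with a minimal sketch. It is not VerdictDB's actual implementation or API: the table names `sales` and `sales_sample`, the 10% Bernoulli sample, and the helper functions are all assumptions made for illustration. The "middleware" sends a rewritten query to an ordinary SQL engine (here SQLite) and then computes the approximate answer and an error estimate from the returned aggregates.

```python
import math
import random
import sqlite3

SAMPLE_RATE = 0.1  # assumed probability of keeping each row in the sample


def build_demo_db(n_rows=20000, seed=0):
    """Create a hypothetical `sales` table plus a Bernoulli sample of it."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?)",
        ((rng.uniform(0, 100),) for _ in range(n_rows)),
    )
    # Keep each row independently with probability SAMPLE_RATE.
    conn.execute("CREATE TABLE sales_sample (amount REAL)")
    rows = conn.execute("SELECT amount FROM sales").fetchall()
    conn.executemany(
        "INSERT INTO sales_sample VALUES (?)",
        (r for r in rows if rng.random() < SAMPLE_RATE),
    )
    return conn


def approx_sum(conn):
    """Answer SELECT SUM(amount) FROM sales using only the sample table."""
    # Rewritten query: any standard SQL engine can run it unchanged.
    s, s2 = conn.execute(
        "SELECT SUM(amount), SUM(amount * amount) FROM sales_sample"
    ).fetchone()
    # Scale the sample sum up by the inverse sampling probability.
    estimate = s / SAMPLE_RATE
    # Estimated standard error of that scaled sum under Bernoulli sampling.
    std_err = math.sqrt((1 - SAMPLE_RATE) * s2) / SAMPLE_RATE
    return estimate, std_err


if __name__ == "__main__":
    conn = build_demo_db()
    estimate, std_err = approx_sum(conn)
    exact = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
    print(f"approximate: {estimate:.0f} +/- {std_err:.0f}, exact: {exact:.0f}")
```

The point of the sketch is only that the backend sees plain SQL; the scaling and the error estimate happen outside the database, which is the constraint the abstract describes.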
Rapid Sampling for Visualizations with Ordering Guarantees
Visualizations are frequently used as a means to understand trends and gather
insights from datasets, but often take a long time to generate. In this paper,
we focus on the problem of rapidly generating approximate visualizations while
preserving crucial visual properties of interest to analysts. Our primary
focus will be on sampling algorithms that preserve the visual property of
ordering; our techniques will also apply to some other visual properties. For
instance, our algorithms can be used to generate an approximate visualization
of a bar chart very rapidly, where the comparisons between any two bars are
correct. We formally show that our sampling algorithms are generally applicable
and provably optimal in theory, in that they do not take more samples than
necessary to generate the visualizations with ordering guarantees. They also
work well in practice, correctly ordering output groups while taking orders of
magnitude fewer samples and much less time than conventional sampling schemes.
Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.
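
The idea of sampling only until the bar ordering is reliable can be sketched as follows. This is a simplified illustration and not the algorithm from the paper: it assumes values lie in [0, 1], uses Hoeffding confidence intervals, samples every group equally in each round, and applies no correction for checking the intervals repeatedly, so it carries a weaker guarantee than a provably optimal scheme would. The function names and parameters are hypothetical.

```python
import math
import random


def hoeffding_half_width(n, delta, value_range=1.0):
    """Half-width of a (1 - delta) Hoeffding interval after n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def sample_until_ordered(groups, delta=0.05, batch=50,
                         max_samples=100000, seed=0):
    """Sample each group (with replacement) until the intervals separate.

    groups: dict mapping a group name to a list of values in [0, 1].
    Returns the group names in ascending order of estimated mean and the
    number of samples drawn per group.
    """
    rng = random.Random(seed)
    sums = {g: 0.0 for g in groups}
    n = 0
    while n < max_samples:
        for g, values in groups.items():
            for _ in range(batch):
                sums[g] += rng.choice(values)
        n += batch
        eps = hoeffding_half_width(n, delta)
        means = sorted((sums[g] / n, g) for g in groups)
        # Intervals are disjoint when adjacent means differ by more than 2*eps.
        if all(b[0] - a[0] > 2 * eps for a, b in zip(means, means[1:])):
            break
    return [g for _, g in means], n


if __name__ == "__main__":
    demo = {"a": [0.1, 0.3], "b": [0.4, 0.6], "c": [0.7, 0.9]}
    order, per_group = sample_until_ordered(demo)
    print(order, per_group)
```

A more sample-efficient variant would stop drawing from groups whose intervals have already separated from all others and spend further samples only on the groups that are still ambiguous.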
Quantum algorithm for the Boolean hidden shift problem
The hidden shift problem is a natural place to look for new separations
between classical and quantum models of computation. One advantage of this
problem is its flexibility, since it can be defined for a whole range of
functions and a whole range of underlying groups. In a way, this distinguishes
it from the hidden subgroup problem where more stringent requirements about the
existence of a periodic subgroup have to be made. And yet, the hidden shift
problem proves to be rich enough to capture interesting features of problems of
algebraic, geometric, and combinatorial flavor. We present a quantum algorithm
to identify the hidden shift for any Boolean function. Using Fourier analysis
for Boolean functions we relate the time and query complexity of the algorithm
to an intrinsic property of the function, namely its minimum influence. We show
that for randomly chosen functions the time complexity of the algorithm is
polynomial. Based on this we show an average case exponential separation
between classical and quantum time complexity. A perhaps interesting aspect of
this work is that, while the extremal case of the Boolean hidden shift problem
over so-called bent functions can be reduced to a hidden subgroup problem over
an abelian group, the more general case studied here does not seem to allow
such a reduction.
Comment: 10 pages, 1 figur
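
To make the terms in the hidden shift abstract concrete, here is a small classical sketch. It assumes the usual formulation, which the abstract does not spell out: a known function f on n-bit inputs, an oracle g(x) = f(x XOR s) for an unknown shift s, and the influence of a nonzero v defined as the fraction of inputs x with f(x) != f(x XOR v). The brute-force search is exponential in n and is not the quantum algorithm from the paper; it only shows what the problem asks for and why a nonzero minimum influence matters.

```python
def influence(f, n, v):
    """Fraction of n-bit inputs x with f(x) != f(x XOR v)."""
    size = 2 ** n
    return sum(f(x) != f(x ^ v) for x in range(size)) / size


def min_influence(f, n):
    """Minimum influence over all nonzero shifts v."""
    return min(influence(f, n, v) for v in range(1, 2 ** n))


def find_shifts_classically(f, g, n):
    """Brute force: every s such that g(x) == f(x XOR s) for all x."""
    size = 2 ** n
    return [s for s in range(size)
            if all(g(x) == f(x ^ s) for x in range(size))]


def inner_product_4bit(x):
    """Example f on 4 bits: x0*x1 XOR x2*x3 (a bent function)."""
    return ((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))


if __name__ == "__main__":
    secret = 0b1010  # hypothetical hidden shift
    shifted = lambda x: inner_product_4bit(x ^ secret)
    print(min_influence(inner_product_4bit, 4))
    print(find_shifts_classically(inner_product_4bit, shifted, 4))
```

If some nonzero v had influence zero, then s and s XOR v would produce the same oracle, so the shift could not be identified uniquely; the search above would return both candidates.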