39,137 research outputs found

    VerdictDB: Universalizing Approximate Query Processing

    Full text link
    Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption. One of the major causes of this slow adoption is the reluctance of traditional vendors to make radical changes to their legacy codebases, and the preoccupation of newer vendors (e.g., SQL-on-Hadoop products) with implementing standard features. Additionally, the few AQP engines that are available are each tied to a specific platform and require users to completely abandon their existing databases---an unrealistic expectation given the infancy of the AQP technology. Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. Our proposal, called VerdictDB, uses a middleware architecture that requires no changes to the backend database, and thus, can work with all off-the-shelf engines. Operating at the driver-level, VerdictDB intercepts analytical queries issued to the database and rewrites them into another query that, if executed by any standard relational engine, will yield sufficient information for computing an approximate answer. VerdictDB uses the returned result set to compute an approximate answer and error estimates, which are then passed on to the user or application. However, lack of access to the query execution layer introduces significant challenges in terms of generality, correctness, and efficiency. This paper shows how VerdictDB overcomes these challenges and delivers up to 171×\times speedup (18.45×\times on average) for a variety of existing engines, such as Impala, Spark SQL, and Amazon Redshift, while incurring less than 2.6% relative error. VerdictDB is open-sourced under Apache License.Comment: Extended technical report of the paper that appeared in Proceedings of the 2018 International Conference on Management of Data, pp. 1461-1476. ACM, 201

    Rapid Sampling for Visualizations with Ordering Guarantees

    Get PDF
    Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual proper- ties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our techniques will also apply to some other visual properties. For instance, our algorithms can be used to generate an approximate visualization of a bar chart very rapidly, where the comparisons between any two bars are correct. We formally show that our sampling algorithms are generally applicable and provably optimal in theory, in that they do not take more samples than necessary to generate the visualizations with ordering guarantees. They also work well in practice, correctly ordering output groups while taking orders of magnitude fewer samples and much less time than conventional sampling schemes.Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.

    Quantum algorithm for the Boolean hidden shift problem

    Get PDF
    The hidden shift problem is a natural place to look for new separations between classical and quantum models of computation. One advantage of this problem is its flexibility, since it can be defined for a whole range of functions and a whole range of underlying groups. In a way, this distinguishes it from the hidden subgroup problem where more stringent requirements about the existence of a periodic subgroup have to be made. And yet, the hidden shift problem proves to be rich enough to capture interesting features of problems of algebraic, geometric, and combinatorial flavor. We present a quantum algorithm to identify the hidden shift for any Boolean function. Using Fourier analysis for Boolean functions we relate the time and query complexity of the algorithm to an intrinsic property of the function, namely its minimum influence. We show that for randomly chosen functions the time complexity of the algorithm is polynomial. Based on this we show an average case exponential separation between classical and quantum time complexity. A perhaps interesting aspect of this work is that, while the extremal case of the Boolean hidden shift problem over so-called bent functions can be reduced to a hidden subgroup problem over an abelian group, the more general case studied here does not seem to allow such a reduction.Comment: 10 pages, 1 figur
    • …
    corecore