Linear Sketches for Approximate Aggregate Range Queries

Abstract

Answering aggregate queries approximately over multidimensional data is an important problem that arises naturally in many applications. One approach is to maintain a succinct (i.e., O(k) space) representation ĥ, called a sketch, of the frequency distribution h of the data, and to use ĥ for answering queries. Common sketches are constructed via linear mappings of h onto a k-dimensional space, e.g., by mapping h to its top-k Fourier or wavelet coefficients. We call such sketches linear sketches, since ĥ = P h for some sketching matrix P. Linear sketches have the benefit that they can be easily maintained incrementally over data streams. Sketches are typically optimized for approximating the data distribution, but not the answers to queries. In this paper, we are concerned with linear sketches that approximate well not only the data but also the answers to aggregate queries. The quality of the approximations is measured using the mean squared error (MSE) and the relative error (RLE). A query is represented by a column vector q such that its answer is q^T h. A given set of queries can be represented by an appropriate query matrix Q. We show that the MSE for the queries is minimized when the sketching matrix used to construct a linear sketch of h has as columns the top-k eigenvectors of the query matrix Q. Further, if the quer
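
The linearity that makes such sketches easy to maintain over a stream can be illustrated with a minimal NumPy sketch. The sizes n and k, the random orthonormal matrix P, and the example range query are assumptions made purely for illustration; in particular, P here is a placeholder and not the optimized sketching matrix the paper derives.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 16, 4  # domain size and sketch size (illustrative values)

# Hypothetical sketching matrix P (k x n) with orthonormal rows.
# A random orthonormal basis is used only as a stand-in.
P = np.linalg.qr(rng.standard_normal((n, k)))[0].T

h = np.zeros(n)      # exact frequency distribution, kept only for checking
h_hat = np.zeros(k)  # the sketch, intended to equal P @ h at all times

# Stream of arriving items, each identified by the index of its cell.
for i in rng.integers(0, n, size=200):
    h[i] += 1
    h_hat += P[:, i]  # linearity: P @ (h + e_i) = P @ h + P[:, i]

# A range-count query over cells 3..7, written as a 0/1 vector q.
q = np.zeros(n)
q[3:8] = 1.0

exact = q @ h                 # q^T h on the full distribution
approx = q @ (P.T @ h_hat)    # q^T applied to the reconstruction P^T h_hat
```

Each arriving item costs O(k) work and the full vector h never needs to be stored. How close `approx` is to `exact` depends entirely on the choice of P, which is the question the paper addresses.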
