6 research outputs found
How to evaluate multiple range-sum queries progressively
Decision support system users typically submit batches of range-sum queries simultaneously rather than issuing individual, unrelated queries. We propose a wavelet-based technique that exploits I/O sharing across a query batch to evaluate the set of queries progressively and efficiently. The challenge is that controlling the structure of errors across query results now becomes more critical than minimizing the error of each individual query. Consequently, we define a class of structural error penalty functions and show how they are controlled by our technique. Experiments demonstrate that our technique is efficient as an exact algorithm, and that the progressive estimates are accurate even after less than one I/O per query.
Approximation algorithms for wavelet transform coding of data streams
This paper addresses the problem of finding a B-term wavelet representation
of a given discrete function $f \in \mathbb{R}^n$ whose distance from $f$ is
minimized. The problem is well understood when we seek to minimize the
Euclidean distance between f and its representation. The first known algorithms
for finding provably approximate representations minimizing general
$\ell_p$ distances (including $\ell_\infty$) under a wide variety of compactly supported
wavelet bases are presented in this paper. For the Haar basis, a polynomial
time approximation scheme is demonstrated. These algorithms are applicable in
the one-pass sublinear-space data stream model of computation. They generalize
naturally to multiple dimensions and weighted norms. A universal representation
that provides a provable approximation guarantee under all p-norms
simultaneously, as well as the first approximation algorithms for bit-budget versions
of the problem, known as adaptive quantization, are also presented. Further, it
is shown that the algorithms presented here can be used to select a basis from
a tree-structured dictionary of bases and find a B-term representation of the
given function that provably approximates its best dictionary-basis
representation.
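The abstract notes that the Euclidean case is well understood. As a minimal sketch of that case only (not the paper's algorithms for general distances), the following keeps the B largest-magnitude coefficients of an orthonormal Haar transform, which minimizes the Euclidean error among B-term Haar representations. The function names `haar_forward`, `haar_inverse`, and `best_b_term_l2` are hypothetical, and the input length is assumed to be a power of two.

```python
import math


def haar_forward(values):
    """Orthonormal Haar transform; assumes len(values) is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Scaled pairwise averages and differences of the current prefix.
        avg = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:length] = avg + diff
        length = half
    return out


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = len(out)
    s = math.sqrt(2.0)
    length = 1
    while length < n:
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / s)
            merged.append((a - d) / s)
        out[:2 * length] = merged
        length *= 2
    return out


def best_b_term_l2(values, b):
    """Keep the b largest-magnitude Haar coefficients and reconstruct."""
    coeffs = haar_forward(values)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:b])
    truncated = [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
    return haar_inverse(truncated)


if __name__ == "__main__":
    f = [2, 2, 0, 2, 3, 5, 4, 4]
    approx = best_b_term_l2(f, 3)
    error = math.sqrt(sum((x - y) ** 2 for x, y in zip(f, approx)))
    print(approx, error)
```

Because the transform is orthonormal, the squared Euclidean error equals the sum of squares of the dropped coefficients, which is why greedy selection by magnitude suffices here. For other norms that argument no longer applies, which is the gap the abstract addresses.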
Optimal and Approximate Computation of Summary Statistics for Range Aggregates
Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on "selectivity estimation", that is, computing summary statistics on the underlying data and using them to answer aggregate queries fast and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries. First, we focus on a histogram as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-)polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries). Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously known wavelet-based methods have this property. We perform an experimental study of the various summary representations that shows the benefits of our algorithms over the known methods. AT&T Labs---Research, Florham Park, NJ. fagilbert, kotidis, muthu, [email protected].
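To illustrate what answering a range-sum query from a histogram summary means, here is a minimal sketch using a plain equi-width histogram with a uniform-within-bucket assumption. It is not the optimal construction the abstract describes; the function names and the sample data are hypothetical.

```python
from itertools import accumulate


def build_equiwidth_histogram(data, num_buckets):
    """Return (lo, hi, bucket_sum) triples covering data[lo:hi]."""
    n = len(data)
    bounds = [round(i * n / num_buckets) for i in range(num_buckets + 1)]
    return [
        (lo, hi, sum(data[lo:hi]))
        for lo, hi in zip(bounds, bounds[1:])
        if hi > lo
    ]


def estimate_range_sum(histogram, a, b):
    """Estimate sum(data[a:b+1]), assuming values are uniform inside a bucket."""
    total = 0.0
    for lo, hi, bucket_sum in histogram:
        overlap = min(hi, b + 1) - max(lo, a)
        if overlap > 0:
            total += bucket_sum * overlap / (hi - lo)
    return total


def exact_range_sum(prefix, a, b):
    """Exact sum(data[a:b+1]) from prefix sums, for comparison."""
    return prefix[b + 1] - prefix[a]


if __name__ == "__main__":
    data = [1, 3, 5, 11, 12, 13, 0, 1]
    hist = build_equiwidth_histogram(data, 4)
    prefix = [0] + list(accumulate(data))
    for a, b in [(0, 7), (1, 2), (3, 5)]:
        print((a, b), estimate_range_sum(hist, a, b), exact_range_sum(prefix, a, b))
```

Queries aligned with bucket boundaries are answered exactly, while queries that cut through a bucket incur error. Choosing bucket boundaries to minimize that error over all ranges is the optimization problem the abstract refers to.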