17 research outputs found

    Tight Lower Bound for Comparison-Based Quantile Summaries

    Get PDF
    Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most ε\varepsilon. That is, an ε\varepsilon-approximate quantile summary first processes a stream of items and then, given any quantile query 0ϕ10\le \phi\le 1, returns an item from the stream, which is a ϕ\phi'-quantile for some ϕ=ϕ±ε\phi' = \phi \pm \varepsilon. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most O(1εlogεN)O(\frac{1}{\varepsilon}\cdot \log \varepsilon N) items, where NN is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space f(ε)o(logN)f(\varepsilon)\cdot o(\log N), for any function ff that does not depend on NN. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of (1±ε)ϕ(1\pm \varepsilon)\cdot \phi, and for other related computational tasks.Comment: 20 pages, 2 figures, major revison of the construction (Sec. 3) and some other parts of the pape

    Optimal Gossip Algorithms for Exact and Approximate Quantile Computations

    Full text link
    This paper gives drastically faster gossip algorithms to compute exact and approximate quantiles. Gossip algorithms, which allow each node to contact a uniformly random other node in each round, have been intensely studied and been adopted in many applications due to their fast convergence and their robustness to failures. Kempe et al. [FOCS'03] gave gossip algorithms to compute important aggregate statistics if every node is given a value. In particular, they gave a beautiful O(logn+log1ϵ)O(\log n + \log \frac{1}{\epsilon}) round algorithm to ϵ\epsilon-approximate the sum of all values and an O(log2n)O(\log^2 n) round algorithm to compute the exact ϕ\phi-quantile, i.e., the the ϕn\lceil \phi n \rceil smallest value. We give an quadratically faster and in fact optimal gossip algorithm for the exact ϕ\phi-quantile problem which runs in O(logn)O(\log n) rounds. We furthermore show that one can achieve an exponential speedup if one allows for an ϵ\epsilon-approximation. We give an O(loglogn+log1ϵ)O(\log \log n + \log \frac{1}{\epsilon}) round gossip algorithm which computes a value of rank between ϕn\phi n and (ϕ+ϵ)n(\phi+\epsilon)n at every node.% for any 0ϕ10 \leq \phi \leq 1 and 0<ϵ<10 < \epsilon < 1. Our algorithms are extremely simple and very robust - they can be operated with the same running times even if every transmission fails with a, potentially different, constant probability. We also give a matching Ω(loglogn+log1ϵ)\Omega(\log \log n + \log \frac{1}{\epsilon}) lower bound which shows that our algorithm is optimal for all values of ϵ\epsilon

    SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

    Get PDF
    Latency quantiles measurements are essential as they often capture the user's utility. For example, if a video connection has high tail latency, the perceived quality will suffer, even if the average and median latencies are low. In this work, we consider the problem of approximating the per-item quantiles. Elements in our stream are (ID, latency) tuples, and we wish to track the latency quantiles for each ID. Existing quantile sketches are designed for a single number stream (e.g., containing just the latency). While one could allocate a separate sketch instance for each ID, this may require an infeasible amount of memory. Instead, we consider tracking the quantiles for the heavy hitters (most frequent items), which are often considered particularly important, without knowing them beforehand. We first present a simple sampling algorithm that serves as a benchmark. Then, we design an algorithm that augments a quantile sketch within each entry of a heavy hitter algorithm, resulting in similar space complexity but with a deterministic error guarantee. Finally, we present SQUAD, a method that combines sampling and sketching while improving the asymptotic space complexity. Intuitively, SQUAD uses a background sampling process to capture the behaviour of the latencies of an item before it is allocated with a sketch, thereby allowing us to use fewer samples and sketches. Our solutions are rigorously analyzed, and we demonstrate the superiority of our approach using extensive simulations

    Streaming algorithms for bin packing and vector scheduling

    Get PDF
    Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases, but have been less well-studied when the volume of the input is truly massive, and cannot even be read into memory. This is captured by the streaming model of computation, where the aim is to approximate the cost of the solution in one pass over the data, using small space. As a result, streaming algorithms produce concise input summaries that approximately preserve the optimum value. We design the first efficient streaming algorithms for these fundamental problems in combinatorial optimization. For BIN PACKING, we provide a streaming asymptotic (1 + ε)-approximation wit
    corecore