    Monitoring Networked Applications With Incremental Quantile Estimation

    Networked applications have software components that reside on different computers. Email, for example, has database, processing, and user interface components that can be distributed across a network and shared by users in different locations or work groups. End-to-end performance and reliability metrics describe the software quality experienced by these groups of users, taking into account all the software components in the pipeline. Each user produces only some of the data needed to understand the quality of the application for the group, so group performance metrics are obtained by combining summary statistics that each end computer periodically (and automatically) sends to a central server. The group quality metrics usually focus on medians and tail quantiles rather than on averages. Distributed quantile estimation is challenging, though, especially when passing large amounts of data around the network solely to compute quality metrics is undesirable. This paper describes an Incremental Quantile (IQ) estimation method that is designed for performance monitoring at arbitrary levels of network aggregation and time resolution when only a limited amount of data can be transferred. Applications to both real and simulated data are provided. Comment: This paper commented in: [arXiv:0708.0317], [arXiv:0708.0336], [arXiv:0708.0338]. Rejoinder in [arXiv:0708.0339]. Published at http://dx.doi.org/10.1214/088342306000000583 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Selection from read-only memory with limited workspace

    Given an unordered array of $N$ elements drawn from a totally ordered set and an integer $k$ in the range from $1$ to $N$, in the classic selection problem the task is to find the $k$-th smallest element in the array. We study the complexity of this problem in the space-restricted random-access model: the input array is stored on read-only memory, and the algorithm has access to a limited amount of workspace. We prove that the linear-time prune-and-search algorithm, presented in most textbooks on algorithms, can be modified to use $\Theta(N)$ bits instead of $\Theta(N)$ words of extra space. Prior to our work, the best known algorithm by Frederickson could perform the task with $\Theta(N)$ bits of extra space in $O(N \lg^{*} N)$ time. Our result separates the space-restricted random-access model and the multi-pass streaming model, since we can surpass the $\Omega(N \lg^{*} N)$ lower bound known for the latter model. We also generalize our algorithm for the case when the size of the workspace is $\Theta(S)$ bits, where $\lg^3{N} \leq S \leq N$. The running time of our generalized algorithm is $O(N \lg^{*}(N/S) + N (\lg N) / \lg S)$, slightly improving over the $O(N \lg^{*}(N (\lg N)/S) + N (\lg N) / \lg S)$ bound of Frederickson's algorithm. To obtain the improvements mentioned above, we developed a new data structure, called the wavelet stack, that we use for repeated pruning. We expect the wavelet stack to be a useful tool in other applications as well. Comment: 16 pages, 1 figure, Preliminary version appeared in COCOON-201
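To make the read-only setting concrete, here is a minimal Python sketch of a baseline selection routine that never writes to the input and keeps only a constant number of extra variables. It is not the prune-and-search algorithm or the wavelet stack from the abstract above; it is a simple randomized narrowing scheme that needs expected O(N log N) time, and the function name `select_read_only` is a hypothetical label chosen for illustration.

```python
import random

def select_read_only(a, k):
    """Return the k-th smallest element (1-based) of `a` without modifying `a`.

    Illustrative baseline only: keeps O(1) extra values (two bounds, a pivot,
    a few counters) and rescans the read-only input on every round.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k must be in the range 1..len(a)")
    lo, hi = None, None  # the answer lies strictly between lo and hi
    while True:
        # Pass 1: reservoir-sample one pivot among the remaining candidates.
        pivot, seen = None, 0
        for x in a:
            if (lo is None or x > lo) and (hi is None or x < hi):
                seen += 1
                if random.randrange(seen) == 0:
                    pivot = x
        # Pass 2: count how many elements fall below or equal the pivot.
        less = sum(1 for x in a if x < pivot)
        equal = sum(1 for x in a if x == pivot)
        if k <= less:
            hi = pivot              # answer is smaller than the pivot
        elif k <= less + equal:
            return pivot            # pivot occupies rank k
        else:
            lo = pivot              # answer is larger than the pivot
```

For example, `select_read_only([5, 3, 9, 1, 3, 7], 4)` returns `5`, and the input list is left untouched.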

    Tight Lower Bound for Comparison-Based Quantile Summaries

    Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0 \le \phi \le 1$, returns an item from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon} \cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon) \cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1 \pm \varepsilon) \cdot \phi$, and for other related computational tasks. Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and some other parts of the pape
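The approximation guarantee in the abstract above can be checked directly against the full stream. The Python sketch below is only a brute-force verifier, not a quantile summary. It assumes one common rank convention (an item with `less` smaller items and `equal` copies occupies the 1-based ranks `less + 1` through `less + equal`, and a φ'-quantile is an item of rank φ'·N); the abstract does not fix a convention, and the function name `is_eps_approximate` is hypothetical.

```python
def is_eps_approximate(stream, phi, eps, answer):
    """Check whether `answer` is a phi'-quantile of `stream` for some
    phi' in [phi - eps, phi + eps], under the rank convention stated above.
    """
    n = len(stream)
    less = sum(1 for x in stream if x < answer)
    equal = sum(1 for x in stream if x == answer)
    if equal == 0:
        return False  # the summary must return an item from the stream
    lo_rank = (phi - eps) * n   # smallest acceptable rank
    hi_rank = (phi + eps) * n   # largest acceptable rank
    # Accept if the rank range of `answer` overlaps [lo_rank, hi_rank].
    return less + 1 <= hi_rank and less + equal >= lo_rank
```

With the stream 1, 2, ..., 100, a median query (φ = 0.5) and ε = 0.05, the item 52 is an acceptable answer while the item 70 is not.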

    Comment: Monitoring Networked Applications With Incremental Quantile Estimation

    Our comments are in two parts. First, we make some observations regarding the methodology in Chambers et al. [arXiv:0708.0302]. Second, we briefly describe another interesting network monitoring problem that arises in the context of assessing quality of service, such as loss rates and delay distributions, in packet-switched networks. Comment: Published at http://dx.doi.org/10.1214/088342306000000600 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A Fast Algorithm for Approximate Quantiles in High Speed Data Streams

    We present a fast algorithm for computing approximate quantiles in high-speed data streams with deterministic error bounds. For data streams of size $N$, where $N$ is unknown in advance, our algorithm partitions the stream into sub-streams of exponentially increasing size as they arrive. For each sub-stream, which has a fixed size, we compute and maintain a multi-level summary structure using a novel algorithm. To achieve high-speed performance, the algorithm uses simple block-wise merge and sample operations. Overall, our algorithms for fixed-size streams and arbitrary-size streams have a computational cost of $O(N \log(\frac{1}{\varepsilon} \log \varepsilon N))$ and an average per-element update cost of $O(\log \log N)$ if $\varepsilon$ is fixed.
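The abstract above does not spell out its block-wise merge and sample operations. As a rough illustration of the general idea, the Python sketch below merges two sorted blocks and keeps every second element, a step used in many multi-level quantile summaries. It is an assumption-laden sketch, not the authors' algorithm, and the name `merge_and_sample` is hypothetical.

```python
import heapq

def merge_and_sample(block_a, block_b):
    """Merge two sorted blocks and keep every second element.

    Generic illustration: two blocks of size b are merged into one sorted
    run of size 2b, then halved so the result again has size b. Each kept
    element stands in for two elements of the merged run.
    """
    merged = list(heapq.merge(block_a, block_b))  # linear-time sorted merge
    return merged[1::2]                           # keep positions 1, 3, 5, ...
```

For example, merging `[1, 4, 7, 10]` with `[2, 3, 8, 9]` gives the sorted run `[1, 2, 3, 4, 7, 8, 9, 10]`, and sampling every second element leaves `[2, 4, 8, 10]`.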

    Approximate Quantile Computation over Sensor Networks

    Sensor networks have been deployed in various environments, from battlefield surveillance to weather monitoring. The amount of data generated by the sensors can be large. One way to analyze such a large data set is to capture the essential statistics of the data. Quantile computation in large-scale sensor networks therefore becomes an important but challenging problem. The data may be widely distributed, e.g., there may be thousands of sensors. In addition, the memory and bandwidth among sensors could be quite limited. Most previous quantile computation methods assume that the data is either stored or streaming in a centralized site, so they cannot be directly applied in the sensor environment. In this paper, we propose a novel algorithm to compute quantiles for sensor network data, which dynamically adapts to the memory limitations. Moreover, since sensors may update their values at any time, an incremental maintenance algorithm is developed to reduce the number of times that a global recomputation is needed upon updates. The performance and complexity of our algorithms are analyzed both theoretically and empirically on various large data sets, which demonstrate the high promise of our method.