Monitoring Networked Applications With Incremental Quantile Estimation
Networked applications have software components that reside on different
computers. Email, for example, has database, processing, and user interface
components that can be distributed across a network and shared by users in
different locations or work groups. End-to-end performance and reliability
metrics describe the software quality experienced by these groups of users,
taking into account all the software components in the pipeline. Each user
produces only some of the data needed to understand the quality of the
application for the group, so group performance metrics are obtained by
combining summary statistics that each end computer periodically (and
automatically) sends to a central server. The group quality metrics usually
focus on medians and tail quantiles rather than on averages. Distributed
quantile estimation is challenging, though, especially when passing large
amounts of data around the network solely to compute quality metrics is
undesirable. This paper describes an Incremental Quantile (IQ) estimation
method that is designed for performance monitoring at arbitrary levels of
network aggregation and time resolution when only a limited amount of data can
be transferred. Applications to both real and simulated data are provided.
Comment: This paper commented in: [arXiv:0708.0317], [arXiv:0708.0336],
[arXiv:0708.0338]. Rejoinder in [arXiv:0708.0339]. Published at
http://dx.doi.org/10.1214/088342306000000583 in the Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
Selection from read-only memory with limited workspace
Given an unordered array of $N$ elements drawn from a totally ordered set and
an integer $k$ in the range from $1$ to $N$, in the classic selection problem
the task is to find the $k$-th smallest element in the array. We study the
complexity of this problem in the space-restricted random-access model: The
input array is stored on read-only memory, and the algorithm has access to a
limited amount of workspace. We prove that the linear-time prune-and-search
algorithm---presented in most textbooks on algorithms---can be modified to use
$\Theta(N)$ bits instead of $\Theta(N)$ words of extra space. Prior to our
work, the best known algorithm by Frederickson could perform the task with
$\Theta(N)$ bits of extra space in $O(N \lg^{*} N)$ time. Our result separates
the space-restricted random-access model and the multi-pass streaming model,
since we can surpass the $\Omega(N \lg^{*} N)$ lower bound known for the latter
model. We also generalize our algorithm for the case when the size of the
workspace is $\Theta(S)$ bits, where $\lg^{3} N \leq S \leq N$. The running time
of our generalized algorithm is $O(N \lg^{*}(N/S) + N (\lg N) / \lg S)$,
slightly improving over the $O(N \lg^{*}(N (\lg N)/S) + N (\lg N) / \lg S)$
bound of Frederickson's algorithm. To obtain the improvements mentioned above,
we developed a new data structure, called the wavelet stack, that we use for
repeated pruning. We expect the wavelet stack to be a useful tool in other
applications as well.
Comment: 16 pages, 1 figure. Preliminary version appeared in COCOON-201
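For orientation, the textbook prune-and-search selection that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration assuming the standard median-of-medians pivot rule; it copies the surviving candidates in every round and therefore uses a linear number of words of workspace. It is not the paper's bit-space variant and does not use the wavelet stack. The function name `select` is chosen here for illustration only.

```python
def select(items, k):
    """Return the k-th smallest element (1-indexed) of items.

    Textbook prune-and-search with a median-of-medians pivot.
    The input is only read, never modified, but the candidate
    lists take a linear number of words of extra space.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be in the range 1..len(items)")
    candidates = list(items)
    while True:
        if len(candidates) <= 5:
            return sorted(candidates)[k - 1]
        # Pivot: the median of the medians of groups of (up to) five.
        groups = [candidates[i:i + 5] for i in range(0, len(candidates), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        pivot = select(medians, (len(medians) + 1) // 2)
        # Prune: keep only the side that can still contain the answer.
        smaller = [x for x in candidates if x < pivot]
        equal_count = sum(1 for x in candidates if x == pivot)
        if k <= len(smaller):
            candidates = smaller
        elif k <= len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            candidates = [x for x in candidates if x > pivot]
```

Calling `select(data, k)` for every `k` from 1 to `len(data)` should reproduce `sorted(data)`, which is a convenient sanity check.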
Tight Lower Bound for Comparison-Based Quantile Summaries
Quantiles, such as the median or percentiles, provide concise and useful
information about the distribution of a collection of items, drawn from a
totally ordered universe. We study data structures, called quantile summaries,
which keep track of all quantiles, up to an error of at most $\varepsilon$.
That is, an $\varepsilon$-approximate quantile summary first processes a stream
of items and then, given any quantile query $0 \le \phi \le 1$, returns an item
from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$.
We focus on comparison-based quantile summaries that can only
compare two items and are otherwise completely oblivious of the universe.
The best such deterministic quantile summary to date, due to Greenwald and
Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon} \cdot \log \varepsilon N)$
items, where $N$ is the number of items in the stream. We prove
that this space bound is optimal by showing a matching lower bound. Our result
thus rules out the possibility of constructing a deterministic comparison-based
quantile summary in space $f(\varepsilon) \cdot o(\log N)$, for any function $f$
that does not depend on $N$. As a corollary, we improve the lower bound for
biased quantiles, which provide a stronger, relative-error guarantee of
$(1 \pm \varepsilon) \cdot \phi$, and for other related computational tasks.
Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and
some other parts of the paper
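The approximation guarantee described in the abstract above can be made concrete with a small offline checker. This is a sketch under stated assumptions: ranks are 1-indexed, a repeated item is taken to occupy every rank among its duplicates, and an answer is accepted when one of its ranks lies within the tolerance (error parameter times stream length) of the target rank (query fraction times stream length). The function name `is_eps_approximate` is hypothetical, and the code checks an answer against the full data rather than implementing any streaming summary.

```python
from bisect import bisect_left, bisect_right


def is_eps_approximate(stream, phi, eps, answer):
    """Check whether `answer` is an acceptable reply to quantile query `phi`.

    Accepts the answer if it occurs in the stream and one of the
    (1-indexed) ranks it occupies in sorted order lies in the interval
    [phi*n - eps*n, phi*n + eps*n], where n is the stream length.
    """
    ordered = sorted(stream)
    n = len(ordered)
    lowest_rank = bisect_left(ordered, answer) + 1
    highest_rank = bisect_right(ordered, answer)
    if highest_rank < lowest_rank:
        return False  # the answer is not an item from the stream
    low = phi * n - eps * n
    high = phi * n + eps * n
    # The rank range of the answer must overlap the allowed rank window.
    return lowest_rank <= high and highest_rank >= low
```

For example, with the items 1 to 10, a query fraction of 0.5 and an error of 0.1, the items of rank 4, 5 and 6 are accepted and all others are rejected.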
Comment: Monitoring Networked Applications With Incremental Quantile Estimation
Our comments are in two parts. First, we make some observations regarding the
methodology in Chambers et al. [arXiv:0708.0302]. Second, we briefly describe
another interesting network monitoring problem that arises in the context of
assessing quality of service, such as loss rates and delay distributions, in
packet-switched networks.
Comment: Published at http://dx.doi.org/10.1214/088342306000000600 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A Fast Algorithm for Approximate Quantiles in High Speed Data Streams
We present a fast algorithm for computing approximate quantiles in high speed data streams with deterministic error bounds. For data streams of size N, where N is unknown in advance, our algorithm partitions the stream into sub-streams of exponentially increasing size as they arrive. For each sub-stream, which has a fixed size, we compute and maintain a multi-level summary structure using a novel algorithm. In order to achieve high speed performance, the algorithm uses simple block-wise merge and sample operations. Overall, our algorithms for fixed-size streams and arbitrary-size streams have a computational cost of $O(N \log(\frac{1}{\epsilon} \log \epsilon N))$ and an average per-element update cost of $O(\log \log N)$ if $\epsilon$ is fixed.
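As a rough illustration of what a block-wise merge and sample step can look like, the sketch below merges two sorted blocks of equal size and keeps every second item, so the result has the size of one block and each surviving item stands for twice as many original items. This is a generic operation of the kind used in multi-level quantile summaries, written under that assumption; the abstract does not spell out the paper's own procedure, and the name `merge_and_sample` is hypothetical.

```python
from heapq import merge


def merge_and_sample(block_a, block_b):
    """Merge two sorted blocks and keep every second item.

    Both inputs must already be sorted. The output is sorted and has
    half as many items as the two inputs combined.
    """
    merged = list(merge(block_a, block_b))
    return merged[1::2]  # keep the 2nd, 4th, 6th, ... merged item
```

For instance, `merge_and_sample([1, 4, 7, 10], [2, 3, 8, 9])` returns `[2, 4, 8, 10]`.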
Approximate Quantile Computation over Sensor Networks
Sensor networks have been deployed in various environments, from battlefield surveillance to weather monitoring. The amount of data generated by the sensors can be large. One way to analyze such large data sets is to capture the essential statistics of the data. Quantile computation in large-scale sensor networks thus becomes an important but challenging problem. The data may be widely distributed, e.g., there may be thousands of sensors. In addition, the memory and bandwidth among sensors can be quite limited. Most previous quantile computation methods assume that the data is either stored at or streamed to a centralized site, so they cannot be applied directly in the sensor environment. In this paper, we propose a novel algorithm to compute quantiles for sensor network data, which dynamically adapts to the memory limitations. Moreover, since sensors may update their values at any time, an incremental maintenance algorithm is developed to reduce the number of times that a global recomputation is needed upon updates. The performance and complexity of our algorithms are analyzed both theoretically and empirically on various large data sets, which demonstrates the high promise of our method.