1,168 research outputs found
Monitoring Networked Applications With Incremental Quantile Estimation
Networked applications have software components that reside on different
computers. Email, for example, has database, processing, and user interface
components that can be distributed across a network and shared by users in
different locations or work groups. End-to-end performance and reliability
metrics describe the software quality experienced by these groups of users,
taking into account all the software components in the pipeline. Each user
produces only some of the data needed to understand the quality of the
application for the group, so group performance metrics are obtained by
combining summary statistics that each end computer periodically (and
automatically) sends to a central server. The group quality metrics usually
focus on medians and tail quantiles rather than on averages. Distributed
quantile estimation is challenging, though, especially when passing large
amounts of data around the network solely to compute quality metrics is
undesirable. This paper describes an Incremental Quantile (IQ) estimation
method that is designed for performance monitoring at arbitrary levels of
network aggregation and time resolution when only a limited amount of data can
be transferred. Applications to both real and simulated data are provided.Comment: This paper commented in: [arXiv:0708.0317], [arXiv:0708.0336],
[arXiv:0708.0338]. Rejoinder in [arXiv:0708.0339]. Published at
http://dx.doi.org/10.1214/088342306000000583 in the Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org
Sequential Quantiles via Hermite Series Density Estimation
Sequential quantile estimation refers to incorporating observations into
quantile estimates in an incremental fashion thus furnishing an online estimate
of one or more quantiles at any given point in time. Sequential quantile
estimation is also known as online quantile estimation. This area is relevant
to the analysis of data streams and to the one-pass analysis of massive data
sets. Applications include network traffic and latency analysis, real time
fraud detection and high frequency trading. We introduce new techniques for
online quantile estimation based on Hermite series estimators in the settings
of static quantile estimation and dynamic quantile estimation. In the static
quantile estimation setting we apply the existing Gauss-Hermite expansion in a
novel manner. In particular, we exploit the fact that Gauss-Hermite
coefficients can be updated in a sequential manner. To treat dynamic quantile
estimation we introduce a novel expansion with an exponentially weighted
estimator for the Gauss-Hermite coefficients which we term the Exponentially
Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing
sequential quantile estimation algorithms in that they allow arbitrary
quantiles (as opposed to pre-specified quantiles) to be estimated at any point
in time. In doing so we provide a solution to online distribution function and
online quantile function estimation on data streams. In particular we derive an
analytical expression for the CDF and prove consistency results for the CDF
under certain conditions. In addition we analyse the associated quantile
estimator. Simulation studies and tests on real data reveal the Gauss-Hermite
based algorithms to be competitive with a leading existing algorithm.Comment: 43 pages, 9 figures. Improved version incorporating referee comments,
as appears in Electronic Journal of Statistic
On the Classification of Dynamical Data Streams Using Novel “Anti–Bayesian” Techniques
acceptedVersionpublishedVersionNivå2 N
- …