1,889 research outputs found

    Estimating Entropy of Data Streams Using Compressed Counting

    The Shannon entropy is a widely used summary statistic, with applications in network traffic measurement, anomaly detection, neural computations, spike trains, etc. This study focuses on estimating the Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, which are both functions of the p-th frequency moments and approach Shannon entropy as p → 1. Compressed Counting (CC) is a new method for approximating the p-th frequency moments of data streams. Our contributions include: 1) We prove that Rényi entropy is (much) better than Tsallis entropy for approximating Shannon entropy. 2) We propose the optimal quantile estimator for CC, which considerably improves on the previous estimators. 3) Our experiments demonstrate that CC is indeed highly effective in approximating the moments and entropies. We also demonstrate the crucial importance of utilizing the variance-bias trade-off.
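The limit relation in this abstract can be illustrated with a minimal Python sketch. It computes the p-th frequency moment exactly from item counts, so it is not Compressed Counting and says nothing about the estimators proposed in the paper; the function names and the toy stream are assumptions made for illustration only.

```python
import math
from collections import Counter

def frequency_moment(counts, p):
    # p-th frequency moment F_p = sum_i A_i^p, computed exactly (no sketching)
    return sum(c ** p for c in counts)

def shannon_entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def renyi_entropy(counts, p):
    # H_p = log(F_p / F_1^p) / (1 - p), defined for p != 1
    total = sum(counts)
    return math.log(frequency_moment(counts, p) / total ** p) / (1.0 - p)

def tsallis_entropy(counts, p):
    # T_p = (1 - F_p / F_1^p) / (p - 1), defined for p != 1
    total = sum(counts)
    return (1.0 - frequency_moment(counts, p) / total ** p) / (p - 1.0)

# Hypothetical toy stream; a real stream would be summarized by a sketch instead.
stream = list("aaaabbbccd")
counts = list(Counter(stream).values())

print("Shannon:", shannon_entropy(counts))
for p in (1.5, 1.1, 1.01, 1.001):
    print(p, renyi_entropy(counts, p), tsallis_entropy(counts, p))
```

Both approximations move toward the Shannon value as p gets closer to 1, which is the property the abstract relies on.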

    Sequential Quantiles via Hermite Series Density Estimation

    Sequential quantile estimation refers to incorporating observations into quantile estimates in an incremental fashion, thus furnishing an online estimate of one or more quantiles at any given point in time. Sequential quantile estimation is also known as online quantile estimation. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real-time fraud detection and high-frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the static quantile estimation setting we apply the existing Gauss-Hermite expansion in a novel manner. In particular, we exploit the fact that Gauss-Hermite coefficients can be updated in a sequential manner. To treat dynamic quantile estimation we introduce a novel expansion with an exponentially weighted estimator for the Gauss-Hermite coefficients, which we term the Exponentially Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time. In doing so we provide a solution to online distribution function and online quantile function estimation on data streams. In particular, we derive an analytical expression for the CDF and prove consistency results for the CDF under certain conditions. In addition, we analyse the associated quantile estimator. Simulation studies and tests on real data reveal the Gauss-Hermite based algorithms to be competitive with a leading existing algorithm.
    Comment: 43 pages, 9 figures. Improved version incorporating referee comments, as appears in Electronic Journal of Statistics
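The idea of sequentially updated Hermite coefficients can be sketched roughly as follows. This is not the authors' algorithm: the class name, the default order, the `weight` parameter, and the numerical integration of the density (the abstract mentions an analytical CDF expression, which is not reproduced here) are all assumptions, and the data are assumed to be roughly standardized.

```python
import math
import random

def hermite_functions(x, order):
    # Orthonormal Hermite functions h_0(x)..h_order(x) via the standard recurrence
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if order >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for k in range(1, order):
        h.append(math.sqrt(2.0 / (k + 1)) * x * h[k]
                 - math.sqrt(k / (k + 1)) * h[k - 1])
    return h

class HermiteQuantileSketch:
    """Hypothetical illustration: stores only order + 1 coefficients."""

    def __init__(self, order=6, weight=None):
        self.order = order
        # weight=None: running mean (static setting);
        # 0 < weight < 1: exponential weighting (dynamic setting)
        self.weight = weight
        self.coeffs = [0.0] * (order + 1)
        self.n = 0

    def update(self, x):
        self.n += 1
        step = 1.0 / self.n if self.weight is None or self.n == 1 else self.weight
        hx = hermite_functions(x, self.order)
        self.coeffs = [a + step * (h - a) for a, h in zip(self.coeffs, hx)]

    def density(self, x):
        hx = hermite_functions(x, self.order)
        return sum(a * h for a, h in zip(self.coeffs, hx))

    def quantile(self, q, lo=-6.0, hi=6.0, steps=1200):
        # Trapezoidal integration of the density estimate until the CDF reaches q
        dx = (hi - lo) / steps
        cdf, prev = 0.0, self.density(lo)
        for i in range(1, steps + 1):
            x = lo + i * dx
            cur = self.density(x)
            cdf += 0.5 * (prev + cur) * dx
            if cdf >= q:
                return x
            prev = cur
        return hi

rng = random.Random(0)
sketch = HermiteQuantileSketch(order=6)
for _ in range(5000):
    sketch.update(rng.gauss(0.0, 1.0))
print(sketch.quantile(0.5), sketch.quantile(0.9))
```

Because the whole density estimate is retained through its coefficients, any quantile can be requested after the fact, which is the property the abstract contrasts with methods that track pre-specified quantiles.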

    Simulation of the spatio-temporal extent of groundwater flooding using statistical methods of hydrograph classification and lumped parameter models

    This article presents the development of a relatively low-cost and rapidly applicable methodology to simulate the spatio-temporal occurrence of groundwater flooding in chalk catchments. In winter 2000/2001 extreme rainfall resulted in anomalously high groundwater levels and groundwater flooding in many chalk catchments of northern Europe and the southern United Kingdom. Groundwater flooding was extensive and prolonged, occurring in areas where it had not been recently observed and, in places, lasting for 6 months. In many of these catchments, the prediction of groundwater flooding is hindered by the lack of an appropriate tool, such as a distributed groundwater model, or the inability of models to simulate extremes adequately. A set of groundwater hydrographs is simulated using a simple lumped parameter groundwater model. The number of models required is minimized through the classification and grouping of groundwater level time-series using principal component analysis and cluster analysis. One representative hydrograph is modelled and then transposed to other observed hydrographs in the same group by the process of quantile mapping. Time-variant groundwater level surfaces, generated using the discrete set of modelled hydrographs and river elevation data, are overlain on a digital terrain model to predict the spatial extent of groundwater flooding. The methodology is applied to the Pang and Lambourn catchments in southern England, for which monthly groundwater level time-series exist for 52 observation boreholes covering the period 1975–2004. The results are validated against observed groundwater flood extent data obtained from aerial surveys and field mapping. The method is shown to simulate the spatial and temporal occurrence of flooding during the 2000/2001 flood event accurately.
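The quantile mapping step named in this abstract can be sketched in generic form. The version below is plain empirical quantile mapping with linear interpolation; how the authors actually construct and apply the mapping is not stated in the abstract, so the function names, the interpolation rule, and the sample borehole levels are all assumptions.

```python
from bisect import bisect_right

def empirical_probability(sorted_ref, x):
    # Position of x in the reference distribution, scaled to [0, 1]
    n = len(sorted_ref)
    if x <= sorted_ref[0]:
        return 0.0
    if x >= sorted_ref[-1]:
        return 1.0
    i = bisect_right(sorted_ref, x)  # sorted_ref[i - 1] <= x < sorted_ref[i]
    lo, hi = sorted_ref[i - 1], sorted_ref[i]
    return (i - 1 + (x - lo) / (hi - lo)) / (n - 1)

def empirical_quantile(sorted_values, p):
    # Linear interpolation between order statistics
    n = len(sorted_values)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def quantile_map(values, reference, target):
    # Map each value to the target level with the same empirical probability
    ref, tgt = sorted(reference), sorted(target)
    return [empirical_quantile(tgt, empirical_probability(ref, v)) for v in values]

# Hypothetical groundwater levels (metres) at a representative and a target borehole
representative_obs = [10.0, 11.0, 12.0, 13.0, 14.0]
target_obs = [50.0, 52.0, 54.0, 56.0, 58.0]
simulated_representative = [10.0, 12.0, 13.5, 20.0]

print(quantile_map(simulated_representative, representative_obs, target_obs))
```

Values outside the reference range are clamped to the target extremes in this sketch, which is a simplification that would matter when simulating flood extremes.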

    UDDSketch: Accurate Tracking of Quantiles in Data Streams

    I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno, G. Morleo