
    Recursive Sketching For Frequency Moments

    In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute $F_k$ (for $k>2$) in space complexity $O(\text{poly-log}(n,m)\cdot n^{1-\frac{2}{k}})$, which is optimal up to (large) poly-logarithmic factors in $n$ and $m$, where $m$ is the length of the stream and $n$ is the upper bound on the number of distinct elements in a stream. The best known lower bound for large moments is $\Omega(\log(n)\, n^{1-\frac{2}{k}})$. A follow-up work of Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic factors of Indyk and Woodruff to $O(\log^2(m)\cdot (\log n+ \log m)\cdot n^{1-\frac{2}{k}})$. Further reduction of the poly-log factors has been an elusive goal since 2006, when the Indyk and Woodruff method seemed to hit a natural "barrier." Using our simple recursive sketch, we provide a different yet simple approach to obtain an $O(\log(m)\log(nm)\cdot (\log\log n)^4\cdot n^{1-\frac{2}{k}})$ algorithm for constant $\epsilon$ (our bound is, in fact, somewhat stronger: the $(\log\log n)$ term can be replaced by any constant number of $\log$ iterations instead of just two or three, thus approaching $\log^* n$). Our bound also works for non-constant $\epsilon$ (for details see the body of the paper). Further, our algorithm requires only $4$-wise independence, in contrast to existing methods that use pseudo-random generators for computing large frequency moments.
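
For orientation, the abstract above does not restate the definition of $F_k$. Under the standard definition, the $k$-th frequency moment of a stream is $F_k = \sum_i f_i^k$, where $f_i$ is the number of occurrences of element $i$. The sketch below is an exact, linear-space baseline, not the paper's recursive sketch. The function name `frequency_moment` is a hypothetical name chosen for illustration.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact k-th frequency moment F_k = sum_i f_i**k.

    Illustrative baseline only: it stores one counter per distinct
    element, which is exactly the space cost that sublinear streaming
    sketches are designed to avoid.
    """
    counts = Counter(stream)  # f_i for every distinct element i
    return sum(c ** k for c in counts.values())


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "b"]  # f_a = 3, f_b = 2, f_c = 1
    for k in range(4):
        print(k, frequency_moment(stream, k))
```

With this definition, $F_0$ is the number of distinct elements and $F_1$ is the stream length $m$. The regime discussed in the abstract is $k > 2$.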

    Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution

    The degree distribution is one of the most fundamental graph properties of interest for real-world graphs. It has been widely observed in numerous domains that graphs typically have a tailed or scale-free degree distribution. While the average degree is usually quite small, the variance is quite high and there are vertices with degrees at all scales. We focus on the problem of approximating the degree distribution of a large streaming graph, with small storage. We design an algorithm headtail, whose main novelty is a new estimator of infrequent degrees using truncated geometric random variables. We give a mathematical analysis of headtail and show that it has excellent behavior in practice. We can process streams with millions of edges with storage less than 1% and get extremely accurate approximations for all scales in the degree distribution. We also introduce a new notion of Relative Hausdorff distance between tailed histograms. Existing notions of distances between distributions are not suitable, since they ignore infrequent degrees in the tail. The Relative Hausdorff distance measures deviations at all scales, and is a more suitable distance for comparing degree distributions. By tracking this new measure, we are able to give strong empirical evidence of the convergence of headtail.

    Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals

    Max-stable random sketches can be computed efficiently on fast streaming positive data sets by using only sequential access to the data. They can be used to answer point and Lp-norm queries for the signal. There is an intriguing connection between the so-called p-stable (or sum-stable) and the max-stable sketches. Rigorous performance guarantees through error-probability estimates are derived and the algorithmic implementation is discussed
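
The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes each index of the non-negative signal $f$ arrives exactly once, and that each sketch coordinate is $E_j = \max_i f(i)\, Z_j(i)$ with $Z_j(i)$ i.i.d. standard $p$-Fréchet, i.e. $P(Z \le x) = \exp(-x^{-p})$. Then $P(E_j \le x) = \exp(-\|f\|_p^p\, x^{-p})$, so the median of $E_j$ equals $\|f\|_p (\ln 2)^{-1/p}$ and the norm can be read off the sample median. All function names are hypothetical.

```python
import math
import random
import statistics


def frechet(p, rng):
    """Draw a standard p-Frechet variable by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() lies in [0, 1)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / p)


def max_stable_sketch(stream, p, k, seed=0):
    """One sequential pass; keeps only k running maxima.

    Assumes every (index, value) pair appears once and value >= 0.
    """
    rng = random.Random(seed)
    sketch = [0.0] * k
    for _index, value in stream:
        for j in range(k):
            sketch[j] = max(sketch[j], value * frechet(p, rng))
    return sketch


def estimate_lp_norm(sketch, p):
    """Median-based estimate: ||f||_p = median(E) * (ln 2)**(1/p)."""
    return statistics.median(sketch) * math.log(2) ** (1.0 / p)


if __name__ == "__main__":
    signal = [(i, float(i % 7 + 1)) for i in range(200)]
    p = 2
    exact = sum(v ** p for _, v in signal) ** (1.0 / p)
    approx = estimate_lp_norm(max_stable_sketch(signal, p, k=1000), p)
    print(exact, approx)
```

The median is used here for simplicity. Other estimators of the Fréchet scale parameter could be substituted without changing the sketch itself.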

    Optimization of Planck/LFI on--board data handling

    To assess stability against 1/f noise, the Low Frequency Instrument (LFI) onboard the Planck mission will acquire data at a rate much higher than the data rate allowed by its telemetry bandwidth of 35.5 kbps. The data are processed by an onboard pipeline, followed on the ground by a reversing step. This paper illustrates the LFI scientific onboard processing used to fit the allowed data rate. This is a lossy process tuned by using a set of 5 parameters Naver, r1, r2, q, O for each of the 44 LFI detectors. The paper quantifies the level of distortion introduced by the onboard processing, EpsilonQ, as a function of these parameters. It describes the method of optimizing the onboard processing chain. The tuning procedure is based on an optimization algorithm applied to unprocessed and uncompressed raw data provided either by simulations, prelaunch tests or data taken from LFI operating in diagnostic mode. All the needed optimization steps are performed by an automated tool, OCA2, which ends with optimized parameters and produces a set of statistical indicators, among them the compression rate Cr and EpsilonQ. For Planck/LFI the requirements are Cr = 2.4 and EpsilonQ <= 10% of the rms of the instrumental white noise. To speed up the process, an analytical model is developed that is able to extract most of the relevant information on EpsilonQ and Cr as a function of the signal statistics and the processing parameters. This model will be of interest for the instrument data analysis. The method was applied during ground tests when the instrument was operating in conditions representative of flight. Optimized parameters were obtained and the performance has been verified: the required data rate of 35.5 kbps has been achieved while keeping EpsilonQ at a level of 3.8% of the white noise rms, well within the requirements.
    Comment: 51 pages, 13 figs., 3 tables, pdflatex, needs JINST.csl, graphicx, txfonts, rotating; Issue 1.0 10 Nov 2009; Sub. to JINST 23Jun09, Accepted 10Nov09, Pub.: 29Dec09; This is a preprint, not the final version