Recursive Sketching For Frequency Moments
In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to
compute $F_k$ (for $k>2$) in space complexity $O(\mbox{\em poly-log}(n,m)\cdot
n^{1-\frac{2}{k}})$, which is optimal up to (large) poly-logarithmic factors in
$n$ and $m$, where $m$ is the length of the stream and $n$ is the upper bound on
the number of distinct elements in a stream. The best known lower bound for
large moments is $\Omega(\log(n)\cdot n^{1-\frac{2}{k}})$. A follow-up work of
Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic
factors of Indyk and Woodruff to $O(k^2\log^2(m)\cdot\epsilon^{-2-4/k})$.
Further reduction of poly-log factors has been an elusive
goal since 2006, when the Indyk and Woodruff method seemed to hit a natural
"barrier." Using our simple recursive sketch, we provide a different yet simple
approach to obtain an $O(\epsilon^{-2}\log(m)\log(n)\cdot(\log\log(m))^4)$
algorithm for constant $\epsilon$ (our bound is, in fact, somewhat
stronger, where the $(\log\log(m))$ term can be replaced by any constant number
of $\log$ iterations instead of just two or three, thus approaching
$\epsilon^{-2}\log(m)\log(n)$). Our bound also works for non-constant
$\epsilon$ (for details see the body of the paper). Further, our algorithm
requires only $4$-wise independence, in contrast to existing methods that use
pseudo-random generators for computing large frequency moments.
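The paper's recursive sketch for large moments ($k>2$) is more involved, but the role of limited independence can be illustrated with the classical AMS sketch for $F_2$, which needs only a 4-wise independent sign hash. The sketch below is illustrative, not the paper's construction; the names `FourWiseSign` and `estimate_f2` are invented, and the parity map from a random degree-3 polynomial over a prime field gives signs that are 4-wise independent up to a negligible bias from the odd modulus.

```python
import random

P = 2_147_483_647  # prime modulus (2^31 - 1) for the hash polynomial

class FourWiseSign:
    """4-wise independent {-1, +1} hash: evaluate a random degree-3
    polynomial over GF(P) and take the parity of the result as a sign."""
    def __init__(self, rng):
        self.coeffs = [rng.randrange(P) for _ in range(4)]

    def __call__(self, x):
        h = 0
        for c in self.coeffs:  # Horner evaluation of the polynomial at x
            h = (h * x + c) % P
        return 1 if h & 1 else -1

def estimate_f2(stream, reps=100, seed=0):
    """AMS-style estimator of F_2 = sum_i f_i^2.  For each repetition,
    Z = sum over the stream of s(item); 4-wise independence of s gives
    E[Z^2] = F_2 and bounded variance, so averaging repetitions works."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s = FourWiseSign(rng)
        z = sum(s(x) for x in stream)
        total += z * z
    return total / reps
```

For example, the stream `[1]*3 + [2]*4` has frequencies 3 and 4, so $F_2 = 25$; `estimate_f2` returns a value close to that, with accuracy improving in `reps`.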
Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution
The degree distribution is one of the most fundamental graph properties of
interest for real-world graphs. It has been widely observed in numerous domains
that graphs typically have a tailed or scale-free degree distribution. While
the average degree is usually quite small, the variance is quite high and there
are vertices with degrees at all scales. We focus on the problem of
approximating the degree distribution of a large streaming graph, with small
storage. We design an algorithm headtail, whose main novelty is a new estimator
of infrequent degrees using truncated geometric random variables. We give a
mathematical analysis of headtail and show that it has excellent behavior in
practice. We can process streams will millions of edges with storage less than
1% and get extremely accurate approximations for all scales in the degree
distribution.
We also introduce a new notion of Relative Hausdorff distance between tailed
histograms. Existing notions of distances between distributions are not
suitable, since they ignore infrequent degrees in the tail. The Relative
Hausdorff distance measures deviations at all scales, and is a more suitable
distance for comparing degree distributions. By tracking this new measure, we
are able to give strong empirical evidence of the convergence of headtail.
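A minimal sketch of such a measure, assuming the (paraphrased) definition that the Relative Hausdorff distance is the smallest eps such that every point of one histogram is within relative error eps, in both the degree and the count coordinate, of some point of the other; the function names are invented for illustration.

```python
def directed_rh(f, g):
    """Smallest eps such that every degree d in f has some d2 in g with
    |d - d2| <= eps*d and |f[d] - g[d2]| <= eps*f[d].
    f, g: dicts mapping degree -> count (counts assumed positive)."""
    eps = 0.0
    for d, fd in f.items():
        # Relative deviation to the best-matching point of g, taking the
        # worse of the degree-axis and count-axis relative errors.
        best = min(max(abs(d - d2) / d, abs(fd - g2) / fd)
                   for d2, g2 in g.items())
        eps = max(eps, best)
    return eps

def relative_hausdorff(f, g):
    """Symmetrize: worst deviation in either direction."""
    return max(directed_rh(f, g), directed_rh(g, f))
```

Because every point, including rare high-degree ones, must be matched within *relative* error, deviations in the tail count as much as deviations at the head, unlike total-variation-style distances that are dominated by frequent degrees.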
Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals
Max-stable random sketches can be computed efficiently on fast streaming
positive data sets by using only sequential access to the data. They can be
used to answer point and Lp-norm queries for the signal. There is an intriguing
connection between the so-called p-stable (or sum-stable) and the max-stable
sketches. Rigorous performance guarantees through error-probability estimates
are derived and the algorithmic implementation is discussed.
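A hedged sketch of the basic mechanism: each sketch coordinate takes the maximum of the signal values scaled by i.i.d. alpha-Frechet variables, and max-stability makes each coordinate alpha-Frechet with scale equal to the signal's Lp-norm (p = alpha), which a median estimator then recovers. The function names and the specific median-rescaling estimator are illustrative choices, not necessarily the paper's.

```python
import math
import random

def max_stable_sketch(values, k, alpha, seed=0):
    """k-coordinate max-stable sketch of a nonnegative signal.
    Coordinate j is max_i values[i] * Z_ij with Z_ij i.i.d. alpha-Frechet
    (generated as (-ln U)^(-1/alpha) for uniform U).  By max-stability,
    each coordinate is alpha-Frechet with scale ||values||_alpha."""
    rng = random.Random(seed)
    sketch = [0.0] * k
    for v in values:  # one sequential pass over the signal
        for j in range(k):
            z = (-math.log(rng.random())) ** (-1.0 / alpha)
            sketch[j] = max(sketch[j], v * z)
    return sketch

def estimate_lp_norm(sketch, alpha):
    """The median of a standard alpha-Frechet variable is (ln 2)^(-1/alpha),
    so rescaling the sample median by (ln 2)^(1/alpha) estimates the scale,
    i.e. the Lp-norm of the signal."""
    s = sorted(sketch)
    med = s[len(s) // 2]
    return med * (math.log(2.0)) ** (1.0 / alpha)
```

For instance, the signal `[3.0, 4.0]` with `alpha=2` has L2-norm 5, and the estimate concentrates around that value as `k` grows.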
Optimization of Planck/LFI on--board data handling
To assess stability against 1/f noise, the Low Frequency Instrument (LFI)
onboard the Planck mission will acquire data at a rate much higher than the
data rate allowed by its telemetry bandwidth of 35.5 kbps. The data are
processed by an onboard pipeline, followed on the ground by a reversing step. This
paper illustrates the LFI scientific onboard processing to fit the allowed
datarate. This is a lossy process tuned by using a set of 5 parameters Naver,
r1, r2, q, O for each of the 44 LFI detectors. The paper quantifies the level
of distortion introduced by the onboard processing, EpsilonQ, as a function of
these parameters. It describes the method of optimizing the onboard processing
chain. The tuning procedure is based on an optimization algorithm applied to
unprocessed and uncompressed raw data provided either by simulations, prelaunch
tests or data taken from LFI operating in diagnostic mode. All the needed
optimization steps are performed by an automated tool, OCA2, which ends with
optimized parameters and produces a set of statistical indicators, among them
the compression rate Cr and EpsilonQ. For Planck/LFI the requirements are Cr =
2.4 and EpsilonQ <= 10% of the rms of the instrumental white noise. To speed up
the process an analytical model is developed that is able to extract most of
the relevant information on EpsilonQ and Cr as a function of the signal
statistics and the processing parameters. This model will be of interest for
the instrument data analysis. The method was applied during ground tests when
the instrument was operating in conditions representative of flight. Optimized
parameters were obtained and the performance has been verified: the required
data rate of 35.5 kbps has been achieved while keeping EpsilonQ at a level of
3.8% of the white-noise rms, well within the requirements.
Comment: 51 pages, 13 figures, 3 tables, pdflatex, needs JINST.csl, graphicx,
txfonts, rotating; Issue 1.0, 10 Nov 2009; Sub. to JINST 23 Jun 09, Accepted
10 Nov 09, Pub. 29 Dec 09; This is a preprint, not the final version.
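The abstract names the processing parameters (Naver, r1, r2, q, O) but not the exact processing chain, so the following is only a minimal sketch of the quantization step and of measuring a distortion figure like EpsilonQ; `quantize`, `dequantize`, and `epsilon_q` are hypothetical names, and the averaging/mixing stages are omitted. For a uniform quantizer with step q applied to noise with unit rms, the distortion rms is roughly q/sqrt(12).

```python
import math
import random

def quantize(samples, q, offset):
    """Lossy step: uniform quantization to integer codes (the onboard
    direction); round((x + offset) / q)."""
    return [round((x + offset) / q) for x in samples]

def dequantize(codes, q, offset):
    """Reversing step (the on-ground direction)."""
    return [c * q - offset for c in codes]

def epsilon_q(samples, q, offset=0.0):
    """Distortion of the quantize/dequantize round trip, expressed as a
    fraction of the rms of the input samples."""
    restored = dequantize(quantize(samples, q, offset), q, offset)
    err_rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(samples, restored))
                        / len(samples))
    sig_rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return err_rms / sig_rms

# Simulated white noise with unit rms; with q = 0.3 the expected
# distortion is about q / sqrt(12), i.e. roughly 9% of the noise rms.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
```

This mirrors the trade-off the paper tunes: a larger q raises the compression rate Cr but also raises EpsilonQ toward the 10% requirement.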