Optimal Gossip Algorithms for Exact and Approximate Quantile Computations
This paper gives drastically faster gossip algorithms to compute exact and
approximate quantiles.
Gossip algorithms, which allow each node to contact a uniformly random other
node in each round, have been intensely studied and adopted in many
applications due to their fast convergence and their robustness to failures.
Kempe et al. [FOCS'03] gave gossip algorithms to compute important aggregate
statistics if every node is given a value. In particular, they gave a beautiful
$O(\log n + \log \frac{1}{\varepsilon})$ round algorithm to $\varepsilon$-approximate
the sum of all values and an $O(\log^2 n)$ round algorithm to compute the exact
$\phi$-quantile, i.e., the $\lceil \phi n \rceil$-th smallest value.
We give a quadratically faster and in fact optimal gossip algorithm for the
exact $\phi$-quantile problem, which runs in $O(\log n)$ rounds. We furthermore
show that one can achieve an exponential speedup if one allows for an
$\varepsilon$-approximation. We give an $O(\log \log n + \log \frac{1}{\varepsilon})$
round gossip algorithm which computes a value of rank between $\phi n$ and
$(\phi + \varepsilon) n$ at every node. Our algorithms are extremely simple and very robust: they can
be operated with the same running times even if every transmission fails with
a (potentially different) constant probability. We also give a matching
$\Omega(\log \log n + \log \frac{1}{\varepsilon})$ lower bound, which shows that
our algorithm is optimal for all values of $\varepsilon$.
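To make the definitions concrete: the exact $\phi$-quantile is the $\lceil \phi n \rceil$-th smallest value, and the small simulation below illustrates why tournament-style gossip concentrates quickly. It is a simplified sketch under stated assumptions, not the authors' algorithm: it handles only the median ($\phi = 1/2$), runs synchronous rounds with no transmission failures, and uses an assumed "adopt the median of three random samples" update rule. The names `exact_quantile` and `median_gossip` are hypothetical.

```python
import math
import random


def exact_quantile(values, phi):
    """Return the ceil(phi * n)-th smallest value (1-indexed), for 0 < phi <= 1."""
    ordered = sorted(values)
    k = max(1, math.ceil(phi * len(ordered)))
    return ordered[k - 1]


def median_gossip(values, rounds, rng):
    """Simplified synchronous gossip toward the median (phi = 1/2 only).

    Assumed rule, for illustration: in every round each node pulls the
    previous-round values of three uniformly random nodes and adopts their
    median. Not the algorithm from the paper.
    """
    current = list(values)
    n = len(current)
    for _ in range(rounds):
        snapshot = current[:]  # all nodes read values from the previous round
        for i in range(n):
            a, b, c = (snapshot[rng.randrange(n)] for _ in range(3))
            current[i] = sorted((a, b, c))[1]  # median of the three samples
    return current


if __name__ == "__main__":
    n = 1000
    rng = random.Random(0)
    values = list(range(n))  # distinct values: value v has 0-based rank v
    rng.shuffle(values)
    print("exact median:", exact_quantile(values, 0.5))  # exact median: 499
    held = median_gossip(values, rounds=20, rng=rng)
    print("values held after 20 rounds:", min(held), "to", max(held))
```

Once no node holds a value below some threshold, the median of three such values cannot fall below it either (and symmetrically for an upper threshold), so extreme values die out round by round. Handling general $\phi$, exact answers, and failed transmissions is beyond this sketch.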
Approximate Quantile Computation over Sensor Networks
Sensor networks have been deployed in various environments, from battlefield surveillance to weather monitoring. The amount of data generated by the sensors can be large. One way to analyze such large data sets is to capture the essential statistics of the data. Thus, quantile computation in large-scale sensor networks becomes an important but challenging problem. The data may be widely distributed, e.g., there may be thousands of sensors. In addition, the memory and bandwidth available to sensors may be quite limited. Most previous quantile computation methods assume that the data is either stored or streamed at a centralized site, so they cannot be applied directly in the sensor environment. In this paper, we propose a novel algorithm to compute quantiles of sensor network data, which dynamically adapts to the memory limitations. Moreover, since sensors may update their values at any time, we develop an incremental maintenance algorithm that reduces the number of global recomputations needed upon updates. The performance and complexity of our algorithms are analyzed both theoretically and empirically on various large data sets, and the results demonstrate the promise of our method.
Optimal Exploitation of the Sentinel-2 Spectral Capabilities for Crop Leaf Area Index Mapping
The continuously increasing demand for accurate, quantitative, high-quality information on land surface properties will be addressed by a new generation of environmental Earth observation (EO) missions. One current example, with high potential to meet those demands, is the multi-spectral ESA Sentinel-2 (S2) system. The present study evaluates the spectral information content needed for crop leaf area index (LAI) mapping in view of the future sensors. Data from a field campaign were used to determine the optimal spectral sampling from the available S2 bands by inverting a radiative transfer model (PROSAIL) with look-up table (LUT) and artificial neural network (ANN) approaches. The overall LAI estimation performance of the proposed LUT approach (LUTN50) was comparable to that of a tested and approved ANN method. Employing seven- and eight-band combinations, the LUTN50 approach obtained an LAI RMSE of 0.53 and a normalized LAI RMSE of 0.12, comparable to the results of the ANN. However, the LUTN50 method showed higher robustness and insensitivity to different band settings. The most frequently selected wavebands were located in the near-infrared and red-edge spectral regions. In conclusion, our results emphasize the potential benefits of the Sentinel-2 mission for agricultural applications.
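The LUT approach inverts a radiative transfer model by tabulating simulated spectra for many parameter values and picking the entries closest to a measurement. The sketch below shows that idea under loud assumptions: a hypothetical two-band toy forward model stands in for PROSAIL (it is not PROSAIL), and "LUTN50" is assumed here to mean averaging the LAI of the 50 best-matching entries by RMSE. The names `toy_spectrum`, `build_toy_lut`, and `lut_invert` are hypothetical.

```python
import math


def toy_spectrum(lai):
    """Hypothetical two-band forward model (red, NIR); NOT PROSAIL."""
    red = 0.1 * math.exp(-0.7 * lai)          # red reflectance drops as the canopy closes
    nir = 0.5 * (1.0 - math.exp(-0.5 * lai))  # NIR reflectance rises and saturates
    return (red, nir)


def build_toy_lut(max_lai=7.0, per_unit=100):
    """Tabulate (LAI, simulated spectrum) pairs on a regular LAI grid."""
    n = int(max_lai * per_unit)
    return [(i / per_unit, toy_spectrum(i / per_unit)) for i in range(n + 1)]


def rmse(a, b):
    """Root-mean-square difference between two equally long spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def lut_invert(lut, measured, n_best=50):
    """Mean LAI of the n_best LUT entries whose spectra best match `measured`."""
    ranked = sorted(lut, key=lambda entry: rmse(entry[1], measured))
    best = ranked[:n_best]
    return sum(lai for lai, _ in best) / len(best)


if __name__ == "__main__":
    lut = build_toy_lut()
    print(lut_invert(lut, toy_spectrum(3.0), n_best=50))
```

Averaging several good matches rather than taking the single best one is a common way to make LUT inversion less sensitive to noise and to model ambiguity; with real data, the comparison would use only the selected band subset.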
An Experimental Study of Distributed Quantile Estimation
Quantiles are important statistics used to describe the distribution of a
dataset. Given the quantiles of a dataset, we can easily characterize its
distribution, which is a fundamental problem in data analysis. However,
computing quantiles directly is often impractical due to memory limitations.
Further, in many settings, such as data streams and sensor networks, even the
data size is unpredictable. Although quantile computation has been widely
studied, most of that work considers the sequential setting. In this paper, we
study several quantile computation algorithms in the distributed setting and
compare them in terms of space usage, running time, and accuracy. Moreover, we
provide detailed experimental comparisons between several popular algorithms.
Our work focuses on approximate quantile algorithms that provide error bounds.
Approximate quantiles have received more attention than exact ones since they
are often faster and can be more easily adapted to the distributed setting,
while still giving sufficiently good statistical information on the data sets.
Comment: M.S. Thesis
Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks
We show that randomization can lead to significant improvements for a few
fundamental problems in distributed tracking. Our basis is the count-tracking
problem, where there are $k$ players, each holding a counter $n_i$
that gets incremented over time, and the goal is to track an
$\varepsilon$-approximation of their sum $n = \sum_i n_i$ continuously at all times,
using minimum communication. While the deterministic communication complexity
of the problem is $\Theta(k/\varepsilon \cdot \log N)$, where $N$ is the final value
of $n$ when the tracking finishes, we show that with randomization, the
communication cost can be reduced to $\Theta(\sqrt{k}/\varepsilon \cdot \log N)$. Our
algorithm is simple and uses only $O(1)$ space at each player, while the lower
bound holds even assuming each player has infinite computing power. Then, we
extend our techniques to two related distributed tracking problems,
frequency-tracking and rank-tracking, and obtain similar improvements
over previous deterministic algorithms. Both problems are of central importance
in large data monitoring and analysis, and have been extensively studied in the
literature. Comment: 19 pages, 1 figure
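A minimal sketch of how randomization can save communication in count-tracking, assuming a single phase in which the final total $N$ is known in advance so that the reporting probability can be fixed at $p = \sqrt{k}/(\varepsilon N)$; a real tracker must adapt $p$ as $n$ grows, which this sketch omits, and it is not claimed to be the paper's exact protocol. Each player reports its current count with probability $p$ on every increment, and the coordinator estimates $n_i$ as $n'_i - 1 + 1/p$ (where $n'_i$ is the last reported count), which is unbiased with variance on the order of $1/p^2$. Summed over $k$ players, the error is then on the order of $\sqrt{k}/p = \varepsilon N$, while the expected number of messages is $pN = \sqrt{k}/\varepsilon$. The names `Coordinator` and `simulate` are hypothetical.

```python
import math
import random


class Coordinator:
    """Stores the last count each player reported (None = never reported)."""

    def __init__(self, k, p):
        self.p = p
        self.last = [None] * k
        self.messages = 0

    def receive(self, player, count):
        self.last[player] = count
        self.messages += 1

    def estimate(self):
        # Per-player estimate: n'_i - 1 + 1/p if player i ever reported, else 0.
        return sum(0.0 if c is None else c - 1 + 1 / self.p for c in self.last)


def simulate(k, total_increments, eps, rng):
    """Single-phase simulation with p fixed from the (assumed known) final total N."""
    p = min(1.0, math.sqrt(k) / (eps * total_increments))
    coord = Coordinator(k, p)
    counts = [0] * k
    for _ in range(total_increments):
        i = rng.randrange(k)      # a uniformly random player increments its counter
        counts[i] += 1
        if rng.random() < p:      # ...and reports its new count with probability p
            coord.receive(i, counts[i])
    return sum(counts), coord.estimate(), coord.messages


if __name__ == "__main__":
    true_n, est, msgs = simulate(k=100, total_increments=200_000, eps=0.05,
                                 rng=random.Random(1))
    print(true_n, round(est), msgs)
```

With $k = 100$ and $\varepsilon = 0.05$, the expected message count is $\sqrt{k}/\varepsilon = 200$; compare the $k/\varepsilon$ factor in the deterministic bound quoted in the abstract.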
Tight Lower Bound for Comparison-Based Quantile Summaries
Quantiles, such as the median or percentiles, provide concise and useful
information about the distribution of a collection of items drawn from a
totally ordered universe. We study data structures, called quantile summaries,
which keep track of all quantiles up to an error of at most $\varepsilon$.
That is, an $\varepsilon$-approximate quantile summary first processes a stream
of items and then, given any quantile query $0 \le \phi \le 1$, returns an item
from the stream which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only
compare two items and are otherwise completely oblivious of the universe.
The best such deterministic quantile summary to date, due to Greenwald and
Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon} \cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove
that this space bound is optimal by showing a matching lower bound. Our result
thus rules out the possibility of constructing a deterministic comparison-based
quantile summary in space $f(\varepsilon) \cdot o(\log N)$, for any function $f$
that does not depend on $N$. As a corollary, we improve the lower bound for
biased quantiles, which provide a stronger, relative-error guarantee of $(1 \pm \varepsilon) \cdot \phi$, and for other related computational tasks. Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and
some other parts of the paper
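To illustrate the query guarantee itself, here is a minimal offline sketch, not the Greenwald-Khanna summary: it assumes the whole input can be sorted at once and keeps every $s$-th item with $s = \lfloor 2\varepsilon N \rfloor$, so any query $\phi$ is answered by a stored item whose rank is within $\varepsilon N$ of $\lceil \phi N \rceil$, using about $1/(2\varepsilon)$ items and only comparisons. The difficulty the paper addresses is achieving this in one pass over a stream, where its lower bound shows the extra $\log \varepsilon N$ factor is unavoidable for deterministic comparison-based summaries. The names `build_offline_summary` and `query` are hypothetical.

```python
import bisect
import math


def build_offline_summary(items, eps):
    """Keep the items at ranks 1, 1+s, 1+2s, ... and N of the sorted input."""
    if not items:
        raise ValueError("need at least one item")
    ordered = sorted(items)                 # comparisons only
    n = len(ordered)
    step = max(1, math.floor(2 * eps * n))
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)                     # always keep the maximum
    return n, ranks, [ordered[r - 1] for r in ranks]


def query(summary, phi):
    """Return a stored item whose rank is within eps*n of ceil(phi*n)."""
    n, ranks, kept = summary
    target = max(1, math.ceil(phi * n))
    j = bisect.bisect_left(ranks, target)   # first stored rank >= target
    candidates = [i for i in (j - 1, j) if 0 <= i < len(ranks)]
    best = min(candidates, key=lambda i: abs(ranks[i] - target))
    return kept[best]


if __name__ == "__main__":
    summary = build_offline_summary(list(range(999, -1, -1)), eps=0.01)
    print(len(summary[2]), query(summary, 0.5))
```

Consecutive stored ranks differ by at most $s$, so the nearest stored rank is at most $s/2 \le \varepsilon N$ away from any target rank.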