3,064 research outputs found

    Optimal Gossip Algorithms for Exact and Approximate Quantile Computations

    This paper gives drastically faster gossip algorithms to compute exact and approximate quantiles. Gossip algorithms, which allow each node to contact a uniformly random other node in each round, have been intensely studied and adopted in many applications due to their fast convergence and their robustness to failures. Kempe et al. [FOCS'03] gave gossip algorithms to compute important aggregate statistics if every node is given a value. In particular, they gave a beautiful $O(\log n + \log \frac{1}{\epsilon})$ round algorithm to $\epsilon$-approximate the sum of all values and an $O(\log^2 n)$ round algorithm to compute the exact $\phi$-quantile, i.e., the $\lceil \phi n \rceil$-th smallest value. We give a quadratically faster and in fact optimal gossip algorithm for the exact $\phi$-quantile problem which runs in $O(\log n)$ rounds. We furthermore show that one can achieve an exponential speedup if one allows for an $\epsilon$-approximation. We give an $O(\log \log n + \log \frac{1}{\epsilon})$ round gossip algorithm which computes a value of rank between $\phi n$ and $(\phi+\epsilon)n$ at every node, for any $0 \leq \phi \leq 1$ and $0 < \epsilon < 1$. Our algorithms are extremely simple and very robust: they can be operated with the same running times even if every transmission fails with a, potentially different, constant probability. We also give a matching $\Omega(\log \log n + \log \frac{1}{\epsilon})$ lower bound which shows that our algorithm is optimal for all values of $\epsilon$.
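To make the gossip model in this abstract concrete, here is a minimal simulation sketch of push-sum averaging, the style of protocol usually associated with the Kempe et al. result cited above. It is not the quantile algorithm of this paper. The function name `push_sum_average`, the synchronous round structure, and the fixed seed are illustrative assumptions.

```python
import random


def push_sum_average(values, rounds, seed=0):
    """Simulate push-sum gossip and return every node's estimate of the average.

    Assumption: synchronous rounds, no failures, at least two nodes.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]  # per-node running sums
    w = [1.0] * n                   # per-node running weights
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Pick a uniformly random *other* node j.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Node i keeps half of its (sum, weight) pair and pushes half to j.
            for target in (i, j):
                new_s[target] += s[i] / 2
                new_w[target] += w[i] / 2
        s, w = new_s, new_w
    # Each node's estimate is the ratio of its sum to its weight.
    return [s[i] / w[i] for i in range(n)]
```

Every node always keeps half of its own weight, so the ratio is well defined, and the totals of `s` and `w` are preserved from round to round. That conservation is what drives all estimates toward the true average.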

    Approximate Quantile Computation over Sensor Networks

    Sensor networks have been deployed in various environments, from battlefield surveillance to weather monitoring. The amount of data generated by the sensors can be large. One way to analyze such a large data set is to capture the essential statistics of the data. Quantile computation in a large-scale sensor network thus becomes an important but challenging problem. The data may be widely distributed, e.g., there may be thousands of sensors. In addition, the memory and bandwidth among sensors could be quite limited. Most previous quantile computation methods assume that the data is either stored or streamed at a centralized site, so they cannot be directly applied in the sensor environment. In this paper, we propose a novel algorithm to compute quantiles for sensor network data, which dynamically adapts to the memory limitations. Moreover, since sensors may update their values at any time, an incremental maintenance algorithm is developed to reduce the number of times that a global recomputation is needed upon updates. The performance and complexity of our algorithms are analyzed both theoretically and empirically on various large data sets, which demonstrate the high promise of our method.

    Optimal Exploitation of the Sentinel-2 Spectral Capabilities for Crop Leaf Area Index Mapping

    The continuously increasing demand for accurate, quantitative, high-quality information on land surface properties will be addressed by a new generation of environmental Earth observation (EO) missions. One current example, associated with a high potential to contribute to those demands, is the multi-spectral ESA Sentinel-2 (S2) system. The present study focuses on the evaluation of the spectral information content needed for crop leaf area index (LAI) mapping in view of the future sensors. Data from a field campaign were used to determine the optimal spectral sampling from available S2 bands, applying inversion of a radiative transfer model (PROSAIL) with look-up table (LUT) and artificial neural network (ANN) approaches. Overall LAI estimation performance of the proposed LUT approach (LUTN₅₀) was comparable in terms of retrieval performance with a tested and approved ANN method. Employing seven- and eight-band combinations, the LUTN₅₀ approach obtained an LAI RMSE of 0.53 and a normalized LAI RMSE of 0.12, which was comparable to the results of the ANN. However, the LUTN₅₀ method showed higher robustness and insensitivity to different band settings. The most frequently selected wavebands were located in the near-infrared and red-edge spectral regions. In conclusion, our results emphasize the potential benefits of the Sentinel-2 mission for agricultural applications.

    An Experimental Study of Distributed Quantile Estimation

    Quantiles are important statistics used to describe the distribution of datasets. Given the quantiles of a dataset, we can easily learn its distribution, which is a fundamental problem in data analysis. However, computing quantiles directly is often infeasible due to memory limitations. Further, in many settings such as data streaming and sensor network models, even the data size is unpredictable. Although quantile computation has been widely studied, it was mostly in the sequential setting. In this paper, we study several quantile computation algorithms in the distributed setting and compare them in terms of space usage, running time, and accuracy. Moreover, we provide detailed experimental comparisons between several popular algorithms. Our work focuses on approximate quantile algorithms which provide error bounds. Approximate quantiles have received more attention than exact ones since they are often faster and can be more easily adapted to the distributed setting while giving sufficiently good statistical information on the data sets. Comment: M.S. Thesis

    Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks

    We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the count-tracking problem, where there are $k$ players, each holding a counter $n_i$ that gets incremented over time, and the goal is to track an $\epsilon$-approximation of their sum $n=\sum_i n_i$ continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is $\Theta(k/\epsilon \cdot \log N)$, where $N$ is the final value of $n$ when the tracking finishes, we show that with randomization, the communication cost can be reduced to $\Theta(\sqrt{k}/\epsilon \cdot \log N)$. Our algorithm is simple and uses only $O(1)$ space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems: frequency-tracking and rank-tracking, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large data monitoring and analysis, and have been extensively studied in the literature. Comment: 19 pages, 1 figure
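As a point of reference for the count-tracking problem described above, the following is a minimal sketch of a simple deterministic threshold scheme: each player reports its counter to the coordinator whenever it has grown by more than a $(1+\epsilon)$ factor since its last report. This is an assumed textbook-style baseline, not the randomized algorithm of the paper. The function name `track_count` and its return tuple are hypothetical.

```python
def track_count(increments, k, eps):
    """Simulate deterministic threshold-based count tracking.

    increments: iterable of player indices in range(k), one per unit increment.
    Returns (estimate, true_total, messages, worst_relative_error), where the
    worst relative error is taken over all points in time.
    """
    local = [0] * k      # true counter n_i held by each player
    reported = [0] * k   # last value each player sent to the coordinator
    estimate = 0         # coordinator's running sum of reported values
    total = 0            # true sum n, used only to measure the error
    messages = 0
    worst = 0.0
    for site in increments:
        local[site] += 1
        total += 1
        # Report only when the counter exceeds (1 + eps) times the last report.
        if local[site] > (1 + eps) * reported[site]:
            estimate += local[site] - reported[site]
            reported[site] = local[site]
            messages += 1
        worst = max(worst, (total - estimate) / total)
    return estimate, total, messages, worst
```

Since every reported value satisfies `reported[i] <= local[i] <= (1 + eps) * reported[i]`, the coordinator's estimate never overshoots and stays within a relative error of $\epsilon$ of the true sum. Each player sends roughly $\log_{1+\epsilon} N$ messages, which matches the $k/\epsilon \cdot \log N$ shape of the deterministic bound quoted in the abstract.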

    Tight Lower Bound for Comparison-Based Quantile Summaries

    Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0 \le \phi \le 1$, returns an item from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1\pm \varepsilon)\cdot \phi$, and for other related computational tasks. Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and some other parts of the paper
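The $\varepsilon$-approximate guarantee above can be checked by brute force on a small stream. The sketch below tests whether a returned item is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. The helper name `is_eps_approx_quantile` and the 1-based rank convention are assumptions made for illustration, and rank conventions differ slightly between papers.

```python
import bisect


def is_eps_approx_quantile(items, answer, phi, eps):
    """Return True if `answer` is an item whose rank lies within eps*n of phi*n.

    Assumption: ranks are 1-based positions in sorted order; an item that
    occurs several times occupies a contiguous range of ranks.
    """
    s = sorted(items)
    n = len(s)
    lo = bisect.bisect_left(s, answer) + 1   # smallest rank held by answer
    hi = bisect.bisect_right(s, answer)      # largest rank held by answer
    if hi < lo:
        return False                         # answer does not occur in the stream
    # Accept if some rank of `answer` falls inside [(phi - eps) n, (phi + eps) n].
    return lo <= (phi + eps) * n and hi >= (phi - eps) * n
```

For a stream of the integers 1 to 100 with $\phi = 0.5$ and $\varepsilon = 0.1$, any item of rank between 40 and 60 is an acceptable answer to the median query.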