36,563 research outputs found

    Distributed Data Summarization in Well-Connected Networks

    Get PDF
    We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph G of n nodes each of which may hold a value initially, we focus on computing sum_{i=1}^N g(f_i), where f_i is the number of occurrences of value i and g is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data. In the CONGEST~ model, a simple adaptation from streaming lower bounds shows that it requires Omega~(D+ n) rounds, where D is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes sum_{i=1}^{N} g(f_i) exactly in {tau_{G}} * 2^{O(sqrt{log n})} rounds where {tau_{G}} is the mixing time of G. This also has applications in computing the top k most frequent elements. We demonstrate that there is a high similarity between the GOSSIP~ model and the CONGEST~ model in well-connected graphs. In particular, we show that each round of the GOSSIP~ model can be simulated almost perfectly in O~({tau_{G}}) rounds of the CONGEST~ model. To this end, we develop a new algorithm for the GOSSIP~ model that 1 +/- epsilon approximates the p-th frequency moment F_p = sum_{i=1}^N f_i^p in O~(epsilon^{-2} n^{1-k/p}) roundsfor p >= 2, when the number of distinct elements F_0 is at most O(n^{1/(k-1)}). This result can be translated back to the CONGEST~ model with a factor O~({tau_{G}}) blow-up in the number of rounds

    Sampling Online Social Networks via Heterogeneous Statistics

    Full text link
    Most sampling techniques for online social networks (OSNs) are based on a particular sampling method on a single graph, which is referred to as a statistics. However, various realizing methods on different graphs could possibly be used in the same OSN, and they may lead to different sampling efficiencies, i.e., asymptotic variances. To utilize multiple statistics for accurate measurements, we formulate a mixture sampling problem, through which we construct a mixture unbiased estimator which minimizes asymptotic variance. Given fixed sampling budgets for different statistics, we derive the optimal weights to combine the individual estimators; given fixed total budget, we show that a greedy allocation towards the most efficient statistics is optimal. In practice, the sampling efficiencies of statistics can be quite different for various targets and are unknown before sampling. To solve this problem, we design a two-stage framework which adaptively spends a partial budget to test different statistics and allocates the remaining budget to the inferred best statistics. We show that our two-stage framework is a generalization of 1) randomly choosing a statistics and 2) evenly allocating the total budget among all available statistics, and our adaptive algorithm achieves higher efficiency than these benchmark strategies in theory and experiment

    Smoothed Analysis of Dynamic Networks

    Full text link
    We generalize the technique of smoothed analysis to distributed algorithms in dynamic network models. Whereas standard smoothed analysis studies the impact of small random perturbations of input values on algorithm performance metrics, dynamic graph smoothed analysis studies the impact of random perturbations of the underlying changing network graph topologies. Similar to the original application of smoothed analysis, our goal is to study whether known strong lower bounds in dynamic network models are robust or fragile: do they withstand small (random) perturbations, or do such deviations push the graphs far enough from a precise pathological instance to enable much better performance? Fragile lower bounds are likely not relevant for real-world deployment, while robust lower bounds represent a true difficulty caused by dynamic behavior. We apply this technique to three standard dynamic network problems with known strong worst-case lower bounds: random walks, flooding, and aggregation. We prove that these bounds provide a spectrum of robustness when subjected to smoothing---some are extremely fragile (random walks), some are moderately fragile / robust (flooding), and some are extremely robust (aggregation).Comment: 20 page

    Clustering Algorithms for Scale-free Networks and Applications to Cloud Resource Management

    Full text link
    In this paper we introduce algorithms for the construction of scale-free networks and for clustering around the nerve centers, nodes with a high connectivity in a scale-free networks. We argue that such overlay networks could support self-organization in a complex system like a cloud computing infrastructure and allow the implementation of optimal resource management policies.Comment: 14 pages, 8 Figurs, Journa
    • …
    corecore