11,829 research outputs found
Distributed Data Summarization in Well-Connected Networks
We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph G of n nodes each of which may hold a value initially, we focus on computing sum_{i=1}^N g(f_i), where f_i is the number of occurrences of value i and g is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data.
In the CONGEST~ model, a simple adaptation from streaming lower bounds shows that it requires Omega~(D+ n) rounds, where D is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes sum_{i=1}^{N} g(f_i) exactly in {tau_{G}} * 2^{O(sqrt{log n})} rounds where {tau_{G}} is the mixing time of G. This also has applications in computing the top k most frequent elements.
We demonstrate that there is a high similarity between the GOSSIP~ model and the CONGEST~ model in well-connected graphs. In particular, we show that each round of the GOSSIP~ model can be simulated almost perfectly in O~({tau_{G}}) rounds of the CONGEST~ model. To this end, we develop a new algorithm for the GOSSIP~ model that 1 +/- epsilon approximates the p-th frequency moment F_p = sum_{i=1}^N f_i^p in O~(epsilon^{-2} n^{1-k/p}) roundsfor p >= 2, when the number of distinct elements F_0 is at most O(n^{1/(k-1)}). This result can be translated back to the CONGEST~ model with a factor O~({tau_{G}}) blow-up in the number of rounds
Privacy-enhancing Aggregation of Internet of Things Data via Sensors Grouping
Big data collection practices using Internet of Things (IoT) pervasive
technologies are often privacy-intrusive and result in surveillance, profiling,
and discriminatory actions over citizens that in turn undermine the
participation of citizens to the development of sustainable smart cities.
Nevertheless, real-time data analytics and aggregate information from IoT
devices open up tremendous opportunities for managing smart city
infrastructures. The privacy-enhancing aggregation of distributed sensor data,
such as residential energy consumption or traffic information, is the research
focus of this paper. Citizens have the option to choose their privacy level by
reducing the quality of the shared data at a cost of a lower accuracy in data
analytics services. A baseline scenario is considered in which IoT sensor data
are shared directly with an untrustworthy central aggregator. A grouping
mechanism is introduced that improves privacy by sharing data aggregated first
at a group level compared as opposed to sharing data directly to the central
aggregator. Group-level aggregation obfuscates sensor data of individuals, in a
similar fashion as differential privacy and homomorphic encryption schemes,
thus inference of privacy-sensitive information from single sensors becomes
computationally harder compared to the baseline scenario. The proposed system
is evaluated using real-world data from two smart city pilot projects. Privacy
under grouping increases, while preserving the accuracy of the baseline
scenario. Intra-group influences of privacy by one group member on the other
ones are measured and fairness on privacy is found to be maximized between
group members with similar privacy choices. Several grouping strategies are
compared. Grouping by proximity of privacy choices provides the highest privacy
gains. The implications of the strategy on the design of incentives mechanisms
are discussed
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
- …