Distributed Data Summarization in Well-Connected Networks
We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph G of n nodes each of which may hold a value initially, we focus on computing sum_{i=1}^N g(f_i), where f_i is the number of occurrences of value i and g is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data.
In the CONGEST model, a simple adaptation of streaming lower bounds shows that computing some of these statistics exactly requires Omega~(D + n) rounds, where D is the diameter of the graph. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes sum_{i=1}^N g(f_i) exactly in tau_G * 2^{O(sqrt{log n})} rounds, where tau_G is the mixing time of G. This also has applications in computing the top k most frequent elements.
We demonstrate that there is a high similarity between the GOSSIP model and the CONGEST model in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost perfectly in O~(tau_G) rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that (1 +/- epsilon)-approximates the p-th frequency moment F_p = sum_{i=1}^N f_i^p in O~(epsilon^{-2} n^{1-k/p}) rounds for p >= 2, when the number of distinct elements F_0 is at most O(n^{1/(k-1)}). This result can be translated back to the CONGEST model with a factor O~(tau_G) blow-up in the number of rounds.
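To make the quantity sum_{i=1}^N g(f_i) concrete, here is a minimal centralized sketch in Python. It is not the distributed algorithm from the abstract; it only evaluates the statistic on a small list. The function name `summarize`, the sample data, and the convention g(0) = 0 (so values that never occur contribute nothing) are assumptions made for illustration.

```python
import math
from collections import Counter


def summarize(values, g):
    """Return the sum of g(f_i) over all values i that occur, where f_i is
    the number of occurrences of i. Assumes g(0) = 0 (illustrative choice)."""
    freqs = Counter(values)
    return sum(g(f) for f in freqs.values())


# Hypothetical data: the values held by six nodes.
values = [3, 1, 3, 2, 3, 1]
m = len(values)

distinct = summarize(values, lambda f: 1)       # number of distinct elements F_0
f2 = summarize(values, lambda f: f ** 2)        # second frequency moment F_2
entropy = summarize(values, lambda f: -(f / m) * math.log2(f / m))  # empirical entropy

print(distinct, f2, round(entropy, 3))
```

Each statistic differs only in the choice of g, which is why a single algorithm for sum_{i=1}^N g(f_i) covers all of them.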
Finding Subcube Heavy Hitters in Analytics Data Streams
Data streams typically have items with a large number of dimensions. We study the
fundamental heavy-hitters problem in this setting. Formally, the data stream
consists of -dimensional items . A -dimensional
subcube is a subset of distinct coordinates . A subcube heavy hitter query , , outputs
YES if and NO if , where is the
ratio of number of stream items whose coordinates have joint values .
The all subcube heavy hitters query outputs all joint
values that return YES to . The one dimensional version
of this problem where was heavily studied in data stream theory,
databases, networking and signal processing. The subcube heavy hitters problem
is applicable in all these cases.
We present a simple reservoir sampling based one-pass streaming algorithm to
solve the subcube heavy hitters problem in space. This
is optimal up to poly-logarithmic factors given the established lower bound. In
the worst case, this is which is prohibitive for large
, and our goal is to circumvent this quadratic bottleneck.
Our main contribution is a model-based approach to the subcube heavy hitters
problem. In particular, we assume that the dimensions are related to each other
via the Naive Bayes model, with or without a latent dimension. Under this
assumption, we present a new two-pass, -space algorithm
for our problem, and a fast algorithm for answering in
time. Our work develops the direction of model-based data
stream analysis, with much that remains to be explored.
Comment: To appear in WWW 201
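The reservoir-sampling idea mentioned in the abstract can be sketched as follows. This is an illustrative Python sketch, not the paper's algorithm or its space bound: the sample size, the function names `reservoir_sample` and `subcube_frequency`, and the toy stream are all assumptions. It keeps a uniform sample in one pass and estimates the fraction of items whose chosen coordinates take given joint values.

```python
import random


def reservoir_sample(stream, size, rng):
    """One-pass uniform sample of `size` items (classic reservoir sampling)."""
    sample = []
    for t, item in enumerate(stream):
        if t < size:
            sample.append(item)
        else:
            j = rng.randint(0, t)  # uniform over 0..t inclusive
            if j < size:
                sample[j] = item
    return sample


def subcube_frequency(sample, coords, values):
    """Estimate the ratio of items whose coordinates `coords` equal `values`."""
    if not sample:
        return 0.0
    hits = sum(
        1 for item in sample
        if all(item[c] == v for c, v in zip(coords, values))
    )
    return hits / len(sample)


# Hypothetical 3-dimensional stream.
stream = [(0, 1, 0)] * 6 + [(1, 1, 0)] * 4
rng = random.Random(0)
sample = reservoir_sample(stream, 100, rng)  # reservoir larger than the stream
print(subcube_frequency(sample, (0,), (0,)))
```

A subcube heavy hitter query would then compare the estimate against a threshold; the threshold and the sample size needed for a given accuracy are left out here because the source text does not preserve them.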
Better Streaming Algorithms for the Maximum Coverage Problem
We study the classic NP-hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the largest number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: one uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space, whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes requires space that is linear in m for constant k, even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space.
We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the largest number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k, whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a K-epsilon approximation when the ratio between the minimum and maximum degree is bounded below by K.
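For reference, the 1-1/e factor mentioned above is achieved offline by the standard greedy algorithm. The sketch below is that offline baseline in Python, not one of the streaming algorithms from the abstract; it stores every set in memory. The function name `greedy_max_coverage` and the example instance are assumptions.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily, each time taking the set that covers the
    most elements not yet covered. Returns (chosen indices, covered elements)."""
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical instance with m = 4 sets over the universe {1,...,7}.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
chosen, covered = greedy_max_coverage(sets, 2)
print(chosen, len(covered))  # prints: [2, 0] 7
```

The streaming results are interesting precisely because this baseline needs repeated access to all m sets, which the sublinear-space setting does not allow.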
Maximum Coverage in the Data Stream Model: Parameterized and Generalized
We present algorithms for the Max-Cover and Max-Unique-Cover problems in the
data stream model. The input to both problems are subsets of a universe of
size and a value . In Max-Cover, the problem is to find a
collection of at most sets such that the number of elements covered by at
least one set is maximized. In Max-Unique-Cover, the problem is to find a
collection of at most sets such that the number of elements covered by
exactly one set is maximized. Our goal is to design single-pass algorithms that
use space that is sublinear in the input size. Our main algorithmic results
are:
If the sets have size at most , there exist single-pass algorithms using
space that solve both problems exactly. This is
optimal up to polylogarithmic factors for constant .
If each element appears in at most sets, we present single-pass
algorithms using space that return a
approximation in the case of Max-Cover. We also present a single-pass algorithm
using slightly more memory, i.e., space, that
approximates Max-Unique-Cover.
In contrast to the above results, when and are arbitrary, any
constant-pass approximation algorithm for either problem requires
space but a single-pass space
algorithm exists. In fact any constant-pass algorithm with an approximation
better than and for Max-Cover and Max-Unique-Cover
respectively requires space when and are unrestricted.
En route, we also obtain an algorithm for a parameterized version of the
streaming Set-Cover problem.
Comment: Conference version to appear at ICDT 202
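The difference between the two objectives is easiest to see on a tiny instance. The Python sketch below evaluates both by exhaustive search; it is an offline illustration only, not one of the single-pass algorithms from the abstract. The function names and the example sets are assumptions.

```python
from collections import Counter
from itertools import combinations


def cover_value(chosen):
    """Max-Cover objective: elements covered by at least one chosen set."""
    return len(set().union(*chosen))


def unique_cover_value(chosen):
    """Max-Unique-Cover objective: elements covered by exactly one chosen set."""
    counts = Counter(e for s in chosen for e in s)
    return sum(1 for c in counts.values() if c == 1)


def brute_force(sets, k, objective):
    """Best objective value over all collections of at most k sets."""
    best = 0
    for r in range(k + 1):
        for combo in combinations(sets, r):
            best = max(best, objective(combo))
    return best


# Hypothetical instance.
sets = [{1, 2, 3}, {3, 4}, {2, 3, 4}]
print(brute_force(sets, 2, cover_value))         # prints: 4
print(brute_force(sets, 2, unique_cover_value))  # prints: 3
```

Here the pair {1, 2, 3} and {3, 4} covers four elements, but element 3 is covered twice, so only three elements count toward the unique-cover objective.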
On the Locality of Nash-Williams Forest Decomposition and Star-Forest Decomposition
Given a graph with arboricity , we study the problem of
decomposing the edges of into disjoint forests in the
distributed LOCAL model. Barenboim and Elkin [PODC '08] gave a LOCAL algorithm
that computes a -forest decomposition using rounds. Ghaffari and Su [SODA '17] made further progress by
computing a -forest decomposition in rounds when ,
i.e. the limit of their algorithm is an -forest decomposition. This algorithm, based on a combinatorial
construction of Alon, McDiarmid & Reed [Combinatorica '92], in fact provides a
decomposition of the graph into star-forests, i.e. each forest is a
collection of stars.
Our main result in this paper is to reduce the threshold of
in -forest decomposition and star-forest decomposition.
This further answers the open question from Barenboim and
Elkin's "Distributed Graph Algorithms" book. Moreover, it gives the first
-orientation algorithms with linear dependencies on
.
At a high level, our results for forest-decomposition are based on a
combination of network decomposition, load balancing, and a new structural
result on local augmenting sequences. Our result for star-forest decomposition
uses a more careful probabilistic analysis for the construction of Alon,
McDiarmid & Reed; the bounds on star-arboricity here were not previously
known, even non-constructively.
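As a small illustration of what a forest decomposition is, the Python sketch below checks whether a proposed split of a graph's edges is a valid decomposition into edge-disjoint forests. It is a centralized verifier using union-find, not a LOCAL algorithm and not the construction from the paper; the function names and the example graph are assumptions.

```python
def is_forest(edges):
    """Return True if the undirected edges contain no cycle (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True


def is_forest_decomposition(graph_edges, parts):
    """Check that `parts` partitions `graph_edges` and each part is a forest."""
    def norm(e):
        return tuple(sorted(e))

    used = sorted(norm(e) for part in parts for e in part)
    expected = sorted(norm(e) for e in graph_edges)
    return used == expected and all(is_forest(part) for part in parts)


# Hypothetical example: the complete graph on 4 vertices split into 2 forests.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
parts = [[(0, 1), (1, 2), (2, 3)], [(0, 2), (0, 3), (1, 3)]]
print(is_forest_decomposition(k4, parts))  # prints: True
```

A star-forest decomposition would add the further requirement that every connected component of every part is a star, which this checker does not test.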
Diameter of Commuting Graphs of Lie Algebras
In this paper, we study the connectedness of the commuting graph of a general
Lie algebra and provide a process to determine whether the commuting graph is
connected or not, as well as to compute an upper bound for its diameter. In
addition, we will examine the connectedness and diameter of the commuting
graphs of some remarkable classes of Lie algebras, including: (1) a class of
Lie algebras with one- or two-dimensional derived algebras; and (2) a class of
solvable Lie algebras over the real field of dimension up to .
Comment: 21 page