An Asymptotically Optimal Algorithm for Maximum Matching in Dynamic Streams
We present an algorithm for the maximum matching problem in dynamic
(insertion-deletion) streams with *asymptotically optimal* space complexity:
for any $n$-vertex graph, our algorithm with high probability outputs an
$\alpha$-approximate matching in a single pass using $O(n^2/\alpha^3)$ bits of
space.
A long line of work on the dynamic streaming matching problem has reduced the
gap between space upper and lower bounds first to $n^{o(1)}$ factors
[Assadi-Khanna-Li-Yaroslavtsev; SODA 2016] and subsequently to
$\mathrm{polylog}(n)$ factors [Dark-Konrad; CCC 2020]. Our upper bound now
matches the Dark-Konrad lower bound up to constant factors, thus completing
this research direction.
Our approach consists of two main steps: we first (provably) identify a
family of graphs, similar to the instances used in prior work to establish the
lower bounds for this problem, as the only "hard" instances to focus on. These
graphs include an induced subgraph which is both sparse and contains a large
matching. We then design a dynamic streaming algorithm for this family of
graphs which is more efficient than prior work. The key to this efficiency is a
novel sketching method, which bypasses the typical loss of
$\mathrm{polylog}(n)$-factors in space compared to standard $L_0$-sampling
primitives, and can be of independent interest in designing optimal algorithms
for other streaming problems.
Comment: Full version of the paper accepted to ITCS 2022. 42 pages, 5 figures.
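The $L_0$-sampling primitives mentioned above can be illustrated with a toy linear sketch. The class below is our own simplified illustration, not the paper's construction: each coordinate participates in geometrically subsampled levels, and each level keeps a 1-sparse recovery triple (count, index-weighted sum, random fingerprint). Because the state is a linear function of the updates, insertions and deletions compose freely.

```python
import random

class L0Sampler:
    """Toy linear-sketch l0-sampler under insertions and deletions.

    Illustrative only (names and structure are ours, not the paper's):
    coordinate i participates in subsampling levels 0..level[i], and each
    level keeps a 1-sparse recovery triple (count, index-weighted sum,
    random fingerprint).
    """

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.L = n.bit_length() + 1
        # coordinate i survives to level j with probability 2^{-j}
        self.level = []
        for _ in range(n):
            lv = 0
            while lv + 1 < self.L and rng.random() < 0.5:
                lv += 1
            self.level.append(lv)
        # random fingerprint weights make a false 1-sparse verdict unlikely
        self.w = [rng.randrange(1, 1 << 61) for _ in range(n)]
        self.cnt = [0] * self.L  # sum of f_i over surviving coordinates
        self.idx = [0] * self.L  # sum of i * f_i
        self.fp = [0] * self.L   # sum of w_i * f_i

    def update(self, i, delta):
        # linear update: works identically for insertions and deletions
        for j in range(self.level[i] + 1):
            self.cnt[j] += delta
            self.idx[j] += i * delta
            self.fp[j] += self.w[i] * delta

    def sample(self):
        # scan from the sparsest level down; recover a nonzero coordinate
        # from the first level whose contents verify as 1-sparse
        for j in reversed(range(self.L)):
            c, s, f = self.cnt[j], self.idx[j], self.fp[j]
            if c != 0 and s % c == 0:
                i = s // c
                if 0 <= i < self.n and f == c * self.w[i]:
                    return i
        return None
```

For example, inserting coordinate 3, inserting coordinate 7 (with value 2), and then deleting coordinate 3 leaves only coordinate 7 nonzero, and the sampler recovers index 7 from any nonempty level.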
The Sketching Complexity of Graph and Hypergraph Counting
Subgraph counting is a fundamental primitive in graph processing, with
applications in social network analysis (e.g., estimating the clustering
coefficient of a graph), database processing and other areas. The space
complexity of subgraph counting has been studied extensively in the literature,
but many natural settings are still not well understood. In this paper we
revisit the subgraph (and hypergraph) counting problem in the sketching model,
where the algorithm's state as it processes a stream of updates to the graph is
a linear function of the stream. This model has recently received a lot of
attention in the literature, and has become a standard model for solving
dynamic graph streaming problems.
In this paper we give a tight bound on the sketching complexity of counting
the number of occurrences of a small subgraph $H$ in a bounded degree graph
presented as a stream of edge updates. Specifically, we show that the space
complexity of the problem is governed by the fractional vertex cover number of
the subgraph $H$. Our subgraph counting algorithm implements a natural vertex
sampling approach, with sampling probabilities governed by the fractional
vertex cover of $H$. Our main technical contribution lies in a new set of
Fourier analytic tools that we develop to analyze multiplayer communication
protocols in the simultaneous communication model, allowing us to prove a
tight lower bound. We believe that our techniques are likely to find
applications in other settings.
Besides giving tight bounds for all graphs $H$, both our algorithm and lower
bounds extend to the hypergraph setting, albeit with some loss in space
complexity.
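The vertex-sampling approach can be made concrete for the simplest pattern, the triangle. The sketch below is our own illustration (with uniform rather than vertex-cover-weighted sampling probabilities): keep each vertex independently with probability p, count triangles in the induced subgraph, and rescale by p^{-3}, which gives an unbiased estimate.

```python
import random

def exact_triangles(edges):
    # assumes each undirected edge appears once; each triangle is counted
    # once per edge, hence the division by 3
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def estimate_triangles(edges, p, trials=500, seed=0):
    """Keep each vertex independently with probability p, count triangles
    in the induced subgraph, rescale by p**-3 (an unbiased estimate);
    averaging over independent trials reduces the variance."""
    rng = random.Random(seed)
    verts = sorted({x for e in edges for x in e})
    total = 0.0
    for _ in range(trials):
        kept = {v for v in verts if rng.random() < p}
        sub = [(u, v) for u, v in edges if u in kept and v in kept]
        total += exact_triangles(sub) / p ** 3
    return total / trials
```

On a clique $K_4$ (which has exactly 4 triangles), the averaged estimate concentrates near 4.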
Streaming Communication Protocols
We define the Streaming Communication model, which combines the main aspects of communication complexity and streaming. Input arrives as a stream, spread between several agents across a network. Each agent has a bounded memory, which can be updated upon receiving a new bit, or a message from another agent. We provide tight tradeoffs between the necessary resources, i.e., communication between agents and memory, for some of the canonical problems from communication complexity, by developing a strong general lower bound technique. Second, we analyze the Approximate Matching problem and show that the complexity of this problem (i.e., the achievable approximation ratio) in the one-way variant of our model is strictly different both from the streaming complexity and the one-way communication complexity thereof.
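For context on the Approximate Matching baseline: in the plain streaming model, a single greedy pass over the edges maintains a maximal matching in O(n) memory, which is a 1/2-approximation to the maximum matching. A minimal sketch:

```python
def greedy_streaming_matching(edge_stream):
    # one pass, O(n) state: add an edge iff both endpoints are still free;
    # the result is a maximal matching, hence at least half the maximum size
    matched = set()
    matching = []
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path 1-2-3-4 streamed in the order (2,3), (1,2), (3,4), the greedy pass keeps only (2,3), half the maximum matching size, so the factor-2 loss is real.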
Densest Subgraph in Dynamic Graph Streams
In this paper, we consider the problem of approximating the densest subgraph
in the dynamic graph stream model. In this model of computation, the input
graph is defined by an arbitrary sequence of edge insertions and deletions and
the goal is to analyze properties of the resulting graph given memory that is
sub-linear in the size of the stream. We present a single-pass algorithm that
returns a $(1+\epsilon)$ approximation of the maximum density with high
probability; the algorithm uses $O(\epsilon^{-2} n\,\mathrm{polylog}(n))$
space, processes each stream update in $\mathrm{polylog}(n)$ time, and uses
$\mathrm{poly}(n)$ post-processing time, where $n$ is the number of nodes. The
space used by our algorithm matches the lower bound of Bahmani et al. (PVLDB
2012) up to a poly-logarithmic factor for constant $\epsilon$. The best
existing results for this problem were established recently by Bhattacharya et
al. (STOC 2015). They presented a $(0.5-\epsilon)$ approximation algorithm
using similar space and another algorithm that both processed each update and
maintained a $(0.25-\epsilon)$ approximation of the current maximum density in
$\mathrm{polylog}(n)$ time per update.
Comment: To appear in MFCS 2015.
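The post-processing step of such sampling-based algorithms typically runs an offline densest-subgraph routine on the retained edges. A standard choice is Charikar's greedy peeling, an offline 1/2-approximation; the sketch below is our own illustration, not the paper's exact procedure.

```python
import heapq

def peel_densest(edges):
    """Charikar's greedy peeling: repeatedly delete a minimum-degree vertex
    and return the best density |E(S)|/|S| seen along the way (an offline
    1/2-approximation to the maximum subgraph density)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    m, n = len(edges), len(adj)
    best = m / n if n else 0.0
    heap = [(len(nbrs), v) for v, nbrs in adj.items()]
    heapq.heapify(heap)
    removed = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != len(adj[v]):
            continue  # stale heap entry (lazy deletion)
        removed.add(v)
        m -= len(adj[v])
        n -= 1
        for u in adj[v]:
            adj[u].discard(v)
            heapq.heappush(heap, (len(adj[u]), u))
        adj[v] = set()
        if n:
            best = max(best, m / n)
    return best
```

On a $K_4$ with one pendant edge attached, peeling removes the pendant vertex first and reports the clique's density 6/4 = 1.5.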
Tight Bounds on the Round Complexity of the Distributed Maximum Coverage Problem
We study the maximum $k$-set coverage problem in the following distributed
setting. A collection of sets $S_1, \ldots, S_m$ over a universe $[n]$ is
partitioned across $p$ machines and the goal is to find $k$ sets whose union
covers the largest number of elements. The computation proceeds in synchronous
rounds. In each round, all machines simultaneously send a message to a central
coordinator who then communicates back to all machines a summary to guide the
computation for the next round. At the end, the coordinator outputs the answer.
The main measures of efficiency in this setting are the approximation ratio of
the returned solution, the communication cost of each machine, and the number
of rounds of computation.
Our main result is an asymptotically tight bound on the tradeoff between
these measures for the distributed maximum coverage problem. We first show that
any $r$-round protocol for this problem either incurs a communication cost of
$k \cdot m^{\Omega(1/r)}$ or only achieves an approximation factor of
$k^{\Omega(1/r)}$. This implies that any protocol that simultaneously achieves
good approximation ratio ($O(1)$ approximation) and good communication cost
($\widetilde{O}(n)$ communication per machine) essentially requires a
logarithmic (in $m$) number of rounds. We complement our lower bound result by
showing that there exists an $r$-round protocol that achieves an (almost)
$(1-1/e)$-approximation (essentially best possible) with a communication cost
of $k \cdot m^{O(1/r)}$, as well as an $r$-round protocol that achieves a
$k^{O(1/r)}$-approximation with only $\widetilde{O}(n)$ communication per
machine (essentially best possible).
We further use our results in this distributed setting to obtain new bounds
for the maximum coverage problem in two other main models of computation for
massive datasets, namely, the dynamic streaming model and the MapReduce model.
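The (1 - 1/e) benchmark against which these protocols are measured is achieved by the classic centralized greedy algorithm, sketched below (our own illustration):

```python
def greedy_max_coverage(sets, k):
    # classic greedy: repeatedly pick the set covering the most uncovered
    # elements; guarantees a (1 - 1/e) fraction of the optimal k-coverage
    covered = set()
    chosen = []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

For instance, with sets {1,2,3}, {3,4}, {4,5,6,7} and k = 2, greedy first takes the 4-element set, then the 3-element set, covering all 7 elements.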
Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings
We consider the following oblivious sketching problem: given $\epsilon \in (0, 1/3)$ and $n \ge d/\epsilon^2$, design a distribution $\mathcal{D}$ over $\mathbb{R}^{k \times nd}$ and a function $f: \mathbb{R}^k \to \mathbb{R}$, so that for any $n \times d$ matrix $A$, $\Pr_{S \sim \mathcal{D}}[(1-\epsilon)\|A\|_{op} \le f(S(A)) \le (1+\epsilon)\|A\|_{op}] \ge 2/3$, where $\|A\|_{op} = \sup_{x:\|x\|_2 = 1} \|Ax\|_2$ is the operator norm of $A$ and $S(A)$ denotes $S \cdot A$, interpreting $A$ as a vector in $\mathbb{R}^{nd}$. We show a tight lower bound of $k = \Omega(d^2/\epsilon^2)$ for this problem.
Previously, Nelson and Nguyen (ICALP, 2014) considered the problem of finding a distribution $\mathcal{D}$ over $\mathbb{R}^{k \times n}$ such that for any $n \times d$ matrix $A$, $\Pr_{S \sim \mathcal{D}}[\forall x,\ (1-\epsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1+\epsilon)\|Ax\|_2] \ge 2/3$, which is called an oblivious subspace embedding (OSE). Our result considerably strengthens theirs, as it (1) applies only to estimating the operator norm, which can be estimated given any OSE, and (2) applies to distributions over general linear operators $S$ which treat $A$ as a vector and compute $S(A)$, rather than the restricted class of linear operators corresponding to matrix multiplication.
Our technique also implies the first tight bounds for approximating the Schatten $p$-norm for even integers $p$ via general linear sketches, improving the previous lower bound from $k = \Omega(n^{2-6/p})$ [Regev, 2014] to $k = \Omega(n^{2-4/p})$. Importantly, for sketching the operator norm up to a factor of $\alpha$, where $\alpha - 1 = \Omega(1)$, we obtain a tight $k = \Omega(n^2/\alpha^4)$ bound, matching the upper bound of Andoni and Nguyen (SODA, 2013), and improving the previous $k = \Omega(n^2/\alpha^6)$ lower bound. Finally, we also obtain the first lower bounds for approximating Ky Fan norms.
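An oblivious subspace embedding can be checked numerically: for a Gaussian sketch $S$ with $k = O(d/\epsilon^2)$ rows, the singular values of $SU$, where $U$ is an orthonormal basis of the column space of $A$, all lie in $[1-\epsilon, 1+\epsilon]$ with good probability, which certifies $(1 \pm O(\epsilon))$ distortion of $\|Ax\|_2$ simultaneously for all $x$. A minimal numpy sketch (our own illustration; the parameter choices below are assumptions):

```python
import numpy as np

def subspace_embedding_distortion(A, k, seed=0):
    """Gaussian sketch S (k x n, entries N(0, 1/k)). For U an orthonormal
    basis of col(A), the singular values of S @ U measure how much the
    sketch distorts ||Ax||_2 uniformly over all x; values in
    [1 - eps, 1 + eps] certify a (1 +/- O(eps)) subspace embedding."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    U, _ = np.linalg.qr(A)  # reduced QR: U is n x d with orthonormal columns
    sv = np.linalg.svd(S @ U, compute_uv=False)
    return sv.max() - 1.0, 1.0 - sv.min()
```

With $k$ on the order of $d/\epsilon^2$ rows, the observed distortion concentrates near $\sqrt{d/k}$; e.g., $k = 800$ and $d = 5$ gives distortion on the order of 0.1.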
Computing With Distributed Information
The age of computing with massive data sets is highlighting new computational challenges. Nowadays, a typical server may not be able to store an entire data set, and thus data is often partitioned and stored on multiple servers in a distributed manner. A natural way of computing with such distributed data is to use distributed algorithms: these are algorithms where the participating parties (i.e., the servers holding portions of the data) collaboratively compute a function over the entire data set by sending (preferably small-size) messages to each other, where the computation performed at each participating party only relies on the data possessed by it and the messages
received by it.
We study distributed algorithms focused on two key themes: convergence time and data summarization. Convergence time measures how quickly a distributed algorithm settles on a globally stable solution, and data summarization is the approach of creating a compact summary of the input data while retaining key information. The latter often leads to more efficient computation and communication. The main focus of this dissertation is on design and analysis of distributed algorithms for important problems in diverse application domains centering on the themes of convergence time and data summarization. Some of the problems we study include convergence time of the double oral auction and interdomain routing, summarizing graphs for large-scale matching problems, and summarizing data for query processing.