Streaming Verification of Graph Computations via Graph Structure
We give new algorithms in the annotated data streaming setting, also known as verifiable data stream computation, for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication.
Space-Efficient Algorithms and Verification Schemes for Graph Streams
Structured data-sets are often easy to represent using graphs. The prevalence of massive data-sets in the modern world gives rise to big graphs such as web graphs, social networks, biological networks, and citation graphs. Most of these graphs keep growing continuously and pose two major challenges in their processing: (a) it is infeasible to store them entirely in the memory of a regular server, and (b) even if stored entirely, it is incredibly inefficient to reread the whole graph every time a new query appears. Thus, a natural approach for efficiently processing and analyzing such graphs is reading them as a stream of edge insertions and deletions and maintaining a summary that can be (a) stored in affordable memory (significantly smaller than the input size) and (b) used to detect properties of the original graph. In this thesis, we explore the strengths and limitations of such graph streaming algorithms under three main paradigms: classical or standard streaming, adversarially robust streaming, and streaming verification.
In the classical streaming model, an algorithm needs to process an adversarially chosen input stream using space sublinear in the input size and return a desired output at the end of the stream. Here, we study a collection of fundamental directed graph problems like reachability, acyclicity testing, and topological sorting. Our investigation reveals that while most problems are provably hard for general digraphs, they admit efficient algorithms for the special and widely-studied subclass of tournament graphs. Further, we exhibit certain problems that become drastically easier when the stream elements arrive in random order rather than adversarial order, as well as problems that do not get much easier even under this relaxation. Furthermore, we study the graph coloring problem in this model and design color-efficient algorithms using novel parameterizations and establish complexity separations between different versions of the problem.
The classical streaming setting assumes that the entire input stream is fixed by an adversary before the algorithm reads it. Many randomized algorithms in this setting, however, fail when the stream is extended by an adaptive adversary based on past outputs received. This is the so-called adversarially robust streaming model. We show that graph coloring is significantly harder in the robust setting than in the classical setting, thus establishing the first such separation for a ``natural'' problem. We also design a class of efficient robust coloring algorithms using novel techniques.
In classical streaming, many important problems turn out to be ``intractable'', i.e., provably impossible to solve in sublinear space. It is then natural to consider an enhanced streaming setting where a space-bounded client outsources the computation to a space-unbounded but untrusted cloud service, who replies with the solution and a supporting ``proof'' that the client needs to verify. This is called streaming verification or the annotated streaming model. It allows algorithms or verification schemes for the otherwise intractable problems using both space and proof length sublinear in the input size. We devise efficient schemes that improve upon the state of the art for a variety of fundamental graph problems including triangle counting, maximum matching, topological sorting, maximal independent set, graph connectivity, and shortest paths, as well as for computing frequency-based functions such as distinct items and maximum frequency, which have broad applications in graph streaming. Some of our schemes were conjectured to be impossible, while some others attain smooth and optimal tradeoffs between space and communication costs.
New Verification Schemes for Frequency-Based Functions on Data Streams
We study the general problem of computing frequency-based functions, i.e.,
the sum of any given function of data stream frequencies. Special cases include
fundamental data stream problems such as computing the number of distinct
elements (), frequency moments (), and heavy-hitters. It can also be
applied to calculate the maximum frequency of an element ().
Given that exact computation of most of these special cases provably does not
admit any sublinear-space algorithm, a natural approach is to consider them in
an enhanced data streaming model, where we have a computationally unbounded but
untrusted prover sending proofs or help messages to ease the computation. Think
of a memory-restricted client delegating the computation to a stronger cloud
service whom it doesn't want to trust blindly. Using its limited memory, it
wants to verify the proof that the cloud sends. Chakrabarti et al. (ICALP '09)
introduced this setting as the "annotated data streaming model" and showed that
multiple problems including exact computation of frequency-based
functions---that have no sublinear algorithms in basic streaming---do have
annotated streaming algorithms, also called "schemes", with both space and
proof-length sublinear in the input size.
We give a general scheme for computing any frequency-based function with both
space usage and proof-size of bits, where is the size of
the universe. This improves upon the best known bound of given by the seminal paper of Chakrabarti et al. and, as a result, also
improves upon the best known bounds for the important special cases of
computing and . We emphasize that while being quantitatively
better, our scheme is also qualitatively better in the sense that it is simpler
than the previously best scheme that uses intricate data structures and
elaborate subroutines. Comment: To appear in FSTTCS 202
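The definition of a frequency-based function in the abstract above can be illustrated with a minimal, non-streaming sketch. It stores every frequency exactly, so it uses linear space and is not the verification scheme of the paper. The function names and the parameter `g` are hypothetical, and the sketch assumes g(0) = 0, so that items that never appear contribute nothing to the sum.

```python
from collections import Counter


def frequency_based_function(stream, g):
    """Sum g(f) over the frequencies f of the items that occur in the stream.

    Assumption: g(0) == 0, so unseen universe items can be ignored.
    """
    freqs = Counter(stream)
    return sum(g(f) for f in freqs.values())


def distinct_elements(stream):
    # Each item with positive frequency contributes 1.
    return frequency_based_function(stream, lambda f: 1 if f > 0 else 0)


def frequency_moment(stream, k):
    # Each item contributes its frequency raised to the k-th power.
    return frequency_based_function(stream, lambda f: f ** k)


def max_frequency(stream):
    # The maximum frequency is not itself a sum of g(f); it is computed
    # directly here from the same frequency table, purely for illustration.
    return max(Counter(stream).values(), default=0)
```

For the stream a, b, a, c, a, b the frequencies are 3, 2, and 1, so the sketch reports 3 distinct elements, a second frequency moment of 14, and a maximum frequency of 3.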
Low-Memory Algorithms for Online and W-Streaming Edge Coloring
For edge coloring, the online and the W-streaming models seem somewhat
orthogonal: the former needs edges to be assigned colors immediately after
insertion, typically without any space restrictions, while the latter limits
memory to sublinear in the input size but allows an edge's color to be
announced any time after its insertion. We aim for the best of both worlds by
designing small-space online algorithms for edge-coloring. We study the problem
under both (adversarial) edge arrivals and vertex arrivals. Our results
significantly improve upon the memory used by prior online algorithms while
achieving an -competitive ratio. In particular, for -node graphs with
maximum vertex-degree under edge arrivals, we obtain an online
-coloring in space. This is also the
first W-streaming edge-coloring algorithm for -coloring in sublinear
memory. All prior works either used linear memory or colors.
We also achieve a smooth color-space tradeoff: for any , we get an
-coloring in space,
improving upon the state of the art that used space for
the same number of colors. The improvements stem from extensive use of random
permutations that enable us to avoid previously used colors. Most of our
algorithms can be derandomized and extended to multigraphs, where edge coloring
is known to be considerably harder than for simple graphs. Comment: 32 pages, 1 figure
New Lower Bounds in Merlin-Arthur Communication and Graph Streaming Verification
We show new lower bounds in the Merlin-Arthur (MA) communication model
and the related annotated streaming or stream verification model. The MA
communication model is an enhancement of the classical communication model,
where in addition to the usual players Alice and Bob, there is an all-powerful
but untrusted player Merlin who knows their inputs and tries to convince them
about the output. Most functions have MA protocols with total communication
significantly smaller than what would be needed without Merlin. We focus on the
online MA (OMA) model, which is the MA analogue of one-way communication, and
introduce the notion of non-trivial-OMA complexity of a function. This
is the minimum total communication needed by any non-trivial OMA protocol
computing that function, where a trivial OMA protocol is one where Alice sends
Bob roughly as many bits as she would have sent without Merlin. We prove a
lower bound on the non-trivial-OMA complexity of a natural function
Equals-Index (basically the well-known Index problem on large domains)
and identify it as a canonical problem for proving strong lower bounds on this
complexity: reductions from it (i) reproduce and/or improve upon the lower
bounds for all functions that were previously known to have large
non-trivial-OMA complexity, (ii) exhibit the first explicit functions whose
non-trivial-OMA complexity is superlinear, and even exponential, in their
classical one-way complexity, and (iii) show functions on input size for
which this complexity is as large as . While exhibiting a function
with (standard) OMA complexity is a longstanding open
problem, we did not even know of any function with
non-trivial-OMA complexity. We further extend the lower bounds to a related
streaming model called annotated streaming. Comment: To appear in ITCS 202
Streaming Verification for Graph Problems: Optimal Tradeoffs and Nonlinear Sketches
We study graph computations in an enhanced data streaming setting, where a
space-bounded client reading the edge stream of a massive graph may delegate
some of its work to a cloud service. We seek algorithms that allow the client
to verify a purported proof sent by the cloud service that the work done in the
cloud is correct. A line of work starting with Chakrabarti et al. (ICALP 2009)
has provided such algorithms, which we call schemes, for several statistical
and graph-theoretic problems, many of which exhibit a tradeoff between the
length of the proof and the space used by the streaming verifier.
This work designs new schemes for a number of basic graph
problems---including triangle counting, maximum matching, topological sorting,
and single-source shortest paths---where past work had either failed to obtain
smooth tradeoffs between these two key complexity measures or only obtained
suboptimal tradeoffs. Our key innovation is having the verifier compute certain
nonlinear sketches of the input stream, leading to either new or improved
tradeoffs. In many cases, our schemes in fact provide optimal tradeoffs up to
logarithmic factors.
Specifically, for most graph problems that we study, it is known that the
product of the verifier's space cost and the proof length must be at
least for -vertex graphs. However, matching upper bounds are
only known for a handful of settings of and on the curve . For example, for counting triangles and maximum
matching, schemes with costs lying on this curve are only known for
, , and
the trivial . A major message of this work
is that by exploiting nonlinear sketches, a significant ``portion'' of costs on
the tradeoff curve can be achieved.
Graph Coloring via Degeneracy in Streaming and Other Space-Conscious Models
We study the problem of coloring a given graph using a small number of colors
in several well-established models of computation for big data. These include
the data streaming model, the general graph query model, the massively parallel
computation (MPC) model, and the CONGESTED-CLIQUE and the LOCAL models of
distributed computation. On the one hand, we give algorithms with sublinear
complexity, for the appropriate notion of complexity in each of these models.
Our algorithms color a graph using about colors, where
is the degeneracy of : this parameter is closely related to the
arboricity . As a function of alone, our results are
close to best possible, since the optimal number of colors is .
On the other hand, we establish certain lower bounds indicating that
sublinear algorithms probably cannot go much further. In particular, we prove
that any randomized coloring algorithm that uses many colors,
would require storage in the one pass streaming model, and
many queries in the general graph query model, where is the
number of vertices in the graph. These lower bounds hold even when the value of
is known in advance; at the same time, our upper bounds do not
require to be given in advance. Comment: 26 pages
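For readers unfamiliar with degeneracy, the parameter used in the abstract above, the following is a minimal sketch of the classical offline approach: repeatedly remove a minimum-degree vertex, then greedily color the vertices in reverse removal order. This is the textbook algorithm, which uses at most degeneracy + 1 colors. It is not the streaming, query, or distributed algorithm of the paper, and it assumes the whole graph fits in memory as a symmetric adjacency mapping. All names are hypothetical.

```python
def degeneracy_ordering(adj):
    """Repeatedly remove a minimum-degree vertex.

    adj maps each vertex to an iterable of its neighbors (assumed symmetric).
    Returns (removal order, degeneracy), where the degeneracy is the largest
    degree seen at the moment a vertex is removed.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    order, degeneracy = [], 0
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        degeneracy = max(degeneracy, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
        order.append(v)
    return order, degeneracy


def greedy_coloring(adj):
    """Color vertices in reverse removal order with the smallest free color.

    Each vertex has at most `degeneracy` already-colored neighbors when it is
    processed, so at most degeneracy + 1 colors are used.
    """
    order, degeneracy = degeneracy_ordering(adj)
    colors = {}
    for v in reversed(order):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors, degeneracy
```

A simple quadratic-time minimum search is used here for brevity; a bucket queue would make the ordering linear-time, but that optimization is omitted to keep the sketch short.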