25 research outputs found

    Streaming Verification of Graph Computations via Graph Structure

    We give new algorithms in the annotated data streaming setting (also known as verifiable data stream computation) for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication.

    Space-Efficient Algorithms and Verification Schemes for Graph Streams

    Structured data-sets are often easy to represent using graphs. The prevalence of massive data-sets in the modern world gives rise to big graphs such as web graphs, social networks, biological networks, and citation graphs. Most of these graphs keep growing continuously and pose two major challenges in their processing: (a) it is infeasible to store them entirely in the memory of a regular server, and (b) even if stored entirely, it is incredibly inefficient to reread the whole graph every time a new query appears. Thus, a natural approach for efficiently processing and analyzing such graphs is reading them as a stream of edge insertions and deletions and maintaining a summary that can be (a) stored in affordable memory (significantly smaller than the input size) and (b) used to detect properties of the original graph. In this thesis, we explore the strengths and limitations of such graph streaming algorithms under three main paradigms: classical or standard streaming, adversarially robust streaming, and streaming verification. In the classical streaming model, an algorithm needs to process an adversarially chosen input stream using space sublinear in the input size and return a desired output at the end of the stream. Here, we study a collection of fundamental directed graph problems like reachability, acyclicity testing, and topological sorting. Our investigation reveals that while most problems are provably hard for general digraphs, they admit efficient algorithms for the special and widely-studied subclass of tournament graphs. Further, we exhibit certain problems that become drastically easier when the stream elements arrive in random order rather than adversarial order, as well as problems that do not get much easier even under this relaxation. Furthermore, we study the graph coloring problem in this model and design color-efficient algorithms using novel parameterizations and establish complexity separations between different versions of the problem. 
The classical streaming setting assumes that the entire input stream is fixed by an adversary before the algorithm reads it. Many randomized algorithms in this setting, however, fail when the stream is extended by an adaptive adversary based on past outputs received. This is the so-called adversarially robust streaming model. We show that graph coloring is significantly harder in the robust setting than in the classical setting, thus establishing the first such separation for a "natural" problem. We also design a class of efficient robust coloring algorithms using novel techniques. In classical streaming, many important problems turn out to be "intractable", i.e., provably impossible to solve in sublinear space. It is then natural to consider an enhanced streaming setting where a space-bounded client outsources the computation to a space-unbounded but untrusted cloud service, which replies with the solution and a supporting "proof" that the client needs to verify. This is called streaming verification or the annotated streaming model. It allows algorithms or verification schemes for the otherwise intractable problems using both space and proof length sublinear in the input size. We devise efficient schemes that improve upon the state of the art for a variety of fundamental graph problems including triangle counting, maximum matching, topological sorting, maximal independent set, graph connectivity, and shortest paths, as well as for computing frequency-based functions such as distinct items and maximum frequency, which have broad applications in graph streaming. Some of our schemes were conjectured to be impossible, while some others attain smooth and optimal tradeoffs between space and communication costs.

    New Verification Schemes for Frequency-Based Functions on Data Streams

    We study the general problem of computing frequency-based functions, i.e., the sum of any given function of data stream frequencies. Special cases include fundamental data stream problems such as computing the number of distinct elements ($F_0$), frequency moments ($F_k$), and heavy-hitters. It can also be applied to calculate the maximum frequency of an element ($F_{\infty}$). Given that exact computation of most of these special cases provably does not admit any sublinear-space algorithm, a natural approach is to consider them in an enhanced data streaming model, where we have a computationally unbounded but untrusted prover sending proofs or help messages to ease the computation. Think of a memory-restricted client delegating the computation to a stronger cloud service whom it doesn't want to trust blindly. Using its limited memory, it wants to verify the proof that the cloud sends. Chakrabarti et al. (ICALP '09) introduced this setting as the "annotated data streaming model" and showed that multiple problems, including exact computation of frequency-based functions, that have no sublinear algorithms in basic streaming do have annotated streaming algorithms, also called "schemes", with both space and proof-length sublinear in the input size. We give a general scheme for computing any frequency-based function with both space usage and proof-size of $O(n^{2/3}\log n)$ bits, where $n$ is the size of the universe. This improves upon the best known bound of $O(n^{2/3}\log^{4/3} n)$ given by the seminal paper of Chakrabarti et al. and, as a result, also improves upon the best known bounds for the important special cases of computing $F_0$ and $F_{\infty}$. We emphasize that while being quantitatively better, our scheme is also qualitatively better in the sense that it is simpler than the previously best scheme that uses intricate data structures and elaborate subroutines. Comment: To appear in FSTTCS 202
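To make the definition of a frequency-based function concrete, here is a minimal Python sketch that computes such functions exactly by storing every frequency. This is only the linear-space baseline that the abstract contrasts against, not the verification scheme from the paper; the function names (`frequency_based`, `distinct_elements`, `frequency_moment`, `max_frequency`) are hypothetical and chosen for illustration.

```python
from collections import Counter


def frequency_based(stream, g):
    """Return the sum of g(f) over the frequencies f of items seen in the stream.

    Assumption for this sketch: g(0) = 0, so items of the universe that never
    appear contribute nothing and can be skipped.
    """
    freqs = Counter(stream)  # stores one counter per distinct item (linear space)
    return sum(g(f) for f in freqs.values())


def distinct_elements(stream):
    """F_0: the number of items with nonzero frequency."""
    return frequency_based(stream, lambda f: 1)


def frequency_moment(stream, k):
    """F_k: the sum of the k-th powers of the frequencies."""
    return frequency_based(stream, lambda f: f ** k)


def max_frequency(stream):
    """F_infinity: the largest frequency (a maximum rather than a sum)."""
    return max(Counter(stream).values(), default=0)
```

For the stream `a, b, b, c, c, c` the frequencies are 1, 2 and 3, so `distinct_elements` counts three items and `frequency_moment(stream, 2)` adds 1 + 4 + 9.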

    Low-Memory Algorithms for Online and W-Streaming Edge Coloring

    For edge coloring, the online and the W-streaming models seem somewhat orthogonal: the former needs edges to be assigned colors immediately after insertion, typically without any space restrictions, while the latter limits memory to sublinear in the input size but allows an edge's color to be announced any time after its insertion. We aim for the best of both worlds by designing small-space online algorithms for edge-coloring. We study the problem under both (adversarial) edge arrivals and vertex arrivals. Our results significantly improve upon the memory used by prior online algorithms while achieving an $O(1)$-competitive ratio. In particular, for $n$-node graphs with maximum vertex-degree $\Delta$ under edge arrivals, we obtain an online $O(\Delta)$-coloring in $\tilde{O}(n\sqrt{\Delta})$ space. This is also the first W-streaming edge-coloring algorithm for $O(\Delta)$-coloring in sublinear memory. All prior works either used linear memory or $\omega(\Delta)$ colors. We also achieve a smooth color-space tradeoff: for any $t=O(\Delta)$, we get an $O(\Delta (\log \Delta)^2 t)$-coloring in $\tilde{O}(n\sqrt{\Delta/t})$ space, improving upon the state of the art that used $\tilde{O}(n\Delta/t)$ space for the same number of colors. The improvements stem from extensive use of random permutations that enable us to avoid previously used colors. Most of our algorithms can be derandomized and extended to multigraphs, where edge coloring is known to be considerably harder than for simple graphs. Comment: 32 pages, 1 figure
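As a point of reference for the online model described above, the following Python sketch shows the classical greedy online edge coloring: each arriving edge immediately receives the smallest color not yet used at either endpoint. This is a standard baseline, not the small-space algorithm of the paper; it remembers every color used at every vertex, so its memory is linear in the input. The function name `greedy_online_edge_coloring` is hypothetical.

```python
from collections import defaultdict


def greedy_online_edge_coloring(edge_stream):
    """Color each edge on arrival with the smallest color free at both endpoints.

    An edge (u, v) conflicts with at most (deg(u) - 1) + (deg(v) - 1) earlier
    edges, so on a simple graph at most 2 * Delta - 1 colors are ever used.
    """
    used = defaultdict(set)  # vertex -> colors already present on its edges
    coloring = []
    for u, v in edge_stream:
        color = 0
        while color in used[u] or color in used[v]:
            color += 1
        used[u].add(color)
        used[v].add(color)
        coloring.append(((u, v), color))  # the color is announced immediately
    return coloring
```

The per-vertex color sets are what make this baseline memory-hungry, which is the cost the abstract's small-space results aim to reduce.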

    New Lower Bounds in Merlin-Arthur Communication and Graph Streaming Verification

    We show new lower bounds in the Merlin-Arthur (MA) communication model and the related annotated streaming or stream verification model. The MA communication model is an enhancement of the classical communication model, where in addition to the usual players Alice and Bob, there is an all-powerful but untrusted player Merlin who knows their inputs and tries to convince them about the output. Most functions have MA protocols with total communication significantly smaller than what would be needed without Merlin. We focus on the online MA (OMA) model, which is the MA analogue of one-way communication, and introduce the notion of non-trivial-OMA complexity of a function. This is the minimum total communication needed by any non-trivial OMA protocol computing that function, where a trivial OMA protocol is one where Alice sends Bob roughly as many bits as she would have sent without Merlin. We prove a lower bound on the non-trivial-OMA complexity of a natural function Equals-Index (basically the well-known Index problem on large domains) and identify it as a canonical problem for proving strong lower bounds on this complexity: reductions from it (i) reproduce and/or improve upon the lower bounds for all functions that were previously known to have large non-trivial-OMA complexity, (ii) exhibit the first explicit functions whose non-trivial-OMA complexity is superlinear, and even exponential, in their classical one-way complexity, and (iii) show functions on input size $n$ for which this complexity is as large as $n/\log n$. While exhibiting a function with $\omega(\sqrt{n})$ (standard) OMA complexity is a longstanding open problem, we did not even know of any function with $\omega(\sqrt{n})$ non-trivial-OMA complexity. We further extend the lower bounds to a related streaming model called annotated streaming. Comment: To appear in ITCS 202

    Adversarially Robust Coloring for Graph Streams


    Streaming Verification for Graph Problems: Optimal Tradeoffs and Nonlinear Sketches

    We study graph computations in an enhanced data streaming setting, where a space-bounded client reading the edge stream of a massive graph may delegate some of its work to a cloud service. We seek algorithms that allow the client to verify a purported proof sent by the cloud service that the work done in the cloud is correct. A line of work starting with Chakrabarti et al. (ICALP 2009) has provided such algorithms, which we call schemes, for several statistical and graph-theoretic problems, many of which exhibit a tradeoff between the length of the proof and the space used by the streaming verifier. This work designs new schemes for a number of basic graph problems (including triangle counting, maximum matching, topological sorting, and single-source shortest paths) where past work had either failed to obtain smooth tradeoffs between these two key complexity measures or only obtained suboptimal tradeoffs. Our key innovation is having the verifier compute certain nonlinear sketches of the input stream, leading to either new or improved tradeoffs. In many cases, our schemes in fact provide optimal tradeoffs up to logarithmic factors. Specifically, for most graph problems that we study, it is known that the product of the verifier's space cost $v$ and the proof length $h$ must be at least $\Omega(n^2)$ for $n$-vertex graphs. However, matching upper bounds are only known for a handful of settings of $h$ and $v$ on the curve $h \cdot v=\tilde{\Theta}(n^2)$. For example, for counting triangles and maximum matching, schemes with costs lying on this curve are only known for $(h=\tilde{O}(n^2), v=\tilde{O}(1))$, $(h=\tilde{O}(n), v=\tilde{O}(n))$, and the trivial $(h=\tilde{O}(1), v=\tilde{O}(n^2))$. A major message of this work is that by exploiting nonlinear sketches, a significant "portion" of costs on the tradeoff curve $h \cdot v = n^2$ can be achieved.

    Graph Coloring via Degeneracy in Streaming and Other Space-Conscious Models

    We study the problem of coloring a given graph using a small number of colors in several well-established models of computation for big data. These include the data streaming model, the general graph query model, the massively parallel computation (MPC) model, and the CONGESTED-CLIQUE and the LOCAL models of distributed computation. On the one hand, we give algorithms with sublinear complexity, for the appropriate notion of complexity in each of these models. Our algorithms color a graph $G$ using about $\kappa(G)$ colors, where $\kappa(G)$ is the degeneracy of $G$: this parameter is closely related to the arboricity $\alpha(G)$. As a function of $\kappa(G)$ alone, our results are close to best possible, since the optimal number of colors is $\kappa(G)+1$. On the other hand, we establish certain lower bounds indicating that sublinear algorithms probably cannot go much further. In particular, we prove that any randomized coloring algorithm that uses $\kappa(G)+1$ many colors would require $\Omega(n^2)$ storage in the one-pass streaming model, and $\Omega(n^2)$ many queries in the general graph query model, where $n$ is the number of vertices in the graph. These lower bounds hold even when the value of $\kappa(G)$ is known in advance; at the same time, our upper bounds do not require $\kappa(G)$ to be given in advance. Comment: 26 pages
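The role of the degeneracy $\kappa(G)$ can be illustrated with the textbook offline procedure: repeatedly remove a minimum-degree vertex, then color greedily in the reverse of the removal order, which uses at most $\kappa(G)+1$ colors. The Python sketch below assumes the whole graph fits in memory, so it is not one of the sublinear algorithms from the paper; the function name `degeneracy_coloring` and the adjacency-dictionary input format are assumptions made for illustration.

```python
def degeneracy_coloring(adj):
    """Return (kappa, coloring) for an undirected simple graph.

    adj maps each vertex to the set of its neighbours. kappa is the degeneracy,
    and coloring assigns each vertex a color in the range 0..kappa.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    removal_order = []
    kappa = 0
    while remaining:
        # Peel off a vertex of minimum degree in the remaining graph.
        v = min(remaining, key=lambda u: len(remaining[u]))
        kappa = max(kappa, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
        removal_order.append(v)

    coloring = {}
    # In reverse removal order, each vertex has at most kappa colored neighbours.
    for v in reversed(removal_order):
        taken = {coloring[u] for u in adj[v] if u in coloring}
        color = 0
        while color in taken:
            color += 1
        coloring[v] = color
    return kappa, coloring
```

This simple version rescans all remaining vertices at each step, so it runs in roughly quadratic time; a bucket queue would bring it to linear time, but the quadratic form keeps the sketch short.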