45 research outputs found

    Streaming Verification of Graph Computations via Graph Structure

    We give new algorithms in the annotated data streaming setting, also known as verifiable data stream computation, for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication.

    Provable and practical approximations for the degree distribution using sublinear graph samples

    The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. The degree distribution estimation poses a significant challenge, due to its heavy-tailed nature and the large variance in degrees. We design a new algorithm, SADDLES, for this problem, using recent mathematical techniques from the field of sublinear algorithms. The SADDLES algorithm gives provably accurate outputs for all values of the degree distribution. For the analysis, we define two fatness measures of the degree distribution, called the $h$-index and the $z$-index. We prove that SADDLES is sublinear in the graph size when these indices are large. A corollary of this result is a provably sublinear algorithm for any degree distribution bounded below by a power law. We deploy our new algorithm on a variety of real datasets and demonstrate its excellent empirical behavior. In all instances, we get extremely accurate approximations for all values in the degree distribution by observing at most 1% of the vertices. This is a major improvement over the state-of-the-art sampling algorithms, which typically sample more than 10% of the vertices to give comparable results. We also observe that the $h$- and $z$-indices of real graphs are large, validating our theoretical analysis. Comment: Longer version of the WWW 2018 submission
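
    For orientation, the sketch below shows the plain uniform-vertex-sampling baseline that the abstract contrasts against. It is not SADDLES, whose details are not given here. The function name `sampled_degree_ccdf` and its parameters are hypothetical, and the input is assumed to be a list holding every vertex's degree.

```python
import random
from collections import Counter


def sampled_degree_ccdf(degrees, fraction, rng):
    """Estimate the fraction of vertices with degree >= d from a uniform sample.

    Naive baseline only (an assumption for illustration, not SADDLES):
    sample a fixed fraction of the vertices uniformly without replacement
    and report the empirical complementary cumulative distribution.
    """
    n = len(degrees)
    k = max(1, int(fraction * n))          # number of vertices to observe
    sample = rng.sample(range(n), k)       # uniform sample without replacement
    counts = Counter(degrees[v] for v in sample)

    ccdf = {}
    remaining = k                          # sampled vertices with degree >= d
    for d in sorted(counts):
        ccdf[d] = remaining / k
        remaining -= counts[d]
    return ccdf


if __name__ == "__main__":
    rng = random.Random(0)
    toy_degrees = [1, 1, 2, 3]
    print(sampled_degree_ccdf(toy_degrees, 1.0, rng))
```

    On a heavy-tailed graph, a small uniform sample rarely contains the few very high-degree vertices, so the tail of this estimate is noisy. That difficulty is the one the abstract describes.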

    Massively Parallel Algorithms for Small Subgraph Counting

    Approximately Counting Triangles in Sublinear Time

    We consider the problem of estimating the number of triangles in a graph. This problem has been extensively studied in both theory and practice, but all existing algorithms read the entire graph. In this work we design a sublinear-time algorithm for approximating the number of triangles in a graph, where the algorithm is given query access to the graph. The allowed queries are degree queries, vertex-pair queries and neighbor queries. We show that for any given approximation parameter $0<\epsilon<1$, the algorithm provides an estimate $\widehat{t}$ such that with high constant probability, $(1-\epsilon)\cdot t< \widehat{t}<(1+\epsilon)\cdot t$, where $t$ is the number of triangles in the graph $G$. The expected query complexity of the algorithm is $\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)\cdot {\rm poly}(\log n, 1/\epsilon)$, where $n$ is the number of vertices in the graph and $m$ is the number of edges, and the expected running time is $\left(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}\right)\cdot {\rm poly}(\log n, 1/\epsilon)$. We also prove that $\Omega\!\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)$ queries are necessary, thus establishing that the query complexity of this algorithm is optimal up to polylogarithmic factors in $n$ (and the dependence on $1/\epsilon$). Comment: To appear in the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015)
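
    The three query types named in the abstract can be illustrated with a small sketch. The `GraphOracle` class and the `naive_triangle_estimate` function are hypothetical names introduced here. The estimator is a simple unbiased wedge-sampling baseline that uses only those queries. It is not the paper's algorithm and does not achieve the stated query complexity.

```python
import random
from math import comb


class GraphOracle:
    """Query access to an undirected simple graph (hypothetical helper)."""

    def __init__(self, adj):
        self.adj = [sorted(set(nbrs)) for nbrs in adj]
        self.adj_sets = [set(nbrs) for nbrs in self.adj]
        self.num_vertices = len(self.adj)
        self.queries = 0                    # total queries issued so far

    def degree(self, v):
        """Degree query: number of neighbors of v."""
        self.queries += 1
        return len(self.adj[v])

    def neighbor(self, v, i):
        """Neighbor query: the i-th neighbor of v (0-indexed)."""
        self.queries += 1
        return self.adj[v][i]

    def pair(self, u, v):
        """Vertex-pair query: is {u, v} an edge?"""
        self.queries += 1
        return v in self.adj_sets[u]


def naive_triangle_estimate(oracle, samples, rng):
    """Unbiased but high-variance triangle estimate via wedge sampling.

    Pick a uniform vertex v, then a uniform pair of its neighbors, and test
    whether the pair is adjacent. Each triangle is counted once from each of
    its three vertices, hence the division by 3.
    """
    n = oracle.num_vertices
    total = 0.0
    for _ in range(samples):
        v = rng.randrange(n)
        d = oracle.degree(v)
        if d < 2:
            continue                        # no wedge is centered at v
        i, j = rng.sample(range(d), 2)      # two distinct neighbor indices
        a, b = oracle.neighbor(v, i), oracle.neighbor(v, j)
        if oracle.pair(a, b):
            total += n * comb(d, 2) / 3
    return total / samples


if __name__ == "__main__":
    k4 = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]   # complete graph on 4 vertices
    oracle = GraphOracle(k4)
    print(naive_triangle_estimate(oracle, 50, random.Random(0)), oracle.queries)
```

    On the complete graph with four vertices every sampled wedge is closed, so the estimate equals the true count of 4 triangles regardless of the random choices. On sparse graphs with few triangles, the variance of this baseline is large.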