Streaming Verification of Graph Computations via Graph Structure
We give new algorithms in the annotated data streaming setting (also known as verifiable data stream computation) for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication.
Provable and practical approximations for the degree distribution using sublinear graph samples
The degree distribution is one of the most fundamental properties used in the
analysis of massive graphs. There is a large literature on graph sampling,
where the goal is to estimate properties (especially the degree distribution)
of a large graph through a small, random sample. The degree distribution
estimation poses a significant challenge, due to its heavy-tailed nature and
the large variance in degrees.
We design a new algorithm, SADDLES, for this problem, using recent
mathematical techniques from the field of sublinear algorithms. The SADDLES
algorithm gives provably accurate outputs for all values of the degree
distribution. For the analysis, we define two fatness measures of the degree
distribution, called the $h$-index and the $z$-index. We prove that SADDLES is
sublinear in the graph size when these indices are large. A corollary of this
result is a provably sublinear algorithm for any degree distribution bounded
below by a power law.
We deploy our new algorithm on a variety of real datasets and demonstrate its
excellent empirical behavior. In all instances, we get extremely accurate
approximations for all values in the degree distribution by observing at most
1% of the vertices. This is a major improvement over the state-of-the-art
sampling algorithms, which typically sample more than 10% of the vertices to
give comparable results. We also observe that the $h$- and $z$-indices of real
graphs are large, validating our theoretical analysis.
Comment: Longer version of the WWW 2018 submission
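To make the estimation problem concrete, here is a minimal sketch of the naive baseline that this line of work improves on: sample vertices uniformly at random and read off the empirical fraction with degree at least d. This is not the SADDLES algorithm; the function name `estimate_degree_ccdf` and the input format (a list of vertex degrees) are assumptions made for illustration only.

```python
import random
from collections import Counter


def estimate_degree_ccdf(degrees, sample_fraction, seed=0):
    """Naive baseline (NOT SADDLES): estimate the fraction of vertices
    with degree >= d from a uniform sample of vertices.

    degrees: hypothetical input, degrees[v] is the degree of vertex v.
    sample_fraction: fraction of vertices to observe, in (0, 1].
    """
    rng = random.Random(seed)
    n = len(degrees)
    k = max(1, int(sample_fraction * n))
    sample = rng.sample(range(n), k)  # uniform sample without replacement

    counts = Counter(degrees[v] for v in sample)
    ccdf = {}
    running = 0
    # Sweep from the largest observed degree down, accumulating counts.
    for d in sorted(counts, reverse=True):
        running += counts[d]
        ccdf[d] = running / k
    return ccdf


# Toy heavy-tailed input: 90 low-degree vertices and 10 hubs.
toy_degrees = [1] * 90 + [10] * 10
full = estimate_degree_ccdf(toy_degrees, sample_fraction=1.0)
small = estimate_degree_ccdf(toy_degrees, sample_fraction=0.05)
```

With a small `sample_fraction`, the sample can miss the few high-degree vertices entirely, which illustrates the variance problem in the tail that the abstract describes.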
Approximately Counting Triangles in Sublinear Time
We consider the problem of estimating the number of triangles in a graph.
This problem has been extensively studied in both theory and practice, but all
existing algorithms read the entire graph. In this work we design a
sublinear-time algorithm for approximating the number of triangles in a
graph, where the algorithm is given query access to the graph. The allowed
queries are degree queries, vertex-pair queries and neighbor queries.
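We show that for any given approximation parameter $0 < \epsilon < 1$, the
algorithm provides an estimate $\widehat{t}$ such that with high constant
probability, $(1-\epsilon)\cdot t < \widehat{t} < (1+\epsilon)\cdot t$, where $t$
is the number of triangles in the graph $G$. The expected query complexity of
the algorithm is
$O\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)\cdot \mathrm{poly}(\log n, 1/\epsilon)$,
where $n$ is the number of vertices in the graph and $m$ is the number of edges, and
the expected running time is
$O\left(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}\right)\cdot \mathrm{poly}(\log n, 1/\epsilon)$.
We also prove that
$\Omega\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)$
queries are necessary, thus establishing that
the query complexity of this algorithm is optimal up to polylogarithmic factors
in $n$ (and the dependence on $1/\epsilon$).
Comment: To appear in the 56th Annual IEEE Symposium on Foundations of
Computer Science (FOCS 2015)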
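The query model in this abstract can be sketched as a small oracle that exposes only degree, neighbor, and vertex-pair queries. The class `GraphOracle`, the brute-force counter `exact_triangles`, and the helper `within_guarantee` are hypothetical names introduced here. The counter reads the whole graph, so it is a reference baseline and not the paper's sublinear-time algorithm. The sketch assumes a simple undirected graph with no duplicate edges.

```python
class GraphOracle:
    """Hypothetical query-access wrapper for a simple undirected graph."""

    def __init__(self, n, edges):
        self.adj = [[] for _ in range(n)]
        self.edge_set = set()
        for u, v in edges:
            self.adj[u].append(v)
            self.adj[v].append(u)
            self.edge_set.add((min(u, v), max(u, v)))
        self.queries = 0  # number of queries issued so far

    def degree(self, v):
        """Degree query: number of neighbors of v."""
        self.queries += 1
        return len(self.adj[v])

    def neighbor(self, v, i):
        """Neighbor query: the i-th neighbor of v, or None if i is out of range."""
        self.queries += 1
        return self.adj[v][i] if i < len(self.adj[v]) else None

    def pair(self, u, v):
        """Vertex-pair query: is {u, v} an edge?"""
        self.queries += 1
        return (min(u, v), max(u, v)) in self.edge_set


def exact_triangles(oracle, n):
    """Brute-force baseline: counts each triangle once, at its smallest vertex."""
    t = 0
    for v in range(n):
        d = oracle.degree(v)
        nbrs = [oracle.neighbor(v, i) for i in range(d)]
        higher = [u for u in nbrs if u > v]
        for a in range(len(higher)):
            for b in range(a + 1, len(higher)):
                if oracle.pair(higher[a], higher[b]):
                    t += 1
    return t


def within_guarantee(estimate, t, eps):
    """Check the stated guarantee (1 - eps) * t < estimate < (1 + eps) * t."""
    return (1 - eps) * t < estimate < (1 + eps) * t


# Complete graph on 4 vertices, which contains exactly 4 triangles.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
oracle = GraphOracle(4, k4_edges)
t_exact = exact_triangles(oracle, 4)
```

The `queries` counter gives a way to compare any estimator's query cost against this read-everything baseline.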