4,640 research outputs found
A second look at counting triangles in graph streams (corrected)
In this paper we present improved results on the problem of counting triangles in edge streamed graphs. For graphs with m edges and at least T triangles, we show that an extra look over the stream yields a two-pass streaming algorithm that uses O((m)/(Δ4.5sqrt(T))) space and outputs a (1+Δ) approximation of the number of triangles in the graph. This improves upon the two-pass streaming tester of Braverman, Ostrovsky and Vilenchik, ICALP 2013, which distinguishes between triangle-free graphs and graphs with at least T triangle using O((m)/(T1/3)) space. Also, in terms of dependence on T, we show that more passes would not lead to a better space bound. In other words, we prove there is no constant pass streaming algorithm that distinguishes between triangle-free graphs from graphs with at least T triangles using O((m)/(T1/2+Ï)) space for any constant Ï>=0
The Sketching Complexity of Graph and Hypergraph Counting
Subgraph counting is a fundamental primitive in graph processing, with
applications in social network analysis (e.g., estimating the clustering
coefficient of a graph), database processing and other areas. The space
complexity of subgraph counting has been studied extensively in the literature,
but many natural settings are still not well understood. In this paper we
revisit the subgraph (and hypergraph) counting problem in the sketching model,
where the algorithm's state as it processes a stream of updates to the graph is
a linear function of the stream. This model has recently received a lot of
attention in the literature, and has become a standard model for solving
dynamic graph streaming problems.
In this paper we give a tight bound on the sketching complexity of counting
the number of occurrences of a small subgraph in a bounded degree graph
presented as a stream of edge updates. Specifically, we show that the space
complexity of the problem is governed by the fractional vertex cover number of
the graph . Our subgraph counting algorithm implements a natural vertex
sampling approach, with sampling probabilities governed by the vertex cover of
. Our main technical contribution lies in a new set of Fourier analytic
tools that we develop to analyze multiplayer communication protocols in the
simultaneous communication model, allowing us to prove a tight lower bound. We
believe that our techniques are likely to find applications in other settings.
Besides giving tight bounds for all graphs , both our algorithm and lower
bounds extend to the hypergraph setting, albeit with some loss in space
complexity
FLEET: Butterfly Estimation from a Bipartite Graph Stream
We consider space-efficient single-pass estimation of the number of
butterflies, a fundamental bipartite graph motif, from a massive bipartite
graph stream where each edge represents a connection between entities in two
different partitions. We present a space lower bound for any streaming
algorithm that can estimate the number of butterflies accurately, as well as
FLEET, a suite of algorithms for accurately estimating the number of
butterflies in the graph stream. Estimates returned by the algorithms come with
provable guarantees on the approximation error, and experiments show good
tradeoffs between the space used and the accuracy of approximation. We also
present space-efficient algorithms for estimating the number of butterflies
within a sliding window of the most recent elements in the stream. While there
is a significant body of work on counting subgraphs such as triangles in a
unipartite graph stream, our work seems to be one of the few to tackle the case
of bipartite graph streams.Comment: This is the author's version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet
Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a
Bipartite Graph Stream". The 28th ACM International Conference on Information
and Knowledge Managemen
Graph Sample and Hold: A Framework for Big-Graph Analytics
Sampling is a standard approach in big-graph analytics; the goal is to
efficiently estimate the graph properties by consulting a sample of the whole
population. A perfect sample is assumed to mirror every property of the whole
population. Unfortunately, such a perfect sample is hard to collect in complex
populations such as graphs (e.g. web graphs, social networks etc), where an
underlying network connects the units of the population. Therefore, a good
sample will be representative in the sense that graph properties of interest
can be estimated with a known degree of accuracy. While previous work focused
particularly on sampling schemes used to estimate certain graph properties
(e.g. triangle count), much less is known for the case when we need to estimate
various graph properties with the same sampling scheme. In this paper, we
propose a generic stream sampling framework for big-graph analytics, called
Graph Sample and Hold (gSH). To begin, the proposed framework samples from
massive graphs sequentially in a single pass, one edge at a time, while
maintaining a small state. We then show how to produce unbiased estimators for
various graph properties from the sample. Given that the graph analysis
algorithms will run on a sample instead of the whole population, the runtime
complexity of these algorithm is kept under control. Moreover, given that the
estimators of graph properties are unbiased, the approximation error is kept
under control. Finally, we show the performance of the proposed framework (gSH)
on various types of graphs, such as social graphs, among others
Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six
We consider the problem of counting all k-vertex subgraphs in an input graph, for any constant k. This problem (denoted SUB-CNT_k) has been studied extensively in both theory and practice. In a classic result, Chiba and Nishizeki (SICOMP 85) gave linear time algorithms for clique and 4-cycle counting for bounded degeneracy graphs. This is a rich class of sparse graphs that contains, for example, all minor-free families and preferential attachment graphs. The techniques from this result have inspired a number of recent practical algorithms for SUB-CNT_k. Towards a better understanding of the limits of these techniques, we ask: for what values of k can SUB_CNT_k be solved in linear time?
We discover a chasm at k=6. Specifically, we prove that for k < 6, SUB_CNT_k can be solved in linear time. Assuming a standard conjecture in fine-grained complexity, we prove that for all k ? 6, SUB-CNT_k cannot be solved even in near-linear time
- âŠ