3,088 research outputs found
How Hard is Counting Triangles in the Streaming Model
The problem of (approximately) counting the number of triangles in a graph is
one of the basic problems in graph theory. In this paper we study the problem
in the streaming model. We study the amount of memory required by a randomized
algorithm to solve this problem. In case the algorithm is allowed one pass over
the stream, we present a best possible lower bound of for graphs
with edges on vertices. If a constant number of passes is allowed,
we show a lower bound of , the number of triangles. We match,
in some sense, this lower bound with a 2-pass -memory algorithm
that solves the problem of distinguishing graphs with no triangles from graphs
with at least triangles. We present a new graph parameter -- the
triangle density, and conjecture that the space complexity of the triangles
problem is . We match this by a second algorithm that solves
the distinguishing problem using -memory
Semi-Streaming Algorithms for Annotated Graph Streams
Considerable effort has been devoted to the development of streaming
algorithms for analyzing massive graphs. Unfortunately, many results have been
negative, establishing that a wide variety of problems require
space to solve. One of the few bright spots has been the development of
semi-streaming algorithms for a handful of graph problems -- these algorithms
use space .
In the annotated data streaming model of Chakrabarti et al., a
computationally limited client wants to compute some property of a massive
input, but lacks the resources to store even a small fraction of the input, and
hence cannot perform the desired computation locally. The client therefore
accesses a powerful but untrusted service provider, who not only performs the
requested computation, but also proves that the answer is correct.
We put forth the notion of semi-streaming algorithms for annotated graph
streams (semi-streaming annotation schemes for short). These are protocols in
which both the client's space usage and the length of the proof are . We give evidence that semi-streaming annotation schemes
represent a substantially more robust solution concept than does the standard
semi-streaming model. On the positive side, we give semi-streaming annotation
schemes for two dynamic graph problems that are intractable in the standard
model: (exactly) counting triangles, and (exactly) computing maximum matchings.
The former scheme answers a question of Cormode. On the negative side, we
identify for the first time two natural graph problems (connectivity and
bipartiteness in a certain edge update model) that can be solved in the
standard semi-streaming model, but cannot be solved by annotation schemes of
"sub-semi-streaming" cost. That is, these problems are just as hard in the
annotations model as they are in the standard model.Comment: This update includes some additional discussion of the results
proven. The result on counting triangles was previously included in an ECCC
technical report by Chakrabarti et al. available at
http://eccc.hpi-web.de/report/2013/180/. That report has been superseded by
this manuscript, and the CCC 2015 paper "Verifiable Stream Computation and
Arthur-Merlin Communication" by Chakrabarti et a
The Sketching Complexity of Graph and Hypergraph Counting
Subgraph counting is a fundamental primitive in graph processing, with
applications in social network analysis (e.g., estimating the clustering
coefficient of a graph), database processing and other areas. The space
complexity of subgraph counting has been studied extensively in the literature,
but many natural settings are still not well understood. In this paper we
revisit the subgraph (and hypergraph) counting problem in the sketching model,
where the algorithm's state as it processes a stream of updates to the graph is
a linear function of the stream. This model has recently received a lot of
attention in the literature, and has become a standard model for solving
dynamic graph streaming problems.
In this paper we give a tight bound on the sketching complexity of counting
the number of occurrences of a small subgraph in a bounded degree graph
presented as a stream of edge updates. Specifically, we show that the space
complexity of the problem is governed by the fractional vertex cover number of
the graph . Our subgraph counting algorithm implements a natural vertex
sampling approach, with sampling probabilities governed by the vertex cover of
. Our main technical contribution lies in a new set of Fourier analytic
tools that we develop to analyze multiplayer communication protocols in the
simultaneous communication model, allowing us to prove a tight lower bound. We
believe that our techniques are likely to find applications in other settings.
Besides giving tight bounds for all graphs , both our algorithm and lower
bounds extend to the hypergraph setting, albeit with some loss in space
complexity
Triadic Measures on Graphs: The Power of Wedge Sampling
Graphs are used to model interactions in a variety of contexts, and there is
a growing need to quickly assess the structure of a graph. Some of the most
useful graph metrics, especially those measuring social cohesion, are based on
triangles. Despite the importance of these triadic measures, associated
algorithms can be extremely expensive. We propose a new method based on wedge
sampling. This versatile technique allows for the fast and accurate
approximation of all current variants of clustering coefficients and enables
rapid uniform sampling of the triangles of a graph. Our methods come with
provable and practical time-approximation tradeoffs for all computations. We
provide extensive results that show our methods are orders of magnitude faster
than the state-of-the-art, while providing nearly the accuracy of full
enumeration. Our results will enable more wide-scale adoption of triadic
measures for analysis of extremely large graphs, as demonstrated on several
real-world examples
FLEET: Butterfly Estimation from a Bipartite Graph Stream
We consider space-efficient single-pass estimation of the number of
butterflies, a fundamental bipartite graph motif, from a massive bipartite
graph stream where each edge represents a connection between entities in two
different partitions. We present a space lower bound for any streaming
algorithm that can estimate the number of butterflies accurately, as well as
FLEET, a suite of algorithms for accurately estimating the number of
butterflies in the graph stream. Estimates returned by the algorithms come with
provable guarantees on the approximation error, and experiments show good
tradeoffs between the space used and the accuracy of approximation. We also
present space-efficient algorithms for estimating the number of butterflies
within a sliding window of the most recent elements in the stream. While there
is a significant body of work on counting subgraphs such as triangles in a
unipartite graph stream, our work seems to be one of the few to tackle the case
of bipartite graph streams.Comment: This is the author's version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet
Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a
Bipartite Graph Stream". The 28th ACM International Conference on Information
and Knowledge Managemen
Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs
Graphs are used to model interactions in a variety of contexts, and there is
a growing need to quickly assess the structure of such graphs. Some of the most
useful graph metrics are based on triangles, such as those measuring social
cohesion. Algorithms to compute them can be extremely expensive, even for
moderately-sized graphs with only millions of edges. Previous work has
considered node and edge sampling; in contrast, we consider wedge sampling,
which provides faster and more accurate approximations than competing
techniques. Additionally, wedge sampling enables estimation local clustering
coefficients, degree-wise clustering coefficients, uniform triangle sampling,
and directed triangle counts. Our methods come with provable and practical
probabilistic error estimates for all computations. We provide extensive
results that show our methods are both more accurate and faster than
state-of-the-art alternatives.Comment: Full version of SDM 2013 paper "Triadic Measures on Graphs: The Power
of Wedge Sampling" (arxiv:1202.5230
On The Multiparty Communication Complexity of Testing Triangle-Freeness
In this paper we initiate the study of property testing in simultaneous and
non-simultaneous multi-party communication complexity, focusing on testing
triangle-freeness in graphs. We consider the model,
where we have players receiving private inputs, and a coordinator who
receives no input; the coordinator can communicate with all the players, but
the players cannot communicate with each other. In this model, we ask: if an
input graph is divided between the players, with each player receiving some of
the edges, how many bits do the players and the coordinator need to exchange to
determine if the graph is triangle-free, or from triangle-free?
For general communication protocols, we show that
bits are sufficient to test triangle-freeness in
graphs of size with average degree (the degree need not be known in
advance). For protocols, where there is only one
communication round, we give a protocol that uses bits
when and when ; here, again, the average degree does not need to be
known in advance. We show that for average degree , our simultaneous
protocol is asymptotically optimal up to logarithmic factors. For higher
degrees, we are not able to give lower bounds on testing triangle-freeness, but
we give evidence that the problem is hard by showing that finding an edge that
participates in a triangle is hard, even when promised that at least a constant
fraction of the edges must be removed in order to make the graph triangle-free.Comment: To Appear in PODC 201
Efficient computation of the Weighted Clustering Coefficient
The clustering coefficient of an unweighted network has been extensively used to quantify how tightly connected is the neighbor around a node and it has been widely adopted for assessing the quality of nodes in a social network. The computation of the clustering coefficient is challenging since it requires to count the number of triangles in the graph. Several recent works proposed efficient sampling, streaming and MapReduce algorithms that allow to overcome this computational bottleneck. As a matter of fact, the intensity of the interaction between nodes, that is usually represented with weights on the edges of the graph, is also an important measure of the statistical cohesiveness of a network. Recently various notions of weighted clustering coefficient have been proposed but all those techniques are hard to implement on large-scale graphs. In this work we show how standard sampling techniques can be used to obtain efficient estimators for the most commonly used measures of weighted clustering coefficient. Furthermore we also propose a novel graph-theoretic notion of clustering coefficient in weighted networks. © 2016, Copyright © Taylor & Francis Group, LL
- …