Counting Triangles in Large Graphs on GPU
The clustering coefficient and the transitivity ratio are concepts often used
in network analysis, which creates a need for fast practical algorithms for
counting triangles in large graphs. Previous research in this area focused on
sequential algorithms, MapReduce parallelization, and fast approximations.
In this paper we propose a parallel triangle counting algorithm for CUDA GPU.
We describe the implementation details necessary to achieve high performance
and present the experimental evaluation of our approach. Our algorithm achieves
8 to 15 times speedup over the CPU implementation and is capable of finding 3.8
billion triangles in a graph with 89 million edges in less than 10 seconds on the
Nvidia Tesla C2050 GPU.
Comment: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
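The abstract does not reproduce the paper's CUDA kernels; as a point of reference, here is a minimal sequential sketch of exact triangle counting by neighbor-set intersection (the function name and the degree-based edge orientation are illustrative assumptions, not the authors' code):

```python
from collections import defaultdict

def count_triangles(edges):
    """Count triangles by intersecting neighbor sets of each edge's endpoints."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Rank vertices by (degree, id) and orient each edge "upward" so that
    # every triangle is discovered exactly once, at its lowest-ranked vertex.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    total = 0
    for u in out:
        for v in out[u]:
            # Common out-neighbors of u and v close a triangle u-v-w.
            total += len(out[u] & out[v])
    return total
```

The degree-based orientation keeps the intersected sets small on skewed degree distributions, which is the same load-balancing concern a GPU implementation must address across threads.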
Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs
Graphs are used to model interactions in a variety of contexts, and there is
a growing need to quickly assess the structure of such graphs. Some of the most
useful graph metrics are based on triangles, such as those measuring social
cohesion. Algorithms to compute them can be extremely expensive, even for
moderately-sized graphs with only millions of edges. Previous work has
considered node and edge sampling; in contrast, we consider wedge sampling,
which provides faster and more accurate approximations than competing
techniques. Additionally, wedge sampling enables estimation of local clustering
coefficients, degree-wise clustering coefficients, uniform triangle sampling,
and directed triangle counts. Our methods come with provable and practical
probabilistic error estimates for all computations. We provide extensive
results that show our methods are both more accurate and faster than
state-of-the-art alternatives.
Comment: Full version of SDM 2013 paper "Triadic Measures on Graphs: The Power of Wedge Sampling" (arXiv:1202.5230)
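The core idea can be sketched in a few lines: sample wedges (paths of length two) with the correct center distribution and check how many close into triangles; the closure fraction estimates the global clustering coefficient. This is an illustrative sketch under assumed names, not the authors' implementation:

```python
import random

def clustering_coefficient(adj, samples=10000, seed=0):
    """Estimate the global clustering coefficient by uniform wedge sampling.

    adj: dict mapping each vertex to the set of its neighbors.
    """
    rng = random.Random(seed)
    verts = [v for v in adj if len(adj[v]) >= 2]
    # The number of wedges centered at v is C(deg(v), 2), so sampling a
    # center with these weights and then two distinct neighbors yields a
    # uniformly random wedge.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in verts]
    closed = 0
    for _ in range(samples):
        c = rng.choices(verts, weights=weights)[0]
        a, b = rng.sample(sorted(adj[c]), 2)
        if b in adj[a]:  # the wedge closes into a triangle
            closed += 1
    return closed / samples
```

Because each sample is an independent Bernoulli trial, standard concentration bounds give the kind of provable probabilistic error estimates the abstract mentions.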
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the 3-profile of a large graph.
3-profiles are generalizations of triangle counts that specify the number of
times a small graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that
can be used to approximate the full 3-profile counts for a given large graph.
Further, we study the problem of estimating local and ego 3-profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting, which
allows us to collect 2-hop information without maintaining an explicit
2-hop neighborhood list at each vertex. This enables the computation of all
the local 3-profiles in parallel with minimal communication.
We test our implementation in several experiments scaling across many cores
on Amazon EC2. We find that our algorithm can estimate the 3-profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego 3-profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, on a timescale of
minutes.
Comment: To appear in part at KDD'15
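For intuition, the global 3-profile of a small graph can be computed by brute force: over all vertex triples, tally how many induce zero edges, one edge, two edges (a wedge), or three edges (a triangle). This snippet is only an illustration; the paper's distributed sparsifier-based estimator targets graphs far too large for such enumeration:

```python
from itertools import combinations

def three_profile(vertices, edges):
    """Return the counts of the four induced 3-vertex subgraph types:
    (independent set, single edge, wedge, triangle)."""
    edge_set = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]  # indexed by number of induced edges (0..3)
    for triple in combinations(vertices, 3):
        k = sum(frozenset(p) in edge_set for p in combinations(triple, 2))
        profile[k] += 1
    return tuple(profile)
```

The four counts always sum to C(n, 3), which gives a cheap sanity check on any approximate estimator.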
Truss Decomposition in Massive Networks
The k-truss is a type of cohesive subgraph proposed recently for the study
of networks. While the problem of computing the most cohesive subgraphs is NP-hard,
there exists a polynomial-time algorithm for computing the k-truss. Compared with
the k-core, which is also efficient to compute, the k-truss represents the "core" of a
k-core: it keeps the key information of the k-core while filtering out the less
important information. However, existing algorithms for computing the
k-truss are inefficient for handling today's massive networks. We first improve
the existing in-memory algorithm for computing k-truss in networks of moderate
size. Then, we propose two I/O-efficient algorithms to handle massive networks
that cannot fit in main memory. Our experiments on real datasets verify the
efficiency of our algorithms and the value of k-truss.
Comment: VLDB2012
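The baseline in-memory computation that the paper improves upon can be sketched as edge peeling: repeatedly delete every edge contained in fewer than k-2 triangles; the surviving edges form the k-truss. A simplified illustration (not the paper's improved or I/O-efficient algorithms):

```python
def k_truss(edges, k):
    """Return the sorted edge list of the k-truss of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if u < v:
                    # Support of edge (u, v) = number of triangles through it.
                    if len(adj[u] & adj[v]) < k - 2:
                        adj[u].discard(v)
                        adj[v].discard(u)
                        changed = True
    return sorted((u, v) for u in adj for v in adj[u] if u < v)
```

This repeated full-scan peeling is exactly what becomes prohibitive when the graph does not fit in main memory, motivating the paper's I/O-efficient variants.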
On Approximating the Number of k-cliques in Sublinear Time
We study the problem of approximating the number of k-cliques in a graph
when given query access to the graph.
We consider the standard query model for general graphs via (1) degree
queries, (2) neighbor queries and (3) pair queries. Let $n$ denote the number
of vertices in the graph, $m$ the number of edges, and $C_k$ the number of
$k$-cliques. We design an algorithm that outputs a
$(1\pm\varepsilon)$-approximation (with high probability) for $C_k$, whose
expected query complexity and running time are
$O\left(\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}\right)\cdot\mathrm{poly}(\log n,1/\varepsilon,k)$.
Hence, the complexity of the algorithm is sublinear in the size of the graph
for $C_k = \omega(m^{k/2-1})$. Furthermore, we prove a lower bound showing that
the query complexity of our algorithm is essentially optimal (up to the
dependence on $\log n$, $1/\varepsilon$ and $k$).
The previous results in this vein are by Feige (SICOMP 06) and by Goldreich
and Ron (RSA 08) for edge counting ($k=2$) and by Eden et al. (FOCS 2015) for
triangle counting ($k=3$). Our result matches the complexities of these
results.
The previous result by Eden et al. hinges on a certain amortization technique
that works only for triangle counting, and does not generalize to larger
cliques. We obtain a general algorithm that works for any $k \geq 3$ by
designing a procedure that samples each $k$-clique incident to a given set
of vertices with approximately equal probability. The primary difficulty is in
finding cliques incident to purely high-degree vertices, since random sampling
within neighbors has a low success probability. This is achieved by an
algorithm that samples uniform random high-degree vertices and a careful
tradeoff between estimating cliques incident purely to high-degree vertices and
those that include a low-degree vertex.
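The estimator in this abstract works from query access alone; by contrast, an exact k-clique counter must read the whole graph. As an illustrative exact baseline (not the paper's algorithm), a short recursive counter that enumerates each clique exactly once from its smallest vertex:

```python
def count_k_cliques(adj, k):
    """Exactly count k-cliques in an undirected graph given as a dict of
    neighbor sets with comparable vertex labels."""
    if k == 1:
        return len(adj)
    # Orient edges from lower to higher vertex id so each clique is
    # enumerated exactly once, starting from its smallest vertex.
    out = {v: {w for w in adj[v] if w > v} for v in adj}

    def extend(cand, size):
        # `size` vertices are already in the clique; candidates `cand` are
        # common out-neighbors of all chosen vertices.
        if size == k:
            return 1
        return sum(extend(cand & out[v], size + 1) for v in cand)

    return sum(extend(out[v], 1) for v in adj)
```

Its running time grows with the number of cliques explored, which is precisely the cost the sublinear-time estimator avoids when $C_k$ is large.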