Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

Author: A. Hajnal
A. Magen
C. Papadimitriou
D. Knuth
F. Chung
H. Chernoff
H. Jowhari
J. Feigenbaum
J.H. Kim
M. Latapy
N. Alon
O. Frank
S. Wasserman
T. Schank
V.H. Vu
W. Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real-world applications such as spam detection, uncovering the hidden thematic structure of the Web, and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. with the partitioning of the vertex set into a high-degree and a low-degree subset, as in the work of Alon, Yuster and Zwick, treating each set appropriately. We obtain a running time of $O\left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2}\right)$ and an $\epsilon$-approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges and $\Delta$ the maximum number of triangles in which an edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage $O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2}\right)$ and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several million edges and obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.

Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010)
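
As a rough illustration of the sampling ingredient mentioned in this abstract, the sketch below keeps each edge independently with probability p, counts triangles exactly in the sparsified graph, and rescales by 1/p^3 (a triangle survives only if all three of its edges do). This is an assumption-laden sketch of edge sparsification only: it does not implement the degree-based vertex partitioning or the semistreaming variant, and the function names `count_triangles` and `estimate_triangles` are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count of a simple undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # common neighbours of u and v close a triangle over edge (u, v)
                total += len(adj[u] & adj[v])
    # every triangle was counted once per edge, i.e. three times
    return total // 3


def estimate_triangles(edges, p, seed=0):
    """Keep each edge with probability p, count exactly, rescale by 1 / p**3."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3
```

With p = 1 every edge is kept and the estimate coincides with the exact count; smaller p trades accuracy for less work on the sparsified graph.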

arXiv.org e-Print Archive

CiteSeerX

Crossref

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Author: Borokhovich Michael
Dimakis Alexandros G.
Elenberg Ethan R.
Shanmugam Karthikeyan
Publication venue
Publication date: 22/06/2015

We study the problem of approximating the 3-profile of a large graph. 3-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that can be used to approximate the full 3-profile counts for a given large graph. Further, we study the problem of estimating local and ego 3-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting, which allows us to collect 2-hop information without maintaining an explicit 2-hop neighborhood list at each vertex. This enables the computation of all the local 3-profiles in parallel with minimal communication. We test our implementation in several experiments scaling up to 640 cores on Amazon EC2. We find that our algorithm can estimate the 3-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 3-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, on a timescale of minutes.

Comment: To appear in part at KDD'1
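
To make the notion of a global 3-profile concrete, here is a minimal brute-force sketch. It assumes vertices are labelled 0..n-1 and uses the fact that an induced subgraph on three vertices is determined, up to isomorphism, by its number of edges (0: empty, 1: single edge, 2: wedge, 3: triangle). The function name `three_profile` is hypothetical, and this exhaustive enumeration is not the sparsifier-based or distributed method described in the abstract.

```python
from itertools import combinations


def three_profile(n, edges):
    """Return [empty, one_edge, wedge, triangle] counts over all vertex triples."""
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    profile = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        # number of edges present among the three vertices
        k = sum(frozenset(pair) in edge_set
                for pair in ((a, b), (a, c), (b, c)))
        profile[k] += 1
    return profile
```

The last entry of the returned list is the ordinary triangle count, which is the sense in which 3-profiles generalize triangle counting.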

arXiv.org e-Print Archive

CiteSeerX

Distributed Triangle Counting in the Graphulo Matrix Math Library

Author: Hutchison Dylan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2017

Triangle counting is a key algorithm for large graph analysis. The Graphulo library provides a framework for implementing graph algorithms on the Apache Accumulo distributed database. In this work we adapt two algorithms for counting triangles, one that uses the adjacency matrix and another that also uses the incidence matrix, to the Graphulo library for server-side processing inside Accumulo. Cloud-based experiments show a similar performance profile for these different approaches on the family of power-law Graph500 graphs, for which data skew increasingly becomes a bottleneck. These results motivate the design of skew-aware hybrid algorithms that we propose for future work.

Comment: Honorable mention in the 2017 IEEE HPEC's Graph Challenge
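
The two matrix formulations can be sketched in plain Python under textbook assumptions; this is not Graphulo's server-side code, and the helper names are hypothetical. The adjacency-only version uses trace(A^3)/6. The adjacency-plus-incidence version forms C = A·E, where E is the vertex-by-edge incidence matrix: an entry C[v][e] = 2 means v is adjacent to both endpoints of e, and each triangle produces three such entries. The input is assumed to be a simple graph with distinct edges and vertices labelled 0..n-1.

```python
def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def triangles_adjacency(n, edges):
    """trace(A^3) counts each triangle 6 times (3 start vertices, 2 directions)."""
    A = adjacency(n, edges)
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(n)) // 6


def triangles_incidence(n, edges):
    """Count entries equal to 2 in A*E; each triangle yields three of them."""
    A = adjacency(n, edges)
    E = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        E[u][j] = E[v][j] = 1
    C = matmul(A, E)
    return sum(x == 2 for row in C for x in row) // 3
```

Both functions use dense matrices for readability; a practical implementation would rely on sparse storage.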

arXiv.org e-Print Archive

Crossref

Extended h-Index Parameterized Data Structures for Computing Dynamic Subgraph Statistics

Author: Albert
Alon
Chiba
Coppersmith
Darren Strash
David Eppstein
Demaine
Downey
Eppstein
Frank
Hirsch
Itai
Lowell Trott
Michael T. Goodrich
Milenković
Newman
Niedermeier
Price
Pržulj
Snijders
Wasserman
Publication venue: 'Elsevier BV'
Publication date: 03/09/2010

We present techniques for maintaining subgraph frequencies in a dynamic graph, using data structures that are parameterized in terms of h, the h-index of the graph. Our methods extend previous results of Eppstein and Spiro for maintaining statistics for undirected subgraphs of size three to directed subgraphs and to subgraphs of size four. For the directed case, we provide a data structure to maintain counts for all 3-vertex induced subgraphs in O(h) amortized time per update. For the undirected case, we maintain the counts of size-four subgraphs in O(h^2) amortized time per update. These extensions enable a number of new applications in bioinformatics and social networking research.
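
For readers unfamiliar with the parameter h, the h-index of a graph is the largest h such that at least h vertices have degree at least h. The static computation below is a minimal sketch of that definition only; it assumes a simple graph without repeated edges, the function name `graph_h_index` is hypothetical, and it is not the dynamic data structure the abstract describes.

```python
def graph_h_index(edges):
    """Largest h such that at least h vertices have degree >= h."""
    degree = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    h = 0
    # scan degrees from largest to smallest; rank is the number of vertices seen
    for rank, d in enumerate(sorted(degree.values(), reverse=True), start=1):
        if d >= rank:
            h = rank
        else:
            break
    return h
```

A star has h-index 1 regardless of its size, while a complete graph on k vertices has h-index k - 1, which is why h can be far smaller than the maximum degree in skewed graphs.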

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

On the Distributed Complexity of Large-Scale Graph Computations

Author: Pandurangan Gopal
Robinson Peter
Scquizzato Michele
Publication venue
Publication date: 01/01/2018

Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where $k \geq 2$ machines jointly perform computations on graphs with $n$ nodes (typically, $n \gg k$). The input graph is assumed to be initially randomly partitioned among the $k$ machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. The General Lower Bound Theorem is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic and this theorem can be used in a "cookbook" fashion to show distributed lower bounds in the context of several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds for the round complexity of two fundamental graph problems, namely PageRank computation and triangle enumeration. Our approach, as demonstrated in the case of PageRank, can yield tight lower bounds for problems (including, and especially, under a stochastic partition of the input) where communication complexity techniques are not obvious. Our approach, as demonstrated in the case of triangle enumeration, can yield stronger round lower bounds as well as message-round tradeoffs compared to approaches that use communication complexity techniques.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem

Author: Tsourakakis Charalampos E.
Publication venue
Publication date: 20/05/2014

Many graph mining applications rely on detecting subgraphs which are near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand, the densest subgraph problem (DSP), which maximizes the average degree over all subgraphs, is solvable in polynomial time but for many networks fails to find subgraphs which are near-cliques. On the other hand, formulations that are geared towards finding near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and finds near-cliques when the DSP fails. Surprisingly, our formulation is a simple variation of the DSP. Specifically, we define the triangle densest subgraph problem (TDSP): given $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*)=\max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$. We provide various exact and approximation algorithms which solve the TDSP efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the $k$-clique average density. Finally, we provide empirical evidence that the TDSP should be used whenever the DSP fails to output a near-clique.

Comment: 42 pages
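
The objective t(S)/|S| can be illustrated with an exhaustive search over vertex subsets. This is only a sketch of the definition under stated assumptions: it runs in exponential time and is usable solely on tiny graphs, whereas the abstract refers to polynomial-time exact and approximation algorithms that are not reproduced here. The function names `triangle_density` and `brute_force_tdsp` are hypothetical.

```python
from itertools import combinations


def triangle_density(S, edge_set):
    """t(S) / |S|: triangles induced by S divided by the size of S."""
    t = sum(
        1
        for a, b, c in combinations(sorted(S), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )
    return t / len(S)


def brute_force_tdsp(vertices, edges):
    """Try every non-empty subset S and keep one maximizing t(S) / |S|."""
    edge_set = {frozenset(e) for e in edges}
    best_S, best_tau = None, -1.0
    for size in range(1, len(vertices) + 1):
        for S in combinations(vertices, size):
            tau = triangle_density(S, edge_set)
            if tau > best_tau:
                best_S, best_tau = set(S), tau
    return best_S, best_tau
```

On a 4-clique with one pendant vertex attached, the search returns the 4-clique with density 1.0, since adding the pendant vertex lowers the ratio to 4/5.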

arXiv.org e-Print Archive

CiteSeerX