51 research outputs found

Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

Author: A. Hajnal
A. Magen
C. Papadimitriou
D. Knuth
F. Chung
H. Chernoff
H. Jowhari
J. Feigenbaum
J.H. Kim
M. Latapy
N. Alon
O. Frank
S. Wasserman
T. Schank
V.H. Vu
W. Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real-world applications such as spam detection, uncovering the hidden thematic structure of the Web, and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. with the partitioning of the set of vertices into a high-degree and a low-degree subset, as in the work of Alon, Yuster and Zwick, treating each set appropriately. We obtain a running time of $O\left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2}\right)$ and an $\epsilon$-approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges and $\Delta$ the maximum number of triangles in which an edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage $O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2}\right)$ and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several million edges and obtain excellent results. Finally, we propose a random-projection-based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.

Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010)
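The abstract above builds on edge sampling. As a rough illustration of that one ingredient only, the following Python sketch keeps each edge independently with probability p, counts triangles exactly in the sparsified graph, and rescales by 1/p^3, since a triangle survives only if all three of its edges do. It is not the paper's algorithm: the high-degree/low-degree vertex partitioning and the semistreaming adaptation are omitted, and the function names, the `seed` parameter and the edge-list input format (pairs of integers, no duplicate edges) are assumptions made for this example.

```python
import random


def count_triangles(edges):
    """Exact triangle count via neighbor-set intersection along each edge."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                total += len(adj[u] & adj[v])
    # Every triangle is found once per edge, i.e. three times.
    return total // 3


def estimate_triangles(edges, p, seed=None):
    """Unbiased estimate: sample edges with probability p, rescale by 1/p^3."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3
```

With p = 1.0 every edge is kept and the estimate equals the exact count; smaller values of p reduce the work at the cost of higher variance.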

arXiv.org e-Print Archive

CiteSeerX

Crossref

How Hard is Counting Triangles in the Streaming Model

Author: A. Rinaldo
C. Tsourakakis
H. Jowhari
J. Eckmann
M. Alon
M. Kolountzakis
O. Frank
Publication date: 01/01/2013

The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model. We study the amount of memory required by a randomized algorithm to solve this problem. In case the algorithm is allowed one pass over the stream, we present a best possible lower bound of $\Omega(m)$ for graphs $G$ with $m$ edges on $n$ vertices. If a constant number of passes is allowed, we show a lower bound of $\Omega(m/T)$, where $T$ is the number of triangles. We match, in some sense, this lower bound with a 2-pass $O(m/T^{1/3})$-memory algorithm that solves the problem of distinguishing graphs with no triangles from graphs with at least $T$ triangles. We present a new graph parameter $\rho(G)$, the triangle density, and conjecture that the space complexity of the triangles problem is $\Omega(m/\rho(G))$. We match this by a second algorithm that solves the distinguishing problem using $O(m/\rho(G))$ memory.

arXiv.org e-Print Archive

Crossref

Parallel Algorithms for Small Subgraph Counting

Author: Biswas Amartya Shankha
Eden Talya
Liu Quanquan C.
Mitrović Slobodan
Rubinfeld Ronitt
Publication date: 29/05/2020

Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we tackle this challenge and design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model: Given a graph $G$ over $n$ vertices, $m$ edges and $T$ triangles, our first main result is an algorithm that, with high probability, outputs a $(1+\varepsilon)$-approximation to $T$, with optimal round and space complexity provided any $S \geq \max(\sqrt{m}, n^2/m)$ space per machine, assuming $T=\Omega(\sqrt{m/n})$. Our second main result is an $\tilde{O}_{\delta}(\log \log n)$-rounds algorithm for exactly counting the number of triangles, parametrized by the arboricity $\alpha$ of the input graph. The space per machine is $O(n^{\delta})$ for any constant $\delta$, and the total space is $O(m\alpha)$, which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting $k$-cliques for any constant $k$, with the same round complexity and total space $O(m\alpha^{k-2})$. Alternatively, allowing $O(\alpha^2)$ space per machine, the total space requirement reduces to $O(n\alpha^2)$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in $\tilde{O}_{\delta}(\sqrt{\log n})$ rounds, $O(n^{\delta})$ space per machine and $O(m\alpha^3)$ total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.
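The abstract above refers to combinatorial triangle counting in the sequential model. For orientation, here is a minimal Python sketch of one standard sequential approach: orient every edge from its lower-degree endpoint to its higher-degree endpoint and intersect out-neighborhoods. It is not the MPC algorithm from the paper, and the function name, the degree-based tie-breaking rule and the edge-list input format (pairs of integers) are assumptions made for this example.

```python
def count_triangles_oriented(edges):
    """Exact triangle count using a degree-based edge orientation."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    rank = {v: (len(adj[v]), v) for v in adj}

    # Keep only edges pointing from lower rank to higher rank.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    # A triangle with ranks a < b < c is counted exactly once,
    # at the oriented edge (a, b), where c lies in both out-sets.
    return sum(len(out[u] & out[v]) for u in adj for v in out[u])
```

Because each triangle is found exactly once, no division by three or six is needed. Orienting edges toward higher-degree endpoints keeps the intersected sets small on sparse, low-arboricity graphs.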

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Towards Performance Portable Graph Algorithms

Author: Yasar Abdurrahman
Publication venue: Georgia Institute of Technology
Publication date: 14/01/2022

In today's data-driven world, our computational resources have become heterogeneous, making the processing of large-scale graphs in an architecture-agnostic manner crucial. Traditionally, hand-optimized high-performance computing (HPC) solutions have been studied and used to implement highly efficient and scalable graph algorithms. In recent years, several graph processing and management systems have also been proposed. Hand-optimized HPC approaches require high levels of expertise, and graph processing frameworks suffer from limited expressibility and performance. Portability is a major concern for both approaches. The main thesis of this work is that block-based graph algorithms offer a compromise between efficient parallelism and architecture-agnostic algorithm design for a wide class of graph problems. This dissertation seeks to prove this thesis by focusing on three pillars: data/computation partitioning, block-based algorithm design, and performance portability. In this dissertation, we first show how we can partition the computation and the data to design efficient block-based algorithms for solving graph merging and triangle counting problems. Then, generalizing from our experiences, we propose PGAbB, an algorithmic framework for implementing block-based graph algorithms on shared-memory, heterogeneous machines. PGAbB aims to maximally leverage different architectures by implementing a task-based execution on top of a block-based programming model. In this talk we will discuss PGAbB's programming model, algorithmic optimizations for scheduling, and load-balancing strategies for graph problems on real-world and synthetic inputs.

Ph.D.

Scholarly Materials And Research @ Georgia Tech

Introduction to the second issue of Social Network Analysis and Mining journal: scientific computing for social network analysis and dynamicity

Publication venue: 'Springer Science and Business Media LLC'

Crossref