
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Author: Borokhovich Michael
Dimakis Alexandros G.
Elenberg Ethan R.
Shanmugam Karthikeyan
Publication venue
Publication date: 22/06/2015
Field of study

We study the problem of approximating the 3-profile of a large graph. 3-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that can be used to approximate the full 3-profile counts for a given large graph. Further, we study the problem of estimating local and ego 3-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting, which allows us to collect 2-hop information without maintaining an explicit 2-hop neighborhood list at each vertex. This enables the computation of all the local 3-profiles in parallel with minimal communication. We test our implementation in several experiments scaling up to 640 cores on Amazon EC2. We find that our algorithm can estimate the 3-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 3-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, on a timescale of minutes. Comment: To appear in part at KDD'1

arXiv.org e-Print Archive

CiteSeerX
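The 3-profile sparsifier idea can be illustrated on a single machine. The sketch below is a minimal, hypothetical Python example, not the authors' distributed GraphLab PowerGraph implementation: it computes the exact 3-profile $[n_0, n_1, n_2, n_3]$ (the numbers of vertex triples inducing 0, 1, 2 or 3 edges) from degrees and the triangle count, keeps each edge independently with probability $p$, and then inverts the expected effect of sampling. The unbiasing step assumes independent edge sampling, under which a triple inducing $j$ edges keeps exactly $i$ of them with probability $\binom{j}{i}p^i(1-p)^{j-i}$. The names `three_profile`, `sparsify` and `unbias` are illustrative.

```python
import math
import random


def three_profile(n, edges):
    """Exact 3-profile [n0, n1, n2, n3] of a simple graph on vertices 0..n-1.

    n_j is the number of vertex triples that induce exactly j edges.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    m = sum(len(a) for a in adj) // 2
    # Each triangle is found once from each of its three edges.
    tri = sum(len(adj[u] & adj[v]) for u in range(n) for v in adj[u] if u < v) // 3
    n3 = tri
    # Wedges centered at a vertex; a closed wedge is a triangle (3 centers each).
    n2 = sum(math.comb(len(a), 2) for a in adj) - 3 * tri
    # Every edge lies in n - 2 triples, and a triple with j edges is counted j times.
    n1 = m * (n - 2) - 2 * n2 - 3 * n3
    n0 = math.comb(n, 3) - n1 - n2 - n3
    return [n0, n1, n2, n3]


def sparsify(edges, p, rng):
    """Hypothetical helper: keep each edge independently with probability p."""
    return [e for e in edges if rng.random() < p]


def unbias(sparse_profile, p):
    """Invert E[y_i] = sum_j C(j, i) p^i (1-p)^(j-i) x_j by back-substitution."""
    est = [0.0] * 4
    for i in range(3, -1, -1):
        s = sparse_profile[i]
        for j in range(i + 1, 4):
            s -= math.comb(j, i) * p**i * (1 - p) ** (j - i) * est[j]
        est[i] = s / p**i
    return est


if __name__ == "__main__":
    rng = random.Random(42)
    n = 200
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.05]
    p = 0.3
    print("exact:   ", three_profile(n, edges))
    print("estimate:", unbias(three_profile(n, sparsify(edges, p, rng)), p))
```

Because the sampling matrix is upper triangular, back-substitution from $n_3$ down to $n_0$ suffices; a smaller $p$ gives a sparser graph to process but a higher-variance estimate.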

Counting and Sampling Small Structures in Graph and Hypergraph Data Streams

Author: Haris Themistoklis
Publication venue: Dartmouth Digital Commons
Publication date: 06/06/2021
Field of study

In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges. First, we propose an algorithm that (ε, δ)-estimates the number of simplices using $O(m^{1+1/k}/T)$ bits of space. In addition, we prove that any constant-pass streaming algorithm that (ε, δ)-approximates the number of simplices requires $\Omega(m^{1+1/k}/T)$ bits of space. Thus we resolve the space complexity of the simplex counting problem by providing an algorithm that matches the lower bound. Second, we examine the triangle counting question, that is, the special case of a hypergraph with k = 2. We develop and analyze an almost optimal $O(n + m^{3/2}/T)$ triangle-counting algorithm based on ideas introduced in [KMPT12]. The proposed algorithm is subsequently used to establish a method for uniformly sampling triangles in a graph stream using $O(m^{3/2}/T)$ bits of space, which beats the state-of-the-art $O(mn/T)$ algorithm given by [PTTW13

Dartmouth Digital Commons (Dartmouth College)
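The thesis's streaming algorithms build on [KMPT12] and are not reproduced here. As a simpler point of comparison, the sketch below shows a different, well-known one-pass technique: keep each arriving edge independently with probability $p$, count the triangles closed among the kept edges, and rescale by $1/p^3$. It assumes a simple graph (no self-loops, each edge appears once in the stream) and uses memory proportional to the number of kept edges rather than the space bounds stated above; `streaming_triangle_estimate` is a hypothetical name.

```python
import random


def streaming_triangle_estimate(edge_stream, p, seed=0):
    """One-pass triangle estimate via independent edge sampling (not [KMPT12]).

    Assumes a simple graph: no self-loops and each edge appears once.
    """
    rng = random.Random(seed)
    adj = {}    # adjacency of the sampled subgraph only
    closed = 0  # triangles whose three edges were all sampled
    for u, v in edge_stream:
        if rng.random() >= p:
            continue  # edge not sampled; it is discarded immediately
        nu = adj.setdefault(u, set())
        nv = adj.setdefault(v, set())
        # A sampled triangle is counted exactly once, when its last edge arrives.
        closed += len(nu & nv)
        nu.add(v)
        nv.add(u)
    # Each triangle survives with probability p**3, so rescale to stay unbiased.
    return closed / p**3


if __name__ == "__main__":
    rng = random.Random(1)
    stream = [(u, v) for u in range(300) for v in range(u + 1, 300) if rng.random() < 0.05]
    rng.shuffle(stream)
    print(streaming_triangle_estimate(stream, 1.0), streaming_triangle_estimate(stream, 0.2))
```

With $p = 1$ the function returns the exact triangle count; smaller $p$ trades memory for variance.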

A Fast Counting Method for 6-motifs with Low Connectivity

Author: AR Benson
CE Tsourakakis
DJ Watts
F Hormozdiari
K Faust
M Gonen
M Rahman
N Betzler
N Kashtan
O Frank
PW Holland
R Milo
S Wernicke
T Hočevar
ZRM Kashani
Publication venue: Springer Science and Business Media LLC
Publication date: 17/02/2020
Field of study

A $k$-motif (or graphlet) is a subgraph on $k$ nodes in a graph or network. Motif counting in complex networks is a well-studied problem in the analysis of real-world graphs arising from social networks and bioinformatics. In particular, the triangle counting problem has received much attention due to its significance in understanding the behavior of social networks. Similarly, subgraphs with more than 3 nodes have recently received much attention. While successful methods have been developed for this problem, most existing algorithms do not scale to large networks with millions of nodes and edges. The main contribution of this paper is a preliminary study that generalizes the exact counting algorithm of Pinar, Seshadhri and Vishal to a collection of 6-motifs. This method uses the counts of smaller motifs to obtain the counts of 6-motifs with low connectivity, that is, those containing a cut-vertex or a cut-edge. It therefore circumvents the combinatorial explosion that naturally arises when counting subgraphs in large networks.

arXiv.org e-Print Archive

Crossref
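To illustrate how counts of smaller motifs can yield the count of a larger motif with a cut-vertex, the sketch below counts bowties (two triangles sharing exactly one vertex) as non-induced subgraphs. This is a 5-node analogue chosen for brevity, not one of the paper's 6-motifs, and it is a hypothetical example rather than the method of the paper or of Pinar, Seshadhri and Vishal. For each vertex $v$, it takes the per-edge triangle counts $c_{vu}$, derives the number $t_v$ of triangles at $v$, and subtracts the pairs of triangles that share an edge through $v$, giving $\binom{t_v}{2} - \sum_{u} \binom{c_{vu}}{2}$ bowties centered at $v$. Because the shared vertex of a bowtie is its cut-vertex, each bowtie is counted exactly once.

```python
import math


def bowtie_count(n, edges):
    """Count (non-induced) bowties: two triangles sharing exactly one vertex.

    Hypothetical 5-node illustration of building a cut-vertex motif count
    from per-vertex and per-edge triangle counts.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for v in range(n):
        # c_vu = number of triangles containing the edge (v, u)
        c = [len(adj[v] & adj[u]) for u in adj[v]]
        # Each triangle at v contains exactly two edges incident to v.
        t_v = sum(c) // 2
        # Pairs of triangles at v that share an edge (v, u) are not bowties.
        shared = sum(math.comb(cu, 2) for cu in c)
        # The remaining pairs meet only at v, the bowtie's cut-vertex.
        total += math.comb(t_v, 2) - shared
    return total


if __name__ == "__main__":
    k5 = [(a, b) for a in range(5) for b in range(a + 1, 5)]
    print(bowtie_count(5, k5))  # → 15
```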

On Approximating the Number of $k$-cliques in Sublinear Time

Author: Avron H.
Eden T.
Onak K.
Portes Alejandro
Seshadhri C.
Publication venue
Publication date: 12/03/2018
Field of study

We study the problem of approximating the number of $k$-cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let $n$ denote the number of vertices in the graph, $m$ the number of edges, and $C_k$ the number of $k$-cliques. We design an algorithm that outputs a $(1+\varepsilon)$-approximation (with high probability) for $C_k$, whose expected query complexity and running time are $O\left(\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}\right)\mathrm{poly}(\log n, 1/\varepsilon, k)$. Hence, the complexity of the algorithm is sublinear in the size of the graph for $C_k = \omega(m^{k/2-1})$. Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on $\log n$, $1/\varepsilon$ and $k$). The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting ($k=2$) and by Eden et al. (FOCS 2015) for triangle counting ($k=3$). Our result matches the complexities of these results. The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting and does not generalize to larger cliques. We obtain a general algorithm that works for any $k\geq 3$ by designing a procedure that samples each $k$-clique incident to a given set $S$ of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniformly random high-degree vertices and by a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex.

arXiv.org e-Print Archive

Crossref
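The query model can be made concrete with a deliberately naive baseline, which is not the authors' algorithm. In the sketch below, the hypothetical `GraphOracle` class exposes degree, neighbor and pair queries over an in-memory graph, and the hypothetical `naive_clique_estimate` samples a uniform vertex $v$, then $k-1$ distinct uniform neighbors, and uses pair queries to check whether they form a clique together with $v$. Weighting each success by $n\binom{d_v}{k-1}$ and dividing by $k$ gives an unbiased estimate of $C_k$, since every $k$-clique is seen from each of its $k$ vertices. It assumes $k \ge 2$ and a simple graph.

```python
import itertools
import math
import random


class GraphOracle:
    """Hypothetical query interface: degree, neighbor and pair queries."""

    def __init__(self, n, edges):
        self.n = n
        self.adj = [[] for _ in range(n)]
        self.edge_set = set()
        for u, v in edges:
            self.adj[u].append(v)
            self.adj[v].append(u)
            self.edge_set.add((min(u, v), max(u, v)))
        self.queries = 0  # total number of queries answered

    def degree(self, v):
        self.queries += 1
        return len(self.adj[v])

    def neighbor(self, v, i):
        self.queries += 1
        return self.adj[v][i]  # i-th neighbor of v, 0-indexed

    def pair(self, u, v):
        self.queries += 1
        return (min(u, v), max(u, v)) in self.edge_set


def naive_clique_estimate(oracle, k, samples, seed=0):
    """Unbiased but high-variance estimate of the number of k-cliques."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        v = rng.randrange(oracle.n)
        d = oracle.degree(v)
        if d < k - 1:
            continue  # v cannot lie in any k-clique
        idx = rng.sample(range(d), k - 1)  # k-1 distinct neighbor positions
        nbrs = [oracle.neighbor(v, i) for i in idx]
        if all(oracle.pair(a, b) for a, b in itertools.combinations(nbrs, 2)):
            # Success probability is (k-cliques at v) / C(d, k-1); reweight.
            total += oracle.n * math.comb(d, k - 1)
    # Each k-clique is counted once from each of its k vertices.
    return total / (samples * k)


if __name__ == "__main__":
    rng = random.Random(7)
    edges = [(u, v) for u in range(100) for v in range(u + 1, 100) if rng.random() < 0.2]
    oracle = GraphOracle(100, edges)
    print(naive_clique_estimate(oracle, 3, 2000), oracle.queries)
```

Each sample costs $O(k^2)$ queries, but when cliques concentrate on high-degree vertices, successes are rare and the variance of this estimate is large. That is the difficulty the abstract's sampling procedure for high-degree vertices is designed to address.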