Search CORE

138 research outputs found

Estimating Graphlet Statistics via Lifting

Author: Paramonov Kirill
Sharpnack James
Shemetov Dmitry
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/04/2020
Field of study

Exploratory analysis over network data is often limited by the ability to efficiently calculate graph statistics, which can provide a model-free understanding of the macroscopic properties of a network. We introduce a framework for estimating the graphlet count---the number of occurrences of a small subgraph motif (e.g. a wedge or a triangle) in the network. For massive graphs, where accessing the whole graph is not possible, the only viable algorithms are those that make a limited number of vertex neighborhood queries. We introduce a Monte Carlo sampling technique for graphlet counts, called {\em Lifting}, which can simultaneously sample all graphlets of size up to

k

vertices for arbitrary

k

. This is the first graphlet sampling method that can provably sample every graphlet with positive probability and can sample graphlets of arbitrary size

k

. We outline variants of lifted graphlet counts, including the ordered, unordered, and shotgun estimators, random walk starts, and parallel vertex starts. We prove that our graphlet count updates are unbiased for the true graphlet count and have a controlled variance for all graphlets. We compare the experimental performance of lifted graphlet counts to the state-of-the art graphlet sampling procedures: Waddling and the pairwise subgraph random walk

arXiv.org e-Print Archive

Crossref

Motif counting beyond five nodes

Author: Bressan Marco
Chierichetti Flavio
Kumar Ravi
Leucci Stefano
Panconesi Alessandro
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets

Archivio della ricerca- Università di Roma La Sapienza

Motivo: Fast Motif Counting via Succinct Color Coding and Adaptive Sampling

Author: Bressan M.
Leucci S.
Panconesi A.
Publication venue: 'VLDB Endowment'
Publication date: 01/01/2019
Field of study

The randomized technique of color coding is behind state-of-the-art algorithms for estimating graph motif counts. Those algorithms, however, are not yet capable of scaling well to very large graphs with billions of edges. In this paper we develop novel tools for the `motif counting via color coding' framework. As a result, our new algorithm, Motivo, is able to scale well to larger graphs while at the same time provide more accurate graphlet counts than ever before. This is achieved thanks to two types of improvements. First, we design new succinct data structures that support fast common color coding operations, and a biased coloring trick that trades accuracy versus running time and memory usage. These adaptations drastically reduce the time and memory requirements of color coding. Second, we develop an adaptive graphlet sampling strategy, based on a fractional set cover problem, that breaks the additive approximation barrier of standard sampling. This strategy gives multiplicative approximations for all graphlets at once, allowing us to count not only the most frequent graphlets but also extremely rare ones. To give an idea of the improvements, in

40

minutes Motivo counts

7

-nodes motifs on a graph with

65

M nodes and

1.8

B edges; this is

30

and

500

times larger than the state of the art, respectively in terms of nodes and edges. On the accuracy side, in one hour Motivo produces accurate counts of

\approx \! 10.000

distinct

8

-node motifs on graphs where state-of-the-art algorithms fail even to find the second most frequent motif. Our method requires just a high-end desktop machine. These results show how color coding can bring motif mining to the realm of truly massive graphs using only ordinary hardware.Comment: 13 page

arXiv.org e-Print Archive

MPG.PuRe