Search CORE

1,670 research outputs found

Combinatorial algorithm for counting small induced graphs and orbits

Author: Demšar Janez
Hočevar Tomaž
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 25/01/2016
Field of study

Graphlet analysis is an approach to network analysis that is particularly popular in bioinformatics. We show how to set up a system of linear equations that relate the orbit counts and can be used in an algorithm that is significantly faster than the existing approaches based on direct enumeration of graphlets. The algorithm requires existence of a vertex with certain properties; we show that such vertex exists for graphlets of arbitrary size, except for complete graphs and

C_4

, which are treated separately. Empirical analysis of running time agrees with the theoretical results

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

On the Distributed Complexity of Large-Scale Graph Computations

Author: Pandurangan Gopal
Robinson Peter
Scquizzato Michele
Publication venue
Publication date: 01/01/2018
Field of study

Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where

k \geq 2

machines jointly perform computations on graphs with

n

nodes (typically,

n \gg k

). The input graph is assumed to be initially randomly partitioned among the

k

machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication {\em rounds} of the computation. Our main contribution is the {\em General Lower Bound Theorem}, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. The General Lower Bound Theorem is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic and this theorem can be used in a "cookbook" fashion to show distributed lower bounds in the context of several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds for the round complexity of two fundamental graph problems, namely {\em PageRank computation} and {\em triangle enumeration}. Our approach, as demonstrated in the case of PageRank, can yield tight lower bounds for problems (including, and especially, under a stochastic partition of the input) where communication complexity techniques are not obvious. Our approach, as demonstrated in the case of triangle enumeration, can yield stronger round lower bounds as well as message-round tradeoffs compared to approaches that use communication complexity techniques

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

On Approximating the Number of $k$ -cliques in Sublinear Time

Author: Avron H.
Curvature
Eden T.
New
On
Onak K.
Portes Alejandro
Seshadhri C.
Publication venue
Publication date: 12/03/2018
Field of study

We study the problem of approximating the number of

k

-cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let

n

denote the number of vertices in the graph,

m

the number of edges, and

C_k

the number of

k

-cliques. We design an algorithm that outputs a

(1+\varepsilon)

-approximation (with high probability) for

C_k

, whose expected query complexity and running time are O\left(\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}\right)\poly(\log n,1/\varepsilon,k). Hence, the complexity of the algorithm is sublinear in the size of the graph for

C_k = \omega(m^{k/2-1})

. Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on

\log n

1/\varepsilon

and

k

). The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting (

k=2

) and by Eden et al. (FOCS 2015) for triangle counting (

k=3

). Our result matches the complexities of these results. The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting, and does not generalize for larger cliques. We obtain a general algorithm that works for any

k\geq 3

by designing a procedure that samples each

k

-clique incident to a given set

S

of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniform random high degree vertices and a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex

arXiv.org e-Print Archive

Crossref

Motif counting beyond five nodes

Author: Bressan Marco
Chierichetti Flavio
Kumar Ravi
Leucci Stefano
Panconesi Alessandro
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets

Archivio della ricerca- Università di Roma La Sapienza