3 research outputs found

Faster sublinear approximations of $k$-cliques for low arboricity graphs

Authors: Eden Talya; Ron Dana; Seshadhri C.
Publication date: 11/11/2018

Given query access to an undirected graph $G$, we consider the problem of computing a $(1\pm\epsilon)$-approximation of the number of $k$-cliques in $G$. The standard query model for general graphs allows for degree queries, neighbor queries, and pair queries. Let $n$ be the number of vertices, $m$ be the number of edges, and $n_k$ be the number of $k$-cliques. Previous work by Eden, Ron and Seshadhri (STOC 2018) gives an $O^*\left(\frac{n}{n_k^{1/k}} + \frac{m^{k/2}}{n_k}\right)$-time algorithm for this problem (we use $O^*(\cdot)$ to suppress $\mathrm{poly}(\log n, 1/\epsilon, k^k)$ dependencies). Moreover, this bound is nearly optimal when the expression is sublinear in the size of the graph. Our motivation is to circumvent this lower bound, by parameterizing the complexity in terms of graph arboricity. The arboricity of $G$ is a measure for the graph density "everywhere". We design an algorithm for the class of graphs with arboricity at most $\alpha$, whose running time is $O^*\left(\min\left\{\frac{n\alpha^{k-1}}{n_k},\ \frac{n}{n_k^{1/k}}+\frac{m\alpha^{k-2}}{n_k}\right\}\right)$. We also prove a nearly matching lower bound. For all graphs, the arboricity is $O(\sqrt{m})$, so this bound subsumes all previous results on sublinear clique approximation. As a special case of interest, consider minor-closed families of graphs, which have constant arboricity. Our result implies that for any minor-closed family of graphs, there is a $(1\pm\epsilon)$-approximation algorithm for $n_k$ that has running time $O^*\left(\frac{n}{n_k}\right)$. Such a bound was not known even for the special (classic) case of triangle counting in planar graphs.
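The two consequences stated in the abstract (the bound subsumes the earlier general-graph bound, and it collapses to $O^*(n/n_k)$ for constant arboricity) follow by direct substitution. The short derivation below is a sketch of that step only; the constant $c$ is a hypothetical name introduced here for the constant hidden in $\alpha = O(\sqrt{m})$, and $k \ge 2$ is assumed to be a constant.

```latex
% Assumption: c is a constant with \alpha \le c\sqrt{m}, and k >= 2 is constant.
\begin{align*}
\frac{m\,\alpha^{k-2}}{n_k}
  &\le \frac{m\,(c\sqrt{m})^{k-2}}{n_k}
   = c^{k-2}\,\frac{m^{k/2}}{n_k}, \\
\frac{n}{n_k^{1/k}} + \frac{m\,\alpha^{k-2}}{n_k}
  &\le \frac{n}{n_k^{1/k}} + c^{k-2}\,\frac{m^{k/2}}{n_k}
  && \text{(general graphs)}, \\
\frac{n\,\alpha^{k-1}}{n_k}
  &= O\!\left(\frac{n}{n_k}\right)
  && \text{(constant } \alpha \text{ and constant } k\text{)}.
\end{align*}
```

The second line matches the earlier $O^*\left(\frac{n}{n_k^{1/k}} + \frac{m^{k/2}}{n_k}\right)$ bound up to a factor that depends only on $k$, and the third line is the first term of the minimum specialized to minor-closed families.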

arXiv.org e-Print Archive

How to Count Triangles, without Seeing the Whole Graph

Authors: Bera Suman K.; Seshadhri C.
Publication date: 21/06/2020

Triangle counting is a fundamental problem in the analysis of large graphs. There is a rich body of work on this problem, in varying streaming and distributed models, yet all these algorithms require reading the whole input graph. In many scenarios, we do not have access to the whole graph, and can only sample a small portion of the graph (typically through crawling). In such a setting, how can we accurately estimate the triangle count of the graph? We formally study triangle counting in the random walk access model introduced by Dasgupta et al. (WWW '14) and Chierichetti et al. (WWW '16). We have access to an arbitrary seed vertex of the graph, and can only perform random walks. This model is restrictive in access and captures the challenges of collecting real-world graphs. Even sampling a uniform random vertex is a hard task in this model. Despite these challenges, we design a provable and practical algorithm, TETRIS, for triangle counting in this model. TETRIS is the first provably sublinear algorithm (for most natural parameter settings) that approximates the triangle count in the random walk model, for graphs with low mixing time. Our result builds on recent advances in the theory of sublinear algorithms. The final sample built by TETRIS is a careful mix of random walks and degree-biased sampling of neighborhoods. Empirically, TETRIS accurately counts triangles on a variety of large graphs, getting estimates within 5% relative error by looking at 3% of the number of edges.

Comment: Accepted for publication in KDD 202
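To make the random walk access model concrete, here is a minimal Python sketch of a generic edge-sampling triangle estimator driven by a random walk. It is not TETRIS: the function names `random_walk_edges` and `estimate_triangles` are hypothetical, the edge count `m` is assumed to be known, and mixing time (burn-in) is ignored. The idea it illustrates is that the edges traversed by a long random walk on a connected, non-bipartite undirected graph are close to uniformly distributed, and that for a uniform edge one can probe a random neighbor of the lower-degree endpoint to get an unbiased estimate of the number of triangles on that edge.

```python
import random


def random_walk_edges(adj, seed_vertex, steps, rng):
    """Walk `steps` steps from `seed_vertex`; return the traversed edges.

    `adj` maps each vertex to a list of its neighbors (undirected graph).
    """
    edges = []
    u = seed_vertex
    for _ in range(steps):
        v = rng.choice(adj[u])  # move to a uniformly random neighbor
        edges.append((u, v))
        u = v
    return edges


def estimate_triangles(adj, edges, m, rng):
    """Estimate the triangle count from sampled edges.

    Assumptions of this sketch: `edges` are roughly uniform edge samples
    and `m` (the number of edges) is known to the caller.
    """
    nbr_sets = {v: set(ns) for v, ns in adj.items()}
    total = 0
    for u, v in edges:
        # Probe from the lower-degree endpoint to keep the variance small.
        if len(adj[u]) > len(adj[v]):
            u, v = v, u
        w = rng.choice(adj[u])
        # deg(u) * Pr[w closes a triangle] = number of triangles on (u, v).
        if w != v and w in nbr_sets[v]:
            total += len(adj[u])
    # Each triangle is counted once per each of its three edges.
    return m * total / (3 * len(edges))
```

For example, on the complete graph on five vertices (10 edges, 10 triangles), calling `random_walk_edges` with a few thousand steps and passing the result to `estimate_triangles` with `m=10` should return a value close to 10. On real graphs the walk would need to run past the mixing time before its edges are used, and `m` itself would have to be estimated.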

arXiv.org e-Print Archive

Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles

Authors: Bera Suman K.; Pashanasangi Noujan; Seshadhri C.
Publication date: 18/11/2020

Counting homomorphisms of a constant sized pattern graph $H$ in an input graph $G$ is a fundamental computational problem. There is a rich history of studying the complexity of this problem, under various constraints on the input $G$ and the pattern $H$. Given the significance of this problem and the large sizes of modern inputs, we investigate when near-linear time algorithms are possible. We focus on the case when the input graph has bounded degeneracy, a commonly studied and practically relevant class for homomorphism counting. It is known from previous work that for certain classes of $H$, $H$-homomorphisms can be counted exactly in near-linear time in bounded degeneracy graphs. Can we precisely characterize the patterns $H$ for which near-linear time algorithms are possible? We completely resolve this problem, discovering a clean dichotomy using fine-grained complexity. Let $m$ denote the number of edges in $G$. We prove the following: if the largest induced cycle in $H$ has length at most $5$, then there is an $O(m\log m)$ algorithm for counting $H$-homomorphisms in bounded degeneracy graphs. If the largest induced cycle in $H$ has length at least $6$, then (assuming standard fine-grained complexity conjectures) there is a constant $\gamma > 0$, such that there is no $o(m^{1+\gamma})$ time algorithm for counting $H$-homomorphisms.

Comment: To be published in Symposium on Discrete Algorithms (SODA) 2021. Added conclusion section in the new versio
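The dichotomy depends only on the length of the largest induced cycle of the pattern $H$, which can be checked by brute force when $H$ has constant size. The Python sketch below is an illustration, not the paper's algorithm: the helper names `graph_from_edges`, `largest_induced_cycle`, and `near_linear_regime` are hypothetical, and reporting length 0 for acyclic patterns is a convention assumed here. It uses the fact that a vertex subset induces a cycle exactly when the induced subgraph is connected and every vertex has degree 2 inside the subset.

```python
from itertools import combinations


def graph_from_edges(edges):
    """Build an adjacency map {vertex: set of neighbors} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_induced_cycle(adj, subset):
    """True if `subset` induces a single cycle (connected and 2-regular)."""
    s = set(subset)
    if len(s) < 3:
        return False
    if any(len(adj[v] & s) != 2 for v in s):
        return False
    # Depth-first search restricted to the subset to check connectivity.
    start = next(iter(s))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v] & s:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == s


def largest_induced_cycle(adj):
    """Length of the largest induced cycle; 0 if the pattern is acyclic.

    Exponential in the number of vertices, so only for small patterns.
    """
    vertices = list(adj)
    for size in range(len(vertices), 2, -1):
        for subset in combinations(vertices, size):
            if is_induced_cycle(adj, subset):
                return size
    return 0


def near_linear_regime(adj):
    """True if the pattern falls on the 'length at most 5' side."""
    return largest_induced_cycle(adj) <= 5
```

For instance, a 6-cycle built with `graph_from_edges([(i, (i + 1) % 6) for i in range(6)])` falls on the hard side of the dichotomy, while adding the chord `(0, 3)` leaves only induced 4-cycles and moves the pattern to the near-linear side.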

arXiv.org e-Print Archive