We study the problem of approximating the number of k-cliques in a graph
when given query access to the graph.
We consider the standard query model for general graphs via (1) degree
queries, (2) neighbor queries and (3) pair queries. Let n denote the number
of vertices in the graph, m the number of edges, and Ck the number of
k-cliques. We design an algorithm that outputs a
(1+ε)-approximation (with high probability) for Ck, whose
expected query complexity and running time are
O\left(\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}\right)\poly(\log
n,1/\varepsilon,k).
Hence, the complexity of the algorithm is sublinear in the size of the graph
for Ck=ω(mk/2−1). Furthermore, we prove a lower bound showing that
the query complexity of our algorithm is essentially optimal (up to the
dependence on logn, 1/ε and k).
The previous results in this vein are by Feige (SICOMP 06) and by Goldreich
and Ron (RSA 08) for edge counting (k=2) and by Eden et al. (FOCS 2015) for
triangle counting (k=3). Our result matches the complexities of these
results.
The previous result by Eden et al. hinges on a certain amortization technique
that works only for triangle counting, and does not generalize for larger
cliques. We obtain a general algorithm that works for any k≥3 by
designing a procedure that samples each k-clique incident to a given set S
of vertices with approximately equal probability. The primary difficulty is in
finding cliques incident to purely high-degree vertices, since random sampling
within neighbors has a low success probability. This is achieved by an
algorithm that samples uniform random high degree vertices and a careful
tradeoff between estimating cliques incident purely to high-degree vertices and
those that include a low-degree vertex