3 research outputs found
Faster sublinear approximations of $k$-cliques for low arboricity graphs
Given query access to an undirected graph $G$, we consider the problem of
computing a $(1\pm\epsilon)$-approximation of the number of $k$-cliques in $G$.
The standard query model for general graphs allows for degree queries, neighbor
queries, and pair queries. Let $n$ be the number of vertices, $m$ be the number
of edges, and $n_k$ be the number of $k$-cliques. Previous work by Eden, Ron
and Seshadhri (STOC 2018) gives an $O^*(n/n_k^{1/k} + m^{k/2}/n_k)$-time
algorithm for this problem (we use $O^*(\cdot)$ to suppress
$\mathrm{poly}(\log n, 1/\epsilon, k^k)$ dependencies). Moreover, this bound
is nearly optimal when the expression is sublinear in the size of the graph.
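The query model described above (degree, neighbor, and pair queries) can be pictured with a small sketch. The class name `GraphOracle`, its `queries` counter, and the helper `is_clique` are hypothetical names chosen for illustration; they are not taken from the paper, and the paper's algorithm is not reproduced here.

```python
from itertools import combinations


class GraphOracle:
    """Query-only view of an undirected graph (hypothetical helper)."""

    def __init__(self, adjacency):
        # adjacency: dict mapping each vertex to a list of its neighbors.
        self._adj = {v: list(nbrs) for v, nbrs in adjacency.items()}
        self._nbr_sets = {v: set(nbrs) for v, nbrs in self._adj.items()}
        self.queries = 0  # total number of queries issued so far

    def degree(self, v):
        """Degree query: the number of neighbors of v."""
        self.queries += 1
        return len(self._adj[v])

    def neighbor(self, v, i):
        """Neighbor query: the i-th neighbor of v (0-indexed), or None."""
        self.queries += 1
        return self._adj[v][i] if 0 <= i < len(self._adj[v]) else None

    def pair(self, u, v):
        """Pair query: is {u, v} an edge?"""
        self.queries += 1
        return v in self._nbr_sets[u]


def is_clique(oracle, vertices):
    """Check a candidate clique using pair queries only."""
    return all(oracle.pair(u, v) for u, v in combinations(vertices, 2))
```

A sublinear algorithm in this model is charged per call, so the `queries` counter is the quantity that the running-time bounds constrain.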
Our motivation is to circumvent this lower bound, by parameterizing the
complexity in terms of \emph{graph arboricity}. The arboricity of $G$ is a
measure for the graph density "everywhere". We design an algorithm for the
class of graphs with arboricity at most $\alpha$, whose running time is
$O^*(\min\{n\alpha^{k-1}/n_k,\ n/n_k^{1/k} + m\alpha^{k-2}/n_k\})$. We also
prove a nearly matching lower bound. For all
graphs, the arboricity is $O(\sqrt{m})$, so this bound subsumes all previous
results on sublinear clique approximation.
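To make the notion of density "everywhere" concrete, the sketch below computes the degeneracy of a graph by repeatedly removing a minimum-degree vertex. Degeneracy is swapped in here as a proxy and is not the quantity the abstract uses; the two are closely related, since the arboricity is at most the degeneracy and the degeneracy is at most twice the arboricity minus one. The function name `degeneracy` and the dict-of-neighbors input are assumptions for illustration, and this simple version runs in quadratic time.

```python
def degeneracy(adjacency):
    """Smallest d such that every subgraph has a vertex of degree <= d.

    adjacency: dict mapping each vertex to an iterable of its neighbors
    (assumed symmetric, with no self-loops).
    """
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    d = 0
    while remaining:
        # Peel off a vertex of minimum degree in the remaining graph.
        v = min(remaining, key=lambda u: len(remaining[u]))
        d = max(d, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return d
```

A star has degeneracy 1 regardless of how large its center's degree is, which is the sense in which this kind of parameter measures density in every subgraph and not just the maximum degree.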
As a special case of interest, consider minor-closed families of graphs,
which have constant arboricity. Our result implies that for any minor-closed
family of graphs, there is a $(1\pm\epsilon)$-approximation algorithm for
$n_k$ that has running time $O^*(n/n_k)$. Such a bound was not known even for
the special (classic) case of triangle counting in planar graphs.
How to Count Triangles, without Seeing the Whole Graph
Triangle counting is a fundamental problem in the analysis of large graphs.
There is a rich body of work on this problem, in varying streaming and
distributed models, yet all these algorithms require reading the whole input
graph. In many scenarios, we do not have access to the whole graph, and can
only sample a small portion of the graph (typically through crawling). In such
a setting, how can we accurately estimate the triangle count of the graph?
We formally study triangle counting in the {\em random walk} access model
introduced by Dasgupta et al. (WWW '14) and Chierichetti et al. (WWW '16). We
have access to an arbitrary seed vertex of the graph, and can only perform
random walks. This model is restrictive in access and captures the challenges
of collecting real-world graphs. Even sampling a uniform random vertex is a
hard task in this model.
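The access restriction described above can be sketched as follows: the only operations are starting at a seed vertex and stepping to a uniformly random neighbor of the current vertex. The names `random_walk` and `visit_frequencies` and the `neighbors` callback are hypothetical, and this is only the access model, not the TETRIS algorithm.

```python
import random
from collections import Counter


def random_walk(neighbors, seed, steps, rng):
    """Walk `steps` steps from `seed`, touching the graph only via `neighbors`.

    neighbors: callable returning the list of neighbors of a vertex
    (assumed non-empty for every reachable vertex).
    """
    visited = [seed]
    current = seed
    for _ in range(steps):
        current = rng.choice(neighbors(current))
        visited.append(current)
    return visited


def visit_frequencies(walk):
    """Fraction of walk positions spent at each vertex."""
    counts = Counter(walk)
    return {v: c / len(walk) for v, c in counts.items()}


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    walk = random_walk(lambda v: adj[v], seed=0, steps=20000,
                       rng=random.Random(0))
    print(visit_frequencies(walk))
```

On a connected graph a long walk visits each vertex with frequency proportional to its degree, not uniformly, which is one way to see why drawing a uniform random vertex is not immediate in this model.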
Despite these challenges, we design a provable and practical algorithm,
TETRIS, for triangle counting in this model. TETRIS is the first provably
sublinear algorithm (for most natural parameter settings) that approximates the
triangle count in the random walk model, for graphs with low mixing time. Our
result builds on recent advances in the theory of sublinear algorithms. The
final sample built by TETRIS is a careful mix of random walks and degree-biased
sampling of neighborhoods. Empirically, TETRIS accurately counts triangles on a
variety of large graphs, getting estimates within 5\% relative error by looking
at 3\% of the number of edges.
Comment: Accepted for publication in KDD 202
Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles
Counting homomorphisms of a constant sized pattern graph $H$ in an input
graph $G$ is a fundamental computational problem. There is a rich history of
studying the complexity of this problem, under various constraints on the input
$G$ and the pattern $H$. Given the significance of this problem and the large
sizes of modern inputs, we investigate when near-linear time algorithms are
possible. We focus on the case when the input graph has bounded degeneracy, a
commonly studied and practically relevant class for homomorphism counting. It
is known from previous work that for certain classes of $H$, $H$-homomorphisms
can be counted exactly in near-linear time in bounded degeneracy graphs. Can we
precisely characterize the patterns $H$ for which near-linear time algorithms
are possible?
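For reference, the quantity being counted can be written down as a brute-force sketch: a homomorphism is any map from pattern vertices to input vertices that sends every pattern edge to an input edge (the map need not be injective). The function name `count_homomorphisms` and its argument layout are assumptions for illustration. The loop tries every possible map, so it is exponential in the pattern size and is nothing like the near-linear algorithms the abstract discusses.

```python
from itertools import product


def count_homomorphisms(pattern_vertices, pattern_edges, graph_adj):
    """Count maps f from pattern vertices to graph vertices such that
    every pattern edge (a, b) lands on a graph edge (f(a), f(b)).

    graph_adj: dict mapping each graph vertex to the set of its neighbors.
    """
    graph_vertices = list(graph_adj)
    count = 0
    for image in product(graph_vertices, repeat=len(pattern_vertices)):
        f = dict(zip(pattern_vertices, image))
        if all(f[b] in graph_adj[f[a]] for a, b in pattern_edges):
            count += 1
    return count
```

For a single-edge pattern the count is twice the number of input edges, because each edge can be hit in either direction.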
We completely resolve this problem, discovering a clean dichotomy using
fine-grained complexity. Let $m$ denote the number of edges in $G$. We prove
the following: if the largest induced cycle in $H$ has length at most $5$, then
there is an $O(m\log m)$ algorithm for counting $H$-homomorphisms in bounded
degeneracy graphs. If the largest induced cycle in $H$ has length at least $6$,
then (assuming standard fine-grained complexity conjectures) there is a
constant $\gamma > 0$, such that there is no $o(m^{1+\gamma})$ time algorithm
for counting $H$-homomorphisms.
Comment: To be published in Symposium on Discrete Algorithms (SODA) 2021. Added
conclusion section in the new version.
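The dichotomy depends only on the length of the largest induced cycle in the pattern, which can be checked by brute force for a constant-sized pattern. The sketch below assumes a simple pattern graph given as a vertex list and an edge list; the function name `longest_induced_cycle` is hypothetical. Under the thresholds in the published abstract, a result of at most 5 places the pattern on the near-linear side and a result of at least 6 on the conditionally hard side.

```python
from itertools import combinations


def longest_induced_cycle(vertices, edges):
    """Length of the longest induced cycle, or 0 if there is none.

    Brute force over vertex subsets; only suitable for small patterns.
    """
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    def induces_cycle(subset):
        chosen = set(subset)
        # Every vertex must have exactly two neighbors inside the subset.
        if any(len(adj[v] & chosen) != 2 for v in subset):
            return False
        # A 2-regular induced subgraph is a single cycle iff it is connected.
        seen, stack = {subset[0]}, [subset[0]]
        while stack:
            v = stack.pop()
            for u in adj[v] & chosen:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        return len(seen) == len(subset)

    for size in range(len(vertices), 2, -1):
        if any(induces_cycle(s) for s in combinations(vertices, size)):
            return size
    return 0
```

Adding a chord to a 6-cycle shows why "induced" matters: the 6-cycle is still present as a subgraph, but the longest induced cycle drops to length 4.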