3 research outputs found

    Faster sublinear approximations of $k$-cliques for low arboricity graphs

    Given query access to an undirected graph $G$, we consider the problem of computing a $(1\pm\epsilon)$-approximation of the number of $k$-cliques in $G$. The standard query model for general graphs allows for degree queries, neighbor queries, and pair queries. Let $n$ be the number of vertices, $m$ be the number of edges, and $n_k$ be the number of $k$-cliques. Previous work by Eden, Ron and Seshadhri (STOC 2018) gives an $O^*\left(\frac{n}{n_k^{1/k}} + \frac{m^{k/2}}{n_k}\right)$-time algorithm for this problem (we use $O^*(\cdot)$ to suppress $\mathrm{poly}(\log n, 1/\epsilon, k^k)$ dependencies). Moreover, this bound is nearly optimal when the expression is sublinear in the size of the graph. Our motivation is to circumvent this lower bound, by parameterizing the complexity in terms of graph arboricity. The arboricity of $G$ is a measure for the graph density "everywhere". We design an algorithm for the class of graphs with arboricity at most $\alpha$, whose running time is $O^*\left(\min\left\{\frac{n\alpha^{k-1}}{n_k},\, \frac{n}{n_k^{1/k}}+\frac{m \alpha^{k-2}}{n_k} \right\}\right)$. We also prove a nearly matching lower bound. For all graphs, the arboricity is $O(\sqrt{m})$, so this bound subsumes all previous results on sublinear clique approximation. As a special case of interest, consider minor-closed families of graphs, which have constant arboricity. Our result implies that for any minor-closed family of graphs, there is a $(1\pm\epsilon)$-approximation algorithm for $n_k$ that has running time $O^*\left(\frac{n}{n_k}\right)$. Such a bound was not known even for the special (classic) case of triangle counting in planar graphs.
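
The claim that the arboricity-parameterized bound subsumes the earlier one can be checked by a short substitution. The following is a sketch only, assuming $\alpha \le c\sqrt{m}$ for some constant $c$ (a hypothetical name for the constant hidden in $O(\sqrt{m})$) and treating factors that depend only on $k$ as suppressed by $O^*$:

```latex
\[
\frac{n}{n_k^{1/k}} + \frac{m\,\alpha^{k-2}}{n_k}
\;\le\;
\frac{n}{n_k^{1/k}} + \frac{c^{k-2}\, m \cdot m^{(k-2)/2}}{n_k}
\;=\;
\frac{n}{n_k^{1/k}} + \frac{c^{k-2}\, m^{k/2}}{n_k}.
\]
```

Up to the factor $c^{k-2}$, the right-hand side is the bound of Eden, Ron and Seshadhri quoted in the abstract. In the other direction, when $\alpha$ is a constant, the first term of the minimum, $\frac{n\alpha^{k-1}}{n_k}$, reduces to $O\left(\frac{n}{n_k}\right)$, which is the running time stated for minor-closed families.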

    How to Count Triangles, without Seeing the Whole Graph

    Triangle counting is a fundamental problem in the analysis of large graphs. There is a rich body of work on this problem, in varying streaming and distributed models, yet all these algorithms require reading the whole input graph. In many scenarios, we do not have access to the whole graph, and can only sample a small portion of the graph (typically through crawling). In such a setting, how can we accurately estimate the triangle count of the graph? We formally study triangle counting in the random walk access model introduced by Dasgupta et al. (WWW '14) and Chierichetti et al. (WWW '16). We have access to an arbitrary seed vertex of the graph, and can only perform random walks. This model is restrictive in access and captures the challenges of collecting real-world graphs. Even sampling a uniform random vertex is a hard task in this model. Despite these challenges, we design a provable and practical algorithm, TETRIS, for triangle counting in this model. TETRIS is the first provably sublinear algorithm (for most natural parameter settings) that approximates the triangle count in the random walk model, for graphs with low mixing time. Our result builds on recent advances in the theory of sublinear algorithms. The final sample built by TETRIS is a careful mix of random walks and degree-biased sampling of neighborhoods. Empirically, TETRIS accurately counts triangles on a variety of large graphs, getting estimates within 5% relative error by looking at 3% of the number of edges. Comment: Accepted for publication in KDD 202

    Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles

    Counting homomorphisms of a constant sized pattern graph $H$ in an input graph $G$ is a fundamental computational problem. There is a rich history of studying the complexity of this problem, under various constraints on the input $G$ and the pattern $H$. Given the significance of this problem and the large sizes of modern inputs, we investigate when near-linear time algorithms are possible. We focus on the case when the input graph has bounded degeneracy, a commonly studied and practically relevant class for homomorphism counting. It is known from previous work that for certain classes of $H$, $H$-homomorphisms can be counted exactly in near-linear time in bounded degeneracy graphs. Can we precisely characterize the patterns $H$ for which near-linear time algorithms are possible? We completely resolve this problem, discovering a clean dichotomy using fine-grained complexity. Let $m$ denote the number of edges in $G$. We prove the following: if the largest induced cycle in $H$ has length at most $5$, then there is an $O(m\log m)$ algorithm for counting $H$-homomorphisms in bounded degeneracy graphs. If the largest induced cycle in $H$ has length at least $6$, then (assuming standard fine-grained complexity conjectures) there is a constant $\gamma > 0$, such that there is no $o(m^{1+\gamma})$ time algorithm for counting $H$-homomorphisms. Comment: To be published in Symposium on Discrete Algorithms (SODA) 2021. Added conclusion section in the new version.
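
The dichotomy above depends only on the length of the largest induced cycle in the pattern $H$. The sketch below is an illustration, not the paper's algorithm: it finds that length by brute force over vertex subsets, which is feasible only because $H$ has constant size. The function names `longest_induced_cycle` and `in_near_linear_regime` are hypothetical, and treating an acyclic pattern as having cycle length 0 is an assumption made here.

```python
from itertools import combinations


def longest_induced_cycle(vertices, edges):
    """Length of the longest induced cycle in a small simple graph (0 if none).

    Brute force over all vertex subsets, so only suitable for
    constant-sized pattern graphs.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    best = 0
    for size in range(3, len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            # An induced subgraph is a cycle iff it is 2-regular and connected.
            if any(len(adj[v] & chosen) != 2 for v in chosen):
                continue
            # Depth-first search restricted to the chosen vertices.
            seen = {subset[0]}
            stack = [subset[0]]
            while stack:
                x = stack.pop()
                for y in adj[x] & chosen:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            if len(seen) == size:
                best = size  # sizes are visited in increasing order
    return best


def in_near_linear_regime(vertices, edges):
    """True if the pattern falls on the 'largest induced cycle <= 5' side.

    Assumption: an acyclic pattern (no induced cycle) counts as length 0.
    """
    return longest_induced_cycle(vertices, edges) <= 5
```

For example, a 6-cycle pattern falls on the hard side of the dichotomy, while adding the chord between opposite vertices shortens the largest induced cycle to 4 and moves the pattern to the near-linear side.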