399 research outputs found
A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling
In the subgraph counting problem, we are given a (large) input graph G(V, E) and a (small) target graph H (e.g., a triangle); the goal is to estimate the number of occurrences of H in G. Our focus here is on designing sublinear-time algorithms for approximately computing number of occurrences of H in G in the setting where the algorithm is given query access to G. This problem has been studied in several recent papers which primarily focused on specific families of graphs H such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs H in the literature. This is in sharp contrast to the closely related subgraph enumeration problem that has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph H in a graph G with m edges is O(m^{rho(H)}), where rho(H) is the fractional edge-cover of H, and enumeration algorithms with matching runtime are known for any H.
We bridge this gap between subgraph counting and subgraph enumeration by designing a simple sublinear-time algorithm that can estimate the number of occurrences of any arbitrary graph H in G, denoted by #H, to within a (1 +/- epsilon)-approximation with high probability in O(m^{rho(H)}/#H) * poly(log(n),1/epsilon) time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches those of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extend them to all choices of subgraph H under the additional assumption of edge-sample queries
Sampling arbitrary subgraphs exactly uniformly in sublinear time
We present a simple sublinear-time algorithm for sampling an arbitrary subgraph H exactly uniformly from a graph G, to which the algorithm has access by performing the following types of queries: (1) uniform vertex queries, (2) degree queries, (3) neighbor queries, (4) pair queries and (5) edge sampling queries. The query complexity and running time of our algorithm are Õ(min{m, (m^ρ(H))/#H}) and Õ((m^ρ(H))/#H}), respectively, where ρ(H) is the fractional edge-cover of H and #H is the number of copies of H in G. For any clique on r vertices, i.e., H = K_r, our algorithm is almost optimal as any algorithm that samples an H from any distribution that has Ω(1) total probability mass on the set of all copies of H must perform Ω(min{m, (m^ρ(H))/(#H⋅(cr)^r)}) queries. Together with the query and time complexities of the (1±ε)-approximation algorithm for the number of subgraphs H by Assadi et al. [Sepehr Assadi et al., 2018] and the lower bound by Eden and Rosenbaum [Eden and Rosenbaum, 2018] for approximately counting cliques, our results suggest that in our query model, approximately counting cliques is "equivalent to" exactly uniformly sampling cliques, in the sense that the query and time complexities of exactly uniform sampling and randomized approximate counting are within polylogarithmic factor of each other. This stands in interesting contrast to an analogous relation between approximate counting and almost uniformly sampling for self-reducible problems in the polynomial-time regime by Jerrum, Valiant and Vazirani [Jerrum et al., 1986]
Fast counting with tensor networks
We introduce tensor network contraction algorithms for counting satisfying
assignments of constraint satisfaction problems (#CSPs). We represent each
arbitrary #CSP formula as a tensor network, whose full contraction yields the
number of satisfying assignments of that formula, and use graph theoretical
methods to determine favorable orders of contraction. We employ our heuristics
for the solution of #P-hard counting boolean satisfiability (#SAT) problems,
namely monotone #1-in-3SAT and #Cubic-Vertex-Cover, and find that they
outperform state-of-the-art solvers by a significant margin.Comment: v2: added results for monotone #1-in-3SAT; published versio
On Approximating the Number of -cliques in Sublinear Time
We study the problem of approximating the number of -cliques in a graph
when given query access to the graph.
We consider the standard query model for general graphs via (1) degree
queries, (2) neighbor queries and (3) pair queries. Let denote the number
of vertices in the graph, the number of edges, and the number of
-cliques. We design an algorithm that outputs a
-approximation (with high probability) for , whose
expected query complexity and running time are
O\left(\frac{n}{C_k^{1/k}}+\frac{m^{k/2}}{C_k}\right)\poly(\log
n,1/\varepsilon,k).
Hence, the complexity of the algorithm is sublinear in the size of the graph
for . Furthermore, we prove a lower bound showing that
the query complexity of our algorithm is essentially optimal (up to the
dependence on , and ).
The previous results in this vein are by Feige (SICOMP 06) and by Goldreich
and Ron (RSA 08) for edge counting () and by Eden et al. (FOCS 2015) for
triangle counting (). Our result matches the complexities of these
results.
The previous result by Eden et al. hinges on a certain amortization technique
that works only for triangle counting, and does not generalize for larger
cliques. We obtain a general algorithm that works for any by
designing a procedure that samples each -clique incident to a given set
of vertices with approximately equal probability. The primary difficulty is in
finding cliques incident to purely high-degree vertices, since random sampling
within neighbors has a low success probability. This is achieved by an
algorithm that samples uniform random high degree vertices and a careful
tradeoff between estimating cliques incident purely to high-degree vertices and
those that include a low-degree vertex
Sampling Multiple Edges Efficiently
We present a sublinear time algorithm that allows one to sample multiple edges from a distribution that is pointwise ?-close to the uniform distribution, in an amortized-efficient fashion. We consider the adjacency list query model, where access to a graph G is given via degree and neighbor queries.
The problem of sampling a single edge in this model has been raised by Eden and Rosenbaum (SOSA 18). Let n and m denote the number of vertices and edges of G, respectively. Eden and Rosenbaum provided upper and lower bounds of ?^*(n/? m) for sampling a single edge in general graphs (where O^*(?) suppresses poly(1/?) and poly(log n) dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized per-sample cost if we allow a preprocessing phase? We answer in the affirmative.
We present an algorithm that, if one knows the number of required samples q in advance, has an overall cost that is sublinear in q, namely, O^*(? q ?(n/? m)), which is strictly preferable to O^*(q? (n/? m)) cost resulting from q invocations of the algorithm by Eden and Rosenbaum.
Subsequent to a preliminary version of this work, T?tek and Thorup (arXiv, preprint) proved that this bound is essentially optimal
Towards a Decomposition-Optimal Algorithm for Counting and Sampling Arbitrary Motifs in Sublinear Time
We consider the problem of sampling and approximately counting an arbitrary given motif H in a graph G, where access to G is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of H into a collection of odd cycles and stars, denoted D^*(H) = {O_{k?},...,O_{k_q}, S_{p?},...,S_{p_?}}. These algorithms were shown to be optimal for the case where H is a clique or an odd-length cycle, but no other lower bounds were known.
We present a new algorithm for sampling arbitrary motifs which, up to poly(log n) factors, is always at least as good, and for most graphs G is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows to sample vertices according to the p-th moment of the degree distribution.
Finally, we prove that this algorithm is decomposition-optimal for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs H with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition
- …