10 research outputs found

Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection

Authors: Talya Eden, Dana Ron, C. Seshadhri
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)
Publication date: 01/01/2017

We revisit the classic problem of estimating the degree distribution moments of an undirected graph. Consider an undirected graph $G=(V,E)$ with $n$ (non-isolated) vertices, and define (for $s > 0$) $\mu_s = \frac{1}{n} \sum_{v \in V} d_v^s$. Our aim is to estimate $\mu_s$ within a multiplicative error of $(1+\epsilon)$ (for a given approximation parameter $\epsilon > 0$) in sublinear time. We consider the sparse graph model that allows access to: uniform random vertices, queries for the degree of any vertex, and queries for a neighbor of any vertex. For the case of $s=1$ (the average degree), $\widetilde{O}(\sqrt{n})$ queries suffice for any constant $\epsilon$ (Feige, SICOMP 06 and Goldreich-Ron, RSA 08). Gonen-Ron-Shavitt (SIDMA 11) extended this result to all integral $s > 0$ by designing an algorithm that performs $\widetilde{O}(n^{1-1/(s+1)})$ queries. (Strictly speaking, their algorithm approximates the number of star-subgraphs of a given size, but a slight modification gives an algorithm for moments.) We design a new, significantly simpler algorithm for this problem. In the worst case, it exactly matches the bounds of Gonen-Ron-Shavitt, and has a much simpler proof. More importantly, the running time of this algorithm is connected to the degeneracy of $G$. This is (essentially) the maximum density of an induced subgraph. For the family of graphs with degeneracy at most $\alpha$, it has a query complexity of $\widetilde{O}\left(\frac{n^{1-1/s}}{\mu_s^{1/s}} \left(\alpha^{1/s} + \min\{\alpha, \mu_s^{1/s}\}\right)\right) = \widetilde{O}\left(n^{1-1/s}\alpha/\mu_s^{1/s}\right)$. Thus, for the class of bounded degeneracy graphs (which includes all minor closed families and preferential attachment graphs), we can estimate the average degree in $\widetilde{O}(1)$ queries, and can estimate the variance of the degree distribution in $\widetilde{O}(\sqrt{n})$ queries. This is a major improvement over the previous worst-case bounds. Our key insight is in designing an estimator for $\mu_s$ that has low variance when $G$ does not have large dense subgraphs.
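The quantities named in this abstract can be made concrete with a small sketch. The code below is not the paper's algorithm: it computes the moment exactly, shows the naive uniform-vertex baseline that the sublinear algorithms improve on, and computes degeneracy by repeatedly peeling a minimum-degree vertex. The function names and the adjacency-dictionary representation (`adj` maps each vertex to a list of neighbors) are assumptions made for illustration.

```python
import random

def degree_moment(adj, s):
    """Exact mu_s = (1/n) * sum_v d_v^s over the non-isolated vertices of adj."""
    degrees = [len(nbrs) for nbrs in adj.values() if nbrs]
    return sum(d ** s for d in degrees) / len(degrees)

def naive_moment_estimate(adj, s, samples, rng):
    """Baseline only: average d_v^s over uniformly sampled non-isolated vertices.

    This is unbiased but can have very high variance on skewed degree sequences.
    """
    vertices = [v for v, nbrs in adj.items() if nbrs]
    return sum(len(adj[rng.choice(vertices)]) ** s for _ in range(samples)) / samples

def degeneracy(adj):
    """Largest degree seen while repeatedly removing a minimum-degree vertex.

    Simple quadratic-time version, kept short for readability.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        k = max(k, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)   # v leaves the graph
        del remaining[v]
    return k

# Usage: a star with centre 0 and four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(degree_moment(star, 2), degeneracy(star))
```

On the star, one vertex carries almost all of the second moment, which is the kind of skew that makes the naive estimator unreliable with few samples, even though the graph has degeneracy 1.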

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Sampling and Counting Edges via Vertex Accesses

Authors: Mikkel Thorup, Jakub Tětek
Publication date: 08/07/2021

We consider the problems of sampling and counting edges from a graph on $n$ vertices where our basic access is via uniformly sampled vertices. When we have a vertex, we can see its degree, and access its neighbors. Eden and Rosenbaum [SOSA 2018] have shown it is possible to sample an edge $\epsilon$-uniformly in $O(\sqrt{1/\epsilon}\frac{n}{\sqrt{m}})$ vertex accesses. Here, we get down to expected $O(\log(1/\epsilon)\frac{n}{\sqrt{m}})$ vertex accesses. Next, we consider the problem of sampling $s>1$ edges. For this we introduce a model that we call hash-based neighbor access. We show that, w.h.p., we can sample $s$ edges exactly uniformly at random, with or without replacement, in $\tilde{O}(\sqrt{s} \frac{n}{\sqrt{m}} + s)$ vertex accesses. We present a matching lower bound of $\Omega(\sqrt{s} \frac{n}{\sqrt{m}} + s)$ which holds for $\epsilon$-uniform edge multi-sampling with some constant $\epsilon>0$ even though our positive result has $\epsilon=0$. We then give an algorithm for edge counting. W.h.p., we count the number of edges to within error $\epsilon$ in time $\tilde{O}(\frac{n}{\epsilon\sqrt{m}} + \frac{1}{\epsilon^2})$. When $\epsilon$ is not too small (for $\epsilon \geq \frac{\sqrt{m}}{n}$), we present a near-matching lower-bound of $\Omega(\frac{n}{\epsilon \sqrt{m}})$. In the same range, the previous best upper and lower bounds were polynomially worse in $\epsilon$. Finally, we give an algorithm that instead of hash-based neighbor access uses the more standard pair queries ("are vertices $u$ and $v$ adjacent"). W.h.p. it returns $1+\epsilon$ approximation of the number of edges and runs in expected time $\tilde{O}(\frac{n}{\epsilon \sqrt{m}} + \frac{1}{\epsilon^4})$. This matches our lower bound when $\epsilon$ is not too small, specifically for $\epsilon \geq \frac{m^{1/6}}{n^{1/3}}$.

Comment: This paper subsumes the arXiv report (arXiv:2009.11178) which only contains the result on sampling one edge.
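To illustrate the access model in this abstract (uniform vertices, degree lookups, neighbor lookups), here is a minimal rejection-sampling sketch. It is a textbook baseline, not the algorithm of the paper or of Eden and Rosenbaum, and it assumes a known upper bound `max_degree` on all degrees, which the abstract does not assume. The function name and the adjacency-dictionary representation are hypothetical.

```python
import random

def sample_edge_by_rejection(adj, max_degree, rng):
    """Return an exactly uniform directed edge (u, v) using only vertex accesses.

    Assumptions of this sketch: max_degree bounds every degree, and the graph
    has at least one edge (otherwise the loop never terminates).
    """
    vertices = list(adj)
    while True:
        u = rng.choice(vertices)                      # one uniform vertex access
        if rng.random() < len(adj[u]) / max_degree:   # accept with prob d(u)/max_degree
            return u, rng.choice(adj[u])              # uniform neighbor of accepted vertex

# Usage: a path 0 - 1 - 2 has four directed edges, each returned with probability 1/4.
path = {0: [1], 1: [0, 2], 2: [1]}
print(sample_edge_by_rejection(path, 2, random.Random(1)))
```

Each trial outputs a fixed directed edge with probability $\frac{1}{n} \cdot \frac{d(u)}{D} \cdot \frac{1}{d(u)} = \frac{1}{nD}$, where $D$ is `max_degree`, so the output is uniform. The expected number of trials is $nD/(2m)$, which can be far above the $n/\sqrt{m}$ scale of the bounds quoted above when a few vertices have large degree.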

arXiv.org e-Print Archive

Copenhagen University Research Information System

Parallel Algorithms for Small Subgraph Counting

Authors: Amartya Shankha Biswas, Talya Eden, Quanquan C. Liu, Slobodan Mitrović, Ronitt Rubinfeld
Publication date: 29/05/2020

Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we tackle this challenge and design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model: Given a graph $G$ over $n$ vertices, $m$ edges and $T$ triangles, our first main result is an algorithm that, with high probability, outputs a $(1+\varepsilon)$-approximation to $T$, with optimal round and space complexity provided any $S \geq \max(\sqrt{m}, n^2/m)$ space per machine, assuming $T=\Omega(\sqrt{m/n})$. Our second main result is an $\tilde{O}_{\delta}(\log \log n)$-rounds algorithm for exactly counting the number of triangles, parametrized by the arboricity $\alpha$ of the input graph. The space per machine is $O(n^{\delta})$ for any constant $\delta$, and the total space is $O(m\alpha)$, which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting $k$-cliques for any constant $k$, with the same round complexity and total space $O(m\alpha^{k-2})$. Alternatively, allowing $O(\alpha^2)$ space per machine, the total space requirement reduces to $O(n\alpha^2)$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$, can be implemented in the MPC model in $\tilde{O}_{\delta}(\sqrt{\log n})$ rounds, $O(n^{\delta})$ space per machine and $O(m\alpha^3)$ total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.
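The abstract compares its total space to the $O(m\alpha)$ time of sequential combinatorial triangle counting. A minimal sketch of that sequential baseline follows. It is not the MPC algorithm of the paper; the function name and the adjacency-dictionary input are assumptions, and vertex identifiers are assumed to be mutually comparable so that ties in degree can be broken.

```python
def count_triangles(adj):
    """Count triangles exactly by orienting each edge toward the higher (degree, id) rank."""
    rank = {v: (len(adj[v]), v) for v in adj}
    # out[v] holds the neighbors of v that rank strictly higher than v.
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    # A triangle a < b < c (by rank) is found exactly once: at the edge a -> b,
    # where c lies in both out[a] and out[b].
    return sum(len(out[u] & out[v]) for u in adj for v in out[u])

# Usage: the complete graph on four vertices contains four triangles.
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
print(count_triangles(k4))
```

Each set intersection costs at most the smaller of the two endpoint degrees, and Chiba and Nishizeki showed that the sum of $\min(d_u, d_v)$ over all edges is $O(m\alpha)$, which is where the sequential bound comes from.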

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Sublinear-Time Algorithms for Counting Star Subgraphs via Edge Sampling

Authors: Maryam Aliakbarpour, Amartya Shankha Biswas, Themistoklis Gouleakis, John Lee Thompson Peebles, Ronitt Rubinfeld, Anak Yodpinyanee
Publication venue: Springer Science and Business Media LLC
Publication date: 03/02/2018

We study the problem of estimating the value of sums of the form $S_p \triangleq \sum \binom{x_i}{p}$ when one has the ability to sample $x_i \geq 0$ with probability proportional to its magnitude. When $p=2$, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when $\{x_i\}$ is the degree sequence of a graph, which corresponds to counting the number of $p$-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a $(1 \pm \epsilon)$-multiplicative approximation of $S_p$ has query and time complexities $O\left(\frac{m \log\log n}{\epsilon^2 S_p^{1/p}}\right)$. Here, $m = \sum x_i / 2$ is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, $n$ is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when $\{x_i\}$ is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where $S_p \leq n$ and $p=2$, our upper bound is $\widetilde{O}(n/S_p^{1/2})$, in contrast to their $\Omega(n/S_p^{1/3})$ lower bound when no random edge queries are available.

In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded (or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded), we give a sublinear time algorithm via a reduction to the undirected case.

Keywords: Subgraphs, Approximate counting, Randomized algorithms, Sublinear-time algorithms

National Science Foundation (U.S.). Graduate Research Fellowship Program (Grants CCF-1217423, CCF-1065125, CCF-1420692, CCF-1122374)
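The star-counting abstract above relies on sampling a value with probability proportional to its magnitude, which for a degree sequence means picking a uniform edge and then a uniform endpoint. The sketch below shows a simple unbiased estimator built on that primitive. It is not the paper's algorithm and carries none of its complexity guarantees. It assumes the edge count $m$ is known, and the function names and input representation (`edges` as a list of pairs, `adj` as an adjacency dictionary) are hypothetical.

```python
import random
from math import comb

def exact_star_count(adj, p):
    """S_p = sum over vertices of C(d_v, p), the number of p-stars."""
    return sum(comb(len(nbrs), p) for nbrs in adj.values())

def estimate_star_count(edges, adj, p, samples, rng):
    """Unbiased estimate of S_p from uniform edge samples (a sketch, m assumed known)."""
    m = len(edges)
    total = 0.0
    for _ in range(samples):
        # Uniform edge, then uniform endpoint: vertex v is chosen with probability d_v / (2m).
        v = rng.choice(rng.choice(edges))
        d = len(adj[v])
        # Reweight by 2m / d_v so that the expectation equals sum_v C(d_v, p).
        total += 2 * m * comb(d, p) / d
    return total / samples

# Usage: on a 5-cycle every vertex has degree 2, so each sample contributes exactly 5.
cycle_edges = [(i, (i + 1) % 5) for i in range(5)]
cycle_adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(exact_star_count(cycle_adj, 2), estimate_star_count(cycle_edges, cycle_adj, 2, 10, random.Random(0)))
```

The expectation of each term is $\sum_v \frac{d_v}{2m} \cdot \frac{2m \binom{d_v}{p}}{d_v} = S_p$. High-degree vertices, which contribute most of $S_p$, are also the ones most likely to be sampled.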

DSpace@MIT