
    Parallel Algorithms for Small Subgraph Counting

    Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we tackle this challenge and design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model: Given a graph $G$ over $n$ vertices, $m$ edges and $T$ triangles, our first main result is an algorithm that, with high probability, outputs a $(1+\varepsilon)$-approximation to $T$, with optimal round and space complexity provided any $S \geq \max(\sqrt{m}, n^2/m)$ space per machine, assuming $T = \Omega(\sqrt{m/n})$. Our second main result is an $\tilde{O}_{\delta}(\log\log n)$-round algorithm for exactly counting the number of triangles, parametrized by the arboricity $\alpha$ of the input graph. The space per machine is $O(n^{\delta})$ for any constant $\delta$, and the total space is $O(m\alpha)$, which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting $k$-cliques for any constant $k$, with the same round complexity and total space $O(m\alpha^{k-2})$. Alternatively, allowing $O(\alpha^2)$ space per machine, the total space requirement reduces to $O(n\alpha^2)$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in $\tilde{O}_{\delta}(\sqrt{\log n})$ rounds, $O(n^{\delta})$ space per machine and $O(m\alpha^3)$ total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.
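    As a point of reference for the sequential baseline mentioned above, here is a minimal Python sketch of combinatorial triangle counting by edge orientation, whose $O(m\alpha)$ running time is what the paper's total space bound matches. The edge-list representation and function name are illustrative; this is not the MPC algorithm itself.

```python
# Minimal sequential sketch: count triangles by orienting each edge toward its
# higher-degree endpoint, so every triangle is charged to its lowest-ranked vertex.
# Illustrative only; not the paper's MPC algorithm.
from collections import defaultdict

def count_triangles(edges):
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    # Order vertices by (degree, id); orient every edge from lower to higher rank.
    rank = {v: (d, v) for v, d in deg.items()}
    out = {v: set() for v in deg}
    for u, v in edges:
        if rank[u] < rank[v]:
            out[u].add(v)
        else:
            out[v].add(u)

    # Each triangle {a, b, c} with rank a < b < c is found exactly once, at a via b.
    return sum(len(out[u] & out[v]) for u in out for v in out[u])

print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))  # -> 1
```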

    Massively Parallel Algorithms for Distance Approximation and Spanners

    Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms -- usually sublogarithmic-time and often $\mathrm{poly}(\log\log n)$-time, or even faster -- for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely-adopted theoretical abstraction of MapReduce-style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present $\mathrm{poly}(\log k) \in \mathrm{poly}(\log\log n)$-round MPC algorithms for computing $O(k^{1+o(1)})$-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for spanner construction. As primary applications of our spanners, we get two important implications:
    - For the MPC setting, we get an $O(\log^2\log n)$-round algorithm for $O(\log^{1+o(1)} n)$-approximation of all-pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time MPC algorithm for distance approximation.
    - Our result above also extends to the Congested Clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first sublogarithmic algorithm for approximating APSP in weighted graphs in the Congested Clique model.
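    For intuition about the object being constructed, below is a minimal sketch of the classic sequential greedy $(2k-1)$-spanner: keep an edge only if its endpoints are currently more than $2k-1$ hops apart in the spanner built so far. It assumes the networkx library is available and illustrates the spanner guarantee, not the paper's MPC or Congested Clique construction.

```python
# Minimal sequential sketch of the greedy (2k-1)-spanner; illustrative only.
import networkx as nx  # assumed available for shortest-path queries

def greedy_spanner(G, k):
    """Return a (2k-1)-spanner of the unweighted graph G."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes)
    for u, v in G.edges:
        try:
            d = nx.shortest_path_length(H, u, v)
        except nx.NetworkXNoPath:
            d = float("inf")
        if d > 2 * k - 1:       # keep the edge only if it is really needed
            H.add_edge(u, v)
    return H

G = nx.complete_graph(6)
H = greedy_spanner(G, k=2)      # stretch-3 spanner
print(H.number_of_edges(), "of", G.number_of_edges(), "edges kept")
```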

    Learning Spanning Forests Optimally using CUT Queries in Weighted Undirected Graphs

    In this paper we describe a randomized algorithm which returns a maximal spanning forest of an unknown weighted undirected graph making $O(n)$ $\mathsf{CUT}$ queries in expectation. For weighted graphs, this is optimal due to a result in [Auza and Lee, 2021] which shows an $\Omega(n)$ lower bound for zero-error randomized algorithms. To our knowledge, it is the only regime of this problem where we have upper and lower bounds tight up to constants. These questions have been extensively studied in the past few years, especially due to the problem's connections to symmetric submodular function minimization. We also describe a simple polynomial-time deterministic algorithm that makes $O(\frac{n\log n}{\log\log n})$ queries on undirected unweighted graphs and returns a maximal spanning forest, thereby (slightly) improving upon the state of the art.
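    To make the query model concrete, here is a minimal sketch of a standard cut-query primitive: recovering one neighbor of a vertex with $O(\log n)$ $\mathsf{CUT}$ queries by binary search, using the identity $w(v, A) = \tfrac{1}{2}(\mathsf{CUT}(A) + \mathsf{CUT}(\{v\}) - \mathsf{CUT}(A \cup \{v\}))$ for $v \notin A$. The oracle is simulated on an explicit edge-weight map for illustration; the paper's $O(n)$-query spanning-forest algorithm is more involved.

```python
# Minimal sketch of the cut-query model; illustrative only.

def cut_weight(graph, S):
    """CUT oracle: total weight of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(w for (u, v), w in graph.items() if (u in S) != (v in S))

def weight_to_set(graph, v, A):
    """Total edge weight between v and the set A (v not in A), via three CUT queries."""
    return (cut_weight(graph, A) + cut_weight(graph, {v}) - cut_weight(graph, A | {v})) / 2

def find_neighbor(graph, v, candidates):
    """Binary-search the candidate set for some neighbor of v, or return None."""
    candidates = [u for u in candidates if u != v]
    if weight_to_set(graph, v, set(candidates)) == 0:
        return None
    while len(candidates) > 1:
        half = set(candidates[: len(candidates) // 2])
        if weight_to_set(graph, v, half) > 0:
            candidates = list(half)
        else:
            candidates = candidates[len(candidates) // 2:]
    return candidates[0]

# toy weighted graph as an edge -> weight map
g = {(1, 2): 3.0, (2, 3): 1.0, (3, 4): 2.5}
print(find_neighbor(g, 1, [2, 3, 4]))  # -> 2
```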

    Graph sparsification for derandomizing massively parallel computation with low space

    Massively Parallel Computation (MPC) is an emerging model which distills core aspects of distributed and parallel computation. It was developed as a tool to solve (typically graph) problems in systems where the input is distributed over many machines with limited space. Recent work has focused on the regime in which machines have sublinear (in $n$, the number of nodes in the input graph) space, with randomized algorithms presented for the fundamental problems of Maximal Matching and Maximal Independent Set. There are, however, no prior corresponding deterministic algorithms. A major challenge in the sublinear-space setting is that the local space of each machine may be too small to store all the edges incident to a single node. To overcome this barrier we introduce a new graph sparsification technique that deterministically computes a low-degree subgraph with additional desired properties: degrees in the subgraph are sufficiently small that nodes’ neighborhoods can be stored on single machines, and solving the problem on the subgraph provides significant global progress towards solving the problem for the original input graph. Using this framework to derandomize the well-known randomized algorithm of Luby [SICOMP’86], we obtain $O(\log \Delta + \log\log n)$-round deterministic MPC algorithms for solving the fundamental problems of Maximal Matching and Maximal Independent Set with $O(n^{\epsilon})$ space on each machine for any constant $\epsilon > 0$. Based on the recent work of Ghaffari et al. [FOCS’18], this additive $O(\log\log n)$ term is conditionally essential. These algorithms can also be shown to run in $O(\log \Delta)$ rounds in the closely related CONGESTED CLIQUE model, improving upon the state-of-the-art bound of $O(\log^2 \Delta)$ rounds by Censor-Hillel et al. [DISC’17].
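    For context, below is a minimal sequential sketch of Luby's randomized MIS algorithm [SICOMP’86], the algorithm the paper derandomizes: in each round every surviving vertex draws a random priority and joins the independent set if it beats all surviving neighbors, after which winners and their neighbors are removed. The adjacency-map representation is an illustrative choice.

```python
# Minimal sequential sketch of Luby's randomized MIS algorithm; illustrative only.
import random

def luby_mis(adj):
    """adj: dict mapping each vertex to a set of its neighbors. Returns an MIS."""
    alive = set(adj)
    mis = set()
    while alive:
        prio = {v: random.random() for v in alive}
        # A vertex wins if its priority beats every surviving neighbor's.
        winners = {v for v in alive
                   if all(prio[v] < prio[u] for u in adj[v] if u in alive)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & alive   # drop winners and their neighbors
        alive -= removed
    return mis

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(luby_mis(adj))  # e.g. {1, 4} or {3}
```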

    Exponential Speedup over Locality in MPC with Optimal Memory

    Locally Checkable Labeling (LCL) problems are graph problems in which a solution is correct if it satisfies some given constraints in the local neighborhood of each node. Example problems in this class include maximal matching, maximal independent set, and coloring problems. A successful line of research has been studying the complexities of LCL problems on paths/cycles, trees, and general graphs, providing many interesting results for the LOCAL model of distributed computing. In this work, we initiate the study of LCL problems in the low-space Massively Parallel Computation (MPC) model. In particular, on forests, we provide a method that, given the complexity of an LCL problem in the LOCAL model, automatically provides an exponentially faster algorithm for the low-space MPC setting that uses optimal global memory, that is, truly linear. While restricting to forests may seem to weaken the result, we emphasize that all known (conditional) lower bounds for the MPC setting are obtained by lifting lower bounds obtained in the distributed setting in tree-like networks (either forests or high girth graphs), and hence the problems that we study are challenging already on forests. Moreover, the most important technical feature of our algorithms is that they use optimal global memory, that is, memory linear in the number of edges of the graph. In contrast, most of the state-of-the-art algorithms use more than linear global memory. Further, they typically start with a dense graph, sparsify it, and then solve the problem on the residual graph, exploiting the relative increase in global memory. On forests, this is not possible, because the given graph is already as sparse as it can be, and using optimal memory requires new solutions.
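    As a small illustration of what "locally checkable" means, the sketch below checks a proper coloring, a canonical LCL: the global solution is correct exactly when every node's radius-1 check passes. The function names are illustrative and unrelated to the paper's techniques.

```python
# Minimal sketch of an LCL verifier for proper coloring; illustrative only.

def locally_valid(v, color, adj):
    """Check the proper-coloring constraint in v's radius-1 neighborhood."""
    return all(color[v] != color[u] for u in adj[v])

def is_proper_coloring(color, adj):
    """Globally correct iff every node's local check passes."""
    return all(locally_valid(v, color, adj) for v in adj)

adj = {1: {2}, 2: {1, 3}, 3: {2}}                          # a path on three nodes (a forest)
print(is_proper_coloring({1: "a", 2: "b", 3: "a"}, adj))   # True
print(is_proper_coloring({1: "a", 2: "a", 3: "b"}, adj))   # False
```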

    Massively Parallel Single-Source SimRanks in $o(\log n)$ Rounds

    SimRank is one of the most fundamental measures that evaluate the structural similarity between two nodes in a graph and has been applied in a plethora of data management tasks. These tasks often involve single-source SimRank computation, which evaluates the SimRank values between a source node $s$ and all other nodes. Due to its high computational complexity, single-source SimRank computation for large graphs is notoriously challenging, and hence recent studies resort to distributed processing. To our surprise, although SimRank has been widely adopted for two decades, theoretical aspects of distributed SimRank with provable results have rarely been studied. In this paper, we conduct a theoretical study of single-source SimRank computation in the Massively Parallel Computation (MPC) model, which is the standard theoretical framework modeling distributed systems such as MapReduce, Hadoop, or Spark. Existing distributed SimRank algorithms incur either $\Omega(\log n)$ communication round complexity or $\Omega(n)$ machine space for a graph of $n$ nodes. We overcome this barrier. In particular, given a graph of $n$ nodes, for any query node $v$ and constant error $\epsilon > \frac{3}{n}$, we show that using $O(\log^2 \log n)$ rounds of communication among machines is almost enough to compute single-source SimRank values with at most $\epsilon$ absolute error, while each machine only needs space sublinear in $n$. To the best of our knowledge, this is the first single-source SimRank algorithm in MPC that can overcome the $\Theta(\log n)$ round-complexity barrier with provable result accuracy.
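    For intuition about the quantity being computed, here is a minimal Monte Carlo sketch based on the standard random-walk view of SimRank, $s(u,v) = \mathbb{E}[c^{\tau}]$, where $\tau$ is the first time two simultaneous reverse random walks from $u$ and $v$ meet. The decay factor, walk count, and truncation length are illustrative parameters; this sequential estimator is not the paper's MPC algorithm.

```python
# Minimal Monte Carlo sketch of single-source SimRank; illustrative only.
import random

def simrank_single_source(in_nbrs, s, c=0.6, walks=2000, max_len=10):
    """Estimate SimRank(s, v) for every node v.

    in_nbrs: dict mapping every node to a list of its in-neighbors
    (every node must appear as a key, possibly with an empty list).
    """
    est = {}
    for v in in_nbrs:
        if v == s:
            est[v] = 1.0
            continue
        total = 0.0
        for _ in range(walks):
            a, b = s, v
            for step in range(1, max_len + 1):
                if not in_nbrs[a] or not in_nbrs[b]:
                    break                      # a walk got stuck: the two never meet
                a = random.choice(in_nbrs[a])
                b = random.choice(in_nbrs[b])
                if a == b:
                    total += c ** step         # met after `step` simultaneous steps
                    break
        est[v] = total / walks
    return est

in_nbrs = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
print(simrank_single_source(in_nbrs, "a"))     # SimRank("a","b") is exactly c = 0.6 here
```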