Search CORE

1,815 research outputs found

Shared Memory Parallel Subgraph Enumeration

Author: Kimmig Raphael
Meyerhenke Henning
Strash Darren
Publication venue
Publication date: 25/05/2017
Field of study

The subgraph enumeration problem asks us to find all subgraphs of a target graph that are isomorphic to a given pattern graph. Determining whether even one such isomorphic subgraph exists is NP-complete---and therefore finding all such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration has applications in many fields, including biochemistry and social networks, and interestingly the fastest algorithms for solving the problem for biochemical inputs are sequential. Since they depend on depth-first tree traversal, an efficient parallelization is far from trivial. Nevertheless, since important applications produce data sets with increasing difficulty, parallelism seems beneficial. We thus present here a shared-memory parallelization of the state-of-the-art subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs) by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing and our implementation demonstrates a significant speedup on real-world biochemical data---despite a highly irregular data access pattern. We also improve RI-DS by pruning the search space better; this further improves the empirical running times compared to the already highly tuned RI-DS.Comment: 18 pages, 12 figures, To appear at the 7th IEEE Workshop on Parallel / Distributed Computing and Optimization (PDCO 2017

arXiv.org e-Print Archive

Crossref

Shared-Memory Parallel Maximal Clique Enumeration

Author: Das Apurba
Sanei-Mehri Seyed-Vahid
Tirthapura Srikanta
Tirthapura Srikanta
Publication venue
Publication date: 01/01/2018
Field of study

We present shared-memory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and well-studied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its computationally intensive nature, parallel methods are imperative for dealing with large graphs. However, surprisingly, there do not yet exist scalable and parallel methods for MCE on a shared-memory parallel machine. In this work, we present efficient shared-memory parallel algorithms for MCE, with the following properties: (1) the parallel algorithms are provably work-efficient relative to a state-of-the-art sequential algorithm (2) the algorithms have a provably small parallel depth, showing that they can scale to a large number of processors, and (3) our implementations on a multicore machine shows a good speedup and scaling behavior with increasing number of cores, and are substantially faster than prior shared-memory parallel algorithms for MCE.Comment: 10 pages, 3 figures, proceedings of the 25th IEEE International Conference on. High Performance Computing, Data, and Analytics (HiPC), 201

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Crossref

Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

Author: Ali Patwary
Assefaw H. Gebremedhin
David F. Gleich
Md. Mostofa
Ryan A. Rossi
Publication venue
Publication date: 25/12/2013
Field of study

We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch and bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to upper and lower bounds on the size of maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: to compute temporal strong components and to compress graphs.Comment: 11 page

arXiv.org e-Print Archive

CiteSeerX

GPU-accelerated subgraph enumeration on partitioned graphs

Author: GUO Wentian
HE Bingsheng
LI Yuchen
SHA Mo
TAN Kian-Lee
XIAO Xiaokui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2020
Field of study

Ministry of Education, Singapore under its Academic Research Funding Tier

Crossref

Institutional Knowledge at Singapore Management University

Efficient Strategies for Graph Pattern Mining Algorithms on GPUs

Author: Dias Vinicius
Ferraz Samuel
Meira Jr Wagner
Teixeira Carlos H. C.
Teodoro George
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/12/2022
Field of study

Graph Pattern Mining (GPM) is an important, rapidly evolving, and computation demanding area. GPM computation relies on subgraph enumeration, which consists in extracting subgraphs that match a given property from an input graph. Graphics Processing Units (GPUs) have been an effective platform to accelerate applications in many areas. However, the irregularity of subgraph enumeration makes it challenging for efficient execution on GPU due to typical uncoalesced memory access, divergence, and load imbalance. Unfortunately, these aspects have not been fully addressed in previous work. Thus, this work proposes novel strategies to design and implement subgraph enumeration efficiently on GPU. We support a depth-first search style search (DFS-wide) that maximizes memory performance while providing enough parallelism to be exploited by the GPU, along with a warp-centric design that minimizes execution divergence and improves utilization of the computing capabilities. We also propose a low-cost load balancing layer to avoid idleness and redistribute work among thread warps in a GPU. Our strategies have been deployed in a system named DuMato, which provides a simple programming interface to allow efficient implementation of GPM algorithms. Our evaluation has shown that DuMato is often an order of magnitude faster than state-of-the-art GPM systems and can mine larger subgraphs (up to 12 vertices).Comment: Accepted for publication on IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'22

arXiv.org e-Print Archive

On the Distributed Complexity of Large-Scale Graph Computations

Author: Pandurangan Gopal
Robinson Peter
Scquizzato Michele
Publication venue
Publication date: 01/01/2018
Field of study

Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where

k \geq 2

machines jointly perform computations on graphs with

n

nodes (typically,

n \gg k

). The input graph is assumed to be initially randomly partitioned among the

k

machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication {\em rounds} of the computation. Our main contribution is the {\em General Lower Bound Theorem}, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. The General Lower Bound Theorem is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic and this theorem can be used in a "cookbook" fashion to show distributed lower bounds in the context of several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds for the round complexity of two fundamental graph problems, namely {\em PageRank computation} and {\em triangle enumeration}. Our approach, as demonstrated in the case of PageRank, can yield tight lower bounds for problems (including, and especially, under a stochastic partition of the input) where communication complexity techniques are not obvious. Our approach, as demonstrated in the case of triangle enumeration, can yield stronger round lower bounds as well as message-round tradeoffs compared to approaches that use communication complexity techniques

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova