Search CORE

A scalable parallel union-find algorithm for distributed memory computers

Author: Md. Mostofa Ali Patwary
Fredrik Manne
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2010

The Union-Find algorithm is used to maintain a number of non-overlapping sets drawn from a finite universe of elements. It has applications in a number of areas, including computing spanning trees and image processing. Although the algorithm is inherently sequential, there have been some previous efforts at constructing parallel implementations. These have mainly focused on shared-memory computers. In this paper we present the first scalable parallel implementation of the Union-Find algorithm suitable for distributed-memory computers. Our new parallel algorithm is based on an observation of how the Find part of the sequential algorithm can be executed more efficiently. We show the efficiency of our implementation through a series of tests that compute spanning forests of very large graphs.
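
As a point of reference for the sequential structure the abstract starts from, here is a minimal Python sketch of Union-Find used to build a spanning forest. It uses the textbook union-by-rank and path-compression heuristics. It is not the paper's distributed-memory algorithm, and the names `UnionFind` and `spanning_forest` are hypothetical.

```python
from typing import Iterable, List, Tuple


class UnionFind:
    """Disjoint sets over elements 0..n-1 with union by rank and path compression."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on the height of each root's tree

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:  # walk up to the root
            root = self.parent[root]
        while x != root:                  # path compression: point each visited node at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> bool:
        """Merge the sets containing a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:  # attach the shallower tree under the deeper one
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def spanning_forest(n: int, edges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep each edge that joins two different components (Kruskal-style, unweighted)."""
    uf = UnionFind(n)
    return [(u, v) for u, v in edges if uf.union(u, v)]


print(spanning_forest(5, [(0, 1), (1, 2), (0, 2), (3, 4)]))  # [(0, 1), (1, 2), (3, 4)]
```

The edge (0, 2) is dropped because Find reports that 0 and 2 already share a root when that edge is examined. Each Union therefore costs two Find operations, which is why speeding up Find matters.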

CiteSeerX

Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

Author: Assefaw H. Gebremedhin
David F. Gleich
Md. Mostofa Ali Patwary
Ryan A. Rossi
Publication date: 25/12/2013

We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits roughly linear runtime scaling over real-world networks ranging from 1,000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch-and-bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to the upper and lower bounds on the size of the maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: computing temporal strong components and compressing graphs.
Comment: 11 pages
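
The core-number pruning in the abstract rests on a simple bound. A vertex in a clique of size k has k - 1 neighbors inside that clique, so its core number is at least k - 1. A vertex whose core number is below the size of the best clique found so far therefore cannot belong to a larger clique. The following minimal sequential Python sketch combines that bound with a greedy heuristic and branch-and-bound. It is not the authors' parallel implementation. The simple peeling routine, the greedy rule (extend by the highest core number) and the vertex ordering are assumptions chosen for clarity.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets, no self-loops


def core_numbers(g: Graph) -> Dict[int, int]:
    """Core number of each vertex, by repeatedly peeling a minimum-degree vertex (O(V^2))."""
    degree = {v: len(nbrs) for v, nbrs in g.items()}
    remaining = set(g)
    core: Dict[int, int] = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # core numbers never decrease along the peeling order
        core[v] = k
        remaining.remove(v)
        for u in g[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def greedy_clique(g: Graph, core: Dict[int, int]) -> List[int]:
    """Heuristic lower bound: grow a clique from each vertex, preferring high core numbers."""
    best: List[int] = []
    for v in g:
        clique = [v]
        cand = set(g[v])  # vertices adjacent to every member of `clique`
        while cand:
            u = max(cand, key=lambda w: core[w])
            clique.append(u)
            cand &= g[u]
        if len(clique) > len(best):
            best = clique
    return best


def max_clique(g: Graph) -> List[int]:
    core = core_numbers(g)
    best = greedy_clique(g, core)
    order = sorted(g, key=lambda v: core[v])
    rank = {v: i for i, v in enumerate(order)}

    def expand(clique: List[int], cand: Set[int]) -> None:
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for u in list(cand):
            if len(clique) + len(cand) <= len(best):
                return  # bound: adding every candidate still cannot beat `best`
            cand.remove(u)
            # Keep candidates adjacent to u whose core number still permits a larger clique.
            expand(clique + [u], {w for w in cand if w in g[u] and core[w] >= len(best)})

    for v in order:
        if core[v] < len(best):
            continue  # v cannot belong to a clique larger than `best`
        # Only higher-ranked neighbors, so each clique is explored from its lowest-ranked vertex.
        expand([v], {u for u in g[v] if rank[u] > rank[v] and core[u] >= len(best)})
    return best
```

The abstract describes a parallel version in which processes share the current bound, which corresponds to `len(best)` here. When one process enlarges it, the core-number filter and the size bound prune more work in every other process. That mechanism is the source of the occasional super-linear speedup.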

arXiv.org e-Print Archive

CiteSeerX

Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies

Author: Brian Austin
Deborah Bard
Jack Deslippe
Pradeep Dubey
Daniel J. Eisenstein
Brian Friesen
Md. Mostofa Ali Patwary
Prabhat
Nadathur Satish
Zachary Slepian
Narayanan Sundaram
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2017

The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to billions of galaxies. We present Galactos, a high-performance implementation of a novel O(N^2) algorithm that uses a load-balanced k-d tree and spherical harmonic expansions to compute the anisotropic 3PCF. Our implementation is optimized for the Intel Xeon Phi architecture, exploiting SIMD parallelism, instruction and thread concurrency, and significant L1 and L2 cache reuse, reaching 39% of peak performance on a single node. Galactos scales to the full Cori system, achieving 9.8 PF (peak) and 5.06 PF (sustained) across 9636 nodes, making the 3PCF easily computable for all galaxies in the observable universe.
Comment: 11 pages, 7 figures, accepted to SuperComputing 201
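
To show why spherical harmonic expansions help, the identity below works through the simpler isotropic case as an illustrative assumption. The anisotropic 3PCF that Galactos computes also depends on orientation relative to the line of sight, and this sketch omits that dependence. It also omits normalization, galaxy weights and self-pair handling. For each primary galaxy i, let B_i(r) (hypothetical notation) denote the neighbors found with the k-d tree whose separation from i falls in radial bin r.

```latex
% Legendre multipole of an isotropic 3PCF estimator for radial bins r_1, r_2 (naive triple sum)
\hat{\zeta}_\ell(r_1, r_2) \propto \sum_{i} \sum_{j \in \mathcal{B}_i(r_1)} \sum_{k \in \mathcal{B}_i(r_2)} P_\ell\!\left(\hat{\mathbf{r}}_{ij} \cdot \hat{\mathbf{r}}_{ik}\right)

% Addition theorem for spherical harmonics
P_\ell(\hat{\mathbf{x}} \cdot \hat{\mathbf{y}}) = \frac{4\pi}{2\ell + 1} \sum_{m=-\ell}^{\ell} Y_{\ell m}(\hat{\mathbf{x}})\, Y^{*}_{\ell m}(\hat{\mathbf{y}})

% Substituting factorizes the pair sum for each primary i
\hat{\zeta}_\ell(r_1, r_2) \propto \frac{4\pi}{2\ell + 1} \sum_{i} \sum_{m=-\ell}^{\ell} a^{(i)}_{\ell m}(r_1)\, a^{(i)*}_{\ell m}(r_2),
\qquad a^{(i)}_{\ell m}(r) = \sum_{j \in \mathcal{B}_i(r)} Y_{\ell m}(\hat{\mathbf{r}}_{ij})
```

Building the coefficients a for one primary costs time linear in its neighbor count (times the number of (ℓ, m) terms), rather than quadratic in it. The total work is therefore bounded by the number of primary-neighbor pairs, which is consistent with the O(N^2) complexity stated in the abstract.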

arXiv.org e-Print Archive

Crossref

eScholarship - University of California