A scalable parallel union-find algorithm for distributed memory computers
Abstract: The Union-Find algorithm is used for maintaining a number of nonoverlapping sets from a finite universe of elements. The algorithm has applications in a number of areas, including the computation of spanning trees and image processing. Although the algorithm is inherently sequential, there have been some previous efforts at constructing parallel implementations. These have mainly focused on shared memory computers. In this paper we present the first scalable parallel implementation of the Union-Find algorithm suitable for distributed memory computers. Our new parallel algorithm is based on an observation of how the Find part of the sequential algorithm can be executed more efficiently. We show the efficiency of our implementation through a series of tests to compute spanning forests of very large graphs.
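For orientation, the sequential baseline the abstract refers to can be sketched as below. This is a minimal illustration of textbook Union-Find (union by rank, path halving) used to build a spanning forest. It is not the paper's distributed-memory algorithm, and the function names `find`, `union`, and `spanning_forest` are chosen here for illustration only.

```python
def find(parent, x):
    """Return the representative of x's set, halving the path on the way up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # point x at its grandparent
        x = parent[x]
    return x


def union(parent, rank, a, b):
    """Merge the sets containing a and b; return False if already merged."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return False
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra  # make ra the root of the taller tree
    parent[rb] = ra
    if rank[ra] == rank[rb]:
        rank[ra] += 1
    return True


def spanning_forest(n, edges):
    """Keep exactly the edges that join two previously separate components."""
    parent = list(range(n))
    rank = [0] * n
    return [(u, v) for u, v in edges if union(parent, rank, u, v)]


if __name__ == "__main__":
    # Vertices 0..4; edge (0, 2) closes a cycle and is therefore dropped.
    print(spanning_forest(5, [(0, 1), (1, 2), (0, 2), (3, 4)]))
```

Every edge triggers two Find operations, which is why the cost of Find dominates and why the abstract singles it out as the part to make more efficient.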
Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
We propose a fast, parallel maximum clique algorithm for large sparse graphs
that is designed to exploit characteristics of social and information networks.
The method exhibits a roughly linear runtime scaling over real-world networks
ranging from 1000 to 100 million nodes. In a test on a social network with 1.8
billion edges, the algorithm finds the largest clique in about 20 minutes. Our
method employs a branch and bound strategy with novel and aggressive pruning
techniques. For instance, we use the core number of a vertex in combination
with a good heuristic clique finder to efficiently remove the vast majority of
the search space. In addition, we parallelize the exploration of the search
tree. During the search, processes immediately communicate changes to upper and
lower bounds on the size of the maximum clique, which occasionally results in a
super-linear speedup because vertices with large search spaces can be pruned by
other processes. We apply the algorithm to two problems: to compute temporal
strong components and to compress graphs.
Comment: 11 pages
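The pruning idea in this abstract can be illustrated with a small sequential sketch. It rests on the fact that every vertex of a clique of size k has core number at least k - 1, so any vertex whose core number plus one does not exceed the best clique found so far can be discarded. The code is a simplified, single-process illustration under that assumption, not the authors' parallel implementation; the helper names and the particular greedy heuristic are hypothetical.

```python
def core_numbers(adj):
    """Core number of each vertex by repeatedly peeling a minimum-degree vertex.
    Quadratic time; written for clarity rather than speed."""
    deg = {v: len(adj[v]) for v in adj}
    core, remaining, k = {}, set(adj), 0
    while remaining:
        v = min(remaining, key=lambda u: deg[u])
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    return core


def greedy_clique(adj, core):
    """Heuristic lower bound: grow a clique from each vertex, preferring
    candidates with a high core number."""
    best = []
    for v in sorted(adj, key=core.get, reverse=True):
        clique, cand = [v], set(adj[v])
        while cand:
            u = max(cand, key=core.get)
            clique.append(u)
            cand &= adj[u]  # keep only vertices adjacent to all members
        if len(clique) > len(best):
            best = clique
    return best


def max_clique(adj):
    """Branch and bound with core-number pruning (sequential sketch)."""
    core = core_numbers(adj)
    best = greedy_clique(adj, core)

    def expand(clique, cand):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for v in list(cand):
            if len(clique) + len(cand) <= len(best):
                return  # bound: this branch cannot beat the incumbent
            cand.remove(v)
            expand(clique + [v], cand & adj[v])

    for v in adj:
        if core[v] + 1 > len(best):  # otherwise v cannot improve on best
            expand([v], {u for u in adj[v] if core[u] + 1 > len(best)})
    return best


if __name__ == "__main__":
    # A 4-clique on {0, 1, 2, 3} with a tail 3-4-5; adjacency as sets.
    adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
           3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
    print(sorted(max_clique(adj)))
```

In this example the heuristic already finds a clique of size 4, and since no vertex has a core number above 3, the bound rules out every vertex before any branching happens. The abstract describes the same effect at scale: a good initial clique combined with core numbers removes most of the search space.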
Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies
The nature of dark energy and the complete theory of gravity are two central
questions currently facing cosmology. A vital tool for addressing them is the
3-point correlation function (3PCF), which probes deviations from a spatially
random distribution of galaxies. However, the 3PCF's formidable computational
expense has prevented its application to astronomical surveys comprising
millions to billions of galaxies. We present Galactos, a high-performance
implementation of a novel, O(N^2) algorithm that uses a load-balanced k-d tree
and spherical harmonic expansions to compute the anisotropic 3PCF. Our
implementation is optimized for the Intel Xeon Phi architecture, exploiting
SIMD parallelism, instruction and thread concurrency, and significant L1 and L2
cache reuse, reaching 39% of peak performance on a single node. Galactos scales
to the full Cori system, achieving 9.8PF (peak) and 5.06PF (sustained) across
9636 nodes, making the 3PCF easily computable for all galaxies in the
observable universe.
Comment: 11 pages, 7 figures, accepted to SuperComputing 201