74,563 research outputs found
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
A randomised primal-dual algorithm for distributed radio-interferometric imaging
Next generation radio telescopes, like the Square Kilometre Array, will
acquire an unprecedented amount of data for radio astronomy. The development of
fast, parallelisable or distributed algorithms for handling such large-scale
data sets is of prime importance. Motivated by this, we investigate herein a
convex optimisation algorithmic structure, based on primal-dual
forward-backward iterations, for solving the radio interferometric imaging
problem. It can encompass any convex prior of interest. It allows for the
distributed processing of the measured data and introduces further flexibility
by employing a probabilistic approach for the selection of the data blocks used
at a given iteration. We study the reconstruction performance with respect to
the data distribution and we propose the use of nonuniform probabilities for
the randomised updates. Our simulations show the feasibility of the
randomisation given a limited computing infrastructure as well as important
computational advantages when compared to state-of-the-art algorithmic
structures.Comment: 5 pages, 3 figures, Proceedings of the European Signal Processing
Conference (EUSIPCO) 2016, Related journal publication available at
https://arxiv.org/abs/1601.0402
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids
- …