Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large-scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high-end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high-performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
Scalable Kernelization for Maximum Independent Sets
The most efficient algorithms for finding maximum independent sets in both
theory and practice use reduction rules to obtain a much smaller problem
instance called a kernel. The kernel can then be solved quickly using exact or
heuristic algorithms, or by repeatedly kernelizing recursively in the
branch-and-reduce paradigm. It is of critical importance for these algorithms
that kernelization is fast and returns a small kernel. Current algorithms are
either slow but produce a small kernel, or fast and give a large kernel. We
attempt to accomplish both of these goals simultaneously, by giving an
efficient parallel kernelization algorithm based on graph partitioning and
parallel bipartite maximum matching. We combine our parallelization techniques
with two techniques to accelerate kernelization further: dependency checking
that prunes reductions that cannot be applied, and reduction tracking that
allows us to stop kernelization when reductions become less fruitful. Our
algorithm produces kernels that are orders of magnitude smaller than the
fastest kernelization methods, while having a similar execution time.
Furthermore, our algorithm is able to compute kernels with size comparable to
the smallest known kernels, but up to two orders of magnitude faster than
previously possible. Finally, we show that our kernelization algorithm can be
used to accelerate existing state-of-the-art heuristic algorithms, allowing us
to find larger independent sets faster on large real-world networks and
synthetic instances.
Comment: Extended versio
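To illustrate what a reduction rule is in this setting, here is a minimal sequential sketch. The abstract does not say which rules the paper applies, so this example assumes the two simplest standard ones: an isolated vertex always belongs to some maximum independent set, and so does a degree-one vertex (after which its single neighbour can be deleted). The function name `reduce_low_degree` and the adjacency-dict representation are hypothetical, and this is not the paper's parallel algorithm.

```python
def reduce_low_degree(adj):
    """Exhaustively apply degree-0 and degree-1 reductions (illustrative only).

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops).
    Returns (kernel, chosen): the remaining graph and the vertices that
    were fixed into the independent set by the reductions.
    """
    graph = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    chosen = set()
    # Candidates: every vertex whose degree is currently 0 or 1.
    worklist = [v for v in graph if len(graph[v]) <= 1]

    def remove(v):
        # Delete v and queue neighbours whose degree drops to 0 or 1.
        for w in graph.pop(v):
            graph[w].discard(v)
            if len(graph[w]) <= 1:
                worklist.append(w)

    while worklist:
        v = worklist.pop()
        if v not in graph or len(graph[v]) > 1:
            continue  # stale entry: v was already removed
        chosen.add(v)  # safe: some maximum independent set contains v
        neighbours = list(graph[v])
        remove(v)
        for u in neighbours:
            if u in graph:
                remove(u)  # the neighbour can no longer be selected
    return graph, chosen


# A path a-b-c-d attached to nothing, plus a triangle x-y-z.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
kernel, chosen = reduce_low_degree(adj)
```

In this example the path is eliminated entirely by the two rules, while the triangle survives as the kernel because none of its vertices has degree below two; an exact or heuristic solver would then only need to handle that remainder.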
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.