1 research outputs found
Algorithm-Level Optimizations for Scalable Parallel Graph Processing
Efficiently processing large graphs is challenging, since parallel graph algorithms suffer from
poor scalability and performance due to many factors, including heavy communication and load-imbalance.
Furthermore, it is difficult to express graph algorithms, as users need to understand
and effectively utilize the underlying execution of the algorithm on the distributed system. The
performance of graph algorithms depends not only on the characteristics of the system (such as
latency, available RAM, etc.), but also on the characteristics of the input graph (small-world scalefree,
mesh, long-diameter, etc.), and characteristics of the algorithm (sparse computation vs. dense
communication). The best execution strategy, therefore, often heavily depends on the combination
of input graph, system and algorithm.
Fine-grained expression exposes maximum parallelism in the algorithm and allows the user to
concentrate on a single vertex, making it easier to express parallel graph algorithms. However,
this often loses information about the machine, making it difficult to extract performance and
scalability from fine-grained algorithms.
To address these issues, we present a model for expressing parallel graph algorithms using a
fine-grained expression. Our model decouples the algorithm-writer from the underlying details
of the system, graph, and execution and tuning of the algorithm. We also present various graph
paradigms that optimize the execution of graph algorithms for various types of input graphs and
systems. We show our model is general enough to allow graph algorithms to use the various graph
paradigms for the best/fastest execution, and demonstrate good performance and scalability for
various different graphs, algorithms, and systems to 100,000+ cores