
Chromatic scheduling of dynamic data-graph computations

Abstract

Thesis: M.Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 67-73).

Data-graph computations are a parallel-programming model popularized by programming systems such as Pregel, GraphLab, PowerGraph, and GraphChi. A fundamental issue in parallelizing data-graph computations is the avoidance of races between computations occurring on overlapping regions of the graph. Common solutions, such as locking protocols and bulk-synchronous execution, often sacrifice performance, update atomicity, or determinism. A known alternative is chromatic scheduling, which uses a vertex coloring of the conflict graph to divide data-graph updates into sets that may be parallelized without races. To date, however, only static data-graph computations, which do not schedule updates at runtime, have employed chromatic scheduling. I introduce PRISM, a work-efficient scheduling algorithm for dynamic data-graph computations that uses chromatic scheduling. For a collection of four application benchmarks on a modern multicore machine, chromatic scheduling approximately doubles the performance of the lock-based GraphLab implementation and triples the performance of GraphChi's update-execution phase when enforcing determinism.

Chromatic scheduling motivates the development of efficient deterministic parallel coloring algorithms. A new analysis of the Jones-Plassmann message-passing algorithm shows that only O(Δ + ln Δ ln V / ln ln V) rounds are needed to color a graph G = (V, E) with maximum vertex degree Δ, generalizing previous results for bounded-degree graphs. A new log-degree ordering heuristic is described which can reduce the number of colors used in practice, while increasing the number of rounds by only a logarithmic factor. An efficient implementation for the shared-memory setting is described and analyzed using the CRQW contention model, showing that this algorithm performs Θ(V + E) work and has expected span O(Δ ln Δ + ln² Δ ln V / ln ln V). Benchmarks on a set of real-world graphs show that, in practice, these parallel algorithms achieve modest speedup over optimized serial code (around 4x on a 12-core machine).

by Tim Kaler. M.Eng.
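
As a concrete illustration of the chromatic-scheduling idea, below is a minimal serial C++ sketch of a dynamic data-graph computation scheduled by color. The Graph struct, the averaging update rule, and all names are invented for illustration only; this is not the PRISM implementation. Active vertices are processed one color class at a time, and because same-colored vertices are never adjacent, each class's updates could in principle run in parallel without locks; updates that change a vertex's data activate its neighbors for a later round.

// Minimal, serial sketch of chromatic scheduling for a dynamic data-graph
// computation (toy names and update rule; not the PRISM scheduler).
// Invariant: vertices with the same color are pairwise non-adjacent, so the
// loop over a color's bucket could run in parallel without races.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Graph {
  std::vector<std::vector<int>> adj;   // adjacency lists
  std::vector<int> color;              // a valid vertex coloring
  std::vector<double> data;            // per-vertex data being updated
};

// Toy update: replace a vertex's value with the average over its closed
// neighborhood. Returns true if the change is large enough that the
// neighbors should be re-activated.
bool update(Graph& g, int v) {
  double sum = g.data[v];
  for (int u : g.adj[v]) sum += g.data[u];
  double next = sum / (1.0 + g.adj[v].size());
  bool changed = std::fabs(next - g.data[v]) > 1e-9;
  g.data[v] = next;
  return changed;
}

void chromatic_schedule(Graph& g, std::vector<int> active) {
  int num_colors = 1 + *std::max_element(g.color.begin(), g.color.end());
  std::vector<bool> pending(g.adj.size(), false);
  for (int v : active) pending[v] = true;

  while (!active.empty()) {
    // Partition the currently active vertices by color.
    std::vector<std::vector<int>> buckets(num_colors);
    for (int v : active) { buckets[g.color[v]].push_back(v); pending[v] = false; }
    active.clear();

    // Execute one color class at a time. Each update writes only its own
    // vertex and reads only neighbors, which no same-colored update writes,
    // so a parallel loop over `bucket` would be race-free.
    for (const auto& bucket : buckets)
      for (int v : bucket)
        if (update(g, v))
          for (int u : g.adj[v])                    // dynamic scheduling
            if (!pending[u]) { pending[u] = true; active.push_back(u); }
  }
}

int main() {
  // A 4-vertex path, 2-colored 0,1,0,1; all vertices start active.
  Graph g;
  g.adj = {{1}, {0, 2}, {1, 3}, {2}};
  g.color = {0, 1, 0, 1};
  g.data = {1.0, 0.0, 0.0, 0.0};
  chromatic_schedule(g, {0, 1, 2, 3});
  for (double x : g.data) std::printf("%f\n", x);
  return 0;
}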

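The Jones-Plassmann algorithm analyzed above can likewise be sketched as a serial simulation of its rounds. Random priorities are assumed here; the thesis's log-degree ordering heuristic would assign priorities differently, and a genuinely parallel implementation would color each round's independent set concurrently rather than in a loop. In each round, every uncolored vertex whose priority exceeds those of all its uncolored neighbors takes the smallest color not used by its already-colored neighbors.

// Serial simulation of the Jones-Plassmann rounds (random priorities assumed;
// this is an illustrative sketch, not the thesis's shared-memory implementation).
#include <random>
#include <vector>

std::vector<int> jones_plassmann(const std::vector<std::vector<int>>& adj) {
  int n = (int)adj.size();
  std::vector<int> color(n, -1);       // -1 means "not yet colored"
  std::vector<double> priority(n);
  std::mt19937_64 rng(2013);
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  for (double& p : priority) p = unif(rng);

  int uncolored = n;
  while (uncolored > 0) {
    // A vertex "wins" the round if no uncolored neighbor has higher priority.
    std::vector<int> round;
    for (int v = 0; v < n; ++v) {
      if (color[v] != -1) continue;
      bool wins = true;
      for (int u : adj[v])
        if (color[u] == -1 && priority[u] > priority[v]) { wins = false; break; }
      if (wins) round.push_back(v);
    }
    // Winners form (almost surely) an independent set; color each with the
    // smallest color absent from its already-colored neighborhood.
    for (int v : round) {
      std::vector<bool> used;
      for (int u : adj[v])
        if (color[u] >= 0) {
          if ((int)used.size() <= color[u]) used.resize(color[u] + 1, false);
          used[color[u]] = true;
        }
      int c = 0;
      while (c < (int)used.size() && used[c]) ++c;
      color[v] = c;
      --uncolored;
    }
  }
  return color;
}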