5,339 research outputs found

    Streaming and Massively Parallel Algorithms for Edge Coloring

    Get PDF
    A valid edge-coloring of a graph is an assignment of "colors" to its edges such that no two incident edges receive the same color. The goal is to find a proper coloring that uses few colors. (Note that the maximum degree, Delta, is a trivial lower bound.) In this paper, we revisit this fundamental problem in two models of computation specific to massive graphs, the Massively Parallel Computations (MPC) model and the Graph Streaming model: - Massively Parallel Computation: We give a randomized MPC algorithm that with high probability returns a Delta+O~(Delta^(3/4)) edge coloring in O(1) rounds using O(n) space per machine and O(m) total space. The space per machine can also be further improved to n^(1-Omega(1)) if Delta = n^Omega(1). Our algorithm improves upon a previous result of Harvey et al. [SPAA 2018]. - Graph Streaming: Since the output of edge-coloring is as large as its input, we consider a standard variant of the streaming model where the output is also reported in a streaming fashion. The main challenge is that the algorithm cannot "remember" all the reported edge colors, yet has to output a proper edge coloring using few colors. We give a one-pass O~(n)-space streaming algorithm that always returns a valid coloring and uses 5.44 Delta colors with high probability if the edges arrive in a random order. For adversarial order streams, we give another one-pass O~(n)-space algorithm that requires O(Delta^2) colors

    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    Full text link
    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature report results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly-available as the Graph-Based Benchmark Suite (GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201

    On the Complexity of Local Distributed Graph Problems

    Full text link
    This paper is centered on the complexity of graph problems in the well-studied LOCAL model of distributed computing, introduced by Linial [FOCS '87]. It is widely known that for many of the classic distributed graph problems (including maximal independent set (MIS) and (Δ+1)(\Delta+1)-vertex coloring), the randomized complexity is at most polylogarithmic in the size nn of the network, while the best deterministic complexity is typically 2O(logn)2^{O(\sqrt{\log n})}. Understanding and narrowing down this exponential gap is considered to be one of the central long-standing open questions in the area of distributed graph algorithms. We investigate the problem by introducing a complexity-theoretic framework that allows us to shed some light on the role of randomness in the LOCAL model. We define the SLOCAL model as a sequential version of the LOCAL model. Our framework allows us to prove completeness results with respect to the class of problems which can be solved efficiently in the SLOCAL model, implying that if any of the complete problems can be solved deterministically in logO(1)n\log^{O(1)} n rounds in the LOCAL model, we can deterministically solve all efficient SLOCAL-problems (including MIS and (Δ+1)(\Delta+1)-coloring) in logO(1)n\log^{O(1)} n rounds in the LOCAL model. We show that a rather rudimentary looking graph coloring problem is complete in the above sense: Color the nodes of a graph with colors red and blue such that each node of sufficiently large polylogarithmic degree has at least one neighbor of each color. The problem admits a trivial zero-round randomized solution. The result can be viewed as showing that the only obstacle to getting efficient determinstic algorithms in the LOCAL model is an efficient algorithm to approximately round fractional values into integer values

    Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms

    Full text link
    There has been significant progress in understanding the parallelism inherent to iterative sequential algorithms: for many classic algorithms, the depth of the dependence structure is now well understood, and scheduling techniques have been developed to exploit this shallow dependence structure for efficient parallel implementations. A related, applied research strand has studied methods by which certain iterative task-based algorithms can be efficiently parallelized via relaxed concurrent priority schedulers. These allow for high concurrency when inserting and removing tasks, at the cost of executing superfluous work due to the relaxed semantics of the scheduler. In this work, we take a step towards unifying these two research directions, by showing that there exists a family of relaxed priority schedulers that can efficiently and deterministically execute classic iterative algorithms such as greedy maximal independent set (MIS) and matching. Our primary result shows that, given a randomized scheduler with an expected relaxation factor of kk in terms of the maximum allowed priority inversions on a task, and any graph on nn vertices, the scheduler is able to execute greedy MIS with only an additive factor of poly(kk) expected additional iterations compared to an exact (but not scalable) scheduler. This counter-intuitive result demonstrates that the overhead of relaxation when computing MIS is not dependent on the input size or structure of the input graph. Experimental results show that this overhead can be clearly offset by the gain in performance due to the highly scalable scheduler. In sum, we present an efficient method to deterministically parallelize iterative sequential algorithms, with provable runtime guarantees in terms of the number of executed tasks to completion.Comment: PODC 2018, pages 377-386 in proceeding

    A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures

    Full text link
    Irregular computations on unstructured data are an important class of problems for parallel programming. Graph coloring is often an important preprocessing step, e.g. as a way to perform dependency analysis for safe parallel execution. The total run time of a coloring algorithm adds to the overall parallel overhead of the application whereas the number of colors used determines the amount of exposed parallelism. A fast and scalable coloring algorithm using as few colors as possible is vital for the overall parallel performance and scalability of many irregular applications that depend upon runtime dependency analysis. Catalyurek et al. have proposed a graph coloring algorithm which relies on speculative, local assignment of colors. In this paper we present an improved version which runs even more optimistically with less thread synchronization and reduced number of conflicts compared to Catalyurek et al.'s algorithm. We show that the new technique scales better on multi-core and many-core systems and performs up to 1.5x faster than its predecessor on graphs with high-degree vertices, while keeping the number of colors at the same near-optimal levels.Comment: To appear in the proceedings of Euro Par 201

    Gunrock: GPU Graph Analytics

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU

    OBDD-Based Representation of Interval Graphs

    Full text link
    A graph G=(V,E)G = (V,E) can be described by the characteristic function of the edge set χE\chi_E which maps a pair of binary encoded nodes to 1 iff the nodes are adjacent. Using \emph{Ordered Binary Decision Diagrams} (OBDDs) to store χE\chi_E can lead to a (hopefully) compact representation. Given the OBDD as an input, symbolic/implicit OBDD-based graph algorithms can solve optimization problems by mainly using functional operations, e.g. quantification or binary synthesis. While the OBDD representation size can not be small in general, it can be provable small for special graph classes and then also lead to fast algorithms. In this paper, we show that the OBDD size of unit interval graphs is O( V /log V )O(\ | V \ | /\log \ | V \ |) and the OBDD size of interval graphs is $O(\ | V \ | \log \ | V \ |)whichbothimproveaknownresultfromNunkesserandWoelfel(2009).Furthermore,wecanshowthatusingourvariableorderandnodelabelingforintervalgraphstheworstcaseOBDDsizeis which both improve a known result from Nunkesser and Woelfel (2009). Furthermore, we can show that using our variable order and node labeling for interval graphs the worst-case OBDD size is \Omega(\ | V \ | \log \ | V \ |).Weusethestructureoftheadjacencymatricestoprovethesebounds.Thismethodmaybeofindependentinterestandcanbeappliedtoothergraphclasses.Wealsodevelopamaximummatchingalgorithmonunitintervalgraphsusing. We use the structure of the adjacency matrices to prove these bounds. This method may be of independent interest and can be applied to other graph classes. We also develop a maximum matching algorithm on unit interval graphs using O(\log \ | V \ |)operationsandacoloringalgorithmforunitandgeneralintervalsgraphsusing operations and a coloring algorithm for unit and general intervals graphs using O(\log^2 \ | V \ |)$ operations and evaluate the algorithms empirically.Comment: 29 pages, accepted for 39th International Workshop on Graph-Theoretic Concepts 201
    corecore