
    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the results for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly-available as the Graph-Based Benchmark Suite (GBBS). Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201

    Parallel Algorithms for Counting Problems on Graphs Using Graphics Processing Units

    The availability of Graphics Processing Units (GPUs) with multicore architecture has enabled parallel computations using extensive multi-threading. Recent advancements in computer hardware have led to the usage of graphics processors for solving general purpose problems. Using GPUs for computation is a highly efficient and low-cost alternative as compared to currently available multicore Central Processing Units (CPUs). Also, in the past decade there has been tremendous growth in the World Wide Web and Online Social Networks. Social networking sites such as Facebook, Twitter and LinkedIn, with millions of users, are a huge source of data. These data sets can be used for research in the fields of anthropology, social psychology, and economics, among others. Our research focuses on converting real-world problems into graph theoretic problems and using GPUs to solve them. The graph problems that we focus on in our research involve counting the number of subgraphs that satisfy a given property. For example, given a graph G=(V,E) and an integer k<=|V|, we provide algorithms to count the number of: a) connected subgraphs of size k; b) cliques of size k; and c) independent sets of size k, and other similar problems. Properties that are affected by the dynamic nature of the graphs, i.e., the addition or removal of edges or nodes (for example, the change in the number of triangles and connected components in the graph), are also studied. Sequential access to global memory and contention at the size-limited shared memory have been the main impediments to fully exploiting potential performance in GPUs. Therefore, we propose novel memory storage and retrieval methods, based on using search techniques on graphs and converting them into trees, that enable parallel graph computations to overcome the above issues. 
We also analyze and utilize primitives such as memory access coalescing and avoiding partition camping that offset the increase in access latency of using a slower but larger global memory. In addition, we introduce graph compression techniques that further reduce memory requirements and overheads. Our experimental results for the GPU implementation show a significant speedup over the CPU counterpart for the problems described above
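
To make the counting problems in this abstract concrete, here is a minimal sequential sketch in Python. It is not the thesis's GPU method: it is a brute-force baseline that enumerates every k-subset of vertices and checks the property directly. The function names and the edge-list input format are assumptions chosen for illustration. Each k-subset is checked independently of the others, which is the kind of structure a data-parallel implementation could exploit.

```python
from itertools import combinations


def _adjacent(edge_set, u, v):
    # Undirected adjacency test: edges are stored as unordered pairs.
    return frozenset((u, v)) in edge_set


def count_k_cliques(n, edges, k):
    """Count k-subsets of vertices 0..n-1 in which every pair is adjacent."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        all(_adjacent(edge_set, u, v) for u, v in combinations(subset, 2))
        for subset in combinations(range(n), k)
    )


def count_k_independent_sets(n, edges, k):
    """Count k-subsets of vertices 0..n-1 in which no pair is adjacent."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        not any(_adjacent(edge_set, u, v) for u, v in combinations(subset, 2))
        for subset in combinations(range(n), k)
    )


# Hypothetical example graph: a triangle 0-1-2 with a pendant vertex 3.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(count_k_cliques(4, example_edges, 3))           # prints 1
print(count_k_independent_sets(4, example_edges, 2))  # prints 2
```

The enumeration visits all C(n, k) subsets, so this sketch is only practical for small graphs and small k.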

    Parallel Breadth-First Search on Distributed Memory Systems

    High performance graph analysis on parallel architectures

    PhD Thesis. Over the last decade, pharmacology has been developing computational methods to enhance drug development and testing. A computational method called network pharmacology uses graph analysis tools to determine protein target sets that can lead to better-targeted drugs for diseases such as cancer. One promising area of network-based pharmacology is the detection of protein groups that can produce better effects if they are targeted together by drugs. However, the efficient prediction of such protein combinations is still a bottleneck in the area of computational biology. The computational burden of the algorithms used by such protein prediction strategies to characterise the importance of such proteins constitutes an additional challenge for the field of network pharmacology. Computationally expensive graph algorithms such as the all pairs shortest path (APSP) computation can affect the overall drug discovery process, as the needed network analysis results cannot be delivered on time. An ideal solution for these highly intensive computations could be the use of supercomputing. However, graph algorithms have data-driven computation dictated by the structure of the graph, and this can lead to low compute capacity utilisation, with execution times dominated by memory latency. Therefore, this thesis seeks optimised solutions for the real-world graph problems of critical node detection and effectiveness characterisation that emerged from the collaboration with a pioneer company in the field of network pharmacology as part of a Knowledge Transfer Partnership (KTP) / Secondment (KTS). In particular, we examine how genetic algorithms could benefit the prediction of protein complexes whose removal could produce a more effective 'druggable' impact. Furthermore, we investigate how the problem of all pairs shortest path (APSP) computation can benefit from emerging parallel hardware architectures such as GPU- and FPGA-based desktop accelerators. 
In particular, we address the problem of critical node detection with the development of a heuristic search method. It is based on a genetic algorithm that computes optimised node combinations whose removal causes greater impact than common impact analysis strategies. Furthermore, we design a general pattern for parallel network analysis on multi-core architectures that considers a graph's embedded properties. It is a divide and conquer approach that decomposes a graph into smaller subgraphs based on its strongly connected components and computes the all pairs shortest paths concurrently on GPU. Furthermore, we use linear algebra to design an APSP approach based on the BFS algorithm. We use algebraic expressions to transform the problem of path computation into multiple independent matrix-vector multiplications that are executed concurrently on FPGA. Finally, we analyse how the optimised solutions of perturbation analysis and parallel graph processing provided in this thesis will impact the drug discovery process. This research was part of a Knowledge Transfer Partnership (KTP) and Knowledge Transfer Secondment (KTS) between e-therapeutics PLC and Newcastle University. It was supported as a collaborative project by e-therapeutics PLC and Technology Strategy boar
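
The idea of expressing BFS-based APSP as independent matrix-vector multiplications can be sketched in plain Python as follows. This is an illustrative sequential sketch, not the thesis's FPGA design. It assumes an unweighted directed graph given as a dense 0/1 adjacency matrix, and the function names are hypothetical. Each BFS step is a Boolean matrix-vector product of the current frontier with the adjacency matrix, masked by the unvisited vertices. The BFS runs from different sources share no state, so they could in principle run concurrently.

```python
def bfs_levels(adj, source):
    """Hop distances from `source`, using one Boolean mat-vec per BFS level.

    adj[u][v] is 1 if there is an edge u -> v, else 0.
    Unreachable vertices keep distance -1.
    """
    n = len(adj)
    dist = [-1] * n
    dist[source] = 0
    frontier = [False] * n
    frontier[source] = True
    level = 0
    while any(frontier):
        level += 1
        # Boolean mat-vec: v is reached if some frontier vertex u has an
        # edge u -> v; the `dist[v] == -1` mask keeps only unvisited vertices.
        next_frontier = [
            dist[v] == -1 and any(frontier[u] and adj[u][v] for u in range(n))
            for v in range(n)
        ]
        for v in range(n):
            if next_frontier[v]:
                dist[v] = level
        frontier = next_frontier
    return dist


def apsp_unweighted(adj):
    """All-pairs hop distances: one independent BFS per source vertex."""
    return [bfs_levels(adj, s) for s in range(len(adj))]


# Hypothetical example: edges 0->1, 0->2, 1->2, 2->3.
example_adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(bfs_levels(example_adj, 0))  # prints [0, 1, 1, 2]
```

The dense matrix keeps the sketch short. A sparse representation would be the natural choice for large graphs.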

    Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

    Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a chip being widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphic Processing Units (GPUs) have made these an accessible source of parallel processing power in general purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits. Observing that in practice multi-cores have a small number of cores, we propose a model for low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms. We show that in this model a large class of divide-and-conquer and dynamic programming algorithms can be parallelized with simple modifications to sequential programs, while achieving optimal parallel speedups. We further explore low-degree-parallelism in computation, providing evidence of fundamental differences in practice and theory between systems with a sublinear and linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case. Efficient strategies to manage shared caches play a crucial role in multi-core performance. We propose a model for paging in multi-core shared caches, which extends classical paging to a setting in which several threads share the cache. 
We show that in this setting traditional cache management policies perform poorly, and that any effective strategy must partition the cache among threads, with a partition that adapts dynamically to the demands of each thread. Inspired by the shared cache setting, we introduce the minimum cache usage problem, an extension to classical sequential paging in which algorithms must account for the amount of cache they use. This cache-aware model seeks algorithms with good performance in terms of faults and the amount of cache used, and has applications in energy efficient caching and in shared cache scenarios. The wide availability of GPUs has added to the parallel power of multi-cores; however, most applications underutilize the available resources. We propose a model for hybrid computation in heterogeneous systems with multi-cores and GPUs, and describe strategies for generic parallelization and efficient scheduling of a large class of divide-and-conquer algorithms. Lastly, we introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model that allows for constant time operations on thousands of bits in parallel. We show that a large class of existing algorithms can be implemented in the Ultra-Wide Word model, achieving speedups comparable to those of multi-threaded computations, while avoiding the more difficult aspects of parallel programming

    Concurrent Data Structures Using Multiword Compare and Swap

    To maximize the performance of concurrent data structures, researchers have turned to highly complex fine-grained techniques. Resulting algorithms are often extremely difficult to understand and prove correct, allowing for highly cited works to contain correctness bugs that go undetected for long periods of time. This complexity is perceived as a necessary sacrifice: simpler, more general techniques cannot attain competitive performance with these fine-grained implementations. To challenge this perception, this work presents three data structures created using multi-word compare-and-swap (KCAS), version numbering, and double-collect searches that showcase the power of using a more coarse-grained approach. First, a novel lock-free binary search tree (BST) is presented that is both fully-internal and balanced, which is able to achieve competitive performance with the state-of-the-art fine-grained concurrent BSTs while being significantly simpler. Next, the first concurrent implementation of an Euler-tour data-structure is outlined, solving fully-dynamic graph connectivity. Finally, a KCAS variant of an (a,b)-tree implementation is presented, which shows significant performance improvements in certain workloads when compared to the original

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    36th International Symposium on Theoretical Aspects of Computer Science: STACS 2019, March 13-16, 2019, Berlin, Germany

    Automatisches Zeichnen von Graphen für modellgetriebene Softwareentwicklung

    As shown previously by Fuhrmann, there are several concepts for increasing the productivity of model-driven engineering (MDE) by improving the practical handling of models. The automatic layout of graph-based models is a key enabler in this context. However, there is a striking contrast between the abundance of research results in the field of graph layout methods and the current state of graphical modeling tools, where only a tiny fraction of these results are ever adopted. This thesis aims to bridge this gap on three separate levels: specialized layout algorithms, configuration management, and software infrastructure. Regarding layout algorithms, we focus on the layer-based approach. We examine its extension to include ports and hyperedges, which are essential features of certain kinds of graphs, e.g. data flow models. The main contribution is the handling of constraints on the positioning of ports, which is done mainly in the crossing minimization and edge routing phases. Hyperedges are represented with normal edges, which simplifies their handling but introduces inaccuracies when counting crossings. A final extension discussed here is a sketch-driven approach for simple integration of user interactivity. An abstract layout is the selection of a layout algorithm together with a mapping of its parameters to specific values. We discuss a new meta model that allows specifying the structure of a graph as well as its abstract layout and its concrete layout, i.e. positioning data computed by the layout algorithm. This forms a basis for efficient management of layout configurations. Furthermore, we investigate an evolutionary algorithm for searching the solution space of abstract layouts, taking readability criteria into account for evaluating solutions. The software infrastructure developed here targets the connection of arbitrary diagram viewers (front-ends) with arbitrary graph layout algorithms (back-ends). 
The main challenge is to find suitable abstractions that allow such generality and at the same time keep the complexity as low as possible. We discuss a possible realization based on the Eclipse platform, which is used by several modeling tools, e.g. the Graphical Modeling Framework. A web-based survey has been conducted among users of the layout infrastructure in order to evaluate to what extent the stated goals have been met. The overall feedback collected from this survey is very positive.