
    Graphs, Matrices, and the GraphBLAS: Seven Good Reasons

    The analysis of graphs has become increasingly important to a wide range of applications. Graph analysis presents a number of unique challenges in the areas of (1) software complexity, (2) data complexity, (3) security, (4) mathematical complexity, (5) theoretical analysis, (6) serial performance, and (7) parallel performance. Implementing graph algorithms using matrix-based approaches provides a number of promising solutions to these challenges. The GraphBLAS standard (istc-bigdata.org/GraphBlas) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. The GraphBLAS mathematically defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the GraphBLAS and describes how it can be used to address many of the challenges associated with the analysis of graphs. Comment: 10 pages; International Conference on Computational Science workshop on the Applications of Matrix Computational Methods in the Analysis of Modern Data
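To make the matrix view concrete, here is a minimal pure-Python sketch of breadth-first search written as repeated sparse vector-matrix products over the Boolean (OR, AND) semiring, with already-visited vertices masked out. This is not the GraphBLAS API: the dict-of-sets matrix format and the names `bool_vxm` and `bfs_levels` are illustrative assumptions.

```python
from typing import Dict, Set

# Sparse Boolean adjacency matrix stored row-wise: A[i] is the set of columns j with A[i][j] = 1.
SparseBoolMatrix = Dict[int, Set[int]]


def bool_vxm(frontier: Set[int], A: SparseBoolMatrix) -> Set[int]:
    """Vector-matrix product over the (OR, AND) semiring: y[j] = OR over i of (x[i] AND A[i][j])."""
    result: Set[int] = set()
    for i in frontier:  # only the nonzeros of the sparse vector contribute
        result |= A.get(i, set())
    return result


def bfs_levels(A: SparseBoolMatrix, source: int) -> Dict[int, int]:
    """Level-synchronous BFS: each level is one masked vector-matrix product."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Complement mask: keep only vertices that have not been visited yet.
        frontier = {v for v in bool_vxm(frontier, A) if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    # Directed graph with edges 0->1, 0->2, 1->3, 2->3, 3->4.
    A = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}
    print(bfs_levels(A, 0))
```

Swapping the semiring (for example, to (min, +)) turns the same loop structure into a shortest-path relaxation, which is the kind of reuse the matrix formulation is meant to enable.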

    One-Sided Communication for High Performance Computing Applications

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2009. Parallel programming presents a number of critical challenges to application developers. Traditionally, message passing, in which one process explicitly sends data and another explicitly receives it, has been used to program parallel applications. With the recent growth of multi-core processors, the level of parallelism required by next-generation machines is a cause for concern in the message passing community. The one-sided programming paradigm, in which only one of the two processes involved in a communication actively participates in the transfer, has seen increased interest as a potential replacement for message passing. One-sided communication does not carry the heavy per-message overhead associated with modern message passing libraries. The paradigm offers lower synchronization costs and advanced data manipulation techniques such as remote atomic arithmetic and synchronization operations. Together, these features make it an appealing interface for applications with random communication patterns, which have traditionally been difficult to implement with message passing. This thesis presents a taxonomy of the one-sided paradigm and of the applications that are well suited to the one-sided interface. Three case studies, based on real-world applications, are used to motivate both taxonomies and to verify the applicability of the MPI one-sided communication interface and the Cray SHMEM interface to real-world problems. While our results show a number of shortcomings in existing implementations, they also suggest that a number of applications could benefit from the one-sided paradigm. Finally, an implementation of the MPI one-sided interface within Open MPI is presented, which provides a number of unique performance features necessary for efficient use of the one-sided programming paradigm.
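As a rough, single-process illustration only, the sketch below mimics the one-sided model with threads standing in for processes: each "rank" exposes a memory window, and any origin can put, get, or atomically fetch-and-add into a target's window without the target executing a matching receive. The `Window` class, its methods, and `run_demo` are hypothetical and are not the MPI or SHMEM API; the synchronization epochs that real one-sided libraries require are reduced here to a per-window lock and a final join.

```python
import threading
from typing import List


class Window:
    """Hypothetical exposed memory region owned by one rank (not the MPI or SHMEM API)."""

    def __init__(self, size: int) -> None:
        self.data: List[int] = [0] * size
        self._lock = threading.Lock()  # stands in for the library's atomicity guarantees

    def put(self, offset: int, value: int) -> None:
        # The origin writes directly into the target's memory; the target does nothing.
        with self._lock:
            self.data[offset] = value

    def get(self, offset: int) -> int:
        with self._lock:
            return self.data[offset]

    def fetch_and_add(self, offset: int, delta: int) -> int:
        # Remote atomic: return the old value and apply the update in one step.
        with self._lock:
            old = self.data[offset]
            self.data[offset] = old + delta
            return old


def run_demo(num_ranks: int = 4, increments: int = 1000) -> int:
    # Every rank exposes a window; rank 0's window holds a shared counter.
    windows = [Window(size=1) for _ in range(num_ranks)]
    target = windows[0]

    def worker() -> None:
        for _ in range(increments):
            target.fetch_and_add(0, 1)

    threads = [threading.Thread(target=worker) for _ in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # joining plays the role of a closing synchronization epoch
    return target.get(0)


if __name__ == "__main__":
    print(run_demo())
```

Every increment lands in rank 0's window while rank 0 posts no receive, which is the property that makes irregular, random-access update patterns a natural fit for the one-sided model.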

    Reconfigurable computing for large-scale graph traversal algorithms

    This thesis proposes a reconfigurable computing approach for supporting parallel processing in large-scale graph traversal algorithms. Our approach is based on a reconfigurable hardware architecture that exploits the capabilities of both FPGAs (Field-Programmable Gate Arrays) and a multi-bank parallel memory subsystem. The proposed methodology for accelerating graph traversal algorithms has been applied to three case studies, revealing that application-specific hardware customisations can improve performance. Our four contributions are summarised as follows. First, a reconfigurable computing approach to accelerate large-scale graph traversal algorithms: we propose a reconfigurable hardware architecture that decouples computation and communication while keeping multiple memory requests in flight at any given time, taking advantage of the high bandwidth of multi-bank memory subsystems. Second, a demonstration of the effectiveness of our approach through two case studies: the breadth-first search algorithm and a graphlet counting algorithm from bioinformatics. Both case studies involve graph traversal, but each adopts a different graph data representation. Third, a method for using on-chip memory resources in FPGAs to reduce off-chip memory accesses when accelerating graph traversal algorithms, demonstrated through a case study of the All-Pairs Shortest-Paths algorithm applied to human brain network data. Fourth, an evaluation of an FPGA design approach based on instruction-set extension against many-core GPUs (Graphics Processing Units), using a set of benchmarks with different memory access characteristics. The results show that while GPUs excel at streaming applications, the proposed approach can outperform GPUs in applications with poor locality, such as graph traversal problems.
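The All-Pairs Shortest-Paths case study can be illustrated in software. The sketch below is a blocked (tiled) Floyd-Warshall, which is our assumption rather than the thesis's FPGA design: it processes the distance matrix in B x B tiles so that each step works on a small set of tiles, the standard way to keep the working set within fast local memory, loosely mirroring the idea of using on-chip memory to cut off-chip accesses. The names `blocked_floyd_warshall` and `_update_tile` and the tile size are illustrative.

```python
import math
from typing import List

INF = math.inf


def _update_tile(d: List[List[float]], ib: int, jb: int, kb: int, B: int, n: int) -> None:
    # Relax every d[i][j] in tile (ib, jb) through each intermediate vertex k of block kb.
    for k in range(kb * B, min((kb + 1) * B, n)):
        row_k = d[k]
        for i in range(ib * B, min((ib + 1) * B, n)):
            dik = d[i][k]
            if dik == INF:
                continue
            row_i = d[i]
            for j in range(jb * B, min((jb + 1) * B, n)):
                cand = dik + row_k[j]
                if cand < row_i[j]:
                    row_i[j] = cand


def blocked_floyd_warshall(weights: List[List[float]], B: int = 2) -> List[List[float]]:
    """APSP on a dense matrix (0 on the diagonal, INF for missing edges, no negative cycles)."""
    n = len(weights)
    d = [row[:] for row in weights]  # work on a copy of the input
    nb = (n + B - 1) // B  # number of tiles per dimension
    for kb in range(nb):
        # Phase 1: the diagonal tile depends only on itself.
        _update_tile(d, kb, kb, kb, B, n)
        # Phase 2: tiles in the same tile-row or tile-column as the diagonal tile.
        for t in range(nb):
            if t != kb:
                _update_tile(d, kb, t, kb, B, n)
                _update_tile(d, t, kb, kb, B, n)
        # Phase 3: all remaining tiles, using the finished row and column tiles.
        for ib in range(nb):
            for jb in range(nb):
                if ib != kb and jb != kb:
                    _update_tile(d, ib, jb, kb, B, n)
    return d


if __name__ == "__main__":
    W = [[0, 3, 7, INF], [INF, 0, 1, INF], [INF, INF, 0, 2], [4, INF, INF, 0]]
    for row in blocked_floyd_warshall(W, B=2):
        print(row)
```

Within each round, phase 3 only reads tiles produced in phases 1 and 2, so a hardware design can stage those few tiles locally and stream the rest.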

    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly available real-world graph (the Hyperlink Web graph, with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work that does report results on the Hyperlink graph uses distributed or external memory. It is therefore natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically efficient parallel graph algorithms can scale to the largest publicly available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that our implementations outperform existing state-of-the-art implementations in running time on the largest real-world graphs. For many of the problems we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly available as the Graph-Based Benchmark Suite (GBBS). Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 201
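Part of what lets a graph of this size fit in one machine's memory is a compact adjacency layout. The following sequential Python sketch, which is not taken from GBBS, builds a compressed sparse row (CSR) representation of an undirected simple graph from an edge list and then runs bucket-based peeling for k-core decomposition, one example of a graph problem with a linear-work, O(n + m), sequential algorithm. The function names are illustrative assumptions, and the parallel techniques the paper relies on are not modeled.

```python
from typing import List, Tuple


def build_csr(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Undirected CSR: the neighbours of v are targets[offsets[v]:offsets[v + 1]]."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (n + 1)
    for v in range(n):
        offsets[v + 1] = offsets[v] + degree[v]  # prefix sum of degrees
    targets = [0] * offsets[n]
    cursor = offsets[:-1]  # slicing makes a copy: next free slot per vertex
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
        targets[cursor[v]] = u
        cursor[v] += 1
    return offsets, targets


def core_numbers(n: int, offsets: List[int], targets: List[int]) -> List[int]:
    """Bucket-based peeling (Batagelj-Zaversnik style): O(n + m) work."""
    deg = [offsets[v + 1] - offsets[v] for v in range(n)]
    buckets = [set() for _ in range(max(deg, default=0) + 1)]
    for v in range(n):
        buckets[deg[v]].add(v)
    core = [0] * n
    removed = [False] * n
    level = 0
    for _ in range(n):
        while not buckets[level]:  # the current level never decreases
            level += 1
        v = buckets[level].pop()
        core[v] = level
        removed[v] = True
        for idx in range(offsets[v], offsets[v + 1]):
            u = targets[idx]
            # Only neighbours above the current level lose a degree, so none drop below it.
            if not removed[u] and deg[u] > level:
                buckets[deg[u]].remove(u)
                deg[u] -= 1
                buckets[deg[u]].add(u)
    return core


if __name__ == "__main__":
    offsets, targets = build_csr(4, [(0, 1), (1, 2), (0, 2), (0, 3)])
    print(core_numbers(4, offsets, targets))
```

The two flat integer arrays store each undirected edge twice and nothing else, which is why CSR-style layouts (often further compressed) are the usual starting point for in-memory processing at this scale.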

    Graph analytics on modern massively parallel systems

    Graphs provide a very flexible abstraction for understanding and modeling complex systems in many fields such as physics, biology, neuroscience, engineering, and social science. Only in the last two decades, with the advent of the Big Data era, have supercomputers equipped with accelerators (i.e., Graphics Processing Units, GPUs), advanced networking, and highly parallel file systems been used to analyze graph properties such as reachability, diameter, connected components, centrality, and clustering coefficient. Today, graphs of interest may be composed of millions, sometimes billions, of nodes and edges and exhibit a highly irregular structure. As a consequence, designing efficient and scalable graph algorithms is an extraordinary challenge due to irregular communication and memory access patterns, high synchronization costs, and lack of data locality. In this dissertation, we start with a brief and gentle introduction to graph analytics and massively parallel systems. In particular, we survey the intersection between graph analytics and parallel architectures in the current state of the art and discuss the challenges encountered when solving such problems on large-scale graphs on these architectures (Chapter 1). Chapter 2 provides preliminary definitions and graph-theoretical notions, together with a description of the synthetic graphs used in the literature to model real-world networks. In Chapters 3-5, we address three relevant problems in graph analysis: reachability (Chapter 3), Betweenness Centrality (Chapter 4), and clustering coefficient (Chapter 5).
In detail, Chapter 3 tackles reachability problems by providing two scalable algorithms and implementations that efficiently solve st-connectivity problems on very large-scale graphs. Chapter 4 considers the problem of identifying the most relevant nodes in a network, which plays a crucial role in several applications, including transportation and communication networks, social network analysis, and biological networks. In particular, we focus on a well-known centrality metric, namely Betweenness Centrality (BC), and present two different distributed algorithms for BC computation on unweighted and weighted graphs. For unweighted graphs, we present a new communication-efficient algorithm based on the combination of bi-dimensional (2D) decomposition and multi-level parallelism. We also describe new algorithms that exploit the underlying graph topology to reduce the time and space usage of betweenness centrality computations. For weighted graphs, we provide a scalable algorithm based on an algebraic formulation of the problem. Finally, through comprehensive experimental results on synthetic and real-world large-scale graphs, we show that the proposed techniques are effective in practice and achieve significant speedups over state-of-the-art solutions. Chapter 5 considers the clustering coefficient problem. Like Betweenness Centrality, the clustering coefficient is a fundamental tool in network analysis, as it measures the degree to which nodes tend to cluster together in a network. In this chapter, we first extend caching techniques to Remote Memory Access (RMA) operations on distributed-memory systems. The caching layer is designed mainly to avoid inter-node communication, so that irregular applications obtain benefits similar to those that communication-avoiding algorithms provide. We also show how cached RMA improves the performance of a new distributed asynchronous algorithm for computing local clustering coefficients.
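For unweighted graphs, betweenness centrality is classically computed with Brandes' algorithm: one BFS per source that counts shortest paths, followed by a reverse sweep that accumulates pair dependencies. The sequential sketch below shows only that textbook baseline; the dissertation's 2D-decomposed, multi-level parallel algorithm is not reproduced, and the function name `betweenness` is illustrative.

```python
from collections import deque
from typing import Dict, List


def betweenness(adj: Dict[int, List[int]]) -> Dict[int, float]:
    """Brandes' algorithm for unweighted graphs, O(n * m) time.

    Counts ordered (s, t) pairs; for an undirected graph given with both
    edge directions, halve the result to count each unordered pair once."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack: List[int] = []  # vertices in non-decreasing distance from s
        preds: Dict[int, List[int]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # forward phase: BFS that counts shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # backward phase: accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(betweenness(path))
```

The per-source BFS is the expensive, communication-heavy part in a distributed setting, which is why BC algorithms for large graphs focus on how that traversal is decomposed across nodes.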
Finally, Chapter 6 briefly summarizes the key contributions of the dissertation and presents potential future directions for the work.
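The quantity targeted in Chapter 5 is, for a vertex v of degree d(v), the fraction of pairs of neighbours that are themselves connected: C(v) = 2 T(v) / (d(v)(d(v) - 1)), where T(v) is the number of triangles through v. The sketch below computes it sequentially and wraps neighbour lookups in a small cache that counts misses as "remote fetches", a loose stand-in for the cached RMA layer; `CachedAdjacency` and `local_clustering` are hypothetical names, and the dissertation's distributed asynchronous algorithm is not modeled.

```python
from itertools import combinations
from typing import Dict, Set


class CachedAdjacency:
    """Hypothetical stand-in for remotely stored adjacency lists; each miss counts as one remote fetch."""

    def __init__(self, remote: Dict[int, Set[int]]) -> None:
        self._remote = remote
        self._cache: Dict[int, Set[int]] = {}
        self.remote_fetches = 0

    def neighbors(self, v: int) -> Set[int]:
        if v not in self._cache:
            self.remote_fetches += 1  # in a distributed code this would be an RMA get
            self._cache[v] = self._remote[v]
        return self._cache[v]


def local_clustering(adj: CachedAdjacency, v: int) -> float:
    nbrs = adj.neighbors(v)
    d = len(nbrs)
    if d < 2:
        return 0.0
    # Each edge between two neighbours of v closes exactly one triangle through v.
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj.neighbors(a))
    return 2.0 * triangles / (d * (d - 1))


if __name__ == "__main__":
    adj = CachedAdjacency({0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}})
    print([local_clustering(adj, v) for v in range(4)], adj.remote_fetches)
```

Because neighbouring vertices repeatedly ask for the same adjacency lists, even this naive cache reduces the fetch count to one per vertex, which hints at why caching pays off for irregular access patterns.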

    Graph-based Analysis of Dynamic Systems

    The analysis of dynamic systems provides insights into their time-dependent characteristics. This enables us to monitor, evaluate, and improve systems from various areas. Such systems are often represented as graphs that model the system's components and their relations. The analysis of the resulting dynamic graphs yields valuable insights into the system's underlying structure and characteristics, as well as the properties of single components. Interpreting these results can help us understand how a system works and how parameters influence its performance. This knowledge supports the design of new systems and the improvement of existing ones. The main issue in this scenario is the performance of analyzing the dynamic graph to obtain relevant properties. While various approaches have been developed to analyze dynamic graphs, it is not always clear which one performs best for the analysis of a specific graph. The runtime also depends on many other factors, including the size and topology of the graph, the frequency of changes, and the data structures used to represent the graph in memory. While the benefits and drawbacks of many data structures are well known, their runtime is hard to predict when they are used to represent dynamic graphs. Hence, tools are required to benchmark and compare different algorithms for computing graph properties and different data structures for representing dynamic graphs in memory. With deeper insight into their performance, new algorithms can be developed and efficient data structures can be selected. In this thesis, we present four contributions to tackle these problems: a benchmarking framework for dynamic graph analysis, novel algorithms for the efficient analysis of dynamic graphs, an approach for the parallelization of dynamic graph analysis, and a novel paradigm to select and adapt graph data structures.
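The algorithmic contribution concerns stream-based analysis: updating a graph property as each change arrives instead of recomputing it on every snapshot. The sketch below illustrates that idea for the simplest 3-vertex motif, the triangle, in an undirected simple graph. It is not StreaM_k, which counts all k-vertex motifs; `StreamingTriangleCounter` and `snapshot_triangles` are illustrative names.

```python
from itertools import combinations
from typing import Dict, Set


class StreamingTriangleCounter:
    """Maintains the triangle count under edge insertions and deletions.

    Each update costs O(min(deg(u), deg(v))) instead of a full recount."""

    def __init__(self) -> None:
        self.adj: Dict[int, Set[int]] = {}
        self.triangles = 0

    def _common(self, u: int, v: int) -> int:
        a, b = self.adj.get(u, set()), self.adj.get(v, set())
        if len(a) > len(b):
            a, b = b, a  # iterate over the smaller neighbourhood
        return sum(1 for w in a if w in b)

    def add_edge(self, u: int, v: int) -> None:
        if u == v or v in self.adj.get(u, set()):
            return  # ignore self-loops and duplicate edges
        self.triangles += self._common(u, v)  # each common neighbour closes a new triangle
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def remove_edge(self, u: int, v: int) -> None:
        if v not in self.adj.get(u, set()):
            return  # edge not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.triangles -= self._common(u, v)  # each common neighbour loses a triangle


def snapshot_triangles(adj: Dict[int, Set[int]]) -> int:
    """Snapshot-based baseline: recount every triangle from scratch."""
    count = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                count += 1
    return count // 3  # each triangle is seen once from each of its vertices


if __name__ == "__main__":
    c = StreamingTriangleCounter()
    for u, v in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]:
        c.add_edge(u, v)
    print(c.triangles, snapshot_triangles(c.adj))
```

The snapshot function is kept only as a correctness check; the speedups reported for stream-based algorithms come from avoiding exactly this kind of full recomputation after every change.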
In addition, we present three use cases from the areas of social, computer, and biological networks to illustrate the insights that graph-based analysis can provide. We present a new benchmarking framework for the analysis of dynamic graphs, the Dynamic Network Analyzer (DNA). It provides tools to benchmark and compare different algorithms for the analysis of dynamic graphs as well as the data structures used to represent them in memory. DNA supports the development of new algorithms and the automatic verification of their results. Its visualization component provides different ways to represent dynamic graphs and the results of their analysis. We introduce three new stream-based algorithms for the analysis of dynamic graphs. We evaluate their performance on synthetic as well as real-world dynamic graphs and compare their runtimes to those of snapshot-based algorithms. Our results show substantial performance gains for all three algorithms. The new stream-based algorithm StreaM_k, which counts the frequencies of k-vertex motifs, achieves speedups of up to 19,043x on synthetic and 2,882x on real-world datasets. We present a novel approach for the distributed processing of dynamic graphs, called parallel Dynamic Graph Analysis (pDNA). To analyze a dynamic graph, a partitioner distributes the work by creating subgraphs and assigning them to workers. The workers compute the properties of their respective subgraphs using standard algorithms, and a collator component merges their results into the properties of the original graph. We evaluate the performance of pDNA for the computation of five graph properties on two real-world dynamic graphs with up to 32 workers. Our approach achieves substantial speedups, especially for the analysis of complex graph measures. We introduce two novel approaches for the selection of efficient graph data structures.
The compile-time approach estimates the workload of an analysis after an initial profiling phase and recommends efficient data structures based on benchmarking results. It achieves speedups of up to 5.4x over baseline data structure configurations for the analysis of real-world dynamic graphs. The run-time approach monitors the workload during analysis and exchanges the graph representation if it finds a configuration that promises to be more efficient for the current workload. Compared to baseline configurations, it achieves speedups of up to 7.3x for the analysis of a synthetic workload. Our contributions provide novel approaches for the efficient analysis of dynamic graphs and tools to further investigate the trade-offs between the different factors that influence performance.
Contents: 1 Introduction; 2 Notation and Terminology; 3 Related Work; 4 DNA - Dynamic Network Analyzer; 5 Algorithms; 6 Parallel Dynamic Network Analysis; 7 Selection of Efficient Graph Data Structures; 8 Use Cases; 9 Conclusion; A DNA - Dynamic Network Analyzer; B Algorithms; C Selection of Efficient Graph Data Structures; D Parallel Dynamic Network Analysis; E Graph-based Intrusion Detection System; F Molecular Dynamic
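The run-time approach can be pictured as a container that watches its own operation mix and swaps representation when another one looks cheaper. The sketch below is hypothetical throughout: `AdaptiveNeighborList`, the monitoring window, and the fixed lookup-ratio threshold are illustrative assumptions, and the cost model actually used in DNA is not reproduced.

```python
from typing import Iterator, List, Set, Union


class AdaptiveNeighborList:
    """Hypothetical adaptive container: a list (cheap append and iteration) or a set
    (cheap membership tests), chosen from the operation mix seen in the last window."""

    def __init__(self, window: int = 64, lookup_ratio: float = 0.5) -> None:
        self._data: Union[List[int], Set[int]] = []
        self._window = window  # operations per monitoring period (assumed)
        self._lookup_ratio = lookup_ratio  # switch to a set above this share of lookups (assumed)
        self._ops = 0
        self._lookups = 0
        self.switches = 0

    def _record(self, is_lookup: bool) -> None:
        self._ops += 1
        self._lookups += is_lookup
        if self._ops >= self._window:
            self._maybe_switch()
            self._ops = self._lookups = 0

    def _maybe_switch(self) -> None:
        lookup_heavy = self._lookups / self._ops > self._lookup_ratio
        if lookup_heavy and isinstance(self._data, list):
            self._data = set(self._data)
            self.switches += 1
        elif not lookup_heavy and isinstance(self._data, set):
            self._data = list(self._data)
            self.switches += 1

    def add(self, v: int) -> None:
        # Assumes v is not already present (simple graph, no repeated edges).
        self._record(False)
        if isinstance(self._data, list):
            self._data.append(v)
        else:
            self._data.add(v)

    def __contains__(self, v: int) -> bool:
        self._record(True)
        return v in self._data

    def __iter__(self) -> Iterator[int]:
        self._record(False)
        return iter(list(self._data))  # snapshot, so a later switch cannot disturb iteration

    @property
    def representation(self) -> str:
        return type(self._data).__name__


if __name__ == "__main__":
    nbrs = AdaptiveNeighborList(window=10)
    for i in range(10):
        nbrs.add(i)
    for i in range(10):
        _ = i in nbrs
    print(nbrs.representation, nbrs.switches)
```

A single threshold is the crudest possible policy; a benchmark-informed approach would instead compare measured costs of each representation for the observed mix before deciding to switch.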