20 research outputs found

    Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach

    Get PDF
    While it is well-known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune different graph operations on a specific GPU. However, the performance impact of the input graph has only been taken into account indirectly as a result of the graphs used to benchmark the system. In this work, we present a case study investigating how to use the properties of the input graph to improve the performance of the breadth-first search (BFS) graph traversal. To do so, we first study the performance variation of 15 different BFS implementations across 248 graphs. Using this performance data, we show that significant speed-up can be achieved by combining the best implementation for each level of the traversal. To make use of this data-dependent optimization, we must correctly predict the relative performance of algorithms per graph level, and enable dynamic switching to the optimal algorithm for each level at runtime. We use the collected performance data to train a binary decision tree, to enable high-accuracy predictions and fast switching. We demonstrate empirically that our decision tree is both fast enough to allow dynamic switching between implementations, without noticeable overhead, and accurate enough in its prediction to enable significant BFS speedup. We conclude that our model-driven approach (1) enables BFS to outperform state of the art GPU algorithms, and (2) can be adapted for other BFS variants, other algorithms, or more specific datasets

    Evaluación y punto de referencia para bases de datos orientadas a grafos

    Get PDF
    Las bases de datos orientadas a grafos (BDG) han adquirido popularidad dentro del análisis de datos masivos ya que proveen un rendimiento superior a la que se obtiene a través de una base de datos relacional en los escenarios en donde la alta conectividad entre datos se convierte en el principal componente para interpretar datos y obtener información de ellos para la toma de decisiones. Este trabajo de obtención de grado (TOG) tiene como objetivo desarrollar una metodología para comparar y evaluar de forma equitativa a distintos motores que se especializan en el manejo de grafos. En primera instancia se analizan los estudios relacionados a este tema que fungirán como soporte para nuestra investigación e identificar los puntos a mejorar para crear un proceso de evaluación y realizar el punto de referencia de bases de datos orientadas a grafos que son populares por el tiempo que llevan desarrollándose y siendo utilizadas en la industria y la academia, y otros que han surgido recientemente para mejorar las limitantes que hay en el mercado. El principal componente del trabajo es crear los pasos requeridos para ejecutar pruebas con distintos tipos de conjuntos de datos y algoritmos a un grupo de BDG. De forma posterior se ejecuta un caso de estudio en el cual se define un ambiente de validación homogéneo para todo sistema, donde se puedan tener las especificaciones de hardware y software para que todas las BDG puedan correr sin restricciones. De la misma manera se definen los aspectos a evaluar, que incluyen, pero no están limitados a las capacidades del lenguaje de consultas que proveen, la integración con otras plataformas o sistemas, y el soporte que cada una provee para la ejecución de algoritmos sobre grafos. La selección de los datos que se cargarán en las distintas plataformas debe considerar que tengan el formato adecuado que cada BDG soporta, y en caso contrario, analizar si los datos pueden ser convertidos para ser utilizados en la base de datos. Finalmente, se toman como caso de prueba las bases de datos basadas en grafos GraphDB, JanusGraph, Neo4j, y TigerGraph. El caso de estudio utiliza la metodología desarrollada a través de este trabajo para evaluar las bases de datos.ITESO, A. C.Consejo Nacional de Ciencia y Tecnologí

    JGraphT -- A Java library for graph data structures and algorithms

    Full text link
    Mathematical software and graph-theoretical algorithmic packages to efficiently model, analyze and query graphs are crucial in an era where large-scale spatial, societal and economic network data are abundantly available. One such package is JGraphT, a programming library which contains very efficient and generic graph data-structures along with a large collection of state-of-the-art algorithms. The library is written in Java with stability, interoperability and performance in mind. A distinctive feature of this library is the ability to model vertices and edges as arbitrary objects, thereby permitting natural representations of many common networks including transportation, social and biological networks. Besides classic graph algorithms such as shortest-paths and spanning-tree algorithms, the library contains numerous advanced algorithms: graph and subgraph isomorphism; matching and flow problems; approximation algorithms for NP-hard problems such as independent set and TSP; and several more exotic algorithms such as Berge graph detection. Due to its versatility and generic design, JGraphT is currently used in large-scale commercial, non-commercial and academic research projects. In this work we describe in detail the design and underlying structure of the library, and discuss its most important features and algorithms. A computational study is conducted to evaluate the performance of JGraphT versus a number of similar libraries. Experiments on a large number of graphs over a variety of popular algorithms show that JGraphT is highly competitive with other established libraries such as NetworkX or the BGL.Comment: Major Revisio

    Evaluation and Analysis of Distributed Graph-Parallel Processing Frameworks

    Get PDF
    A number of graph-parallel processing frameworks have been proposed to address the needs of processing complex and large-scale graph structured datasets in recent years. Although significant performance improvement made by those frameworks were reported, comparative advantages of each of these frameworks over the others have not been fully studied, which impedes the best utilization of those frameworks for a specific graph computing task and setting. In this work, we conducted a comparison study on parallel processing systems for large-scale graph computations in a systematic manner, aiming to reveal the characteristics of those systems in performing common graph algorithms with real-world datasets on the same ground. We selected three popular graph-parallel processing frameworks (Giraph, GPS and GraphLab) for the study and also include a representative general data-parallel computing system— Spark—in the comparison in order to understand how well a general data-parallel system can run graph problems. We applied basic performance metrics measuring speed, resource utilization, and scalability to answer a basic question of which graph-parallel processing platform is better suited for what applications and datasets. Three widely-used graph algorithms— clustering coefficient, shortest path length, and PageRank score—were used for benchmarking on the targeted computing systems.We ran those algorithms against three real world network datasets with diverse characteristics and scales on a research cluster and have obtained a number of interesting observations. For instance, all evaluated systems showed poor scalability (i.e., the runtime increases with more computing nodes) with small datasets likely due to communication overhead. Further, out of the evaluated graphparallel computing platforms, PowerGraph consistently exhibits better performance than others

    A Partition-centric Distributed Algorithm for Identifying Euler Circuits in Large Graphs

    Full text link
    Finding the Eulerian circuit in graphs is a classic problem, but inadequately explored for parallel computation. With such cycles finding use in neuroscience and Internet of Things for large graphs, designing a distributed algorithm for finding the Euler circuit is important. Existing parallel algorithms are impractical for commodity clusters and Clouds. We propose a novel partition-centric algorithm to find the Euler circuit, over large graphs partitioned across distributed machines and executed iteratively using a Bulk Synchronous Parallel (BSP) model. The algorithm finds partial paths and cycles within each partition, and refines these into longer paths by recursively merging the partitions. We describe the algorithm, analyze its complexity, validate it on Apache Spark for large graphs, and offer experimental results. We also identify memory bottlenecks in the algorithm and propose an enhanced design to address it.Comment: To appear in Proceedings of 5th IEEE International Workshop on High-Performance Big Data, Deep Learning, and Cloud Computing, In conjunction with The 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2019), Rio de Janeiro, Brazil, May 20th, 201
    corecore