A Comparison of Parallel Graph Processing Implementations
The rapidly growing number of large network analysis problems has led to the
emergence of many parallel and distributed graph processing systems---one
survey in 2014 identified over 80. Since then, the landscape has evolved; some
packages have become inactive while more are being developed. Determining the
best approach for a given problem is infeasible for most developers. To enable
easy, rigorous, and repeatable comparison of the capabilities of such systems,
we present an approach and associated software for analyzing the performance
and scalability of parallel, open-source graph libraries. We demonstrate our
approach on five graph processing packages: GraphMat, the Graph500, the Graph
Algorithm Platform Benchmark Suite, GraphBIG, and PowerGraph, using synthetic
and real-world datasets. We examine previously overlooked aspects of parallel
graph processing performance such as phases of execution and energy usage for
three algorithms: breadth first search, single source shortest paths, and
PageRank, and compare our results to Graphalytics.

Comment: 10 pages, 10 figures. Submitted to EuroPar 2017 and rejected; revised and submitted to IEEE Cluster 201
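Of the three algorithms studied, PageRank is the one most often computed by simple power iteration. A minimal dense NumPy sketch of that computation (not tied to any of the five packages benchmarked above, and assuming the convention that `adj[i, j] = 1` means an edge from i to j) is:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a dense adjacency matrix.

    adj[i, j] = 1 means an edge i -> j; dangling nodes (no out-edges)
    are treated as linking uniformly to every node.
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Row-normalize, substituting a uniform row for dangling nodes,
    # then transpose to get a column-stochastic transition matrix.
    M = np.where(out_deg[:, None] > 0,
                 adj / np.maximum(out_deg, 1)[:, None],
                 1.0 / n).T
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (M @ r) + (1 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r_next
```

A production system would of course use sparse matrices and distributed vectors; this sketch only shows the iteration that the benchmarked packages implement at scale.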
Distributed Triangle Counting in the Graphulo Matrix Math Library
Triangle counting is a key algorithm for large graph analysis. The Graphulo
library provides a framework for implementing graph algorithms on the Apache
Accumulo distributed database. In this work we adapt two algorithms for
counting triangles, one that uses the adjacency matrix and another that also
uses the incidence matrix, to the Graphulo library for server-side processing
inside Accumulo. Cloud-based experiments show a similar performance profile for
these different approaches on the family of power-law Graph500 graphs, for
which data skew increasingly becomes the bottleneck. These results motivate the
design of skew-aware hybrid algorithms that we propose for future work.

Comment: Honorable mention in the 2017 IEEE HPEC Graph Challenge
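The adjacency-matrix approach that the paper adapts rests on a classical identity: each triangle contributes six closed walks of length three, so the triangle count equals trace(A³)/6. A minimal dense NumPy sketch of that identity (Graphulo's actual implementation runs server-side over Accumulo tables, not on dense arrays) is:

```python
import numpy as np

def count_triangles(adj):
    """Count triangles in an undirected simple graph from its adjacency matrix.

    Each triangle yields six closed walks of length 3, so the
    count is trace(A^3) / 6.
    """
    a = np.asarray(adj, dtype=np.int64)
    assert (a == a.T).all() and (np.diag(a) == 0).all(), "need undirected, loop-free"
    return int(np.trace(a @ a @ a)) // 6
```

On skewed power-law graphs the cost of forming A² is dominated by the highest-degree rows, which is exactly the bottleneck the abstract describes.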
The LDBC Graphalytics Benchmark
In this document, we describe LDBC Graphalytics, an industrial-grade
benchmark for graph analysis platforms. The main goal of Graphalytics is to
enable the fair and objective comparison of graph analysis platforms. Due to
the diversity of bottlenecks and performance issues such platforms need to
address, Graphalytics consists of a set of selected deterministic algorithms
for full-graph analysis, standard graph datasets, synthetic dataset generators,
and reference output for validation purposes. Its test harness produces deep
metrics that quantify multiple kinds of system scalability, weak and strong,
and of robustness, such as resilience to failures and performance variability. The benchmark
also balances comprehensiveness with runtime necessary to obtain the deep
metrics. The benchmark comes with open-source software for generating
performance data, for validating algorithm results, for monitoring and sharing
performance data, and for obtaining the final benchmark result as a standard
performance report.
Performance Introspection of Graph Databases
The explosion of graph data in social and biological networks, recommendation systems, provenance databases, etc. makes graph storage and processing of paramount importance. We present a performance introspection framework for graph databases, PIG, which provides both a toolset and methodology for understanding graph database performance. PIG consists of a hierarchical collection of benchmarks that compose to produce performance models; the models provide a way to illuminate the strengths and weaknesses of a particular implementation. The suite has three layers of benchmarks: primitive operations, composite access patterns, and graph algorithms. While the framework could be used to compare different graph database systems, its primary goal is to help explain the observed performance of a particular system. Such introspection allows one to evaluate the degree to which systems exploit their knowledge of graph access patterns. We present both the PIG methodology and infrastructure and then demonstrate its efficacy by analyzing the popular Neo4j and DEX graph databases.
Computing methods for parallel processing and analysis on complex networks
Solving many present-day problems requires modeling complex systems in order to
simulate and understand their behavior.
A good example of such a complex system is the Facebook social network, which
represents people and their relationships. Another example is the Internet, composed
of a vast number of servers, computers, modems, and routers. Every scientific field (physics,
economics, politics, and so on) has complex systems, which are complex because of the
large volume of data required to represent them and the speed at which their structure changes.
Analyzing the behavior of these complex systems is important for creating simulations or
discovering their dynamics, with the main goal of understanding how they work.
Some complex systems cannot be easily modeled directly; we can begin by analyzing their
structure. This is possible by creating a network model, mapping the problem's entities and
the relations between them.
Some popular analyses of the structure of a network are:
• Community detection – discovering how the entities are grouped
• Identifying the most important entities – measuring a node's influence over the
network
• Computing features of the whole network – such as the diameter, the number of
triangles, the clustering coefficient, and the shortest path between two entities.
Multiple algorithms have been created to perform these analyses over the network
model; however, when executed on a single machine they take a long time to complete,
or cannot be executed at all due to the machine's limited resources.
As ever more demanding applications have appeared to run these kinds of analyses,
several parallel programming models and different kinds of hardware architecture
have been created to deal with large data inputs, reduce execution time, save power,
and improve computational efficiency on each machine, while also taking the
application's requirements into account.
Parallelizing these algorithms is a challenge because:
• We need to analyze data dependences to implement a parallel version of the
algorithm, always keeping in mind the scalability and performance of the code.
• We must implement the algorithm for a parallel programming model such as
MapReduce (Apache Hadoop), RDDs (Apache Spark), or Pregel (Apache Giraph), which
are oriented to big data, or for HPC models such as MPI + OpenMP, OmpSs, or CUDA.
• We must distribute the input data across the processing platform's nodes, or offload it
onto accelerators such as GPUs or FPGAs.
• Storing the input data and the results of the processing requires distributed
file systems (HDFS), distributed NoSQL databases (object databases,
graph databases, document databases), or traditional relational
databases (Oracle, SQL Server).
In this master's thesis, we perform graph processing using Apache big data tools,
mainly running tests on MareNostrum III and the Amazon cloud for several community
detection algorithms, using SNAP graphs with ground-truth communities, and
comparing their parallel execution times and scalability.
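As a concrete illustration of the kind of community detection algorithm evaluated in such work, here is a minimal single-machine sketch of label propagation, chosen only as a representative example; the thesis does not necessarily use this exact algorithm, and the tie-breaking rule here is an assumption made for determinism:

```python
import random
from collections import Counter

def label_propagation(edges, max_iter=20, seed=0):
    """Minimal label-propagation community detection.

    edges: iterable of undirected (u, v) pairs. Each node starts in its
    own community; nodes repeatedly adopt the most frequent label among
    their neighbors (smallest label on ties) until no label changes.
    """
    rng = random.Random(seed)
    neigh = {}
    for u, v in edges:
        neigh.setdefault(u, set()).add(v)
        neigh.setdefault(v, set()).add(u)
    label = {v: v for v in neigh}
    nodes = list(neigh)
    for _ in range(max_iter):
        rng.shuffle(nodes)          # random visit order each sweep
        changed = False
        for v in nodes:
            counts = Counter(label[u] for u in neigh[v])
            best = max(counts.values())
            choices = sorted(l for l, c in counts.items() if c == best)
            new = label[v] if label[v] in choices else choices[0]
            if new != label[v]:
                label[v] = new
                changed = True
        if not changed:
            break
    return label
```

A distributed version of this pattern maps naturally onto Pregel-style vertex programs, which is one reason frameworks like Apache Giraph are a common target for it.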
LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms
In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output, that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from the industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their platforms
Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database
The Apache Accumulo database excels at distributed storage and indexing and
is ideally suited for storing graph data. Many big data analytics compute on
graph data and persist their results back to the database. These graph
calculations are often best performed inside the database server. The GraphBLAS
standard provides a compact and efficient basis for a wide range of graph
applications through a small number of sparse matrix operations. In this
article, we implement GraphBLAS sparse matrix multiplication server-side by
leveraging Accumulo's native, high-performance iterators. We compare the
mathematics and performance of inner and outer product implementations, and
show how an outer product implementation achieves optimal performance near
Accumulo's peak write rate. We offer our work as a core component to the
Graphulo library that will deliver matrix math primitives for graph analytics
within Accumulo.

Comment: To be presented at IEEE HPEC 2015: http://www.ieee-hpec.org
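The outer-product formulation the article favors accumulates C = A·B as the sum over k of the outer product of A's k-th column with B's k-th row. A minimal Python sketch over coordinate-format dictionaries illustrates the formulation (Graphulo itself is implemented in Java inside Accumulo's iterator framework; this sketch only shows the math):

```python
from collections import defaultdict

def outer_product_spgemm(a, b):
    """Sparse C = A*B via the outer-product formulation.

    a, b: dicts mapping (row, col) -> value. For each shared index k,
    emit one partial product per nonzero pair in A[:, k] and B[k, :].
    """
    cols_a = defaultdict(list)            # k -> [(i, A[i, k])]
    for (i, k), v in a.items():
        cols_a[k].append((i, v))
    rows_b = defaultdict(list)            # k -> [(j, B[k, j])]
    for (k, j), v in b.items():
        rows_b[k].append((j, v))
    c = defaultdict(float)
    for k in cols_a.keys() & rows_b.keys():
        for i, av in cols_a[k]:
            for j, bv in rows_b[k]:
                c[(i, j)] += av * bv      # accumulate partial products
    return dict(c)
```

The appeal of this formulation for a write-optimized store like Accumulo is that partial products can be streamed out as they are produced and summed lazily on ingest, which is how the implementation can approach the database's peak write rate.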
Adaptation, deployment and evaluation of a railway simulator in cloud environments
Many scientific areas make extensive use of computer simulations to study real-world
processes. As these simulations become more complex and resource-intensive, traditional
programming paradigms running on supercomputers have proven to be limited by
their hardware resources.
The Cloud and its elastic nature has been increasingly seen as a valid alternative
for simulation execution, as it aims to provide virtually infinite resources, thus
unlimited scalability. In order to benefit from this, simulators must be adapted to
this paradigm since cloud migration tends to add virtualization and communication
overhead.
This work has the main objective of migrating a power-consumption railway
simulator to the Cloud, with minimal impact on the original code and preserving
performance. We propose a data-centric adaptation based on MapReduce that distributes
the simulation load across several nodes while minimising data transmission.
We deployed our solution on an Amazon EC2 virtual cluster and measured its
performance. We did the same on our local cluster to compare the solution's performance
against the original application when the Cloud's overhead is not present.
Our tests show that the resulting application is highly scalable and delivers better
overall performance than the original simulator in both environments.
This document summarises the author's work during the whole adaptation development
process.
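The data-centric MapReduce adaptation described above can be illustrated with a toy sketch, in which the per-segment computation and the record layout are hypothetical stand-ins for the real simulator:

```python
from collections import defaultdict

def simulate_segment(segment):
    """Hypothetical stand-in for the simulator's per-segment computation."""
    train, distance_km, kwh_per_km = segment
    return train, distance_km * kwh_per_km

def mapreduce_consumption(segments):
    """Data-centric MapReduce sketch: map each track segment to a
    (train, energy) pair, then reduce by summing energy per train id.
    In the real deployment the map tasks run on separate cluster nodes.
    """
    totals = defaultdict(float)
    for train, energy in map(simulate_segment, segments):  # map phase
        totals[train] += energy                            # reduce phase
    return dict(totals)
```

Partitioning by track segment keeps each map task's input local to its node, which is the property that minimises the data transmission the abstract mentions.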