Search CORE

87 research outputs found

Everything You Always Wanted to Know about Multicore Graph Processing but Were Afraid to Ask

Author: Lepers Baptiste Joseph Eustache
Malicevic Jasmina
Zwaenepoel Willy
Publication venue: USENIX Association
Publication date: 02/06/2017
Field of study

Graph processing systems are used in a wide variety of fields, ranging from biology to social networks, and a large number of such systems have been described in the recent literature. We perform a systematic comparison of various techniques proposed to speed up in-memory multicore graph processing. In addition, we take an end- to-end view of execution time, including not only algorithm execution time, but also pre-processing time and the time to load the graph input data from storage. More specifically, we study various data structures to represent the graph in memory, various approaches to pre-processing and various ways to structure the graph computation. We also investigate approaches to improve cache locality, synchronization, and NUMA-awareness. In doing so, we take our inspiration from a number of graph processing systems, and implement the techniques they propose in a single system. We then selectively enable different techniques, allowing us to assess their benefits in isolation and independent of unrelated implementation considerations. Our main observation is that the cost of pre-processing in many circumstances dominates the cost of algorithm execution, calling into question the benefits of proposed algorithmic optimizations that rely on extensive pre- processing. Equally surprising, using radix sort turns out to be the most efficient way of pre-processing the graph input data into adjacency lists, when the graph in- put data is already in memory or is loaded from fast storage. Furthermore, we adapt a technique developed for out-of-core graph processing, and show that it significantly improves cache locality. Finally, we demonstrate that NUMA-awareness and its attendant pre-processing costs are beneficial only on large machines and for certain algorithms

Infoscience - École polytechnique fédérale de Lausanne

Characterization of the Impact of Hardware Islands on OLTP

Author: Ailamaki Anastasia
De Oliveira Branco Miguel Sérgio
Pandis Ippokratis
Porobic Danica
Tozun Pinar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/11/2015
Field of study

Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have non-uniform access latencies to the main memory and processor caches, which causes variability in the communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitecture differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, non-uniform core topology does appear in the critical path and conventional database architectures achieve suboptimal and even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments in servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies where we vary the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. We show that no strategy is optimal for all cases and that the best choice depends on the combination of hardware topology and workload characteristics. Finally, we argue that transaction processing systems must be aware of the hardware topology in order to achieve predictably high performance

Infoscience - École polytechnique fédérale de Lausanne

Analyzing the impact of system architecture on the scalability of OLTP engines for high-contention workloads

Author: Adya A.
Appuswamy R.
Bernstein P. A.
Cowling J.
Harizopoulos S.
Lozi J.-P.
Mu S.
Narula N.
Publication venue: 'VLDB Endowment'
Publication date
Field of study

Crossref

GraphM : an efficient storage system for high throughput of concurrent graph processing

Author: Blum Johannes
Bronson Nathan
Gonzalez Joseph E.
Kumar Pradeep
Kyrola Aapo
Liu Hang
Malicevic Jasmina
Meyer Ulrich
Shi Jiaxin
Shun Julian
Vora Keval
Zhang Yu
Zheng Da
Zhu Xiaowei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

With the rapidly growing demand of graph processing in the real world, a large number of iterative graph processing jobs run concurrently on the same underlying graph. However, the storage engines of existing graph processing frameworks are mainly designed for running an individual job. Our studies show that they are inefficient when running concurrent jobs due to the redundant data storage and access overhead. To cope with this issue, we develop an efficient storage system, called GraphM. It can be integrated into the existing graph processing systems to efficiently support concurrent iterative graph processing jobs for higher throughput by fully exploiting the similarities of the data accesses between these concurrent jobs. GraphM regularizes the traversing order of the graph partitions for concurrent graph processing jobs by streaming the partitions into the main memory and the Last-Level Cache (LLC) in a common order, and then processes the related jobs concurrently in a novel fine-grained synchronization. In this way, the concurrent jobs share the same graph structure data in the LLC/memory and also the data accesses to the graph, so as to amortize the storage consumption and the data access overhead. To demonstrate the efficiency of GraphM, we plug it into state-of-the-art graph processing systems, including GridGraph, GraphChi, PowerGraph, and Chaos. Experiments results show that GraphM improves the throughput by 1.73~13 times

Crossref

Warwick Research Archives Portal Repository