1,818 research outputs found
Power-Efficient and Highly Scalable Parallel Graph Sampling using FPGAs
Energy efficiency is a crucial problem in data centers, where big data is generally represented by directed or undirected graphs. Analyzing these big graphs is challenging due to the volume and velocity of the data as well as irregular memory access patterns. Graph sampling is one of the most effective ways to reduce the size of a graph while maintaining its crucial characteristics. In this paper we present the design and implementation of an FPGA-based graph sampling method that is both time- and energy-efficient. This is in contrast to existing parallel approaches, which include memory-distributed clusters, multicore CPUs and GPUs. Our strategy utilizes a novel graph data structure, which we call COPRA, that allows time- and memory-efficient representation of graphs suitable for reconfigurable hardware such as FPGAs. Our experiments show that our proposed techniques are 2x faster and 3x more energy efficient than the serial CPU version of the algorithm. We further show that our proposed techniques give speedups comparable to GPU and multi-threaded CPU architectures while consuming 10x less energy than the GPU and 2x less than the CPU.
LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing
LEGaTO is a three-year EU H2020 project that started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.
Parallel Sampling-Pipeline for Indefinite Stream of Heterogeneous Graphs using OpenCL for FPGAs
In the field of data science, a huge amount of data, generally represented as graphs, needs to be processed and analyzed. It is of utmost importance that this data be processed swiftly and efficiently to save time and energy. The volume and velocity of data, along with irregular access patterns in graph data structures, pose challenges in terms of analysis and processing. Further, a big chunk of time and energy is spent on analyzing these graphs on large compute clusters and/or data centers. Filtering and refining data using graph sampling techniques is one of the most effective ways to speed up the analysis. Efficient accelerators, such as FPGAs, have proven to significantly lower the energy cost of running an algorithm. To this end, we present the design and implementation of a parallel graph sampling technique for a large number of input graphs streaming into an FPGA. A parallel approach using OpenCL for FPGAs was adopted to come up with a solution that is both time- and energy-efficient. We introduce a novel graph data structure, suitable for streaming graphs on FPGAs, that allows time- and memory-efficient representation of graphs. Our experiments show that our proposed technique is 3x faster and 2x more energy efficient compared to the serial CPU version of the algorithm.
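As an illustration of the kind of graph sampling the two papers above accelerate, here is a minimal node-based sampling sketch in Python. The COPRA data structure and the FPGA pipelines themselves are not specified in these abstracts, so this is only a generic software analogue, not the authors' method:

```python
import random

def sample_graph(edges, keep_ratio=0.5, seed=0):
    """Node-based graph sampling: keep a random subset of vertices and
    the edges induced between them. A generic illustration of graph
    down-sampling; the hardware-oriented COPRA representation from the
    papers is not described here."""
    rng = random.Random(seed)
    vertices = {v for e in edges for v in e}
    kept = {v for v in vertices if rng.random() < keep_ratio}
    # Induced subgraph: an edge survives only if both endpoints survive.
    return [(u, v) for (u, v) in edges if u in kept and v in kept]
```

The sampled edge list is always a subset of the input, so downstream analyses run on a smaller graph that retains part of the original connectivity.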
LightRW: FPGA Accelerated Graph Dynamic Random Walks
Graph dynamic random walks (GDRWs) have recently emerged as a powerful
paradigm for graph analytics and learning applications, including graph
embedding and graph neural networks. Despite the fact that many existing
studies optimize the performance of GDRWs on multi-core CPUs, massive random
memory accesses and costly synchronizations cause severe resource
underutilization, and the processing of GDRWs is usually the key performance
bottleneck in many graph applications. This paper studies an alternative
architecture, the FPGA, to address these issues in GDRWs: FPGAs support
hardware customization, which allows us to explore fine-grained pipeline
execution and specialized memory access optimizations. Specifically, we propose
{LightRW}, a novel FPGA-based accelerator for GDRWs. LightRW embraces a series
of optimizations to enable fine-grained pipeline execution on the chip and to
exploit the massive parallelism of FPGA while significantly reducing memory
accesses. As current commonly used sampling methods in GDRWs do not efficiently
support fine-grained pipeline execution, we develop a parallelized reservoir
sampling method to sample multiple vertices per cycle for efficient pipeline
execution. To address the random memory access issues, we propose a
degree-aware configurable caching method that buffers hot vertices on-chip to
alleviate random memory accesses and a dynamic burst access engine that
efficiently retrieves neighbors. Experimental results show that our
optimization techniques are able to improve the performance of GDRWs on FPGA
significantly. Moreover, LightRW delivers up to 9.55x and 9.10x speedup over
the state-of-the-art CPU-based MetaPath and Node2vec random walks,
respectively. This work is open-sourced on GitHub at
https://github.com/Xtra-Computing/LightRW.
Comment: Accepted to SIGMOD 202
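The parallelized reservoir sampling that LightRW builds on can be illustrated by the classic sequential algorithm. This is a generic Python sketch of Algorithm R under the usual uniform-sampling assumptions, not the per-cycle hardware variant from the paper:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Classic single-pass reservoir sampling (Algorithm R): keeps k
    items uniformly at random from a stream of unknown length. LightRW
    parallelizes a variant of this idea on the FPGA; this sequential
    sketch only shows the underlying sampling principle."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because each item is examined exactly once, the method suits streaming neighbor lists where the degree of a vertex is not known in advance.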
- …