Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing
Our ISCA 2015 paper provides a new programmable processing-in-memory (PIM)
architecture and system design that can accelerate key data-intensive
applications, with a focus on graph processing workloads. Our major idea was to
completely rethink the system, including the programming model, data
partitioning mechanisms, system support, instruction set architecture, along
with near-memory execution units and their communication architecture, such
that an important workload can be accelerated at a maximum level using a
distributed system of well-connected near-memory accelerators. We built our
accelerator system, Tesseract, using 3D-stacked memories with logic layers,
where each logic layer contains general-purpose processing cores and cores
communicate with each other using a message-passing programming model. Cores
could be specialized for graph processing (or any other application to be
accelerated).
To our knowledge, our paper was the first to completely design a near-memory
accelerator system from scratch such that it is both generally programmable and
specifically customizable to accelerate important applications, with a case
study on major graph processing workloads. Ensuing work in academia and
industry showed that similar approaches to system design can greatly benefit
both graph processing workloads and other applications, such as machine
learning, for which ideas from Tesseract seem to have been influential.
This short retrospective provides a brief analysis of our ISCA 2015 paper and
its impact. We briefly describe the major ideas and contributions of the work,
discuss later works that built on it or were influenced by it, and make some
educated guesses on what the future may bring on PIM and accelerator systems.
Comment: Selected to the 50th Anniversary of ISCA (ACM/IEEE International
Symposium on Computer Architecture), Commemorative Issue, 202
Efficient load balancing techniques for graph traversal applications on GPUs
Efficient load balancing is critical for graph traversal applications on GPUs, as it can significantly affect overall application performance. Different strategies have been proposed to address this issue. Nevertheless, the efficiency of each strongly depends on the graph characteristics, and no single strategy is the best solution for every graph. This paper presents three different balancing techniques and how they have been implemented to fully exploit the GPU architecture. It also proposes a set of support strategies that can be modularly applied to the main balancing techniques to better address the graph characteristics. The paper presents an analysis and a comparison of the three techniques and support strategies with the best state-of-the-art solutions over a large dataset of representative graphs. The analysis allows statically identifying, given the graph characteristics and for each of the proposed techniques, the best combination of supports, and shows that such a solution is more efficient than the state-of-the-art techniques.
Spoofax at Oracle: Domain-Specific Language Engineering for Large-Scale Graph Analytics
For the last decade, teams at Oracle have relied on the Spoofax language workbench to develop a family of domain-specific languages for graph analytics in research projects and in product development. In this paper, we analyze the requirements for integrating language processors into large-scale graph analytics toolkits and for the development of these language processors as part of a larger product development process. We discuss how Spoofax helps to meet these requirements and point out the need for future improvements.