89 research outputs found
Characterizing the Communication Requirements of GNN Accelerators: A Model-Based Approach
Relational data present in real-world graph representations demand tools
capable of studying them accurately. In this regard, Graph Neural Networks
(GNNs) are a powerful tool, and various GNN models have been developed over
the past decade. Recently, there has been a significant push towards creating
accelerators that speed up the inference and training process of GNNs. These
accelerators, however, do not delve into the impact of their dataflows on the
overall data movement and, hence, on the communication requirements. In this
paper, we formulate analytical models that capture the amount of data movement
in the most recent GNN accelerator frameworks. Specifically, the proposed
models capture the dataflows and hardware setup of these accelerator designs
and expose their scalability characteristics for a set of hardware, GNN model
and input graph parameters. Additionally, the proposed approach provides means
for the comparative analysis of the vastly different GNN accelerators.
Comment: ISCAS 202
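As an illustration of the model-based approach, a first-order analytical model of per-layer data movement might be expressed as follows. This is a minimal sketch under assumed symbols (edge count, feature widths, element size), not the paper's actual formulation:

```python
def gnn_layer_traffic(num_nodes, num_edges, f_in, f_out, bytes_per_elem=4):
    """First-order estimate of data movement (in bytes) for one GNN layer,
    split into its two phases. All parameters and formulas are illustrative
    assumptions, not the models proposed in the paper."""
    # Aggregation phase: each edge moves one f_in-wide feature vector.
    aggregation = num_edges * f_in * bytes_per_elem
    # Combination phase: read node features and weights, write outputs.
    combination = (num_nodes * f_in      # input features
                   + f_in * f_out        # weight matrix
                   + num_nodes * f_out   # output features
                   ) * bytes_per_elem
    return aggregation + combination
```

Such a closed-form model exposes scalability trends directly: traffic in the aggregation phase grows with the edge count, while the combination phase grows with the node count and feature widths.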
Computing graph neural networks: A survey from algorithms to accelerators
Graph Neural Networks (GNNs) have exploded onto the machine learning scene in recent years owing to their capability to model and learn from graph-structured data. Such an ability has strong implications in a wide variety of fields whose data are inherently relational, for which conventional neural networks do not perform well. Indeed, as recent reviews can attest, research in the area of GNNs has grown rapidly and has led to the development of a variety of GNN algorithm variants, as well as to the exploration of ground-breaking applications in chemistry, neurology, electronics, and communication networks, among others. At the current stage of research, however, the efficient processing of GNNs is still an open challenge for several reasons. Besides their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, and the need to scale to huge graphs in some applications. In this context, this article aims to make two main contributions. On the one hand, a review of the field of GNNs is presented from the perspective of computing. This includes a brief tutorial on GNN fundamentals, an overview of the evolution of the field in the last decade, and a summary of the operations carried out in the multiple phases of different GNN algorithm variants. On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators is distilled.
This work is possible thanks to funding from the European Union's Horizon 2020 research and innovation programme under Grant No. 863337 (WiPLASH project) and the Spanish Ministry of Economy and Competitiveness under contract TEC2017-90034-C2-1-R (ALLIANCE project), which receives funding from FEDER.
IOPS: An Unified SpMM Accelerator Based on Inner-Outer-Hybrid Product
Sparse matrix multiplication (SpMM) is widely applied in numerous domains,
such as graph processing, machine learning, and data analytics. However,
inner-product-based SpMM induces redundant zero-element computations for
mismatched nonzero operands, while the outer-product-based approach lacks
input reuse across Processing Elements (PEs) and suffers from poor output
locality when accumulating partial-sum (psum) matrices. Besides, current
works focus only on sparse-sparse matrix multiplication (SSMM) or
sparse-dense matrix multiplication (SDMM), rarely performing efficiently for
both. To address these problems, this paper proposes a unified SpMM
accelerator, called IOPS, that hybridizes inner with outer products. It
reuses the input matrix among PEs with an inner-product dataflow and removes
zero-element calculations with an outer-product approach in each PE, which
allows it to efficiently process both SSMM and SDMM. Moreover, an address
mapping method is designed to accumulate the irregular sparse psum matrices,
reducing the latency and DRAM accesses of psum accumulation. Furthermore, an
adaptive partition strategy is proposed to tile the input matrices based on
their sparsity ratios, effectively utilizing the storage of the architecture
and reducing DRAM accesses.
Compared with the SSMM accelerator SpArch, we achieve 1.7x~6.3x higher energy
efficiency and 1.2x~4.4x higher resource efficiency, with 1.4x~2.1x DRAM
access savings.
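The trade-off between the two dataflows that IOPS hybridizes can be sketched in plain Python. This is an illustrative sketch of generic inner- and outer-product SpMM on dict-based sparse matrices, not the IOPS design itself:

```python
def spmm_inner(a_rows, b_cols):
    """Inner-product SpMM: C[i][j] = <row i of A, column j of B>.
    Intersecting nonzero indices wastes work whenever the operands'
    nonzeros do not line up (the mismatch problem the paper notes)."""
    c = {}
    for i, row in a_rows.items():
        for j, col in b_cols.items():
            s = sum(row[k] * col[k] for k in row.keys() & col.keys())
            if s:
                c[(i, j)] = s
    return c

def spmm_outer(a_cols, b_rows):
    """Outer-product SpMM: accumulate one rank-1 partial-sum (psum)
    update per shared index k. No zero operands are touched, but the
    psums land at irregular (i, j) positions and must be merged."""
    c = {}
    for k, col in a_cols.items():
        for i, a in col.items():
            for j, b in b_rows.get(k, {}).items():
                c[(i, j)] = c.get((i, j), 0) + a * b
    return c
```

For example, with A = [[1,0],[0,2]] and B = [[0,3],[4,0]], both routines produce the nonzeros {(0,1): 3, (1,0): 8}, but the inner-product version also probes the two all-mismatch (i, j) pairs, while the outer-product version scatters its updates across the output.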
Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration
DNN accelerators are often developed and evaluated in isolation without
considering the cross-stack, system-level effects in real-world environments.
This makes it difficult to appreciate the impact of System-on-Chip (SoC)
resource contention, OS overheads, and programming-stack inefficiencies on
overall performance/energy-efficiency. To address this challenge, we present
Gemmini, an open-source*, full-stack DNN accelerator generator. Gemmini
generates a wide design-space of efficient ASIC accelerators from a flexible
architectural template, together with flexible programming stacks and full SoCs
with shared resources that capture system-level effects. Gemmini-generated
accelerators have also been fabricated, delivering up to three
orders-of-magnitude speedups over high-performance CPUs on various DNN
benchmarks.
* https://github.com/ucb-bar/gemmini
Comment: To appear at the 58th IEEE/ACM Design Automation Conference (DAC),
December 2021, San Francisco, CA, US
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
Implementing embedded neural network processing at the edge requires
efficient hardware acceleration that couples high computational performance
with low power consumption. Driven by the rapid evolution of network
architectures and their algorithmic features, accelerator designs are
constantly updated and improved. To evaluate and compare hardware design
choices, designers can refer to a myriad of accelerator implementations in the
literature. Surveys provide an overview of these works but are often limited to
system-level and benchmark-specific performance metrics, making it difficult to
quantitatively compare the individual effect of each utilized optimization
technique. This complicates the evaluation of optimizations for new accelerator
designs, slowing down research progress. This work provides a survey of
neural network accelerator optimization approaches that have been used in
recent works and reports their individual effects on edge processing
performance. It presents the list of optimizations and their quantitative
effects as a construction kit, allowing designers to assess the design
choices for each building block separately. Reported optimizations range from
up to 10,000x memory savings to 33x energy reductions, providing chip
designers with an overview of design choices for implementing efficient
low-power neural network accelerators.
Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI
The remarkable advancements in artificial intelligence (AI), primarily driven
by deep neural networks, have significantly impacted various aspects of our
lives. However, the current challenges surrounding unsustainable computational
trajectories, limited robustness, and a lack of explainability call for the
development of next-generation AI systems. Neuro-symbolic AI (NSAI) emerges as
a promising paradigm, fusing neural, symbolic, and probabilistic approaches to
enhance interpretability, robustness, and trustworthiness while facilitating
learning from much less data. Recent NSAI systems have demonstrated great
potential in collaborative human-AI scenarios with reasoning and cognitive
capabilities. In this paper, we provide a systematic review of recent progress
in NSAI and analyze the performance characteristics and computational operators
of NSAI models. Furthermore, we discuss the challenges and potential future
directions of NSAI from both system and architectural perspectives.
Comment: Workshop on Systems for Next-Gen AI Paradigms, 6th Conference on
Machine Learning and Systems (MLSys), June 4-8, 2023, Miami, FL, US
Understanding the design-space of sparse/dense multiphase GNN dataflows on spatial accelerators
Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and memory characteristics, which come from an interplay between dense and sparse phases of computation, the emergence of reconfigurable dataflow (aka spatial) accelerators offers promise for acceleration by mapping optimized dataflows (i.e., computation order and parallelism) for both phases. The goal of this work is to characterize and understand the design space of dataflow choices for running GNNs on spatial accelerators, so that mappers or design-space exploration tools can optimize the dataflow based on the workload. Specifically, we propose a taxonomy to describe all possible choices for mapping the dense and sparse phases of GNN inference, spatially and temporally, over a spatial accelerator, capturing both the intra-phase dataflow and the inter-phase (pipelined) dataflow. Using this taxonomy, we perform deep dives into the costs and benefits of several dataflows and conduct case studies on the implications of hardware parameters for dataflows and the value of flexibility to support pipelined execution.
Parts of this work were supported through a fellowship by NEC Laboratories Europe, project grant PID2020-112827GB-I00 funded by MCIN/AEI/10.13039/501100011033, RTI2018-098156-B-C53 (MCIU/AEI/FEDER, UE), and grant 20749/FPI/18 from FundaciĂłn SĂ©neca.
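The two-phase structure that the taxonomy covers can be made concrete with a minimal sketch. This assumes a GCN-style layer in plain Python; the particular loop nests shown are just one of the many intra-phase dataflow choices the taxonomy enumerates:

```python
def gcn_layer(edges, x, w):
    """One GCN-style layer: a sparse aggregation phase followed by a
    dense combination phase. The order and parallelization of each loop
    nest is an intra-phase dataflow choice; whether the two phases run
    sequentially or overlapped is the inter-phase (pipelined) choice."""
    n, f_in, f_out = len(x), len(x[0]), len(w[0])
    # Sparse phase: sum neighbor features along each edge (src -> dst).
    agg = [[0.0] * f_in for _ in range(n)]
    for src, dst in edges:
        for f in range(f_in):
            agg[dst][f] += x[src][f]
    # Dense phase: multiply aggregated features by the weight matrix.
    return [[sum(agg[i][k] * w[k][j] for k in range(f_in))
             for j in range(f_out)] for i in range(n)]
```

The sparse phase's memory behavior depends on the input graph's edge list, while the dense phase is a regular GEMM, which is why a single fixed dataflow rarely suits both.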