Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud
Neural networks (NNs) are growing in importance and complexity. A neural
network's performance (and energy efficiency) can be bound either by
computation or memory resources. The processing-in-memory (PIM) paradigm, where
computation is placed near or within memory arrays, is a viable solution to
accelerate memory-bound NNs. However, PIM architectures vary in form, where
different PIM approaches lead to different trade-offs. Our goal is to analyze,
discuss, and contrast DRAM-based PIM architectures for NN performance and
energy efficiency. To do so, we analyze three state-of-the-art PIM
architectures: (1) UPMEM, which integrates processors and DRAM arrays into a
single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge
devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute
bit-serial operations. Our analysis reveals that PIM greatly benefits
memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when
the GPU requires memory oversubscription for a general matrix-vector
multiplication kernel; (2) Mensa improves energy efficiency and throughput by
3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3)
SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude
that the ideal PIM architecture for NN models depends on a model's distinct
attributes, due to the inherent architectural design choices.
Comment: This is an extended and updated version of a paper published in IEEE
Micro, pp. 1-14, 29 Aug. 2022. arXiv admin note: text overlap with
arXiv:2109.1432
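The abstract above says SIMDRAM executes bit-serial operations. As a rough illustration of what "bit-serial" means, the sketch below adds two vectors one bit position at a time, with every lane processed in parallel by whole-word bitwise operations. It is a software analogy only, not SIMDRAM's actual in-DRAM mechanism: the function names, the use of Python integers as bit-planes, and the majority-based carry are all assumptions made for this example.

```python
def to_bit_planes(values, width):
    """Transpose integers so that planes[i] holds bit i of every value.

    Bit `lane` of planes[i] is bit i of values[lane] (a "vertical" layout).
    """
    planes = []
    for i in range(width):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_bit_planes(planes, lanes):
    """Inverse of to_bit_planes: rebuild one integer per lane."""
    return [
        sum(((p >> lane) & 1) << i for i, p in enumerate(planes))
        for lane in range(lanes)
    ]


def majority(a, b, c):
    """Bitwise 3-input majority, applied to all lanes at once."""
    return (a & b) | (a & c) | (b & c)


def bit_serial_add(a_planes, b_planes):
    """Ripple-carry addition, one bit position per step, all lanes in parallel."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)       # sum bit for this position
        carry = majority(a, b, carry)   # carry into the next position
    out.append(carry)                   # final carry-out plane
    return out


a = [3, 5, 250, 0]
b = [4, 7, 10, 255]
sums = from_bit_planes(
    bit_serial_add(to_bit_planes(a, 8), to_bit_planes(b, 8)), lanes=len(a)
)
print(sums)
```

The number of steps grows with the operand bit width rather than with the number of elements, which is why this style of computation favors wide vectors of narrow values such as those in binary NNs.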
Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture
Many modern workloads, such as neural networks, databases, and graph
processing, are fundamentally memory-bound. For such workloads, the data
movement between main memory and CPU cores imposes a significant overhead in
terms of both latency and energy. A major reason is that this communication
happens through a narrow bus with high latency and limited bandwidth, and the
low data reuse in memory-bound workloads is insufficient to amortize the cost
of main memory access. Fundamentally addressing this data movement bottleneck
requires a paradigm where the memory system assumes an active role in computing
by integrating processing capabilities. This paradigm is known as
processing-in-memory (PIM).
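The point about low data reuse can be made concrete with a roofline-style estimate. The sketch below is not taken from the paper: the machine parameters (`PEAK_FLOPS`, `MEM_BANDWIDTH`) are hypothetical round numbers, and the byte counts assume 32-bit floats with each operand moved to or from main memory exactly once.

```python
# Hypothetical machine parameters, chosen only for illustration.
PEAK_FLOPS = 10e12       # peak compute throughput, FLOP/s
MEM_BANDWIDTH = 100e9    # main-memory bandwidth, bytes/s
BYTES_PER_ELEMENT = 4    # 32-bit floating point


def gemv_intensity(m, n):
    """FLOPs per byte for y = A @ x with an m-by-n matrix A."""
    flops = 2 * m * n                              # one multiply and one add per element of A
    traffic = BYTES_PER_ELEMENT * (m * n + n + m)  # read A and x, write y
    return flops / traffic


def gemm_intensity(n):
    """FLOPs per byte for C = A @ B with n-by-n matrices and ideal reuse."""
    flops = 2 * n ** 3
    traffic = BYTES_PER_ELEMENT * 3 * n * n        # read A and B, write C
    return flops / traffic


def attainable_flops(intensity):
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(PEAK_FLOPS, MEM_BANDWIDTH * intensity)


def is_memory_bound(intensity):
    """True when the memory roof, not the compute roof, limits throughput."""
    return MEM_BANDWIDTH * intensity < PEAK_FLOPS


for name, intensity in [("GEMV 4096x4096", gemv_intensity(4096, 4096)),
                        ("GEMM 4096x4096", gemm_intensity(4096))]:
    bound = "memory-bound" if is_memory_bound(intensity) else "compute-bound"
    print(f"{name}: {intensity:.2f} FLOP/byte, "
          f"{attainable_flops(intensity) / 1e9:.0f} GFLOP/s attainable ({bound})")
```

Under these assumptions, matrix-vector multiplication touches each matrix element once and performs only about half a FLOP per byte moved, so its throughput is set by memory bandwidth. Matrix-matrix multiplication reuses each element many times and reaches the compute roof instead.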
Recent research explores different forms of PIM architectures, motivated by
the emergence of new 3D-stacked memory technologies that integrate memory with
a logic layer where processing elements can be easily placed. Past works
evaluate these architectures in simulation or, at best, with simplified
hardware prototypes. In contrast, the UPMEM company has designed and
manufactured the first publicly-available real-world PIM architecture.
This paper provides the first comprehensive analysis of the first
publicly-available real-world PIM architecture. We make two key contributions.
First, we conduct an experimental characterization of the UPMEM-based PIM
system using microbenchmarks to assess various architecture limits such as
compute throughput and memory bandwidth, yielding new insights. Second, we
present PrIM, a benchmark suite of 16 workloads from different application
domains (e.g., linear algebra, databases, graph processing, neural networks,
bioinformatics).
Comment: Our open source software is available at
https://github.com/CMU-SAFARI/prim-benchmark
A survey of near-data processing architectures for neural networks
Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von Neumann architecture. As data movement operations and energy consumption become key bottlenecks in the design of computing systems, the interest in unconventional approaches such as Near-Data Processing (NDP), machine learning, and especially neural network (NN)-based accelerators has grown significantly. Emerging memory technologies, such as ReRAM and 3D-stacked memory, are promising for efficiently architecting NDP-based accelerators for NNs due to their ability to serve as both high-density, low-energy storage and in/near-memory computation and search engines. In this paper, we present a survey of techniques for designing NDP architectures for NNs. By classifying the techniques based on the memory technology employed, we underscore their similarities and differences. Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures for future computing platforms. This paper will be valuable for computer architects, chip designers, and researchers in the area of machine learning.
This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program.
Memory-Centric Computing
Memory-centric computing aims to enable computation capability in and near
all places where data is generated and stored. As such, it can greatly reduce
the large negative performance and energy impact of data access and data
movement, by fundamentally avoiding data movement and reducing data access
latency & energy. Many recent studies show that memory-centric computing can
greatly improve system performance and energy efficiency. Major industrial
vendors and startup companies have also recently introduced memory chips that
have sophisticated computation capabilities.
This talk describes promising ongoing research and development efforts in
memory-centric computing. We classify such efforts into two major fundamental
categories: 1) processing using memory, which exploits analog operational
properties of memory structures to perform massively-parallel operations in
memory, and 2) processing near memory, which integrates processing capability
in memory controllers, the logic layer of 3D-stacked memory technologies, or
memory chips to enable high-bandwidth and low-latency memory access to
near-memory logic. We show both types of architectures (and their combination)
can enable orders of magnitude improvements in performance and energy
consumption of many important workloads, such as graph analytics, databases,
machine learning, video processing, climate modeling, and genome analysis. We
discuss adoption challenges for the memory-centric computing paradigm and
conclude with some research & development opportunities.
Comment: To appear as an invited special session paper at DAC 202