SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads
In recent years, there have been tremendous advances in hardware acceleration
of deep neural networks. However, most of the research has focused on
optimizing accelerator microarchitecture for higher performance and energy
efficiency on a per-layer basis. We find that for overall single-batch
inference latency, the accelerator may account for only 25-40% of the time,
with the rest spent on data movement and in the deep learning software
framework. Thus far, it has
been very difficult to study end-to-end DNN performance during early stage
design (before RTL is available) because there are no existing DNN frameworks
that support end-to-end simulation with easy custom hardware accelerator
integration. To address this gap in research infrastructure, we present SMAUG,
the first DNN framework that is purpose-built for simulation of end-to-end deep
learning applications. SMAUG offers researchers a wide range of capabilities
for evaluating DNN workloads, from diverse network topologies to easy
accelerator modeling and SoC integration. To demonstrate the power and value of
SMAUG, we present case studies that show how we can optimize overall
performance and energy efficiency, achieving 1.8-5x speedups over a baseline
system without changing any part of the accelerator microarchitecture, and
how SMAUG can tune an SoC for a camera-powered deep learning pipeline.
Comment: 14 pages, 20 figures
A survey of near-data processing architectures for neural networks
Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von Neumann architecture. As data movement operations and energy consumption become key bottlenecks in the design of computing systems, interest in unconventional approaches such as Near-Data Processing (NDP), machine learning, and especially neural network (NN)-based accelerators has grown significantly. Emerging memory technologies, such as ReRAM and 3D-stacked memory, are promising for efficiently architecting NDP-based accelerators for NNs due to their ability to serve both as high-density/low-energy storage and as in/near-memory computation/search engines. In this paper, we present a survey of techniques for designing NDP architectures for NNs. By classifying the techniques based on the memory technology employed, we underscore their similarities and differences. Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures in future computing platforms. This paper will be valuable for computer architects, chip designers, and researchers in the area of machine learning.
This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program.