7 research outputs found
Towards general-purpose neural network computing
Accepted manuscrip
Recommended from our members
Towards General-Purpose Neural Network Computing
Machine learning is becoming pervasive, decades of research in neural network computation is now being leveraged to learn patterns in data and perform computations that are difficult to express using standard programming approaches. Recent work has demonstrated that custom hardware accelerators for neural network processing can outperform software implementations in both performance and power consumption. However, there is neither an agreed-upon interface to neural network accelerators nor a consensus on neural network hardware implementations. We present a generic set of software/hardware extensions, X-FILES, that allow for the general-purpose integration of feedforward and feedback neural network computation in applications. The interface is independent of the network type, configuration, and implementation. Using these proposed extensions, we demonstrate and evaluate an example dynamically allocated, multi-context neural network accelerator architecture, DANA. We show that the combination of X-FILES and our hardware prototype, DANA, enables generic support and increased throughput for neural-network-based computation in multi-threaded scenarios.Engineering and Applied Science
Neural network computing using on-chip accelerators
The use of neural networks, machine learning, or artificial intelligence, in its broadest and most controversial sense, has been a tumultuous journey involving three distinct hype cycles and a history dating back to the 1960s. Resurgent, enthusiastic interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. Furthermore, researchers have demonstrated that machine learning can be utilized as an auxiliary component of applications to enhance or enable new types of computation such as approximate computing or automatic parallelization. In our view, machine learning becomes not the underlying application, but a ubiquitous component of applications. This view necessitates a different approach towards the deployment of machine learning computation that spans not only hardware design of accelerator architectures, but also user and supervisor software to enable the safe, simultaneous use of machine learning accelerator resources.
In this dissertation, we propose a multi-transaction model of neural network computation to meet the needs of future machine learning applications. We demonstrate that this model, encompassing a decoupled backend accelerator for inference and learning from hardware and software for managing neural network transactions can be achieved with low overhead and integrated with a modern RISC-V microprocessor. Our extensions span user and supervisor software and data structures and, coupled with our hardware, enable multiple transactions from different address spaces to execute simultaneously, yet safely. Together, our system demonstrates the utility of a multi-transaction model to increase energy efficiency improvements and improve overall accelerator throughput for machine learning applications
Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration
DNN accelerators are often developed and evaluated in isolation without
considering the cross-stack, system-level effects in real-world environments.
This makes it difficult to appreciate the impact of System-on-Chip (SoC)
resource contention, OS overheads, and programming-stack inefficiencies on
overall performance/energy-efficiency. To address this challenge, we present
Gemmini, an open-source*, full-stack DNN accelerator generator. Gemmini
generates a wide design-space of efficient ASIC accelerators from a flexible
architectural template, together with flexible programming stacks and full SoCs
with shared resources that capture system-level effects. Gemmini-generated
accelerators have also been fabricated, delivering up to three
orders-of-magnitude speedups over high-performance CPUs on various DNN
benchmarks.
* https://github.com/ucb-bar/gemminiComment: To appear at the 58th IEEE/ACM Design Automation Conference (DAC),
December 2021, San Francisco, CA, US