439 research outputs found
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
We show that DNN accelerator micro-architectures and their program mappings
represent specific choices of loop order and hardware parallelism for computing
the seven nested loops of DNNs, which enables us to create a formal taxonomy of
all existing dense DNN accelerators. Surprisingly, the loop transformations
needed to create these hardware variants can be precisely and concisely
represented by Halide's scheduling language. By modifying the Halide compiler
to generate hardware, we create a system that can fairly compare these prior
accelerators. As long as proper loop blocking schemes are used, and the
hardware can support mapping replicated loops, many different hardware
dataflows yield similar energy efficiency with good performance. This is
because the loop blocking can ensure that most data references stay on-chip
with good locality and the processing units have high resource utilization. How
resources are allocated, especially in the memory system, has a large impact on
energy and performance. By optimizing hardware resource allocation while
keeping throughput constant, we achieve up to 4.2X energy improvement for
Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long
Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202
Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture
Deep neural networks (DNNs) have been shown to outperform conventional
machine learning algorithms across a wide range of applications, e.g., image
recognition, object detection, robotics, and natural language processing.
However, the high computational complexity of DNNs often necessitates extremely
fast and efficient hardware. The problem gets worse as the size of neural
networks grows exponentially. As a result, customized hardware accelerators
have been developed to accelerate DNN processing without sacrificing model
accuracy. However, previous accelerator design studies have not fully
considered the characteristics of the target applications, which may lead to
sub-optimal architecture designs. On the other hand, new DNN models have been
developed for better accuracy, but their compatibility with the underlying
hardware accelerator is often overlooked. In this article, we propose an
application-driven framework for architectural design space exploration of DNN
accelerators. This framework is based on a hardware analytical model of
individual DNN operations. It models the accelerator design task as a
multi-dimensional optimization problem. We demonstrate that it can be
efficaciously used in application-driven accelerator architecture design. Given
a target DNN, the framework can generate efficient accelerator design solutions
with optimized performance and area. Furthermore, we explore the opportunity to
use the framework for accelerator configuration optimization under simultaneous
diverse DNN applications. The framework is also capable of improving neural
network models to best fit the underlying hardware resources
- …