The effectiveness of LSTM neural networks for popular tasks such as Automatic
Speech Recognition has fostered increasing interest in LSTM inference
acceleration. Due to the recurrent nature and data dependencies of LSTM
computations, designing a customized architecture specifically tailored to
their computation pattern is crucial for efficiency. Since LSTMs are used for a
variety of tasks, generalizing this efficiency to diverse configurations, i.e.,
adaptiveness, is another key feature of these accelerators. In this work, we
first show that state-of-the-art LSTM implementations on GPU, FPGA, and ASIC
architectures suffer from low resource utilization and low adaptiveness. To
solve these issues, we propose an intelligent tile-based dispatching mechanism
that efficiently handles the data dependencies and increases the adaptiveness
of LSTM computation. To this end, we propose LSTM-Sharp, a hardware accelerator
that pipelines LSTM computation using an effective scheduling scheme to hide
most of the dependent serialization. Furthermore, LSTM-Sharp employs a
dynamically reconfigurable architecture to adapt to the model's
characteristics. LSTM-Sharp achieves 1.5x, 2.86x, and 82x speedups on average
over the state-of-the-art ASIC, FPGA, and GPU implementations, respectively,
for different LSTM models and resource budgets. In addition, LSTM-Sharp
provides significant energy reduction with respect to previous solutions, owing
to its low power dissipation (383 GFLOPS/Watt).
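The data dependency mentioned above can be illustrated with a standard single-layer LSTM. The minimal Python sketch below is not LSTM-Sharp's dataflow or scheduler; the function names, the gate order (input, forget, candidate, output), and the list-of-rows matrix layout are illustrative assumptions. It separates the input projections W_x·x_t, which depend only on the inputs and can be computed for every timestep up front, from the recurrent projection W_h·h_{t-1}, which must wait for the previous step.

```python
import math
import random


def matvec(W, v):
    """Dense matrix-vector product; W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_sequence(xs, Wx, Wh, b, hidden):
    """Run a single-layer LSTM over the input sequence xs and return h_t
    for every timestep.

    Assumed layout: Wx is (4*hidden) x input_dim, Wh is (4*hidden) x hidden,
    b has 4*hidden entries, and gate rows are ordered input, forget,
    candidate, output.
    """
    # Phase 1: W_x * x_t depends only on the inputs, so the projections for
    # all timesteps can be computed before the recurrence starts.
    x_proj = [matvec(Wx, x) for x in xs]

    h = [0.0] * hidden
    c = [0.0] * hidden
    outputs = []
    # Phase 2: W_h * h_{t-1} needs the previous hidden state, so these
    # steps must run one after another.
    for t in range(len(xs)):
        h_proj = matvec(Wh, h)
        z = [a + r + bias for a, r, bias in zip(x_proj[t], h_proj, b)]
        i = [sigmoid(v) for v in z[0:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
        o = [sigmoid(v) for v in z[3 * hidden:4 * hidden]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs


def random_matrix(n_rows, n_cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(n_cols)]
            for _ in range(n_rows)]


if __name__ == "__main__":
    random.seed(0)
    input_dim, hidden, steps = 3, 2, 4
    Wx = random_matrix(4 * hidden, input_dim)
    Wh = random_matrix(4 * hidden, hidden)
    b = [0.0] * (4 * hidden)
    xs = [[random.uniform(-1.0, 1.0) for _ in range(input_dim)]
          for _ in range(steps)]
    for t, h_t in enumerate(lstm_sequence(xs, Wx, Wh, b, hidden)):
        print(t, [round(v, 4) for v in h_t])
```

Only the Phase 2 loop is bound by the recurrence; this is the kind of dependent serialization that, according to the abstract, LSTM-Sharp's scheduling scheme hides by pipelining.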

Arnau, Jose-Maria

Gonzalez, Antonio

He, Yuxiong

Ruwase, Olatunji

Yazdani, Reza

Zhang, Minjia

ACM Transactions on Embedded Computing Systems

arXiv

The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as
Automatic Speech Recognition has fostered interest in RNN inference
acceleration. Due to the recurrent nature and data dependencies of RNN
computations, prior work has designed customized architectures specifically
tailored to the RNN computation pattern, achieving high computational
efficiency for certain chosen model sizes. However, given that the
dimensionality of RNNs varies widely across tasks, it is crucial to generalize
this efficiency to diverse configurations. In this work, we identify
adaptiveness as a key feature that is missing from today's RNN accelerators. In
particular, we first show that state-of-the-art RNN implementations on GPU,
FPGA, and ASIC architectures suffer from low resource utilization and low
adaptiveness. To solve these issues, we propose an intelligent tile-based
dispatching mechanism that increases the adaptiveness of RNN computation while
efficiently handling its data dependencies. To this end, we propose Sharp, a
hardware accelerator that pipelines RNN computation using an effective
scheduling scheme to hide most of the dependent serialization. Furthermore,
Sharp employs a dynamically reconfigurable architecture to adapt to the model's
characteristics. Sharp achieves 2x, 2.8x, and 82x speedups on average,
considering different RNN models and resource budgets, compared to the
state-of-the-art ASIC, FPGA, and GPU implementations, respectively. In
addition, Sharp provides significant energy reduction with respect to previous
solutions, owing to its low power dissipation (321 GFLOPS/Watt).
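One way to see why a fixed accelerator configuration loses utilization as RNN dimensions change is to count the processing-element slots wasted on partially filled tiles. The Python sketch below is a simplified model, not Sharp's dispatching mechanism. It assumes that a weight matrix is processed in fixed tile_rows × tile_cols tiles, that a partially filled edge tile still occupies a full tile, and that a reconfigurable design can choose among several tile shapes with the same number of processing elements. The function names `tile_utilization` and `best_tile` and the candidate shapes are hypothetical.

```python
import math


def tile_utilization(rows, cols, tile_rows, tile_cols):
    """Fraction of processing-element slots doing useful work when a
    rows x cols weight matrix is split into tile_rows x tile_cols tiles.
    Edge tiles that are only partially filled still cost a full tile."""
    n_tiles = math.ceil(rows / tile_rows) * math.ceil(cols / tile_cols)
    return (rows * cols) / (n_tiles * tile_rows * tile_cols)


def best_tile(rows, cols, shapes):
    """Pick the candidate (tile_rows, tile_cols) shape with the highest
    utilization for this matrix (a hypothetical reconfiguration policy)."""
    return max(shapes, key=lambda s: tile_utilization(rows, cols, *s))


if __name__ == "__main__":
    # Candidate shapes that all use 4096 processing elements (assumed).
    shapes = [(64, 64), (128, 32), (32, 128), (256, 16)]
    # Recurrent weight matrix of an LSTM layer: 4 * hidden rows (one block
    # per gate) by hidden columns, for a few illustrative hidden sizes.
    for hidden in (100, 340, 1024):
        rows, cols = 4 * hidden, hidden
        fixed = tile_utilization(rows, cols, 64, 64)
        shape = best_tile(rows, cols, shapes)
        adaptive = tile_utilization(rows, cols, *shape)
        print(f"hidden={hidden}: fixed 64x64 -> {fixed:.2f}, "
              f"best {shape} -> {adaptive:.2f}")
```

Running it prints, for each hidden size, the utilization of a single fixed 64x64 tiling next to the best of the candidate shapes. The model ignores scheduling and memory-bandwidth effects, so it only illustrates the kind of waste that motivates adaptive dispatching.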

arXiv.org e-Print Archive

SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network

The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as Automatic Speech Recognition has fostered interest in RNN inference acceleration. Due to the recurrent nature and data dependencies of RNN computations, prior work has designed customized architectures specifically tailored to the RNN computation pattern, achieving high computational efficiency for certain chosen model sizes. However, given that the dimensionality of RNNs varies widely across tasks, it is crucial to generalize this efficiency to diverse configurations. In this work, we identify adaptiveness as a key feature that is missing from today’s RNN accelerators. In particular, we first show that state-of-the-art RNN implementations on GPU, FPGA, and ASIC architectures suffer from low resource utilization and low adaptiveness. To solve these issues, we propose an intelligent tile-based dispatching mechanism that increases the adaptiveness of RNN computation while efficiently handling its data dependencies. To this end, we propose Sharp, a hardware accelerator that pipelines RNN computation using an effective scheduling scheme to hide most of the dependent serialization. Furthermore, Sharp employs a dynamically reconfigurable architecture to adapt to the model’s characteristics. Sharp achieves 2×, 2.8×, and 82× speedups on average, considering different RNN models and resource budgets, compared to the state-of-the-art ASIC, FPGA, and GPU implementations, respectively. In addition, Sharp provides significant energy reduction with respect to previous solutions, owing to its low power dissipation (321 GFLOPS/Watt).

This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program.

Peer Reviewed

Postprint (author's final draft)

Yazdani Aminabadi, Reza

Arnau Montañés, José María

González Colás, Antonio María

UPCommons. Portal del coneixement obert de la UPC

English

SHARP: An adaptable, energy-efficient accelerator for recurrent neural networks

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory
