LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory
The effectiveness of LSTM neural networks for popular tasks such as Automatic
Speech Recognition has fostered an increasing interest in LSTM inference
acceleration. Due to the recurrent nature and data dependencies of LSTM
computations, designing a customized architecture specifically tailored to this
computation pattern is crucial for efficiency. Since LSTMs are used for a
variety of tasks, generalizing this efficiency to diverse configurations, i.e.,
adaptiveness, is another key feature of these accelerators. In this work, we
first expose the low resource utilization and limited adaptiveness of
state-of-the-art LSTM implementations on GPU, FPGA, and ASIC architectures. To
solve these issues, we propose an intelligent tile-based dispatching mechanism
that efficiently handles the data dependencies and increases the adaptiveness
of LSTM computation. Building on this mechanism, we design LSTM-Sharp, a
hardware accelerator that pipelines LSTM computation using an effective
scheduling scheme to hide most of the dependency-induced serialization.
Furthermore, LSTM-Sharp employs a dynamically reconfigurable architecture to
adapt to the model's characteristics. LSTM-Sharp
achieves 1.5x, 2.86x, and 82x speedups on average over the state-of-the-art
ASIC, FPGA, and GPU implementations respectively, for different LSTM models and
resource budgets. Furthermore, LSTM-Sharp provides significant energy reduction
with respect to previous solutions, thanks to its high energy efficiency
(383 GFLOPs/Watt).
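
To make the recurrent data dependencies concrete, the sketch below is a minimal
NumPy reference of the standard (textbook) LSTM cell, not LSTM-Sharp's
implementation; the function and parameter names (lstm_forward, W, U, b) are
illustrative assumptions. Each timestep reads the h and c produced by the
previous one, so timesteps cannot run in parallel; only the matrix-vector
products within a step expose parallelism for an accelerator to exploit, which
is the serialization the scheduling scheme described above aims to hide.

import numpy as np

def lstm_forward(x_seq, W, U, b, h0, c0):
    """Reference LSTM forward pass over a sequence (illustrative only).

    x_seq: (T, input_dim) inputs; W: (4*hidden, input_dim) input weights;
    U: (4*hidden, hidden) recurrent weights; b: (4*hidden,) biases,
    with gates packed in i, f, g, o order.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = h0.shape[0]
    h, c = h0, c0
    outputs = []
    # The recurrence on h and c serializes this loop: step t cannot start
    # before step t-1 has produced its hidden and cell states.
    for x_t in x_seq:
        gates = W @ x_t + U @ h + b                  # bulk of the work: matrix-vector products
        i = sigmoid(gates[0 * hidden:1 * hidden])    # input gate
        f = sigmoid(gates[1 * hidden:2 * hidden])    # forget gate
        g = np.tanh(gates[2 * hidden:3 * hidden])    # candidate cell update
        o = sigmoid(gates[3 * hidden:4 * hidden])    # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs), h, c

# Example usage: 8 timesteps, 16 inputs, 32 hidden units (arbitrary sizes).
rng = np.random.default_rng(0)
T, d_in, d_h = 8, 16, 32
x = rng.standard_normal((T, d_in))
W = 0.1 * rng.standard_normal((4 * d_h, d_in))
U = 0.1 * rng.standard_normal((4 * d_h, d_h))
y, h, c = lstm_forward(x, W, U, np.zeros(4 * d_h), np.zeros(d_h), np.zeros(d_h))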