Enhancing Energy-efficiency by Solving the Throughput Bottleneck of LSTM Cells for Embedded FPGAs

Abstract

To process sensor data in the Internet of Things (IoT), embedded deep learning for 1-dimensional data is an important technique. In the past, CNNs were frequently used because they are simple to optimise for special embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 μJ per inference on the XC7S15 FPGA from the Spartan-7 family. It achieves at least 5.4× higher throughput and is 1.37× more energy efficient than existing approaches.

Comment: 12 pages, 7 figures
