To process sensor data in the Internet of Things (IoT), embedded deep
learning for one-dimensional data is an important technique. In the past, CNNs
were frequently used because they are simple to optimise for specialised embedded
hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed
at energy-efficient inference on end devices. Using traffic speed
prediction as a case study, a vanilla LSTM model with the optimised LSTM cell
achieves 17534 inferences per second while consuming only 3.8 µJ per
inference on the FPGA XC7S15 from the Spartan-7 family. It achieves at least
5.4× higher throughput and 1.37× better energy efficiency than
existing approaches.

Comment: 12 pages, 7 figures
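For reference, the vanilla LSTM cell that the paper's optimisation targets can be sketched as follows. This is a minimal NumPy illustration of the standard LSTM forward pass, not the paper's optimised FPGA implementation; all variable names and the example sizes are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One step of a vanilla LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gate order in the stacked matrices: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # all four gate pre-activations at once
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2*H])            # forget gate
    g = np.tanh(z[2*H:3*H])          # candidate cell state
    o = sigmoid(z[3*H:4*H])          # output gate
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

# Example: one input feature (e.g. a speed reading), hidden size 8
rng = np.random.default_rng(0)
D, H = 1, 8
h, c = lstm_cell(rng.standard_normal(D), np.zeros(H), np.zeros(H),
                 rng.standard_normal((4 * H, D)),
                 rng.standard_normal((4 * H, H)),
                 np.zeros(4 * H))
```

The gate non-linearities (sigmoid, tanh) and the elementwise products dominate the cost per step, which is why cell-level optimisations of this kind matter for energy-constrained inference.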