1,474 research outputs found
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Approximate FPGA-based LSTMs under Computation Time Constraints
Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM)
networks have demonstrated state-of-the-art accuracy in several emerging
Artificial Intelligence tasks. However, the models are becoming increasingly
demanding in terms of computational and memory load. Emerging latency-sensitive
applications including mobile robots and autonomous vehicles often operate
under stringent computation time constraints. In this paper, we address the
challenge of deploying computationally demanding LSTMs at a constrained time
budget by introducing an approximate computing scheme that combines iterative
low-rank compression and pruning, along with a novel FPGA-based LSTM
architecture. Combined in an end-to-end framework, the approximation method's
parameters are optimised and the architecture is configured to address the
problem of high-performance LSTM execution in time-constrained applications.
Quantitative evaluation on a real-life image captioning application indicates
that the proposed methods required up to 6.5x less time to achieve the same
application-level accuracy compared to a baseline method, while achieving an
average of 25x higher accuracy under the same computation time constraints.Comment: Accepted at the 14th International Symposium in Applied
Reconfigurable Computing (ARC) 201
- …