Approximate FPGA-based LSTMs under Computation Time Constraints
Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM)
networks have demonstrated state-of-the-art accuracy in several emerging
Artificial Intelligence tasks. However, the models are becoming increasingly
demanding in terms of computational and memory load. Emerging latency-sensitive
applications including mobile robots and autonomous vehicles often operate
under stringent computation time constraints. In this paper, we address the
challenge of deploying computationally demanding LSTMs at a constrained time
budget by introducing an approximate computing scheme that combines iterative
low-rank compression and pruning, along with a novel FPGA-based LSTM
architecture. Combined in an end-to-end framework, the approximation method's
parameters are optimised and the architecture is configured to address the
problem of high-performance LSTM execution in time-constrained applications.
Quantitative evaluation on a real-life image captioning application indicates
that the proposed methods required up to 6.5x less time to achieve the same
application-level accuracy compared to a baseline method, while achieving an
average of 25x higher accuracy under the same computation time constraints.Comment: Accepted at the 14th International Symposium in Applied
Reconfigurable Computing (ARC) 201
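The abstract names two approximations applied to the LSTM weights: iterative low-rank compression and pruning. Below is a minimal NumPy sketch of what those two steps typically look like on a single stacked gate-weight matrix; the rank, sparsity and matrix shapes are illustrative assumptions, not the paper's actual configuration or algorithm.

```python
# Illustrative sketch: truncated-SVD low-rank compression followed by
# magnitude pruning of an LSTM weight matrix. Rank/sparsity values and
# shapes are assumptions, not the paper's settings.
import numpy as np

def low_rank_approx(W, rank):
    """Keep only the top-`rank` singular components of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def magnitude_prune(W, sparsity):
    """Zero out roughly `sparsity` of W's smallest-magnitude entries."""
    k = int(sparsity * W.size)
    if k == 0:
        return W
    threshold = np.partition(np.abs(W), k, axis=None)[k]
    return np.where(np.abs(W) < threshold, 0.0, W)

# Example: approximate the stacked LSTM gate weights (4*hidden x input).
hidden, inputs = 512, 512
W = np.random.randn(4 * hidden, inputs).astype(np.float32)
W_approx = magnitude_prune(low_rank_approx(W, rank=64), sparsity=0.5)
print("kept nonzeros:", np.count_nonzero(W_approx) / W.size)
```

In the paper's framework the rank and sparsity would be chosen by the end-to-end optimisation of the approximation parameters rather than fixed by hand as in this sketch.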
LSTM neural network implementation using memristive crossbar circuits and its various topologies
Neural Network (NN) algorithms have existed for a long time. However, they only reemerged after computers were invented, because computational resources are required to implement them. Even so, computers themselves are often not fast enough to train and run NNs: training a complex neural network for certain applications can take days. One such complex NN that has become widely used is the Long Short-Term Memory (LSTM) algorithm.
To increase the computation speed and decrease the power consumption of neural network algorithms, hardware realizations of neural networks have emerged, primarily based on FPGAs and analog circuitry. To date, however, only FPGA implementations of LSTM exist. Addressing this gap, this thesis aims to show that the LSTM neural network is realizable and functional in analog hardware. Analog hardware built from memristive crossbars is a potential solution to the speed bottleneck experienced in software implementations of LSTM and other complex neural networks.
This work focuses on the implementation of already-trained LSTM neural networks in analog circuitry. Since training consists of both forward and backward passes through the network, the first step is to implement circuitry that can run the forward pass; this forward-pass circuit can later be extended into a complete circuit that includes training circuitry.
Additionally, various LSTM topologies exist. A software analysis was carried out to compare the performance of each LSTM architecture on time-series prediction and time-series classification tasks. Each architecture can be implemented in analog circuitry without great difficulty using voltage-based LSTM circuit blocks, because they are easy to reconfigure. A fully functional implementation of the voltage-based memristive LSTM in a SPICE circuit simulator is the main contribution of this thesis. In comparison, current-based LSTM circuit blocks cannot be rearranged as easily, owing to the difficulty of passing currents from one stage to the next without degradation in magnitude.
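Since a memristive crossbar natively computes analog matrix-vector products, mapping an LSTM forward pass onto crossbars comes down to expressing the gate pre-activations as matrix-vector products followed by element-wise nonlinearities. The Python sketch below shows the standard LSTM cell equations factored that way; the dimensions and random weights are illustrative assumptions, not circuit-level detail from the thesis.

```python
# One LSTM forward step, written so the gate pre-activations are plain
# matrix-vector products -- the operation a memristive crossbar performs
# in the analog domain. Shapes and weights are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """W: (4H, X), U: (4H, H), b: (4H,) -- stacked input/forget/cell/output gates."""
    H = h_prev.shape[0]
    # These two matrix-vector products are what the crossbars would realize.
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])      # input gate
    f = sigmoid(z[1*H:2*H])      # forget gate
    g = np.tanh(z[2*H:3*H])      # candidate cell state
    o = sigmoid(z[3*H:4*H])      # output gate
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Example usage with random parameters.
X, H = 8, 16
rng = np.random.default_rng(0)
W, U, b = rng.standard_normal((4*H, X)), rng.standard_normal((4*H, H)), np.zeros(4*H)
h, c = lstm_step(rng.standard_normal(X), np.zeros(H), np.zeros(H), W, U, b)
```

Writing the step this way makes the only dense arithmetic the two products W @ x and U @ h_prev, which are exactly the operations a crossbar array computes; the remaining gate nonlinearities and element-wise updates are handled outside the crossbars.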
Countering Acoustic Adversarial Attacks in Microphone-equipped Smart Home Devices
Deep neural networks (DNNs) continue to demonstrate superior generalization performance in an increasing range of applications, including speech recognition and image understanding. Recent innovations in compression algorithms, design of efficient architectures and hardware accelerators have prompted a rapid growth in deploying DNNs on mobile and IoT devices to redefine user experiences. Relying on the superior inference quality of DNNs, various voice-enabled devices have started to pervade our everyday lives and are increasingly used for, e.g., opening and closing doors, starting or stopping washing machines, ordering products online, and authenticating monetary transactions. As the popularity of these voice-enabled services increases, so does their risk of being attacked. Recently, DNNs have been shown to be extremely brittle under adversarial attacks, and people with malicious intentions can potentially exploit this vulnerability to compromise DNN-based voice-enabled systems. Although some existing work already highlights the vulnerability of audio models, very little is known about the behaviour of compressed on-device audio models under adversarial attacks. This paper bridges this gap by thoroughly investigating the vulnerabilities of compressed audio DNNs and makes a stride towards making compressed models robust. In particular, we propose a stochastic compression technique that generates compressed models with greater robustness to adversarial attacks. We present an extensive set of evaluations on the adversarial vulnerability and robustness of DNNs in two diverse audio recognition tasks, while considering two popular attack algorithms: FGSM and PGD. We found that error rates of conventionally trained audio DNNs under attack can be as high as 100%. Under both white- and black-box attacks, our proposed approach is found to decrease the error rate of DNNs under attack by a large margin.
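Of the two attack algorithms named in the abstract, FGSM is the simpler: it perturbs the input by a small step in the direction of the sign of the loss gradient. A minimal PyTorch sketch follows; the toy model, feature dimensions and epsilon are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative FGSM attack: x_adv = x + epsilon * sign(grad_x loss).
# The classifier and "audio feature" shapes below are placeholders.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """Return an adversarial copy of x for classifier `model` and true labels `y`."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Example usage with a toy linear classifier over flattened feature vectors.
model = torch.nn.Sequential(torch.nn.Linear(640, 10))
x = torch.randn(4, 640)            # batch of 4 feature vectors
y = torch.randint(0, 10, (4,))     # ground-truth labels
x_adv = fgsm(model, x, y, epsilon=0.01)
```

PGD, the other attack considered in the paper, amounts to iterating a projected version of this same gradient-sign step within an epsilon-ball around the input.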
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication in the ACM Computing Surveys (CSUR) journal, 2018