
    Approximate FPGA-based LSTMs under Computation Time Constraints

    Recurrent Neural Networks, and in particular Long Short-Term Memory (LSTM) networks, have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, these models are becoming increasingly demanding in terms of computational and memory load. Emerging latency-sensitive applications, including mobile robots and autonomous vehicles, often operate under stringent computation time constraints. In this paper, we address the challenge of deploying computationally demanding LSTMs at a constrained time budget by introducing an approximate computing scheme that combines iterative low-rank compression and pruning with a novel FPGA-based LSTM architecture. Combined in an end-to-end framework, the approximation method's parameters are optimised and the architecture is configured to address the problem of high-performance LSTM execution in time-constrained applications. Quantitative evaluation on a real-life image captioning application indicates that the proposed methods required up to 6.5x less time to achieve the same application-level accuracy as a baseline method, while achieving an average of 25x higher accuracy under the same computation time constraints.
    Comment: Accepted at the 14th International Symposium on Applied Reconfigurable Computing (ARC) 2018
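
    The two approximation steps named in the abstract, iterative low-rank compression and pruning, can be illustrated on a single LSTM weight matrix. The Python/NumPy sketch below is only a minimal illustration under assumed rank and sparsity targets; the function names, sizes, and parameters are illustrative and not taken from the paper's framework:

    ```python
    # Minimal sketch: truncated-SVD low-rank compression followed by
    # magnitude pruning of an LSTM weight matrix. Rank/sparsity values
    # are assumptions, not the paper's optimised parameters.
    import numpy as np

    def low_rank_approx(W, rank):
        """Keep the top-`rank` singular components of W (SVD truncation)."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    def magnitude_prune(W, sparsity):
        """Zero out roughly the `sparsity` fraction of smallest-magnitude weights."""
        k = int(W.size * sparsity)
        if k == 0:
            return W
        thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
        return np.where(np.abs(W) <= thresh, 0.0, W)

    # Example: compress the stacked gate weights of a 512-unit LSTM layer.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4 * 512, 512))   # [input|forget|cell|output] gates
    W_approx = magnitude_prune(low_rank_approx(W, rank=64), sparsity=0.5)
    print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
    ```

    In an iterative scheme of this kind, compression and pruning would be applied in alternating rounds, with the rank and sparsity levels tuned against the application's time budget.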

    LSTM neural network implementation using memristive crossbar circuits and its various topologies

    Neural network (NN) algorithms have existed for a long time, but they only became practical after the invention of computers, because implementing them demands substantial computational resources. Even so, computers are often not fast enough to train and run NNs: training a complex neural network for certain applications can take days. One widely used complex NN is the Long Short-Term Memory (LSTM) algorithm. To increase the computation speed and decrease the power consumption of neural network algorithms, hardware realisations of neural networks have emerged, mainly on FPGAs and in analog hardware. To date, however, only FPGA implementations of LSTMs exist. Addressing this gap, this thesis aims to show that an LSTM neural network is realisable and functional in analog hardware. Analog hardware built on memristive crossbars is a potential solution to the speed bottleneck experienced in software implementations of LSTMs and other complex neural networks in general. This work focuses on implementing already-trained LSTM neural networks in analog circuitry. Since training consists of both forward and backward passes through the network, the first step is circuitry that can run forward passes; this forward-running circuit can later be extended into a complete circuit that includes training circuitry. Additionally, various LSTM topologies exist, and a software analysis compares the performance of each LSTM architecture on time-series prediction and time-series classification applications. Each of these architectures can be implemented in analog circuitry without great difficulty using voltage-based LSTM circuit parts, owing to the ease with which they can be reconfigured. The main contribution of this thesis is a fully functional implementation of a voltage-based memristive LSTM in the SPICE circuit simulator. By comparison, current-based LSTM circuit parts are harder to rearrange, because passing currents from one stage to the next without degradation in magnitude is difficult.
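
    The analog primitive behind such a design is the memristive crossbar, which computes a matrix-vector product in a single step via Ohm's and Kirchhoff's laws: the current on column j is I_j = Σ_i V_i · G_ij. The Python sketch below simulates an ideal crossbar computing an LSTM layer's stacked gate pre-activations, using a common differential two-crossbar mapping for signed weights; the mapping, sizes, and conductance range are assumptions for illustration, not the thesis circuit:

    ```python
    # Minimal sketch of an ideal memristive crossbar as a matrix-vector
    # multiplier. Signed weights map to the difference of two non-negative
    # conductance arrays (a common scheme); values are illustrative.
    import numpy as np

    def weights_to_conductances(W, g_min=1e-6, g_max=1e-4):
        """Map signed weights onto two crossbars with conductances in [g_min, g_max]."""
        scale = (g_max - g_min) / np.abs(W).max()
        G_pos = g_min + scale * np.maximum(W, 0.0)
        G_neg = g_min + scale * np.maximum(-W, 0.0)
        return G_pos, G_neg, scale

    def crossbar_mvm(G_pos, G_neg, v):
        """Differential column currents: I_j = sum_i V_i * (G+_ij - G-_ij)."""
        return v @ G_pos - v @ G_neg

    rng = np.random.default_rng(0)
    W = rng.standard_normal((128, 4 * 32))   # input -> stacked LSTM gate weights
    x = rng.standard_normal(128)             # input voltages applied to the rows
    G_pos, G_neg, scale = weights_to_conductances(W)
    i_out = crossbar_mvm(G_pos, G_neg, x) / scale   # rescale currents back to W's units
    print(np.allclose(i_out, x @ W))         # ideal crossbar reproduces x @ W
    ```

    A real voltage-based circuit would additionally need sensing and nonlinearity stages for the gate activations, which this idealised model omits.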

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated into the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.
    Comment: Accepted for publication in the ACM Computing Surveys (CSUR) journal, 2018
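
    One of the key characteristics the survey compares is each toolflow's design space exploration (DSE) method. The Python sketch below shows the generic shape of such a DSE loop, enumerating candidate processing-element arrays, discarding designs that exceed a resource budget, and ranking the rest with a simple roofline estimate; all numbers and the cost model are illustrative assumptions, not taken from any surveyed toolflow:

    ```python
    # Minimal sketch of a roofline-style DSE loop for a CNN accelerator.
    # Device budgets, the DSPs-per-MAC figure, and the workload's
    # arithmetic intensity are assumed values for illustration only.
    from itertools import product

    DSP_BUDGET   = 2048    # available DSP slices (assumed device)
    BW_GBPS      = 12.8    # off-chip memory bandwidth, GB/s
    FREQ_GHZ     = 0.2     # clock frequency
    OPS_PER_BYTE = 8.0     # arithmetic intensity of the CNN workload

    best = None
    for pe_rows, pe_cols in product([4, 8, 16, 32], repeat=2):
        dsps = pe_rows * pe_cols * 5          # assumed DSPs per MAC unit
        if dsps > DSP_BUDGET:
            continue                          # design does not fit the device
        peak = 2 * pe_rows * pe_cols * FREQ_GHZ         # GOp/s, compute bound
        attainable = min(peak, BW_GBPS * OPS_PER_BYTE)  # roofline: memory cap
        if best is None or attainable > best[0]:
            best = (attainable, pe_rows, pe_cols, dsps)

    gops, rows, cols, dsps = best
    print(f"best config: {rows}x{cols} PEs, {dsps} DSPs, {gops:.1f} GOp/s")
    ```

    Actual toolflows replace this toy cost model with per-layer performance and resource models, but the fit-check-then-rank structure is broadly the same.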