4,860 research outputs found

    E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks

    Full text link
    Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful RNN implementation, as they can learn long term dependencies to achieve high accuracy. Unfortunately, the recurrent nature of LSTM networks significantly constrains the amount of parallelism and, hence, multicore CPUs and many-core GPUs exhibit poor efficiency for RNN inference. In this paper, we present E-PUR, an energy-efficient processing unit tailored to the requirements of LSTM computation. The main goal of E-PUR is to support large recurrent neural networks for low-power mobile devices. E-PUR provides an efficient hardware implementation of LSTM networks that is flexible to support diverse applications. One of its main novelties is a technique that we call Maximizing Weight Locality (MWL), which improves the temporal locality of the memory accesses for fetching the synaptic weights, reducing the memory requirements by a large extent. Our experimental results show that E-PUR achieves real-time performance for different LSTM networks, while reducing energy consumption by orders of magnitude with respect to general-purpose processors and GPUs, and it requires a very small chip area. Compared to a modern mobile SoC, an NVIDIA Tegra X1, E-PUR provides an average energy reduction of 92x

    Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

    Full text link
    Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of BitFusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss and Stripes. In the same area, frequency, and process technology, BitFusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, BitFusion provides 2.6x speedup and 3.9x energy reduction at 45 nm node when BitFusion area and frequency are set to those of Stripes. Scaling to GPU technology node of 16 nm, BitFusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while BitFusion merely consumes 895 milliwatts of power

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    A Survey of Deep Learning for Data Caching in Edge Network

    Full text link
    The concept of edge caching provision in emerging 5G and beyond mobile networks is a promising method to deal both with the traffic congestion problem in the core network as well as reducing latency to access popular content. In that respect end user demand for popular content can be satisfied by proactively caching it at the network edge, i.e, at close proximity to the users. In addition to model based caching schemes learning-based edge caching optimizations has recently attracted significant attention and the aim hereafter is to capture these recent advances for both model based and data driven techniques in the area of proactive caching. This paper summarizes the utilization of deep learning for data caching in edge network. We first outline the typical research topics in content caching and formulate a taxonomy based on network hierarchical structure. Then, a number of key types of deep learning algorithms are presented, ranging from supervised learning to unsupervised learning as well as reinforcement learning. Furthermore, a comparison of state-of-the-art literature is provided from the aspects of caching topics and deep learning methods. Finally, we discuss research challenges and future directions of applying deep learning for cachin
    • …
    corecore