
    Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

    A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models differ from the traditional large ones in that there is much more variation in their layer shapes and sizes, and they often require specialized hardware to exploit sparsity for performance improvement. Thus, many DNN accelerators designed for large DNNs do not perform well on these models. In this work, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and is therefore able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet. We also present an analysis methodology called Eyexam that provides a systematic way of understanding the performance limits for DNN processors as a function of specific characteristics of the DNN model and accelerator design; it applies these characteristics as sequential steps to increasingly tighten the bound on the performance limits.
    Comment: accepted for publication in IEEE Journal on Emerging and Selected Topics in Circuits and Systems. This extended version on arXiv also includes Eyexam in the appendix.
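
    The compressed-domain sparse processing mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the actual Eyeriss v2 PE array or hierarchical mesh: weights are held in a CSC-style compressed format, and multiply-accumulates are issued only for nonzero weight/activation pairs, which is where both the speed and the energy gains on sparse models come from.

        import numpy as np

        def to_csc(dense):
            """Compress a dense weight matrix into CSC-style arrays."""
            values, row_idx, col_ptr = [], [], [0]
            for col in dense.T:
                nz = np.flatnonzero(col)
                values.extend(col[nz])
                row_idx.extend(nz)
                col_ptr.append(len(values))
            return np.array(values), np.array(row_idx), np.array(col_ptr)

        def sparse_matvec(n_rows, values, row_idx, col_ptr, x):
            """y = W @ x over compressed weights, skipping zero activations."""
            y = np.zeros(n_rows)
            for j, xj in enumerate(x):
                if xj == 0.0:                 # zero activation: skip the whole column
                    continue
                for k in range(col_ptr[j], col_ptr[j + 1]):
                    y[row_idx[k]] += values[k] * xj   # MACs on nonzero pairs only
            return y

        W = np.array([[0., 2., 0.], [1., 0., 0.], [0., 0., 3.]])
        x = np.array([0.5, 0.0, 1.0])
        vals, rows, ptrs = to_csc(W)
        assert np.allclose(sparse_matvec(3, vals, rows, ptrs, x), W @ x)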

    Energy-Efficient Multi-View Video Transmission with View Synthesis-Enabled Multicast

    Multi-view videos (MVVs) provide an immersive viewing experience, at the cost of a heavy load on wireless networks. Beyond further improving the viewing experience, view synthesis can create multicast opportunities for efficient transmission of MVVs in multiuser wireless networks, which has not been recognized in the existing literature. In this paper, we exploit view synthesis-enabled multicast opportunities for energy-efficient MVV transmission in a multiuser wireless network. Specifically, we first establish a mathematical model to characterize the impact of view synthesis on multicast opportunities and energy consumption. Then, we consider the optimization of view selection, transmission time, and power allocation to minimize the weighted sum energy consumption for view transmission and synthesis, which is a challenging mixed discrete-continuous optimization problem. We propose an algorithm that obtains an optimal solution with reduced computational complexity by exploiting optimality properties. To further reduce computational complexity, we also propose two low-complexity algorithms that obtain suboptimal solutions, based on continuous relaxation and Difference of Convex (DC) programming, respectively. Finally, numerical results demonstrate the advantage of the proposed solutions.
    Comment: 22 pages, 6 figures, to be published in GLOBECOM 201
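
    To make the multicast opportunity concrete, here is a toy Python sketch. The model is ours for illustration, not the paper's: N_VIEWS, REQUESTS, E_TX, and E_SYN are assumed numbers, a view is taken to be synthesizable from its two neighboring views, and brute-force enumeration stands in for the paper's relaxation and DC-programming algorithms.

        from itertools import combinations

        N_VIEWS = 5                   # camera views indexed 0..4
        REQUESTS = [1, 2, 3]          # one requested view per user (assumed)
        E_TX = 1.0                    # assumed energy to multicast one view
        E_SYN = 0.3                   # assumed energy to synthesize one view

        def servable(view, sent):
            """Served if the view is sent, or both of its neighbors are sent."""
            return view in sent or (view - 1 in sent and view + 1 in sent)

        def cost(sent):
            """Weighted sum energy for view transmission plus synthesis."""
            if not all(servable(v, sent) for v in REQUESTS):
                return float("inf")
            synthesized = sum(1 for v in REQUESTS if v not in sent)
            return E_TX * len(sent) + E_SYN * synthesized

        best = min((frozenset(s) for r in range(N_VIEWS + 1)
                    for s in combinations(range(N_VIEWS), r)), key=cost)
        print(sorted(best), cost(best))   # sends {1, 3}; view 2 is synthesized

    Even in this tiny instance, multicasting views 1 and 3 and synthesizing view 2 (total cost 2.3) beats transmitting all three requested views (cost 3.0), which is the effect the paper's joint optimization exploits at scale.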

    Video data compression using artificial neural network differential vector quantization

    An artificial neural network vector quantizer is developed for use in data compression applications such as digital video. Differential Vector Quantization is used to preserve edge features, and a new adaptive algorithm, known as Frequency-Sensitive Competitive Learning, is used to develop the vector quantizer codebook. To achieve real-time performance, a custom Very Large Scale Integration Application-Specific Integrated Circuit (VLSI ASIC) is being developed to realize the associative memory functions needed in the vector quantization algorithm. By using vector quantization, the need for Huffman coding can be eliminated, resulting in superior robustness to channel bit errors compared with methods that use variable-length codes.
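
    A minimal software sketch of the two ideas above, under assumed parameters (the codebook size, learning rate, epoch count, and mean-based predictor are illustrative stand-ins, not the paper's ASIC design): Frequency-Sensitive Competitive Learning scales each codeword's distortion by its win count so under-used codewords keep competing, and differential VQ quantizes prediction residuals rather than raw blocks.

        import numpy as np

        rng = np.random.default_rng(0)

        def fscl_codebook(vectors, n_codes=16, lr=0.05, epochs=10):
            """Train a VQ codebook with Frequency-Sensitive Competitive Learning."""
            codebook = vectors[rng.choice(len(vectors), n_codes, replace=False)].copy()
            wins = np.ones(n_codes)                     # per-codeword usage counts
            for _ in range(epochs):
                for x in vectors:
                    d = np.sum((codebook - x) ** 2, axis=1)
                    w = np.argmin(wins * d)             # frequency-sensitive winner
                    codebook[w] += lr * (x - codebook[w])  # move winner toward input
                    wins[w] += 1
            return codebook

        # Differential VQ: quantize the residual between each block and a
        # predictor (a crude global-mean stand-in here; a real DVQ coder
        # predicts from previously decoded neighboring blocks).
        blocks = rng.normal(size=(500, 16))             # stand-in 4x4 image blocks
        prediction = blocks.mean(axis=0)
        residuals = blocks - prediction
        cb = fscl_codebook(residuals)
        idx = np.argmin(((residuals[:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
        reconstructed = prediction + cb[idx]            # fixed-length indices, so
                                                        # no Huffman coding is needed

    Because every block is coded as a fixed-length codebook index, a single channel bit error corrupts only one index, unlike variable-length codes where it can desynchronize the whole bitstream.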