5,982 research outputs found

    Analysis of a benchmark suite to evaluate mixed numeric and symbolic processing

    Get PDF
    The suite of programs that formed the benchmark for a proposed advanced computer is described and analyzed. The features of the processor and its operating system that are tested by the benchmark are discussed. The computer codes and the supporting data for the analysis are given as appendices

    On the efficient representation and execution of deep acoustic models

    Full text link
    In this paper we present a simple and computationally efficient quantization scheme that enables us to reduce the resolution of the parameters of a neural network from 32-bit floating point values to 8-bit integer values. The proposed quantization scheme leads to significant memory savings and enables the use of optimized hardware instructions for integer arithmetic, thus significantly reducing the cost of inference. Finally, we propose a "quantization aware" training process that applies the proposed scheme during network training and find that it allows us to recover most of the loss in accuracy introduced by quantization. We validate the proposed techniques by applying them to a long short-term memory-based acoustic model on an open-ended large vocabulary speech recognition task.Comment: Accepted conference paper: "The Annual Conference of the International Speech Communication Association (Interspeech), 2016

    Wavemoth -- Fast spherical harmonic transforms by butterfly matrix compression

    Full text link
    We present Wavemoth, an experimental open source code for computing scalar spherical harmonic transforms (SHTs). Such transforms are ubiquitous in astronomical data analysis. Our code performs substantially better than existing publicly available codes due to improvements on two fronts. First, the computational core is made more efficient by using small amounts of precomputed data, as well as paying attention to CPU instruction pipelining and cache usage. Second, Wavemoth makes use of a fast and numerically stable algorithm based on compressing a set of linear operators in a precomputation step. The resulting SHT scales as O(L^2 (log L)^2) for the resolution range of practical interest, where L denotes the spherical harmonic truncation degree. For low and medium-range resolutions, Wavemoth tends to be twice as fast as libpsht, which is the current state of the art implementation for the HEALPix grid. At the resolution of the Planck experiment, L ~ 4000, Wavemoth is between three and six times faster than libpsht, depending on the computer architecture and the required precision. Due to the experimental nature of the project, only spherical harmonic synthesis is currently supported, although adding support or spherical harmonic analysis should be trivial.Comment: 13 pages, 6 figures, accepted by ApJ

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201
    corecore