Analysis of a benchmark suite to evaluate mixed numeric and symbolic processing
The suite of programs that formed the benchmark for a proposed advanced computer is described and analyzed. The features of the processor and its operating system that are tested by the benchmark are discussed. The computer codes and the supporting data for the analysis are given as appendices.
On the efficient representation and execution of deep acoustic models
In this paper we present a simple and computationally efficient quantization
scheme that enables us to reduce the resolution of the parameters of a neural
network from 32-bit floating point values to 8-bit integer values. The proposed
quantization scheme leads to significant memory savings and enables the use of
optimized hardware instructions for integer arithmetic, thus significantly
reducing the cost of inference. Finally, we propose a "quantization aware"
training process that applies the proposed scheme during network training and
find that it allows us to recover most of the loss in accuracy introduced by
quantization. We validate the proposed techniques by applying them to a long
short-term memory-based acoustic model on an open-ended large vocabulary speech
recognition task.
Comment: Accepted conference paper: "The Annual Conference of the
International Speech Communication Association (Interspeech), 2016"
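The abstract above does not spell out the exact quantization scheme, but the idea of mapping 32-bit floats to 8-bit integers can be sketched with a generic affine (scale-and-offset) quantizer; the function names and the choice of an asymmetric uint8 range here are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def quantize(w, num_bits=8):
    # Affine quantization: map the float range [min, max] onto
    # the integer range [0, 2^num_bits - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    q = np.clip(np.round((w - lo) / scale), qmin, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    # Recover approximate float values from the 8-bit codes.
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale, lo = quantize(w)
w_hat = dequantize(q, scale, lo)
# Rounding error is bounded by half a quantization step.
max_err = float(np.abs(w - w_hat).max())
```

Storing `q` instead of `w` cuts memory by 4x, and integer codes like these are what allow optimized int8 hardware instructions to be used at inference time.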
Wavemoth -- Fast spherical harmonic transforms by butterfly matrix compression
We present Wavemoth, an experimental open source code for computing scalar
spherical harmonic transforms (SHTs). Such transforms are ubiquitous in
astronomical data analysis. Our code performs substantially better than
existing publicly available codes due to improvements on two fronts. First, the
computational core is made more efficient by using small amounts of precomputed
data, as well as paying attention to CPU instruction pipelining and cache
usage. Second, Wavemoth makes use of a fast and numerically stable algorithm
based on compressing a set of linear operators in a precomputation step. The
resulting SHT scales as O(L^2 (log L)^2) for the resolution range of practical
interest, where L denotes the spherical harmonic truncation degree. For low and
medium-range resolutions, Wavemoth tends to be twice as fast as libpsht, which
is the current state of the art implementation for the HEALPix grid. At the
resolution of the Planck experiment, L ~ 4000, Wavemoth is between three and
six times faster than libpsht, depending on the computer architecture and the
required precision. Due to the experimental nature of the project, only
spherical harmonic synthesis is currently supported, although adding support for
spherical harmonic analysis should be trivial.
Comment: 13 pages, 6 figures, accepted by ApJ
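Wavemoth's speedup rests on compressing a set of linear operators in a precomputation step. A full butterfly factorization is beyond a short sketch, but the underlying principle, that the operator blocks are numerically low-rank and can be replaced by much smaller factors, can be illustrated with a truncated SVD; the kernel and tolerance below are assumptions for illustration, not Wavemoth's actual algorithm:

```python
import numpy as np

def compress_block(A, tol=1e-10):
    # Precomputation step: replace a dense block A (m x n) with
    # rank-k factors L (m x k) and R (k x n), keeping only singular
    # values above tol relative to the largest one.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return U[:, :k] * s[:k], Vt[:k, :]

# A numerically low-rank block, e.g. samples of a smooth kernel.
x = np.linspace(0.0, 1.0, 200)
A = np.exp(-np.subtract.outer(x, x) ** 2)

L, R = compress_block(A)
rank = L.shape[1]
# Applying L @ (R @ v) costs O((m + n) k) instead of O(m n).
rel_err = np.linalg.norm(A - L @ R) / np.linalg.norm(A)
```

When the compressed rank k is much smaller than the block size, both storage and the cost of applying the operator drop accordingly, which is how a precomputation step can buy a faster transform at run time.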
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201