Statistical Hardware Design With Multi-model Active Learning
With the rising complexity of numerous novel applications that serve our
modern society comes the strong need to design efficient computing platforms.
Designing efficient hardware is, however, a complex multi-objective problem
that deals with multiple parameters and their interactions. Given that there
are a large number of parameters and objectives involved in hardware design,
synthesizing all possible combinations is not a feasible method to find the
optimal solution. One promising approach to tackling this problem is statistical modeling of the desired hardware performance. Here, we propose a model-based
active learning approach to solve this problem. Our proposed method uses
Bayesian models to characterize various aspects of hardware performance. We
also use transfer learning and Gaussian regression bootstrapping techniques in
conjunction with active learning to create more accurate models. Our proposed
statistical modeling method provides hardware models that are sufficiently
accurate to perform design space exploration as well as performance prediction
simultaneously. We use our proposed method to perform design space exploration
and performance prediction for various hardware setups, such as
micro-architecture design and OpenCL kernels for FPGA targets. Our experiments
show that our proposed approach significantly reduces the number of samples required to create performance models while maintaining their predictive power. For instance, in our performance prediction setting, the proposed method needs 65% fewer samples to create the model, and in the design space exploration setting, it can find the best parameter settings by exploring fewer than 50 samples.
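The active-learning idea in this abstract (query only the design points the model is least certain about, using bootstrap resampling for uncertainty) can be illustrated with a toy sketch. Everything here is invented for illustration: the 1-D design space, the synthetic `latency` objective, and the polynomial ensemble standing in for the paper's Bayesian models and Gaussian regression bootstrapping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D design space: one tunable hardware parameter mapped to a
# synthetic "latency" objective. In the paper, evaluating a point would mean
# an expensive synthesis or simulation run.
candidates = np.linspace(0.0, 1.0, 200)

def latency(x):
    # Stand-in for an expensive performance measurement (invented function).
    return np.sin(6.0 * x) + 0.5 * x

def fit_bootstrap_ensemble(x, y, n_models=20, degree=3):
    """Fit one low-degree polynomial per bootstrap resample of the labelled set.

    The spread of the ensemble's predictions serves as an uncertainty
    estimate, loosely mimicking regression bootstrapping.
    """
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def predict(models, x):
    preds = np.stack([np.polyval(m, x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)  # mean and uncertainty

# Active-learning loop: always label the candidate the ensemble is least
# certain about, instead of sweeping the whole design space.
labelled_x = np.linspace(0.05, 0.95, 5)
labelled_y = latency(labelled_x)
for _ in range(10):
    models = fit_bootstrap_ensemble(labelled_x, labelled_y)
    _, std = predict(models, candidates)
    nxt = candidates[np.argmax(std)]
    labelled_x = np.append(labelled_x, nxt)
    labelled_y = np.append(labelled_y, latency(nxt))

mean, _ = predict(fit_bootstrap_ensemble(labelled_x, labelled_y), candidates)
best = candidates[np.argmin(mean)]  # predicted-best design point
```

After 15 labelled points the ensemble mean can already rank the 200 candidates, which is the sample-efficiency argument the abstract makes, at toy scale.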
Layerwise Noise Maximisation to Train Low-Energy Deep Neural Networks
Deep neural networks (DNNs) depend on the storage of a large number of
parameters, which consumes a significant portion of the energy used during inference. This paper considers the case where the energy usage of memory
elements can be reduced at the cost of reduced reliability. A training
algorithm is proposed to optimize the reliability of the storage separately for
each layer of the network, while incurring a negligible complexity overhead
compared to a conventional stochastic gradient descent training. For an
exponential energy-reliability model, the proposed training approach can decrease the memory energy consumption of a DNN with binary parameters by a factor of 3.3 at iso-accuracy, compared to a reliable implementation.
Comment: To be presented at AICAS 202
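The energy-reliability trade-off described above can be sketched numerically. This is a toy illustration, not the paper's training algorithm: the exponential model `p = exp(-e)`, the two-layer binary network, and the per-layer energy values are all assumptions made here to show how unreliable storage perturbs binary weights layer by layer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed exponential energy-reliability model: spending energy e per stored
# bit gives bit-flip probability p = exp(-e); more energy, fewer errors.
def flip_prob(energy_per_bit):
    return np.exp(-energy_per_bit)

def flip_bits(weights, p):
    """Flip each binary weight (+1/-1) independently with probability p."""
    flips = rng.random(weights.shape) < p
    return np.where(flips, -weights, weights)

# Toy two-layer binary network; sizes are illustrative.
w1 = rng.choice([-1.0, 1.0], size=(8, 16))
w2 = rng.choice([-1.0, 1.0], size=(16, 4))

def forward(x, a, b):
    return np.sign(np.sign(x @ a) @ b)

# The paper's key idea: layers tolerate different noise levels, so storage
# reliability (and hence energy) need not be uniform across layers. Here the
# first layer is given less energy than the second (values are arbitrary).
energies = {"layer1": 2.0, "layer2": 4.0}
noisy_w1 = flip_bits(w1, flip_prob(energies["layer1"]))
noisy_w2 = flip_bits(w2, flip_prob(energies["layer2"]))

total_energy = energies["layer1"] * w1.size + energies["layer2"] * w2.size

# Measure how often the noisy network agrees with the reliable one.
x = rng.choice([-1.0, 1.0], size=(32, 8))
agreement = np.mean(forward(x, w1, w2) == forward(x, noisy_w1, noisy_w2))
```

Sweeping `energies` while tracking `agreement` would trace the energy-accuracy trade-off curve that the layerwise optimization exploits.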
CNN2Gate: an implementation of convolutional neural networks inference on FPGAs with automated design space exploration
ABSTRACT: Convolutional Neural Networks (CNNs) have a major impact on our society because of the numerous services they provide. These services include, but are not limited to, image classification, video analysis, and speech recognition. Recently, the number of research efforts that use FPGAs to implement CNNs has been increasing rapidly, owing to the lower power consumption and easy reconfigurability these platforms offer. As research advances in topics such as architecture, synthesis, and optimization, new challenges arise in integrating suitable hardware solutions with high-level machine learning software libraries. This paper introduces an integrated framework (CNN2Gate) that supports compilation of a CNN model for an FPGA target. CNN2Gate can parse CNN models from several popular high-level machine learning libraries, such as Keras, PyTorch, and Caffe2. CNN2Gate extracts the computation flow of the layers, along with weights and biases, and applies a "given" fixed-point quantization. Furthermore, it writes this information in the proper format for the FPGA vendor's OpenCL synthesis tools, which are then used to build and run the project on the FPGA. CNN2Gate automatically performs design-space exploration and fits the design onto different FPGAs with limited logic resources. This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms.
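One concrete step the abstract mentions is applying a given fixed-point quantization to extracted weights before emitting them for synthesis. Below is a minimal sketch of signed fixed-point quantization; the Q-format parameters, the helper name, and the sample weights are all illustrative, not CNN2Gate's actual interface.

```python
import numpy as np

def quantize_fixed_point(w, int_bits=2, frac_bits=6):
    """Quantize to a signed fixed-point format with `int_bits` integer bits
    and `frac_bits` fractional bits: round to the nearest multiple of
    2**-frac_bits, then saturate to the representable range.
    """
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** int_bits)                 # most negative representable value
    hi = (2.0 ** int_bits) - 1.0 / scale    # most positive representable value
    return np.clip(np.round(w * scale) / scale, lo, hi)

# Hypothetical floating-point weights extracted from a trained model.
weights = np.array([-3.7, -0.40625, 0.1, 1.2345, 5.0])
q = quantize_fixed_point(weights)
# Values already on the grid pass through unchanged; out-of-range values
# saturate (5.0 -> 3.984375 for this format).
```

Each quantized value is then exactly representable in 1 + 2 + 6 = 9 bits, which is what lets the weights be packed into on-chip FPGA memory.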