Bayesian Optimized 1-Bit CNNs
Deep convolutional neural networks (DCNNs) have dominated recent
developments in computer vision, producing a series of record-breaking
models. However, deploying powerful DCNNs in resource-limited
environments, such as embedded devices and smartphones, remains a great
challenge. Researchers have recognized 1-bit CNNs as one feasible
solution, but their performance still lags well behind that of
full-precision DCNNs. In this paper, we propose a novel approach, called
Bayesian optimized 1-bit CNNs (denoted as BONNs), that takes advantage
of Bayesian learning, a well-established strategy for hard problems, to
significantly improve the performance of extreme 1-bit CNNs. We
incorporate the prior distributions of full-precision kernels and
features into the Bayesian framework to construct 1-bit CNNs in an
end-to-end manner, which has not been considered in any previous related
method. The Bayesian losses are derived with theoretical support and
optimize the network simultaneously in both continuous and discrete
spaces, aggregating different losses jointly to improve the model
capacity. Extensive experiments on the ImageNet and CIFAR datasets show
that BONNs achieve the best classification performance among
state-of-the-art 1-bit CNNs.
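For illustration, the 1-bit mechanism that BONNs build on can be sketched as sign-based kernel binarization with per-filter scaling (in the spirit of XNOR-Net) and a straight-through gradient estimator. This is a generic sketch of 1-bit convolution, not the paper's Bayesian formulation; the class and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients through only where |w| <= 1 (hard-tanh clipping).
        return grad_out * (w.abs() <= 1).float()

class BinaryConv2d(nn.Conv2d):
    """Convolution whose kernels are binarized to {-1, +1} on the fly and
    rescaled by the per-filter mean absolute value of the full-precision
    weights, which are kept for the gradient updates."""
    def forward(self, x):
        alpha = self.weight.abs().mean(dim=(1, 2, 3), keepdim=True)
        w_bin = BinarizeSTE.apply(self.weight) * alpha
        return F.conv2d(x, w_bin, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```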
FPGA-based Acceleration for Bayesian Convolutional Neural Networks
Neural networks (NNs) have demonstrated their potential in a variety of domains ranging from computer vision to natural language processing. Among various NNs, two-dimensional (2D) and three-dimensional (3D) convolutional neural networks (CNNs) have been widely adopted for a broad spectrum of applications such as image classification and video recognition, due to their excellent capabilities in extracting 2D and 3D features. However, standard 2D and 3D CNNs cannot capture their model uncertainty, which is crucial for many safety-critical applications including healthcare and autonomous driving. In contrast, Bayesian convolutional neural networks (BayesCNNs), a variant of CNNs, have demonstrated the ability to express uncertainty in their predictions with a mathematical grounding. Nevertheless, BayesCNNs have not been widely used in industrial practice because of their compute requirements, which stem from sampling and then performing multiple forward passes through the whole network. These requirements significantly increase computation and memory consumption compared to standard CNNs. This paper proposes a novel FPGA-based hardware architecture to accelerate both 2D and 3D BayesCNNs based on Monte Carlo Dropout. Compared with other state-of-the-art accelerators for BayesCNNs, the proposed design achieves up to 4 times higher energy efficiency and 9 times better compute efficiency. An automatic framework capable of supporting partial Bayesian inference is proposed to explore the trade-off between algorithm and hardware performance. Extensive experiments demonstrate that our framework can effectively find the optimal implementations in the design space.
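The Monte Carlo Dropout inference that the accelerator targets amounts to keeping dropout layers stochastic at test time and aggregating several forward passes. The sample count and function names below are illustrative, and this is a minimal software sketch rather than the paper's hardware design.

```python
import torch
import torch.nn as nn

def enable_mc_dropout(model):
    """Keep only the dropout layers in sampling mode; batch norm and the
    rest of the network stay in deterministic eval mode."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            m.train()

def mc_dropout_predict(model, x, num_samples=20):
    """Average num_samples stochastic forward passes; the variance across
    passes serves as a per-class uncertainty estimate."""
    enable_mc_dropout(model)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(num_samples)])
    return probs.mean(dim=0), probs.var(dim=0)
```

The repeated forward passes are exactly the compute overhead the abstract describes: a 20-sample estimate costs roughly 20 standard inferences, which is what motivates a dedicated accelerator and partial Bayesian inference.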
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to
timing faults due to an excessive increase in circuit latency. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This setup allows us to study the effects of our
undervolting technique under both software and hardware variability. We
achieve more than a 3X power-efficiency (GOPs/W) gain via undervolting.
Of this gain, 2.6X results from eliminating the voltage guardband
region, i.e., the safe voltage region below the nominal level that the
FPGA vendor sets to ensure correct functionality under worst-case
environmental and circuit conditions. A further 43% of the
power-efficiency gain is due to undervolting below the guardband, which
comes at the cost of accuracy loss in the CNN accelerator. We evaluate
an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the additional power-efficiency
gain from 43% to 25%.

Comment: To appear at the DSN 2020 conference.
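Reading the reported factors as composing multiplicatively (our interpretation; the abstract does not state an explicit formula), the numbers fit together as follows:

```python
# Composing the reported undervolting gains, assuming the percentage
# gains below the guardband multiply on top of the guardband factor
# (our reading of the abstract, not an explicitly stated formula).
guardband_gain = 2.6         # eliminating the vendor voltage guardband
below_guardband_gain = 1.43  # further undervolting, with accuracy loss
freq_scaled_gain = 1.25      # with frequency underscaling, no accuracy loss

print(f"aggressive: {guardband_gain * below_guardband_gain:.2f}X")  # ~3.72X
print(f"safe:       {guardband_gain * freq_scaled_gain:.2f}X")      # 3.25X
```

Both figures are consistent with the reported overall gain of more than 3X.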
Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling
Deep neural networks (DNNs) are powerful machine learning models and have
succeeded in various artificial intelligence tasks. Although various
architectures and modules for DNNs have been proposed, selecting and
designing the appropriate network structure for a target problem is a
challenging task. In this paper, we propose a method to simultaneously optimize
the network structure and weight parameters during neural network training. We
consider a probability distribution that generates network structures, and
optimize the parameters of the distribution instead of directly optimizing the
network structure. The proposed method can apply to the various network
structure optimization problems under the same framework. We apply the proposed
method to several structure optimization problems such as selection of layers,
selection of unit types, and selection of connections using the MNIST,
CIFAR-10, and CIFAR-100 datasets. The experimental results show that the
proposed method can find appropriate and competitive network structures.

Comment: To appear in the Thirty-Second AAAI Conference on Artificial
Intelligence (AAAI-18), 9 pages.
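The core idea, optimizing a distribution over binary structure choices rather than the choices themselves, can be sketched with Bernoulli parameters updated by a REINFORCE-style estimator. This is a generic sketch under our assumptions, not the paper's exact update rule; loss_fn, the population size, and the learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def optimize_structure(loss_fn, num_choices, steps=200, lr=0.1, pop=8):
    """Optimize Bernoulli parameters theta over binary structure choices
    (e.g., keep/drop a layer or connection) instead of the structure
    itself, using a REINFORCE-style gradient estimator."""
    theta = np.full(num_choices, 0.5)
    for _ in range(steps):
        masks = (rng.random((pop, num_choices)) < theta).astype(float)
        losses = np.array([loss_fn(m) for m in masks])
        baseline = losses.mean()  # simple variance-reduction baseline
        # Gradient of log Bernoulli(m | theta): (m - theta) / (theta (1 - theta))
        grad_logp = (masks - theta) / (theta * (1.0 - theta))
        grad = ((losses - baseline)[:, None] * grad_logp).mean(axis=0)
        theta = np.clip(theta - lr * grad, 0.01, 0.99)  # descend on the loss
    return theta

# Toy usage: prefer keeping the first 5 of 10 hypothetical connections.
theta = optimize_structure(lambda m: -m[:5].sum() + m[5:].sum(), 10)
```

In practice, loss_fn would train a candidate network under the sampled structure mask and return its validation loss.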
Non-local Attention Optimized Deep Image Compression
This paper proposes a novel Non-Local Attention Optimized Deep Image
Compression (NLAIC) framework, which is built on top of the popular variational
auto-encoder (VAE) structure. Our NLAIC framework embeds non-local
operations in the encoders and decoders for both the image and the
latent feature probability information (known as the hyperprior) to
capture both local and global correlations. It also applies an attention
mechanism to generate masks that weigh the features for the image and
hyperprior, implicitly adapting the bit allocation for different
features based on their importance. Furthermore, both the hyperpriors
and the spatial-channel neighbors of the latent features are used to
improve entropy coding. The proposed model outperforms existing methods
on the Kodak dataset, including learned (e.g., Balle2019, Balle2018) and
conventional (e.g., BPG, JPEG2000, JPEG) image compression methods,
under both the PSNR and MS-SSIM distortion metrics.
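The non-local operation referenced above follows the general pattern of non-local neural networks; a minimal embedded-Gaussian variant is sketched below. Channel counts and names are illustrative, and this is not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block: every spatial position attends
    to all others, capturing the global correlations that a plain
    convolution with a limited receptive field cannot."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, 1)  # query projection
        self.phi = nn.Conv2d(channels, inter, 1)    # key projection
        self.g = nn.Conv2d(channels, inter, 1)      # value projection
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, inter)
        k = self.phi(x).flatten(2)                    # (b, inter, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, inter)
        attn = torch.softmax(q @ k, dim=-1)           # (b, hw, hw) affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                        # residual connection
```

Note that the hw-by-hw affinity matrix is quadratic in spatial size, which is why such blocks are typically placed at reduced resolutions.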