
    Accelerating fully spectral CNNs with adaptive activation functions on FPGA

    Computing convolutional layers in the frequency domain can greatly reduce the computational overhead of training and inference for convolutional neural networks (CNNs). However, existing designs based on this idea require repeated spatial- and frequency-domain transforms because nonlinear activation functions are not available in the frequency domain, which makes the benefit less attractive for low-latency inference. This paper presents a fully spectral CNN approach by proposing a novel adaptive Rectified Linear Unit (ReLU) activation in the spectral domain. The proposed design maintains the non-linearity of the network while taking hardware efficiency into account at the algorithm level. The spectral model size is further optimized by merging and fusing layers. A customized hardware architecture is then proposed to implement the designed spectral network on an FPGA device, with DSP optimizations for 8-bit fixed-point multipliers. Our hardware accelerator is implemented on an Intel Arria 10 device and applied to the MNIST, SVHN, AT&T and CIFAR-10 datasets. Experimental results show a speed improvement of 6×–10× over state-of-the-art spatial designs and 4×–5.7× over FFT-based designs, while achieving similar accuracy across the benchmark datasets.
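    The core idea behind spectral CNNs is that spatial convolution becomes element-wise multiplication of spectra, while a conventional ReLU is only defined point-wise in the spatial domain and therefore forces transforms back and forth. A minimal NumPy sketch of that baseline pipeline (illustrative only; the paper's adaptive spectral ReLU and 8-bit hardware mapping are not reproduced here):

```python
import numpy as np

def fft_conv2d(x, k):
    """Circular 2-D convolution via the convolution theorem:
    multiplication of spectra replaces sliding the kernel spatially."""
    H, W = x.shape
    X = np.fft.fft2(x)
    K = np.fft.fft2(k, s=(H, W))         # zero-pad the kernel to the image size
    return np.real(np.fft.ifft2(X * K))  # inverse transform back to space

x = np.random.randn(32, 32)
k = np.random.randn(3, 3)

# Conventional spectral pipelines apply ReLU point-wise in the spatial domain,
# so every layer needs an inverse FFT before the activation and a forward FFT
# after it -- the repeated transforms that a fully spectral activation avoids.
y = np.maximum(fft_conv2d(x, k), 0.0)
```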

    A Comment on the Implementation of the Ziggurat Method

    We show that the short period of the uniform random number generator in the published implementation of Marsaglia and Tsang's Ziggurat method for generating random deviates can lead to poor distributions. Changing the uniform random number generator used in its implementation fixes this issue.
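    The practical point, that the quality of the generated deviates is bounded by the uniform source feeding the sampler, can be sketched by driving the same normal sampler from two different uniform generators. The sketch below uses a toy Box-Muller sampler rather than the Ziggurat tables, and the xorshift parameters are illustrative stand-ins for the short-period generator discussed in the comment:

```python
import numpy as np

def xorshift32(seed=2463534242):
    """Marsaglia-style 32-bit xorshift (period 2**32 - 1): a stand-in for a
    short-period uniform generator. Yields uniforms in [0, 1)."""
    state = seed & 0xFFFFFFFF
    while True:
        state ^= (state << 13) & 0xFFFFFFFF
        state ^= state >> 17
        state ^= (state << 5) & 0xFFFFFFFF
        yield state / 2**32

def box_muller(uniforms, n):
    """Standard normals from a user-supplied stream of uniforms; the Ziggurat
    proper uses table lookups, but depends on its uniform source the same way."""
    out = np.empty(n)
    for i in range(0, n, 2):
        u1, u2 = max(next(uniforms), 1e-12), next(uniforms)
        r = np.sqrt(-2.0 * np.log(u1))
        out[i] = r * np.cos(2 * np.pi * u2)
        if i + 1 < n:
            out[i + 1] = r * np.sin(2 * np.pi * u2)
    return out

short_period = box_muller(xorshift32(), 100_000)
rng = np.random.default_rng(0)                      # PCG64, much longer period
long_period = box_muller(iter(rng.random(100_000)), 100_000)
```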

    High-performance FPGA-based accelerator for Bayesian neural networks

    Neural networks (NNs) have demonstrated their potential in a wide range of applications such as image recognition, decision making and recommendation systems. However, standard NNs are unable to capture their model uncertainty, which is crucial for many safety-critical applications including healthcare and autonomous vehicles. In comparison, Bayesian neural networks (BNNs) are able to express uncertainty in their predictions via a mathematical grounding. Nevertheless, BNNs have not been widely used in industrial practice, mainly because of their expensive computational cost and limited hardware performance. This work proposes a novel FPGA-based hardware architecture to accelerate BNNs inferred through Monte Carlo Dropout. Compared with other state-of-the-art BNN accelerators, the proposed accelerator can achieve up to 4 times higher energy efficiency and 9 times better compute efficiency. To support partial Bayesian inference, an automatic framework is also proposed that explores the trade-off between hardware and algorithmic performance. Extensive experiments demonstrate that the proposed framework can effectively find the optimal points in the design space.
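    Monte Carlo Dropout inference, which the accelerator targets, amounts to keeping dropout active at test time and averaging several stochastic forward passes. A minimal PyTorch sketch of that procedure (the network and sample count below are placeholders, not the paper's configuration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                  # placeholder network, not the paper's
    nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

def mc_dropout_predict(model, x, num_samples=20):
    """Run several stochastic forward passes with dropout left enabled;
    the mean is the prediction, the variance estimates model uncertainty."""
    model.eval()
    for m in model.modules():           # re-enable only the dropout layers
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        samples = torch.stack([model(x).softmax(dim=-1)
                               for _ in range(num_samples)])
    return samples.mean(dim=0), samples.var(dim=0)

x = torch.randn(1, 784)
mean, var = mc_dropout_predict(model, x)
```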

    High-performance acceleration of 2-D and 3-D CNNs on FPGAs using static block floating point

    Over the past few years, 2-D convolutional neural networks (CNNs) have demonstrated great success in a wide range of 2-D computer vision applications, such as image classification and object detection. At the same time, 3-D CNNs, as a variant of 2-D CNNs, have shown an excellent ability to analyze 3-D data, such as video and geometric data. However, the heavy algorithmic complexity of 2-D and 3-D CNNs imposes a substantial overhead on the speed of these networks, which limits their deployment in real-life applications. Although various domain-specific accelerators have been proposed to address this challenge, most of them focus only on accelerating 2-D CNNs, without considering their computational efficiency on 3-D CNNs. In this article, we propose a unified hardware architecture to accelerate both 2-D and 3-D CNNs with high hardware efficiency. Our experiments demonstrate that the proposed accelerator can achieve up to 92.4% and 85.2% multiply-accumulate efficiency on 2-D and 3-D CNNs, respectively. To improve hardware performance, we propose a hardware-friendly quantization approach called static block floating point (BFP), which eliminates the frequent representation conversions required in traditional dynamic BFP arithmetic. Compared with integer linear quantization using a zero-point, static BFP quantization decreases the logic resource consumption of the convolutional kernel design by nearly 50% on a field-programmable gate array (FPGA). Without time-consuming retraining, the proposed static BFP quantization is able to quantize the precision to an 8-bit mantissa with negligible accuracy loss. As different CNNs on our reconfigurable system require different hardware and software parameters to achieve optimal hardware performance and accuracy, we also propose an automatic tool for parameter optimization. Based on our hardware design and optimization, we demonstrate that the proposed accelerator can achieve 3.8-5.6 times higher energy efficiency than a graphics processing unit (GPU) implementation. Compared with state-of-the-art FPGA-based accelerators, our design achieves higher generality and up to 1.4-2.2 times higher resource efficiency on both 2-D and 3-D CNNs.
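    Block floating point shares a single exponent across a block of values, so each value only stores a short mantissa. A rough NumPy sketch of the quantization step (block size and mantissa width are illustrative; the paper's static variant additionally fixes the shared exponents offline instead of recomputing them at run time):

```python
import numpy as np

def bfp_quantize(x, mantissa_bits=8, block_size=16):
    """Quantize a 1-D array block by block with a shared per-block exponent
    (assumes len(x) is a multiple of block_size)."""
    blocks = x.reshape(-1, block_size)
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True) + 1e-30
    shared_exp = np.floor(np.log2(max_abs)) + 1          # one exponent per block
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))    # mantissa step size
    mantissa = np.clip(np.round(blocks / scale),
                       -(2 ** (mantissa_bits - 1)),
                       2 ** (mantissa_bits - 1) - 1)
    return (mantissa * scale).reshape(-1)                # de-quantized values

w = np.random.randn(1024).astype(np.float32)
w_q = bfp_quantize(w)
print("max abs quantization error:", np.max(np.abs(w - w_q)))
```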

    Sampling Distributions of Random Electromagnetic Fields in Mesoscopic or Dynamical Systems

    We derive the sampling probability density function (pdf) of an ideal localized random electromagnetic field, its amplitude and intensity in an electromagnetic environment that is quasi-statically time-varying and statistically homogeneous, or static and statistically inhomogeneous. The results allow for the estimation of field statistics and confidence intervals when a single spatial or temporal stochastic process produces randomization of the field. Results for both coherent and incoherent detection techniques are derived, for Cartesian, planar and full-vectorial fields. We show that the functional form of the sampling pdf depends on whether the random variable is dimensioned (e.g., the sampled electric field proper) or is expressed in dimensionless standardized or normalized form (e.g., the sampled electric field divided by its sampled standard deviation). For dimensioned quantities, the electric field, its amplitude and intensity exhibit different types of Bessel $K$ sampling pdfs, which differ significantly from the asymptotic Gaussian normal and $\chi^{(2)}_{2p}$ ensemble pdfs when $\nu$ is relatively small. By contrast, for the corresponding standardized quantities, Student $t$, Fisher-Snedecor $F$ and root-$F$ sampling pdfs are obtained that exhibit heavier tails than comparable Bessel $K$ pdfs. Statistical uncertainties obtained from classical small-sample theory for dimensionless quantities are shown to be overestimated compared to dimensioned quantities. Differences in the sampling pdfs arising from de-normalization versus de-standardization are obtained. Comment: 12 pages, 15 figures; accepted for publication in Phys. Rev. E; minor typos corrected.
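    The contrast the abstract draws between dimensioned and standardized quantities parallels the classical small-sample result for Gaussian ensembles, where standardizing by the sampled rather than the ensemble standard deviation yields the heavier-tailed Student $t$ law. As a reminder of that textbook baseline (not the paper's Bessel $K$ derivation):

```latex
% Classical small-sample reminder: for i.i.d. samples X_1,...,X_\nu ~ N(\mu,\sigma^2),
% normalizing the sample mean by the ensemble deviation \sigma stays Gaussian,
% while standardizing by the sampled deviation s gives the heavier-tailed Student t.
\[
  Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{\nu}} \sim \mathcal{N}(0,1),
  \qquad
  T = \frac{\bar{X}-\mu}{s/\sqrt{\nu}} \sim t_{\nu-1},
  \qquad
  s^2 = \frac{1}{\nu-1}\sum_{i=1}^{\nu}\bigl(X_i-\bar{X}\bigr)^2 .
\]
```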

    Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA

    Algorithms based on Convolutional Neural Networks (CNNs) have been successful in solving image recognition problems, showing very large accuracy improvements. In recent years, deconvolution layers have been widely used as key components in state-of-the-art CNNs trained end-to-end for tasks such as image segmentation and super-resolution. However, deconvolution algorithms are computationally intensive, which limits their applicability to real-time applications. In particular, there has been little research on efficient implementations of deconvolution algorithms on FPGA platforms, which have been widely used by practitioners and researchers to accelerate CNN algorithms thanks to their high performance and power efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing between the computation modules, together with other optimization techniques, is proposed for the FPGA-based CNN accelerator. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space, in order to achieve the optimal processing speed of the system and improve power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate a low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on a Xilinx Zynq ZC706 board; the deconvolution accelerator achieves a performance of 90.1 GOPS at a 200 MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization, which significantly outperforms previous designs on FPGAs. A real-time scene segmentation application on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board; the system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization, and supports up to 17 frames per second for 512×512 image inputs with a power consumption of only 9.6 W.
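    The deconvolution (transposed convolution) that dominates this workload can be viewed as each input pixel scattering a weighted copy of the kernel into a stride-spaced output window. A plain NumPy reference of that operation (illustrative only, not the proposed hardware architecture):

```python
import numpy as np

def transposed_conv2d(x, k, stride=2):
    """Reference 2-D transposed convolution (a.k.a. deconvolution):
    every input pixel scatters a weighted copy of the kernel into the output."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((stride * (H - 1) + kh, stride * (W - 1) + kw))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * k
    return out

x = np.random.randn(4, 4)
k = np.random.randn(3, 3)
y = transposed_conv2d(x, k)      # 4x4 input -> 9x9 output with stride 2
```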

    Oscillator model for dissipative QED in an inhomogeneous dielectric

    The Ullersma model for the damped harmonic oscillator is coupled to the quantised electromagnetic field. All material parameters and interaction strengths are allowed to depend on position. The ensuing Hamiltonian is expressed in terms of canonical fields and diagonalised by performing a normal-mode expansion. The commutation relations of the diagonalising operators are in agreement with the canonical commutation relations. For the proof we replace all sums of normal modes by complex integrals with the help of the residue theorem. The same technique helps us to explicitly calculate the quantum evolution of all canonical and electromagnetic fields. We identify the dielectric constant and the Green function of the wave equation for the electric field. Both functions are meromorphic in the complex frequency plane. The solution of the extended Ullersma model is in keeping with well-known phenomenological rules for setting up quantum electrodynamics in an absorptive and spatially inhomogeneous dielectric. To establish this fundamental justification, we subject the reservoir of independent harmonic oscillators to a continuum limit. The resonant frequencies of the reservoir are smeared out over the real axis. Consequently, the poles of both the dielectric constant and the Green function unite to form a branch cut. Performing an analytic continuation beyond this branch cut, we find that the long-time behaviour of the quantised electric field is completely determined by the sources of the reservoir. Through a Riemann-Lebesgue argument we demonstrate that the field itself tends to zero, whereas its quantum fluctuations stay alive. We argue that the last feature may have important consequences for applications of entanglement and related processes in quantum devices. Comment: 24 pages, 1 figure.
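    The step of replacing sums over normal modes by complex integrals is the standard residue-theorem device. Schematically, for a function $g(\omega)$ whose simple zeros are the mode frequencies $\omega_n$ (a generic identity under the stated analyticity assumptions, not the paper's specific mode functions):

```latex
% If g(\omega) has simple zeros at the normal-mode frequencies \omega_n inside
% the contour C, f is analytic there, and g has no other zeros or poles inside C:
\[
  \sum_n \frac{f(\omega_n)}{g'(\omega_n)}
  \;=\;
  \frac{1}{2\pi i}\oint_C \frac{f(\omega)}{g(\omega)}\,\mathrm{d}\omega .
\]
```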