
    Design Space Exploration of Neural Network Activation Function Circuits

    The widespread application of artificial neural networks has prompted researchers to experiment with FPGA and customized ASIC designs to speed up their computation. These implementation efforts have generally focused on weight multiplication and signal summation operations, and less on the activation functions used in these applications. Yet, efficient hardware implementations of nonlinear activation functions like Exponential Linear Units (ELU), Scaled Exponential Linear Units (SELU), and Hyperbolic Tangent (tanh) are central to designing effective neural network accelerators, since these functions require significant hardware resources. In this paper, we explore efficient hardware implementations of activation functions using purely combinational circuits, with a focus on two widely used nonlinear activation functions, i.e., SELU and tanh. Our experiments demonstrate that neural networks are generally insensitive to the precision of the activation function. The results also show that the proposed combinational circuit-based approach is very efficient in terms of speed and area, with negligible accuracy loss on the MNIST, CIFAR-10 and ImageNet benchmarks. Synopsys Design Compiler synthesis results show that the circuit designs for tanh and SELU can achieve 3.13-7.69x and 4.45-8.45x area savings compared to LUT/memory-based implementations, and can operate at 5.14 GHz and 4.52 GHz using the 28nm SVT library, respectively. The implementation is available at: https://github.com/ThomasMrY/ActivationFunctionDemo. Comment: 5 pages, 5 figures, 16 conferenc
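    A minimal software sketch of the precision-insensitivity claim above (not the paper's circuit or RTL; the bit width and test grid are illustrative assumptions): it quantizes the outputs of tanh and SELU to a coarse fixed-point grid, the kind of output precision a small combinational unit might deliver, and reports how far the quantized values deviate from the full-precision functions.

        # Illustrative NumPy sketch; not the authors' hardware design.
        import numpy as np

        def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
            # Standard SELU definition (scale * x for x > 0, scaled exponential otherwise).
            return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

        def quantize(y, frac_bits=6):
            # Round to a fixed-point grid with `frac_bits` fractional bits,
            # a stand-in for the limited output precision of a small circuit.
            step = 2.0 ** -frac_bits
            return np.round(y / step) * step

        x = np.linspace(-8, 8, 10001)
        for name, f in [("tanh", np.tanh), ("SELU", selu)]:
            err = np.max(np.abs(quantize(f(x)) - f(x)))
            print(f"{name}: max abs error with 6 fractional bits = {err:.4f}")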

    Fast and Accurate Bilateral Filtering using Gauss-Polynomial Decomposition

    The bilateral filter is a versatile non-linear filter that has found diverse applications in image processing, computer vision, computer graphics, and computational photography. A widely used form of the filter is the Gaussian bilateral filter, in which both the spatial and range kernels are Gaussian. A direct implementation of this filter requires O(σ²) operations per pixel, where σ is the standard deviation of the spatial Gaussian. In this paper, we propose an accurate approximation algorithm that cuts the computational complexity down to O(1) per pixel for any arbitrary σ (a constant-time implementation). This is based on the observation that the range kernel operates via translations of a fixed Gaussian over the range space, and that these translated Gaussians can be accurately approximated using so-called Gauss-polynomials. The overall algorithm emerging from this approximation involves a series of spatial Gaussian filterings, which can be implemented in constant time using separability and recursion. We present preliminary results demonstrating that the proposed algorithm compares favorably with some of the existing fast algorithms in terms of speed and accuracy. Comment: To appear in the IEEE International Conference on Image Processing (ICIP 2015)
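    The Gauss-polynomial idea described above lends itself to a short sketch (assumptions: a grayscale float image, SciPy's gaussian_filter for the spatial convolutions, and a fixed polynomial order N; this follows the idea at a high level rather than reproducing the paper's exact algorithm). Writing t = (f - mid)/σ_r for the centered, normalized intensities, the range kernel exp(-(t(y) - t(x))²/2) factors as exp(-t(y)²/2) · exp(-t(x)²/2) · exp(t(y)t(x)); replacing the last factor by its order-N Taylor polynomial turns the bilateral filter into a ratio of sums of spatial Gaussian filterings.

        # Sketch of a Gauss-polynomial bilateral filter approximation.
        import numpy as np
        from math import factorial
        from scipy.ndimage import gaussian_filter

        def bilateral_gauss_poly(img, sigma_s=3.0, sigma_r=0.1, N=15):
            # Center and normalize intensities so the polynomial approximation
            # of exp(u*v) stays accurate over the image's dynamic range.
            mid = 0.5 * (img.max() + img.min())
            t = (img - mid) / sigma_r
            G = np.exp(-0.5 * t * t)          # fixed Gaussian over the range space
            num = np.zeros_like(img)
            den = np.zeros_like(img)
            for n in range(N + 1):
                c = t ** n / factorial(n)                              # polynomial in the center pixel
                den += c * gaussian_filter(G * t ** n, sigma_s)        # spatial Gaussian filterings
                num += c * gaussian_filter(G * t ** n * img, sigma_s)
            return num / np.maximum(den, 1e-12)

    A larger N is needed when the image's dynamic range is large relative to sigma_r, since the truncated polynomial then has to track the exponential over a wider interval.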

    Accuracy and Stability of Computing High-Order Derivatives of Analytic Functions by Cauchy Integrals

    High-order derivatives of analytic functions are expressible as Cauchy integrals over circular contours, which can be approximated very effectively, e.g., by trapezoidal sums. Whereas analytically every radius r up to the radius of convergence is equally suitable, numerical stability depends strongly on r. We give a comprehensive study of this effect; in particular, we show that there is a unique radius that minimizes the loss of accuracy caused by round-off errors. For large classes of functions, though not for all, this radius actually gives about full accuracy; a remarkable fact that we explain by the theory of Hardy spaces, by the Wiman-Valiron and Levin-Pfluger theory of entire functions, and by the saddle-point method of asymptotic analysis. Many examples and non-trivial applications are discussed in detail. Comment: Version 4 adds some references and a discussion of other quadrature rules; 57 pages, 7 figures, 6 tables; to appear in Found. Comput. Math.
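    The following is a brief numerical illustration of the radius sensitivity discussed above (the test function exp, the derivative order, and the radii are illustrative choices, not the paper's examples). It approximates f^(n)(0) = n!/(2πi) ∮ f(z)/z^(n+1) dz by a trapezoidal sum over the circle |z| = r and prints the error for several radii; for f = exp and n = 20 the exact value is 1, and a radius near n turns out to be close to optimal.

        # Trapezoidal-sum Cauchy-integral approximation of a high-order derivative.
        import numpy as np
        from math import factorial

        def cauchy_derivative(f, z0, n, r, m=256):
            # f^(n)(z0) ~= n!/(m r^n) * sum_k f(z0 + r e^{2*pi*i*k/m}) e^{-2*pi*i*n*k/m}
            k = np.arange(m)
            z = z0 + r * np.exp(2j * np.pi * k / m)
            return factorial(n) / (m * r ** n) * np.sum(f(z) * np.exp(-2j * np.pi * n * k / m))

        for r in (0.5, 5.0, 20.0):
            approx = cauchy_derivative(np.exp, 0.0, 20, r)
            print(f"r = {r:5.1f}: relative error = {abs(approx - 1):.1e}")

    Small radii divide a tiny Fourier coefficient by r^n, so round-off in the sum is amplified enormously; this is the loss of accuracy that the optimal radius avoids.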

    Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System

    Emulating spiking neural networks on analog neuromorphic hardware offers several advantages over simulating them on conventional computers, particularly in terms of speed and energy consumption. However, this usually comes at the cost of reduced control over the dynamics of the emulated networks. In this paper, we demonstrate how iterative training of a hardware-emulated network can compensate for anomalies induced by the analog substrate. We first convert a deep neural network trained in software to a spiking network on the BrainScaleS wafer-scale neuromorphic system, thereby enabling an acceleration factor of 10 000 compared to the biological time domain. This mapping is followed by in-the-loop training, where in each training step the network activity is first recorded in hardware and then used to compute the parameter updates in software via backpropagation. An essential finding is that the parameter updates do not have to be precise, but only need to approximately follow the correct gradient, which simplifies their computation. Using this approach, after only several tens of iterations the spiking network reaches an accuracy close to that of the ideal software-emulated prototype. The presented techniques show that deep spiking networks emulated on analog neuromorphic devices can attain good computational performance despite the inherent variations of the analog substrate. Comment: 8 pages, 10 figures, submitted to IJCNN 2017
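    A toy sketch of the abstract's key finding that approximate gradients suffice (purely illustrative: the "hardware" below is just a distorted, noisy software forward pass standing in for the analog substrate, and the model is a one-layer logistic classifier, not the BrainScaleS flow). The updates are computed in software from the recorded, distorted activity, yet training still converges because they roughly follow the true gradient.

        # Toy in-the-loop training loop with a noisy stand-in for analog hardware.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(256, 10))
        y = (X @ rng.normal(size=10) > 0).astype(float)     # synthetic binary task

        W = np.zeros(10)
        distortion = rng.normal(1.0, 0.2, size=10)          # fixed analog-like parameter mismatch

        def hardware_forward(W, X):
            # Distorted, noisy "emulation" of the network (stand-in for the substrate).
            z = X @ (W * distortion) + rng.normal(0.0, 0.05, size=len(X))
            return 1.0 / (1.0 + np.exp(-z))

        for step in range(200):
            p = hardware_forward(W, X)                      # record activity "on hardware"
            grad = X.T @ (p - y) / len(X)                   # software update from recorded activity
            W -= 0.5 * grad

        print("training accuracy:", np.mean((hardware_forward(W, X) > 0.5) == y))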