Design Space Exploration of Neural Network Activation Function Circuits
The widespread application of artificial neural networks has prompted
researchers to experiment with FPGA and customized ASIC designs to speed up
their computation. These implementation efforts have generally focused on
weight multiplication and signal summation operations, and less on activation
functions used in these applications. Yet, efficient hardware implementations
of nonlinear activation functions like Exponential Linear Units (ELU), Scaled
Exponential Linear Units (SELU), and Hyperbolic Tangent (tanh), are central to
designing effective neural network accelerators, since these functions require
substantial hardware resources. In this paper, we explore efficient hardware implementations
of activation functions using purely combinational circuits, with a focus on
two widely used nonlinear activation functions, i.e., SELU and tanh. Our
experiments demonstrate that neural networks are generally insensitive to the
precision of the activation function. The results also prove that the proposed
combinational circuit-based approach is very efficient in terms of speed and
area, with negligible accuracy loss on the MNIST, CIFAR-10 and IMAGENET
benchmarks. Synopsys Design Compiler synthesis results show that circuit
designs for tanh and SELU can save 3.13-7.69x and 4.45-8.45x in area compared
to the LUT/memory-based implementations, and can operate at 5.14 GHz and
4.52 GHz using the 28 nm SVT library, respectively. The implementation is
available at: https://github.com/ThomasMrY/ActivationFunctionDemo.
Comment: 5 pages, 5 figures, 16 conferenc
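As a rough, self-contained sketch of the precision-insensitivity point above (not the paper's code; the 6-bit uniform quantizer over the observed output range is an assumption standing in for a low-precision circuit), one can quantize the outputs of tanh and SELU and check how small the resulting error stays:

import numpy as np

# Illustrative only: standard SELU constants and a uniform quantizer standing
# in for a reduced-precision activation circuit (bit width chosen arbitrarily).
ALPHA, LAMBDA = 1.6733, 1.0507

def selu(x):
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

def quantize(y, bits=6):
    # Uniform quantization over the observed output range (assumption).
    lo, hi = y.min(), y.max()
    levels = 2 ** bits - 1
    return np.round((y - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

x = np.linspace(-4.0, 4.0, 1001)
for name, f in (("tanh", np.tanh), ("selu", selu)):
    err = np.max(np.abs(f(x) - quantize(f(x))))
    print(f"{name}: max abs output error at 6 bits = {err:.4f}")

In this toy setting the quantization error of the activation output is on the order of the quantizer step size, which is the kind of behavior that would make a network tolerant of a coarse activation circuit.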
Fast and Accurate Bilateral Filtering using Gauss-Polynomial Decomposition
The bilateral filter is a versatile non-linear filter that has found diverse
applications in image processing, computer vision, computer graphics, and
computational photography. A widely-used form of the filter is the Gaussian
bilateral filter in which both the spatial and range kernels are Gaussian. A
direct implementation of this filter requires $O(\sigma_s^2)$ operations per
pixel, where $\sigma_s$ is the standard deviation of the spatial Gaussian. In
this paper, we propose an accurate approximation algorithm that can cut down
the computational complexity to $O(1)$ per pixel for any arbitrary $\sigma_s$
(constant-time implementation). This is based on the observation that the range
kernel operates via the translations of a fixed Gaussian over the range space,
and that these translated Gaussians can be accurately approximated using the
so-called Gauss-polynomials. The overall algorithm emerging from this
approximation involves a series of spatial Gaussian filtering, which can be
implemented in constant-time using separability and recursion. We present some
preliminary results to demonstrate that the proposed algorithm compares
favorably with some of the existing fast algorithms in terms of speed and
accuracy.
Comment: To appear in the IEEE International Conference on Image Processing
(ICIP 2015)
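A short sketch of the decomposition referred to above, as we read the abstract (the truncation order $N$ is a free parameter): the shifted range Gaussian is factored so that the only coupling between the center intensity $t$ and a neighbor's intensity $\tau$ sits in one exponential, which is then replaced by its Taylor polynomial,

\exp\!\Big(-\frac{(t-\tau)^2}{2\sigma_r^2}\Big)
  = e^{-t^2/2\sigma_r^2}\, e^{-\tau^2/2\sigma_r^2}\, e^{t\tau/\sigma_r^2},
\qquad
e^{t\tau/\sigma_r^2} \approx \sum_{n=0}^{N} \frac{1}{n!}\Big(\frac{t\tau}{\sigma_r^2}\Big)^{n}.

Each term of the truncated sum separates into a function of $t$ times a function of $\tau$, so the range summation collapses into $N+1$ spatial Gaussian filterings of simple pointwise images, which is what permits the constant-time (separable, recursive) implementation mentioned in the abstract.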
Accuracy and Stability of Computing High-Order Derivatives of Analytic Functions by Cauchy Integrals
High-order derivatives of analytic functions are expressible as Cauchy
integrals over circular contours, which can very effectively be approximated,
e.g., by trapezoidal sums. Whereas, analytically, every radius r up to the radius
of convergence performs equally well, numerical stability depends strongly on r. We give a
comprehensive study of this effect; in particular we show that there is a
unique radius that minimizes the loss of accuracy caused by round-off errors.
For large classes of functions, though not for all, this radius actually gives
about full accuracy, a remarkable fact that we explain by the theory of Hardy
spaces, by the Wiman-Valiron and Levin-Pfluger theory of entire functions, and
by the saddle-point method of asymptotic analysis. Many examples and
non-trivial applications are discussed in detail.
Comment: Version 4 has some references and a discussion of other quadrature
rules added; 57 pages, 7 figures, 6 tables; to appear in Found. Comput. Math.
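As a small numerical illustration of the radius dependence discussed above (our own sketch, not the paper's code; the test function exp, the derivative order, and the number of quadrature nodes are arbitrary choices), the n-th derivative at 0 can be approximated by an m-point trapezoidal sum over a circle of radius r:

import numpy as np
from math import factorial

# Sketch: f^(n)(0) = n!/(2*pi*i) * integral of f(z)/z^(n+1) over the circle |z| = r,
# approximated by the m-point trapezoidal rule on that circle.
def cauchy_derivative(f, n, r, m=64):
    k = np.arange(m)
    z = r * np.exp(2j * np.pi * k / m)
    return factorial(n) / (m * r**n) * np.sum(f(z) * np.exp(-2j * np.pi * n * k / m))

# For f = exp, every derivative at 0 equals 1, so the error is easy to read off.
n = 20
for r in (0.1, 1.0, 5.0, 20.0):
    err = abs(cauchy_derivative(np.exp, n, r) - 1.0)
    print(f"r = {r:5.1f}   error = {err:.2e}")

In this toy run the loss of accuracy is catastrophic for small r and essentially disappears near r = n, consistent with the abstract's claim that a well-chosen radius can restore about full accuracy.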
Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System
Emulating spiking neural networks on analog neuromorphic hardware offers
several advantages over simulating them on conventional computers, particularly
in terms of speed and energy consumption. However, this usually comes at the
cost of reduced control over the dynamics of the emulated networks. In this
paper, we demonstrate how iterative training of a hardware-emulated network can
compensate for anomalies induced by the analog substrate. We first convert a
deep neural network trained in software to a spiking network on the BrainScaleS
wafer-scale neuromorphic system, thereby enabling an acceleration factor of
10,000 compared to the biological time domain. This mapping is followed by the
in-the-loop training, where in each training step, the network activity is
first recorded in hardware and then used to compute the parameter updates in
software via backpropagation. An essential finding is that the parameter
updates do not have to be precise, but only need to approximately follow the
correct gradient, which simplifies the computation of updates. Using this
approach, after only several tens of iterations, the spiking network shows an
accuracy close to the ideal software-emulated prototype. The presented
techniques show that deep spiking networks emulated on analog neuromorphic
devices can attain good computational performance despite the inherent
variations of the analog substrate.
Comment: 8 pages, 10 figures, submitted to IJCNN 201
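A toy caricature of the in-the-loop procedure described above (this is not the BrainScaleS tool flow; the "hardware" is stood in for by a fixed, unknown gain distortion plus noise on a linear layer, and all names are hypothetical): activity is "recorded" from the distorted forward pass, and weight updates computed in software from that recording still converge, because they only need to roughly follow the true gradient.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10)                     # software-defined target task

gain = rng.normal(1.0, 0.2, size=10)            # fixed analog distortions (assumed)

def hardware_forward(w):
    # Stand-in for emulating the network on the substrate and recording activity.
    return X @ (gain * w) + rng.normal(0.0, 0.05, size=len(X))

w = np.zeros(10)
for step in range(50):
    out = hardware_forward(w)                   # "record" activity in hardware
    grad = X.T @ (out - y) / len(X)             # software update ignores the unknown gain
    w -= 0.1 * grad                             # an approximate gradient direction suffices
print("final mean-squared error:", np.mean((hardware_forward(w) - y) ** 2))

After a few tens of these in-the-loop steps the residual error settles near the injected noise floor, mirroring the abstract's observation that imprecise but roughly gradient-aligned updates are enough.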