
    Design Space Exploration of Neural Network Activation Function Circuits

    The widespread application of artificial neural networks has prompted researchers to experiment with FPGA and customized ASIC designs to speed up their computation. These implementation efforts have generally focused on weight multiplication and signal summation operations, and less on the activation functions used in these applications. Yet, efficient hardware implementations of nonlinear activation functions like Exponential Linear Units (ELU), Scaled Exponential Linear Units (SELU), and Hyperbolic Tangent (tanh) are central to designing effective neural network accelerators, since these functions consume substantial hardware resources. In this paper, we explore efficient hardware implementations of activation functions using purely combinational circuits, with a focus on two widely used nonlinear activation functions: SELU and tanh. Our experiments demonstrate that neural networks are generally insensitive to the precision of the activation function. The results also show that the proposed combinational circuit-based approach is very efficient in terms of speed and area, with negligible accuracy loss on the MNIST, CIFAR-10, and ImageNet benchmarks. Synopsys Design Compiler synthesis results show that the circuit designs for tanh and SELU save 3.13x-7.69x and 4.45x-8.45x in area compared to LUT/memory-based implementations, and can operate at 5.14 GHz and 4.52 GHz using the 28nm SVT library, respectively. The implementation is available at: https://github.com/ThomasMrY/ActivationFunctionDemo (Comment: 5 pages, 5 figures)
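
    The paper's designs are gate-level combinational circuits; as a rough software analogue only (the segment breakpoints and slopes below are hand-picked assumptions, not taken from the paper), a shift-and-add piecewise-linear tanh in fixed point might look like this:

        import numpy as np

        def tanh_pwl_q8(x_q):
            # Piecewise-linear tanh in Q8 fixed point (1.0 == 256).
            # Slopes are 1, 1/2, 1/8 and 0, so each segment needs only
            # a shift and an add. Breakpoints are illustrative choices.
            x = abs(x_q)
            if x < 128:                 # |x| < 0.5 : tanh(x) ~= x
                y = x
            elif x < 320:               # 0.5 .. 1.25 : slope 1/2
                y = (x >> 1) + 64       # offset 0.25
            elif x < 640:               # 1.25 .. 2.5 : slope 1/8
                y = (x >> 3) + 184      # offset 0.71875
            else:                       # |x| >= 2.5 : saturate to 1.0
                y = 256
            return min(y, 256) if x_q >= 0 else -min(y, 256)

        # compare against floating-point tanh
        for v in (-3.0, -1.0, -0.3, 0.5, 2.0, 4.0):
            q = int(round(v * 256))
            print(f"x={v:+.1f}  pwl={tanh_pwl_q8(q)/256:+.4f}  ref={np.tanh(v):+.4f}")

    Restricting slopes to powers of two keeps the hardware to shifters and adders; this is the kind of precision-versus-area trade-off the design space exploration is concerned with.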

    Elementary function generators for neural-network emulators

    Precise Multi-Neuron Abstractions for Neural Network Certification

    Formal verification of neural networks is critical for their safe adoption in real-world applications. However, designing a verifier that can handle realistic networks in a precise manner remains an open and difficult challenge. In this paper, we take a major step toward addressing this challenge and present a new framework, called PRIMA, that computes precise convex approximations of arbitrary non-linear activations. PRIMA is based on novel approximation algorithms that compute the convex hull of polytopes, leveraging concepts from computational geometry. The algorithms have polynomial complexity, yield fewer constraints, and minimize precision loss. We evaluate the effectiveness of PRIMA on challenging neural networks with ReLU, Sigmoid, and Tanh activations. Our results show that PRIMA is significantly more precise than the state-of-the-art, verifying robustness for up to 16%, 30%, and 34% more images than prior work on ReLU-, Sigmoid-, and Tanh-based networks, respectively.
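
    PRIMA's multi-neuron convex-hull computation is beyond a short sketch, but the core notion of a convex over-approximation of an activation can be illustrated with the classic single-neuron triangle relaxation of ReLU over pre-activation bounds [l, u]: a well-known baseline that PRIMA refines, not PRIMA's own algorithm:

        import numpy as np

        def relu_triangle(l, u):
            # Linear constraints of the triangle relaxation of y = ReLU(x)
            # for pre-activation bounds l < 0 < u. Each constraint is
            # returned as (a, b, c), meaning a*x + b*y <= c.
            assert l < 0 < u
            slope = u / (u - l)
            return [
                (0.0, -1.0, 0.0),            # y >= 0
                (1.0, -1.0, 0.0),            # y >= x
                (-slope, 1.0, -slope * l),   # y <= slope * (x - l)
            ]

        # verify that the true ReLU graph satisfies every constraint
        l, u = -2.0, 3.0
        xs = np.linspace(l, u, 101)
        ys = np.maximum(xs, 0.0)
        for a, b, c in relu_triangle(l, u):
            assert np.all(a * xs + b * ys <= c + 1e-9)
        print("triangle relaxation contains the ReLU graph on [%.1f, %.1f]" % (l, u))

    The relaxation is sound because every point on the ReLU graph satisfies all three inequalities; precision is what multi-neuron constraints, as in PRIMA, improve.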

    Open- and Closed-Loop Neural Network Verification using Polynomial Zonotopes

    We present a novel approach to efficiently compute tight non-convex enclosures of the image of a set under neural networks with ReLU, sigmoid, or hyperbolic tangent activation functions. In particular, we abstract the input-output relation of each neuron by a polynomial approximation, which is evaluated in a set-based manner using polynomial zonotopes. While our approach can also be beneficial for open-loop neural network verification, our main application is reachability analysis of neural-network-controlled systems, where polynomial zonotopes are able to capture the non-convexity introduced by both the neural network and the system dynamics. This results in superior performance compared to other methods, as we demonstrate on various benchmarks.
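
    A minimal sketch of the key ingredient, assuming nothing about the authors' tooling: replace tanh on an input interval by a low-degree polynomial plus an interval remainder. Here the remainder is estimated by dense sampling, which is illustrative rather than formally sound; a verification tool derives it rigorously:

        import numpy as np

        def poly_abstract_tanh(l, u, degree=2, samples=2001):
            # Abstract y = tanh(x) on [l, u] by a degree-`degree`
            # polynomial p plus a remainder radius eps, so that
            # tanh(x) lies in p(x) + [-eps, eps] for all sampled x.
            xs = np.linspace(l, u, samples)
            coeffs = np.polyfit(xs, np.tanh(xs), degree)   # least-squares fit
            eps = float(np.max(np.abs(np.polyval(coeffs, xs) - np.tanh(xs))))
            return coeffs, eps

        coeffs, eps = poly_abstract_tanh(-1.0, 2.0)
        print("p(x) coefficients (high to low):", np.round(coeffs, 4))
        print("remainder radius eps ~= %.4f" % eps)

    Propagating such polynomial abstractions through a set representation that is closed under polynomial maps, which polynomial zonotopes are, is what lets the enclosure stay non-convex and tight.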

    Digital Implementation of Bio-Inspired Spiking Neuronal Networks

    Spiking Neural Networks (SNNs), as the third generation of artificial neural networks, offer a promising solution for future computing, prosthesis, robotic, and image processing applications. This thesis introduces digital designs and implementations of the building blocks of SNNs, including neurons, a learning rule, and small networks of neurons in the form of a Central Pattern Generator (CPG), which can be used as a module in the control part of a bio-inspired robot. The circuits have been developed in the Verilog hardware description language, simulated in ModelSim, and compiled and synthesised with Altera Quartus Prime software for FPGA devices.

    Astrocytes, one type of brain cell, control synaptic activity between neurons by providing feedback to them. A novel digital hardware design is proposed for a neuron-synapse-astrocyte network based on the biological Adaptive Exponential (AdEx) neuron and the Postnov astrocyte cell model. The network can be used for the implementation of large-scale spiking neural networks. Synthesis of the designed circuits shows that the astrocyte circuit is able to imitate its biological model and successfully regulate synaptic transmission. In addition, synthesis results confirm that the proposed design uses less than 1% of the available resources of a Virtex II FPGA, saving up to 4.4% of FPGA resources in comparison to other designs.

    A learning rule is an essential part of every neural network, including SNNs. In an SNN, a special type of learning called Spike Timing Dependent Plasticity (STDP) is used to modify the connection strength between spiking neurons. Pair-based STDP (PSTDP) works on pairs of spikes, while triplet-based STDP (TSTDP) works on triplets of spikes to modify the synaptic weights. Low-cost, accurate, and configurable digital architectures are proposed for the PSTDP and TSTDP learning models. The proposed circuits have been compared with state-of-the-art methods such as Lookup Table (LUT) and Piecewise Linear (PWL) approximation, and can be employed in large-scale SNN implementations due to their compactness and configurability.

    Most neuron models in the literature describe the behavior of a single neuron. Since there is a large number of neurons in the brain, a population-based model can help in better understanding brain functionality, implementing cognitive tasks, and studying brain diseases. The Gaussian Wilson-Cowan model, one of the population-based models, represents neuronal activity in the neocortex region of the brain. A digital model is proposed for the Gaussian Wilson-Cowan model and examined in terms of dynamical and timing behavior. The evaluation indicates that the proposed model is able to reproduce the dynamical behavior of the original model. The digital architectures are implemented on an Altera FPGA board. Experimental results show that the proposed circuits occupy at most 2% of the resources of a Stratix Altera board, and static timing analysis indicates that they can operate at a maximum frequency of 244 MHz.
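
    For reference, the pair-based STDP rule realized by such architectures is commonly written as dw = A+ * exp(-dt/tau+) for pre-before-post spike pairs and dw = -A- * exp(dt/tau-) otherwise; a minimal floating-point sketch with illustrative parameter values (not the thesis's fixed-point circuits):

        import math

        def pstdp_dw(dt_ms, a_plus=0.01, a_minus=0.012,
                     tau_plus=20.0, tau_minus=20.0):
            # Pair-based STDP weight change for spike-time difference
            # dt = t_post - t_pre (ms). Positive dt (pre before post)
            # potentiates, negative dt depresses. Defaults are
            # illustrative, not taken from the thesis.
            if dt_ms >= 0:
                return a_plus * math.exp(-dt_ms / tau_plus)     # LTP
            return -a_minus * math.exp(dt_ms / tau_minus)       # LTD

        for dt in (-40, -10, 10, 40):
            print(f"dt={dt:+d} ms -> dw={pstdp_dw(dt):+.5f}")

    The exponentials are exactly what LUT- and PWL-based digital designs approximate; the compactness of an architecture largely comes down to how cheaply it evaluates these two decaying curves.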

    A Novel Systolic Parallel Hardware Architecture for the FPGA Acceleration of Feedforward Neural Networks

    New chips for machine learning applications keep appearing, but they are typically tuned for a specific topology, achieving efficiency through highly parallel designs at the cost of high power consumption or large, complex devices. The computational demands of deep neural networks, however, require flexible and efficient hardware architectures able to fit different applications, neural network types, and numbers of inputs, outputs, layers, and units per layer, making migration from software to hardware easy. This paper describes novel hardware implementing any feedforward neural network (FFNN): multilayer perceptron, autoencoder, and logistic regression. The architecture admits an arbitrary number of inputs, outputs, units per layer, and layers. The hardware combines matrix algebra concepts with serial-parallel computation. It is based on a systolic ring of neural processing elements (NPEs), requiring only as many NPEs as neuron units in the largest layer, no matter the number of layers. Resource usage grows linearly with the number of NPEs. This versatile architecture serves as an accelerator in real-time applications, and its size does not affect the system clock frequency. Unlike most approaches, a single activation function block (AFB) is required for the whole FFNN. Performance, resource usage, and accuracy are evaluated for several network topologies and activation functions. The architecture reaches a 550 MHz clock speed in a Virtex-7 FPGA. The proposed implementation uses 18-bit fixed point, achieving classification performance similar to a floating-point approach; a reduced weight bit size does not affect the accuracy, allowing more weights to be stored in the same memory. Different FFNNs for the Iris and MNIST datasets were evaluated, and a 256x acceleration was achieved for a real-time application of abnormal cardiac detection. The proposed architecture can perform up to 1980 giga operations per second (GOPS), implementing multilayer FFNNs of up to 3600 neurons per layer in a single chip. The architecture can be extended to higher-capacity devices or multiple chips by simple extension of the NPE ring.
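
    A behavioural sketch of the ring dataflow (an assumed model for illustration, not the paper's RTL): one NPE per output neuron, with the input vector circulating through the ring one element per cycle while every NPE multiply-accumulates in parallel:

        import numpy as np

        def systolic_ring_matvec(W, x):
            # Behavioural model: compute y = W @ x on a ring of NPEs,
            # one NPE per output neuron, padded to a square problem so
            # every NPE sees every input element exactly once.
            m, n = W.shape
            k = max(m, n)
            Wp = np.zeros((k, k))
            Wp[:m, :n] = W
            xp = np.zeros(k)
            xp[:n] = x
            acc = np.zeros(k)
            pos = list(range(k))            # x-element index at each NPE
            for _ in range(k):              # k cycles for a k-wide layer
                for p in range(k):          # all NPEs MAC in parallel
                    acc[p] += Wp[p, pos[p]] * xp[pos[p]]
                pos = pos[1:] + pos[:1]     # rotate the ring one step
            return acc[:m]

        rng = np.random.default_rng(0)
        W = rng.standard_normal((3, 5))
        x = rng.standard_normal(5)
        assert np.allclose(systolic_ring_matvec(W, x), W @ x)
        print("ring result matches W @ x")

    Because each layer finishes as a whole before its outputs are fed back around, a single shared activation stage after the accumulators is plausible, which would be consistent with the paper's single-AFB claim.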