466 research outputs found
Design Space Exploration of Neural Network Activation Function Circuits
The widespread application of artificial neural networks has prompted
researchers to experiment with FPGA and customized ASIC designs to speed up
their computation. These implementation efforts have generally focused on
weight multiplication and signal summation operations, and less on activation
functions used in these applications. Yet, efficient hardware implementations
of nonlinear activation functions like Exponential Linear Units (ELU), Scaled
Exponential Linear Units (SELU), and Hyperbolic Tangent (tanh), are central to
designing effective neural network accelerators, since these functions require
lots of resources. In this paper, we explore efficient hardware implementations
of activation functions using purely combinational circuits, with a focus on
two widely used nonlinear activation functions, i.e., SELU and tanh. Our
experiments demonstrate that neural networks are generally insensitive to the
precision of the activation function. The results also prove that the proposed
combinational circuit-based approach is very efficient in terms of speed and
area, with negligible accuracy loss on the MNIST, CIFAR-10 and IMAGENET
benchmarks. Synopsys Design Compiler synthesis results show that circuit
designs for tanh and SELU can save between 3.13-7.69 and 4.45-8:45 area
compared to the LUT/memory-based implementations, and can operate at 5.14GHz
and 4.52GHz using the 28nm SVT library, respectively. The implementation is
available at: https://github.com/ThomasMrY/ActivationFunctionDemo.Comment: 5 pages, 5 figures, 16 conferenc
Precise Multi-Neuron Abstractions for Neural Network Certification
Formal verification of neural networks is critical for their safe adoption in
real-world applications. However, designing a verifier which can handle
realistic networks in a precise manner remains an open and difficult challenge.
In this paper, we take a major step in addressing this challenge and present a
new framework, called PRIMA, that computes precise convex approximations of
arbitrary non-linear activations. PRIMA is based on novel approximation
algorithms that compute the convex hull of polytopes, leveraging concepts from
computational geometry. The algorithms have polynomial complexity, yield fewer
constraints, and minimize precision loss. We evaluate the effectiveness of
PRIMA on challenging neural networks with ReLU, Sigmoid, and Tanh activations.
Our results show that PRIMA is significantly more precise than the
state-of-the-art, verifying robustness for up to 16%, 30%, and 34% more images
than prior work on ReLU-, Sigmoid-, and Tanh-based networks, respectively
Open- and Closed-Loop Neural Network Verification using Polynomial Zonotopes
We present a novel approach to efficiently compute tight non-convex
enclosures of the image through neural networks with ReLU, sigmoid, or
hyperbolic tangent activation functions. In particular, we abstract the
input-output relation of each neuron by a polynomial approximation, which is
evaluated in a set-based manner using polynomial zonotopes. While our approach
can also can be beneficial for open-loop neural network verification, our main
application is reachability analysis of neural network controlled systems,
where polynomial zonotopes are able to capture the non-convexity caused by the
neural network as well as the system dynamics. This results in a superior
performance compared to other methods, as we demonstrate on various benchmarks
Digital Implementation of Bio-Inspired Spiking Neuronal Networks
Spiking Neural Network as the third generation of artificial neural networks offers a promising solution for future computing, prosthesis, robotic and image processing applications. This thesis introduces digital designs and implementations of building blocks of a Spiking Neural Networks (SNNs) including neurons, learning rule, and small networks of neurons in the form of a Central Pattern Generator (CPG) which can be used as a module in control part of a bio-inspired robot. The circuits have been developed using Verilog Hardware Description Language (VHDL) and simulated through Modelsim and compiled and synthesised by Altera Qurtus Prime software for FPGA devices. Astrocyte as one of the brain cells controls synaptic activity between neurons by providing feedback to neurons. A novel digital hardware is proposed for neuron-synapseastrocyte network based on the biological Adaptive Exponential (AdEx) neuron and Postnov astrocyte cell model. The network can be used for implementation of large scale spiking neural networks. Synthesis of the designed circuits shows that the designed astrocyte circuit is able to imitate its biological model and regulate the synapse transmission, successfully. In addition, synthesis results confirms that the proposed design uses less than 1% of available resources of a VIRTEX II FPGA which saves up to 4.4% of FPGA resources in comparison to other designs. Learning rule is an essential part of every neural network including SNN. In an SNN, a special type of learning called Spike Timing Dependent Plasticity (STDP) is used to modify the connection strength between the spiking neurons. A pair-based STDP (PSTDP) works on pairs of spikes while a Triplet-based STDP (TSTDP) works on triplets of spikes to modify the synaptic weights. A low cost, accurate, and configurable digital architectures are proposed for PSTDP and TSTDP learning models. The proposed circuits have been compared with the state of the art methods like Lookup Table (LUT), and Piecewise Linear approximation (PWL). The circuits can be employed in a large-scale SNN implementation due to their compactness and configurability. Most of the neuron models represented in the literature are introduced to model the behavior of a single neuron. Since there is a large number of neurons in the brain, a population-based model can be helpful in better understanding of the brain functionality, implementing cognitive tasks and studying the brain diseases. Gaussian Wilson-Cowan model as one of the population-based models represents neuronal activity in the neocortex region of the brain. A digital model is proposed for the GaussianWilson-Cowan and examined in terms of dynamical and timing behavior. The evaluation indicates that the proposed model is able to generate the dynamical behavior as the original model is capable of. Digital architectures are implemented on an Altera FPGA board. Experimental results show that the proposed circuits take maximum 2% of the resources of a Stratix Altera board. In addition, static timing analysis indicates that the circuits can work in a maximum frequency of 244 MHz
A Novel Systolic Parallel Hardware Architecture for the FPGA Acceleration of Feedforward Neural Networks
New chips for machine learning applications appear, they are tuned for a specific topology, being efficient by using highly parallel designs at the cost of high power or large complex devices. However, the computational demands of deep neural networks require flexible and efficient hardware architectures able to fit different applications, neural network types, number of inputs, outputs, layers, and units in each layer, making the migration from software to hardware easy. This paper describes novel hardware implementing any feedforward neural network (FFNN): multilayer perceptron, autoencoder, and logistic regression. The architecture admits an arbitrary input and output number, units in layers, and a number of layers. The hardware combines matrix algebra concepts with serial-parallel computation. It is based on a systolic ring of neural processing elements (NPE), only requiring as many NPEs as neuron units in the largest layer, no matter the number of layers. The use of resources grows linearly with the number of NPEs. This versatile architecture serves as an accelerator in real-time applications and its size does not affect the system clock frequency. Unlike most approaches, a single activation function block (AFB) for the whole FFNN is required. Performance, resource usage, and accuracy for several network topologies and activation functions are evaluated. The architecture reaches 550 MHz clock speed in a Virtex7 FPGA. The proposed implementation uses 18-bit fixed point achieving similar classification performance to a floating point approach. A reduced weight bit size does not affect the accuracy, allowing more weights in the same memory. Different FFNN for Iris and MNIST datasets were evaluated and, for a real-time application of abnormal cardiac detection, a x256 acceleration was achieved. The proposed architecture can perform up to 1980 Giga operations per second (GOPS), implementing the multilayer FFNN of up to 3600 neurons per layer in a single chip. The architecture can be extended to bigger capacity devices or multi-chip by the simple NPE ring extension
- …