
    Reduced-memory training and deployment of deep residual networks by stochastic binary quantization

    Motivated by the goal of enabling energy-efficient and/or lower-cost hardware implementations of deep neural networks, we describe a method for modifying the standard backpropagation algorithm that reduces memory usage during training by up to a factor of 32 compared with standard single-precision floating-point implementations. The method is inspired by recent work on feedback alignment in the context of seeking neurobiological correlates of backpropagation-based learning; like feedback alignment, it calculates gradients imprecisely. Specifically, our method introduces stochastic binarization of hidden-unit activations for use in the backward pass, after they are no longer needed in the forward pass. We show that without stochastic binarization the method is far less effective. To verify the method's effectiveness, we trained wide residual networks with 20 weight layers on the CIFAR-10 and CIFAR-100 image classification benchmarks, achieving error rates of 5.43% and 23.01%, respectively, compared with 4.53% and 20.51% for the same networks trained without stochastic binarization. Moreover, we also investigated learning binary weights in deep residual networks and demonstrate, for the first time, that networks using binary weights at test time can match full-precision networks on CIFAR-10, with both achieving a 4.5% error rate using a wide residual network with 20 weight layers. On CIFAR-100, binary weights at test time gave an error rate of 22.28%, within 2% of the full-precision case.
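
    The key mechanism, binarizing stored activations stochastically so they remain correct in expectation, is easy to sketch. The NumPy toy below (the two-layer setup, layer sizes, and [0, 1] activation scaling are illustrative assumptions, not the paper's configuration) shows where the memory saving comes from:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binarize(h):
    # Each unit becomes 1 with probability equal to its activation
    # (assumed scaled into [0, 1]), so the binarized tensor equals the
    # original in expectation but needs only 1 bit per unit.
    return (rng.random(h.shape) < h).astype(h.dtype)

# Toy two-layer network: x -> W1 -> h -> W2 -> y.
x = rng.random((4, 8)).astype(np.float32)
W1 = 0.1 * rng.standard_normal((8, 16)).astype(np.float32)
W2 = 0.1 * rng.standard_normal((16, 3)).astype(np.float32)
h = np.clip(x @ W1, 0.0, 1.0)      # hidden activations, bounded to [0, 1]

# Once the forward pass has consumed h, keep only its 1-bit binarization
# for the backward pass (~32x less activation memory than float32).
h_bin = stochastic_binarize(h)

# Backward pass: the gradient for W2 normally uses h; substituting h_bin
# computes the gradient imprecisely but with far less stored state.
grad_y = rng.standard_normal((4, 3)).astype(np.float32)
grad_W2 = h_bin.T @ grad_y
```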

    A programmable axonal propagation delay circuit for time-delay spiking neural networks

    We present an implementation of a programmable axonal propagation delay circuit that uses a single first-order log-domain low-pass filter. Delays can be programmed in the 5-50 ms range. The circuit is designed as a building block for time-delay spiking neural networks, and consists of a leaky integrate-and-fire core, a spike generator circuit, and a delay adaptation circuit.
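
    A minimal discrete-time sketch of how a first-order low-pass filter can realise a programmable delay, under assumed conventions (an input spike resets the filter state to 1, and the delayed spike is emitted when the decaying output crosses a threshold theta; theta, the reset convention, and the Euler integration are illustrative, not the circuit's actual behaviour):

```python
import numpy as np

def delayed_spike_times(spike_times, tau, theta=0.2, dt=1e-4, t_max=0.2):
    # The delay is approximately tau * ln(1 / theta), so it is tuned via
    # the filter time constant (the bias current, in a log-domain circuit).
    spikes = set(np.round(np.asarray(spike_times) / dt).astype(int))
    v, armed, out = 0.0, False, []
    for k in range(int(t_max / dt)):
        if k in spikes:
            v, armed = 1.0, True      # input spike resets the filter state
        v -= (dt / tau) * v           # first-order low-pass decay
        if armed and v < theta:
            out.append(k * dt)        # delayed output spike
            armed = False
    return out

# Delays in a 5-50 ms range correspond to choosing tau accordingly:
print(delayed_spike_times([0.01], tau=0.02))   # delay ~ 0.02 * ln(5) ~ 32 ms
```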

    Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems

    Batch-normalization (BN) layers are considered an integral component of today's state-of-the-art deep convolutional neural networks for computer vision tasks such as classification and detection. However, BN layers introduce complexity and computational overhead that is highly undesirable for training and/or inference on low-power custom hardware implementations of real-time embedded vision systems such as UAVs, robots, and Internet-of-Things (IoT) devices. They are also problematic when batch sizes must be very small during training, and innovations introduced more recently than BN layers, such as residual connections, may have lessened their impact. In this paper we aim to quantify the benefits BN layers offer in image classification networks, in comparison with alternative choices. In particular, we study networks that use shifted-ReLU layers instead of BN layers. From experiments with wide residual networks applied to the ImageNet, CIFAR-10, and CIFAR-100 image classification datasets, we found that BN layers do not consistently offer a significant advantage, and that the accuracy margin they provide depends on the dataset, the network size, and the bit-depth of the weights. We conclude that where BN layers are undesirable due to speed, memory, or complexity costs, shifted-ReLU layers should be considered instead; we found they can offer advantages in all these areas, and often impose no significant accuracy cost.
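
    As a concrete reference point, a shifted-ReLU layer is just a ReLU whose clamp sits below zero; the shift value of -1 below is an illustrative assumption rather than the value used in the paper:

```python
import numpy as np

def shifted_relu(x, shift=-1.0):
    # f(x) = max(x, shift): identical to ReLU but clamping at `shift`
    # instead of 0. Unlike batch normalization it needs no batch
    # statistics, no extra parameters, and no division, which suits
    # low-power fixed-point hardware.
    return np.maximum(x, shift)

x = np.linspace(-3, 3, 7)
print(shifted_relu(x))   # [-1. -1. -1.  0.  1.  2.  3.]
```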

    Live demonstration: an FPGA-based emulation of an event-based vision sensor using a commercially available camera

    We will demonstrate an FPGA implementation of an event-based vision sensor using a commercially available frame-based camera. The demonstration setup consists of a host PC running the Quartus Prime software, which is used to program and configure the FPGA; a commercially available 8-megapixel MIPI (Mobile Industry Processor Interface) camera kit; and a Cyclone V DE10-Nano FPGA board, as shown in Fig. 1. The camera kit has previously been used to capture conventional frame-based images [1]. It is mounted on the FPGA board via the board's 2x20-pin general-purpose input/output port connector. The FPGA board processes the digital pixel data received from the camera and generates events, which are displayed on a VGA monitor with predefined colors for each event type.
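
    A hypothetical software analogue of the event-generation step (the FPGA pipeline itself is not detailed here; the log-intensity contrast threshold is the conventional DVS-style rule and is assumed, as is the threshold value):

```python
import numpy as np

def frame_to_events(prev, curr, theta=0.15):
    # Emit an ON event where log intensity rose by more than theta and an
    # OFF event where it fell by more than theta, mimicking the contrast
    # threshold of an event-based sensor. Returns (rows, cols, polarity).
    eps = 1e-6
    dlog = np.log(curr + eps) - np.log(prev + eps)
    on, off = dlog > theta, dlog < -theta
    rows, cols = np.nonzero(on | off)
    polarity = np.where(on[rows, cols], 1, -1)
    return rows, cols, polarity

rng = np.random.default_rng(0)
f0 = rng.random((4, 4)).astype(np.float32)
f1 = f0.copy()
f1[1, 2] *= 2.0                      # brighten one pixel between frames
print(frame_to_events(f0, f1))       # one ON event at pixel (1, 2)
```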

    A compact reconfigurable mixed-signal implementation of synaptic plasticity in spiking neurons

    We present a compact mixed-signal implementation of synaptic plasticity supporting both Spike Timing Dependent Plasticity (STDP) and Spike Timing Dependent Delay Plasticity (STDDP). The proposed implementation consists of an aVLSI time window generator and a digital adaptor. The weight and delay values are stored in digital memory, and the adaptor sends these values to the time window generator as a digital spike whose duration is modulated according to the stored value. The analogue time window generator then generates the time window required for implementing STDP and STDDP, and the digital adaptor carries out the weight/delay adaptation using this window. The aVLSI time window generator is compact (50 μm² in the IBM 130 nm process), and we use a time-multiplexing approach to achieve up to 65536 (64k) virtual digital adaptors with one physical adaptor, consuming only a fraction of the hardware resources on a Virtex 6 FPGA. Since the digital adaptor is implemented on an FPGA, it can easily be reconfigured for different adaptation algorithms, leaving it open for future development. Our mixed-signal implementation is therefore practical for implementing synaptic plasticity in large-scale spiking neural networks running in real time. We show circuit simulation results illustrating both weight and delay adaptation.
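
    For orientation, pair-based STDP with exponential time windows, the role the aVLSI time window generator plays here, can be written as follows; all constants are illustrative assumptions:

```python
import numpy as np

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=0.020):
    # Potentiate when the pre-synaptic spike precedes the post-synaptic
    # spike, depress otherwise; the exp(-|dt| / tau) factor is the
    # exponentially decaying time window.
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * np.exp(-dt / tau)    # pre before post: potentiation
    return -a_minus * np.exp(dt / tau)       # post before pre: depression

print(stdp_dw(0.000, 0.005))   # ~ +0.0078
print(stdp_dw(0.005, 0.000))   # ~ -0.0093
```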

    An FPGA design framework for large-scale spiking neural networks

    We present an FPGA design framework for large-scale spiking neural networks, particularly those with a high density of connections or all-to-all connectivity. The proposed framework is based on a reconfigurable neural layer implemented using a time-multiplexing approach, achieving up to 200,000 virtual neurons with one physical neuron while using only a fraction of the hardware resources of commercial off-the-shelf FPGAs (even entry-level ones). Rather than using an abstract mathematical model, we efficiently implemented the physical neuron with a conductance-based model whose parameters are randomised between neurons to emulate the variance of biological neurons. In addition to these building blocks, the proposed time-multiplexed reconfigurable neural layer has an address buffer that generates a fixed random weight on the fly for each incoming spike's connection, effectively reducing memory usage. After presenting the architecture of the proposed neural layer, we present a network of 23 such layers, each containing 64k neurons, yielding 1.5M neurons and 92G synapses with a total spike throughput of 1.2T spikes/s, running in real time on a Virtex 6 FPGA.
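
    A rough software sketch of the two ideas, time-multiplexing one update rule across many stored neuron states and regenerating connection weights on the fly instead of storing them; a simple leaky integrate-and-fire update stands in for the paper's conductance-based model, and all constants are assumptions:

```python
import numpy as np

def step_layer(v, spikes_in, leak=0.95, w_scale=0.1, v_th=1.0, seed=42):
    # One time-multiplexed update of a layer of virtual neurons: a single
    # update rule (the one "physical neuron") is applied in turn to each
    # stored membrane state. Weights are regenerated from a fixed seed
    # every step, never stored, as the paper's address buffer does, so
    # memory holds only one state word per virtual neuron.
    rng = np.random.default_rng(seed)      # fixed seed => same weights each step
    w = w_scale * rng.random(v.shape)      # generated on the fly
    v = leak * v + w * spikes_in
    fired = v >= v_th
    v[fired] = 0.0                         # reset after spike
    return v, fired

v = np.zeros(8)                            # 8 virtual neurons, 1 physical rule
for _ in range(30):
    v, fired = step_layer(v, spikes_in=np.ones(8))
    if fired.any():
        print("spikes at neurons", np.nonzero(fired)[0])
```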

    A generalised conductance-based silicon neuron for large-scale spiking neural networks

    We present an analogue Very Large Scale Integration (aVLSI) implementation that uses first-order log-domain low-pass filters to implement a generalised conductance-based silicon neuron. It consists of a single synapse capable of linearly summing the excitatory and inhibitory post-synaptic currents (EPSCs and IPSCs) generated by spikes arriving from different sources; a soma with a positive feedback circuit, a refractory period, and a spike-frequency adaptation circuit; and a high-speed synchronous Address Event Representation (AER) handshaking circuit. To increase programmability, the inputs to the neuron are digital spikes whose durations are modulated according to their weights. The proposed neuron is compact (∼170 μm² in the IBM 130 nm process), making our aVLSI generalised conductance-based neuron practical for large-scale reconfigurable spiking neural networks running in real time. Circuit simulations show that this neuron can emulate the different spiking behaviours observed in biological neurons.
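
    A discrete-time behavioural sketch of the signal path described above (first-order filtered EPSC/IPSC contributions summed linearly and driving a leaky soma); the Euler integration, the constants, and the omission of the refractory and adaptation circuits are all simplifying assumptions:

```python
import numpy as np

def simulate(exc, inh, tau_syn=0.005, tau_m=0.02, dt=1e-4,
             w_exc=0.5, w_inh=0.3, v_th=1.0):
    # One synapse linearly sums excitatory and inhibitory contributions;
    # a first-order low-pass filter shapes them into EPSC/IPSC, which
    # drive a leaky soma with threshold-and-reset. Weights enter as pulse
    # amplitudes, standing in for the circuit's pulse-width modulation.
    i_syn, v, out = 0.0, 0.0, []
    for k in range(len(exc)):
        i_syn += w_exc * exc[k] - w_inh * inh[k]
        i_syn -= (dt / tau_syn) * i_syn    # first-order low-pass decay
        v += (dt / tau_m) * (i_syn - v)    # soma integrates the summed current
        if v >= v_th:
            out.append(k * dt)             # output spike
            v = 0.0                        # reset (refractory omitted)
    return out

steps = 1000                                       # 100 ms at dt = 0.1 ms
exc = (np.arange(steps) % 50 == 0).astype(float)   # excitatory spike every 5 ms
inh = np.zeros(steps)
print(len(simulate(exc, inh)), "output spikes in 100 ms")
```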

    A mixed-signal implementation of a polychronous spiking neural network with delay adaptation

    We present a mixed-signal implementation of a reconfigurable polychronous spiking neural network capable of storing and recalling spatio-temporal patterns. The proposed network contains one neuron array and one axon array, and Spike Timing Dependent Delay Plasticity is used to fine-tune delays and add dynamics to the network. In our mixed-signal implementation, the neurons and axons have been implemented as both analogue and digital circuits: the system consists of an FPGA, containing the digital neuron array and the digital axon array, and an analogue IC, containing the analogue neuron array and the analogue axon array, and it can easily be configured to use any combination of the two. We present and discuss experimental results for all combinations of the analogue and digital axon arrays with the analogue and digital neuron arrays. The test results show that the proposed neural network successfully recalls more than 85% of stored patterns using both analogue and digital circuits.
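
    The delay-adaptation rule can be sketched abstractly: nudge each axonal delay until the delayed pre-synaptic spike coincides with the post-synaptic firing time. The step size and delay bounds below are illustrative assumptions:

```python
def stddp_update(delay, t_arrival, t_post, step=0.001,
                 d_min=0.005, d_max=0.050):
    # If the delayed pre-synaptic spike arrived before the post-synaptic
    # neuron fired, lengthen the delay; if after, shorten it. Over
    # repeated presentations the delays lock onto the timing structure
    # of the stored spatio-temporal pattern.
    delay += step if t_post > t_arrival else -step
    return min(max(delay, d_min), d_max)

# The arrival is 3 ms too early, so the delay grows toward coincidence:
d = 0.020
for _ in range(5):
    d = stddp_update(d, t_arrival=0.010, t_post=0.013)
print(d)   # ~0.025
```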

    A compact neural core for digital implementation of the Neural Engineering Framework

    The Neural Engineering Framework (NEF) is a tool capable of synthesising large-scale cognitive systems from subnetworks, and it has been used to construct SPAUN, the first brain model capable of performing cognitive tasks. The NEF has been implemented on computers in high-level programming languages; however, the software model runs much slower than real time and is therefore unsuitable for applications that need real-time control, such as interactive robotic systems. Here we present a compact neural core for digital implementation of the NEF on Field Programmable Gate Arrays (FPGAs) in real time. The proposed digital neural core consists of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. As the NEF intrinsically uses a rate-encoding paradigm, rather than implementing spiking neurons and then measuring their firing rates, we chose to implement neurons that compute their firing rates directly. Each neuron is efficiently implemented using a 9-bit fixed-point multiplier and requires no memory, memory bandwidth being the bottleneck for the time-multiplexing approach. The neural core uses only a fraction of the hardware resources of a commercial off-the-shelf FPGA (even an entry-level one) and can easily be programmed for different mathematical computations. Multiple cores can easily be combined to build real-time, large-scale cognitive neural networks using the Neural Engineering Framework.
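
    The rate-based NEF computation such a core implements can be sketched as follows; the rectified-linear response curve and least-squares decoder solution are standard NEF practice, but the core's exact response curve and its 9-bit fixed-point arithmetic are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64                                  # one core: 64 time-multiplexed neurons

# Random encoders, gains, and biases, as in a standard NEF ensemble.
enc = rng.choice([-1.0, 1.0], size=N)   # 1-D encoders
gain = rng.uniform(0.5, 2.0, size=N)
bias = rng.uniform(-1.0, 1.0, size=N)

def rates(x):
    # Firing rates computed directly from the response curve (here a
    # rectified-linear curve for simplicity), with no spike simulation.
    return np.maximum(gain * (enc * x) + bias, 0.0)

# Solve for decoders that make the ensemble compute f(x) = x**2.
xs = np.linspace(-1.0, 1.0, 41)
A = np.array([rates(x) for x in xs])    # activity matrix, one row per x
dec = np.linalg.lstsq(A, xs ** 2, rcond=None)[0]

print(rates(0.5) @ dec)                 # decoded f(0.5); should be near 0.25
```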

    A neuromorphic implementation of multiple spike-timing synaptic plasticity rules for large-scale neural networks

    We present a neuromorphic implementation of multiple synaptic plasticity learning rules, including both Spike Timing Dependent Plasticity (STDP) and Spike Timing Dependent Delay Plasticity (STDDP). We present a fully digital implementation as well as a mixed-signal implementation, both of which use a novel dynamic-assignment time-multiplexing approach and support up to 2^26 (64M) synaptic plasticity elements. Rather than implementing dedicated synapses for particular types of synaptic plasticity, we implemented a more generic synaptic plasticity adaptor array that is separate from the neurons in the neural network. Each adaptor performs synaptic plasticity according to the arrival times of the pre- and post-synaptic spikes assigned to it, and sends a weighted or delayed pre-synaptic spike to the post-synaptic neuron in the neural network. This strategy provides great flexibility for building complex large-scale neural networks, as a network can be configured for multiple synaptic plasticity rules without changing its structure. We validate the proposed neuromorphic implementations with measurement results, illustrating that the circuits are capable of performing both STDP and STDDP. We argue that it is practical to scale the work presented here up to 2^36 (64G) synaptic adaptors on a current high-end FPGA platform.
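
    The dynamic-assignment idea, a small pool of physical adaptors handed out to synapses only while they have spike timings to process, might be sketched like this (the class and its interface are hypothetical, purely to illustrate the assignment strategy):

```python
class AdaptorPool:
    # A small pool of physical plasticity adaptors is assigned on demand
    # to whichever synapses currently have spike timings to process, so
    # the number of virtual adaptors can far exceed the physical count.
    def __init__(self, n_physical):
        self.free = list(range(n_physical))
        self.assigned = {}             # virtual synapse id -> physical adaptor

    def on_spike(self, synapse_id):
        if synapse_id not in self.assigned:
            # A real implementation must handle pool exhaustion; this
            # sketch assumes a free adaptor is always available.
            self.assigned[synapse_id] = self.free.pop()
        return self.assigned[synapse_id]

    def release(self, synapse_id):
        self.free.append(self.assigned.pop(synapse_id))

pool = AdaptorPool(n_physical=4)
print(pool.on_spike(123456))           # virtual synapse gets adaptor 3
pool.release(123456)                   # adaptor returns to the free pool
```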