
    Dynamic Power Management for Neuromorphic Many-Core Systems

    This work presents a dynamic power management architecture for neuromorphic many-core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows each processing element (PE) to change its supply voltage and clock frequency individually and autonomously in less than 100 ns. This is employed by the neuromorphic simulation software flow, which sets the performance level (PL) of each PE based on its actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. Measurements of three neuromorphic benchmarks show that total PE power consumption can be reduced by 75%, with an 80% reduction in baseline power and a 50% reduction in energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation. A numerical model of this power management scheme is derived, which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used in the second-generation SpiNNaker neuromorphic many-core system.
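    The per-cycle PL selection can be pictured as a small control loop. The following minimal Python sketch is illustrative only, not the chip's actual interface: the two endpoint operating points mirror the measured test chip, while the intermediate PL, the function names, and the workload-to-cycles mapping are assumptions.

        # Hypothetical per-simulation-cycle DVFS policy for one PE.
        # Endpoint performance levels mirror the test chip's operating
        # range (0.7-1.0 V, 125-500 MHz); the middle PL is invented.
        PERFORMANCE_LEVELS = [
            {"name": "PL0", "voltage_v": 0.70, "freq_mhz": 125},
            {"name": "PL1", "voltage_v": 0.85, "freq_mhz": 250},
            {"name": "PL2", "voltage_v": 1.00, "freq_mhz": 500},
        ]

        def select_pl(workload_cycles: int, budget_us: float) -> dict:
            """Return the lowest PL whose clock still finishes this
            simulation cycle's workload within the real-time budget."""
            for pl in PERFORMANCE_LEVELS:
                runtime_us = workload_cycles / pl["freq_mhz"]  # 1 MHz = 1 cycle/us
                if runtime_us <= budget_us:
                    return pl
            return PERFORMANCE_LEVELS[-1]  # budget unreachable: run at peak

    Because switching completes in under 100 ns, far less than a simulation cycle, such a policy can be re-evaluated every cycle without eating into the compute budget.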

    Implementation of bioinspired algorithms on the neuromorphic VLSI system SpiNNaker 2

    It is believed that neuromorphic hardware will accelerate neuroscience research and enable the next generation of edge AI. Conversely, brain-inspired algorithms are expected to run efficiently on neuromorphic hardware. Neither happens automatically: bringing hardware and algorithms together efficiently requires optimizations informed by an understanding of both sides. In this work, software frameworks and optimizations for the efficient implementation of neural-network-based algorithms on SpiNNaker 2 are proposed, reducing power consumption, memory footprint, and computation time. First, a software framework including power management strategies is proposed to apply dynamic voltage and frequency scaling (DVFS) to the simulation of spiking neural networks; it is also the first software framework to run a neural network on SpiNNaker 2. Results show a 60.7% reduction in power consumption on the synfire chain benchmark. Second, numerical and data structure optimizations lead to an efficient implementation of reward-based synaptic sampling, one of the most complex plasticity algorithms ever implemented on neuromorphic hardware, reducing computation time by a factor of 2 and energy consumption by 62%. Third, software optimizations are proposed which effectively exploit the efficiency of the multiply-accumulate array and the flexibility of the ARM core, yielding, compared with Loihi, 3 times faster inference and 5 times lower energy consumption on a keyword spotting benchmark, and faster inference with lower energy consumption on an adaptive control benchmark in high-dimensional cases. These results demonstrate the potential of SpiNNaker 2, explore its range of applications, and provide feedback for the design of the next generation of neuromorphic hardware.
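    As a rough illustration of how such a framework can hook DVFS into the simulation loop, the Python sketch below estimates a timestep's workload from the number of pending spikes and requests a matching performance level. All names and constants are hypothetical, not the actual SpiNNaker 2 API, and select_pl is reused from the sketch above.

        # Hypothetical per-timestep power-management hook for an SNN run.
        def estimate_workload(n_spikes: int, cycles_per_spike: int = 40,
                              baseline_cycles: int = 5000) -> int:
            """Rough cycle count for one timestep: a fixed neuron-update
            cost plus a per-spike synaptic cost (numbers invented)."""
            return baseline_cycles + n_spikes * cycles_per_spike

        def run_timestep(pe, n_spikes: int, budget_us: float = 1000.0):
            # Pick the cheapest voltage/frequency point that still meets
            # the 1 ms biological real-time budget, then do the work.
            pe.set_performance_level(select_pl(estimate_workload(n_spikes),
                                               budget_us))
            pe.process_neurons_and_synapses()

    In a synfire chain, each PE is busy only while the wave of activity passes through it, so most timesteps can run at the lowest PL; this is the kind of bursty workload where per-timestep DVFS pays off most.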

    NengoFPGA: an FPGA Backend for the Nengo Neural Simulator

    Low-power, high-speed neural networks are critical for deployable embedded AI applications at the edge. We describe a Xilinx FPGA implementation of Neural Engineering Framework (NEF) networks with online learning that outperforms mobile Nvidia GPU implementations by an order of magnitude or more. Specifically, we provide an embedded Python-capable PYNQ FPGA implementation, supported by a Xilinx Vivado High-Level Synthesis (HLS) workflow, that allows sub-millisecond implementation of adaptive neural networks with low-latency, direct I/O access to the physical world. The outcome of this work is NengoFPGA, a seamless and user-friendly extension to the Nengo neural compiler Python package. To reduce memory requirements and improve performance, we tune the precision of the different intermediate variables in the code to achieve competitive absolute accuracy against slower and larger floating-point reference designs. The online learning component of the neural network exploits immediate feedback to adjust the network weights to best support a given arithmetic precision. As the space of possible design configurations of such quantized networks is vast and subject to a target accuracy constraint, we use the Hyperopt hyperparameter tuning tool instead of manual search to find Pareto-optimal designs. Specifically, we generate optimized designs in under 500 short iterations of Vivado HLS C synthesis before running the complete Vivado place-and-route phase, a much longer process not conducive to rapid exploration, on only that subset. For neural network populations of 64–4096 neurons and 1–8 representational dimensions, our Hyperopt-generated FPGA implementation achieves a speedup of 10–484× over a competing cuBLAS implementation on the Jetson TX1 GPU while using 2.4–9.5× less power. Our speedups result from HLS-specific reformulation (15× improvement), precision adaptation (3× improvement), and low-latency direct I/O access (1000× improvement).
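    The precision search can be outlined with Hyperopt's standard fmin/TPE interface. In the Python sketch below, synthesize_and_measure is a hypothetical stand-in for one short Vivado HLS C-synthesis run returning (accuracy, resource_cost); the bit-width bounds and accuracy target are invented.

        # Sketch of accuracy-constrained precision tuning with Hyperopt.
        from hyperopt import fmin, tpe, hp, STATUS_OK

        TARGET_ACCURACY = 0.98  # illustrative absolute-accuracy constraint

        space = {  # fixed-point widths of intermediate variables
            "weight_bits":   hp.quniform("weight_bits", 4, 16, 1),
            "activity_bits": hp.quniform("activity_bits", 4, 16, 1),
            "decoder_bits":  hp.quniform("decoder_bits", 4, 16, 1),
        }

        def objective(cfg):
            accuracy, resource_cost = synthesize_and_measure(cfg)
            # Designs that miss the accuracy target get a large penalty;
            # feasible designs compete on resource cost, steering the
            # search toward the Pareto front.
            if accuracy < TARGET_ACCURACY:
                loss = 1e6 + (TARGET_ACCURACY - accuracy)
            else:
                loss = resource_cost
            return {"loss": loss, "status": STATUS_OK}

        best = fmin(objective, space, algo=tpe.suggest, max_evals=500)

    Only the handful of configurations surviving this fast search then needs the full, much slower Vivado place-and-route run.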