Dynamic Power Management for Neuromorphic Many-Core Systems
This work presents a dynamic power management architecture for neuromorphic
many-core systems such as SpiNNaker. A fast dynamic voltage and frequency
scaling (DVFS) technique is presented which allows the processing elements (PE)
to change their supply voltage and clock frequency individually and
autonomously within less than 100 ns. This is employed by the neuromorphic
simulation software flow, which defines the performance level (PL) of the PE
based on the actual workload within each simulation cycle. A test chip in 28 nm
SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled
from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct
PLs. Measurements of three neuromorphic benchmarks show that the total
PE power consumption can be reduced by 75%, with 80% baseline power reduction
and a 50% reduction of energy per neuron and synapse computation, all while
maintaining temporary peak system performance to achieve biological real-time
operation of the system. A numerical model of this power management scheme is
derived, which allows DVFS architecture exploration for neuromorphics. The
proposed technique is to be used for the second-generation SpiNNaker
neuromorphic many-core system.
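
The workload-based choice of performance level described above can be illustrated with a minimal Python sketch. It is not the authors' implementation or numerical model. The two end points (0.7 V at 125 MHz, 1.0 V at 500 MHz) come from the abstract; the middle level, the 1 ms simulation tick, the names `PerformanceLevel`, `select_level` and `relative_dynamic_energy`, and the simple V^2 scaling of dynamic energy per clock cycle are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceLevel:
    name: str
    voltage_v: float
    freq_mhz: float


# End points are taken from the abstract; the middle level is a
# hypothetical placeholder, since the abstract does not state it.
LEVELS = (
    PerformanceLevel("PL1", 0.7, 125.0),
    PerformanceLevel("PL2", 0.85, 250.0),  # assumed values
    PerformanceLevel("PL3", 1.0, 500.0),
)


def select_level(cycles_needed, tick_us=1000.0, levels=LEVELS):
    """Pick the lowest level that finishes the workload within one tick.

    tick_us is an assumed simulation cycle length in microseconds.
    Falls back to the highest level if no level is fast enough.
    """
    for level in levels:
        cycles_available = level.freq_mhz * tick_us  # MHz * us = cycles
        if cycles_needed <= cycles_available:
            return level
    return levels[-1]


def relative_dynamic_energy(level, reference=LEVELS[-1]):
    """Dynamic energy per clock cycle relative to the reference level.

    Assumes the textbook approximation E ~ C * V^2 with equal switched
    capacitance; leakage (baseline power) is ignored.
    """
    return (level.voltage_v / reference.voltage_v) ** 2
```

Under these assumptions, a processing element that needs 100,000 clock cycles in a tick would stay at the lowest level, while one that needs 400,000 cycles would switch to the highest level for that tick only.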
Implementation of bioinspired algorithms on the neuromorphic VLSI system SpiNNaker 2
It is believed that neuromorphic hardware will accelerate neuroscience research and enable the next generation of edge AI. Conversely, brain-inspired algorithms are expected to run efficiently on neuromorphic hardware. Neither happens automatically, however. Bringing hardware and algorithms together efficiently requires optimizations grounded in an understanding of both sides. In this work, software frameworks and optimizations for efficient implementation of neural network-based algorithms on SpiNNaker 2 are proposed, resulting in optimized power consumption, memory footprint and computation time. First, a software framework including power management strategies is proposed to apply dynamic voltage and frequency scaling (DVFS) to the simulation of spiking neural networks; it is also the first-ever software framework running a neural network on SpiNNaker 2. The results show that power consumption is reduced by 60.7% in the synfire chain benchmark. Second, numerical optimizations and data structure optimizations lead to an efficient implementation of reward-based synaptic sampling, which is one of the most complex plasticity algorithms ever implemented on neuromorphic hardware. The results show a reduction of computation time by a factor of 2 and of energy consumption by 62%. Third, software optimizations are proposed which effectively exploit the efficiency of the multiply-accumulate array and the flexibility of the ARM core. Compared with Loihi, this yields 3 times faster inference and 5 times lower energy consumption in a keyword spotting benchmark, and faster inference and lower energy consumption for an adaptive control benchmark in high-dimensional cases. The results of this work demonstrate the potential of SpiNNaker 2, explore its range of applications and also provide feedback for the design of the next generation of neuromorphic hardware.
NengoFPGA: an FPGA Backend for the Nengo Neural Simulator
Low-power, high-speed neural networks are critical for providing deployable embedded AI
applications at the edge. We describe a Xilinx FPGA implementation of Neural Engineering
Framework (NEF) networks with online learning that outperforms mobile Nvidia GPU
implementations by an order of magnitude or more. Specifically, we provide an embedded
Python-capable PYNQ FPGA implementation supported with a Xilinx Vivado High-Level
Synthesis (HLS) workflow that allows sub-millisecond implementation of adaptive neural
networks with low-latency, direct I/O access to the physical world. The outcome of this
work is NengoFPGA, a seamless and user-friendly extension to the neural compiler Python
package Nengo. To reduce memory requirements and improve performance we tune the
precision of the different intermediate variables in the code to achieve competitive absolute
accuracy against slower and larger floating-point reference designs. The online learning
component of the neural network exploits immediate feedback to adjust the network weights
to best support a given arithmetic precision. As the space of possible design configurations
of such quantized networks is vast and is subject to a target accuracy constraint, we use
the Hyperopt hyper-parameter tuning tool instead of manual search to find Pareto optimal
designs. Specifically, we are able to generate the optimized designs in under 500 short
iterations of Vivado HLS C synthesis before running the complete Vivado place-and-route
phase on that subset, a much longer process not conducive to rapid exploration. For neural
network populations of 64–4096 neurons and 1–8 representational dimensions our optimized
FPGA implementation generated by Hyperopt has a speedup of 10–484× over a competing
cuBLAS implementation on the Jetson TX1 GPU while using 2.4–9.5× less power. Our
speedups are a result of HLS-specific reformulation (15× improvement), precision adaptation
(3× improvement), and low-latency direct I/O access (1000× improvement).
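
The idea of tuning arithmetic precision against a target accuracy constraint can be sketched in a few lines of Python. This is an illustration only: it replaces the Hyperopt search and the Vivado HLS synthesis loop described above with a plain exhaustive sweep over one parameter, and the helper names `quantize`, `max_abs_error` and `smallest_precision`, the sample signal, and the error target are all hypothetical.

```python
import math


def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (fixed point)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def max_abs_error(values, frac_bits):
    """Largest absolute quantization error over the given values."""
    return max(abs(v - quantize(v, frac_bits)) for v in values)


def smallest_precision(values, target_error, max_frac_bits=24):
    """Return the fewest fractional bits that meet target_error, else None.

    A brute-force stand-in for a hyper-parameter search: the real design
    space has many intermediate variables, this sweep has only one.
    """
    for frac_bits in range(1, max_frac_bits + 1):
        if max_abs_error(values, frac_bits) <= target_error:
            return frac_bits
    return None


# Hypothetical reference signal standing in for floating-point outputs.
samples = [math.sin(0.1 * i) for i in range(64)]
bits = smallest_precision(samples, target_error=1e-3)
```

In a real flow each candidate configuration would also carry a resource and latency cost, so the search returns a set of Pareto-optimal designs and not a single bit width.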