105 research outputs found

    Asynchronous spiking neurons, the natural key to exploit temporal sparsity

    Get PDF
    Inference of Deep Neural Networks for stream signal (Video/Audio) processing in edge devices is still challenging. Unlike the most state of the art inference engines which are efficient for static signals, our brain is optimized for real-time dynamic signal processing. We believe one important feature of the brain (asynchronous state-full processing) is the key to its excellence in this domain. In this work, we show how asynchronous processing with state-full neurons allows exploitation of the existing sparsity in natural signals. This paper explains three different types of sparsity and proposes an inference algorithm which exploits all types of sparsities in the execution of already trained networks. Our experiments in three different applications (Handwritten digit recognition, Autonomous Steering and Hand-Gesture recognition) show that this model of inference reduces the number of required operations for sparse input data by a factor of one to two orders of magnitudes. Additionally, due to fully asynchronous processing this type of inference can be run on fully distributed and scalable neuromorphic hardware platforms

    Algorithm and Architecture Co-design for High-performance Digital Signal Processing.

    Full text link
    CMOS scaling has been the driving force behind the revolution of digital signal processing (DSP) systems, but scaling is slowing down and the CMOS device is approaching its fundamental scaling limit. At the same time, DSP algorithms are continuing to evolve, so there is a growing gap between the increasing complexities of the algorithms and what is practically implementable. The gap can be bridged by exploring the synergy between algorithm and hardware design, using the so-called co-design techniques. In this thesis, algorithm and architecture co-design techniques are applied to X-ray computed tomography (CT) image reconstruction. Analysis of fixed-point quantization and CT geometry identifies an optimal word length and a mismatch between the object and projection grids. A water-filling buffer is designed to resolve the grid mismatch, and is combined with parallel fixed-point arithmetic units to improve the throughput. The analysis eventually leads to an out-of-order scheduling architecture that reduces the off-chip memory access by three orders of magnitude. The co-design techniques are further applied to the design of neural networks for sparse coding. Analysis of the neuron spiking dynamics leads to the optimal tuning of network size, spiking rate, and update step size to keep the spiking sparse. The resulting sparsity enables a bus-ring architecture to achieve both high throughput and scalability. A 65nm CMOS chip implementing the architecture demonstrates feature extraction at a throughput of 1.24G pixel/s at 1.0V and 310MHz. The error tolerance of sparse coding can be exploited to enhance the energy efficiency. As a natural next step after the sparse coding chip, a neural-inspired inference module (IM) is designed for object recognition. The object recognition chip consists of an IM based on sparse coding and an event-driven classifier. A learning co-processor is integrated on chip to enable on-chip learning. The throughput and energy efficiency are further improved using architectural techniques including sub-dividing the IM and classifier into modules and optimal pipelining. The result is a 65nm CMOS chip that performs sparse coding at 10.16G pixel/s at 1.0V and 635MHz. The co-design techniques can be applied to the design of other advanced DSP algorithms for emerging applications.PhDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113344/1/jungkook_1.pd

    Efficient Implementation of Stochastic Inference on Heterogeneous Clusters and Spiking Neural Networks

    Get PDF
    Neuromorphic computing refers to brain inspired algorithms and architectures. This paradigm of computing can solve complex problems which were not possible with traditional computing methods. This is because such implementations learn to identify the required features and classify them based on its training, akin to how brains function. This task involves performing computation on large quantities of data. With this inspiration, a comprehensive multi-pronged approach is employed to study and efficiently implement neuromorphic inference model using heterogeneous clusters to address the problem using traditional Von Neumann architectures and by developing spiking neural networks (SNN) for native and ultra-low power implementation. In this regard, an extendable high-performance computing (HPC) framework and optimizations are proposed for heterogeneous clusters to modularize complex neuromorphic applications in a distributed manner. To achieve best possible throughput and load balancing for such modularized architectures a set of algorithms are proposed to suggest the optimal mapping of different modules as an asynchronous pipeline to the available cluster resources while considering the complex data dependencies between stages. On the other hand, SNNs are more biologically plausible and can achieve ultra-low power implementation due to its sparse spike based communication, which is possible with emerging non-Von Neumann computing platforms. As a significant progress in this direction, spiking neuron models capable of distributed online learning are proposed. A high performance SNN simulator (SpNSim) is developed for simulation of large scale mixed neuron model networks. An accompanying digital hardware neuron RTL is also proposed for efficient real time implementation of SNNs capable of online learning. Finally, a methodology for mapping probabilistic graphical model to off-the-shelf neurosynaptic processor (IBM TrueNorth) as a stochastic SNN is presented with ultra-low power consumption

    Inference And Learning In Spiking Neural Networks For Neuromorphic Systems

    Get PDF
    Neuromorphic computing is a computing field that takes inspiration from the biological and physical characteristics of the neocortex system to motivate a new paradigm of highly parallel and distributed computing to take on the demands of the ever-increasing scale and computational complexity of machine intelligence esp. in energy-limited systems such as Edge devices, Internet-of-Things (IOT), and cyber physical systems (CPS). Spiking neural network (SNN) is often studied together with neuromorphic computing as the underlying computational model . Similar to the biological neural system, SNN is an inherently dynamic and stateful network. The state and output of SNN do not only dependent on the current input, but also dependent on the history information. Another distinct property of SNN is that the information is represented, transmitted, and processed as discrete spike events, also referred to as action potentials. All the processing happens in the neurons such that the computation itself is massively distributed and parallel. This enables low power information transmission and processing. However, it is inefficient to implement SNNs on traditional Von Neumann architecture due to the performance gap between memory and processor. This has led to the advent of energy-efficient large-scale neuromorphic hardware such as IBM\u27s TrueNorth and Intel\u27s Loihi that enables low power implementation of large-scale neural networks for real-time applications. And although spiking networks have theoretically been shown to have Turing-equivalent computing power, it remains a challenge to train deep SNNs; the threshold functions that generate spikes are discontinuous, so they do not have derivatives and cannot directly utilize gradient-based optimization algorithms for training. Biologically plausible learning mechanism spike-timing-dependent plasticity (STDP) and its variants are local in synapses and time but are unstable during training and difficult to train multi-layer SNNs. To better exploit the energy-saving features such as spike domain representation and stochastic computing provided by SNNs in neuromorphic hardware, and to address the hardware limitations such as limited data precision and neuron fan-in/fan-out constraints, it is necessary to re-design a neural network including its structure and computing. Our work focuses on low-level (activations, weights) and high-level (alternative learning algorithms) redesign techniques to enable inference and learning with SNNs in neuromorphic hardware. First, we focused on transforming a trained artificial neural network (ANN) to a form that is suitable for neuromorphic hardware implementation. Here, we tackle transforming Long Short-Term Memory (LSTM), a version of recurrent neural network (RNN) which includes recurrent connectivity to enable learning long temporal patterns. This is specifically a difficult challenge due to the inherent nature of RNNs and SNNs; the recurrent connectivity in RNNs induces temporal dynamics which require synchronicity, especially with the added complexity of LSTMs; and SNNs are asynchronous in nature. In addition, the constraints of the neuromorphic hardware provided a massive challenge for this realization. Thus, in this work, we invented a store-and-release circuit using integrate-and-fire neurons which allows the synchronization and then developed modules using that circuit to replicate various parts of the LSTM. These modules enabled implementation of LSTMs with spiking neurons on IBM’s TrueNorth Neurosynaptic processor. This is the first work to realize such LSTM networks utilizing spiking neurons and implement on a neuromorphic hardware. This opens avenues for the use of neuromorphic hardware in applications involving temporal patterns. Moving from mapping a pretrained ANN, we work on training networks on the neuromorphic hardware. Here, we first looked at the biologically plausible learning algorithm called STDP which is a Hebbian learning rule for learning without supervision. Simplified computational interpretations of STDP is either unstable and/or complex such that it is costly to implement on hardware. Thus, in this work, we proposed a stable version of STDP and applied intentional approximations for low-cost hardware implementation called Quantized 2-Power Shift (Q2PS) rule. With this version, we performed both unsupervised learning for feature extraction and supervised learning for classification in a multilayer SNN to achieve comparable to better accuracy on MNIST dataset compared to manually labelled two-layered networks. Next, we approached training multilayer SNNs on a neuromorphic hardware with backpropagation, a gradient-based optimization algorithm that forms the backbone of deep neural networks (DNN). Although STDP is biologically plausible, its not as robust for learning deep networks as backpropagation is for DNNs. However, backpropagation is not biologically plausible and not suitable to be directly applied to SNNs, neither can it be implemented on a neuromorphic hardware. Thus, in the first part of this work, we devise a set of approximations to transform backprogation to the spike domain such that it is suitable for SNNs. After the set of approximations, we adapted the connectivity and weight update rule in backpropagation to enable learning solely based on the locally available information such that it resembled a rate-based STDP algorithm. We called this Error-Modulated STDP (EMSTDP). In the next part of this work, we implemented EMSTDP on Intel\u27s Loihi neuromorphic chip to realize online in-hardware supervised learning of deep SNNs. This is the first realization of a fully spike-based approximation of backpropagation algorithm implemented on a neuromorphic processor. This is the first step towards building an autonomous machine that learns continuously from its environment and experiences

    Inference and Learning in Spiking Neural Networks for Neuromorphic Systems

    Get PDF
    Neuromorphic computing is a computing field that takes inspiration from the biological and physical characteristics of the neocortex system to motivate a new paradigm of highly parallel and distributed computing to take on the demands of the ever-increasing scale and computational complexity of machine intelligence esp. in energy-limited systems such as Edge devices, Internet-of-Things (IOT), and cyber physical systems (CPS). Spiking neural network (SNN) is often studied together with neuromorphic computing as the underlying computational model . Similar to the biological neural system, SNN is an inherently dynamic and stateful network. The state and output of SNN do not only dependent on the current input, but also dependent on the history information. Another distinct property of SNN is that the information is represented, transmitted, and processed as discrete spike events, also referred to as action potentials. All the processing happens in the neurons such that the computation itself is massively distributed and parallel. This enables low power information transmission and processing. However, it is inefficient to implement SNNs on traditional Von Neumann architecture due to the performance gap between memory and processor. This has led to the advent of energy-efficient large-scale neuromorphic hardware such as IBM\u27s TrueNorth and Intel\u27s Loihi that enables low power implementation of large-scale neural networks for real-time applications. And although spiking networks have theoretically been shown to have Turing-equivalent computing power, it remains a challenge to train deep SNNs; the threshold functions that generate spikes are discontinuous, so they do not have derivatives and cannot directly utilize gradient-based optimization algorithms for training. Biologically plausible learning mechanism spike-timing-dependent plasticity (STDP) and its variants are local in synapses and time but are unstable during training and difficult to train multi-layer SNNs. To better exploit the energy-saving features such as spike domain representation and stochastic computing provided by SNNs in neuromorphic hardware, and to address the hardware limitations such as limited data precision and neuron fan-in/fan-out constraints, it is necessary to re-design a neural network including its structure and computing. Our work focuses on low-level (activations, weights) and high-level (alternative learning algorithms) redesign techniques to enable inference and learning with SNNs in neuromorphic hardware. First, we focused on transforming a trained artificial neural network (ANN) to a form that is suitable for neuromorphic hardware implementation. Here, we tackle transforming Long Short-Term Memory (LSTM), a version of recurrent neural network (RNN) which includes recurrent connectivity to enable learning long temporal patterns. This is specifically a difficult challenge due to the inherent nature of RNNs and SNNs; the recurrent connectivity in RNNs induces temporal dynamics which require synchronicity, especially with the added complexity of LSTMs; and SNNs are asynchronous in nature. In addition, the constraints of the neuromorphic hardware provided a massive challenge for this realization. Thus, in this work, we invented a store-and-release circuit using integrate-and-fire neurons which allows the synchronization and then developed modules using that circuit to replicate various parts of the LSTM. These modules enabled implementation of LSTMs with spiking neurons on IBM\u27s TrueNorth Neurosynaptic processor. This is the first work to realize such LSTM networks utilizing spiking neurons and implement on a neuromorphic hardware. This opens avenues for the use of neuromorphic hardware in applications involving temporal patterns. Moving from mapping a pretrained ANN, we work on training networks on the neuromorphic hardware. Here, we first looked at the biologically plausible learning algorithm called STDP which is a Hebbian learning rule for learning without supervision. Simplified computational interpretations of STDP is either unstable and/or complex such that it is costly to implement on hardware. Thus, in this work, we proposed a stable version of STDP and applied intentional approximations for low-cost hardware implementation called Quantized 2-Power Shift (Q2PS) rule. With this version, we performed both unsupervised learning for feature extraction and supervised learning for classification in a multilayer SNN to achieve comparable to better accuracy on MNIST dataset compared to manually labelled two-layered networks. Next, we approached training multilayer SNNs on a neuromorphic hardware with backpropagation, a gradient-based optimization algorithm that forms the backbone of deep neural networks (DNN). Although STDP is biologically plausible, its not as robust for learning deep networks as backpropagation is for DNNs. However, backpropagation is not biologically plausible and not suitable to be directly applied to SNNs, neither can it be implemented on a neuromorphic hardware. Thus, in the first part of this work, we devise a set of approximations to transform backprogation to the spike domain such that it is suitable for SNNs. After the set of approximations, we adapted the connectivity and weight update rule in backpropagation to enable learning solely based on the locally available information such that it resembled a rate-based STDP algorithm. We called this Error-Modulated STDP (EMSTDP). In the next part of this work, we implemented EMSTDP on Intel\u27s Loihi neuromorphic chip to realize online in-hardware supervised learning of deep SNNs. This is the first realization of a fully spike-based approximation of backpropagation algorithm implemented on a neuromorphic processor. This is the first step towards building an autonomous machine that learns continuously from its environment and experiences
    • …
    corecore