143 research outputs found

    Energy Efficient Spiking Neuromorphic Architectures for Pattern Recognition

    Get PDF
    There is a growing concern over reliability, power consumption, and performance of traditional Von Neumann machines, especially when dealing with complex tasks like pattern recognition. In contrast, the human brain can address such problems with great ease. Brain-inspired neuromorphic computing has attracted much research interest, as it provides an appealing architectural solution to difficult tasks due to its energy efficiency, built-in parallelism, and potential scalability. Meanwhile, the inherent error resilience in neuro-computing allows promising opportunities for leveraging approximate computing for additional energy and silicon area benefits. This thesis focuses on energy efficient neuromorphic architectures which exploit parallel processing and approximate computing for pattern recognition. Firstly, two parallel spiking neural architectures are presented. The first architecture is based on spiking neural network with global inhibition (SNNGI), which integrates digital leaky integrate-and-fire spiking neurons to mimic their biological counterparts and the corresponding on-chip learning circuits for implementing the spiking timing dependent plasticity rules. In order to achieve efficient parallelization, this work addresses a number of critical issues pertaining to memory organization, parallel processing, hardware reuse for different operating modes, as well as the tradeoffs between throughput, area, and power overheads for different configurations. For the application of handwritten digit recognition, a promising training speedup of 13.5x and a recognition speedup of 25.8x over the serial SNNGI architecture are achieved. In spite of the 120MHz operating frequency, the 32-way parallel hardware design demonstrates a 59.4x training speedup over a 2.2GHz general-purpose CPU. Besides the SNNGI, we also propose another architecture based on the liquid state machine (LSM), a recurrent spiking neural network. The LSM architecture is fully parallelized and consists of randomly connected digital neurons in a reservoir and a readout stage, the latter of which is tuned by a bio-inspired learning rule. When evaluated using the TI46 speech benchmark, the FPGA LSM system demonstrates a runtime speedup of 88x over a 2.3GHz AMD CPU. In addition, approximate computing contributes significantly to the overall energy reduction of the proposed architectures. In particular, addition computations occupy a considerable portion of power and area in the neuromorphic systems, especially in the LSM. By exploiting the built-in resilience of neuro-computing, we propose a real-time reconfigurable approximate adder for FPGA implementation to reduce the energy consumption substantially. Although there exist many mature approximate adders, these designs lose their advantages in terms of area, power, and delay on the FPGA platform. Therefore, a novel approximate adder dedicated to the FPGA is necessary. The proposed adder is based on a carry skip model which reduces carry propagation delay and power, and the resulting errors are controlled by a proposed error analysis method. Also, a real-time adjustable precision mechanism is integrated to further reduce dynamic power consumption. Implemented on the Virtex-6 FPGA, it is shown that the proposed adder consumes 18.7% and 32.6% less power than the built-in Xilinx adder in two precision modes, respectively, and that the approximate adder in both modes is 1.32x faster and requires fewer FPGA resources. Besides the adders, the firing-activity based power gating for silent neurons and booth approximate multipliers are also introduced. These three proposed schemes have been applied to our neuromorphic systems. The approximate errors incurred by these schemes have been shown to be negligible, but energy reductions of up to 20% and 30.1% over the exact training computation are achieved for the SNNGI and LSM systems, respectively

    Architectures and Design of VLSI Machine Learning Systems

    Get PDF
    Quintillions of bytes of data are generated every day in this era of big data. Machine learning techniques are utilized to perform predictive analysis on these data, to reveal hidden relationships and dependencies and perform predictions of outcomes and behaviors. The obtained predictive models are used to interpret the existing data and predict new data information. Nowadays, most machine learning algorithms are realized by software programs running on general-purpose processors, which usually takes a huge amount of CPU time and introduces unbelievably high energy consumption. In comparison, a dedicated hardware design is usually much more efficient than software programs running on general-purpose processors in terms of runtime and energy consumption. Therefore, the objective of this dissertation is to develop efficient hardware architectures for mainstream machine learning algorithms, to provide a promising solution to addressing the runtime and energy bottlenecks of machine learning applications. However, it is a really challenging task to map complex machine learning algorithms to efficient hardware architectures. In fact, many important design decisions need to be made during the hardware development for efficient tradeoffs. In this dissertation, a parallel digital VLSI architecture for combined SVM training and classification is proposed. For the first time, cascade SVM, a powerful training algorithm, is leveraged to significantly improve the scalability of hardware-based SVM training and develop an efficient parallel VLSI architecture. The parallel SVM processors provide a significant training time speedup and energy reduction compared with the software SVM algorithm running on a general-purpose CPU. Furthermore, a liquid state machine based neuromorphic learning processor with integrated training and recognition is proposed. A novel theoretical measure of computational power is proposed to facilitate fast design space exploration of the recurrent reservoir. Three low-power techniques are proposed to improve the energy efficiency. Meanwhile, a 2-layer spiking neural network with global inhibition is realized on Silicon. In addition, we also present architectural design exploration of a brain-inspired digital neuromorphic processor architecture with memristive synaptic crossbar array, and highlight several synaptic memory access styles. Various analog-to-digital converter schemes have been investigated to provide new insights into the tradeoff between the hardware cost and energy consumption

    A Survey on Reservoir Computing and its Interdisciplinary Applications Beyond Traditional Machine Learning

    Full text link
    Reservoir computing (RC), first applied to temporal signal processing, is a recurrent neural network in which neurons are randomly connected. Once initialized, the connection strengths remain unchanged. Such a simple structure turns RC into a non-linear dynamical system that maps low-dimensional inputs into a high-dimensional space. The model's rich dynamics, linear separability, and memory capacity then enable a simple linear readout to generate adequate responses for various applications. RC spans areas far beyond machine learning, since it has been shown that the complex dynamics can be realized in various physical hardware implementations and biological devices. This yields greater flexibility and shorter computation time. Moreover, the neuronal responses triggered by the model's dynamics shed light on understanding brain mechanisms that also exploit similar dynamical processes. While the literature on RC is vast and fragmented, here we conduct a unified review of RC's recent developments from machine learning to physics, biology, and neuroscience. We first review the early RC models, and then survey the state-of-the-art models and their applications. We further introduce studies on modeling the brain's mechanisms by RC. Finally, we offer new perspectives on RC development, including reservoir design, coding frameworks unification, physical RC implementations, and interaction between RC, cognitive neuroscience and evolution.Comment: 51 pages, 19 figures, IEEE Acces

    Quantized Neural Networks and Neuromorphic Computing for Embedded Systems

    Get PDF
    Deep learning techniques have made great success in areas such as computer vision, speech recognition and natural language processing. Those breakthroughs made by deep learning techniques are changing every aspect of our lives. However, deep learning techniques have not realized their full potential in embedded systems such as mobiles, vehicles etc. because the high performance of deep learning techniques comes at the cost of high computation resource and energy consumption. Therefore, it is very challenging to deploy deep learning models in embedded systems because such systems have very limited computation resources and power constraints. Extensive research on deploying deep learning techniques in embedded systems has been conducted and considerable progress has been made. In this book chapter, we are going to introduce two approaches. The first approach is model compression, which is one of the very popular approaches proposed in recent years. Another approach is neuromorphic computing, which is a novel computing system that mimicks the human brain

    Opening the “Black Box” of Silicon Chip Design in Neuromorphic Computing

    Get PDF
    Neuromorphic computing, a bio-inspired computing architecture that transfers neuroscience to silicon chip, has potential to achieve the same level of computation and energy efficiency as mammalian brains. Meanwhile, three-dimensional (3D) integrated circuit (IC) design with non-volatile memory crossbar array uniquely unveils its intrinsic vector-matrix computation with parallel computing capability in neuromorphic computing designs. In this chapter, the state-of-the-art research trend on electronic circuit designs of neuromorphic computing will be introduced. Furthermore, a practical bio-inspired spiking neural network with delay-feedback topology will be discussed. In the endeavor to imitate how human beings process information, our fabricated spiking neural network chip has capability to process analog signal directly, resulting in high energy efficiency with small hardware implementation cost. Mimicking the neurological structure of mammalian brains, the potential of 3D-IC implementation technique with memristive synapses is investigated. Finally, applications on the chaotic time series prediction and the video frame recognition will be demonstrated

    FPGA Spiking Neural Processors With Supervised and Unsupervised Spike Timing Dependent Plasticity

    Get PDF
    Energy efficient architectures for brain inspired computing have been an active area of research with recent advances in the field of neuroscience. Spiking neural networks (SNN) are a class of artificial neural networks in which information is encoded in discrete spike events, closely resembling the biological brain. Liquid State Machine (LSM) is a computational model developed in theoretical neuroscience to describe information processing in recurrent neural circuits and can be used to model recurrent SNNs. LSM is composed of an input, reservoir and output layers. A major challenge in SNNs is training the network with discrete spiking events for which traditional loss functions and optimization techniques cannot be applied directly. Spike Timing Dependent Plasticity (STDP) is an unsupervised learning algorithm which updates synaptic weights based on time difference between spikes of pre synaptic and post synaptic neurons. STDP is a localized learning algorithm and induces self-organizing behaviors resulting in sparse network structures making it a suitable choice for low cost hardware implementation. SNNs are hardware friendly as presence or absence of a spike can be encoded using a binary digit. In this research, SNN processor with energy efficient architecture is developed and is implemented on Xilinx Zynq ZC706 FPGA platform. Hardware friendly learning rules based on STDP are proposed and reservoir and readout layers are trained with these learning algorithms. In order to achieve energy efficiency, sparsification algorithm utilizing STDP rule is proposed and implemented. On chip training and inference are carried out and it is shown that with the proposed unsupervised STDP for reservoir training and supervised STDP for readout training, classification performance of 95% is achieved for TI corpus speech data set. Classification performance, hardware overhead and power consumption of the processor with different learning schemes are reported
    corecore