12 research outputs found

    Energy Efficient Neocortex-Inspired Systems with On-Device Learning

    Shifting compute workloads from the cloud toward edge devices can significantly improve the overall latency of inference and learning. However, this shift exacerbates the resource constraints on edge devices. Neuromorphic computing architectures, inspired by neural processes, are natural substrates for edge devices: they offer co-located memory, in-situ training, energy efficiency, high memory density, and compute capacity in a small form factor. Owing to these features, there has recently been a rapid proliferation of hybrid CMOS/memristor neuromorphic computing systems. However, most of these systems offer limited plasticity, target either spatial or temporal input streams but not both, and have not been demonstrated on large-scale heterogeneous tasks. There is a critical knowledge gap in designing scalable neuromorphic systems that support hybrid plasticity for spatio-temporal input streams on edge devices. This research proposes Pyragrid, a low-latency, energy-efficient neuromorphic computing system for processing spatio-temporal information natively on the edge. Pyragrid is a full-scale custom hybrid CMOS/memristor architecture with analog computational modules and an underlying digital communication scheme. It is designed for hierarchical temporal memory, a biomimetic sequence-memory algorithm inspired by the neocortex, and features a novel synthetic synapse representation that enables dynamic synaptic pathways with reduced memory usage and fewer interconnects. The dynamic growth of synaptic pathways is emulated in the physical behavior of the memristor device, while synaptic modulation is enabled through a custom training scheme optimized for area and power. Pyragrid uses data reuse, in-memory computing, and event-driven sparse local computing to reduce data movement by ~44x and to improve system throughput and power efficiency by ~3x and ~161x, respectively, over a custom digital CMOS design. The innate sparsity in Pyragrid yields robustness to noise and device failure, particularly when processing visual input and predicting time-series sequences. Porting the proposed system to edge devices can enhance their computational capability, response time, and battery life.
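    To illustrate the event-driven sparse local computing style that hierarchical temporal memory relies on, the sketch below computes column overlaps by visiting only the currently active input bits, so work scales with activity rather than input width. This is a minimal, hypothetical illustration, not Pyragrid itself and not tied to its memristor implementation; all names and data are invented for the example.

```python
def sparse_overlap(active_bits, connected, threshold):
    """Event-driven overlap computation: visit only the active input bits.

    active_bits : indices of currently active (1) input bits
    connected   : dict mapping an input bit to the column ids it connects to
    threshold   : minimum overlap for a column to become active
    """
    overlap = {}                       # column id -> overlap count
    for bit in active_bits:            # work scales with activity, not input width
        for col in connected.get(bit, ()):
            overlap[col] = overlap.get(col, 0) + 1
    return sorted(col for col, score in overlap.items() if score >= threshold)

# toy usage: three active bits out of a nominally large binary input space
connected = {5: [0, 2], 17: [2], 230: [1, 2]}
print(sparse_overlap([5, 17, 230], connected, threshold=2))   # -> [2]
```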

    A Contribution Towards Intelligent Autonomous Sensors Based on Perovskite Solar Cells and Ta2O5/ZnO Thin Film Transistors

    Many applications in robotics, brain-machine interfaces, cognitive computing, image and speech processing, and wearables call for edge devices with very tight power and hardware constraints that are challenging to meet. These applications demand sub-conscious awareness and must remain always "on", especially when integrated with a sensor node that detects events in the environment. Present-day edge intelligent devices are typically based on hybrid CMOS-memristor arrays that have so far been designed for fast switching (typically nanoseconds), low energy consumption (typically nanojoules), high density, and high endurance (exceeding 10^15 cycles). On the other hand, sensory-processing systems that share the time constants and dynamics of their input signals are best placed to learn or extract information from them. To meet this requirement, many applications introduce an external "delay" in the memristor, so that each synapse is modeled as a combination of a temporal delay and a spatial weight parameter. This thesis demonstrates a synaptic thin-film transistor capable of inherent logic functions as well as compute-in-memory on time scales similar to biological events. Moving beyond a conventional crossbar array architecture, we rely on concepts from reservoir computing to demonstrate a delay-system reservoir with the highest learning efficiency reported to date (95%), compared to equivalent two-terminal memristors, using a single device for an image-processing task. The crux of our findings is an improved ability to model the unique physics of the device, which is not amenable to conventional TCAD simulations. The model provides new insight into the redox characteristics of the gate current and paves the way for assessing device performance in compute-in-memory applications. The diffusion-based mechanism of the device effectively enables time constants with potential in applications such as gesture recognition and detection of cardiac arrhythmia. The thesis also reports a new orientation of a solution-processed perovskite solar cell with an efficiency of 14.9% that is easily integrable into an intelligent sensor node, and examines the influence of the growth orientation on film morphology and solar cell efficiency. Collectively, our work aids the development of more energy-efficient, powerful edge-computing sensor systems for upcoming IoT applications.
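    As a purely software-level sketch of the delay-system reservoir idea mentioned above, the example below multiplexes each input sample over virtual nodes of a single nonlinear node with delayed feedback and trains a linear readout by ridge regression. The mask, nonlinearity, and parameters are placeholder assumptions and do not model the reported thin-film transistor.

```python
import numpy as np

rng = np.random.default_rng(0)

def delay_reservoir(u, n_virtual=50, eta=0.5, gamma=0.05):
    """Single-node delay reservoir: each input sample is multiplexed over
    n_virtual 'virtual nodes' along one delay loop (placeholder nonlinearity)."""
    mask = rng.uniform(-1, 1, n_virtual)        # fixed random input mask
    state = np.zeros(n_virtual)                 # values along the delay line
    states = []
    for sample in u:
        for i in range(n_virtual):
            # node response: delayed feedback plus masked input, squashed by tanh
            state[i] = np.tanh(eta * state[i] + gamma * mask[i] * sample)
        states.append(state.copy())
    return np.array(states)                     # shape (T, n_virtual)

# linear readout trained by ridge regression on the collected reservoir states
u = rng.uniform(-1, 1, 200)
y_target = np.roll(u, 1)                        # toy task: recall the previous input
X = delay_reservoir(u)
w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y_target)
print("train MSE:", np.mean((X @ w - y_target) ** 2))
```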

    Semiconductor Memory Devices for Hardware-Driven Neuromorphic Systems

    This book conveys the most recent progress in hardware-driven neuromorphic systems based on semiconductor memory technologies. Machine learning systems, and the various artificial neural networks used to realize the learning process, have mainly relied on software technologies. Tremendous advances have been made, particularly in data inference and recognition, areas where humans are greatly superior to conventional computers. To mimic our way of thinking more effectively in hardware, synapse-like components must be improved in terms of integration density, completeness in reproducing biological synaptic behaviors, and, most importantly, energy-efficient operation. For closer resemblance to the biological nervous system, future developments must take power consumption into account and foster revolutions at the device level, which can be realized through memory technologies. This book consists of seven articles reporting the most recent research findings on neuromorphic systems, highlighting various memory devices and architectures. Synaptic devices and their behaviors, many-core neuromorphic platforms in close relation with memory, and novel materials enabling low-power synaptic operation based on memory devices are studied, along with evaluations and applications. Some of these can be realized in practice thanks to their high compatibility, in Si processing and structure, with contemporary semiconductor memory technologies in production, which provides a perspective on neuromorphic chips for mass production.

    Ultra-low Power Circuits and Architectures for Neuromorphic Computing Accelerators with Emerging TFETs and ReRAMs

    Neuromorphic computing using post-CMOS technologies is gaining popularity due to its promise for resolving the power constraints of von Neumann machines and its similarity to the operation of the human brain. To design ultra-low-voltage, ultra-low-power analog-to-digital converters (ADCs) for neuromorphic computing systems, we explore the advantages of tunnel field-effect transistor (TFET) ADCs in energy efficiency and temperature stability. A fully differential successive-approximation-register (SAR) ADC is designed in a 20 nm TFET technology with doubled input swing and controlled comparator input common-mode voltage. To further increase the resolution, we design an energy-efficient 12-bit noise-shaping (NS) SAR ADC. A second-order noise-shaping architecture with multiple feed-forward paths is adopted and analyzed to optimize the system design parameters. By utilizing TFETs, the Delta-Sigma SAR is realized under an ultra-low supply voltage VDD with high energy efficiency. The stochastic neuron is a key element of event-based probabilistic neural networks. We propose a stochastic neuron using a metal-oxide resistive random-access memory (ReRAM). The ReRAM's conducting filament, with its built-in stochasticity, is used to mimic the neuron's membrane capacitor, which temporally integrates input spikes. A capacitor-less neuron circuit is designed, laid out, and simulated; its output spike train obeys a Poisson distribution. Building on the ReRAM-based neuron, we propose a scalable and reconfigurable architecture that exploits ReRAM-based neurons for deep Spiking Neural Networks (SNNs). In prior publications, neurons were implemented using dedicated analog or digital circuits that are neither area- nor energy-efficient. In our work, for the first time, we address the scaling and power bottlenecks of neuromorphic architectures by utilizing a single one-transistor-one-ReRAM (1T1R) cell to emulate the neuron. We show that the ReRAM-based neurons can be integrated within the synaptic crossbar to build an extremely dense processing element (PE), a spiking neural network in a memory array, with high throughput. We provide microarchitecture and circuit designs that enable deep spiking neural network computing in memory with insignificant area overhead.
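    The behavioral sketch below, which is an assumption of mine rather than the 1T1R circuit described above, shows how a leaky integrator with probabilistic firing produces a spike train whose count per window is approximately Poisson-distributed. The leak, gain, and reset behavior are placeholder parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def stochastic_neuron(input_current, leak=0.9, gain=0.05):
    """Behavioral model of a stochastic spiking neuron (assumed, simplified).

    The membrane variable integrates input with a leak; at each step the neuron
    fires with a probability that grows with the membrane value, loosely mimicking
    the built-in stochasticity of a ReRAM conducting filament.
    """
    v = 0.0
    spikes = []
    for i_in in input_current:
        v = leak * v + i_in                       # leaky temporal integration
        p_fire = 1.0 - np.exp(-gain * max(v, 0))  # firing probability this step
        if rng.random() < p_fire:
            spikes.append(1)
            v = 0.0                               # reset after a spike
        else:
            spikes.append(0)
    return np.array(spikes)

# constant drive: spike counts per window are approximately Poisson-distributed
counts = [stochastic_neuron(np.full(100, 0.5)).sum() for _ in range(200)]
print("mean spikes per window:", np.mean(counts), "variance:", np.var(counts))
```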

    Contribution to the Design of Self-Adaptive Computing Architectures Integrating Neuromorphic Nanodevices, and Potential Applications

    In this thesis, we study the potential applications of emerging memory nanodevices in computing architectures. We show that neuro-inspired architectural paradigms could provide the efficiency and adaptability required by complex image/audio processing and classification applications, at a much lower cost in power consumption and silicon area than current Von Neumann-derived architectures, thanks to a synapse-like usage of these memory nanodevices. This work focuses on memristive nanodevices, recently (re-)introduced with the discovery of the memristor in 2008, and their use as synapses in spiking neural networks. This includes most of the emerging memory technologies: Phase-Change Memory (PCM), Conductive-Bridging RAM (CBRAM), Resistive RAM (RRAM), and others. These devices are particularly suitable for implementing unsupervised learning algorithms inspired by neuroscience, such as Spike-Timing-Dependent Plasticity (STDP), which require very little control circuitry. The integration of memristive devices in crossbar arrays could provide the huge density required by this type of architecture (several thousand synapses per neuron), which is impossible to match with a CMOS-only implementation. This is one of the main factors that hindered the rise of CMOS-based neural network computing architectures in the nineties, alongside the relative complexity and inefficiency of the back-propagation learning algorithm, despite the promising aspects of such neuro-inspired architectures, such as adaptability and fault tolerance. In this work, we propose synaptic models for memristive devices and simulation methodologies for architectural designs exploiting them. Novel neuro-inspired architectures are introduced and simulated for natural data processing. They exploit the synaptic characteristics of memristive nanodevices, together with the latest progress in neuroscience. Finally, we propose hardware implementations for several device types and assess their scalability, their power-efficiency potential, and their robustness to the variability and faults that are unavoidable at the nanometric scale of these devices. This last point is of prime importance, since it still constitutes the main difficulty for integrating these emerging technologies into digital memories.
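    For reference, the pair-based STDP rule mentioned above can be written as a simple function of the spike-time difference; the sketch below uses assumed exponential windows and illustrative amplitudes. In the memristive implementations discussed here, the same effect is obtained through the overlap of programming pulses rather than explicit arithmetic.

```python
import numpy as np

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: weight change as a function of the spike-time difference.

    If the presynaptic spike precedes the postsynaptic one (dt > 0), the synapse
    is potentiated; otherwise it is depressed (assumed exponential windows, ms).
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * np.exp(-dt / tau)    # long-term potentiation
    return -a_minus * np.exp(dt / tau)       # long-term depression

# causal pairing strengthens the synapse, anti-causal pairing weakens it
print(stdp_dw(t_pre=10.0, t_post=15.0))   # small positive weight change
print(stdp_dw(t_pre=15.0, t_post=10.0))   # small negative weight change
```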

    Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference

    Over the past decade, Deep Learning (DL) and Deep Neural Networks (DNNs) have developed rapidly. They are now applied to a wide range of applications and have profoundly changed human life. As an essential element of DNNs, Recurrent Neural Networks (RNNs) are well suited to processing time-sequential data and are widely used in applications such as speech recognition and machine translation. RNNs are difficult to compute because of their massive arithmetic operations and large memory footprint. RNN inference workloads used to be executed on conventional general-purpose processors, including Central Processing Units (CPUs) and Graphics Processing Units (GPUs); however, these contain hardware blocks that are unnecessary for RNN computation, such as branch predictors and caching systems, making them suboptimal for RNN processing. To accelerate RNN computation beyond the performance of conventional processors, previous work has pursued optimization on both the software and hardware sides. On the software side, previous works mainly used model compression to reduce the memory footprint and arithmetic operations of RNNs. On the hardware side, previous works designed domain-specific hardware accelerators based on Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), with customized hardware pipelines optimized for efficient RNN processing. By following this software-hardware co-design strategy, previous works achieved at least 10X speedup over conventional processors. Many previous works focused on achieving high throughput with a large batch of input streams. However, in real-time applications such as gaming Artificial Intelligence (AI) and dynamical system control, low latency is more critical. Moreover, there is a trend of offloading neural network workloads to edge devices to provide a better user experience and privacy protection. Edge devices, such as mobile phones and wearable devices, are usually resource-constrained with a tight power budget; they require RNN hardware that is more energy-efficient to realize both low-latency inference and long battery life. Biological neurons exhibit sparsity in both the spatial and temporal domains. Inspired by this, previous work mainly explored model compression to induce spatial sparsity in RNNs. The delta network algorithm instead induces temporal sparsity in RNNs and has been shown in previous works to save over 10X arithmetic operations. In this work, we have proposed customized hardware accelerators that exploit temporal sparsity in Gated Recurrent Unit (GRU) RNNs and Long Short-Term Memory (LSTM) RNNs to achieve energy-efficient real-time RNN inference. First, we have proposed DeltaRNN, the first RNN accelerator to exploit temporal sparsity in GRU-RNNs. DeltaRNN achieves 1.2 TOp/s effective throughput with a batch size of 1, which is 15X higher than related works. Second, we have designed EdgeDRNN to accelerate GRU-RNN inference at the edge. Compared to DeltaRNN, EdgeDRNN does not rely on on-chip memory to store RNN weights and focuses on reducing off-chip Dynamic Random Access Memory (DRAM) data traffic using a more scalable architecture. EdgeDRNN has realized real-time inference of large GRU-RNNs with sub-millisecond latency and only 2.3 W wall-plug power consumption, achieving 4X higher energy efficiency than commercial edge AI platforms such as the NVIDIA Jetson Nano.
    Third, we have used DeltaRNN to realize the first continuous speech recognition system with the Dynamic Audio Sensor (DAS) as the front end. The DAS is a neuromorphic event-driven sensor that produces a stream of asynchronous events instead of audio data sampled at a fixed rate. We have also showcased how an RNN accelerator can be integrated with an event-driven sensor on the same chip to realize ultra-low-power Keyword Spotting (KWS) on the extreme edge. Fourth, we have used EdgeDRNN to control a powered robotic prosthesis, replacing a conventional proportional-derivative (PD) controller with an RNN controller. EdgeDRNN has achieved 21 μs latency running the RNN controller and maintains stable control of the prosthesis. These applications demonstrate the value of DeltaRNN and EdgeDRNN in solving real-world problems. Finally, we have applied the delta network algorithm to LSTM-RNNs and combined it with a customized structured pruning method, Column-Balanced Targeted Dropout (CBTD), to induce spatio-temporal sparsity in LSTM-RNNs. We have then proposed another FPGA-based accelerator, Spartus, the first RNN accelerator to exploit spatio-temporal sparsity. Spartus achieves 9.4 TOp/s effective throughput with a batch size of 1, the highest among present FPGA-based RNN accelerators with a power budget around 10 W, and can complete the inference of an LSTM layer with 5 million parameters within 1 μs.
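    A simplified sketch of the temporal-sparsity idea behind the delta network algorithm is shown below (it is not the DeltaRNN, EdgeDRNN, or Spartus hardware): input changes smaller than a threshold are skipped, so the matrix-vector product only touches the columns corresponding to elements that changed. Names and parameters are illustrative.

```python
import numpy as np

def delta_matvec(W, x, x_prev, m_prev, theta=0.1):
    """Delta-network style matrix-vector product (simplified sketch).

    Only columns of W whose input changed by more than theta are used;
    the running partial sum m_prev accumulates the skipped contributions' history.
    """
    delta = x - x_prev
    active = np.abs(delta) > theta          # temporally sparse update mask
    m = m_prev + W[:, active] @ delta[active]
    x_keep = np.where(active, x, x_prev)    # remember the last transmitted values
    return m, x_keep

# toy check against the dense product when the input changes slowly
rng = np.random.default_rng(2)
W = rng.standard_normal((4, 8))
x_prev = rng.standard_normal(8)
m = W @ x_prev                              # exact product at the first step
x = x_prev + rng.standard_normal(8) * 0.05  # small change: most columns skipped
m, x_prev = delta_matvec(W, x, x_prev, m, theta=0.1)
print("approx error vs dense:", np.max(np.abs(m - W @ x)))
```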