A Heterogeneous In-Memory Computing Cluster for Flexible End-to-End Inference of Real-World Deep Neural Networks
Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference while also serving as on-chip storage for DNN weights. However, IMC's functional flexibility limitations and their impact on performance, energy, and area efficiency are not yet fully understood at the system level. To target practical end-to-end IoT applications, IMC arrays must be enclosed in heterogeneous programmable systems, introducing new system-level challenges that we aim to address in this work. We present a heterogeneous, tightly coupled clustered architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators. We benchmark the system on a highly heterogeneous workload, the Bottleneck layer from MobileNetV2, showing 11.5x performance and 9.5x energy efficiency improvements compared to highly optimized parallel execution on the cores. Furthermore, we explore the IMC array resources required for end-to-end inference of a full mobile-grade DNN (MobileNetV2) by scaling up our heterogeneous architecture to a multi-array accelerator. Our results show that, on end-to-end inference of MobileNetV2, our solution is one order of magnitude better in execution latency than existing programmable architectures and two orders of magnitude better than state-of-the-art heterogeneous solutions integrating in-memory computing analog cores.
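As a rough illustration of why such a heterogeneous split pays off, the sketch below estimates the speedup of offloading the two pointwise (1x1) layers of an inverted-bottleneck block to an IMC array while keeping the depthwise layer on the cores. All throughput figures and layer sizes are hypothetical placeholders, not numbers from the paper.

```python
# Back-of-envelope model of offloading a MobileNetV2 bottleneck block to
# an in-memory computing accelerator (IMA). All constants are assumed.

def bottleneck_macs(h, w, c_in, c_out, expansion=6, k=3):
    """MAC counts of an inverted bottleneck: 1x1 expand ->
    kxk depthwise -> 1x1 project (stride 1)."""
    c_mid = expansion * c_in
    expand = h * w * c_in * c_mid        # 1x1 pointwise expansion
    depthwise = h * w * c_mid * k * k    # kxk depthwise convolution
    project = h * w * c_mid * c_out      # 1x1 pointwise projection
    return expand, depthwise, project

expand, dw, proj = bottleneck_macs(h=14, w=14, c_in=64, c_out=64)

CORE_MACS_PER_CYCLE = 8 * 2    # 8 RISC-V cores, 2 MACs/cycle each (assumed)
IMA_MACS_PER_CYCLE = 4096      # one array-wide MVM per cycle (assumed)

# Pointwise layers map naturally onto the IMC array; the depthwise layer
# is assumed to stay on the digital side, which is exactly why a
# heterogeneous cluster is needed for such workloads.
t_cores = (expand + dw + proj) / CORE_MACS_PER_CYCLE
t_hetero = (expand + proj) / IMA_MACS_PER_CYCLE + dw / CORE_MACS_PER_CYCLE

print(f"estimated speedup: {t_cores / t_hetero:.1f}x")
```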
Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference
Analog In-Memory Computing (AIMC) is emerging as a disruptive paradigm for heterogeneous computing, potentially delivering orders-of-magnitude better peak performance and efficiency than traditional digital signal processing architectures on matrix-vector multiplication. However, to sustain this throughput in real-world applications, AIMC tiles must be supplied with data at very high bandwidth and low latency; this puts unprecedented pressure on the on-chip communication infrastructure, which becomes the system's performance and efficiency bottleneck. In this context, the performance and flexibility of emerging on-chip wireless communication paradigms provide the breakthrough required to scale up on-chip communication in large AIMC devices. This work presents a many-tile AIMC architecture with inter-tile wireless communication that integrates multiple heterogeneous computing clusters, each embedding a mix of parallel RISC-V cores and AIMC tiles. We perform an extensive design space exploration of the proposed architecture and discuss the benefits of exploiting emerging on-chip communication technologies such as wireless transceivers in the millimeter-wave and terahertz bands.
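To make the bandwidth pressure concrete, a one-line estimate of the input traffic a single AIMC tile generates is sketched below; the tile size, precision, and latency are illustrative assumptions, not values from the paper.

```python
# Rough estimate of the input bandwidth needed to keep one AIMC tile busy.
ROWS = 1024           # inputs consumed per matrix-vector product (assumed)
BITS_PER_INPUT = 8    # assumed activation precision
MVM_LATENCY_NS = 100  # assumed end-to-end analog MVM latency

bytes_per_mvm = ROWS * BITS_PER_INPUT / 8
bw_gbs = bytes_per_mvm / MVM_LATENCY_NS  # bytes per ns equals GB/s

print(f"{bw_gbs:.1f} GB/s per tile, input traffic alone")
# With tens of tiles per chip, the aggregate demand approaches TB/s,
# which is the interconnect pressure that motivates wireless links.
```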
Graphene-based Wireless Agile Interconnects for Massive Heterogeneous Multi-chip Processors
The main design principles in computer architecture have recently shifted from a monolithic scaling-driven approach to the development of heterogeneous architectures that tightly co-integrate multiple specialized processor and memory chiplets. In such data-hungry multi-chip architectures, current Networks-in-Package (NiPs) may not be enough to cater to their heterogeneous and fast-changing communication demands. This position paper makes the case for wireless in-package networking as the enabler of efficient and versatile wired-wireless interconnect fabrics for massive heterogeneous processors. To that end, the use of graphene-based antennas and transceivers with unique frequency-beam reconfigurability in the terahertz band is proposed. The feasibility of such a wireless vision and the main research challenges towards its realization are analyzed from the technological, communications, and computer architecture perspectives.
Spontaneous sparse learning for PCM-based memristor neural networks
Neural networks trained by backpropagation have achieved tremendous successes on numerous intelligent tasks. However, naive gradient-based training and updating methods on memristors impede applications due to intrinsic material properties. Here, we built a 39 nm, 1 Gb phase-change memory (PCM) memristor array and quantified its unique resistance-drift effect. On this basis, we develop a spontaneous sparse learning (SSL) scheme that leverages resistance drift to improve the training of PCM-based memristor networks. During training, SSL treats the drift effect as a spontaneous, consistency-based distillation process that continuously reinforces array weights in the high-resistance state unless the gradient-based method switches them to low resistance. Experiments on handwritten digit classification show that SSL improves network convergence, performance, and sparsity controllability without additional computation. This work advances learning algorithms that exploit the intrinsic properties of memristor devices, opening a new direction for the development of neuromorphic computing chips.
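A minimal sketch of the SSL idea follows: drift continuously decays weights left in the high-resistance state, acting as free sparsification, while sufficiently strong gradients issue SET pulses that keep a weight active. The threshold, drift exponent, and update rule below are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

NU = 0.05            # assumed drift exponent for high-resistance cells
SET_THRESHOLD = 0.1  # assumed |gradient| above which a SET pulse fires

def ssl_step(w, grad, lr=0.1, t=2.0, t0=1.0):
    w = w - lr * grad                        # ordinary gradient update
    set_mask = np.abs(grad) > SET_THRESHOLD  # strong gradients re-SET cells
    decay = (t / t0) ** (-NU)                # power-law resistance drift
    return np.where(set_mask, w, w * decay)  # undisturbed cells drift to 0

w = np.random.randn(5)
for _ in range(200):
    grad = 0.01 * np.random.randn(5)  # weak gradients: drift dominates
    w = ssl_step(w, grad)
print(w)  # weights decay toward zero: spontaneous sparsification
```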
Impact of conductance drift on multi-PCM synaptic architectures
In-memory computing with nanoscale memristive devices such as phase-change memory (PCM) has emerged as an alternative to conventional von Neumann systems for training deep neural networks (DNNs), where a synaptic weight is represented by the device conductance. However, PCM devices exhibit a temporal evolution of the conductance values, referred to as conductance drift, which poses challenges for reliably maintaining synaptic weights. Based on the mean behavior of 10,000 GST-based PCM devices, we observe that the drift coefficient depends on the conductance value. Moreover, we show that PCM drift is re-initialized and the drift history erased after the application of even partial SET pulses, regardless of how much the device has drifted. With models capturing these features, we show that drift has a detrimental impact on training DNNs, but that drift resilience can be significantly improved with a recently proposed multi-PCM synaptic architecture.
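These two observations translate into a simple power-law model, sketched below: conductance follows G(t) = G(t_set) * ((t - t_set)/t0)^(-nu), nu depends on the conductance state, and the drift clock restarts at the most recent partial SET. The shape of nu(G) and all constants are assumptions; only the qualitative behavior follows the abstract.

```python
import numpy as np

def nu_of(g):
    # Assumed: lower-conductance (more amorphous) states drift faster.
    return 0.02 + 0.08 * (1.0 - np.clip(g, 0.0, 1.0))

def read(g_set, t_now, t_set, t0=1.0):
    # Drift history is erased by a partial SET, so elapsed time is
    # measured from the most recent SET event.
    return g_set * ((t_now - t_set) / t0) ** (-nu_of(g_set))

# Multi-PCM synapse: one weight = mean of N devices. Since arbitration
# updates one device at a time, SET events (and drift clocks) are
# staggered, which dilutes the drift of any individual device.
N, target, t_now = 8, 0.6, 10.0
t_set = np.linspace(0.0, 7.0, N)
g_multi = read(np.full(N, target), t_now, t_set).mean()
g_single = read(target, t_now, 0.0)
print(f"single: {g_single:.3f}  multi-PCM: {g_multi:.3f}  target: {target}")
```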
Multi-ReRAM synapses for artificial neural network training
Metal-oxide-based resistive memory devices (ReRAM) are being actively researched as synaptic elements of neuromorphic co-processors for training deep neural networks (DNNs). However, device-level non-idealities pose significant challenges. In this work we present a multi-ReRAM synaptic architecture with a counter-based arbitration scheme that shows significant promise. We present a 32×2 crossbar array comprising Pt/HfO2/Ti/TiN-based ReRAM devices with multi-level storage capability and a bidirectional conductance response. We study the device characteristics in detail and model the conductance response. We show through simulations that a DNN trained in situ with the multi-ReRAM synaptic architecture can perform a handwritten digit classification task with accuracies only 2% lower than software simulations using floating-point precision, despite the stochasticity, nonlinearity, and large conductance-change granularity associated with the devices. Moreover, we show that a network can achieve accuracies above 80% with this architecture even with just binary ReRAM devices.
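A toy model of the counter-based arbitration scheme is sketched below: each weight is the mean of N devices, and a counter selects, round-robin, which device absorbs the next coarse, stochastic update. The granularity and noise values are assumptions, not the measured device characteristics.

```python
import numpy as np

class MultiReRAMSynapse:
    def __init__(self, n_devices=4, g_step=0.1, noise=0.3, seed=0):
        self.g = np.zeros(n_devices)  # normalized device conductances
        self.counter = 0              # arbitration counter
        self.g_step = g_step          # coarse conductance-change granularity
        self.noise = noise            # stochastic step-size variation
        self.rng = np.random.default_rng(seed)

    def update(self, delta_sign):
        i = self.counter % len(self.g)  # round-robin device selection
        step = self.g_step * (1 + self.noise * self.rng.standard_normal())
        self.g[i] = np.clip(self.g[i] + delta_sign * step, 0.0, 1.0)
        self.counter += 1

    @property
    def weight(self):
        return self.g.mean()

syn = MultiReRAMSynapse()
for _ in range(10):
    syn.update(+1)  # ten potentiation pulses
print(f"effective weight: {syn.weight:.3f}")
# The effective resolution scales roughly with the device count N, which
# is how even binary devices can reach usable network accuracy.
```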
Nonvolatile memory crossbar arrays for non-von Neumann computing
In the conventional von Neumann (VN) architecture, data, both operands and the operations to be performed on those operands, makes its way from memory to a dedicated central processor. With the end of Dennard scaling and the resulting slowdown in Moore's law, the IT industry is turning its attention to non-von Neumann (non-VN) architectures, and in particular to computing architectures motivated by the human brain. One family of such non-VN computing architectures is artificial neural networks (ANNs). To be competitive with conventional architectures, such ANNs will need to be massively parallel, with many neurons interconnected using a vast number of synapses, working together efficiently to compute problems of significant interest. Emerging nonvolatile memories, such as phase-change memory (PCM) or resistive memory (RRAM), could prove very helpful here, by providing inherently analog synaptic behavior in densely packed crossbar arrays suitable for on-chip learning. We discuss our recent research investigating the characteristics needed from such nonvolatile memory elements for the implementation of high-performance ANNs. We describe experiments on a 3-layer perceptron network with 164,885 synapses, each implemented using 2 NVM devices. A variant of the backpropagation weight-update rule suitable for NVM+selector crossbar arrays is shown and implemented in a mixed hardware-software experiment using an available, non-crossbar PCM array. Extensive tolerancing results are enabled by precise matching of our NN simulator to the conditions of the hardware experiment. This tolerancing shows clearly that NVM-based neural networks are highly resilient to random effects (NVM variability, yield, and stochasticity), but highly sensitive to gradient effects that act to steer all synaptic weights. Simulations of ANNs with both PCM and non-filamentary bipolar RRAM based on Pr1-xCaxMnO3 (PCMO) are also discussed. PCM exhibits a smooth, slightly nonlinear partial-SET (conductance-increase) behavior, but the asymmetry of its abrupt RESET introduces difficulties; in contrast, PCMO offers continuous conductance change in both directions, but exhibits significant nonlinearities (the degree of conductance change depends strongly on the absolute conductance). The quantitative impact of these issues on ANN performance (classification accuracy) is discussed. © Springer (India) Pvt. Ltd. 2017.
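The computational primitive shared by these crossbar proposals is worth stating explicitly: with weights stored as conductances and inputs applied as row voltages, the column currents realize a matrix-vector product in a single step, and signed weights use a differential device pair. A minimal numerical sketch (array sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 164, 10                           # sizes arbitrary for demo

g_plus = rng.uniform(0.0, 1.0, (n_in, n_out))   # conductances are >= 0
g_minus = rng.uniform(0.0, 1.0, (n_in, n_out))  # second device of the pair
v = rng.uniform(-0.2, 0.2, n_in)                # row (input) voltages

# Ohm's law per device and Kirchhoff's current law per column:
i_out = v @ g_plus - v @ g_minus                # column currents

w = g_plus - g_minus                            # effective signed weights
assert np.allclose(i_out, v @ w)                # matches the digital MVM
```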
A phase-change memory model for neuromorphic computing
Phase-change memory (PCM) is an emerging non-volatile memory technology based on the reversible and rapid phase transition between the amorphous and crystalline phases of certain phase-change materials. The ability to alter the conductance levels in a controllable way makes PCM devices particularly well suited for synaptic realizations in neuromorphic computing. A key attribute that enables this application is the progressive crystallization of the phase-change material, and the subsequent increase in device conductance, through the successive application of appropriate electrical pulses. There is significant inter- and intra-device randomness associated with this cumulative conductance evolution, and it is essential to develop a statistical model that captures it. PCM also exhibits a temporal evolution of the conductance values (drift), which could also influence applications in neuromorphic computing. In this paper, we develop a statistical model that describes both the cumulative conductance evolution and the conductance drift. The model is based on extensive characterization of 10,000 memory devices. Finally, the model is used to simulate the supervised training of both spiking and non-spiking artificial neuronal networks. Published by AIP Publishing.
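In the spirit of that description, the sketch below combines a state-dependent stochastic SET increment with power-law drift between pulses. The functional forms and constants are illustrative assumptions, not the fitted model from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def set_pulse(g, g_max=1.0, mu0=0.08, sigma=0.03):
    # Assumed saturating response: mean increment shrinks near g_max,
    # with device-to-device and pulse-to-pulse randomness on top.
    mu = mu0 * (1.0 - g / g_max)
    return float(np.clip(g + rng.normal(mu, sigma), 0.0, g_max))

def drift(g, dt, t0=1.0, nu=0.05):
    # Power-law conductance drift between consecutive pulses.
    return g * ((t0 + dt) / t0) ** (-nu)

g, trace = 0.0, []
for _ in range(20):                 # 20 cumulative SET pulses
    g = drift(set_pulse(g), dt=1.0)
    trace.append(g)
print(np.round(trace, 3))           # a noisy, saturating ramp-up
```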