25 research outputs found

    End-to-End Benchmarking of Chiplet-Based In-Memory Computing

    Get PDF
    In-memory computing (IMC)-based hardware reduces latency and energy consumption for compute-intensive machine learning (ML) applications. Several SRAM/RRAM-based IMC hardware architectures for accelerating ML applications have been proposed in the literature; however, crossbar-based IMC hardware poses several design challenges. We first discuss the ML algorithms recently adopted in the literature and their hardware implications. Next, we elucidate the need for IMC architectures and the components of a conventional IMC architecture. After that, we introduce the need for 2.5D or chiplet-based architectures. We then discuss the benchmarking simulators proposed for monolithic IMC architectures. Finally, we describe SIAM, an end-to-end benchmarking simulator for chiplet-based IMC architectures.
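    The mapping step that such a benchmarking simulator performs can be illustrated with a small, hedged sketch: the crossbar and chiplet capacities and the layer list below are hypothetical, not SIAM's actual parameters, and the count ignores buffers, interconnect, and peripheral circuits.

        import math

        # Hypothetical crossbar and chiplet capacities (illustrative only,
        # not values used by SIAM or any particular IMC design).
        XBAR_ROWS, XBAR_COLS = 128, 128   # weight cells per crossbar
        XBARS_PER_CHIPLET = 16            # crossbars integrated on one chiplet

        def crossbars_for_layer(in_features, out_features):
            """Crossbar tiles needed to hold one layer's weight matrix."""
            row_tiles = math.ceil(in_features / XBAR_ROWS)
            col_tiles = math.ceil(out_features / XBAR_COLS)
            return row_tiles * col_tiles

        # Example: a small multilayer perceptron given as (in, out) pairs.
        layers = [(784, 512), (512, 256), (256, 10)]
        total_xbars = sum(crossbars_for_layer(i, o) for i, o in layers)
        total_chiplets = math.ceil(total_xbars / XBARS_PER_CHIPLET)
        print(f"crossbars: {total_xbars}, chiplets: {total_chiplets}")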

    Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

    Full text link
    The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources at the edge push the transition from traditional von Neumann architectures to in-memory computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement neural networks on limited hardware resources. Quantization is one of the most efficient network compression techniques, reducing the memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based quantized neural networks (QNNs) and links software-based quantization approaches to IMC hardware implementations. Moreover, open challenges, QNN design requirements, recommendations, and perspectives, along with an IMC-based QNN hardware roadmap, are provided.
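    As a minimal sketch of the quantization idea (assuming per-tensor symmetric uniform quantization; the function names and bit-width are illustrative, not taken from the paper), the following shows how float32 weights can be stored as int8 values plus one scale factor, cutting the memory footprint roughly fourfold at the cost of some rounding error.

        import numpy as np

        def quantize_symmetric(w, n_bits=8):
            """Per-tensor symmetric uniform quantization to signed integers."""
            qmax = 2 ** (n_bits - 1) - 1          # e.g. 127 for 8 bits
            scale = np.max(np.abs(w)) / qmax      # one scale for the whole tensor
            w_q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
            return w_q, scale

        def dequantize(w_q, scale):
            return w_q.astype(np.float32) * scale

        w = np.random.randn(256, 256).astype(np.float32)
        w_q, s = quantize_symmetric(w)
        # 4 bytes per weight -> 1 byte per weight, plus a single float scale.
        err = np.mean(np.abs(w - dequantize(w_q, s)))
        print(f"mean absolute quantization error: {err:.4f}")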

    Hybrid Memristor-CMOS Computer for Artificial Intelligence: from Devices to Systems

    Full text link
    Neuromorphic computing systems, which aim to mimic the function and structure of the human brain, are a promising approach to overcoming the limitations of conventional computing systems such as the von Neumann bottleneck. Recently, memristors and memristor crossbars have been extensively studied for neuromorphic system implementations due to the ability of memristor devices to emulate biological synapses, thus providing benefits such as co-located memory/logic operations and massive parallelism. A memristor is a two-terminal device whose resistance is modulated by the history of external stimulation. The principle of the resistance modulation, or resistance switching (RS), for a typical oxide-based memristor is based on oxygen vacancy migration in the oxide layer through ion drift and diffusion. When applied in computing systems, the memristor is often formed in a crossbar structure and used to perform vector-matrix multiplication (VMM) operations. Since the values in the matrix can be stored as the device conductance values of the crossbar array, when an input vector is applied as voltage pulses with different pulse amplitudes or different pulse widths to the rows of the crossbar, the currents or charges collected at the columns of the crossbar correspond to the resulting VMM outputs, following Ohm's law and Kirchhoff's current law. This approach makes it possible to use physics to execute this data-intensive task directly, both in-memory and in parallel in a single step.
    First, I present a comprehensive physical model of the TaOx-based memristor device in which the internal parameters, including the electric field, temperature, and oxygen vacancy (VO) concentration, are solved self-consistently to accurately describe the device operation. Starting from the initial Forming process, the model quantitatively captures the dynamic RS behavior and can reliably reproduce Set/Reset cycling in a self-consistent manner. Beyond clarifying the nature of the Forming and Set/Reset processes, a bulk-like doping effect was revealed by the model during Set and supported by experimental results. This phenomenon can lead to linear analog conductance modulation with a large dynamic range, which is very beneficial for low-power neuromorphic computing applications.
    Second, an integrated memristor/CMOS system consisting of a 54×108 passive memristor crossbar array directly fabricated on a CMOS chip is presented. The system includes all necessary analog/digital circuitry (including analog-to-digital and digital-to-analog converters), digital buses, and a programmable processor that controls the digital and analog components to form a complete hardware system for neuromorphic computing applications. With the fully integrated and reprogrammable chip, we experimentally demonstrated three popular models, namely a perceptron network, a sparse coding network, and a bilayer principal component analysis system with an unsupervised feature extraction layer and a supervised classification layer, all on the same chip.
    Beyond VMM operations, the internal dynamics of memristors allow the system to natively process temporal features in the input data. Specifically, a WOx-based memristor with a short-term memory effect caused by spontaneous oxygen vacancy diffusion was utilized to implement a reservoir computing system that processes temporal information. The spatial information of a digit image can be converted into streaming inputs fed into the memristor reservoir, leading to 100% accuracy for simple 4×5 digit recognition and 88.1% accuracy on the MNIST data set. The system was also employed to solve other nonlinear tasks, such as emulating a second-order nonlinear system.
    Ph.D. Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155040/1/seulee_1.pd
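    The crossbar vector-matrix multiplication described above can be written out numerically. The sketch below assumes an idealized array (no wire resistance, device variation, or read-out quantization) and reuses the 54×108 dimensions only as an example; per Ohm's law and Kirchhoff's current law, each column current is the dot product of the row voltages with that column's conductances.

        import numpy as np

        rng = np.random.default_rng(0)
        G = rng.uniform(1e-6, 1e-4, size=(54, 108))  # cell conductances in siemens (idealized)
        v = rng.uniform(0.0, 0.2, size=54)           # row input voltages in volts

        # Ohm's law per cell and Kirchhoff's current law per column:
        # I[j] = sum_i v[i] * G[i, j], i.e. the whole VMM in one analog step.
        I = v @ G
        print(I.shape)   # (108,) column output currents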

    New Logic-In-Memory Paradigms: An Architectural and Technological Perspective

    Get PDF
    Processing systems are in continuous evolution thanks to constant technological advancement and architectural progress. Over the years, computing systems have become more and more powerful, providing support for applications, such as Machine Learning, that require high computational power. However, the growing complexity of modern computing units and applications has had a strong impact on power consumption. In addition, the memory plays a key role in the overall power consumption of the system, especially for data-intensive applications, which require a lot of data movement between the memory and the computing unit. The consequence is twofold: memory accesses are expensive in terms of energy, and a lot of time is wasted accessing the memory rather than processing, because of the performance gap that exists between memories and processing units. This gap is known as the memory wall or the von Neumann bottleneck and is due to the different rates of progress of complementary metal-oxide semiconductor (CMOS) technology and memories. However, CMOS scaling is also approaching a limit beyond which further progress will not be possible. This work addresses these problems from an architectural and technological point of view by: (1) proposing a novel Configurable Logic-in-Memory Architecture that exploits the in-memory computing paradigm to reduce the memory wall problem while also providing high performance thanks to its flexibility and parallelism; (2) exploring a non-CMOS technology as a candidate technology for the Logic-in-Memory paradigm.

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Get PDF
    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing down research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing designers to assess the design choices for each building block separately. Reported optimizations range from up to 10,000x memory savings to 33x energy reductions, providing chip designers with an overview of design choices for implementing efficient low power neural network accelerators.

    SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference

    Full text link
    CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring dataflow styles that exploit computational parallelism, but the potential performance speedup from sparsity has not been adequately addressed. The computation and memory footprint of CNNs can be significantly reduced if sparsity is exploited during network evaluation. To take advantage of sparsity, some accelerator designs explore sparsity encoding and evaluation on CNN accelerators. However, sparsity encoding is performed only on activations or weights, and only for inference, even though activations and weights have been shown to exhibit high sparsity levels during training as well. Hence, sparsity-aware computation should also be considered in training. To further improve performance and energy efficiency, some accelerators evaluate CNNs with limited precision, but this is limited to inference since reduced precision sacrifices network accuracy if used in training. In addition, CNN evaluation is usually memory-intensive, especially in training. In this paper, we propose SPRING, a SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and inference. SPRING supports both CNN training and inference. It uses a binary mask scheme to encode sparsity in activations and weights, and a stochastic rounding algorithm to train CNNs with reduced precision without accuracy loss. To alleviate the memory bottleneck of CNN evaluation, especially during training, SPRING uses an efficient monolithic 3D NVM interface to increase memory bandwidth. Compared to the GTX 1080 Ti, SPRING achieves 15.6X, 4.2X, and 66.0X improvements in performance, power reduction, and energy efficiency, respectively, for CNN training, and 15.5X, 4.5X, and 69.1X improvements for inference.
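    As a hedged illustration of the two ideas named above, the sketch below implements stochastic rounding to a fixed-point grid and a binary-mask encoding of a sparse activation vector; the bit-width, tensor shapes, and function names are illustrative, and the code is a software analogue rather than SPRING's actual hardware dataflow.

        import numpy as np

        rng = np.random.default_rng(0)

        def stochastic_round(x, frac_bits=8):
            """Round to a fixed-point grid, rounding up with probability equal to
            the fractional remainder so the rounding error is zero on average."""
            scaled = x * (1 << frac_bits)
            floor = np.floor(scaled)
            round_up = rng.random(x.shape) < (scaled - floor)
            return (floor + round_up) / (1 << frac_bits)

        def mask_encode(t):
            """Binary-mask sparsity encoding: a 1-bit mask plus only the nonzero values."""
            mask = t != 0
            return mask, t[mask]

        act = np.maximum(rng.standard_normal(1024).astype(np.float32), 0)  # ReLU output, mostly zeros
        mask, nonzeros = mask_encode(stochastic_round(act))
        print(f"stored {nonzeros.size}/{act.size} values plus a {act.size}-bit mask")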

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Get PDF
    Corey Lammie designed mixed-signal memristive-complementary metal-oxide-semiconductor (CMOS) and field-programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.

    Architecture and Circuit Design Optimization for Compute-In-Memory

    Get PDF
    The objective of the proposed research is to optimize computing-in-memory (CIM) design for accelerating Deep Neural Network (DNN) algorithms. As compute peripheries such as analog-to-digital converters (ADCs) introduce significant overhead in CIM inference designs, the research first focuses on circuit optimization for inference acceleration and proposes a resistive random access memory (RRAM)-based ADC-free in-memory compute scheme. We comprehensively explore the trade-offs among different types of ADCs and investigate a new ADC design especially suited for CIM, which performs an analog shift-add over multiple weight significance bits, improving throughput and energy efficiency under similar area constraints. Furthermore, we prototype an ADC-free CIM inference chip with fully analog data processing between sub-arrays, which significantly improves hardware performance over conventional CIM designs and achieves near-software classification accuracy on the ImageNet and CIFAR-10/-100 datasets. Secondly, the research focuses on hardware support for CIM on-chip training. To maximize hardware reuse of the CIM weight-stationary dataflow, we propose CIM training architectures with a transpose weight mapping strategy. The cell design and peripheral circuitry are modified to efficiently support bi-directional compute, and a novel signed-number multiplication scheme is proposed to handle the negative inputs in backpropagation. Finally, we propose an SRAM-based CIM training architecture and comprehensively explore the system-level hardware performance of DNN on-chip training based on silicon measurement results.
    Ph.D.
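    The shift-add over weight significance bits mentioned above can be emulated digitally. The sketch below is a hedged software model, not the proposed analog circuit: each bit plane of a 4-bit unsigned weight matrix contributes a partial vector-matrix product that is shifted by its bit position and accumulated, reproducing the full-precision result.

        import numpy as np

        rng = np.random.default_rng(0)

        def bit_sliced_mvm(x, W, w_bits=4):
            """Combine per-bit partial products by shift-add (digital emulation)."""
            acc = np.zeros(W.shape[1], dtype=np.int64)
            for b in range(w_bits):
                bit_plane = (W >> b) & 1       # 0/1 cells holding bit b of every weight
                acc += (x @ bit_plane) << b    # weight the partial sum by bit significance
            return acc

        x = rng.integers(0, 16, size=64)              # unsigned input activations
        W = rng.integers(0, 16, size=(64, 32))        # 4-bit unsigned weights
        assert np.array_equal(bit_sliced_mvm(x, W), x @ W)  # matches the full-precision product
        print(bit_sliced_mvm(x, W)[:4])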