149 research outputs found

    Thermal profiling in CMOS/memristor hybrid architectures

    CMOS/memristor hybrid architectures combine conventional CMOS processing elements with thin-film memristor-based crossbar circuits for high-density reconfigurable systems. Research on these architectures has grown explosively over the past few years due to the first practical demonstration of a thin-film memristor in 2008. The reliability and lifetimes of both the CMOS and memristor partitions of these architectures are severely affected by temperature variations across the chip. Therefore, it is expected that dynamic thermal management (DTM) mechanisms will be needed to improve their reliability and lifetime. This thesis explores one aspect of DTM--thermal profiling--in a CMOS/memristor memory architecture. A temperature sensing resistive random access memory (TSRRAM) was designed. Temperature information is extracted from the TSRRAM by measuring the write time of thin-film memristors. Active and passive sensing mechanisms are also introduced as means for DTM algorithms to determine the thermal profile of the chip. Crosstherm, a simulation framework, was developed to analyze the effects of temperature variations in CMOS/memristor architectures. The TSRRAM design was simulated using the Crosstherm framework for four CMOS processor benchmarks. Passive sensing produced a mean absolute sensor error across all benchmarks of 2.14 K. The size of the DTM unit's memory was also shown to have a significant impact on the accuracy of extracted thermal data during passive sensing. Active sensing was also demonstrated to show the effect of dynamic adjustment of sensor resolution on the accuracy of hotspot temperature estimations.

    TSV placement optimization for liquid cooled 3D-ICs with emerging NVMs

    Three dimensional integrated circuits (3D-ICs) are a promising solution to the performance bottleneck in planar integrated circuits. One of the salient features of 3D-ICs is their ability to integrate heterogeneous technologies such as emerging non-volatile memories (NVMs) in a single chip. However, thermal management in 3D-ICs is a significant challenge, owing to the high heat flux (~250 W/cm²). Several research groups have focused on either run-time or design-time mechanisms to reduce the heat flux but did not consider 3D-ICs with heterogeneous stacks. The goal of this work is to achieve a balanced thermal gradient in 3D-ICs while reducing the peak temperatures. In this research, placement algorithms for design-time optimization and a choice of appropriate cooling mechanisms for run-time modulation of temperature are proposed. Specifically, an architectural framework that introduces a weight-based simulated annealing (WSA) algorithm for thermal-aware placement of through silicon vias (TSVs) with inter-tier liquid cooling is proposed for design-time optimization. In addition, a run-time simulation framework that integrates a dedicated stack of emerging NVMs such as RRAM, PCRAM and STTRAM is developed to analyze the thermal and performance impact of these NVMs in 3D-MPSoCs with inter-tier liquid cooling. Experimental results of the WSA algorithm implemented on MCNC91 and GSRC benchmarks demonstrate up to 11 K reduction in the average temperature across the 3D-IC chip. In addition, power density arrangement in WSA improved the uniformity by 5%. Furthermore, simulation results of PARSEC benchmarks with an NVM L2 cache demonstrate a temperature reduction of 12.5 K (RRAM) compared to SRAM in 3D-ICs. In particular, RRAM proved to be a thermally efficient replacement for SRAM, with 34% lower energy delay product (EDP) and 9.7 K average temperature reduction.

    Thermal Heating in ReRAM Crossbar Arrays: Challenges and Solutions

    The increasing popularity of deep-learning-powered applications raises the issue of the vulnerability of neural networks to adversarial attacks. In other words, hardly perceptible changes in input data lead to output errors in neural networks, hindering their use in applications that involve decisions with security risks. A number of previous works have already thoroughly evaluated the most commonly used configuration, Convolutional Neural Networks (CNNs), against different types of adversarial attacks. Moreover, recent works demonstrated the transferability of some adversarial examples across different neural network models. This paper studies the robustness of newly emerging models such as SpinalNet-based neural networks and Compact Convolutional Transformers (CCT) on the image classification problem of the CIFAR-10 dataset. Each architecture was tested against four White-box attacks and three Black-box attacks. Unlike the VGG and SpinalNet models, the attention-based CCT configuration demonstrated a large span between strong robustness and vulnerability to adversarial examples. Finally, a study of transferability between VGG, VGG-inspired SpinalNet and pretrained CCT 7/3x1 models was conducted. It was shown that high effectiveness of an attack on a particular individual model does not guarantee its transferability to other models. Comment: 18 page

    Reconfigurable RRAM-based computing: A Case study for reliability enhancement

    Emerging hybrid-CMOS nanoscale devices and architectures offer a greater degree of integration and performance capabilities. However, the high power densities, hard error frequency, process variations, and device wearout affect the overall system reliability. Reactive design techniques, such as redundancy, account for component failures by mitigating them to prevent system failures. These techniques incur high area and power overhead. This research focuses on exploring hybrid CMOS/Resistive RAM (RRAM) architectures that enhance the system reliability by performing computation in RRAM cache whenever CMOS logic units fail, essentially masking the area overhead of redundant logic when not in use. The proposed designs are validated using the Gem5 performance simulator and McPAT power simulator running single-core SPEC2006 benchmarks and multi-core PARSEC benchmarks. The simulation results are used to evaluate the efficacy of reliability enhancement techniques using RRAM. The average runtime when using RRAM for functional unit replacement was between ~1.5 and ~2.5 times longer than the baseline for a single-core architecture, ~1.25 and ~2 times longer for an 8-core architecture, and ~1.2 and ~1.5 times longer for a 16-core architecture. Average energy consumption when using RRAM for functional unit replacement was between ~2 and ~5 times more than the baseline for a single-core architecture, and ~1.25 and ~2.75 times more for multi-core architectures. The performance degradation and increased energy consumption are justified by the prevention of system failure and enhanced reliability. Overall, the proposed architecture shows promise for use in multi-core systems. Average performance degradation decreases as more cores are used because more total functional units are available, preventing a slow RRAM functional unit from becoming a bottleneck.

    FeFET-based Binarized Neural Networks Under Temperature-dependent Bit Errors

    Ferroelectric FET (FeFET) is a highly promising emerging non-volatile memory (NVM) technology, especially for binarized neural network (BNN) inference on the low-power edge. The reliability of such devices, however, inherently depends on temperature. Hence, changes in temperature during run time manifest themselves as changes in bit error rates. In this work, we reveal the temperature-dependent bit error model of FeFET memories, evaluate its effect on BNN accuracy, and propose countermeasures. We begin on the transistor level and accurately model the impact of temperature on bit error rates of FeFET. This analysis reveals temperature-dependent asymmetric bit error rates. Afterwards, on the application level, we evaluate the impact of the temperature-dependent bit errors on the accuracy of BNNs. Under such bit errors, the BNN accuracy drops to unacceptable levels when no countermeasures are employed. We propose two countermeasures: (1) training BNNs for bit error tolerance by injecting bit flips into the BNN data, and (2) applying a bit error rate assignment algorithm (BERA) which operates in a layer-wise manner and does not inject bit flips during training. In experiments, the BNNs to which the countermeasures are applied effectively tolerate temperature-dependent bit errors for the entire range of operating temperature.
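
The bit flip injection mentioned in countermeasure (1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `inject_bit_flips`, the parameters `p_0_to_1` and `p_1_to_0`, and the example rates are all hypothetical, chosen only to show what "asymmetric" bit error rates mean (a stored 0 and a stored 1 flip with different probabilities).

```python
import random


def inject_bit_flips(bits, p_0_to_1, p_1_to_0, rng=None):
    """Return a copy of `bits` with asymmetric random bit flips applied.

    bits      -- iterable of 0/1 values (e.g. binarized weights)
    p_0_to_1  -- assumed probability that a stored 0 is read as 1
    p_1_to_0  -- assumed probability that a stored 1 is read as 0
    rng       -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    noisy = []
    for b in bits:
        # Pick the flip probability that matches the stored value.
        p = p_0_to_1 if b == 0 else p_1_to_0
        # rng.random() is in [0, 1), so p = 0 never flips and p = 1 always flips.
        noisy.append(1 - b if rng.random() < p else b)
    return noisy


if __name__ == "__main__":
    weights = [0, 1, 1, 0, 1, 0, 0, 1]
    # Hypothetical error rates; real values would come from a device model.
    print(inject_bit_flips(weights, p_0_to_1=0.02, p_1_to_0=0.005,
                           rng=random.Random(0)))
```

In a training loop, such a function would typically be applied to the binarized data on each forward pass so that the network learns to tolerate the errors; how the paper integrates it is not specified in the abstract.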

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing designers to assess the design choices for each building block separately. Reported optimizations range from up to 10'000x memory savings to 33x energy reductions, providing chip designers with an overview of design choices for implementing efficient low power neural network accelerators.