
    Embracing the Unreliability of Memory Devices for Neuromorphic Computing

    The emergence of resistive non-volatile memories opens the way to highly energy-efficient computation near- or in-memory. However, this type of computation is not compatible with conventional error-correcting codes (ECC) and has to deal with device unreliability. Inspired by the architecture of animal brains, we present a manufactured differential hybrid CMOS/RRAM memory architecture suitable for neural network implementation that functions without formal ECC. We also show that using low-energy but error-prone programming conditions only slightly reduces network accuracy.
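
    A minimal sketch of the kind of experiment the last claim implies: flip a fraction of binarized weights at random (emulating error-prone programming) and measure the accuracy of a toy classifier. The bit-error rates, data, and model below are illustrative assumptions, not the paper's setup.

```python
# Emulate error-prone RRAM programming by flipping binarized weights at a
# given bit error rate (BER), then measure accuracy on a toy problem.
import numpy as np

rng = np.random.default_rng(0)

def flip_bits(w_bin, ber):
    """Flip each {-1, +1} weight independently with probability `ber`."""
    mask = rng.random(w_bin.shape) < ber
    return np.where(mask, -w_bin, w_bin)

# Toy binarized linear classifier on random separable data (illustrative).
X = rng.standard_normal((1000, 64))
w_true = np.sign(rng.standard_normal(64))
y = np.sign(X @ w_true)

for ber in [0.0, 1e-3, 1e-2, 1e-1]:
    w_noisy = flip_bits(w_true, ber)
    acc = np.mean(np.sign(X @ w_noisy) == y)
    print(f"BER={ber:.0e}  accuracy={acc:.3f}")
```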

    Always-On 674µW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

    Binary Neural Networks (BNNs) have been shown to be robust to random bit-level noise, making aggressive voltage scaling attractive as a power-saving technique for both logic and SRAMs. In this work, we introduce the first fully programmable IoT end-node system-on-chip (SoC) capable of executing software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC exploits a hybrid memory scheme in which error-vulnerable SRAMs are complemented by reliable standard-cell memories to safely store critical data under aggressive voltage scaling. On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5 V without any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2x w.r.t. nominal conditions. Furthermore, we show that the supply voltage can be dropped to 0.42 V (50% of nominal) while keeping more than 99% of the nominal accuracy (with a bit error rate of ~1/1000). At this operating point, our prototype performs 4 GOP/s (15.4 inferences/s on the CIFAR-10 dataset) by computing up to 13 binary ops per pJ, achieving 22.8 inferences/s/mW while staying within a peak power envelope of 674 µW - low enough to enable always-on operation in ultra-low-power smart cameras, long-lifetime environmental sensors, and insect-sized pico-drones. (Submitted to the ISICAS 2020 journal special issue.)
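
    As a sanity check on how the headline efficiency figure is derived, the numbers quoted in the abstract can be combined directly; the back-of-envelope sketch below shows only the arithmetic, not the hardware.

```python
# Inferences/s divided by peak power in mW should reproduce the reported
# 22.8 inferences/s/mW. Both input values are taken from the abstract.
throughput = 15.4        # CIFAR-10 inferences per second at 0.42 V
peak_power_mw = 0.674    # 674 µW peak power envelope

print(f"{throughput / peak_power_mw:.1f} inferences/s/mW")  # -> 22.8
```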

    FeFET-based Binarized Neural Networks Under Temperature-dependent Bit Errors

    Ferroelectric FET (FeFET) is a highly promising emerging non-volatile memory (NVM) technology, especially for binarized neural network (BNN) inference on the low-power edge. The reliability of such devices, however, inherently depends on temperature; changes in temperature during run time therefore manifest as changes in bit error rates. In this work, we reveal the temperature-dependent bit error model of FeFET memories, evaluate its effect on BNN accuracy, and propose countermeasures. We begin at the transistor level and accurately model the impact of temperature on the bit error rates of FeFET. This analysis reveals temperature-dependent asymmetric bit error rates. Afterwards, at the application level, we evaluate the impact of the temperature-dependent bit errors on the accuracy of BNNs. Under such bit errors, the BNN accuracy drops to unacceptable levels when no countermeasures are employed. We propose two countermeasures: (1) training BNNs for bit error tolerance by injecting bit flips into the BNN data, and (2) applying a bit error rate assignment algorithm (BERA), which operates in a layer-wise manner and does not inject bit flips during training. In our experiments, the BNNs to which the countermeasures are applied effectively tolerate temperature-dependent bit errors across the entire range of operating temperatures.
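
    A minimal sketch of countermeasure (1), bit-flip injection, reflecting the asymmetric bit error rates noted above: +1 and -1 cells are flipped with different probabilities during each forward pass. The rates and the simple two-parameter error model are illustrative assumptions, not the paper's measured FeFET model.

```python
# Asymmetric bit-error injection for bit-error-tolerance training of BNNs.
import numpy as np

rng = np.random.default_rng(1)

def inject_asymmetric_errors(w_bin, p_pos_to_neg, p_neg_to_pos):
    """Flip +1 -> -1 with prob. p_pos_to_neg and -1 -> +1 with p_neg_to_pos."""
    flip_pos = (w_bin > 0) & (rng.random(w_bin.shape) < p_pos_to_neg)
    flip_neg = (w_bin < 0) & (rng.random(w_bin.shape) < p_neg_to_pos)
    return np.where(flip_pos | flip_neg, -w_bin, w_bin)

# During training, such a function would be applied to the binarized weights
# in every forward pass, so gradients (e.g. via a straight-through estimator)
# are computed against the corrupted weights.
w = np.sign(rng.standard_normal((4, 4)))
print(inject_asymmetric_errors(w, p_pos_to_neg=0.05, p_neg_to_pos=0.01))
```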

    CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

    We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity, so that little dynamic power is spent on storing intermediate results locally or globally. This is achieved by 1) a data path architecture completely unrolled in the feature-map and filter dimensions, which reduces switching activity by favoring silencing over iterative computation and maximizes data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights that reduce switching activity, and 3) an optimized training method that yields higher sparsity of the filter weights, further reducing switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of 4.8x to 21x.
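
    A minimal sketch of point 2): threshold-based ternarization silences near-zero weights to exact zeros, producing the sparsity that reduces switching activity. The 0.7 * mean|w| threshold is a common heuristic from the ternary-weight-network literature, assumed here for illustration rather than taken from the paper.

```python
# Threshold-based ternarization: weights with small magnitude become 0,
# the rest become +1 or -1. Zeros are "silenced" operands in the datapath.
import numpy as np

rng = np.random.default_rng(2)

def ternarize(w, thresh_scale=0.7):
    t = thresh_scale * np.mean(np.abs(w))  # heuristic threshold (assumption)
    return np.where(np.abs(w) > t, np.sign(w), 0.0)

w = rng.standard_normal(10000)
w_t = ternarize(w)
print(f"sparsity: {np.mean(w_t == 0):.2%}")  # fraction of silenced weights
```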