    Bit slicing approaches for variability aware ReRAM CIM macros

    Computation-in-Memory accelerators based on resistive switching devices represent a promising approach to realizing future information processing systems. Owing to their analog nature of computation, these architectures promise orders of magnitude lower energy consumption for certain tasks, while also achieving higher throughput than other special-purpose hardware such as GPUs. Due to device variability, however, a single resistive switching cell usually does not achieve the resolution required for the considered applications. To overcome this challenge, many of the proposed architectures use an approach called bit slicing, in which multiple low-resolution components are combined to realize higher-resolution blocks. In this paper, we present an analog accelerator architecture at the circuit level that can be used to perform vector-matrix or matrix-matrix multiplications. The architecture consists of a 1T1R crossbar array, optimized select circuitry, and an ADC. The components are designed to handle the variability of the resistive switching cells, which is verified using our physics-based, experimentally verified compact model. We then use this architecture to compare different bit slicing approaches and discuss their tradeoffs.
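
    As a rough behavioral illustration of the bit slicing idea described above, the sketch below splits a full-resolution weight matrix into low-resolution slices, computes one partial vector-matrix product per slice (standing in for one low-resolution crossbar plus its ADC), and recombines the partial results by shift-and-add. All names, the slice width, and the bit widths are illustrative assumptions and not taken from the paper's circuit.

    import numpy as np

    def bit_slice_vmm(x, W, total_bits=8, bits_per_slice=2):
        """Behavioral sketch of a bit-sliced vector-matrix multiplication."""
        n_slices = total_bits // bits_per_slice
        result = np.zeros(W.shape[1], dtype=np.int64)
        for s in range(n_slices):
            # Extract slice s: bits [s*bits_per_slice, (s+1)*bits_per_slice) of W.
            W_slice = (W >> (s * bits_per_slice)) & ((1 << bits_per_slice) - 1)
            # One low-resolution crossbar would compute this partial product in
            # analog; the digitized ADC output is modeled as an exact integer matmul.
            partial = x @ W_slice
            # Shift-and-add recombination restores the significance of the slice.
            result += partial.astype(np.int64) << (s * bits_per_slice)
        return result

    # Usage: the bit-sliced result matches the full-precision reference.
    rng = np.random.default_rng(0)
    x = rng.integers(0, 16, size=4)
    W = rng.integers(0, 256, size=(4, 3))
    assert np.array_equal(bit_slice_vmm(x, W), x @ W)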

    Memristive Devices for Time Domain Compute-in-Memory

    Analog compute schemes and compute-in-memory (CIM) have emerged in an effort to reduce the increasing power hunger of convolutional neural networks (CNNs), which exceeds the constraints of edge devices. Memristive devices are a relatively new technology that offers interesting opportunities for unexplored circuit concepts. In this work, the use of memristive devices in cascaded time-domain CIM (TDCIM) is introduced, with the primary goal of reducing the size of fully unrolled architectures. The different effects influencing the determinism of memristive devices are outlined together with reliability concerns. Architectures for binary as well as multibit multiply-and-accumulate (MAC) cells are presented and evaluated. Since more involved circuits offer more accurate compute results, a tradeoff between design effort and accuracy comes into the picture. To further evaluate this tradeoff, the impact of variations on overall compute accuracy is discussed. The presented cells reach an energy/OP of 0.23 fJ at a size of 1.2 μm² for binary MAC operations and 6.04 fJ at 3.2 μm² for 4×4-bit MAC operations.
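
    To make the time-domain MAC operation more concrete, the following behavioral sketch (a hypothetical model, not the cells presented in the paper) treats a cascaded TDCIM column as a chain of cells that each delay a timing edge in proportion to the product of their input bit and stored weight; the accumulated delay encodes the MAC result and would be read out by a time-to-digital converter. The unit delay T_UNIT and all function names are assumptions for illustration.

    # Assumed unit delay contributed per LSB of the product, in seconds.
    T_UNIT = 1.0e-12

    def tdcim_mac(inputs, weights, weight_bits=4):
        """Model a cascaded time-domain MAC column as an accumulated edge delay."""
        delay = 0.0
        for x, w in zip(inputs, weights):
            assert x in (0, 1) and 0 <= w < (1 << weight_bits)
            delay += x * w * T_UNIT  # this cell's contribution to the edge delay
        return delay

    def time_to_digital(delay):
        """Idealized time-to-digital converter: quantize the delay back to counts."""
        return round(delay / T_UNIT)

    # Usage: binary inputs, 4-bit weights; the TDC output equals the MAC value.
    xs = [1, 0, 1, 1]
    ws = [5, 9, 3, 12]
    print(time_to_digital(tdcim_mac(xs, ws)))  # -> 20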