    BLADE: A BitLine Accelerator for Devices on the Edge

    The increasing ubiquity of edge devices in the consumer market, along with their ever more computationally expensive workloads, necessitates corresponding increases in computing power to support such workloads. In-memory computing is attractive in edge devices because it reuses preexisting memory elements, thus limiting area overhead. Additionally, in-SRAM computing (iSC) efficiently performs computations on the spatially local data found in a variety of emerging edge device workloads. We therefore propose, implement, and benchmark BLADE, a BitLine Accelerator for Devices on the Edge. BLADE is an iSC architecture that can perform massive SIMD-like complex operations on hundreds to thousands of operands simultaneously. We implement BLADE in 28 nm CMOS and demonstrate its functionality down to 0.6 V, lower than any conventional state-of-the-art iSC architecture. We also benchmark BLADE in conjunction with a full Linux software stack in the gem5 architectural simulator, providing a robust demonstration of its performance gain in comparison to an equivalent embedded processor equipped with a NEON SIMD co-processor. We benchmark BLADE with three emerging edge device workloads, namely cryptography, high efficiency video coding, and convolutional neural networks, and demonstrate 4x, 6x, and 3x performance improvement, respectively, in comparison to a baseline CPU/NEON processor at an equivalent power budget.

    High-Density 4T SRAM Bitcell in 14-nm 3-D CoolCube Technology Exploiting Assist Techniques

    In this paper, we present a high-density four-transistor (4T) static random access memory (SRAM) bitcell design for the 3-D CoolCube technology platform, based on 14-nm fully depleted silicon-on-insulator MOS transistors. The design shows that 4T SRAM is compatible with 3-D design and that the two achieve a considerable density gain when combined. The 4T SRAM bitcell has been characterized to investigate the critical operations in terms of stability (retention and read), taking into account the post-layout parasitic elements. Failure mechanisms are thus exposed and explained. Based on this analysis, a data-dependent dynamic back-biasing scheme that improves bitcell stability is developed. A specific read-assist circuit is also proposed to enable a large number of bitcells per column in a memory array. Finally, the designed bitcell offers up to 30% area gain compared to a planar six-transistor SRAM bitcell in the same technology node.